Improved Measure on Extended Marginal Homogeneity for Ordinal Square Contingency Tables

For square contingency tables with ordered categories, Yamamoto et al. (2007) considered a measure to represent the degree of departure from extended marginal homogeneity. It attains the maximum value when one of two symmetric cumulative probabilities is zero. The present paper proposes an improved measure of the degree of departure from extended marginal homogeneity that can attain the maximum value even when the cumulative probabilities are nonzero. An example is given.


Introduction
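For the R × R square contingency table, let $\pi_{ij}$ denote the probability that an observation will fall in cell $(i, j)$ $(i = 1, \ldots, R;\ j = 1, \ldots, R)$. The marginal homogeneity (MH) model is defined by

$$\pi_{i\cdot} = \pi_{\cdot i} \quad (i = 1, \ldots, R),$$

where $\pi_{i\cdot} = \sum_{k=1}^{R} \pi_{ik}$ and $\pi_{\cdot i} = \sum_{k=1}^{R} \pi_{ki}$ (Stuart, 1955; Bishop et al., 1975, p. 294). Let

$$H_{1(i)} = \sum_{s=1}^{i}\sum_{t=i+1}^{R} \pi_{st}, \qquad H_{2(i)} = \sum_{s=i+1}^{R}\sum_{t=1}^{i} \pi_{st},$$

for $i = 1, \ldots, R-1$. The MH model may be expressed as

$$H_{1(i)} = H_{2(i)} \quad (i = 1, \ldots, R-1).$$

This states that the cumulative probability that an observation will fall in row category $i$ or below and column category $i+1$ or above is equal to the cumulative probability that the observation falls in column category $i$ or below and row category $i+1$ or above, for $i = 1, \ldots, R-1$. Tomizawa (1984, 1995) considered the extended marginal homogeneity (EMH) model, which is expressed as

$$H_{1(i)} = \Delta H_{2(i)} \quad (i = 1, \ldots, R-1),$$

where $\Delta$ is an unknown positive parameter. Assume that $H_{1(i)} + H_{2(i)} > 0$ for all $i$, $H_1 > 0$, and $H_2 > 0$, where $H_c = \sum_{i=1}^{R-1} H_{c(i)}$ $(c = 1, 2)$. The EMH model may also be expressed as

$$Q_{1(i)} = Q_{2(i)} \quad (i = 1, \ldots, R-1),$$

where

$$Q_{c(i)} = \frac{H^{*}_{c(i)}}{H^{*}_{1(i)} + H^{*}_{2(i)}}, \qquad H^{*}_{c(i)} = \frac{H_{c(i)}}{H_c} \quad (c = 1, 2).$$

This indicates that there is a structure of symmetry between $\{Q_{1(i)}\}$ and $\{Q_{2(i)}\}$. Yamamoto et al. (2007) considered a measure to represent the degree of departure from EMH, using the diversity index of Patil and Taillie (1982). The measure ranges between 0 and 1, and the degree of departure from EMH is maximum when, for each $i = 1, \ldots, R-1$, one of $Q_{1(i)}$ and $Q_{2(i)}$ is zero (i.e., $Q_{1(i)} = 0$ or $Q_{1(i)} = 1$). [For measures for other models, e.g., the symmetry model (Bowker, 1948) and the MH model, see, e.g., Tomizawa et al. (2001), Tahata et al. (2006), and Tahata et al. (2009).]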
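As a concrete illustration of the cumulative probabilities (row category $i$ or below with column category $i+1$ or above, and the mirror-image block) and of $Q_{1(i)}, Q_{2(i)}$, here is a minimal NumPy sketch. The function names `cumulative_probs` and `q_values` are hypothetical, and the 3 × 3 table is a constructed example (not from the paper) in which $H_{1(i)} = 2H_{2(i)}$ for every $i$, so EMH holds with $\Delta = 2$ while MH does not.

```python
import numpy as np

def cumulative_probs(table):
    """Return (H1, H2), each of length R - 1, for an R x R table.

    H1[i-1]: probability of row category <= i and column category >= i + 1.
    H2[i-1]: probability of row category >= i + 1 and column category <= i.
    """
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                      # accept counts or probabilities
    R = p.shape[0]
    H1 = np.array([p[:i, i:].sum() for i in range(1, R)])  # upper-right blocks
    H2 = np.array([p[i:, :i].sum() for i in range(1, R)])  # lower-left blocks
    return H1, H2

def q_values(table):
    """Q1(i), Q2(i) built from H*_c(i) = H_c(i) / H_c; EMH holds iff all Q1(i) = 1/2."""
    H1, H2 = cumulative_probs(table)
    H1s, H2s = H1 / H1.sum(), H2 / H2.sum()
    return H1s / (H1s + H2s), H2s / (H1s + H2s)

# Constructed example: H1(i) = 2 * H2(i) for every i (EMH with Delta = 2, not MH).
table = [[10, 4, 2],
         [2, 10, 6],
         [1, 3, 10]]
H1, H2 = cumulative_probs(table)
print(H1 / H2)              # [2. 2.]
print(q_values(table)[0])   # [0.5 0.5]
```

Dividing each $H_{c(i)}$ by its total $H_c$ is what removes $\Delta$: under EMH the normalized sequences coincide, so every $Q_{1(i)}$ is exactly 1/2 regardless of the value of $\Delta$.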
However, in the analysis of square contingency tables, all $Q_{1(i)}$ and $Q_{2(i)}$ $(i = 1, \ldots, R-1)$ are positive in many cases. In such cases, the measure of Yamamoto et al. (2007) cannot attain its maximum value. We are therefore interested in a measure of the degree of departure from EMH that can attain its maximum value even when none of $\{Q_{1(i)}\}$ and $\{Q_{2(i)}\}$ is zero.
For square contingency tables with ordered categories, the present paper proposes such a measure of departure from EMH for the case in which all cumulative probabilities are positive.

New Measure
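Let $H^{*}_{c(i)} = H_{c(i)}/H_c$ $(c = 1, 2;\ i = 1, \ldots, R-1)$, so that $\sum_{i=1}^{R-1} H^{*}_{c(i)} = 1$. For a specified $d$ with $0.5 < d \le 1$ and for $\lambda > -1$, we propose the measure

$$\Omega^{(\lambda)} = \frac{1}{K}\sum_{i=1}^{R-1}\frac{H^{*}_{1(i)} + H^{*}_{2(i)}}{2}\left[1 - \frac{\lambda 2^{\lambda}}{2^{\lambda}-1}W_i^{(\lambda)}\right],$$

where

$$W_i^{(\lambda)} = \frac{1}{\lambda}\left[1 - Q_{1(i)}^{\lambda+1} - Q_{2(i)}^{\lambda+1}\right], \qquad K = 1 - \frac{\lambda 2^{\lambda}}{2^{\lambda}-1}L, \qquad L = \frac{1}{\lambda}\left[1 - d^{\lambda+1} - (1-d)^{\lambda+1}\right],$$

and the value at $\lambda = 0$ is taken to be the continuous limit as $\lambda \to 0$. Thus, when $\lambda = 0$,

$$\Omega^{(0)} = \frac{1}{K}\sum_{i=1}^{R-1}\frac{H^{*}_{1(i)} + H^{*}_{2(i)}}{2}\left[1 - \frac{W_i^{(0)}}{\log 2}\right],$$

with $W_i^{(0)} = -Q_{1(i)}\log Q_{1(i)} - Q_{2(i)}\log Q_{2(i)}$, $K = 1 - L/\log 2$, and $L = -d\log d - (1-d)\log(1-d)$ (where $0 \log 0 = 0$). Note that $W_i^{(\lambda)}$ is the Patil-Taillie diversity index, which includes the Shannon entropy (when $\lambda = 0$). The value of $d$ is chosen by the user such that $1 - d \le Q_{1(i)} \le d$ for all $i = 1, \ldots, R-1$. [Although the details are omitted, note that $\Omega^{(\lambda)}$ can also be expressed by using the power divergence.] Then we obtain the following theorem.

Theorem 1. For each $\lambda$ and a fixed $d$, $0 \le \Omega^{(\lambda)} \le 1$; $\Omega^{(\lambda)} = 0$ if and only if the EMH model holds, and $\Omega^{(\lambda)} = 1$ if and only if $Q_{1(i)} = d$ or $Q_{1(i)} = 1 - d$ for each $i$.

Proof. Since $Q_{2(i)} = 1 - Q_{1(i)}$, $W_i^{(\lambda)}$ is a symmetric concave function of $Q_{1(i)}$. Under the condition $1 - d \le Q_{1(i)} \le d$, it attains its maximum value $(2^{\lambda}-1)/(\lambda 2^{\lambda})$ ($\log 2$ when $\lambda = 0$) at $Q_{1(i)} = 1/2$ and its minimum value at $Q_{1(i)} = d$ or $1 - d$. When $d \neq 1$, this minimum value is $L$, which is not equal to 0, and the maximum value is the same as for $d = 1$. Hence each bracketed term lies between 0 and $K$, and since the weights $(H^{*}_{1(i)} + H^{*}_{2(i)})/2$ are positive and sum to 1, the measure $\Omega^{(\lambda)}$ lies between 0 and 1. This completes the proof.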
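A minimal NumPy sketch of computing $\Omega^{(\lambda)}$ from a table follows: it forms $Q_{1(i)}, Q_{2(i)}$, evaluates the Patil-Taillie index, maps each term to $1 - \frac{\lambda 2^\lambda}{2^\lambda - 1}W_i$, averages, and divides by $K$. The names `patil_taillie` and `omega` are hypothetical, and the averaging weights $(H^{*}_{1(i)} + H^{*}_{2(i)})/2$ are an assumption about the form of the Yamamoto et al. (2007) measure that $\Omega$ rescales by $1/K$. The example table is constructed (not Table 1a) so that the cumulative probabilities are all positive while every $Q_{1(i)}$ equals 0.9 or 0.1.

```python
import numpy as np

def patil_taillie(q, lam):
    """Patil-Taillie diversity along axis 0; lam = 0 is Shannon entropy (0 log 0 = 0)."""
    q = np.asarray(q, dtype=float)
    if lam == 0.0:
        return -np.sum(q * np.log(np.where(q > 0, q, 1.0)), axis=0)
    return (1.0 - np.sum(q ** (lam + 1.0), axis=0)) / lam

def omega(table, lam=0.0, d=1.0, tol=1e-12):
    """Sketch of Omega^{(lambda)}; `table` may hold counts or probabilities.

    ASSUMPTION: weights (H*_1(i) + H*_2(i)) / 2 for the underlying
    Yamamoto et al. (2007) measure, which is then divided by K.
    """
    if not (0.5 < d <= 1.0) or lam <= -1.0:
        raise ValueError("need 0.5 < d <= 1 and lam > -1")
    p = np.asarray(table, dtype=float)
    R = p.shape[0]
    H1 = np.array([p[:i, i:].sum() for i in range(1, R)])  # upper-right blocks
    H2 = np.array([p[i:, :i].sum() for i in range(1, R)])  # lower-left blocks
    H1s, H2s = H1 / H1.sum(), H2 / H2.sum()                # H*_{c(i)}
    Q = np.vstack([H1s, H2s]) / (H1s + H2s)                # rows: Q1, Q2
    if np.any(Q[0] < 1.0 - d - tol) or np.any(Q[0] > d + tol):
        raise ValueError("d violates 1 - d <= Q1(i) <= d")
    c = 1.0 / np.log(2.0) if lam == 0.0 else lam * 2.0**lam / (2.0**lam - 1.0)
    K = 1.0 - c * patil_taillie([d, 1.0 - d], lam)          # K = 1 - c * L
    terms = 1.0 - c * patil_taillie(Q, lam)                 # each in [0, K]
    weights = (H1s + H2s) / 2.0                             # sums to 1
    return float(np.sum(weights * terms) / K)

# Constructed table: H1 = (9, 1), H2 = (1, 9), so Q1 = (0.9, 0.1).
t = [[10, 9, 0],
     [1, 10, 1],
     [0, 9, 10]]
print(round(omega(t, lam=0.0, d=1.0), 3))   # 0.531
print(round(omega(t, lam=0.0, d=0.9), 3))   # 1.0
```

With $d = 1$, $K = 1$ and the sketch returns the unscaled value; with $d = 0.9$ the same table reaches the maximum 1 because every $Q_{1(i)}$ sits on the boundary $\{d, 1 - d\}$.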
We note that Ω is the measure of Yamamoto et al. (2007) modified by the coefficient 1/K; when d = 1, K = 1 and Ω reduces to that measure. Consider the artificial 4 × 4 table of cell probabilities {p_ij} in Table 1a. We examine the degree of departure from EMH using the existing measure (Ω with d = 1, i.e., the Yamamoto et al. measure) and the new measure Ω with d < 1 (here d = 0.9). Table 1b shows that the true value of Ω with d = 1 is 0.531 (when λ = 0), whereas that of Ω with d = 0.9 is 1 (when λ = 0). Thus the new measure Ω with d < 1 attains the maximum value 1 even though all cumulative probabilities are positive.
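The two values in Table 1b can be related through the coefficient $K$. The sketch below evaluates $K = 1 - \frac{\lambda 2^\lambda}{2^\lambda - 1}L$ for a given $d$ (with the $\lambda \to 0$ limit $K = 1 - L/\log 2$); the function name `k_coefficient` is hypothetical. At $d = 0.9$, $\lambda = 0$ it gives about 0.531, which matches the reported $d = 1$ value; this is consistent with every $Q_{1(i)}$ in Table 1a being 0.9 or 0.1, although that is an inference, since the table entries are not reproduced here.

```python
import math

def k_coefficient(d, lam=0.0):
    """K = 1 - c(lambda) * L(d, lambda) for 0.5 < d <= 1 and lambda > -1."""
    if lam == 0.0:
        # Shannon limit: L = -d log d - (1 - d) log(1 - d), with 0 log 0 = 0
        L = -sum(q * math.log(q) for q in (d, 1.0 - d) if q > 0)
        c = 1.0 / math.log(2.0)
    else:
        L = (1.0 - d ** (lam + 1.0) - (1.0 - d) ** (lam + 1.0)) / lam
        c = lam * 2.0 ** lam / (2.0 ** lam - 1.0)
    return 1.0 - c * L

print(round(k_coefficient(0.9), 3))  # 0.531
print(k_coefficient(1.0))            # 1.0
```

Because $K$ depends only on $d$ and $\lambda$, dividing by it rescales the existing measure without using the data.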

Asymptotic Variance for Estimated Measure
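Let $n_{ij}$ denote the observed frequency in cell $(i, j)$ $(i = 1, \ldots, R;\ j = 1, \ldots, R)$. Assuming a multinomial distribution, the estimated measure $\hat{\Omega}^{(\lambda)}$ is given by $\Omega^{(\lambda)}$ with $\{\pi_{ij}\}$ replaced by $\{\hat{\pi}_{ij}\}$, where $\hat{\pi}_{ij} = n_{ij}/n$ and $n = \sum_{i}\sum_{j} n_{ij}$. Using the delta method, $\sqrt{n}\,(\hat{\Omega}^{(\lambda)} - \Omega^{(\lambda)})$ has asymptotically (as $n \to \infty$) a normal distribution with mean 0 and variance

$$\sigma^{2} = \sum_{i=1}^{R}\sum_{j=1}^{R}\pi_{ij}D_{ij}^{2} - \left(\sum_{i=1}^{R}\sum_{j=1}^{R}\pi_{ij}D_{ij}\right)^{2},$$

where $D_{ij} = \partial\Omega^{(\lambda)}/\partial\pi_{ij}$, whose explicit form differs for $\lambda \neq 0$ and for $\lambda = 0$. Let $\hat{\sigma}^{2}$ denote $\sigma^{2}$ with $\{\pi_{ij}\}$ replaced by $\{\hat{\pi}_{ij}\}$. Using these, an approximate $100(1-\alpha)\%$ confidence interval for $\Omega^{(\lambda)}$ is

$$\hat{\Omega}^{(\lambda)} \pm z_{\alpha/2}\,\frac{\hat{\sigma}}{\sqrt{n}},$$

where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the standard normal distribution.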
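The interval can be sketched numerically. The version below swaps in central finite differences for the closed-form derivatives $D_{ij}$ (a numerical approximation, not the authors' formulas) and plugs them into the multinomial delta-method variance $\sum\pi D^2 - (\sum \pi D)^2$. Function names and the count table are hypothetical; the weights $(H^{*}_{1(i)} + H^{*}_{2(i)})/2$ are an assumption about the underlying measure; the check $1 - d \le Q_{1(i)} \le d$ is omitted for brevity; and cells with $\hat\pi_{ij} = 0$ are skipped because they contribute nothing to $\hat\sigma^2$.

```python
import numpy as np
from statistics import NormalDist

def patil_taillie(q, lam):
    # Patil-Taillie diversity along axis 0; lam = 0 is Shannon entropy (0 log 0 = 0)
    q = np.asarray(q, dtype=float)
    if lam == 0.0:
        return -np.sum(q * np.log(np.where(q > 0, q, 1.0)), axis=0)
    return (1.0 - np.sum(q ** (lam + 1.0), axis=0)) / lam

def omega(p, lam=0.0, d=1.0):
    # ASSUMED weights (H*_1(i) + H*_2(i)) / 2; p may be unnormalized
    p = np.asarray(p, dtype=float)
    R = p.shape[0]
    H1 = np.array([p[:i, i:].sum() for i in range(1, R)])
    H2 = np.array([p[i:, :i].sum() for i in range(1, R)])
    H1s, H2s = H1 / H1.sum(), H2 / H2.sum()
    Q = np.vstack([H1s, H2s]) / (H1s + H2s)
    c = 1.0 / np.log(2.0) if lam == 0.0 else lam * 2.0**lam / (2.0**lam - 1.0)
    K = 1.0 - c * patil_taillie([d, 1.0 - d], lam)
    return float(np.sum((H1s + H2s) / 2.0 * (1.0 - c * patil_taillie(Q, lam))) / K)

def omega_ci(counts, lam=0.0, d=1.0, alpha=0.05, h=1e-6):
    """Estimate, standard error and delta-method interval for Omega."""
    n_ij = np.asarray(counts, dtype=float)
    n = n_ij.sum()
    pi_hat = n_ij / n
    est = omega(pi_hat, lam, d)
    D = np.zeros_like(pi_hat)
    for idx in np.ndindex(pi_hat.shape):
        if pi_hat[idx] == 0.0:
            continue  # weight pi_hat = 0, so this cell adds nothing to sigma^2
        up, down = pi_hat.copy(), pi_hat.copy()
        up[idx] += h
        down[idx] -= h
        D[idx] = (omega(up, lam, d) - omega(down, lam, d)) / (2.0 * h)
    # Multinomial delta method: sigma^2 = sum(pi D^2) - (sum(pi D))^2
    sigma2 = np.sum(pi_hat * D**2) - np.sum(pi_hat * D) ** 2
    se = float(np.sqrt(max(sigma2, 0.0) / n))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return est, se, (est - z * se, est + z * se)

est, se, (lo, hi) = omega_ci([[10, 9, 2], [1, 10, 1], [1, 9, 10]], lam=0.0, d=0.99)
print(f"estimate {est:.3f}, S.E. {se:.3f}, 95% C.I. ({lo:.3f}, {hi:.3f})")
```

Because this version of $\Omega$ depends on $\{\pi_{ij}\}$ only through ratios, the second term of $\sigma^2$ should come out numerically close to zero; it is kept so the formula matches the general delta-method expression.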

An Example
Consider the data in Table 2, taken from Hattori et al. (2002, p. 244). These data describe the cross-classification of father's and son's occupational status categories in Japan, surveyed in 1955 and in 1975. The categories are …, (3) Blue-collar and (4) Farming.
It seems natural to assume that all cumulative probabilities are positive, because an observation can fall in any cell of the table. Therefore, it may not be appropriate to use the measure Ω with d = 1, because such cumulative probabilities cannot have the structure under which Ω with d = 1 attains the maximum value 1. We should instead use Ω with d < 1 (for example, d = 0.99) so that the measure can attain the maximum value 1.
Since the confidence intervals for Ω with d = 0.99 applied to the data in Tables 2a and 2b do not include zero for any λ (see Table 3), they indicate that neither table has a structure of EMH.
Table 3. When d = 0.99, the estimate of Ω, the estimated approximate standard error (S.E.) for Ω, and the approximate 95% confidence interval (C.I.) for Ω, applied to Tables 2a and 2b

Moreover, we compare the degree of departure from EMH in Tables 2a and 2b using the confidence intervals for Ω. For every λ, the values in the confidence interval for Ω applied to Table 2b are greater than those for Table 2a, so the two intervals do not overlap. Thus, the degree of departure from EMH is greater for Table 2b than for Table 2a.

Concluding Remarks
We have proposed Ω, an improvement of the Yamamoto et al. (2007) measure (i.e., Ω with d = 1), to represent the degree of departure from EMH. For square-table data in which all cumulative probabilities are positive, it may not be adequate to use Ω with d = 1, because the measure then cannot attain the maximum value 1. For such data, it would be natural to use Ω with d < 1, which can attain the maximum value 1 even when all cumulative probabilities are positive.
The analyst may also be interested in how the value of d should be determined; however, this seems difficult to discuss in general. The measure Ω depends on the fixed value of d, and its value increases as d decreases. However, when several tables are compared using the same λ, the result of the comparison does not depend on the value of d, because d enters Ω only through the coefficient 1/K, which is common to all tables. When analyzing a square table, note that if the condition 1 − d ≤ Q_1(i) ≤ d is not satisfied for all i = 1, . . ., R − 1, the measure Ω cannot be used for the given data (it could then exceed 1). Thus, the analyst must set the value of d carefully so that this condition holds for all i. We therefore recommend a value close to 1 (for example, d = 0.99).
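The condition on d can be turned into a simple check: both inequalities $1 - d \le Q_{1(i)} \le d$ hold exactly when $d \ge \max(Q_{1(i)}, Q_{2(i)})$, so the smallest admissible d is the largest of these maxima over i. The helpers below are hypothetical, and the $Q_{1(i)}$ values are illustrative, not estimates from Table 2. In practice only estimates $\hat Q_{1(i)}$ are available, so leaving a margin above the computed bound, as the recommendation of d = 0.99 does, seems prudent.

```python
def smallest_admissible_d(q1_values):
    """Smallest d with 1 - d <= Q1(i) <= d for every i.

    Both inequalities hold iff d >= max(Q1(i), 1 - Q1(i)) = max(Q1(i), Q2(i)).
    """
    return max(max(q, 1.0 - q) for q in q1_values)

def is_admissible(d, q1_values):
    """True if 0.5 < d <= 1 and 1 - d <= Q1(i) <= d for all i."""
    return 0.5 < d <= 1.0 and all(1.0 - d <= q <= d for q in q1_values)

q1_hat = [0.825, 0.2045]              # illustrative Q1(i) values
print(smallest_admissible_d(q1_hat))  # 0.825
print(is_admissible(0.99, q1_hat))    # True
print(is_admissible(0.80, q1_hat))    # False
```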

Table 1. (a) An artificial 4 × 4 table of cell probabilities {p_ij}, and (b) the values of the measure Ω with d = 1 (existing measure) and Ω with d = 0.9 (new measure) applied to Table 1a

Table 2. Occupational status for Japanese father-son pairs (from Hattori et al.