Index for Proportional Reduction in Error in Two-way Contingency Tables with Ordinal Categories

1 Center for Clinical Investigation and Research, Osaka University Hospital, 2-15, Yamadaoka, Suita, Osaka 565-0871, Japan
2 Department of Information Sciences, Faculty of Science and Technology, Tokyo University of Science, Noda City, Chiba 278-8510, Japan
Correspondence: Kouji Yamamoto, Center for Clinical Investigation and Research, Osaka University Hospital, 2-15, Yamadaoka, Suita, Osaka 565-0871, Japan. E-mail: yamamoto-k@hp-crc.med.osaka-u.ac.jp


Introduction
Consider an r × c contingency table in which one variable is an explanatory variable and the other is a response variable. Measures that describe the relative decrease in the probability of making an error in predicting the value of one variable when the value of the other variable is known, as opposed to when it is not known, have been proposed by, e.g., Goodman and Kruskal (1954), Everitt (1992), and Yamamoto, Nozaki and Tomizawa (2011). These are called proportional reduction in error (PRE) measures. In this situation, we assume that we can specify which variable is explanatory and which is the response.
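The PRE idea can be illustrated numerically. The helper below is a minimal sketch, assuming the generic form PRE = (P(error when the other variable is unknown) − P(error when it is known)) / P(error when it is unknown). The function name `pre` and the example probabilities are hypothetical and are not taken from the cited papers.

```python
def pre(error_unknown: float, error_known: float) -> float:
    """Proportional reduction in error, given two error probabilities."""
    if not 0.0 < error_unknown <= 1.0:
        raise ValueError("error_unknown must be in (0, 1]")
    if not 0.0 <= error_known <= error_unknown:
        raise ValueError("error_known must be in [0, error_unknown]")
    return (error_unknown - error_known) / error_unknown


# Hypothetical numbers: the error probability falls from 0.60 to 0.15.
reduction = pre(0.60, 0.15)
print(f"{reduction:.2f}")
```

A value of 0 means that knowing the other variable does not help at all, and a value of 1 means that prediction becomes error-free.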
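In some situations, the explanatory and response variables are not clearly defined. For this case, especially when both variables are nominal, Goodman and Kruskal (1954) and Yamamoto and Tomizawa (2010) considered some PRE measures. Let $p_{ij}$ denote the probability that an observation will fall in the $i$th category of $X$ and the $j$th category of $Y$ ($i = 1, \ldots, r$; $j = 1, \ldots, c$). The measure of Goodman and Kruskal (1954) is given by
$$\lambda = \frac{\sum_{i=1}^{r} p_{im} + \sum_{j=1}^{c} p_{mj} - p_{\cdot m} - p_{m \cdot}}{2 - p_{\cdot m} - p_{m \cdot}},$$
where
$$p_{im} = \max_{j} p_{ij}, \quad p_{mj} = \max_{i} p_{ij}, \quad p_{m \cdot} = \max_{i} p_{i \cdot}, \quad p_{\cdot m} = \max_{j} p_{\cdot j},$$
with $p_{i \cdot} = \sum_{j} p_{ij}$ and $p_{\cdot j} = \sum_{i} p_{ij}$. Also see Yamamoto and Tomizawa (2010).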
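As an illustration, the measure for nominal variables can be computed directly from a table of counts or probabilities. This is a minimal sketch, assuming the standard symmetric form of Goodman and Kruskal's λ, namely (Σ_i max_j p_ij + Σ_j max_i p_ij − max_j p_·j − max_i p_i·) / (2 − max_j p_·j − max_i p_i·). The function name `goodman_kruskal_lambda` and the 3 × 3 table are hypothetical.

```python
from typing import Sequence


def goodman_kruskal_lambda(table: Sequence[Sequence[float]]) -> float:
    """Symmetric Goodman-Kruskal lambda for an r x c table of counts or probabilities."""
    total = float(sum(sum(row) for row in table))
    if total <= 0:
        raise ValueError("table must have a positive total")
    p = [[cell / total for cell in row] for row in table]  # cell probabilities p_ij
    columns = list(zip(*p))

    sum_row_max = sum(max(row) for row in p)              # sum_i max_j p_ij
    sum_col_max = sum(max(col) for col in columns)        # sum_j max_i p_ij
    max_row_marginal = max(sum(row) for row in p)         # max_i p_i.
    max_col_marginal = max(sum(col) for col in columns)   # max_j p_.j

    denominator = 2.0 - max_row_marginal - max_col_marginal
    if denominator <= 0:
        raise ValueError("lambda is undefined when both marginals are degenerate")
    numerator = sum_row_max + sum_col_max - max_row_marginal - max_col_marginal
    return numerator / denominator


# Hypothetical 3 x 3 table of counts, for illustration only.
example = [[30, 5, 5], [5, 20, 5], [5, 5, 20]]
print(round(goodman_kruskal_lambda(example), 3))
```

The function normalizes by the table total, so it accepts either observed frequencies or cell probabilities.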
However, these measures are not suitable for two-way contingency tables in which both variables are ordinal when one wants to use the information about the category ordering of the variables. We are therefore interested in a PRE measure for contingency tables with ordinal categories in which the explanatory and response variables are not clearly defined.
This paper proposes a PRE measure for such a situation (Section 2). Section 3 gives an approximate variance for the estimated measure, and Section 4 analyzes unaided distance vision data.

A New PRE Measure
Consider an r × c contingency table with ordinal categories in which the explanatory and response variables are not clearly defined.
We consider the following measure Λ, which represents the PRE in predicting the category of either variable between knowing and not knowing the category of the other variable, defined by [equation not recovered], where [definitions not recovered].
The measure Λ has the properties that (i) Λ lies between 0 and 1, (ii) Λ = 0 if and only if the information about one variable does not reduce the probability of making an error in predicting the categories of the other variable, and (iii) Λ = 1 if and only if no error is made, given knowledge of one variable; namely, there is complete predictive association. In addition, we note that if the variables X and Y are independent, then Λ equals 0, but the converse does not necessarily hold (also see Section 6).

Approximate Confidence Interval for the Measure
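Let $n_{ij}$ denote the observed frequency in the $i$th row and $j$th column of the table ($i = 1, \ldots, r$; $j = 1, \ldots, c$).
Assuming that a multinomial distribution applies to the r × c table, we consider an approximate standard error and large-sample confidence interval for Λ, using the delta method, descriptions of which are given by, e.g., Bishop et al. (1975, Sec. 14.6). The sample version of Λ, i.e., $\hat{\Lambda}$, is given by Λ with $\{p_{ij}\}$ replaced by $\{\hat{p}_{ij}\}$, where $\hat{p}_{ij} = n_{ij}/n$ and $n = \sum_{i}\sum_{j} n_{ij}$. Using the delta method, $\sqrt{n}(\hat{\Lambda} - \Lambda)$ has asymptotically (as $n \to \infty$) a normal distribution with mean zero and variance $\sigma^2[\Lambda]$, where [expression not recovered] and $I(\cdot)$ is the indicator function. Let $\hat{\sigma}^2[\Lambda]$ denote $\sigma^2[\Lambda]$ with $\{p_{ij}\}$ replaced by $\{\hat{p}_{ij}\}$. Then $\hat{\sigma}[\Lambda]/\sqrt{n}$ is an estimated approximate standard error of $\hat{\Lambda}$, and $\hat{\Lambda} \pm 1.96\,\hat{\sigma}[\Lambda]/\sqrt{n}$ is an approximate 95% confidence interval for Λ.
Vol. 4, No. 4; 2012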
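The plug-in and interval steps can be sketched as follows. This is a minimal illustration that assumes an estimate of σ[Λ] has already been obtained by some other means; it is passed in as `sigma_hat`. The names `sample_proportions` and `delta_method_interval`, and all numeric inputs, are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import List, Sequence, Tuple


def sample_proportions(counts: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return p_hat_ij = n_ij / n for an r x c table of counts."""
    n = sum(sum(row) for row in counts)
    if n <= 0:
        raise ValueError("the table must contain at least one observation")
    return [[cell / n for cell in row] for row in counts]


def delta_method_interval(estimate: float, sigma_hat: float, n: int,
                          level: float = 0.95) -> Tuple[float, float]:
    """Large-sample interval: estimate +/- z * sigma_hat / sqrt(n)."""
    if n <= 0 or sigma_hat < 0:
        raise ValueError("n must be positive and sigma_hat non-negative")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level = 0.95
    half_width = z * sigma_hat / sqrt(n)
    return estimate - half_width, estimate + half_width


# Hypothetical inputs: estimate 0.5, sigma_hat 1.0, sample size 400.
lower, upper = delta_method_interval(0.5, 1.0, 400)
print(f"({lower:.3f}, {upper:.3f})")
```

Because Λ lies between 0 and 1, an interval computed this way may need to be truncated to that range when the estimate is close to a boundary.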

An Example
Consider the data in Table 1 on unaided distance vision. Table 1a gives the data, taken from Tomizawa (1984), on the unaided distance vision of 4746 students aged 18 to about 25, including about 10% women, in the Faculty of Science and Technology, Science University of Tokyo in Japan, examined in April 1982. Table 1b gives the data, taken from Tomizawa (1985), on the unaided distance vision of 3168 pupils, comprising nearly equal numbers of boys and girls aged 6-12, at elementary schools in Tokyo, Japan, examined in June 1984.
For the data in Tables 1a and 1b, the two variables, right and left eye grades, have ordinal categories in each table, and we cannot clearly define which of the two is the explanatory variable and which is the response variable. Thus, for these data, we are interested in applying the measure Λ. The estimated value of Λ is 0.735 for Table 1a and 0.496 for Table 1b (see Table 2). This shows that the information about the grade of either eye reduces the probability of making an error in predicting the grade of the other by 73.5% for Table 1a and by 49.6% for Table 1b, as opposed to when it is not known.
When the degrees of relative decrease for Tables 1a and 1b are compared using the 95% confidence intervals for Λ, the value of Λ is greater for Table 1a than for Table 1b. That is, the information about the grade of either eye reduces the probability of making an error in prediction more for college students than for pupils.
Table 1. Unaided distance vision data of (a) 4746 students (Tomizawa, 1984) and (b) 3168 pupils (Tomizawa, 1985)

Note: Although the measure λ should be used for the nominal case, we apply λ to the data in Tables 1a and 1b to compare Λ and λ. The estimated values of λ are 0.625 for Table 1a and 0.299 for Table 1b.

Simulation Study
Consider now random variables $Z_1$ and $Z_2$ having a joint bivariate normal distribution with means $E(Z_1) = \mu_1$ and $E(Z_2) = \mu_2$, variances $\mathrm{Var}(Z_1) = \sigma_1^2$ and $\mathrm{Var}(Z_2) = \sigma_2^2$, and correlation $\mathrm{Corr}(Z_1, Z_2) = \rho$. Suppose that there is an underlying bivariate normal distribution with the conditions, for example, $\mu_2 = \mu_1 + 0.2$ and $\sigma_2^2 = 1.2\sigma_1^2$, and suppose that a 4 × 4 table is formed using cutpoints for each variable at $\mu_1$ and $\mu_1 \pm 0.6\sigma_1$. In the simulation study, each subtable of Table 3 gives a 4 × 4 table of sample size 10000, formed from an underlying bivariate normal distribution with a fixed ρ (ρ = 0, ±0.3, ±0.6, ±0.9). Table 4 gives the estimated values of Λ for each value of ρ. From Table 4, we see that the estimated value of Λ increases as |ρ| increases. Therefore, when there is an underlying bivariate normal distribution, the proposed measure Λ may be appropriate as a PRE measure that describes the relative decrease in the probability of making an error in predicting the value of one variable when the value of the other is known, as opposed to when it is not known.
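The table-forming step of this simulation can be sketched as below. This is a minimal illustration, not the authors' code: the choices μ1 = 0 and σ1 = 1, the random seed, and the name `simulate_table` are assumptions. The sketch only builds the 4 × 4 tables of counts and does not compute Λ.

```python
import random
from bisect import bisect_right
from math import sqrt
from typing import List


def simulate_table(rho: float, n: int = 10000, mu1: float = 0.0,
                   sigma1: float = 1.0, seed: int = 0) -> List[List[int]]:
    """Cross-classify n bivariate normal draws into a 4 x 4 table of counts."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    mu2 = mu1 + 0.2               # condition mu_2 = mu_1 + 0.2
    sigma2 = sqrt(1.2) * sigma1   # condition sigma_2^2 = 1.2 * sigma_1^2
    # The same three cutpoints are applied to both variables.
    cuts = [mu1 - 0.6 * sigma1, mu1, mu1 + 0.6 * sigma1]
    table = [[0] * 4 for _ in range(4)]
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        z1 = mu1 + sigma1 * u
        # rho*u + sqrt(1 - rho^2)*v is standard normal with correlation rho to u.
        z2 = mu2 + sigma2 * (rho * u + sqrt(1.0 - rho * rho) * v)
        table[bisect_right(cuts, z1)][bisect_right(cuts, z2)] += 1
    return table


for rho in (0.0, 0.3, 0.6, 0.9):
    counts = simulate_table(rho)
    diagonal = sum(counts[k][k] for k in range(4))
    print(rho, diagonal)  # the diagonal count grows as rho increases
```

Since the cutpoints are expressed relative to μ1 and σ1, the cell probabilities do not depend on the particular values chosen for these two parameters.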

Concluding Remarks
For analyzing an ordinal-ordinal contingency table in which the explanatory and response variables are not clearly defined, we have proposed the measure Λ. The measure Λ is not invariant under arbitrary permutations of row and/or column categories, so it should be applied to ordinal-ordinal contingency tables. On the other hand, the measure λ is invariant under arbitrary permutations of row and/or column categories, so λ would not be appropriate for ordinal-ordinal contingency tables.
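The invariance of λ can be checked numerically. The sketch below assumes the standard symmetric form of Goodman and Kruskal's λ, written here in terms of counts; the name `gk_lambda` and the 3 × 3 table are hypothetical, and only λ (not Λ) is computed.

```python
from itertools import permutations


def gk_lambda(table):
    """Symmetric Goodman-Kruskal lambda from a table of counts."""
    n = sum(map(sum, table))
    cols = list(zip(*table))
    best = sum(max(r) for r in table) + sum(max(c) for c in cols)
    modal = max(sum(r) for r in table) + max(sum(c) for c in cols)
    return (best - modal) / (2 * n - modal)


# Hypothetical 3 x 3 table of counts, for illustration only.
counts = [[30, 5, 5], [5, 20, 5], [5, 5, 20]]
base = gk_lambda(counts)

# Every reordering of rows and columns leaves lambda unchanged.
for row_order in permutations(range(3)):
    for col_order in permutations(range(3)):
        permuted = [[counts[i][j] for j in col_order] for i in row_order]
        assert abs(gk_lambda(permuted) - base) < 1e-12
```

Because λ depends only on row-wise and column-wise maxima and on the largest marginal totals, reordering categories cannot change it, which is why it ignores the ordering information in ordinal tables.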
As described in Section 2, Λ = 0 if and only if the information about the categories of either variable does not reduce the probability of making an error in predicting the categories of the other. However, Λ = 0 is not always equivalent to independence between the two variables. We illustrate this with an example in Table 5. Obviously, X is not independent of Y there, but the measure Λ equals 0. That is, Λ = 0 does not imply that X is independent of Y.

Table 3. The 4 × 4 tables of sample size 10000, formed by using cutpoints for each variable at $\mu_1$ and $\mu_1 \pm 0.6\sigma_1$, from an underlying bivariate normal distribution with the conditions $\mu_2 = \mu_1 + 0.2$, $\sigma_2^2 = 1.2\sigma_1^2$, and ρ = 0, ±0.3, ±0.6, ±0.9
(a) ρ = −0.9

Table 2. Values of $\hat{\Lambda}$, their approximate standard errors, and approximate 95% confidence intervals for Λ, applied to Tables 1a and 1b

Table 4. The values of $\hat{\Lambda}$ applied to each subtable of Table 3

Table 5. Artificial data on cell probabilities $p_{ij}$