Nonparametric Tests of Trend for Proportions

A general method is proposed for constructing nonparametric tests of trend for proportions. Such alternatives arise in situations where it is of interest to test for monotonicity in rates of growth. The class of tests is based on the ranks of the observations. The general approach consists of defining two sets of rankings: the first describes the time and the other the binary data itself. The test statistic measures the similarity between the two sets. The asymptotic null distributions are determined for similarity measures due to Spearman, Kendall and Hamming. A limited simulation study shows that the Spearman test has greater power.


Introduction
There are several instances in practice when one is interested in testing for a trend in proportions. For instance, one may be interested in the trend in birth rates, in mortality rates or in the incidence of a certain disease. As an example, we will consider the mortality statistics in South Africa during the period 2000-2008 in Table 1. One may ask if there is an increasing trend in the mortality rates. For other examples of applications, we refer the reader to Chen et al. (1997), to Arase et al. (2001) for controlled clinical trials and to Ku et al. (2001) for community based surveys.
In such problems, one usually observes at time values $t_1 < t_2 < \cdots < t_k$ a random sample of $n_i$ binary variables $\{y_{ij}\}$, with $y_{ij}$ taking values 1 or 0 with unknown probabilities $p_i$ and $1 - p_i$ respectively. The observed data may be viewed as a 2 x k table of counts, where $y_i = \sum_j y_{ij}$ and $y = \sum_i y_i$. Let $\hat{p}_i = y_i/n_i$ and $\bar{p} = y/n$, where $n = \sum_i n_i$. Without loss of generality, we shall be interested in detecting a monotone increasing trend in $p_i$.
There are a number of different approaches to the problem, as discussed in Terpstra (1952) and Armitage (1955). One approach, founded on regression and analysis of variance and discussed below, leads to the Cochran-Armitage test (Cochran 1954; Armitage 1955). Williams (1988) and Chen et al. (1997) have noted that when the expected values $n_i p_i$ or $n_i(1 - p_i)$ are small, the normal approximation may become unreliable. As a consequence the Cochran-Armitage test becomes anti-conservative, that is, it may lead to a type I error rate greater than the prescribed significance level (Portier and Hoel 1984; Hothorn and Bretz 2000). Corcoran et al. (2000) have noted that the Cochran-Armitage test is very sensitive to the choice of scores and conclude that this "makes its general use suspect".
Williams (1988) has proposed an exact conditional test whereby the test statistic is calculated for all possible 2 x k tables with identical marginal totals. This is the randomization model in which the groups or treatments have the experimental units assigned to them at random. Randomization models are predominantly chosen in biomedical research. Neuhäuser (2006) proposed a modification of the Baumgartner-Weiß-Schindler (1998) two-sample statistic which utilizes ranks instead of scores. A simulation study is inconclusive and does not reveal a clear winner. In section 2, we consider the problem using ranking methods.
Let $\{x_i\}$ represent arbitrary pre-selected scores, $x_1 < x_2 < \cdots < x_k$, which mimic a monotone increasing time trend. The linear regression model for $y_{ij}$ is expressed by [equation not recovered] and subject to the constraint that [equation not recovered]. The hypothesis of homogeneity is
$$H_0 : p_1 = p_2 = \cdots = p_k$$
(or equivalently $\beta = 0$). Possible alternatives are
$$H_1 : p_1 \le p_2 \le \cdots \le p_k$$
with at least one strict inequality, and
$$H_2 : p_i \ne p_j \ \text{for at least one pair } (i, j).$$
Hypothesis $H_2$ can also be expressed as [equation not recovered] (Hemelrijk, 1955).
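Under the null hypothesis of homogeneity, the estimate of the variance of $\hat{\beta}$ is given by
$$\frac{\bar{p}(1 - \bar{p})}{\sum_{i} n_i (x_i - \bar{x}_k)^2}$$
and consequently we reject $H_0$ in favor of $H_1$ (or equivalently $\beta > 0$) for large values of the statistic
$$\frac{\sum_{i} n_i (\hat{p}_i - \bar{p})(x_i - \bar{x}_k)}{\sqrt{\bar{p}(1 - \bar{p}) \sum_{i} n_i (x_i - \bar{x}_k)^2}}$$
which for large samples has a standard normal distribution. If we suppose further that $\bar{p}$ is small (of the order of 1% to 2%) so that $\bar{p}^2$ is negligible, then the statistic becomes
$$\frac{\sum_{i} n_i (\hat{p}_i - \bar{p})(x_i - \bar{x}_k)}{\sqrt{\sum_{i} n_i \bar{p}\,(x_i - \bar{x}_k)^2}}.$$
The differences between observed and expected frequencies, $n_i(\hat{p}_i - \bar{p})$, are multiplied by the score effect $(x_i - \bar{x}_k)$ in the numerator, whereas the squares of the score effects are weighted by the expected frequencies in the denominator. The statistic provides a test of trend in frequencies as opposed to a test of trend in the proportions.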
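The score-based statistic described above can be sketched numerically. This is a minimal illustration, not the authors' code: the function name, the data and the use of the $n_i$-weighted mean of the scores as the centring value are assumptions made for the example.

```python
import math

def cochran_armitage_z(y, n, x):
    """Score-based trend statistic for counts y[i] out of n[i] with scores x[i]."""
    total = sum(n)
    p_bar = sum(y) / total                                  # pooled proportion
    x_bar = sum(ni * xi for ni, xi in zip(n, x)) / total    # n-weighted mean score (assumed)
    # sum_i n_i (p_hat_i - p_bar)(x_i - x_bar), written in terms of the counts y_i
    num = sum((yi - ni * p_bar) * (xi - x_bar) for yi, ni, xi in zip(y, n, x))
    den = math.sqrt(p_bar * (1 - p_bar)
                    * sum(ni * (xi - x_bar) ** 2 for ni, xi in zip(n, x)))
    return num / den

# Hypothetical data: 3 time points, 10 trials each, equally spaced scores
z = cochran_armitage_z(y=[2, 5, 8], n=[10, 10, 10], x=[1, 2, 3])
print(round(z, 3))
```

Large positive values of `z` point towards an increasing trend. A different choice of scores `x` can change the value, which is the sensitivity noted by Corcoran et al. (2000).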
Since the sample correlation between $\{\hat{p}_i - \bar{p}\}$ and $\{x_i - \bar{x}_k\}$ is given by [equation not recovered], we may view the test of monotonicity as equivalent to a test that the correlation is 0.
Alternatively, the test of homogeneity may be conducted by treating the data as coming from a 2 x k contingency table. That test rejects $H_0$ in favor of $H_2$ for large values of the statistic
$$\sum_{i=1}^{k} \frac{n_i (\hat{p}_i - \bar{p})^2}{\bar{p}(1 - \bar{p})}$$
which for large samples, $n_i \to \infty$, has a chi-square distribution with $(k - 1)$ degrees of freedom. In that case there is no need to define scores. We note that even though $H_1$ is included in $H_2$, "one-sided" tests which focus strictly on $H_1$ will in general be more powerful.
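For comparison, the usual Pearson chi-square statistic for homogeneity in a 2 x k table can be computed as follows. This is a sketch with hypothetical data and a hypothetical function name. It uses the standard form $\sum_i n_i(\hat{p}_i - \bar{p})^2 / [\bar{p}(1-\bar{p})]$, which needs no scores.

```python
def homogeneity_chi_square(y, n):
    """Pearson chi-square statistic for equality of k binomial proportions."""
    total = sum(n)
    p_bar = sum(y) / total                      # pooled proportion under H0
    return sum(ni * (yi / ni - p_bar) ** 2
               for yi, ni in zip(y, n)) / (p_bar * (1 - p_bar))

# Hypothetical data: 3 time points, 10 trials each
x2 = homogeneity_chi_square([2, 5, 8], [10, 10, 10])
df = 3 - 1                                      # k - 1 degrees of freedom
print(round(x2, 3), df)
```

The statistic ignores the ordering of the time points, which is why a one-sided trend test is usually more powerful against $H_1$.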
The regression model may be extended to apply to two or more groups of individuals. For example, to model the birth rates of men and women in the population, the functional regression model becomes [equation not recovered], where $\gamma$ takes value 1 for the first group and 0 otherwise.
In section two, we propose a general approach for constructing nonparametric test statistics based on the ranks of the observations to test $H_0$ against $H_2$. We obtain the asymptotic null distributions for these test statistics when either the sample sizes or the number of time points $k$ become large. In section three, we report on the results of some simulation studies and note that the test statistics perform well under a variety of different underlying patterns. We present an application and conclude with some final remarks.

The Construction of the Test Statistics
In the previous section we considered the problem of testing for trend in proportions when scores provide a proxy for time. In this section we consider the problem from an entirely new perspective using methods based on the ranks of the data. The approach bypasses the use of scores. We first provide an introduction to statistical methods based on ranks.
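A complete ranking of $n$ objects is a permutation of the integers $(1, \ldots, n)$. For any two rankings $\mu = (\mu(1), \ldots, \mu(n))'$, $\nu = (\nu(1), \ldots, \nu(n))'$, we may define the following measures of similarity due to Spearman, Kendall and Hamming respectively:
$$\mathcal{A}_S(\mu, \nu) = \sum_{i=1}^{n} \left(\mu(i) - \frac{n+1}{2}\right)\left(\nu(i) - \frac{n+1}{2}\right),$$
$$\mathcal{A}_K(\mu, \nu) = \sum_{i<j} \mathrm{sgn}\big(\mu(i) - \mu(j)\big)\,\mathrm{sgn}\big(\nu(i) - \nu(j)\big),$$
$$\mathcal{A}_H(\mu, \nu) = \sum_{i=1}^{n} \sum_{j=1}^{n} I[\mu(i) = j]\, I[\nu(i) = j],$$
where $\mathrm{sgn}(x)$ is either 1 or −1 depending on whether $x > 0$ or $x < 0$, and where $I[\cdot]$ is the indicator function which is 1 or 0 depending on whether the statement in brackets holds or not. Measures of similarity can be used to define rank correlations to provide tests of trend and of independence. A review of related results may be found in Alvo and Cabilio (1992).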
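The three measures can be illustrated for complete rankings with a short sketch. The forms used here are the usual ones (centred cross-product for Spearman, sum of sign products over pairs for Kendall, number of agreeing positions for Hamming). They are assumed rather than copied from the source, and the function names are hypothetical.

```python
def sgn(x):
    """Sign of x: 1, -1, or 0."""
    return (x > 0) - (x < 0)

def spearman_similarity(mu, nu):
    centre = (len(mu) + 1) / 2
    return sum((m - centre) * (v - centre) for m, v in zip(mu, nu))

def kendall_similarity(mu, nu):
    n = len(mu)
    return sum(sgn(mu[i] - mu[j]) * sgn(nu[i] - nu[j])
               for i in range(n) for j in range(i + 1, n))

def hamming_similarity(mu, nu):
    # Number of objects that receive the same rank in both rankings
    return sum(1 for m, v in zip(mu, nu) if m == v)

identity = (1, 2, 3, 4)
reverse = (4, 3, 2, 1)
for a, b in [(identity, identity), (identity, reverse)]:
    print(spearman_similarity(a, b), kendall_similarity(a, b), hamming_similarity(a, b))
```

All three measures are largest when the two rankings coincide. The Spearman and Kendall measures change sign when one ranking is reversed.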
In what follows, we make use of the notion of tie compatibility (see Alvo & Cabilio, 1999) to extend the measures of similarity defined above to the case where ties occur in the data.
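Definition 1. A tied ordering of $n$ objects is a partition into $e$ sets, $1 \le e \le n$, each of which contains $d_i$ objects, $d_1 + d_2 + \cdots + d_e = n$, so that the $d_i$ objects in each set share rank $i$, $1 \le i \le e$. Such a pattern is denoted $\delta = (d_1, d_2, \ldots, d_e)$. The ranking $\mu_\delta = (\mu_\delta(1), \ldots, \mu_\delta(n))$, resulting from such an ordering, is a tied ranking, and is one of $n!/(d_1!\, d_2! \cdots d_e!)$ possible permutations.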
Example 1. Suppose that $n = 7$ objects have the ranking $(3172465)$, that is, object 1 is ranked 3, object 2 is ranked 1, object 3 is ranked 7, and so on. Suppose that ties are allowed and that the ordering assumes the form $\langle (24)\,(157)\,(6)\,(3) \rangle$, where now the parentheses indicate members of the same tie class, so that objects 2 and 4 receive rank 1, objects 1, 5, and 7 receive rank 2, and objects 6 and 3 receive ranks 3 and 4 respectively. The tied ranking then becomes $(2141232)$. In this case $e = 4$, $d_1 = 2$, $d_2 = 3$, $d_3 = 1$, $d_4 = 1$.
The ranking which describes the time points may be viewed as a tied ranking with tie pattern $(n_1, n_2, \ldots, n_k)$ and with ordering $\langle (1, \ldots, n_1)\,(n_1 + 1, \ldots, n_1 + n_2) \cdots (n_1 + \cdots + n_{k-1} + 1, \ldots, n) \rangle$. On the other hand, the binary variables $y_{ij}$ have, with $e = 2$, the simple tie pattern $(n - y, y)$.
We now define tie compatibility, whereby a tied ranking could be conceived as having arisen from a complete ranking in which some objects are grouped as being of equivalent standing. A re-ranking will then produce the tied ranking. All complete rankings which could give rise in this way to the specified tied ranking are then said to be compatible with it. More precisely, we have the following.
Definition 2. A complete ranking of $n$ objects is compatible with a tied ranking of these objects with tie pattern $\delta = (d_1, d_2, \ldots, d_e)$ if every pair of objects which receive distinct ranks is given the same relative ranking in both rankings. We shall denote by $C(\mu)$ and $C(\nu)$ the classes of complete rankings compatible with $\mu$ and $\nu$ respectively.
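The tied ranking of Example 1 and the count $n!/(d_1!\, d_2! \cdots d_e!)$ can be reproduced with a few lines. This is a sketch; the function names and the representation of an ordering as a list of tuples are choices made for illustration.

```python
from math import factorial

def tied_ranking(ordering, n):
    """Rank vector from a tied ordering given as tie classes (tuples of object labels)."""
    rank = [0] * n
    for r, tie_class in enumerate(ordering, start=1):
        for obj in tie_class:
            rank[obj - 1] = r          # every object in the class shares rank r
    return tuple(rank)

def num_tied_rankings(pattern):
    """Number of tied rankings with tie pattern (d_1, ..., d_e): n! / (d_1! ... d_e!)."""
    count = factorial(sum(pattern))
    for d in pattern:
        count //= factorial(d)
    return count

ordering = [(2, 4), (1, 5, 7), (6,), (3,)]       # <(24)(157)(6)(3)> from Example 1
mu = tied_ranking(ordering, 7)
pattern = tuple(len(c) for c in ordering)
print(mu, pattern, num_tied_rankings(pattern))
```

The printed rank vector matches the tied ranking (2141232) stated in Example 1.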
Example 2. Suppose that $k = 2$, $n_1 = 3$, $n_2 = 2$, and we observe 2 successes and 1 failure at time $t_1$, and 1 success and 1 failure at time $t_2$. The compatibility class corresponding to the time ranking $\nu$ contains the 12 permutations obtained by permuting ranks 1, 2, 3 among themselves and ranks 4, 5 among themselves.
Permuting the entries in blocks 1 and 2 respectively, we obtain a total of 72 compatible rankings in the class C (µ).
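The compatibility class of the time ranking in Example 2 can be enumerated by brute force directly from Definition 2. The sketch below is only practical for small $n$. The function names are hypothetical, and only the time ranking (three observations at $t_1$, two at $t_2$) is treated.

```python
from itertools import permutations

def is_compatible(complete, tied):
    """True if every pair with distinct tied ranks keeps its relative order."""
    n = len(tied)
    return all(complete[i] < complete[j]
               for i in range(n) for j in range(n) if tied[i] < tied[j])

def compatibility_class(tied):
    n = len(tied)
    return [p for p in permutations(range(1, n + 1)) if is_compatible(p, tied)]

time_ranking = (1, 1, 1, 2, 2)      # tied ranking for n_1 = 3, n_2 = 2
cls = compatibility_class(time_ranking)
print(len(cls))
```

Every ranking in the class assigns ranks 1, 2, 3 to the first three observations and ranks 4, 5 to the last two, which gives the 3! x 2! = 12 permutations mentioned in Example 2.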
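Definition 3. The measure of similarity for the case of tied rankings $\mu_{\delta_1}$, $\nu_{\delta_2}$ is defined to be the conditional expectation
$$\mathcal{A}(\mu_{\delta_1}, \nu_{\delta_2}) = E\left[\mathcal{A}(\mu, \nu) \mid \mu \in C(\mu_{\delta_1}),\ \nu \in C(\nu_{\delta_2})\right],$$
where the expectation is computed by averaging over the complete rankings compatible with $\mu_{\delta_1}$ and $\nu_{\delta_2}$.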
In the next section, we compute the statistics corresponding to tied rankings for each of the similarity measures defined above.

The Test Statistic Corresponding to Spearman Similarity
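To that end we note that [equation not recovered]. The average of the compatible ranks at time $t_1$ is $g_1 = (n_1 + 1)/2$, at time $t_2$ it is $g_2 = n_1 + (n_2 + 1)/2$, and so on. In general at time $t_i$,
$$g_i = \sum_{q=1}^{i-1} n_q + \frac{n_i + 1}{2}.$$
Hence, the conditional expectation [equation not recovered]. Turning attention now to the binary observations, the average rank for the $n - y$ observations which take value 0 is $l_1 = (n - y + 1)/2$, whereas the average rank for the $y$ observations which take value 1 is $l_2 = (n - y) + (y + 1)/2$. We shall define the Spearman statistic to be
$$\sum_{i=1}^{k} c_i y_i \qquad (7)$$
where $c_i = g_i - (n + 1)/2$ and $g_i$ is the average rank at time $t_i$ given above.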
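The Spearman statistic can be computed from the counts alone. The sketch below assumes the form $\sum_i c_i y_i$ with $c_i = g_i - (n+1)/2$, as used in the Appendix, and also evaluates the Spearman similarity between the averaged time ranks and the averaged data ranks. The identity "similarity = $(n/2)\sum_i c_i y_i$" checked at the end is our own algebra, not a statement from the source, and the function names are hypothetical.

```python
def average_time_ranks(n):
    """g_i: average of the ranks occupied by the n_i observations at time t_i."""
    g, used = [], 0
    for ni in n:
        g.append(used + (ni + 1) / 2)
        used += ni
    return g

def spearman_statistic(y, n):
    total = sum(n)
    return sum((gi - (total + 1) / 2) * yi
               for gi, yi in zip(average_time_ranks(n), y))

def averaged_spearman_similarity(y, n):
    """Spearman similarity using average ranks for both the time and the data ranking."""
    total, successes = sum(n), sum(y)
    centre = (total + 1) / 2
    l_fail = (total - successes + 1) / 2                     # average rank of the 0s
    l_succ = (total - successes) + (successes + 1) / 2       # average rank of the 1s
    return sum((gi - centre) * (yi * (l_succ - centre) + (ni - yi) * (l_fail - centre))
               for gi, yi, ni in zip(average_time_ranks(n), y, n))

y, n = [2, 1], [3, 2]                 # counts from Example 2
s = spearman_statistic(y, n)
sim = averaged_spearman_similarity(y, n)
print(s, sim)
```

For the data of Example 2 the statistic is negative, reflecting the lower success proportion at the later time point.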

The Test Statistic Corresponding to Kendall Similarity
We now consider the test statistic when using Kendall's similarity measure. Note that within a given time period, the difference between ties is zero and hence there is no contribution to the distance. Between different time periods we have [equation not recovered]. Hence, the Kendall and Spearman statistics are equivalent.
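The equivalence can be checked numerically. The sketch below computes the Kendall similarity at the level of individual observations, with tied pairs contributing zero, and compares it with the Spearman statistic $\sum_i c_i y_i$. The proportionality constant 2 is our own calculation and is stated here as an assumption; the data and function names are hypothetical.

```python
def sgn(x):
    return (x > 0) - (x < 0)

def kendall_tied(y, n):
    """Kendall similarity between time and outcome; tied pairs contribute 0."""
    times, outcomes = [], []
    for i, (yi, ni) in enumerate(zip(y, n), start=1):
        times += [i] * ni
        outcomes += [1] * yi + [0] * (ni - yi)
    m = len(times)
    return sum(sgn(times[a] - times[b]) * sgn(outcomes[a] - outcomes[b])
               for a in range(m) for b in range(a + 1, m))

def spearman_statistic(y, n):
    """sum_i c_i y_i with c_i = g_i - (n + 1)/2."""
    total, used, s = sum(n), 0, 0.0
    for yi, ni in zip(y, n):
        s += (used + (ni + 1) / 2 - (total + 1) / 2) * yi
        used += ni
    return s

y, n = [1, 2, 4], [4, 3, 5]           # hypothetical counts at 3 time points
print(kendall_tied(y, n), 2 * spearman_statistic(y, n))
```

Both values agree for these data, and the same holds for the counts of Example 2.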

The Test Statistic Corresponding to Hamming Similarity
In this section, we consider a test statistic based on the Hamming similarity. Returning to Example 2, we may define a matrix of scores given by
$$\begin{pmatrix} 1/3 & 1/3 & 1/3 & 0 & 0 \\ 0 & 0 & 0 & 1/2 & 1/2 \end{pmatrix}$$
where rows indicate time periods and columns indicate rank. At time $t_1$, there are 3 observations and hence ranks 1, 2, 3 will each occur 1/3 of the time. Similarly, at time $t_2$, ranks 4, 5 occur with probability 1/2. Entries outside the block diagonals are 0.
Hence, in general we have for time $t_i$ and $l = \sum_{q=1}^{i-1} n_q + 1, \ldots, \sum_{q=1}^{i} n_q$ a score of $1/n_i$, and 0 otherwise. We now consider the data ranking in accordance with the model. First the complete rankings corresponding to successes (and correspondingly failures) can be permuted among themselves in [expression not recovered] ways. Since entries in time blocks can be permuted in $\prod_i n_i!$ ways, we have that the total number of compatible rankings is given by the product $y!\,(n - y)!$
Define the set of integers [equation not recovered] and let [equation not recovered], where $w_{0i} = 0$ if $y = n$ and $w_{1i} = 0$ if $y = 0$.
It follows that at time $t_i$ and $l = \sum_{q=1}^{i-1} n_q + 1, \ldots, \sum_{q=1}^{i} n_q$, [equation not recovered]. In fact, at time $t_i$, $j \in A_{0y}$, we have [equation not recovered]. In Example 2, we obtain a total of 72 compatible rankings. Averaging the frequency of occurrences of each ranking leads to the following matrix of scores:
$$\begin{pmatrix} 1/6 & 1/6 & 2/9 & 2/9 & 2/9 \\ 1/6 & 1/6 & 2/9 & 2/9 & 2/9 \\ 1/6 & 1/6 & 2/9 & 2/9 & 2/9 \\ 1/4 & 1/4 & 1/6 & 1/6 & 1/6 \\ 1/4 & 1/4 & 1/6 & 1/6 & 1/6 \end{pmatrix}$$
Noting that each row at time $t_i$ is repeated $n_i$ times, and since $w_{0i} + w_{1i} = n_i$, the Hamming similarity measure for tied data becomes [equation not recovered], where $w_i = w_{1i}/n_i$. We may define the Hamming statistic as
$$\sum_{i=1}^{k} w_i y_i. \qquad (8)$$
The weights $\{w_i\}$, which depend on $y$, may also be expressed as [equation not recovered], and in this form we see that most of the weight is assigned to later time points. In the next section, we determine the asymptotic null distribution of the Spearman and Hamming statistics.
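The two score matrices of Example 2 can be rebuilt with exact fractions. The closed form used for the data scores, $(n_i - y_i)/[n_i(n - y)]$ for the $n - y$ lowest ranks and $y_i/(n_i y)$ for the $y$ highest ranks, is an assumption: it is our reading of the construction and is adopted here because it reproduces the matrix printed above. The averaged Hamming similarity is then taken as the sum, over observations, of the products of matching entries. All function names are hypothetical.

```python
from fractions import Fraction

def time_scores(n):
    """One row per time point: 1/n_i on the ranks of block i, 0 elsewhere."""
    total, rows, start = sum(n), [], 0
    for ni in n:
        rows.append([Fraction(1, ni) if start <= j < start + ni else Fraction(0)
                     for j in range(total)])
        start += ni
    return rows

def data_scores(y, n):
    """One row per time point; assumed closed form (see lead-in)."""
    total, succ = sum(n), sum(y)
    fail = total - succ
    rows = []
    for yi, ni in zip(y, n):
        low = Fraction(ni - yi, ni * fail) if fail else Fraction(0)
        high = Fraction(yi, ni * succ) if succ else Fraction(0)
        rows.append([low] * fail + [high] * succ)
    return rows

def averaged_hamming_similarity(y, n):
    # Each time-point row is repeated n_i times, once per observation
    T, D = time_scores(n), data_scores(y, n)
    return sum(ni * sum(t * d for t, d in zip(Ti, Di))
               for ni, Ti, Di in zip(n, T, D))

y, n = [2, 1], [3, 2]                 # counts from Example 2
D = data_scores(y, n)
sim = averaged_hamming_similarity(y, n)
print(D, sim)
```

The two distinct rows of `D` match the rows of the 5 x 5 matrix above, where the first is repeated three times and the second twice.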

The Asymptotic Distribution of the Test Statistics Under the Null Hypothesis
In this section, we consider the asymptotic distribution of the Spearman and Hamming test statistics under the null hypothesis. Let $\{y_i\}$ be $k$ independent binomials $(n_i, p_i)$ and suppose we would like to test $H_0$ vs $H_1$. In most applications the asymptotic situation of interest occurs when
[condition not recovered] (10)
In Theorem 1, we shall show that the Spearman statistic has an asymptotic normal distribution under either (10) or under the condition that the $\{n_i\}$ are bounded while $k \to \infty$.
Theorem 1. Suppose that both $y \to \infty$ and $n - y \to \infty$ as $n \to \infty$. Under $H_0$ the Spearman test statistic has asymptotically a standard normal distribution, i.e.
$$\frac{\sum_{i=1}^{k} c_i y_i}{\sqrt{\sum_{i=1}^{k} c_i^2\, n_i\, p_i (1 - p_i)}} \xrightarrow{d} N(0, 1)$$
under either i) (10) or ii) the $\{n_i\}$ are bounded and $k \to \infty$.
We may estimate $p_i$ either by $\hat{p}_i$ or by $\bar{p}$. In the first case, the test rejects whenever
$$\frac{\sum_{i=1}^{k} c_i y_i}{\sqrt{\sum_{i=1}^{k} c_i^2\, n_i\, \hat{p}_i (1 - \hat{p}_i)}} > z_\alpha,$$
where $z_\alpha$ is the upper $100(1 - \alpha)\%$ percentage point from a standard normal distribution. The expression for the estimate of the asymptotic power becomes [equation not recovered]. It is seen that the power converges to 1 with increasing $n$.
Alternatively, we may use the statistic
$$\frac{\sum_{i=1}^{k} c_i y_i}{\sqrt{\bar{p}(1 - \bar{p}) \sum_{i=1}^{k} n_i c_i^2}} \qquad (11)$$
which in the simulation studies reported appears to attain the prescribed significance level more closely. In the case of equal sample sizes, $n_i = n_0$ say, the test statistic (11) takes the simpler form [equation not recovered].
Theorem 2. Under the null hypothesis and under (10), the conditional mean and variance of $\sum_{i=1}^{k} w_i y_i$ given that $\sum_{i=1}^{k} y_i = y$ are given respectively by $y\bar{p}$ and [equation not recovered].
In the next theorem we demonstrate that asymptotically the Hamming statistic converges to a normal distribution.
Theorem 3. Suppose that both $y \to \infty$ and $n - y \to \infty$ as $n \to \infty$. Under $H_0$, the Hamming test statistic has asymptotically a standard normal distribution, i.e.
[equation not recovered]
under either i) (10) or ii) the $\{n_i\}$ are bounded and $k \to \infty$.
Both test statistics share the same general form as the regression statistic, namely [equation not recovered], where for Spearman $x_i = g_i$ and for Hamming $x_i = w_i$.
We may also consider the asymptotic null distribution of the Spearman and Hamming test statistics under the condition that $k \to \infty$ in lieu of (10).
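A one-sided Spearman trend test based on the normal approximation can be sketched as follows. The sketch assumes the standardization $\sum_i c_i y_i / \sqrt{\bar{p}(1-\bar{p})\sum_i n_i c_i^2}$, that is, the variance of a linear combination of independent binomials with $p_i$ replaced by the pooled estimate. The function name and the data are hypothetical.

```python
import math

def spearman_trend_test(y, n):
    """Return (z, one-sided p-value) for an increasing trend in proportions."""
    total = sum(n)
    p_bar = sum(y) / total
    c, used = [], 0
    for ni in n:
        c.append(used + (ni + 1) / 2 - (total + 1) / 2)   # c_i = g_i - (n + 1)/2
        used += ni
    s = sum(ci * yi for ci, yi in zip(c, y))
    var = p_bar * (1 - p_bar) * sum(ni * ci ** 2 for ni, ci in zip(n, c))
    z = s / math.sqrt(var)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))           # upper tail of N(0, 1)
    return z, p_value

# Hypothetical data: 3 time points, 10 trials each
z, p_value = spearman_trend_test([2, 5, 8], [10, 10, 10])
print(round(z, 3), round(p_value, 4))
```

With equal sample sizes the average ranks $g_i$ are equally spaced, so the value coincides with a score-based statistic using scores 1, 2, 3.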

Simulation Study
Comparing Table 2 with Table 3, it is seen that the significance level actually attained by the Spearman similarity measure when the $\{p_i\}$ in the variance expressions are estimated by $\bar{p}$ is closer to the prescribed 5% level than when the $\{\hat{p}_i\}$ are used.
From Table 4 we note that the level actually attained by the Hamming similarity measure is closer to the prescribed 5% level than Spearman's.
In Table 5 we report on simulations for the power when $k = 5$. Three cases were considered: proportions which are strictly increasing, proportions which are non-decreasing, and some which have no particular pattern. It can be seen that the Spearman measure is clearly superior only in the first two cases. Predictably, the power is smaller when the $\{p_i\}$ are closer together than when they are further apart.
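A small Monte Carlo experiment of the kind reported in the tables can be sketched as follows. The settings (five time points, 50 observations each, common proportion 0.3, 2000 replications, fixed seed) are hypothetical and are not those of the paper. The statistic is the Spearman statistic standardized with the pooled estimate $\bar{p}$, which is assumed here.

```python
import math
import random

def spearman_z(y, n):
    """Spearman trend statistic standardized with the pooled proportion."""
    total = sum(n)
    p_bar = sum(y) / total
    s, ss, used = 0.0, 0.0, 0
    for yi, ni in zip(y, n):
        ci = used + (ni + 1) / 2 - (total + 1) / 2
        s += ci * yi
        ss += ni * ci ** 2
        used += ni
    return s / math.sqrt(p_bar * (1 - p_bar) * ss)

def estimated_level(n, p, reps=2000, z_crit=1.6449, seed=12345):
    """Fraction of replications rejecting H0 at the one-sided 5% level when H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        y = [sum(rng.random() < p for _ in range(ni)) for ni in n]
        # Skip degenerate samples where the pooled variance estimate is zero
        if 0 < sum(y) < sum(n) and spearman_z(y, n) > z_crit:
            rejections += 1
    return rejections / reps

level = estimated_level(n=[50] * 5, p=0.3)
print(level)
```

Replacing the common `p` by a list of increasing proportions, one per time point, would turn the same loop into a power estimate.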

Applications
Returning to the example on mortality rates in South Africa, we calculated values of 260.1 and 70.3 for the Spearman and Hamming similarity measures respectively. These yielded p-values $< 10^{-4}$.
Another application deals with water quality at Hong Kong beaches. Table 6 shows the geometric E. coli count for each of 6 beaches in the Sai Kung district of Hong Kong during the period 1986-2009. A beach is classified as good if the count is at most 24. The Spearman test statistic yielded a value of 22.98, which points to strong evidence of an upward trend in the annual proportion of good beaches. The Hamming test statistic, on the other hand, yielded a value of 2.90, which has a p-value equal to 0.0018.

Conclusion
An approach has been proposed for constructing nonparametric tests of trend in proportions. Similarity measures due to Spearman, Kendall and Hamming led to new test statistics. It was shown that the Spearman and Kendall measures led to identical test statistics. The asymptotic null distributions for the Spearman and Hamming test statistics were shown to be normal. Simulations were performed to check on the significance level attained. It was seen that the Hamming measure was closer to the prescribed level. On the other hand, the Spearman similarity measure attained greater power under a variety of alternatives. Two applications were considered.

Appendix
Proof of Theorem 1.
i) The Spearman statistic (7) is expressible as a linear combination of independent binomials, $\sum_{i=1}^{k} c_i y_i$, where $c_i = g_i - (n+1)/2$ and $g_i = \sum_{q=1}^{i-1} n_q + (n_i + 1)/2$. Hence, under (10), we have approximately $y_i \approx_d N(n_i p_i,\ n_i p_i (1 - p_i))$. In view of the independence of the $\{y_i\}$,
$$\sum_{i=1}^{k} c_i y_i \approx_d N\left(\sum_{i=1}^{k} c_i n_i p_i,\ \sum_{i=1}^{k} c_i^2\, n_i p_i (1 - p_i)\right).$$
The theorem follows.
ii) For this part, note that the Spearman statistic is expressible as [equation not recovered], where the [text not recovered], and the theorem is proved.
Proof of Theorem 2.
Turning attention to the Hamming statistic (8), we note that under $H_0$ and conditional on $\sum_{i=1}^{k} y_i = y$, the joint probability distribution of the $\{y_i\}$ is the multivariate hypergeometric
$$\frac{\prod_{i=1}^{k} \binom{n_i}{y_i}}{\binom{n}{y}}. \qquad (13)$$
It follows that [equation not recovered].
Proof of Theorem 3.
Given (10), it follows that (13) converges to a multinomial distribution independent of $p$ given by
$$\binom{y}{y_1\, y_2 \cdots y_k} \lambda_1^{y_1} \lambda_2^{y_2} \cdots \lambda_k^{y_k}.$$
Suppose now that the observed weights $\{w_i\}$ are the values obtained from an i.i.d. sample from a population with mean $\mu$ and variance $\sigma^2$. Consider a bootstrap random sample of size $y$, say $W_1^*, \ldots, W_y^*$, obtained with replacement from that population of observed weights $\{w_i\}$ in accordance with the distribution [equation not recovered]. The central limit theorem then asserts that for large $y$, the sum of the bootstrap sample is asymptotically normal. Now $y$ will get large a.s. since $n \to \infty$, for otherwise $y/n \not\to p$ a.s., which would contradict the strong law of large numbers.
To relate the Hamming distribution to that of the bootstrap, let $x_i$ denote the number of times $w_i$ is selected in the resampling. It follows that the vector $(x_1, \ldots, x_k)$ has a multinomial distribution $(y; \lambda_1, \ldots, \lambda_k)$ and the bootstrap sum $S^*$ can be equivalently written as $\sum_{i=1}^{k} w_i x_i$. Consequently, under the null hypothesis for large $n$ and conditional on $y$, [equation not recovered], and hence this is also true unconditionally. Note that [equation not recovered].
ii) For this part, it follows that since $0 \le w_i \le 1$ for all $i$ and [equation not recovered], and the theorem is proved.

Table 2. Significance level for Spearman's similarity when $p_i$ is estimated by $\hat{p}_i$

Table 3. Significance level for Spearman's similarity when $p_i$ is estimated by $\bar{p}$

Table 4. Significance level for Hamming's similarity

Table 5. Power for Spearman (S) and Hamming (H) similarity measures using a 5% significance level

Published by Canadian Center of Science and Education