New Test Statistics for One and Two Mean Vectors with Two-step Monotone Missing Data

We consider tests for a single mean vector and for two mean vectors with two-step monotone missing data. In this paper, we propose new test statistics for one sample and two sample designs based on the simplified $T^2$-type test statistic. Further, we present approximations to the upper percentiles of these statistics and propose transformed test statistics. Finally, we investigate the accuracy and asymptotic behavior of the $\chi^2$ approximation by a Monte Carlo simulation.


Introduction
The test for the equality of mean vectors has been discussed using Hotelling's $T^2$ test statistic and the likelihood ratio test statistic (see, e.g., Morrison (2005)). In many practical situations, however, we encounter the problem of missing data.

In the one sample problem, Jinadasa and Tracy (1992) obtained closed form expressions for the maximum likelihood estimators (MLEs) of the mean vector and the covariance matrix in the case of k-step monotone missing data. Krishnamoorthy and Pannala (1998) considered the likelihood ratio test statistic with k-step monotone missing data. Chang and Richards (2009) considered the $T^2$ test statistic for the mean vector with two-step monotone missing data. Further, Seko et al. (2012) discussed the $T^2$ test statistic and the likelihood ratio test statistic using linear interpolation. With missing data, however, the $T^2$ test statistic is complicated, and it is difficult to obtain the exact covariance matrix of the MLE of the mean vector. Therefore, Krishnamoorthy and Pannala (1999) provided a simplified $T^2$ test statistic and approximated its upper percentiles using the F distribution, adjusting the degrees of freedom of the F distribution using the expected value and the variance of the test statistic. They discussed k-step monotone missing data, although they only provided MLEs for up to three-step monotone missing data. Yagi and Seo (2017) provided approximate upper percentiles of the simplified $T^2$ test statistic using linear interpolation with the notation used by Jinadasa and Tracy (1992). Yagi et al. (2019) provided an asymptotic expansion for the null distribution of the simplified $T^2$ test statistic and improved the $\chi^2$ approximation for the statistic in the case of two-step and k-step monotone missing data.

In the two sample problem, Yu et al. (2006) reported a simplified $T^2$-type test statistic with k-step monotone missing data using the idea of Krishnamoorthy and Pannala (1999). Linear interpolation approximations for the null distributions of the Hotelling's $T^2$-type statistic and the likelihood ratio test statistic with two-step monotone missing data were reported by Seko et al. (2011). Moreover, Yagi and Seo (2017) also considered the two sample problem, with pivotal quantities similar to the Hotelling's $T^2$-type statistic in Yu et al. (2006). Yagi et al. (2018) extended the result of the one sample problem in their recent work.

In this paper, we propose test statistics for the one and two sample problems with two-step monotone missing data, based on the statistics considered by Krishnamoorthy and Pannala (1998) and Yu et al. (2006). In the one sample problem, a part of the likelihood ratio statistic in Krishnamoorthy and Pannala (1998) is used as a component of the proposed test statistic. The two sample problem is treated in the same manner as the one sample problem. Further, using the asymptotic expansion of the distribution, we present transformed test statistics based on the Bartlett adjustment. A detailed explanation of the Bartlett adjustment is found in, e.g., Anderson (2003). Improved transformations for general test statistics are discussed by Fujikoshi (2000).

The organization of this paper is as follows. In Section 2, we first present the assumptions and notation. In Section 3, we derive an asymptotic expansion of the null distribution of the new test statistics. In Section 4, we consider transformed test statistics for the proposed test statistics and approximate upper percentiles of the distribution. In Section 5, we describe the Monte Carlo simulation that was implemented to investigate the accuracy of the approximations to the null distributions of these statistics. Finally, in Section 6, we conclude this study.

Assumption and Notations
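Let us consider the $\ell$-th data set $X^{(\ell)}$ ($\ell = 1, 2$), which has the two-step monotone missing pattern

$$X^{(\ell)} = \begin{pmatrix} X^{(\ell)}_{11} & X^{(\ell)}_{12} \\ X^{(\ell)}_{21} & * \end{pmatrix},$$

where $*$ denotes the missing part, $X^{(\ell)}_{11}$ is an $n^{(\ell)}_1 \times p_1$ block matrix, $X^{(\ell)}_{12}$ is an $n^{(\ell)}_1 \times p_2$ block matrix, $X^{(\ell)}_{21}$ is an $n^{(\ell)}_2 \times p_1$ block matrix, and the observations in $X^{(\ell)}$ are independent and distributed as multivariate normal with a common covariance matrix. Next, we assume that the rows of $(X^{(\ell)}_{11}, X^{(\ell)}_{12})$ and the rows of $X^{(\ell)}_{21}$ are distributed as $N_p(\mu^{(\ell)}, \Sigma)$ and $N_{p_1}(\mu^{(\ell)}_1, \Sigma_{11})$, respectively, where

$$\mu^{(\ell)} = \begin{pmatrix} \mu^{(\ell)}_1 \\ \mu^{(\ell)}_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$

and $\mu^{(\ell)}$ and $\Sigma$ are partitioned according to the blocks of the data set. Therefore, $\mu^{(\ell)}_j$ ($j = 1, 2$) is a $p_j$-dimensional vector and $\Sigma_{jm}$ ($j, m = 1, 2$) is a $p_j \times p_m$ matrix. Further, $\mu^{(\ell)}$ is the $p$-dimensional mean vector of $X^{(\ell)}$, where $p = p_1 + p_2$. Let $\bar{x}^{(\ell)}_{(12)1}$ be the sample mean vector and $S^{(\ell)}_{(12)1}$ be the unbiased sample covariance matrix of $X^{(\ell)}_{(12)1}$. Let $\bar{x}^{(\ell)}_{1(12)}$ $(= (\bar{x}^{(\ell)\prime}_{11}, \bar{x}^{(\ell)\prime}_{12})')$ be the sample mean vector and $S^{(\ell)}_{1(12)}$ be the unbiased sample covariance matrix of $X^{(\ell)}_{1(12)}$, where $X^{(\ell)}_{(12)1} = (X^{(\ell)\prime}_{11}, X^{(\ell)\prime}_{21})'$ and $X^{(\ell)}_{1(12)} = (X^{(\ell)}_{11}, X^{(\ell)}_{12})$.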
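The data layout and the two pairs of sample moments can be illustrated with a short sketch. It assumes that $X_{(12)1}$ stacks the first $p_1$ columns of all $n_1 + n_2$ rows and that $X_{1(12)}$ consists of the $n_1$ complete rows. The function names `make_two_step_monotone` and `mean_and_cov` are hypothetical, and the data are drawn with $\mu = 0$ and $\Sigma = I$ purely for illustration.

```python
import random

def make_two_step_monotone(n1, n2, p1, p2, seed=0):
    """Return n1 + n2 rows; the last n2 rows have their final p2 entries missing (None).

    Illustrative assumption: entries are i.i.d. N(0, 1), i.e. mu = 0 and Sigma = I.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n1 + n2):
        row = [rng.gauss(0.0, 1.0) for _ in range(p1 + p2)]
        if i >= n1:
            row[p1:] = [None] * p2  # the missing block '*'
        rows.append(row)
    return rows

def mean_and_cov(rows):
    """Sample mean vector and unbiased sample covariance matrix of fully observed rows."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[j] - mean[j]) * (r[k] - mean[k]) for r in rows) / (n - 1)
            for k in range(p)] for j in range(p)]
    return mean, cov

n1, n2, p1, p2 = 30, 30, 2, 2
X = make_two_step_monotone(n1, n2, p1, p2)

X_12_1 = [r[:p1] for r in X]  # all n1 + n2 rows, first p1 variables
X_1_12 = X[:n1]               # the n1 complete rows, all p1 + p2 variables

xbar_12_1, S_12_1 = mean_and_cov(X_12_1)
xbar_1_12, S_1_12 = mean_and_cov(X_1_12)
```

Both moment pairs are computed from fully observed sub-blocks, so no imputation is needed at this stage.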

One Sample Problem
We first consider the one sample problem. In this case, we omit "$(\ell)$" from the superscripts of the notation defined in the previous section; for example, $X^{(\ell)} = X$. Then, the simplified $T^2$-type statistic for the hypothesis $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$ (without loss of generality, we can assume in (1) that $\mu_0 = 0$) is given by $Q = Q_1 + Q_2$, where the definitions involve the matrix $S_{1(12),22} - S_{1(12),21} S^{-1}_{1(12),11} S_{1(12),12}$.
We note that this statistic was originally obtained by Krishnamoorthy and Pannala (1999). We also note that $Q_1$ and $Q_2$ are not exactly independent. We propose a test using $R_2$, which is independent of $Q_1$, instead of $Q_2$, and suggest $Q_M = Q_1 + R_2$ as a new test statistic. The likelihood ratio statistic with two-step monotone missing data using $R_2$ is discussed by Krishnamoorthy and Pannala (1998). Because $Q_1$ and $R_2$ are exactly independent, we can derive an asymptotic expansion of the test statistic more accurately. Without loss of generality, we may assume that $\Sigma = I$.

Initially, we consider a stochastic expansion of $Q_1$. Yagi et al. (2019) provided the expansions of $Q_1$ and $Q_2$; according to their results, the characteristic functions of $Q_1$ and $Q_2$ follow from these expansions. Further, we derive the characteristic function of $R_2$ by expanding the denominator of $R_2$.

International Journal of Statistics and Probability Vol. 9, No. 6; 2020

The resulting expansion of $R_2$ is the expansion of $Q_2$ with an extra term $-\frac{1}{N_1}(z_1' z_1 \, z_2' z_2)$ added, where $z_1$ and $z_2$ are the random vectors appearing in these expansions. The characteristic function of $R_2$ is obtained accordingly. Therefore, the null distribution of $Q_M$ is expanded in terms of $G_f(x)$, the distribution function of a $\chi^2$ variate with $f$ degrees of freedom. An approximation $q_{MAE}(\alpha)$ to the upper 100α percentile of $Q_M$ is obtained from this expansion, where $\chi^2_p(\alpha)$ is the upper 100α percentile of the $\chi^2$ distribution with $p$ degrees of freedom.
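The quantities $G_f(x)$ and $\chi^2_p(\alpha)$ used here can be evaluated numerically. The following is a minimal sketch in plain Python, not the authors' implementation: `chi2_cdf` uses the standard series for the regularized lower incomplete gamma function, and `chi2_upper` inverts it by bisection. Both function names are hypothetical.

```python
import math

def chi2_cdf(x, df):
    """G_df(x): distribution function of a chi-square variate with df degrees of freedom."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series: P(a, z) = z^a e^{-z} / Gamma(a) * sum_{k>=0} z^k / (a (a+1) ... (a+k))
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-16 * abs(total) and k < 10000:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_upper(df, alpha, tol=1e-10):
    """Upper 100*alpha percentile: the x with 1 - G_df(x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > alpha:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

crit_p4 = chi2_upper(4, 0.05)  # critical value for p = 4 at alpha = 0.05
```

For two degrees of freedom the percentile has the closed form $-2\log\alpha$, which gives a convenient check of the numerical routine.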

Two Sample Problem
In this section, we consider the two sample case. Let us consider the hypothesis $H_0: \mu^{(1)} = \mu^{(2)}$ vs. $H_1: \mu^{(1)} \neq \mu^{(2)}$. Yu et al. (2006) provided the simplified $T^2$ statistic for the two sample case, which we express as $Q = Q_1 + Q_2$. Then, we suggest $Q_M = Q_1 + R_2$ as a new test statistic. Without loss of generality, we may assume that $\Sigma = I$.

The derivation is similar to that of the one sample case. We begin with a stochastic expansion of $Q_1$. Yagi et al. (2018) provided the expansions of $Q_1$ and $Q_2$; according to their work, the characteristic function of $Q_1$ follows from its expansion. Further, we derive the characteristic function of $R_2$ by expanding the denominator of $R_2$. Conforming the order of $Q_1$ and $R_2$ to $\nu_1$, we obtain an expansion of the null distribution of $Q_M$ in terms of $G_f(x)$, where $G_f(x)$ is the distribution function of a $\chi^2$ variate with $f$ degrees of freedom. An approximation $q_{MAE}(\alpha)$ to the upper 100α percentile of $Q_M$ is obtained from this expansion.

Transformed Test Statistics

One Sample Case
We first transform the statistics $Q_1$ and $Q_2$ by the Bartlett correction and, based on these, suggest a transformed test statistic of $Q_M$. Moreover, Fujikoshi (2000) suggested the Bartlett-type correction for general test statistics. Using this method, we obtain the Bartlett-type corrections for $Q_1$ and $R_2$, and we suggest the transformed statistic $Y_M$ of $Q_M$ by the Bartlett-type correction. We also have the Bartlett correction of $Q_M$ itself, with

$$c_1 = \frac{1}{p}\left\{ \frac{1}{1+t}\, p_1(p_1+2) + p_2(p+2) \right\}.$$
The approximation $q_{MKP}(\alpha)$ is closer to the true value than the approximation for $Q$ originally proposed by Krishnamoorthy and Pannala (1999), because $Q_1$ and $R_2$ are independent whereas $Q_1$ and $Q_2$ are not.

Two Sample Problem
We suggest transformed test statistics similar to those of the one sample case. The statistics Q 1 and Q 2 that are transformed by the Bartlett correction are given by Therefore, we suggest the transformed test statistic of Q M as Moreover, by applying the Bartlett-type correction to Q 1 and R 2 , we obtain the following: and Y 2M = ν 1 + − 1 2 (2p 1 + p 2 + 4) log 1 + 1 ν 1 R 2 , for ν 1 + − 1 2 (2p 1 + p 2 + 4) > 0.
Then, we suggest the transformed statistic $Y_M$ of $Q_M$ by the Bartlett-type correction. We also have the Bartlett correction of $Q_M$, with

$$c_1 = \frac{1}{p}\left\{ \frac{1}{1+s}\, p_1(p_1+3) + p_2(p_1+p_2+3) \right\}.$$
In addition, using the result of Fujikoshi (2000), we can obtain the Bartlett-type correction with

$$a = \frac{p(p+2)}{\frac{1}{1+s}\, p_1(p_1+2) + p_2(p_2+2)}.$$

Applying the approximation proposed by Yu et al. (2006) to $Q_M$, we propose $q_{MYKP}(\alpha)$.
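The Bartlett-type corrected statistic $Y_{2M}$ in this section has the form of a positive multiplier times $\log(1 + R_2/\nu_1)$. A minimal sketch of this log-type transformation is given below. The function name `log_type_transform` and the argument `multiplier` are hypothetical, and the multiplier is left as a free parameter rather than set to the coefficient derived in the paper.

```python
import math

def log_type_transform(t, nu, multiplier):
    """Return multiplier * log(1 + t / nu) for a nonnegative statistic t.

    `multiplier` is a placeholder for the coefficient of the Bartlett-type
    correction; it must be positive, as required in the text.
    """
    if multiplier <= 0.0:
        raise ValueError("the multiplier must be positive")
    if t < 0.0 or nu <= 0.0:
        raise ValueError("t must be nonnegative and nu positive")
    return multiplier * math.log1p(t / nu)

# Hypothetical values for illustration only.
y = log_type_transform(t=6.0, nu=58.0, multiplier=50.0)
```

The transformation is monotone increasing in the statistic, and when the multiplier is close to $\nu$ and $\nu$ is large it returns approximately the statistic itself. The transformed statistic therefore shares the limiting $\chi^2$ distribution, and only the finite-sample behavior changes.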

Numerical Simulations
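In this section, we describe the accuracy of the approximations and the asymptotic behavior of the approximate upper percentiles of the test statistic $Q_M$ for the one sample and two sample problems. We compare the approximate upper percentiles of $Q_M$, some of the transformed test statistics proposed in Section 4, and the tests using the approximations of Krishnamoorthy and Pannala (1999) and Yu et al. (2006). Further, we compare the proposed test $Q_M$ with the test $Q$. Their upper 100α percentiles and type I errors (in percent) are defined as follows:

1. $q_M$ and $\alpha_{\chi^2_M} = 100\Pr(Q_M > \chi^2_p(\alpha))$: for test $Q_M$ in (3) and (6)
2. $q_{MAE}$ and $\alpha_{MAE} = 100\Pr(Q_M > q_{MAE}(\alpha))$: for the asymptotic expansion approximation (MAE) test in (4) and (7)
3. (a) $q_{MKP}$ and $\alpha_{MKP} = 100\Pr(Q_M > q_{MKP}(\alpha))$: for applying Krishnamoorthy and Pannala's approximation to $Q_M$ in (12)
   (b) $q_{MYKP}$ and $\alpha_{MYKP} = 100\Pr(Q_M > q_{MYKP}(\alpha))$: for applying Yu, Krishnamoorthy and Pannala's approximation to $Q_M$
4. $q_{Q^*_M}$ and $\alpha_{Q^*_M} = 100\Pr(Q^*_M > \chi^2_p(\alpha))$: for test $Q^*_M$ with Bartlett correction in (8) and (13)
5. $q_{Q^\dagger_M}$ and $\alpha_{Q^\dagger_M} = 100\Pr(Q^\dagger_M > \chi^2_p(\alpha))$: for test $Q^\dagger_M$ with Bartlett correction in (10) and (15)
6. $q_{Y_M}$ and $\alpha_{Y_M} = 100\Pr(Y_M > \chi^2_p(\alpha))$: for test $Y_M$ based on Bartlett-type correction in (9) and (14)
7. $q_{Y^\dagger_M}$ and $\alpha_{Y^\dagger_M} = 100\Pr(Y^\dagger_M > \chi^2_p(\alpha))$: for test $Y^\dagger_M$ with Bartlett-type correction in (11) and (16)
8. $q$ and $\alpha_{\chi^2} = 100\Pr(Q > \chi^2_p(\alpha))$: for test $Q$
9. $q_{AE}$ and $\alpha_{AE} = 100\Pr(Q > q_{AE}(\alpha))$: for the asymptotic expansion approximation (AE) test
10. (a) $q_{KP}$ and $\alpha_{KP} = 100\Pr(Q > q_{KP}(\alpha))$: for applying Krishnamoorthy and Pannala's approximation to $Q$ (one sample)
    (b) $q_{YKP}$ and $\alpha_{YKP} = 100\Pr(Q > q_{YKP}(\alpha))$: for applying Yu, Krishnamoorthy and Pannala's approximation to $Q$ (two sample)
11. $q_{Q^*}$ and $\alpha_{Q^*} = 100\Pr(Q^* > \chi^2_p(\alpha))$: for test $Q^*$ with Bartlett correction
12. $q_{Q^\dagger}$ and $\alpha_{Q^\dagger} = 100\Pr(Q^\dagger > \chi^2_p(\alpha))$: for test $Q^\dagger$ with Bartlett correction
13. $q_Y$ and $\alpha_Y = 100\Pr(Y > \chi^2_p(\alpha))$: for test $Y$ based on Bartlett-type correction
14. $q_{Y^\dagger}$ and $\alpha_{Y^\dagger} = 100\Pr(Y^\dagger > \chi^2_p(\alpha))$: for test $Y^\dagger$ with Bartlett-type correction

We note that items 1-7 concern the test $Q_M$ and items 8-14 concern the test $Q$. A Monte Carlo simulation ($10^6$ replications) was conducted at the significance level $\alpha = 0.05$.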
The settings of the dimension $p (= p_1 + p_2)$ and the sample sizes for the simulation consist of Case 1 and Case 2. In Case 2, $p_1 = 2, 4, 6, 8, 10$, $p_2 = 2$ (fixed), and $n_1 = n_2 = 30$, where, in the two sample case, we set $n_1 = n^{(1)}_1 = n^{(2)}_1$ and $n_2 = n^{(1)}_2 = n^{(2)}_2$. Tables 1-4 list the simulation results for (I). Tables 5-8 list the results for (II), which are obtained by increasing the sample size of the complete data ($n_1$) while fixing the sample size of the incomplete data ($n_2$). Tables 9 and 10 list the simulation results for Case 2, which are obtained by increasing the dimension $p_1$ while fixing the dimension $p_2$ of the missing part. Tables 3, 4, 7, 8, and 10 are for the two sample case.

The upper percentiles of the test statistics are closer to the upper percentiles of the $\chi^2$ distribution when $n_1$ is large. Comparing $q_{MAE}$ and $q_{AE}$, $q_{MAE}$ is a better approximation to the true value $q_M$. In particular, the transformed test statistics were observed to converge faster. Comparing the type I errors, the values of $\alpha_{MKP}$ and $\alpha_{MYKP}$ are approximately 5.00 when the sample size is large. $\alpha_{Y_M}$ and $\alpha_{Y^\dagger_M}$ are always stable (almost 5.00 or more) for any sample size, although the type I error using $q_{Y_M}$ moves away from 5.00 for large $p$, as shown in Tables 9 and 10. In this light, $q_{MKP}$ and $q_{MYKP}$ give good approximations even when $p$ is large. Additionally, $q_{Y^\dagger_M}$ was observed to be more conservative than $q_{MKP}$.
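The empirical type I errors of the form $100\Pr(\text{statistic} > \chi^2_p(\alpha))$ reported in this section can be estimated as sketched below. This is an illustration under stated assumptions, not the simulation code of the paper. It uses the complete-data one sample Hotelling's $T^2$ statistic as a stand-in for $Q_M$, draws data with $\mu = 0$ and $\Sigma = I$, and uses far fewer replications than $10^6$. The function names are hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def hotelling_t2(rows):
    """Complete-data one sample T^2 = n * xbar' S^{-1} xbar for H0: mu = 0 (stand-in statistic)."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    S = [[sum((r[j] - mean[j]) * (r[k] - mean[k]) for r in rows) / (n - 1)
          for k in range(p)] for j in range(p)]
    w = solve(S, mean)
    return n * sum(m * wi for m, wi in zip(mean, w))

def type1_error_percent(n, p, crit, reps, seed=0):
    """Monte Carlo estimate of 100 * Pr(T^2 > crit) under H0 with Sigma = I."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        rows = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        if hotelling_t2(rows) > crit:
            exceed += 1
    return 100.0 * exceed / reps

# For p = 2 the chi-square critical value has the closed form -2 log(alpha).
crit = -2.0 * math.log(0.05)
est = type1_error_percent(n=30, p=2, crit=crit, reps=2000)
```

For this stand-in statistic with $n = 30$ and $p = 2$, the exact relation between $T^2$ and the F distribution gives a type I error of roughly 7.2, so the uncorrected $\chi^2$ critical value is liberal at this sample size. This is the kind of discrepancy that the corrected percentiles and transformed statistics are designed to reduce.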