An Empirical Evaluation of a Test Procedure for the Median of Symmetrical and Asymmetrical Populations Using an Interpolated Nonparametric Confidence Interval

This paper investigates, evaluates, and highlights the performance of a test procedure for the median of a single population using an old nonparametric interpolated confidence interval. Simulation results show that the test procedure under investigation strictly maintains the size at its nominal level and has generally higher empirical power under both symmetrical heavy-tailed and asymmetrical populations.


Introduction
Hettmansperger and Sheather (1986) proposed a nonlinear interpolation of adjacent order statistics to form a confidence interval for the population median (M) whose confidence coefficient lies between the values attainable with order statistics alone. They investigated the performance of their confidence interval under four symmetrical models (normal, Cauchy, uniform, and double exponential) and only one asymmetrical model, formed by piecing together the double exponential and logistic distributions. Hence, the effect of skewness on their proposed confidence interval was not examined. In this paper, the performance of Hettmansperger and Sheather's confidence interval in the context of testing hypotheses about the population median is evaluated under several asymmetrical models and compared with six competitor test procedures.
When testing the null hypothesis $H_0: M = M_0$ against a composite alternative $H_1$, it is known that the t-test is the uniformly most powerful unbiased test under the normal model, while the Fraser (1957b) normal scores test is the asymptotically most powerful rank test. If there is a severe departure from normality, two approaches are possible. The first is to correct the data for the departure from normality by applying some power transformation, and then to apply one of the classical test procedures to the transformed data. The second is to use alternative procedures that do not require normality, such as the sign test, the Wilcoxon signed ranks test, the Baklizi (2005) test, or other adaptive tests. Various adaptive test procedures have been suggested in the literature. They are mainly based on a preliminary test of normality or symmetry, or on some measure of asymmetry, which is then used to choose the sign test, the Wilcoxon signed ranks test, or, more generally, a data-driven score function for constructing a linear signed ranks test. Lemmer (1993) proposed two adaptive test procedures that choose between the sign test and the Wilcoxon signed ranks test. The first depends on a measure of symmetry introduced by Randles and Wolfe (1979), while the second depends on the runs test of symmetry introduced by McWilliams (1990). One shortcoming of Lemmer's two procedures is the discontinuous nature of the test selection. Another is that the runs test may reject the null hypothesis of symmetry not because the distribution is asymmetric but because it is symmetric about a value different from the hypothesized median. Freidlin et al. (2003) proposed an adaptive test procedure that uses the p-value of the Shapiro-Wilk normality test to choose an appropriate linear signed ranks test (the Wilcoxon scores, the T(2) scores, or the Cauchy scores) to analyze the paired data. One shortcoming of this procedure is the inflation of the type I error rate caused by the dependence between the Shapiro-Wilk test and the signed ranks test. Another is that, as the sample size increases, the Shapiro-Wilk test and other tests of normality detect minor departures from normality and may thus select a test suited to a very heavy-tailed distribution even when the data are not far from normal. Baklizi (2005) proposed an adaptive test procedure that modifies the Wilcoxon scores according to the evidence of asymmetry in the data, as measured by the p-value of the triples test of symmetry introduced by Randles et al. (1980). As the p-value of the triples test approaches zero, the scores approach those of the sign test; as it approaches one, the scores approach those of the Wilcoxon signed ranks test. The main advantage of this procedure is that it adapts its scores smoothly and continuously, while its main disadvantage is its sensitivity to minor, spuriously detected departures from symmetry. Bandyopadhyay and Dutta (2007) proposed two adaptive test procedures, one deterministic and one probabilistic. The deterministic approach calculates a measure of symmetry and uses it to choose between the sign test and the Wilcoxon signed ranks test. The probabilistic approach uses the p-value of the triples test of symmetry to choose between the same two tests. The major shortcoming of both procedures is the discontinuous nature of the test selection.
Gastwirth and Miao (2009) proposed an adaptive test procedure that uses a measure of the tail heaviness of the underlying distribution to choose the appropriate score function (the Wilcoxon, normal, T(2), or Cauchy scores) for constructing the linear signed ranks test. The major shortcoming of this procedure is that it is designed only for symmetric populations. Gaafar (2016) proposed using the Yeo-Johnson family of power transformations together with the Shapiro-Wilk test of normality to modify the classical normal scores test. The reported simulation results showed that this test outperformed all competitor tests considered, both in keeping the empirical size at its nominal level and in achieving higher empirical power.
Based on the Hettmansperger and Sheather (1986) confidence interval, a test procedure for right-tailed, left-tailed, and two-tailed tests for the median of symmetrical and asymmetrical populations is introduced. A simulation study is conducted to evaluate the empirical performance of the recommended test procedure (HS) and to compare it with six competitor test procedures: the t-test (T), the normal scores test (NS), the Wilcoxon signed ranks test (W), the Gastwirth and Miao (2009) test (GM), the sign test, and the Baklizi (2005) test. Section 2 introduces the triples test for symmetry used in the Baklizi (2005) test. Section 3 introduces the six competitor test procedures. Section 4 introduces the recommended test procedure. Section 5 describes the design of the simulation experiment. Section 6 reports the main results, and Section 7 gives concise conclusions of the study.

The Triples Test
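The Randles et al. (1980) triples test for the symmetry of the distribution of the random variable X about some unknown value rejects the null hypothesis of symmetry if $|T| \geq z_{\alpha/2}$, where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution, $T = \hat{\eta}/\hat{\sigma}_{\hat{\eta}}$, $\hat{\sigma}_{\hat{\eta}}$ is the estimated standard error of $\hat{\eta}$ given by Randles et al. (1980), and

$$\hat{\eta} = \binom{n}{3}^{-1} \sum_{i<j<k} f^*(X_i, X_j, X_k), \qquad f^*(a, b, c) = \frac{1}{3}\left[\operatorname{sign}(a+b-2c) + \operatorname{sign}(a+c-2b) + \operatorname{sign}(b+c-2a)\right].$$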
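The decision rule can be sketched in Python as follows. The kernel $f^*$ and $\hat{\eta}$ follow the standard triples statistic; for the standard error, however, this sketch swaps in a delete-one jackknife estimate instead of the closed-form variance estimator of Randles et al. (1980). The function names and that substitution are assumptions of the sketch, not the procedure used in the paper.

```python
import itertools
import math
from statistics import NormalDist


def _sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def triples_kernel(a, b, c):
    """f*(a, b, c): +1/3 for a right triple, -1/3 for a left triple, 0 otherwise."""
    return (_sign(a + b - 2 * c) + _sign(a + c - 2 * b) + _sign(b + c - 2 * a)) / 3.0


def triples_test(x, alpha=0.05):
    """Return (eta_hat, z, reject) for a two-sided triples test of symmetry.

    ASSUMPTION: the standard error of eta_hat is a delete-one jackknife
    estimate, not the closed-form estimator of Randles et al. (1980).
    """
    n = len(x)
    if n < 5:
        raise ValueError("need at least 5 observations")
    total = 0.0
    per_obs = [0.0] * n  # kernel sum over the triples that contain each observation
    for i, j, k in itertools.combinations(range(n), 3):
        f = triples_kernel(x[i], x[j], x[k])
        total += f
        per_obs[i] += f
        per_obs[j] += f
        per_obs[k] += f
    eta_hat = total / math.comb(n, 3)
    # Leave-one-out estimates: drop every triple that contains observation i.
    eta_loo = [(total - s) / math.comb(n - 1, 3) for s in per_obs]
    loo_mean = sum(eta_loo) / n
    se = math.sqrt((n - 1) / n * sum((e - loo_mean) ** 2 for e in eta_loo))
    if se > 0:
        z = eta_hat / se
    else:  # degenerate case: all leave-one-out estimates coincide
        z = 0.0 if eta_hat == 0 else math.copysign(math.inf, eta_hat)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return eta_hat, z, abs(z) >= z_crit
```

A positive $\hat{\eta}$ points to right skewness (the middle observation of most triples lies closer to the smallest one). The leave-one-out values are obtained from the per-observation kernel sums, so the whole test costs one pass over the $\binom{n}{3}$ triples.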

Competitor Test Procedures
This section presents the six competitor test procedures and the exact or approximate distributions of their test statistics under the null hypothesis $H_0: M = M_0$.
Let $X_1, X_2, \ldots, X_n$ be a random sample selected from a population with median M.
When the sampled population is assumed to be normal: 1) the uniformly most powerful unbiased test of the above hypothesis is based on the test statistic

$$T = \frac{\sqrt{n}\,(\bar{X} - M_0)}{S},$$

where $\bar{X}$ and S denote the mean and the standard deviation of the sample observations, respectively. Under $H_0$, T has a t-distribution with (n - 1) degrees of freedom.
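2) the van der Waerden type of the asymptotically most powerful rank test of the above hypothesis has the test statistic

$$NS = \sum_{i=1}^{n} \psi_i\, \Phi^{-1}\!\left(\frac{1}{2} + \frac{R_i^{+}}{2(n+1)}\right),$$

where $R_i^{+}$ denotes the rank of $|X_i - M_0|$, the absolute deviation of the i-th sample observation from $M_0$, $\psi_i = 1$ if $X_i > M_0$ and $\psi_i = 0$ otherwise, and $\Phi^{-1}$ is the standard normal quantile function. Under $H_0$, the statistic

$$Z_{NS} = \frac{NS - \frac{1}{2}\sum_{i=1}^{n} a_i}{\sqrt{\frac{1}{4}\sum_{i=1}^{n} a_i^{2}}}, \qquad a_i = \Phi^{-1}\!\left(\frac{1}{2} + \frac{i}{2(n+1)}\right),$$

has approximately a standard normal distribution.

When the sampled population is not normal but assumed to be symmetrical: 1) the Wilcoxon signed ranks test is one choice. Its test statistic can be defined as

$$W = \sum_{i=1}^{n} \psi_i R_i^{+}.$$

Under $H_0$, the statistic

$$Z_W = \frac{W - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$

has approximately a standard normal distribution.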
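A minimal standard-library sketch of the three statistics above. It assumes continuous data, so that there are no ties among the $|X_i - M_0|$ and no observation equals $M_0$; under that assumption the plain ranks and the no-ties null moments apply. The helper names are hypothetical. The sketch returns statistics only: the t critical values would come from a t table or a statistics library.

```python
import math
from statistics import NormalDist, mean, stdev


def t_statistic(x, m0):
    """One-sample t statistic; compare with t quantiles on n - 1 degrees of freedom."""
    return math.sqrt(len(x)) * (mean(x) - m0) / stdev(x)


def abs_ranks(x, m0):
    """Ranks 1..n of |x_i - m0| (assumes no ties and no x_i equal to m0)."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i] - m0))
    ranks = [0] * len(x)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def wilcoxon_z(x, m0):
    """Standardized Wilcoxon signed ranks statistic Z_W."""
    n = len(x)
    w = sum(r for xi, r in zip(x, abs_ranks(x, m0)) if xi > m0)
    return (w - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)


def normal_scores_z(x, m0):
    """Standardized van der Waerden signed ranks statistic Z_NS."""
    n = len(x)
    inv = NormalDist().inv_cdf
    a = [inv(0.5 + i / (2 * (n + 1))) for i in range(1, n + 1)]  # scores a_i
    ns = sum(a[r - 1] for xi, r in zip(x, abs_ranks(x, m0)) if xi > m0)
    return (ns - sum(a) / 2) / math.sqrt(sum(ai * ai for ai in a) / 4)
```

For a right-tailed test at the 5% level, $H_0$ would be rejected when `wilcoxon_z` or `normal_scores_z` is at least `NormalDist().inv_cdf(0.95)` (about 1.645).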

The Recommended Test Procedure
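Hettmansperger and Sheather (1986) proposed the following $100(1-\alpha)\%$ equal-tailed two-sided nonparametric interpolated confidence interval for the population median (M):

$$\left[(1-\lambda)X_{(k)} + \lambda X_{(k+1)},\; (1-\lambda)X_{(n-k+1)} + \lambda X_{(n-k)}\right],$$

where $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are the order statistics, k is chosen such that

$$\gamma_k = P(k \le B \le n-k) = \sum_{i=k}^{n-k} \binom{n}{i}\left(\frac{1}{2}\right)^{n}, \qquad \gamma_{k+1} < 1-\alpha \le \gamma_k, \qquad B \sim \text{Binomial}(n, 1/2),$$

and λ is defined as

$$\lambda = \frac{(n-k)\,I}{k + (n-2k)\,I}, \qquad I = \frac{\gamma_k - (1-\alpha)}{\gamma_k - \gamma_{k+1}}.$$

They showed that their confidence interval gives exactly the desired coverage probability for the double exponential distribution and approximately the desired coverage probability for several other common symmetric distributions.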
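The interval can be computed directly from these definitions. The sketch below is a standard-library illustration; the function names `coverage` and `hs_interpolated_ci` are hypothetical, and it picks the largest k with $\gamma_k \ge 1-\alpha$ so that $\gamma_{k+1} < 1-\alpha \le \gamma_k$.

```python
import math


def coverage(n, k):
    """gamma_k = P(k <= B <= n - k) for B ~ Binomial(n, 1/2)."""
    return sum(math.comb(n, i) for i in range(k, n - k + 1)) / 2 ** n


def hs_interpolated_ci(x, conf=0.95):
    """Hettmansperger-Sheather interpolated confidence interval for the median."""
    xs = sorted(x)
    n = len(xs)
    if n < 3 or coverage(n, 1) < conf:
        raise ValueError("sample too small for the requested confidence level")
    # Largest k with gamma_k >= conf, keeping X_(k+1) <= X_(n-k).
    k = 1
    while k + 1 <= n // 2 and coverage(n, k + 1) >= conf:
        k += 1
    g_k, g_k1 = coverage(n, k), coverage(n, k + 1)
    interp = (g_k - conf) / (g_k - g_k1)          # I in the formula
    lam = (n - k) * interp / (k + (n - 2 * k) * interp)
    # Order statistics are 1-based in the formula and 0-based in Python.
    lower = (1 - lam) * xs[k - 1] + lam * xs[k]
    upper = (1 - lam) * xs[n - k] + lam * xs[n - k - 1]
    return lower, upper
```

For example, with n = 20 and a 95% level, $\gamma_6 \approx 0.9586$ and $\gamma_7 \approx 0.8847$, so k = 6 and $\lambda \approx 0.235$: the limits fall between $X_{(6)}$ and $X_{(7)}$ and between $X_{(14)}$ and $X_{(15)}$ instead of jumping to the coarser $\gamma_6$ interval.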
To conduct a right-tailed, left-tailed, or two-tailed test for the median of symmetrical and asymmetrical populations at a significance level α, the following steps are proposed:
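One natural way to turn the interpolated interval into the three tests is the usual duality between confidence intervals and hypothesis tests. The rule below is a sketch under that assumption and is not necessarily the exact sequence of steps proposed here. Writing $[L_{1-\alpha}, U_{1-\alpha}]$ for the $100(1-\alpha)\%$ interpolated interval, and using the fact that each tail of an equal-tailed $100(1-2\alpha)\%$ interval carries approximately α:

```latex
\begin{aligned}
\text{two-tailed } (H_1: M \neq M_0):&\quad \text{reject } H_0 \text{ if } M_0 < L_{1-\alpha} \text{ or } M_0 > U_{1-\alpha},\\
\text{right-tailed } (H_1: M > M_0):&\quad \text{reject } H_0 \text{ if } M_0 < L_{1-2\alpha},\\
\text{left-tailed } (H_1: M < M_0):&\quad \text{reject } H_0 \text{ if } M_0 > U_{1-2\alpha}.
\end{aligned}
```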

The Simulation Study
The simulation study was executed using R version 4.0.3. For each experimental situation described below, 5000 pseudo-random samples of sizes n = 20, n = 30, and n = 50 were generated with initial seed 9831815. These samples are then used to evaluate the empirical levels and powers of the seven tests described in Sections 3 and 4. With this number of replicates, the 99% upper limit for estimating the 5% nominal level lies within 0.007 of it. That is, tests with empirical levels exceeding 0.057 are considered liberal and invalid for the underlying testing problem.
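The 0.007 margin can be checked with a normal approximation to the binomial proportion. The sketch assumes a one-sided 99% limit, which is what reproduces 0.007; a two-sided 99% limit would give about 0.008 instead. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def upper_level_limit(nominal=0.05, reps=5000, prob=0.99):
    """One-sided upper limit: nominal + z_prob * sqrt(nominal * (1 - nominal) / reps)."""
    z = NormalDist().inv_cdf(prob)
    margin = z * math.sqrt(nominal * (1 - nominal) / reps)
    return margin, nominal + margin


margin, limit = upper_level_limit()
print(f"margin = {margin:.4f}, limit = {limit:.4f}")  # margin = 0.0072, limit = 0.0572
```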
The null hypothesis $H_0: M = M_0$ is tested against both right-tailed and left-tailed alternatives. To evaluate the empirical powers, the medians of the simulated distributions are shifted gradually away from the hypothesized value in both directions.
The hypothesized values to be tested ($M_0$) are determined as $M_0 = M \pm \delta\sigma$, where M and σ are, respectively, the population median and standard deviation. When the standard deviation does not exist, as for the T(2) and Cauchy distributions, the median absolute deviation from the median (MAD) is used instead of σ. Values of δ are selected to cover different points over the range (0.05, 1) of the power function. The empirical power is computed as the percentage of replicates in which each test statistic exceeded its corresponding 5% cutoff point.
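The level and power computation can be illustrated with a small Monte Carlo loop. To stay within the standard library, this sketch uses the right-tailed exact sign test and a standard normal population (M = 0, σ = 1), with $M_0 = M - \delta\sigma$ so that δ = 0 estimates the empirical level and δ > 0 the empirical power. Python's random generator differs from R's, so reusing the seed 9831815 will not reproduce the study's samples; the function names are hypothetical.

```python
import math
import random


def sign_test_critical(n, alpha=0.05):
    """Smallest c with P(B >= c) <= alpha for B ~ Binomial(n, 1/2)."""
    tail = 0.0
    for c in range(n, -1, -1):
        tail += math.comb(n, c) / 2 ** n
        if tail > alpha:
            return c + 1
    return 0


def empirical_rejection_rate(n, delta, reps=5000, seed=9831815, alpha=0.05):
    """Share of replicates in which the right-tailed sign test rejects H0: M = M0."""
    rng = random.Random(seed)  # Python's generator, not R's
    median, sigma = 0.0, 1.0   # standard normal population
    m0 = median - delta * sigma  # true median lies delta * sigma above M0
    c = sign_test_critical(n, alpha)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(median, sigma) for _ in range(n)]
        if sum(xi > m0 for xi in sample) >= c:
            rejections += 1
    return rejections / reps
```

For n = 20 the critical count is 15, and the exact size of the sign test is $21700/2^{20} \approx 0.021$, well below the nominal 0.05, which is consistent with the conservativeness of the sign test reported below.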
The simulation study covered the uniform, standard normal, standard logistic, double exponential, T(2), and Cauchy distributions as symmetrical models, and several other distributions as positively skewed models, as described in table 1. Table 2 gives the empirical levels, with the data generated under the null model, of the recommended HS test procedure and the six competitors described in section 3. Tables 3 and 4 summarize the empirical powers of the valid tests under selected symmetrical and asymmetrical models, respectively, for two sample sizes and two values of the alternative hypothesis under each model. The empirical powers of invalid tests are replaced by asterisks (*). The Appendix contains the full empirical levels and powers for all symmetrical and asymmetrical models under all configurations of sample size and alternative hypothesis described in section 5.
Using table 2, the following results can be deduced concerning the significance level of each of the seven test procedures under investigation: (1) The sign test is the most conservative of the seven test procedures, in the sense of rejecting a true null hypothesis less often than the specified nominal level, under both symmetrical and asymmetrical models.
(2) Under symmetrical models, the recommended test procedure, HS, and the other six competitors are all valid.
(3) Under asymmetrical models: (a) When the alternative hypothesis points in the direction of the longer tail, the HS, sign, and Baklizi test procedures are the only valid procedures under investigation. The other four procedures are liberal and invalid, in the sense of rejecting a true null hypothesis more often than the specified nominal level.
(b) When the alternative hypothesis points in the direction of the shorter tail, the T, W, and NS test procedures are too conservative, while the recommended HS test procedure keeps its size closest to the nominal level without exceeding it, followed by the Baklizi test and then the sign test.
Using table 3, the following results can be deduced concerning the power of each of the seven test procedures under symmetrical models: (1) Under the normal model, the HS test trails five tests (T, W, GM, NS, and Baklizi), but as the sample size increases its performance approaches that of the uniformly most powerful unbiased test (T) and that of the asymptotically most powerful rank test (NS).
(2) Under the double exponential model, for which the Wilcoxon signed ranks test is known to be highly correlated with the maximum efficiency robust test (Gastwirth, 1966), the HS test outperforms all six competitor test procedures, especially under contiguous alternatives (those close to $H_0$).
(3) As the sample size increases, the performance of the HS test procedure approaches that of the best test procedure.
(4) As the degree of tail heaviness increases, the advantage of the HS test procedure grows, and its power trails only that of the GM test procedure.
Using table 4, the following results can be deduced concerning the power associated with each of the seven test procedures under asymmetrical models: (1) The HS test procedure performs well under contiguous alternatives.
(2) When the alternative hypothesis points in the direction of the longer tail, the HS test procedure outperforms the other two valid tests, the sign and Baklizi test procedures, as the sample size and the degree of skewness increase.
(3) When the alternative hypothesis points in the direction of the shorter tail, the HS test procedure outperforms all other test procedures under consideration, especially as the degree of skewness increases.

Conclusions
In practice, the following notes should be considered: (1) When the evidence is in favor of the normality assumption, the T test or the NS test should be used.
(2) When the normality assumption is violated but the evidence is in favor of the symmetry or near symmetry assumption, the GM test or HS test is recommended.
(3) When both the normality assumption and the symmetry or near symmetry assumption are strongly violated, the HS or the Baklizi tests are recommended.

Copyrights
Copyright for this article is retained by the author(s), with first publication rights granted to the journal.
This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).