Type I Error Rates of Ft Statistic with Different Trimming Strategies for Two Groups Case

When the assumptions of normality and homoscedasticity are met, researchers should have no doubt in using classical tests such as the t-test and ANOVA to test for the equality of central tendency measures for two groups and for more than two groups, respectively. However, in real life we do not often encounter this ideal situation. A robust method known as the $F_t$ statistic has been identified as an alternative to the above methods in handling the problem of nonnormality. Motivated by the good performance of the method, in this study we propose to use the $F_t$ statistic with three different trimming strategies, namely, i) fixed symmetric trimming (10%, 15% and 20%), ii) fixed asymmetric trimming (10%, 15% and 20%) and iii) empirically determined trimming, to simultaneously handle the problems of nonnormality and heteroscedasticity. To test the robustness of the procedures to violations of the assumptions, two variables were manipulated: the type of distribution and the heterogeneity of variances. The Type I error rate of each procedure was then calculated. The study was based on simulated data, with each procedure simulated 5000 times. Based on the Type I error rates, we were able to identify which procedures ($F_t$ with different trimming strategies) are robust and have good control of Type I error. The best procedures that should be taken into consideration are the $F_t$ with MOM-$T_n$ for the normal distribution, 15% fixed trimming for the skewed normal-tailed distribution and MOM-$\mathrm{MAD}_n$ for the skewed leptokurtic distribution, because these procedures produced the Type I error rates nearest to the nominal level.


Introduction
The classical tests of group equality such as the t test and analysis of variance (ANOVA) are often adversely affected by variance heterogeneity and nonnormality. When the problems of nonnormality and variance heterogeneity arise simultaneously, Type I error rates are usually inflated, resulting in spurious rejection of null hypotheses, and the power of the test statistics is reduced. These problems could be overcome by (i) applying a transformation to the data and proceeding with the classical procedures or (ii) selecting an alternative procedure which is insensitive (robust) to the problems. When data are nonnormal and variances are heterogeneous, it is often possible to transform the data so that the new scores are normal with equal variances. Although transformation is capable of normalizing skewed data, the method still has some drawbacks. According to Wilcox (2002), transformation (e.g., square root and log) overlooks inferences about the original data. It is also unable to eliminate the influence of outliers, and sometimes even a complicated transformation fails to normalize the data. For that reason, this study looks into the second approach in alleviating the problems of nonnormality and heterogeneity of variances simultaneously.
Lix and Keselman (1998) stated that tests which are insensitive to the combined effects of nonnormality and variance heterogeneity can be obtained by substituting robust measures of location and scale, such as trimmed means and Winsorized variances, in place of the usual means and variances, respectively. The trimmed mean is a good measure of location because its standard error is less affected by departures from normality, since the extreme values or outliers are removed (Lix & Keselman, 1998). It is a popular robust estimator used by many researchers because of its ability to control the Type I error rate (Luh, 1999; Keselman, Wilcox, Taylor & Kowalchuk, 2000; Locker, 2001; Luh & Guo, 2005). A few examples of robust methods which have been used with trimmed means are the Welch test (Welch, 1951), the James test (James, 1951) and the Alexander-Govern test (Alexander & Govern, 1994).
A trimmed mean summarizes the data that remain after trimming is carried out. By using trimmed means, the effect of the tails of the distribution is reduced by their removal, based on a trimming percentage that has to be stated in advance (a fixed amount). Even though the trimmed mean has good control of the Type I error rate, the trimming is done regardless of the type of distribution, because the trimming percentage is set before it is known whether outliers are present. It would be a gross mistake to eliminate data which are not outliers, as in normally distributed data. There is extensive literature on this trimming method, which uses a fixed amount of symmetric trimming, including Lee and Fung (1985), Keselman, Wilcox, Othman and Fradette (2002), and Wilcox (2003).
To avoid unnecessary trimming, an alternative to a fixed trimmed mean is the modified one-step M estimator (hereafter MOM). This estimator empirically trims extreme data based on the type (shape) of the distribution (Othman, Keselman, Padmanabhan, Wilcox & Fradette, 2004). If the distribution is skewed, the amount of trimming on the two tails should not be the same: more should be trimmed from the skewed tail, unlike the trimmed mean, which trims the data regardless of the shape of the tails. Another approach to trimming, investigated by Keselman, Wilcox, Lix, Algina and Fradette (2007), uses fixed asymmetric trimming. This approach uses the hinge estimators proposed by Reed and Stark (1996) to determine the suitable amount of trimming on each tail of a distribution. However, their method still adopts fixed trimming percentages.
Motivated by the good performance of the trimmed mean, this study investigates the three approaches to trimming mentioned above, i.e., (i) fixed (percentage) symmetric trimming, (ii) automatic trimming and (iii) fixed asymmetric trimming. These trimmed means are used as the central tendency measures for a robust test statistic known as the $F_t$ statistic. The performance of these methods in terms of Type I error rates for the case of two groups is determined and compared.

Method
Three methods of trimming, namely fixed symmetric trimming, automatic trimming and fixed asymmetric trimming, were examined in this study. The fixed symmetric and asymmetric trimming adopted 10%, 15% and 20% total amounts of trimming, while the automatic trimming trimmed data based on different scale estimators, i.e., $\mathrm{MAD}_n$, $T_n$ and $\mathrm{LMS}_n$. Each trimmed mean generated through the different trimming criteria was used as the central tendency measure for the $F_t$ statistic. Altogether, this study produced nine different $F_t$ procedures. Each of these procedures was then compared under conditions of normality and nonnormality represented by the g-and-h distributions.

Ft statistic
Lee and Fung (1985) introduced a statistical procedure that is able to handle problems with sample locations when nonnormality occurs but the homogeneity of variances assumption still applies. This statistic was named the trimmed F statistic, $F_t$. Their work focused on the best trimming percentages for producing trimmed means that control Type I error and provide good power rates for the statistical procedure. They recommended the trimmed F statistic with 15% symmetric trimming as an alternative to the usual F test, especially when the distribution is long-tailed symmetric. This method is simple and easy to program.
To further understand the $F_t$ method, let $X_{(1)j} \le X_{(2)j} \le \dots \le X_{(n_j)j}$ be an ordered sample of group $j$ with size $n_j$, and let
\[ g_j = [g\,n_j], \qquad h_j = n_j - 2g_j, \qquad H = \sum_{j=1}^{J} h_j, \qquad \bar{X}_t = \frac{1}{H}\sum_{j=1}^{J} h_j \bar{X}_{tj}. \]
Hence the g-trimmed F is defined as
\[ F_t(g) = \frac{\sum_{j=1}^{J} h_j \left(\bar{X}_{tj} - \bar{X}_t\right)^2 / (J - 1)}{\sum_{j=1}^{J} SSD_{tj} / (H - J)}, \]
where $J$ = number of groups, $g$ = the proportion of observations in each group that are to be trimmed in each tail of the distribution (so that $g_j$ observations are trimmed from each tail of the $j$th group), $\bar{X}_{tj}$ = the $j$th group trimmed mean, and $SSD_{tj}$ = the g-Winsorized sum of squared deviations. $F_t(g)$ will follow approximately an F distribution with $(J - 1, H - J)$ degrees of freedom.
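The computation can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' program: the function names are hypothetical, and it assumes the usual Lee and Fung (1985) form in which the between-group term is weighted by the effective sizes $h_j$ and the Winsorized deviations are taken about the trimmed mean with equal trimming in each tail.

```python
import math


def trimmed_stats(sample, g):
    """Return (trimmed mean, Winsorized SSD, effective size h) for one group.

    g is the proportion trimmed from EACH tail, so g_j = floor(g * n_j).
    Assumption: deviations are taken about the trimmed mean.
    """
    x = sorted(sample)
    n = len(x)
    gj = math.floor(g * n)          # number trimmed from each tail
    h = n - 2 * gj                  # effective sample size
    if h <= 0:
        raise ValueError("trimming proportion leaves no observations")
    kept = x[gj:n - gj]
    tmean = sum(kept) / h
    # Winsorize: every trimmed value is replaced by the nearest kept value.
    winsorized = [kept[0]] * gj + kept + [kept[-1]] * gj
    ssd = sum((v - tmean) ** 2 for v in winsorized)
    return tmean, ssd, h


def f_t(groups, g):
    """Return the trimmed F statistic and its (J - 1, H - J) degrees of freedom."""
    stats = [trimmed_stats(s, g) for s in groups]
    J = len(groups)
    H = sum(h for _, _, h in stats)
    grand = sum(h * m for m, _, h in stats) / H
    between = sum(h * (m - grand) ** 2 for m, _, h in stats) / (J - 1)
    within = sum(ssd for _, ssd, _ in stats) / (H - J)
    return between / within, (J - 1, H - J)


# Hypothetical data: one outlier in the first group, 20% trimming per tail.
statistic, df = f_t([[1, 2, 3, 4, 100], [2, 3, 4, 5, 6]], g=0.2)
```

In this toy example the outlier 100 is trimmed from the first group, so it influences the statistic only through the Winsorized value that replaces it.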

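Fixed symmetric trimmed mean
Let $X_{(1)j} \le X_{(2)j} \le \dots \le X_{(n_j)j}$ represent the ordered observations associated with the $j$th group. In order to calculate the 100g% sample trimmed mean, we define
\[ g_j = [g\,n_j], \]
where $g$ represents the proportion of observations that are to be trimmed in each tail of the distribution and $[g\,n_j]$ is the largest integer $\le g\,n_j$, and
\[ h_j = n_j - 2g_j. \]
The $j$th group trimmed mean is given by
\[ \bar{X}_{tj} = \frac{1}{h_j} \sum_{i=g_j+1}^{n_j-g_j} X_{(i)j}. \]
For equal amounts of trimming in each tail of the distribution, the g-Winsorized sum of squared deviations is then calculated as
\[ SSD_{tj} = (g_j+1)\left(X_{(g_j+1)j} - \bar{X}_{tj}\right)^2 + \left(X_{(g_j+2)j} - \bar{X}_{tj}\right)^2 + \dots + \left(X_{(n_j-g_j-1)j} - \bar{X}_{tj}\right)^2 + (g_j+1)\left(X_{(n_j-g_j)j} - \bar{X}_{tj}\right)^2. \]

Automatic trimmed mean (MOM)
Let $X_{(1)j} \le X_{(2)j} \le \dots \le X_{(n_j)j}$ be an ordered sample of group $j$ with size $n_j$. The MOM trimmed mean of group $j$ is calculated by using
\[ \bar{X}_{tj} = \frac{1}{n_j - g_{1j} - g_{2j}} \sum_{i=g_{1j}+1}^{n_j-g_{2j}} X_{(i)j}, \]
where
$g_{1j}$ = number of observations $X_{(i)j}$ such that $(X_{(i)j} - \hat{M}_j) < -K(\text{scale estimator})$,
$g_{2j}$ = number of observations $X_{(i)j}$ such that $(X_{(i)j} - \hat{M}_j) > K(\text{scale estimator})$,
$\hat{M}_j$ = median of group $j$, and the scale estimator can be either $\mathrm{MAD}_n$, $T_n$ or $\mathrm{LMS}_n$,
$K$ = 2.24 (multiplier of the scale estimator),
$n_j$ = group sample size.
The value $K$ = 2.24 was suggested by Wilcox and Keselman (2003) as the multiplier of the scale estimator in the above criteria. They adjusted the value of $K$ so that efficiency is good under normality, especially for small sample sizes. Using simulation with 10,000 replications, they found that the efficiency of the MOM trimmed mean (the standard error of the sample mean divided by the standard error of the MOM trimmed mean) is approximately 0.9 for $n_1 = n_2 = n_3 = n_4 = n_5 = 20$ with $K$ = 2.24, where the MOM trimmed mean was arrived at using $\mathrm{MAD}_n$. We conducted a similar simulation study on the MOM trimmed mean using the robust scale estimators $T_n$ and $\mathrm{LMS}_n$, and found that the efficiencies are approximately 0.83 and 0.91, respectively. Hence, we kept the value of 2.24 in our selection criteria. Note that 2.24 is approximately equal to the square root of the 0.975 quantile of a chi-square distribution with one degree of freedom (Wilcox & Keselman, 2003), indicating that it is also suitable for skewed distributions.
When allowing different amounts of trimming in each tail of the distribution, the Winsorized sum of squared deviations is then defined as
\[ SSD_{tj} = (g_{1j}+1)\left(X_{(g_{1j}+1)j} - \bar{X}_{tj}\right)^2 + \left(X_{(g_{1j}+2)j} - \bar{X}_{tj}\right)^2 + \dots + \left(X_{(n_j-g_{2j}-1)j} - \bar{X}_{tj}\right)^2 + (g_{2j}+1)\left(X_{(n_j-g_{2j})j} - \bar{X}_{tj}\right)^2 - \frac{\left\{ g_{1j}\left[X_{(g_{1j}+1)j} - \bar{X}_{tj}\right] + g_{2j}\left[X_{(n_j-g_{2j})j} - \bar{X}_{tj}\right] \right\}^2}{n_j}. \]

MADn
$\mathrm{MAD}_n$ is the median absolute deviation about the median. It attains the best possible breakdown value of 50%, twice as much as the interquartile range, and its influence function is bounded with the sharpest possible bound among all scale estimators (Rousseeuw & Croux, 1993). This robust scale estimator is given by
\[ \mathrm{MAD}_n = b \, \operatorname{med}_i \left| x_i - \operatorname{med}_j x_j \right|, \]
where the constant $b$ = 1.4826 is needed to make the estimator consistent for the parameter of interest, and $\operatorname{med}_j x_j$ is the sample median.
However, this scale estimator has drawbacks. The efficiency of $\mathrm{MAD}_n$ is very low, at only 37% for the Gaussian distribution. Rousseeuw and Croux (1993) carried out a simulation on 10,000 batches of Gaussian observations to verify the efficiency gain at finite samples. They compared the variance of the standard deviation with the variance of $\mathrm{MAD}_n$ based on the finite samples. $\mathrm{MAD}_n$ also takes a symmetric view of dispersion and does not seem to be a natural approach for problems with asymmetric distributions.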
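A minimal Python sketch of empirically determined (MOM) trimming with $\mathrm{MAD}_n$ as the scale estimator is given below. The function names are hypothetical, and the sketch assumes the rule described by Wilcox and Keselman (2003): an observation is trimmed when its deviation from the group median exceeds $K$ = 2.24 times the scale estimate.

```python
import statistics


def mad_n(x, b=1.4826):
    """Median absolute deviation about the median, rescaled by b."""
    med = statistics.median(x)
    return b * statistics.median([abs(v - med) for v in x])


def mom_trimmed_mean(x, scale=mad_n, K=2.24):
    """Return (MOM trimmed mean, lower trim count, upper trim count).

    Assumes the scale estimate is positive; with a zero scale estimate
    every observation different from the median would be trimmed.
    """
    med = statistics.median(x)
    cut = K * scale(x)
    lower = sum(1 for v in x if v - med < -cut)   # trimmed from the lower tail
    upper = sum(1 for v in x if v - med > cut)    # trimmed from the upper tail
    kept = [v for v in x if -cut <= v - med <= cut]
    return sum(kept) / len(kept), lower, upper


# Hypothetical sample with one large value in the upper tail.
mean, g1, g2 = mom_trimmed_mean([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

Here only the value 100 is flagged, so nothing is trimmed from the lower tail. Passing a different function as `scale` would switch the scale estimator without changing the trimming rule.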

Tn
Rousseeuw and Croux (1993) proposed $T_n$, a scale estimator suitable for asymmetric distributions which, like $\mathrm{MAD}_n$, attains the highest possible breakdown point. However, this estimator has more advantages than $\mathrm{MAD}_n$. It has 52% efficiency, making it more efficient than $\mathrm{MAD}_n$. It also has a continuous and bounded influence function. Furthermore, the calculation of $T_n$ is easy. It is given as
\[ T_n = 1.3800 \, \frac{1}{h} \sum_{k=1}^{h} \left\{ \operatorname{med}_{j \ne i} \left| x_i - x_j \right| \right\}_{(k)}, \]
where $h = [n/2] + 1$. $T_n$ has a simple and explicit formula that guarantees uniqueness. This estimator also has a 50% breakdown point.

LMSn
$\mathrm{LMS}_n$ is also a scale estimator with a 50% breakdown point. It is based on the length of the shortest half sample, as shown below:
\[ \mathrm{LMS}_n = c' \min_i \left| x_{(i+h-1)} - x_{(i)} \right|, \]
where $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ are the ordered data and $h = [n/2] + 1$. The default value of $c'$ is 0.7413, which achieves consistency at Gaussian distributions. $\mathrm{LMS}_n$ has the same influence function as the MAD (Rousseeuw & Leroy, 1987), and its efficiency equals that of the MAD as well (Grubel, 1988).
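Both estimators are short to program. The sketch below follows the definitions of Rousseeuw and Croux (1993) with $h = [n/2] + 1$; the function names are hypothetical, and the use of the ordinary sample median (averaging the two middle values when their number is even) inside $T_n$ is an assumption of this sketch.

```python
import statistics


def t_n(x, c=1.3800):
    """T_n: c times the mean of the h smallest per-observation medians."""
    n = len(x)
    h = n // 2 + 1
    # For each x_i, the median distance to all the other observations.
    meds = sorted(
        statistics.median([abs(xi - xj) for j, xj in enumerate(x) if j != i])
        for i, xi in enumerate(x)
    )
    return c * sum(meds[:h]) / h


def lms_n(x, c=0.7413):
    """LMS_n: c times the length of the shortest half sample."""
    xs = sorted(x)
    n = len(xs)
    h = n // 2 + 1
    # Each window of h consecutive order statistics is one "half sample".
    return c * min(xs[i + h - 1] - xs[i] for i in range(n - h + 1))


# Hypothetical sample.
data = [1, 2, 3, 4, 5]
tn_value = t_n(data)
lms_value = lms_n(data)
```

For these five equally spaced values, every half sample of three consecutive order statistics has length 2, so $\mathrm{LMS}_n$ is 0.7413 times 2.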

Adaptive trimmed mean
This adaptive trimmed mean uses the hinge estimator $HQ_1$ (Reed & Stark, 1996) to adjust the trimming process to suit the shape of the data distribution. Keselman et al. (2007) successfully improved the Welch test by using this adaptive trimmed mean in controlling Type I error rates. The adaptive trimmed mean is calculated as the mean of the observations that remain after a proportion $\alpha_l$ of the $n_j$ ordered observations has been trimmed from the lower tail and a proportion $\alpha_u$ from the upper tail, where $\alpha_l$ = lower trimming percentage, $\alpha_u$ = upper trimming percentage and $n_j$ is the sample size. The percentages of lower and upper trimming are identified using the hinge estimator $HQ_1$ (Reed & Stark, 1996). However, the total percentage of trimming is predetermined, just like the usual trimmed mean.
To define the lower and upper trimming percentages, consider an ordered sample. $L_\alpha$ is the mean of the smallest $[\alpha n]$ observations, where $[\alpha n]$ denotes $\alpha n$ rounded down to the nearest integer, while $U_\alpha$ is the mean of the largest $[\alpha n]$ observations. For example, if $\alpha$ = 0.05, then $L_{0.05}$ is the mean of the smallest $[0.05n]$ observations. The measure $Q_1$ is defined as a ratio of differences between such tail means. $Q_1$ classifies whether a symmetric distribution has a light (for $Q_1 < 2$), medium (for $2.6 < Q_1 \le 3.2$) or heavy (for $Q_1 > 3.2$) tail. It is a location-free statistic and is uncorrelated with other location statistics. Reed and Stark (1996) defined a general scheme for their approach, based on the former definitions of tail length, as follows:
i. Set the total amount of trimming, $\alpha$, from the sample.
ii. Determine the proportion to be trimmed from the lower end of the sample ($\alpha_l$) by the proportion
\[ \alpha_l = \alpha \left[ \frac{UW_x}{UW_x + LW_x} \right], \]
where $UW_x$ and $LW_x$ are, respectively, the numerator and denominator of the previously defined statistic ($Q_1$).
iii. The upper trimming percentage is defined as
\[ \alpha_u = \alpha - \alpha_l. \]
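The scheme can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch rests on two assumptions: that the total trimming is split as lower = total × UW / (UW + LW), with the remainder going to the upper tail, and that UW and LW are supplied by the caller as the numerator and denominator of $Q_1$, since their exact definitions are not restated here.

```python
import math


def tail_means(x, a):
    """Return (L_a, U_a): means of the smallest and largest floor(a * n) values."""
    xs = sorted(x)
    k = math.floor(a * len(xs))
    if k < 1:
        raise ValueError("a * n must be at least 1")
    return sum(xs[:k]) / k, sum(xs[-k:]) / k


def hinge_split(total, uw, lw):
    """Split a fixed total trimming proportion into (lower, upper) parts.

    Assumption: lower = total * uw / (uw + lw) and upper = total - lower.
    """
    lower = total * uw / (uw + lw)
    return lower, total - lower


# Hypothetical inputs: 20% total trimming, UW three times as large as LW.
lower, upper = hinge_split(0.2, uw=3.0, lw=1.0)
low_mean, up_mean = tail_means(range(1, 11), 0.2)
```

Whatever the values of UW and LW, the two parts always add up to the predetermined total, which is why this strategy is still a fixed-percentage method.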

Empirical Investigation
This study investigated the performance (robustness) of the $F_t$ statistic using different types of trimmed mean as the central tendency measure for the case of two unbalanced groups. In studying the robustness of the procedures, two variables were manipulated, creating conditions which are known to highlight the strengths and weaknesses of the tests. The two variables were (1) population distribution and (2) degree of variance heterogeneity. Unequal group sizes, when paired with equal and unequal variances, can affect Type I error control for some statistical tests. Thus, several ratios of variances were considered, namely, 1:1, 1:8 and 1:36. The total sample size for the two groups was set at N = 30, with group sizes of 12 and 18.
To test for the effect of distributions, the g-and-h distribution (Hoaglin, Mosteller & Tukey, 1983) was used to represent the skewed and normal data. Three types of distributions, representing normal, skewed normal-tailed and skewed leptokurtic, were considered. Observations from the g-and-h distribution were generated by converting standard normal variates $Z_{ij}$ using the following equation:
\[ Y_{ij} = \begin{cases} \dfrac{\exp(g Z_{ij}) - 1}{g} \exp\!\left( \dfrac{h Z_{ij}^2}{2} \right), & g \ne 0, \\[2ex] Z_{ij} \exp\!\left( \dfrac{h Z_{ij}^2}{2} \right), & g = 0. \end{cases} \]
The g-and-h distribution is a modification of the standard normal distribution, with the constant g controlling the skewness and h controlling the kurtosis. The levels of skewness and kurtosis increase as the values of g and h increase, respectively. The data are normal when g = 0 and h = 0. The values of (g, h) used in this study are (0, 0), (0.5, 0) and (0.5, 0.5). Table 1 summarizes the skewness and kurtosis values for the three selected situations (Wilcox, 2005).
This study was based on simulated data. For data generation, the SAS function RANNOR (SAS Institute, 1999) was used to obtain pseudo-random standard normal variates. In examining the Type I error rates, the group location measures were set to zero. For each condition examined, 5000 data sets were generated. The nominal level of significance was set at $\alpha$ = 0.05.
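The data-generation step can be sketched in Python as follows. This is an illustration rather than the SAS program used in the study: Python's `random.Random.gauss` stands in for RANNOR, the function names are hypothetical, the pairing of the larger variance with the larger group is an arbitrary choice made for the example, and the centering needed to set each group's location measure to zero is not shown.

```python
import math
import random


def g_and_h(z, g, h):
    """Transform a standard normal variate z into a g-and-h variate."""
    tail = math.exp(h * z * z / 2)
    if g == 0:
        return z * tail
    return (math.exp(g * z) - 1) / g * tail


def simulate_groups(sizes=(12, 18), variances=(1.0, 36.0), g=0.5, h=0.0, seed=1):
    """Generate one data set: a list of g-and-h samples, one per group.

    Each variate is multiplied by the square root of its group's variance
    to induce the requested variance ratio.
    """
    rng = random.Random(seed)
    return [
        [math.sqrt(v) * g_and_h(rng.gauss(0.0, 1.0), g, h) for _ in range(n)]
        for n, v in zip(sizes, variances)
    ]


# One hypothetical data set for the skewed normal-tailed case, ratio 1:36.
groups = simulate_groups()
```

Repeating `simulate_groups` with 5000 different seeds for each combination of (g, h) and variance ratio would give the data sets over which rejections are counted.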

Results
The performance of the nine $F_t$ procedures under unequal sample sizes with various variance ratios is shown in Table 2. By convention, a procedure can be considered robust if its Type I error rate is between $0.5\alpha$ and $1.5\alpha$ (Bradley, 1978). Thus, when the nominal level is set at $\alpha$ = 0.05, the Type I error rate should be between 0.025 and 0.075.
Type I error rates are considered liberal when they are above the 0.075 limit, while those below the 0.025 limit are considered conservative.
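Bradley's criterion is easy to apply programmatically. The short Python sketch below, with hypothetical function names, computes an empirical Type I error rate from a list of p-values obtained under a true null hypothesis and classifies it against the interval from 0.5 to 1.5 times the nominal level.

```python
def empirical_type1_rate(p_values, alpha=0.05):
    """Proportion of simulated data sets in which the null was rejected."""
    return sum(1 for p in p_values if p < alpha) / len(p_values)


def bradley_class(rate, alpha=0.05):
    """Classify a Type I error rate using Bradley's (1978) liberal criterion."""
    if rate < 0.5 * alpha:
        return "conservative"
    if rate > 1.5 * alpha:
        return "liberal"
    return "robust"


# Hypothetical rates, not values taken from Table 2.
labels = [bradley_class(r) for r in (0.020, 0.050, 0.080)]
```

With 5000 simulated data sets per condition, the rate is simply the number of rejections divided by 5000.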

Normal distribution
For the normal distribution with g = 0 and h = 0, under N = 30, the $F_t$ procedure using the fixed trimmed mean with 15% and 20% symmetric trimming produced Type I error rates within the robust criterion regardless of the variances. In contrast, fixed asymmetric trimming produced good control of Type I error rates for 10% trimming under all variance ratios. For automatic trimming, $F_t$ with MOM-$T_n$ and MOM-$\mathrm{LMS}_n$ were robust when the ratios of variances were 1:8 and 1:36.

Skewed normal-tailed distribution
$F_t$ with the fixed symmetric trimmed mean produced robust results in all conditions except for 10% trimming with a variance ratio of 1:36. As for automatic trimming, only $F_t$ with MOM-$T_n$ under variance ratios of 1:8 and 1:36, and $F_t$ with MOM-$\mathrm{LMS}_n$ under a variance ratio of 1:8, showed Type I error rates within the robust criterion. Fixed asymmetric trimming produced good control of Type I error for all variance ratios except when the ratio was 1:36 for 20% trimming.

Skewed leptokurtic distribution
Under equal variances (1:1), fixed symmetric, automatic and fixed asymmetric trimming all produced Type I error rates within Bradley's criterion of robustness, except for fixed asymmetric trimming with 15% and 20% trimming. Under unequal variances, the procedures with 15% and 20% symmetric trimming, MOM-$\mathrm{MAD}_n$ and 15% asymmetric trimming were robust.

Discussion and conclusions
To evaluate the robustness of a test, several other benchmarks have been used in the past. Procedures that were considered not robust by some researchers could be deemed robust by others. Some researchers would consider that procedures with conservative Type I error rates fail to perform. However, Mehta and Srinivasan (1970) and Hayes (2005) stated that conservative procedures, in which the true Type I error rate is less than or equal to the nominal level, can still be considered robust. Yet a conservative test will be lower in power than a less conservative test, because a more conservative test is less likely to reject any null hypothesis (Hayes, 2005). As for liberal tests, Hayes (2005) advised against using them. He defined a liberal test as a test that tends to underestimate the true p-value. Using a liberal test for hypothesis testing will increase the probability of Type I error to a value greater than the nominal level, which implies a bigger risk of making a Type I error. Nonetheless, Keselman et al. (2000) pointed out that there is no one universal standard by which tests can be judged to be robust, so different interpretations of these results are possible. This study also identified some promising procedures that performed well in terms of Type I error.
Based on the results, for $F_t$ with the fixed trimmed mean, 15% symmetric trimming is the best alternative because this procedure produced Type I error rates which were very close to the nominal level for all the types of distributions tested (normal, skewed normal-tailed and skewed leptokurtic). For $F_t$ with automatic trimming, MOM-$\mathrm{MAD}_n$, MOM-$T_n$ and MOM-$\mathrm{LMS}_n$ performed well for most of the conditions under skewed distributions regardless of variance ratios. As for $F_t$ with asymmetric trimming using the hinge estimator, 10% trimming showed good results for the normal distribution, while 15% trimming produced good control of Type I error rates for all variance ratios under the skewed normal-tailed distribution, where its Type I error rates were nearer to the nominal level than those of the other trimming percentages.
Overall, the best procedures that should be taken into consideration are the $F_t$ with MOM-$T_n$ for the normal distribution, 15% fixed symmetric trimming for the skewed normal-tailed distribution and MOM-$\mathrm{MAD}_n$ for the skewed leptokurtic distribution. All of these procedures produced the Type I error rates nearest to the nominal level.

Table 1. Some properties of the g-and-h distribution