Impact of Skewness on Statistical Power

To investigate the effect of skewness on statistical power, this study examines four distributions generated with Fleishman's power function, each with a kurtosis of 0.00 and with skewness values of 0.75, 0.50, 0.25 and 0.00. The Kolmogorov-Smirnov two-sample test is used at a significance level of α = 0.05. The sample sizes are small and equal, ranging from (2, 2) to (20, 20), and the ratios of the sample means are 0:0.5, 0:1, 0:1.5, 0:2, 0:2.5 and 0:3. The results show that, for a given sample size with the kurtosis coefficient fixed at 0, increasing the skewness coefficient from 0 to 0.25 does not produce large changes in statistical power. When the skewness coefficient is increased from 0.25 to 0.50, power decreases for mean ratios of 0:0.5, 0:1 and 0:1.5 and increases for mean ratios of 0:2, 0:2.5 and 0:3. When the skewness coefficient is increased from 0.50 to 0.75, power increases for almost all mean ratios. Statistical power also increases as the ratio of the means increases, for all sample sizes.


Introduction
Parametric tests make precise assumptions about the distribution of the population from which the sample being analyzed is drawn. If the raw data do not satisfy these assumptions, the researcher may analyze the data in other ways. One alternative is to use nonparametric procedures, which require few or no assumptions about the distribution of the data but are less powerful than their parametric counterparts (Boslaugh & Watters, 2008). Nonparametric tests may be used when the prerequisites of parametric tests, such as homogeneity of variance and normality, are not met, when the hypothesis to be tested does not concern population parameters, or when the data are measured on a scale weaker than parametric tests require (Daniel, 1990).
This study investigates how the statistical power of a nonparametric test is affected by skewness. For this purpose, the Kolmogorov-Smirnov two-sample (KS-2) test was selected from among the nonparametric tests for data obtained from two independent samples. The study aims to provide guidance for researchers who analyze skewed data when kurtosis is fixed.

Power of Test
When formulating their hypotheses, researchers believe the alternative hypothesis to be correct and hope that the sample data will lead to rejection of the null hypothesis (Tabachnick & Fidell, 2007). Two types of error can occur in hypothesis testing: type I error and type II error. Power is the complement of the type II error probability β and is defined as 1 − β (Boslaugh & Watters, 2008). The power of a statistical test is the probability that the test will yield a statistically significant result (Cohen, 1988).
The power of a test is generally expressed as

$$\pi(\theta) = P_\theta(\text{reject } H_0), \qquad \theta \in \Theta_A,$$

where $\Theta_A$ is the parameter space of the alternative hypothesis. The function $\pi(\theta)$ on the left-hand side of this equation is called the "power function" of the test (Geyer, 2001).

Statistical power analysis examines the relationship among the following four components (Mazen, Hemmasi, & Lewis, 1985):
- the standardized effect size (size of the effect and its variation),
- the sample size (n),
- the size of the test (the α level of significance),
- the power of the test (1 − β).
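To make the interplay of these four quantities concrete, the sketch below uses the textbook two-sided two-sample z-test with known σ = 1, not the KS-2 test studied here, because its power has a closed form: $1-\beta = \Phi(\delta\sqrt{n/2} - z_{1-\alpha/2}) + \Phi(-\delta\sqrt{n/2} - z_{1-\alpha/2})$, where δ is the standardized effect and n the size of each group. Fixing any three of effect size, n, α and power determines the fourth. The function name `z_test_power` is illustrative only.

```python
from statistics import NormalDist


def z_test_power(delta: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided two-sample z-test (sigma = 1 known, n per group).

    delta: standardized effect size (mu2 - mu1) / sigma.
    """
    std_normal = NormalDist()                    # provides Phi and its inverse
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = delta * (n / 2) ** 0.5               # mean of the z statistic under H1
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Power grows with the effect size and with the sample size.
for n in (5, 10, 20):
    print(n, [round(z_test_power(d, n), 3) for d in (0.5, 1.0, 2.0)])
```

At δ = 0 the function returns α itself, which is the size of the test rather than its power.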

Normal Distribution, Skewness and Kurtosis
The normal distribution was developed as an approximation to the binomial distribution when the number of trials is large and the Bernoulli probability p is not close to 0 or 1 (Evans, Hastings, & Peacock, 2000). The normal distribution is defined by its mean µ (central tendency) and its standard deviation σ. A special case, called the "standard normal distribution", has µ = 0 and σ = 1; it serves as the reference for probability calculations that use the normal model (Ramirez & Ramirez, 2009). A random variable X has the standard normal distribution, defined over the real numbers, if its probability density function is

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty.$$

The graph of this density is generally called the bell-shaped curve, and the normal distribution is the most commonly used distribution in statistical methods (Handcock & Morris, 1999; Balakrishnan & Nevzorov, 2003; Eaton, 2007; Freedman, 2009). Vogt (2005) defines skewness as a measure of the degree to which a distribution of scores is asymmetric. Skewness is commonly denoted by $\gamma_1$. With the mean denoted by µ and the standard deviation by σ, skewness is the expected value of the third standardized moment:

$$\gamma_1 = E\left[\left(\frac{X-\mu}{\sigma}\right)^3\right] = \frac{\mu_3}{\sigma^3}, \qquad \mu_3 = E\left[(X-\mu)^3\right].$$

If X is distributed symmetrically, $\mu_3$, and therefore $\gamma_1$, equals zero; that is, the skewness coefficient of a symmetric distribution is 0 (Bai & Ng, 2005). According to Balakrishnan and Nevzorov (2003), distributions with a skewness coefficient greater than zero are called positively skewed, and those with a skewness coefficient less than zero are called negatively skewed.
Kurtosis is the extent to which the peak of a unimodal probability or frequency distribution departs from the shape of a normal distribution, by being either more pointed (leptokurtic) or flatter (platykurtic). Kurtosis is generally measured as $\beta_2 = \mu_4/\mu_2^2$, where $\mu_4$ is the fourth central moment of the distribution and $\mu_2$ is the variance. For a normal distribution this index equals 3, so it is usually redefined by subtracting 3, $\gamma_2 = \beta_2 - 3$, which makes the value for the normal distribution zero. For a leptokurtic distribution the index is positive, and for a platykurtic distribution it is negative (Everitt, 2006). The kurtosis of a multivariate distribution plays an important role in the convergence of the sampling distribution of the $T^2$ statistic. To see this role, consider a random variable whose mean µ and standard deviation σ are known. Its kurtosis is the expected value of the fourth standardized moment (Mason & Young, 2002):

$$\beta_2 = E\left[\left(\frac{X-\mu}{\sigma}\right)^4\right] = \frac{\mu_4}{\sigma^4}.$$
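The moment definitions of skewness and kurtosis translate directly into sample estimators. The Python sketch below uses simple moment (plug-in) estimators that divide by n; the bias-corrected variants reported by some software differ slightly for small samples. The function names are illustrative.

```python
import random


def central_moment(xs, r):
    """r-th sample central moment, dividing by n."""
    mean = sum(xs) / len(xs)
    return sum((v - mean) ** r for v in xs) / len(xs)


def skewness(xs):
    """Moment estimator of gamma_1 = mu_3 / sigma^3."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment estimator of gamma_2 = mu_4 / mu_2^2 - 3 (zero for a normal population)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3


print(skewness([-2, -1, 0, 1, 2]))      # 0.0 for a symmetric sample
print(skewness([0, 0, 0, 1, 5]) > 0)    # True: long right tail, positive skew
rng = random.Random(0)
normal = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
print(round(skewness(normal), 2), round(excess_kurtosis(normal), 2))  # both close to 0
```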

Literature Review
This section reviews studies that investigate the impact of skewness on the power of parametric and nonparametric tests. Rasmussen (1983) carried out a Monte Carlo simulation to examine the effect of skewness on the U test and the t test; the results revealed that the U test is more powerful than the t test. Penfield (1994) considered equal and unequal variances and nineteen pairs of skewness and kurtosis values for different sample sizes. According to the results, when the two populations have equal variances, type I error rates are close to the α significance level for all skewness and kurtosis levels. Keselman and Zumbo (1997) compared the power of two specialized tests (the Wilcoxon-Mann-Whitney test and the RSKEW test) and two nonspecialized tests (the Yuen A test and the Yuen S test) in the two-sample problem.
The specialized tests were found to be more powerful in situations where the distributions are symmetric and the data are skewed to the right, while the nonspecialized tests were more powerful where the distributions are not skewed to the right. Skovlund and Fenstad (1997)

Simulation Study
Monte Carlo simulation was used in the study, and the simulations were run in SAS 9.00. The RANNOR function in SAS was used to generate random numbers from the standard normal distribution (mean 0, standard deviation 1), as required by the power transformation method that Fleishman (1978) proposed for generating population distributions (Fan, Felsovalyi, Sivo, & Keenan, 2003). Fleishman's power function is

$$Y = a + bX + cX^2 + dX^3, \qquad (5)$$

where X is a normally distributed random variable with mean 0 and standard deviation 1, generated by RANNOR. Y is the transformed variable, whose distribution depends on the constants a, b, c and d. These coefficients are determined by matching the mean, standard deviation, skewness and kurtosis of Y to their target values. The constant a is set to a = −c, and the values of b, c and d are those tabulated by Fleishman. After the sampling mechanisms were established, the PROC NPAR1WAY procedure was used in the power simulations.
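As an illustration of how the coefficients in (5) can be obtained instead of read from Fleishman's tables, the Python sketch below solves Fleishman's (1978) three moment equations (unit variance, target skewness $\gamma_1$, target excess kurtosis $\gamma_2$) by Newton's method with a numerical Jacobian, sets a = −c so that Y has mean 0, and applies the transformation to standard normal draws from Python's `random` module in place of SAS RANNOR. The solver, its starting point and its tolerances are assumptions of this sketch, not the study's SAS code, and Newton's method may fail for skewness and kurtosis pairs outside the region that Fleishman's method supports.

```python
import random


def fleishman_equations(b, c, d, skew, ex_kurt):
    """Residuals of Fleishman's (1978) equations for mean 0, variance 1."""
    f1 = b * b + 6 * b * d + 2 * c * c + 15 * d * d - 1
    f2 = 2 * c * (b * b + 24 * b * d + 105 * d * d + 2) - skew
    f3 = 24 * (b * d + c * c * (1 + b * b + 28 * b * d)
               + d * d * (12 + 48 * b * d + 141 * c * c + 225 * d * d)) - ex_kurt
    return [f1, f2, f3]


def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, v):
    """Solve the 3x3 linear system m @ s = v by Cramer's rule."""
    det = _det3(m)
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for row in range(3):
            mc[row][col] = v[row]
        sol.append(_det3(mc) / det)
    return sol


def fleishman_coefficients(skew, ex_kurt, tol=1e-12, max_iter=50):
    """Newton's method for (b, c, d) starting from the identity transform; returns (a, b, c, d)."""
    x, h = [1.0, 0.0, 0.0], 1e-7
    for _ in range(max_iter):
        f = fleishman_equations(*x, skew, ex_kurt)
        if max(abs(v) for v in f) < tol:
            break
        jac = [[0.0] * 3 for _ in range(3)]       # central-difference Jacobian
        for col in range(3):
            xp, xm = x[:], x[:]
            xp[col] += h
            xm[col] -= h
            fp = fleishman_equations(*xp, skew, ex_kurt)
            fm = fleishman_equations(*xm, skew, ex_kurt)
            for row in range(3):
                jac[row][col] = (fp[row] - fm[row]) / (2 * h)
        step = _solve3(jac, [-v for v in f])
        x = [xi + si for xi, si in zip(x, step)]
    b, c, d = x
    return -c, b, c, d                             # a = -c gives E[Y] = 0


def fleishman_sample(n, skew, ex_kurt, rng):
    """Draw n values of Y = a + bX + cX^2 + dX^3 with X ~ N(0, 1)."""
    a, b, c, d = fleishman_coefficients(skew, ex_kurt)
    out = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        out.append(a + b * x + c * x * x + d * x ** 3)
    return out


for g1 in (0.00, 0.25, 0.50, 0.75):               # the study's skewness values, kurtosis 0
    print(g1, [round(v, 4) for v in fleishman_coefficients(g1, 0.0)])
```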
The study uses four distributions that differ in skewness and are all generated by Fleishman's power function with a kurtosis value of 0: the normal distribution and three skewed distributions with kurtosis 0 derived from the normal distribution. The skewness values of these distributions are 0.75, 0.50, 0.25 and 0.00.
For each of the four distributions, nineteen small, equal sample sizes from (2, 2) to (20, 20) were examined. The ratio of the mean of the first sample to the mean of the second sample was set to $\mu_1:\mu_2 = 0:0.5$, $0:1$, $0:1.5$, $0:2$, $0:2.5$ and $0:3$. The significance level α was set at 0.05, and the simulations were carried out by coding the formulas for the KS-2 test statistic. With four distributions, nineteen sample sizes and six mean ratios, 4 × 19 × 6 = 456 SAS programs were written in total, and the simulation results were obtained from 20,000 repetitions of each program.
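The size of the design follows directly from crossing the three factors. A small Python check of the count, with illustrative variable names:

```python
from itertools import product

skewness_values = [0.00, 0.25, 0.50, 0.75]                       # kurtosis fixed at 0
sample_sizes = [(n, n) for n in range(2, 21)]                    # (2, 2) ... (20, 20)
mean_ratios = [(0, m) for m in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]   # mu1:mu2

conditions = list(product(skewness_values, sample_sizes, mean_ratios))
print(len(sample_sizes), len(conditions))   # 19 456
print(len(conditions) * 20_000)             # 9120000 simulated sample pairs in total
```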
The simulation steps used in this study were as follows:
- The Fleishman power function was used with µ = 0 and σ = 1. In this way, four population distributions with different skewness values (and kurtosis fixed at 0) were generated with the SAS RANNOR function.


- The significance level was set at α = 0.05.


- The null and alternative hypotheses for the KS-2 test simulations were as follows:
  $H_0: F(x) = G(x)$ for all x, $-\infty < x < +\infty$; that is, there is no difference between the two populations.
  $H_1: F(x) \neq G(x)$ for at least one value of x; that is, the two populations differ (Siegel & Castellan, 1988; Conover, 1999).


- The two samples were denoted by their sizes ($n_1$, $n_2$), where $n_1$ and $n_2$ are the first and second sample sizes, respectively.


- For small samples (both $n_1$ and $n_2$ less than 25), the KS-2 test statistic and decision rule are as follows (Daniel, 1990). Let
  $S_1(x)$ = (number of observations in the first sample, X, that are less than or equal to x)/$n_1$,
  $S_2(x)$ = (number of observations in the second sample, Y, that are less than or equal to x)/$n_2$.
  If the two samples are drawn from identical populations, $S_1(x)$ and $S_2(x)$ should be close for all x. The test statistic D is the maximum absolute difference between them,
  $$D = \max_x \left| S_1(x) - S_2(x) \right|,$$
  and $H_0$ is rejected when D is large enough, that is, when it reaches the tabulated critical value (Daniel, 1990; Conover, 1999).
- For each of the 19 sample sizes from (2, 2) to (20, 20), two independent samples were drawn at random from each of the four population distributions.
- The KS-2 test statistic was calculated for each pair of samples.
- Each test statistic was compared with the critical table value to decide whether the null hypothesis ($H_0$) is rejected.


- This procedure was repeated 20,000 times for each condition, and the number of rejections of the null hypothesis by the KS-2 test was counted in SAS.


- The number of non-rejections (repetitions minus rejections) divided by the number of repetitions estimates the type II error rate β; the statistical power is therefore estimated as 1 − β, which equals the number of rejections divided by the number of repetitions.
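The KS-2 statistic and the replication loop in the steps above can be sketched in Python as follows. This is a simplified stand-in for the study's SAS programs, assuming equal sample sizes n, standard normal populations by default (a Fleishman-transformed generator could be passed as `draw`), and a critical value `k_crit` for $n \cdot D$ supplied by the user. Because $nS_1(x)$ and $nS_2(x)$ are simply counts, the sketch works with the integer $n \cdot D$ to avoid floating-point comparisons. For $n_1 = n_2 = 10$, exact counting of the equally likely orderings of the pooled sample gives a two-sided 5% critical value of $n \cdot D \ge 7$, which is used in the usage line. All function names are hypothetical.

```python
import random
from bisect import bisect_right


def ks2_nd(x, y):
    """n * D for two samples of equal size n, where D = max_x |S1(x) - S2(x)|.

    n * S1(t) and n * S2(t) are the counts of observations <= t; both step
    functions change only at observed values, so the pooled data suffice.
    """
    xs, ys = sorted(x), sorted(y)
    return max(abs(bisect_right(xs, t) - bisect_right(ys, t)) for t in xs + ys)


def estimate_power(n, mu2, k_crit, reps=20_000, seed=1, draw=None):
    """Monte Carlo power of the KS-2 test with mu1:mu2 = 0:mu2.

    H0 is rejected in a replication when n * D >= k_crit.
    """
    rng = random.Random(seed)
    if draw is None:
        draw = lambda r: r.gauss(0.0, 1.0)      # standard normal population
    rejections = 0
    for _ in range(reps):
        x = [draw(rng) for _ in range(n)]        # first sample, mean 0
        y = [mu2 + draw(rng) for _ in range(n)]  # second sample, mean mu2
        if ks2_nd(x, y) >= k_crit:
            rejections += 1
    beta_hat = (reps - rejections) / reps        # estimated type II error rate
    return 1 - beta_hat                          # power estimate = rejections / reps


# Example: n1 = n2 = 10, mu1:mu2 = 0:1, fewer repetitions than the study for speed.
print(estimate_power(10, 1.0, k_crit=7, reps=2_000))
```

With `mu2 = 0` the same function estimates the actual type I error rate of the test instead of its power.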

Simulation Results
The simulation results show similar patterns across the four distributions studied. In all of them, with a few exceptions, statistical power increases as the sample size increases, and a larger ratio of the means also increases power.
The study holds the kurtosis coefficient ($\gamma_2$) fixed and measures how the statistical power of the KS-2 test changes as the skewness coefficient ($\gamma_1$) increases. According to the simulation results, with the kurtosis coefficient fixed at 0 and the sample size held constant, increasing the skewness coefficient from 0 to 0.25 produces no notable change in the statistical power of the KS-2 test. When the skewness coefficient is increased from 0.25 to 0.50, power decreases for mean ratios of 0:0.5, 0:1 and 0:1.5 and increases for mean ratios of 0:2, 0:2.5 and 0:3. When the skewness coefficient is increased from 0.50 to 0.75, the power of the KS-2 test increases for almost all of the mean ratios.
Another result of this study is that the statistical power of the KS-2 test increases as the ratio of the mean of the first sample to the mean of the second sample ($\mu_1:\mu_2$) increases. The largest gain in power occurs when $\mu_1:\mu_2$ increases from 0:0.5 to 0:1, and the smallest when it increases from 0:2.5 to 0:3.
Table 2, Table 3, Table 4 and Table 5 present the statistical power of the KS-2 test for small, equal sample sizes as the skewness coefficient increases with kurtosis held constant.

Conclusion and Suggestions
In all four distributions analyzed, statistical power generally increases with sample size, although there are some exceptions. In every distribution, the power at sample size (4, 4) is greater than at (5, 5). Likewise, in every distribution the power at (6, 6) exceeds that at (7, 7), the power at (9, 9) exceeds that at (10, 10), the power at (13, 13) exceeds those at (14, 14) and (15, 15), and the power at (17, 17) exceeds those at (18, 18) and (19, 19). The highest power values for all distributions occur at sample size (20, 20).
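One possible contributor to these exceptions, offered here as a hypothesis rather than as a finding of this study, is that the exact null distribution of D is discrete, so the attained size of a nominal 5% test can drop sharply from one sample size to the next. The Python sketch below assumes equal sample sizes and no ties; under $H_0$ every interleaving of the pooled sample is equally likely, so counting monotone lattice paths gives, for each n, the smallest critical value k with $P(nD \ge k) \le 0.05$ and the size actually attained. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def paths_inside(n, k):
    """Monotone lattice paths (0, 0) -> (n, n) with |i - j| < k at every point.

    i and j count the observations of each sample seen so far in the pooled
    order, and n * D is the largest |i - j| reached along the path.
    """
    ways = [[0] * (n + 1) for _ in range(n + 1)]
    ways[0][0] = 1
    for i in range(n + 1):
        for j in range(n + 1):
            if (i or j) and abs(i - j) < k:
                from_left = ways[i - 1][j] if i > 0 else 0
                from_below = ways[i][j - 1] if j > 0 else 0
                ways[i][j] = from_left + from_below
    return ways[n][n]


def exact_critical_value(n, alpha=Fraction(1, 20)):
    """Smallest k with P(n * D >= k) <= alpha, and that attained size.

    Returns (None, Fraction(0)) when no k qualifies.
    """
    total = comb(2 * n, n)
    for k in range(1, n + 1):
        size = Fraction(total - paths_inside(n, k), total)
        if size <= alpha:
            return k, size
    return None, Fraction(0)


for n in range(2, 21):
    k, size = exact_critical_value(n)
    print(f"n = {n:2d}  k = {k}  attained size = {float(size):.4f}")
```

For example, n = 4 needs $nD \ge 4$ with attained size 2/70 ≈ 0.029, while n = 5 needs $nD \ge 5$ with attained size 2/252 ≈ 0.008, so the test at (5, 5) is considerably more conservative than at (4, 4).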
In all distributions and at all sample sizes, statistical power increases as the mean ratio increases. In every distribution, the largest gain in power occurs when the mean ratio increases from $\mu_1:\mu_2 = 0:0.5$ to $\mu_1:\mu_2 = 0:1$, and the smallest when it increases from $\mu_1:\mu_2 = 0:2.5$ to $\mu_1:\mu_2 = 0:3$.
Considering all of the results, researchers who apply nonparametric tests to data from two samples under fixed kurtosis may attain greater statistical power when their samples come from distributions with a skewness coefficient of 0.75. Similarly, with kurtosis fixed, the KS-2 test may reach maximum power with the following sample sizes per group:
- skewness coefficient 0: 20 or more when $\mu_1:\mu_2 = 0:2$; 13 or more when $\mu_1:\mu_2 = 0:2.5$; 11 or more when $\mu_1:\mu_2 = 0:3$;
- skewness coefficient 0.25: 20 or more when $\mu_1:\mu_2 = 0:2$; 15 or more when $\mu_1:\mu_2 = 0:2.5$; 11 or more when $\mu_1:\mu_2 = 0:3$;
- skewness coefficient 0.50: 20 or more when $\mu_1:\mu_2 = 0:2$; 13 or more when $\mu_1:\mu_2 = 0:2.5$; 11 or more when $\mu_1:\mu_2 = 0:3$;
- skewness coefficient 0.75: 17 or more when $\mu_1:\mu_2 = 0:2$; 13 or more when $\mu_1:\mu_2 = 0:2.5$; 11 or more when $\mu_1:\mu_2 = 0:3$.
Lee (2007) carried out a study to determine in which situations it is appropriate to use the t test, the Welch t test and the Wilcoxon rank-sum test, for normal population distributions and under different skewness values. The most important result of the study is that all of the tests lack power when unequal variances occur together with skewed populations. Lee (2007) compared the power and type I error rates of the Mann-Whitney test and the Kolmogorov-Smirnov two-sample test. The author carried out a simulation study at the α = 0.05 significance level with different skewness and kurtosis values. When the populations differ from each other in skewness and kurtosis, the Mann-Whitney test is more powerful for small samples and the Kolmogorov-Smirnov two-sample test is more powerful for large samples. Fagerland and Sandvik (2009) performed a simulation study to determine the type I error rates and power of the two-sample T test, the Welch U test, the Yuen-Welch test, the Wilcoxon-Mann-Whitney test and the Brunner-Munzel test under different skewness and kurtosis values; the results revealed that the two-sample T test and the Welch U test are more powerful than all of the other tests.