Comparisons of the Satterthwaite Approaches for Fixed Effects in Linear Mixed Models

,


Introduction
Data analysts and practicing statisticians usually encounter problems in making inference for the fixed effects in normal linear mixed models where the fixed effects are of interest and the random effects are a source of errors.When some observations are lost or not available for some reasons, the data become not balanced and the analysis becomes more complicated, where the conventional Anova table does not provide an exact test, and approximation is needed.For a data vector y distributed as multivariate normal distribution with mean , Xβ and β is the vector of the fixed effects parameters, suppose that we are interested in testing the linear hypothesis 0 H:  L β0 where L is fixed and has independent columns.A Wald-type statistic of the form where β and () Var β are estimators of β and () Var β respectively, usually follows a chi square distribution with degrees of freedom for large samples.However, for small samples, this approximation might not be appropriate.Giesbrecht and Burns (1985) suggested a method to determine the degrees of freedom for an approximate t-test (GB test).Also, Jeske and Harville (1988) suggested a procedure to determine the degrees of freedom for an approximate t-test with adjusting the variance estimate of the fixed effects estimator () Var β (JH test).Both procedures are limited for hypotheses of rank 1, and the degrees of freedom is estimated in a way analogous to Satterthwaite's approximation (Satterthwaite, 1946).Based on these two procedures, Fai and Cornelious (1996) proposed four approximate F tests for the hypotheses of rank greater than 1.In fact, they extended the GB and JH procedures by matching the first and second moments of the F distribution with the test statistic for each procedure.For procedures obtained by matching the first moment, a Wald-type F statistic used, and the denominator degrees of freedom is estimated based on the data.On the other hand, for procedures obtained by matching the second moment, a scaled F statistic used, and the denominator degrees of freedom and scale factor are estimated based on the data.The extended procedures of the GB test by matching the first and second moments will be called the GB1 and GB2 approaches respectively, and similarly, the extended procedures of the JH test will be called the JH1 and JH2 approaches.Fai and Cornelious compared the performance of the four proposed procedures through a simulation study for split plot designs.They concluded that the GB1 and JH1 approaches performed similarly and adequately more than other approaches which found to be more liberal.The popular procedure called the Satterthwaite procedure to approximate the denominator degrees of freedom, and embedded in most statistical software packages is actually the GB1 procedure.This test might be executed in SAS by using the option DDFM=SATTERTHWAITE in the model statement of the PROC MIXED procedure (SAS Institute Inc, 2021).
Many researchers evaluated the performance of the GB1 approach and compared it to other known approaches (e.g., Luke, 2017;Monar & Zucker, 2004;Kuznetsova et al., 2017;Spilke, 2005;Schaalje et al., 2002).However, the other forms of tests derived by Fai and Cornelious, except the GB1 procedure, have not received enough attention in the research literature neither analytically nor through simulation studies.In this paper, we investigated the inner working of these four procedures in order to compare the performance of these procedures analytically and through a simulation study for block designs different from the design studied by Fai and Cornelious (1996).In addition, we proposed two more procedures analogous to procedures JH1 and JH2 with using the adjusted variance-covariance of the estimate of fixed effects parameters derived by Harville and Jeske (1992).
In Section 2, and in addition to the model and notation, we review the four approaches developed by Fai and Corenelius (1996).Also, we present two more approaches based on the adjustment of the variance-covariance matrix of the fixed effects proposed by Harville and Jeske (1992).These two approaches will be called HJ1 and HJ2 approaches.Comparisons of the six approaches (i.e., the GB1, GB2, JH1, JH2, HJ1, and HJ2 approaches) based on their test statistics, computed denominator degrees of freedom and scales is discussed analytically in Section 3. In Section 4, we present a simulation study for three block designs to evaluate and compare the performance of the six approaches based on the observed test levels.The computed denominator degrees of freedom and scales are presented and compared for each approach.

The Satterthwaite Approaches to Test the Fixed Effects in Linear Mixed Models
Consider n observations y follow a multivariate normal distribution with mean Xβ , and variance covariance .
Σ The matrix X is a known np  matrix, and β is a 1 p  vector of the fixed effects parameters, and Σ is a positive definite nn  matrix, and linear of r unknown variance components , G are known nn  symmetric matrices.Suppose that we are interested in testing the linear hypothesis 0 H:  L β0 where L is a fixed p  matrix.Fai and Cornelious (1996) proposed four approximate F-tests.The first two tests are extensions for the GB test (Giesbrecht & Burns, 1985) to accommodate the hypotheses of fixed effects for rank greater than 1.The first test, called GB1, derived by introducing a Wald-type statistic follows approximately an F distribution, and the denominator degrees of freedom of the F distribution is computed by matching the exact first moment of the F distribution and the approximated first moment of the Wald-type test statistic.For the second test, called GB2, a scaled Wald-type statistic follows approximately an F distribution, and the denominator degrees of freedom and scale factor computed by matching the exact two moments of the F distribution with the approximated two moments of the scaled Wald-type test statistic.Similarly, the third and fourth tests, called JH1 and JH2, are extensions of the JH test (Jeske & Harville, 1988), and the denominator degrees of freedom and scale factor computed by using the one moment and two moments approximations respectively.Fai and Cornelious (1996) extended the GB method for testing hypotheses of rank greater than one.They started with the sum of squares for 0 H as:

Review of the GB1 and GB2 Approaches
1) where β is the estimated generalized least squares estimator of the fixed effects β , which is and  ̂=  ̂1 1 + ⋯ +  ̂  ,where ˆ' i s  are the REML estimates of the variance components ' i s  (Corbiel & Searle, 1976).In the GB method, the matrix  ̂=  ′  ̂−1  is used as the estimator of ( ).

Var β
That is  ̂( ̂) =  ̂=  ′  ̂−1 , and t s are distributed approximately independently as t-distribution with degrees of freedom m  for each, we obtain: For extending the GB method by using the two-moment approximation, two quantities need to be computed: the scale where ()

GB
Var T is obtained from expression (2.5):

GB
ET is as in (2.6), and m  is approximated as in (2.7).

Review of the JH1 and JH2 Approaches
Since the traditional estimator of ( ), Var β matrix ˆ, Φ tends to underestimate, Jeske and Harville (1988) adjusted the estimator using the adjustment suggested by Kackar and Harville (1984) }  ̂ (2.12) These quantities to be estimated by substituting the REML estimates of the variance components θ for .θ Similar to the extension of the GB approach, Fai and Cornelious (1996) started the extension of JH approach with Wald-type statistic: In the first extension, that is the JH1 approach, we have suggested by Harville and Jeske (1992), and used by Kenward and Roger in their approach known as the KR estimation (Kenward & Roger, 1997) where Λ is approximated as in expression (2.9). Hence where Λ is as in expression (2.12).
Analogous to the JH1, and JH2 approaches, consider: (2.19) and two approaches developed by using the one-moment and two-moment approximations.The first approximation, which will be called JH1, is obtained by using the one-moment approximation, where we have  ).

( ) . (
) Also, similar to the extensions of the GB and JH approaches, the approximation of ()

HJ
ET is obtained by using the spectral decomposition '' HJ L Φ L = P DP for P an orthogonal matrix, D is a diagonal matrix of eigenvalues ' k ds and 0 and The variance matrix of the fixed effects estimates used in the proposed Wald-type statistic is the same as in the Wald-type Kenward-Roger test.In fact, in some special cases, both tests might deliver same values for the denominator degrees of freedom and scale factor.However, in general, the tests are different regarding the obtained denominator degrees of freedom and scale.The Kenward-Roger test uses the Taylor series expansion method to obtain an approximate moment for the test statistic, whereas the spectral decomposition method is used in the Satterthwaite based approaches.

Comparisons of the Satterthwaite Approaches
All approaches presented in section 2 use similar Wald-type statistics, and the differences among them may be described based on two sources: first, the estimator of () Var β used in the statistics, where it is  ̂,  ̂ , and  ̂ for the extension of the GB, JH, and HJ approaches respectively.Second, the GB1, JH1, and HJ1 approaches were developed based on using the one-moment approximation where the denominator degrees of freedom are computed, whereas the GB2, JH2, and HJ2 approaches developed based on the two-moment approximation, where a scaled test statistic used, and both the denominator degrees of freedom and the scale factor are computed.The performance of these approaches typically depends on the test statistic (including the obtained scale) and the denominator degrees of freedom.Accordingly, besides evaluating and comparing the performance of the approaches through simulation studies, it is reasonable and useful to compare their test statistics, computed denominator degrees of freedom, and scale analytically.In this section, we present some useful results about the approaches' statistics, computed denominator degrees of freedom and cale.

General Formulas for the Satterthwaite Approaches
It can be seen that the denominator degrees of freedom and scales produced by approaches presented in Section 2 follow general formulas.For the test statistic T as in (2.1), and for the one-moment approximation, we have ( , approximately under 0 H.To find one  , the approximate first moment of F statistic is matched with the exact first moment of ( , ), one F  and we have: ET is approximated by performing the spectral decomposition ' ( ) ' Var L β L = P DP for P an orthogonal matrix, D is a diagonal matrix of eigenvalues ' q ds and 0 The quantity q g is the gradient of () qq Var  a βa with respect to , θ and q a is the q th row of . PL Note that , , and

GB JH HJ T T T T =
for the extension of the GB, JH, and HJ approaches respectively.
For the two-moment approximation, the scale  and the denominator degrees of freedom two

Comparisons of the Test Statistics, Computed Denominator Degrees of Freedom, and Scales
From expressions (3.1b) and (3.4b), we obtain: where one  is the denominator degrees of freedom obtained by using the one-moment approximation, two  and  are the computed denominator degrees of freedom and scale obtained using the two-moment approximation.Since typically 1,   and the function ( ) / ( 2) is a decreasing function, it is reasonable to expect that typically one two   (see remark 2 below).On the other hand, and provided 1, where 1 F and 2 F are the test statistics used in the one-moment and two-moment approximations respectively for the extensions of the same approach (i.e., the GB, JH, HJ approaches).In addition, comparing the test statistics for different approaches using the one-moment approximation is analytically possible by adopting the definition of Loewner order of symmetric matrices (Pukelshiem, 2006).Since Λ is a non-negative matrix, we have: ̂ =  ̂+ 2 ̂≥  ̂ =  ̂+  ̂≥  ̂, and form Abadir and Magnus (2005), we obtain 1, = it should be true that both extensions for each approach; using the one-moment and two-moment approximations, are identical and they are equivalent to the original approach (i.e., the GB and JH approaches).The following lemma is to verify this fact.
From the expression of the scale (3.4b) above, and provided that ( ) , ET  the simulation studies show that typically 1.
  However, the expression formula does not guarantee that the scale  doesn't exceed 1.In fact, this occurs most likely when the computed denominator degrees of freedom using the second moment approximation, 2  gets closer to the value of 4. This happens because of the data character.
Consider the design matrix of the model X is partitioned as 1 2 [ , ] X = X X where: , and ( ) , θ A where f is a scalar function, and A is a fixed matrix (i.e., doesn't depend on θ ).Also, suppose that 2 X is the correspondent matrix to the fixed effects to be tested, so that Some well-known designs follow this model partition (e.g., the balanced incomplete block designs), and this model partition has been noticed to have specific properties for several approximate tests that use a Wald-type statistic.Alnosaier and Birkes (2019) studied this partition model for the Kenward-Roger approximate test and other alternative tests.
Lemma 2 For models that satisfy the partition as above, the approximations of q  as in expression (3.3) are the same for all 1, , .q = Lemma 3 (a) For models that satisfy the partition above, the denominator degrees of freedom using the two-moment approximation, two  is a linear function of the denominator degrees of freedom using the one-moment approximation, .If a model satisfies the condition that  XX P Σ ΣP for all , Σ where X P is an orthogonal projection operator on the range of X, it is said that the model satisfies Zyskind's condition (Zyskind, 1967).For models that satisfy Zyskind's condition, we add an assumption that met by most models which is Lemma 4 (a) For models that satisfy Zyskind's condition and the assumption above, the GB1, JH1 and HJ1 approaches are identical, and the GB2, JH2 and HJ2 approaches are identical.
Detailed proofs of lemmas 1-4 are in the Appendix.

A Simulation Study
To evaluate and compare the performance of the Satterthwaite approaches presented in this paper, we conducted a simulation study for 15 models with three different block designs (e.g., five models for each design).
Design 2: A complete block design with 36 observations, 6 treatments, 6 blocks, and deleting two observations from different blocks and different treatments.
Design 3: A partial incomplete block design with 21 observations, 9 treatments, 7 blocks, and the maximum block size is 3, (Kuehl, 2000, p. 329 with omitting the last two blocks).
The model for the block designs can be written as ijk e e  In order to have a full column rank matrix of the fixed effects, we reparametrize the model as : . The reparametrized model can be expressed as The null hypothesis to test that there are no treatment effects is For each design, 10,000 data sets have been simulated under the null distribution with no treatment effects, and assuming 0   = , for five settings of the ratio b e    = : 0.25, 0.5, 1, 2, 4. The simulation was done by generating the random terms in the model (i.e., the blocks and residual errors) for each setting of  using Matlab 2020.
To run the simulation study for the Satterthwaite approaches, the quantities that needed to be computed have their expressions presented in the previous sections.However, some expressions need to be derived to be in computable forms (See the Appendix).

Computing the Denominator Degrees of Freedom and Scales
The iteration algorithm in expression 4.1in the Appendix to compute the REML estimates of the variance components did not converge for some data sets.Table 1 presents the percentage of the generated data sets that converge for each model under the null hypothesis (rounded to two decimal places).The percentage increased as the ratio  increased, and almost all of data sets for models with designs 1 and 2 converged.For models with the smallest design (i.e., design 3), it was noted that there were significant number of data sets that did not converge for smaller values of . Table 2 presents the percentage of generated data sets satisfied these required conditions for each approach and each model under the null distribution.For models with larger designs (i.e., designs 1 and 2), it was noted that almost all data sets met the conditions for the GB1 and GB2 approaches.However, significant number of data sets did not meet the conditions with smaller values of  for the other approaches.It was also noted that more data sets did not meet the conditions for models with the smallest design (i.e., design 3), and especially with small values of .
 Only 14.02% of data sets met the conditions of the HJ2 approach for the model with design 3, and 1  = .In fact, except the GB1 and GB2 approaches, most data sets did not meet the conditions for 1   to compute the denominator degrees of freedom.Unlike the PROC MIXED procedure of SAS where the denominator degrees of freedom one  set as zero when the condition is not met, for the simulation purpose of evaluating the performance of the approaches, we considered only data sets met the conditions required to compute the denominator degrees of freedom.
Table 4 in the Appendix shows the average of the computed denominator degrees of freedom for the Satterthwaite approaches.It can be clearly seen from table 4 that the average of computed denominator degrees of freedom using the one-moment approximation is less than the average of computed denominator degrees of freedom using the two-moment approximation for all six approaches.This coincides with our expectation in Section 3.2.Table 5 in the Appendix shows the average of scales of all approaches using the two-moment approximation, and they found to be less than one.However, this does not mean necessarily that the scales were less than one for all data sets.In fact, it was found that the scale for the GB2, JH2, and HJ2 approaches exceeded one for few data sets.For the smallest design, the scale values computed by the approaches using the two-moment approximation, were not very stable, where more data sets produced scales exceeded one, and other data sets produced scales significantly smaller than usual (e.g.0.2 and 0.3).This explains why the average scales for design 3 were smaller than for the other designs.For the models with design 1, it was found that all produced scales for the GB2, JH2, and HJ2 approaches with all values of  did not exceed one, and this coincides with Lemma 3 (part b) in Section 3 since the balanced incomplete block design satisfies the model partition of lemma 3.

Comparisons of the Observed Test Levels
The performance of all six approaches was evaluated and compared based on the computed test levels of the approach.The observed test level was computed by the proportion of generated data sets under the null hypothesis, converged to compute the REML estimates of the variance components, and satisfied the required conditions to compute the denominator degrees of freedom of which 1 ( , ) one FF  for approaches using the one-moment approximation, and of which 2 ( , ) two FF   for approaches using the two-moment approximation, as these quantities described in Section 3. Typically, this proportion is desirable to be close to the nominal level which chosen to be 0.05.However, it is not expected that the observed test level to be exactly 0.05 due to the simulation error which is 22% / n where n is the number of generated data sets, converged to produce the REML estimates of the variance components, and met the conditions to produce the denominator degrees of freedom as explained in Section 4.1.Also, in comparing the observed test levels, the difference of 0.2% was considered unimportant.
As presented in table 3, it is notable that the test level for approaches using the two-moment approximation were more liberal than for approaches using the one-moment approximation for all models.This was expected because the computed denominator degrees of freedom tended to be larger for approaches using the two-moment approximation as seen in Section 4.1.Apparently, the effect of the computed scale factors on the test level was minimal comparing to the computed denominator degrees of freedom.For models with the largest design(i.e., design 1), the JH1 approach performed well for all values of ,  whereas the GB1 and HJ1 approaches performed well only for larger values of .  The GB1 approach found to be so liberal, and the HJ1 approach found to be excessively conservative for small values of ,  and this is due to their average of computed denominator degrees of freedom.For example, for 0.25,  = the average computed denominator degrees of freedom for the GB1, JH1, and HJ1 approaches are 34.61, 27.94, and 24.41 respectively.The approaches GB2, JH2, and HJ2 were found to be liberal for all values of ,  however, the HJ2 approach performed more adequately and less liberal than others.For models with design 2, the GB1, JH1, and HJ1 approaches performed almost the same, and they performed adequately.They found to be slightly conservative for 1  = and 2. The GB2, JH2, and HJ2 performed similarly and found to be liberal for all values of . For models with the smallest design (i.e., design 3), and for small values of ,  the GB1 approach was so liberal, and the HJ1 approach seemed to perform slightly conservative.The JH1 approach was found to outperformed other approaches for small values of .
 However, the majority of generated data did not meet the conditions to compute the denominator degrees of freedom for the JH1 and HJ1 approaches for small values of .
 For larger values of ,  the GB1, JH1, and HJ1 approaches performed adequately for models with design 3. The GB2, JH2, and HJ2 approaches did not perform adequately, however, the JH2 and HJ2 approaches were found to be more similar and more acceptable.In general, the approaches did not perform with the smallest design as well as with the other deigns, and this was expected from Section 4.1 where the approaches were not very stable in producing the typical denominator degrees of freedom and scales.

Conclusions
The six approximate F-tests considered in this paper are different based on the estimator for the variance of the estimate of fixed effects used in the Wald-type test statistic, and the type of moment method approximation used to compute the denominator degrees of freedom and scale.For a Wald-type statistic that distributed approximately as F distribution, the one-moment approximation was used to compute the denominator degrees of freedom.However, when a scaled Wald-type statistic distributed approximately as F distribution, the two-moment approximation was used to determine the denominator degrees of freedom and the scale factor.Analytically, and under specific conditions, we found that all the six approaches' performances are the same, and under other conditions, some of these approaches are identical.In general, the denominator degrees of freedom computed by approaches which used the two-moment approximation found to be greater than those for approaches which used the one-moment approximation.This typically affected the approaches which used the two-moment approximation to be significantly liberal comparing to those used the one-moment approximation.This coincided with the results of the simulation study, which consisted of running 15 models (3 block designs with 5 settings of the ratio of the variance components), where the approaches used the two-moment approximation (i.e., GB2, JH2, and HJ2) found produced larger denominator degrees of freedom, and hence these approaches performed liberally.The approaches used the one-moment approximation (i.e., GB1, JH1, and HJ1) performed similarly for some models which coincides with the findings of the simulations study of Fai and Cornelious (1996) for the similarity of the performance of the GB1 and JH1 approaches, however, these approaches performed differently for some other models.The JH1 approach which used the adjusted estimator of the variance of the estimate of fixed effects suggested by Kackar and Harville (1984), and derived by using one-moment approximation, found usually to outperform all other approaches.The GB1 approach which uses the traditional estimator of the variance of the estimate of fixed effects found to be significantly liberal for some models, whereas the HJ1 approach which uses the adjustment of the estimator of the variance of the estimate of fixed effects that suggested by Harville and Jeske (1992) found to be excessively conservative test for some models.For models with the small design and small values of ,  the approaches usually did not perform adequately.The simulation findings are for block designs, and it is suggested to do more studies for the Satterthwaite based approaches for other models with different designs in order to evaluate their performance and ensure the adequacy of the JH1 approach, and to explore the potential of developing the approaches such that the performance becomes adequately even for models with small designs.According to the partition of the model, we have: and from expression (2.11) for i Ρ and ij Q we have , and , where ( ) ( ) ( ) θ , and (*) refers to quantities in the matrices that we don't need to compute.Now, since [ ]   L = 0 B from the model partition, we obtain Note that according to the model partition, and also, we have Substituting expression (3.7) in (3.6), we obtain 1 ( ) , and for the HJ approaches, (a) From expressions (3.2), (3.4c), and lemma 2, we have (3.8) Also, from expression (3.4c), and lemma 2, we have (3.9) Substituting expressions (3.8) and (3.9) for E(T) and Var(T) in expression (3.4a), we obtain

 
Hence the proof will be for the data sets produce 1 4.
one  then the denominator degrees of freedom using two-moment, as in expression (3.4a) are equal.Also, the scales, as in expression (3.4b) are equal, and hence the test statistics are equal.
(b) Direct.From part (a), and since the scale is one (Lemma 1), then all test statistics are equal.

Preparing Formulas for the Simulation Study
The REML estimates of the variance components 2  were computed by the iteration algorithm on p.252 of Searl et al. (2006):  (Kenward and Roger, 1997).To approximate the quantity q  in expression (3.3), 2 2( ) , ' q q q q d   g Wg we need to compute the estimate of the quantity q g which is the gradient of () Var  a βa with respect to , θ and q a is the q th row of . PL So, we need to find expressions for ( ) The above expression consists of four terms.The first term and fourth term are respectively: where M is as in expression (4.1).So, the third term can be expressed as matrix of the REML estimators of the variance components ˆθ which can be approximated by the inverse of the expected information matrix .W quantity in the general expressions for the denominator degrees of freedom (3.1b), and (3.4a)Without loss of generality, it suffices to show that the approximates of s  for the extensions of the JH approaches (e.g., the JH1 and JH2 approaches) are the same for 1for P an orthogonal matrix, D is a diagonal matrix of eigenvalues ' doesn't depend on the row s, and the approximation of s  are the same for 1, , .s = For the extensions of the GB approach, from expressions (3.8) and (3.1b), we obtain 1 Because of the model partition, as we have seen in part (amoment approximation is not applicable since we require 1 4.
and θ is the solution of the equation.The variance-covariance matrix of the REML estimators of the variance the inverse of the expected information matrix W is used to approximate W. The expected information matrix E I can be expressed as on p. 253 ofSearl et al (2006):

Table 1 .
The percentage of data sets for which the iteration algorithm converges to compute the REML estimates of the variance components under the null hypothesis In addition, and to compute the denominator degrees of freedom for each approach, we limited the simulation to data sets which met the conditions as expressed in the general formulas (3.1b) and (3.4a) in Section 3.That is, to compute , two 

Table 2 .
The percentage of generated data sets that for which the iteration algorithm converged, and met the conditions to compute the denominator degrees of freedom under the null hypothesis 

Table 3 .
The observed test levels under the null hypothesis for generated data sets and met the conditions to compute the denominator degrees of freedom  the left side is two  as in expression (3.10).
degrees of freedom, as in expression (3.1b) are equal as well.Note that the test statistic using one-moment is different from the test statistic using two-moment because of the scale value.However, since ,

Table 4 .
The average of computed denominator degrees of freedom for generate data sets of which the iteration algorithm to compute the REML estimates converged, and met the conditions to compute the degrees of freedom under the null hypothesis 

Table 5 .
The average of computed scales for generate data sets of which the iteration algorithm to compute the REML estimates converged, and met the conditions to compute the denominator degrees of freedom under the null hypothesis 