Model Equivalence in General Linear Models: Set-to-Zero, Sum-to-Zero Restrictions, and Extra Sum of Squares Method

The paper is drawn from the authors' experience in teaching general and generalized linear fixed effects models at the university level. The steps followed include model specification, model estimation, and hypothesis testing in the general linear model setting. Among these steps, estimation of model parameters, such as the main-effect least squares means and contrasts, was among the most challenging for students. Since no unique solution exists, students are first exposed to the equivalence between two popular techniques that an over-parameterized model can be subjected to in order to obtain the parameter estimates. This is particularly important because existing software packages do not necessarily follow the same path to produce an Analysis of Variance (or Covariance) for general or generalized linear fixed or mixed effects models. These steps are generally hidden from users. It is therefore crucial for students to understand the intermediary processes that ultimately produce the same results regardless of the software one uses. The equivalent techniques, the set-to-zero and sum-to-zero restrictions, used to obtain a solution of the normal equations of the fixed effects model, are presented. The relationship between them is also presented and, in the process, the data analysis makes use of two important concepts: the generalized inverse and the estimable function. The invariance property of estimable functions is also explained in detail, in addition to the extra sum of squares principle, which is introduced to supplement the other concepts. To exemplify these ideas and put them into practice, a simple one-way treatment structure analysis of variance is performed.


Introduction
The study is drawn from the authors' experience in teaching general and generalized linear fixed effects models at the university/polytechnic levels. The steps followed include model specification, model estimation, and hypothesis testing in the general linear model setting. Among these steps, estimation of model parameters was the most challenging area for students. Since no unique solution exists, students are first introduced to the equivalence between two popular techniques that an over-parameterized model can be subjected to in order to obtain the parameter estimates. This is particularly important because existing software packages do not necessarily follow the same path to produce an Analysis of Variance (or Covariance) for general or generalized linear fixed or mixed effects models. These steps are generally hidden from users. It is therefore crucial for learners to understand the intermediary processes that ultimately produce the same results regardless of the software one applies. The two equivalent techniques, namely the set-to-zero and sum-to-zero restrictions, for obtaining a solution of the normal equations of the fixed effects model are outlined as follows.

In matrix notation, the following equation defines the linear regression model:

$$y = X\beta + \varepsilon \qquad (1)$$

The model is also called the model equation of the general linear model (Searle, 1987). The linearity of the model equation is in terms of the parameters in $\beta$. The matrix $X$ is known as the model matrix (Kempthorne, 1952). In the situation where the data come from an experiment, the matrix $X$ is referred to as the design matrix (Searle, 1987), whereas the vector $y$ is of order $N \times 1$. The matrix $X$, which is sometimes called the incidence matrix, is a set of deterministic fixed effects variables. The vector $\beta$ contains the unknown parameters and $\varepsilon$ contains the residuals. The expected value of $\varepsilon$ and its variance are

$$E(\varepsilon) = 0 \qquad (2)$$

$$\mathrm{Var}(\varepsilon) = \sigma^2 I \qquad (3)$$

Thus, the expected value of (1) becomes:

$$E(y) = X\beta \qquad (4)$$

The following formulation is quite often used for the linear model:

$$y \sim (X\beta, \sigma^2 I) \qquad (5)$$

The notation in (5) means that the response $y$ is distributed with a mean equal to $X\beta$ and a diagonal variance-covariance matrix with constant values $\sigma^2$.

In practice, the use of the general linear model is motivated by the desire to determine how, and to what degree, variation in the dependent variable is related to the relevant fixed effects (random effects are not treated in this study). These effects are reflected in the right-hand side of (1) and, more specifically, in the setup of the $X$ matrix implied by the treatment structure.

Estimation Methods
The normal equations and their solution require the use of the least squares method in estimating $\beta$. The least squares method minimizes the sum of squares of the residuals,

$$\varepsilon'\varepsilon = (y - X\beta)'(y - X\beta) \qquad (6)$$

leading to the normal equations:

$$X'X\beta^{o} = X'y \qquad (7)$$

The solution vector $\beta^{o}$ minimizes (6), and since the $X$ matrix is rank deficient, $(X'X)^{-1}$ does not exist. This is always true for over-parameterized models. There are many least squares estimates $\beta^{o}$ satisfying Equation (7). Because $X'X$ is singular, the estimation procedure uses a pseudo-inverse or generalized inverse $(X'X)^{-}$, which satisfies (8):

$$X'X\,(X'X)^{-}\,X'X = X'X \qquad (8)$$

The solution $\beta^{o}$, which is an estimator of $\beta$, is given by

$$\beta^{o} = (X'X)^{-}X'y \qquad (9)$$

This solution vector is not unique: there exist a large number of possible solutions to the same normal equations.
To obtain $\beta^{o}$, the concepts of sum-to-zero and set-to-zero restrictions are introduced (see Searle, 1987; Milliken & Johnson, 1994).

Illustrative Example
It is relatively easy to explain the concepts of set-to-zero restrictions, sum-to-zero restrictions, and extra sum of squares by using the matrix notation of a simple one-way treatment example. The following example is taken from McLean (1989): A researcher is interested in determining whether or not there is any effect due to different rations ($\alpha_i$) on the weight gain ($y_{ij}$) of a certain breed of animal over a period of six weeks. Animals were selected at random and placed in each treatment group. The model to describe this experiment is (1) or, alternatively,

$$y_{ij} = \alpha_0 + \alpha_i + \varepsilon_{ij}$$

The following set of data is obtained from the experiment:

Table 1. Effect of ration on weight gain

Ration 1   Ration 2   Ration 3
7          3          6
9                     4
8

The design matrix $X$ contains the intercept $\alpha_0$ and the individual identification of the treatment effects $\alpha_i$.
When model (1) is written out for these data, the columns of $X$ define respectively (i) the overall intercept, (ii) the indicator of the first ration, (iii) the indicator of the second ration, and (iv) the indicator of the third ration. The matrix $X$ is not of full column rank, since one column can be obtained by manipulating the other columns. For example, the first column, defining the intercept, is equal to the sum of the other columns. If $\alpha_i$ also denotes the indicator column of the $i$th ration, then the column of the overall intercept, $\alpha_0$, can be expressed as the sum of the others, i.e., $\alpha_0 = \sum \alpha_i$.
Because of this linear dependence, the matrix $X'X$ of the normal equations has no unique inverse, making it impossible to find a unique estimate vector $\beta^{o}$ of the true parameter $\beta$.
In this example, a generalized inverse $G_1$ of $X'X$ is computed using the SWEEP operation in SAS/IML® (SAS, 2004).
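The rank deficiency described above can be checked numerically. The sketch below is only an illustration: it assumes that Table 1 reads as ration 1 = {7, 9, 8}, ration 2 = {3}, ration 3 = {6, 4} (the reading consistent with the $\alpha_0$ values reported in Table 2), and it uses NumPy's Moore-Penrose inverse as a stand-in for the SWEEP-based inverse, which is just one generalized inverse among many.

```python
import numpy as np

# Assumed reading of Table 1: ration 1 -> 7, 9, 8; ration 2 -> 3; ration 3 -> 6, 4
y = np.array([7.0, 9.0, 8.0, 3.0, 6.0, 4.0])
ration = np.array([1, 1, 1, 2, 3, 3])

# Over-parameterized design matrix: intercept plus one indicator column per ration
indicators = (ration[:, None] == np.arange(1, 4)).astype(float)
X = np.column_stack([np.ones(len(y)), indicators])

XtX = X.T @ X
Xty = X.T @ y

rank = np.linalg.matrix_rank(X)      # fewer than the 4 columns of X
G1 = np.linalg.pinv(XtX)             # Moore-Penrose inverse (stand-in, not SWEEP)
beta1 = G1 @ Xty                     # one solution of the normal equations

print("rank of X:", rank)
print("intercept = sum of indicators:", np.allclose(X[:, 0], X[:, 1:].sum(axis=1)))
print("one solution:", np.round(beta1, 4))
```

Any other generalized inverse would give a different `beta1` while still satisfying the normal equations.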

Set-to-Zero Restrictions
The problem can be solved with the generalized inverse $G_2$ of a modified $X'X$ matrix, obtained by setting the last column and last row to zero. Searle (1987) referred to these as 'constraints on the solution', used to obtain LIN (linearly independent) functions of the solution elements, rather than as restrictions. The reduced $3 \times 3$ matrix has a unique inverse. $G_2$ is that inverse with the zeros inserted back in their original positions.
One can also set to zero the column (and row) before the last, which yields the generalized inverse $G_3$.
The generalized inverse $G_4$ is found in the same way, by setting the second column (and row) to zero.
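The construction just described can be sketched in a few lines. The helper name `set_to_zero_ginverse` is hypothetical, and the data layout is the same assumed reading of Table 1 (ration 1 = {7, 9, 8}, ration 2 = {3}, ration 3 = {6, 4}); each resulting matrix is checked against condition (8).

```python
import numpy as np

# Assumed reading of Table 1 (see the hedge in the lead-in)
y = np.array([7.0, 9.0, 8.0, 3.0, 6.0, 4.0])
ration = np.array([1, 1, 1, 2, 3, 3])
X = np.column_stack([np.ones(6), (ration[:, None] == np.arange(1, 4)).astype(float)])
XtX, Xty = X.T @ X, X.T @ y

def set_to_zero_ginverse(A, drop):
    """Invert A without row/column `drop`, then put zeros back in that position."""
    keep = [i for i in range(A.shape[0]) if i != drop]
    G = np.zeros_like(A)
    G[np.ix_(keep, keep)] = np.linalg.inv(A[np.ix_(keep, keep)])
    return G

solutions = {}
for name, drop in [("G2", 3), ("G3", 2), ("G4", 1)]:
    G = set_to_zero_ginverse(XtX, drop)
    # Generalized-inverse condition (8): X'X G X'X = X'X
    assert np.allclose(XtX @ G @ XtX, XtX)
    solutions[name] = G @ Xty        # solution (9) for this generalized inverse
    print(name, np.round(solutions[name], 4))
```

Under this reading of the data, the intercept elements of the three solutions agree with the $\alpha_0$ row of Table 2 for solutions 2 to 4.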

Solution Vectors
From each of the above generalized inverses, a different solution vector to the normal equations is obtained. The table below presents these solutions:

Table 2. Solutions to the normal equations using each generalized inverse ($G_i$)

Elements of      Solution (see Equation (9))
the solution     1     2     3     4
α_0              4     5     3     8

Sum-to-Zero Restrictions
As with the set-to-zero restrictions, the sum-to-zero restriction converts an over-parameterized model into a model in which the elements of the parameter vector are estimable and the normal equations have a single solution. These restrictions are mostly useful with balanced data but are less attractive for cases in which some subclasses are empty (Searle, 1987). If all the columns of the matrix $X^{*}$ are independent of each other, then premultiplying by its transpose $X^{*\prime}$ generates a non-singular matrix that has a unique inverse $(X^{*\prime}X^{*})^{-1}$. This may be achieved by letting one of the $\alpha_i$ be the negative sum of the others, hence the name sum-to-zero restrictions.
The columns created under this restriction have the values 0, 1, or −1 (Cook & Weisberg, 1987; SAS, 1999; and other standard software). In general, for $a$ levels there are $a - 1$ independent terms ($\alpha_1, \ldots, \alpha_{a-1}$). In our example, we have three treatment levels (rations). The sum-to-zero restriction implies that one of the levels of $\alpha_j$, say $\alpha_3$, is equal to $-\alpha_1 - \alpha_2$. In other words, $\alpha_3$ is equal to $-\sum \alpha_i$ ($i = 1, 2$). In this case, the usual ordinary least squares (OLS) solution is uniquely defined by:

$$b = (X^{*\prime}X^{*})^{-1}X^{*\prime}y$$

which is the solution vector of $\beta$ and contains $[\alpha_0, \alpha_1, \alpha_2]$, where $\alpha_0$ is the overall mean (the unweighted average of the ration means), $\alpha_1$ is the mean deviation for ration 1, and $\alpha_2$ is the mean deviation for ration 2. The restriction allows direct estimation of the model parameters. This method is also referred to as Least Squares Dummy Variables, or LSDV3, in Hun (2009). Under this restriction, the g-inverse matrix is denoted $G_5$ and the solution $b_5$. Alternatively, the matrix can be set up with the restriction $\alpha_2 = -\alpha_1 - \alpha_3$, giving the generalized inverse $G_6$ and the solution $b_6$. The last restriction is on $\alpha_1$, setting it equal to $-\alpha_2 - \alpha_3$; as before, the important matrices are the inverse $G_7$ and the solution $b_7$.
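A minimal sketch of the first of these codings, with $\alpha_3 = -\alpha_1 - \alpha_2$, is given here. The variable names (`codes`, `X_star`) are illustrative only, and the data layout is again the assumed reading of Table 1 (ration 1 = {7, 9, 8}, ration 2 = {3}, ration 3 = {6, 4}).

```python
import numpy as np

# Assumed reading of Table 1
y = np.array([7.0, 9.0, 8.0, 3.0, 6.0, 4.0])
ration = np.array([1, 1, 1, 2, 3, 3])

# Sum-to-zero coding: ration 3 is coded as minus the sum of the other two columns
codes = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [-1.0, -1.0]])
X_star = np.column_stack([np.ones(len(y)), codes[ration - 1]])

# X*'X* is non-singular, so the OLS solution is unique
b = np.linalg.solve(X_star.T @ X_star, X_star.T @ y)
alpha3 = -(b[1] + b[2])              # restricted element recovered from the restriction

ration_means = np.array([y[ration == r].mean() for r in (1, 2, 3)])
print("b = [alpha0, alpha1, alpha2]:", np.round(b, 4))
print("alpha3:", round(alpha3, 4))
print("alpha0 equals unweighted mean of ration means:", np.isclose(b[0], ration_means.mean()))
```

With unbalanced data the intercept is the unweighted average of the ration means, not the grand mean of the observations.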

Estimable Functions
It is obvious by now that there are many solutions to the same normal equations. From the simple example of Table 1, only seven of all the possible solutions are shown. The elements of these solution vectors should never be considered valid estimates of the treatment effects, since they are all biased. This undesirable property must be overcome, and the use of estimable functions is a very important step in doing so. Estimable functions lead to estimates that are best linear unbiased estimates (BLUE). Estimates such as the means or the differences between means are linear functions of the observations. They are unbiased; that is, their expected values are equal to the true parameters. Among all possible linear unbiased functions, such estimates have the smallest variances (Scheffé, 1959; Searle, 1987).
We then apply the estimable functions to obtain BLUEs of various descriptive statistics of interest. For this introduction to the treatment of the general linear model (1), we limit ourselves to the estimation of the following:
• Treatment means (or means of the three rations)
• Various contrasts between means and their respective variances
• Sum of squares between treatments (i.e., rations)
• Error sum of squares and the corresponding mean squares
• Hypothesis tests of the type $H_0: K_i'\beta = 0$, where $K_i$ is an estimable vector.
In particular, we exploit the invariance property of the estimable functions. Table 3 presents different vectors, denoted by $K$, that yield identical results when multiplied by the various solution vectors (from both set-to-zero and sum-to-zero restrictions), regardless of the solution vector used. Each estimate is defined by (12):

$$K'\beta^{o} \qquad (12)$$

where $\beta^{o}$ denotes any of the seven solution vectors. The estimability of $K'\beta$ is checked by verifying whether $K'H = K'$ holds, where the matrix $H = (X'X)^{-}X'X$.
Table 3. Estimable vectors K for the treatment means and contrasts

For example, the mean of ration 1 can be computed using solution 1 of Table 4, and the same mean can be computed using solution 3 of Table 4.
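The invariance of estimable functions can be illustrated numerically. In the sketch below, the vectors in `K` are assumptions made for illustration (three ration means and one contrast), not a reproduction of Table 3, and the estimability check uses the standard condition $K'H = K'$ with $H = (X'X)^{-}X'X$. The data layout is the assumed reading of Table 1.

```python
import numpy as np

# Assumed reading of Table 1
y = np.array([7.0, 9.0, 8.0, 3.0, 6.0, 4.0])
ration = np.array([1, 1, 1, 2, 3, 3])
X = np.column_stack([np.ones(6), (ration[:, None] == np.arange(1, 4)).astype(float)])
XtX, Xty = X.T @ X, X.T @ y

# Two different solutions of the same normal equations
G_a = np.linalg.pinv(XtX)
beta_a = G_a @ Xty                                                   # Moore-Penrose solution
beta_b = np.append(np.linalg.lstsq(X[:, :3], y, rcond=None)[0], 0.0)  # alpha3 set to zero

# Illustrative K vectors (assumed, not taken from Table 3)
K = {
    "mean of ration 1": np.array([1.0, 1.0, 0.0, 0.0]),
    "mean of ration 2": np.array([1.0, 0.0, 1.0, 0.0]),
    "mean of ration 3": np.array([1.0, 0.0, 0.0, 1.0]),
    "ration 1 - ration 3": np.array([0.0, 1.0, 0.0, -1.0]),
}

def is_estimable(k, G, XtX):
    """k'beta is estimable when k'H = k', with H = G X'X."""
    return np.allclose(k @ (G @ XtX), k)

for label, k in K.items():
    print(label, is_estimable(k, G_a, XtX), round(k @ beta_a, 4), round(k @ beta_b, 4))

# A single effect on its own is not estimable, and its value changes with the solution
alpha1_alone = np.array([0.0, 1.0, 0.0, 0.0])
print("alpha1 alone:", is_estimable(alpha1_alone, G_a, XtX),
      round(alpha1_alone @ beta_a, 4), round(alpha1_alone @ beta_b, 4))
```

Estimable functions give the same value for both solutions, whereas the non-estimable `alpha1_alone` does not.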

The invariance property of estimable functions is shown in Table 4.
Similarly, the hypothesis $H_0: K_5'\beta = 0$ vs. $H_1: K_5'\beta \neq 0$ is tested in the same way. From the above analysis, it is concluded that the contrast $K_5'\beta$ is not statistically different from 0. However, the analysis shows that the contrast $K_4'\beta$ is statistically different from 0: the $t_{obs}$ for the contrast is 2.85, with a p-value equal to 0.033.
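A t-test of this kind can be sketched as follows. Two assumptions are made and should be read as such: the data layout is the assumed reading of Table 1, and the contrast is taken to be ration 1 minus ration 3, chosen only because it reproduces the reported $t_{obs}$. The tail probability uses the closed-form Student t distribution function for 3 degrees of freedom, so the helper `t_cdf_3df` (a hypothetical name) is valid for this example only.

```python
import math
import numpy as np

# Assumed reading of Table 1
y = np.array([7.0, 9.0, 8.0, 3.0, 6.0, 4.0])
ration = np.array([1, 1, 1, 2, 3, 3])
X = np.column_stack([np.ones(6), (ration[:, None] == np.arange(1, 4)).astype(float)])
XtX, Xty = X.T @ X, X.T @ y

G = np.linalg.pinv(XtX)                       # any generalized inverse would do
beta = G @ Xty
resid = y - X @ beta
df_error = len(y) - np.linalg.matrix_rank(X)  # N minus rank of X
mse = resid @ resid / df_error                # error mean square

k = np.array([0.0, 1.0, 0.0, -1.0])           # assumed contrast: ration 1 - ration 3
estimate = k @ beta
std_err = math.sqrt(mse * (k @ G @ k))        # k'Gk is invariant for estimable k
t_obs = estimate / std_err

def t_cdf_3df(t):
    """Closed-form CDF of Student's t with exactly 3 degrees of freedom."""
    u = t / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi

p_upper = 1.0 - t_cdf_3df(t_obs)              # upper-tail probability
print(round(t_obs, 3), round(p_upper, 4))
```

Under these assumptions the upper-tail probability is close to the reported p-value; a two-sided p-value would be twice as large.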

Conversion From Sum-Restrictions to Set-to-Zero
Solutions obtained from the sum-to-zero restrictions can be converted to the form of the set-to-zero restrictions (constraints) by solving for the restricted element of the sum-to-zero restriction and putting it back in its place to obtain the full solution. For example, under the restriction $\alpha_3 = -\alpha_1 - \alpha_2$, the omitted element is recovered as $-(\alpha_1 + \alpha_2)$ and inserted after $\alpha_2$ in the solution vector. This new solution is in fact one more solution of the same over-parameterized normal equations that the set-to-zero constraints solve. As shown in Table 3, it is easily verified that the different contrasts, means, and sums of squares are all invariant to these new solution vectors.
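One way to read this conversion is sketched below, under the assumed reading of Table 1 and with the restriction $\alpha_3 = -\alpha_1 - \alpha_2$. The final shift along the null-space direction of $X$ is an illustrative step added here to show how the full solution relates to a set-to-zero solution; it is not taken from the text.

```python
import numpy as np

# Assumed reading of Table 1
y = np.array([7.0, 9.0, 8.0, 3.0, 6.0, 4.0])
ration = np.array([1, 1, 1, 2, 3, 3])
X = np.column_stack([np.ones(6), (ration[:, None] == np.arange(1, 4)).astype(float)])

# Sum-to-zero fit (alpha3 = -alpha1 - alpha2)
codes = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
X_star = np.column_stack([np.ones(6), codes[ration - 1]])
b = np.linalg.solve(X_star.T @ X_star, X_star.T @ y)

# Put the restricted element back to obtain a full four-element solution
alpha3 = -(b[1] + b[2])
beta_full = np.array([b[0], b[1], b[2], alpha3])

# The full vector solves the over-parameterized normal equations X'X beta = X'y
print(np.allclose(X.T @ X @ beta_full, X.T @ y))

# Shifting along the null-space direction (1, -1, -1, -1) of X zeroes alpha3
# and gives the set-to-zero solution for the last level
beta_set0 = beta_full + alpha3 * np.array([1.0, -1.0, -1.0, -1.0])
print(np.round(beta_full, 4), np.round(beta_set0, 4))
```

Both vectors give the same fitted values, so every estimable function takes the same value for each.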

Cell Means Model
A third alternative is closely related to the set-to-zero restriction above. In this case, the design matrix $X$ is the same as before, except that the column for the intercept is not included. Once the indicators of the rations are created, one runs the regression model through the origin, with $y_{ij}$ as the response and the ration indicators as regressors. In this case, the columns are LIN and the model parameters are directly interpretable (SAS, 2004; Cook & Weisberg, 1987). The re-parameterization is analogous to the dummy variable approach known as the Least Squares Dummy Variables model, or LSDV2 estimation method (Hun, 2009).
The details are as follows. A unique inverse $G_8$ is obtained, resulting in the unique solution $b_8$. Note that the elements of the solution are the means of ration 1 (i.e., $K_1'\beta$), ration 2 (i.e., $K_2'\beta$), and ration 3 (i.e., $K_3'\beta$), respectively (see Table 4). Therefore, all the contrasts and combinations of estimable functions are also estimable (Searle, 1971, 1987; Milliken, 1971).
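The cell means fit can be sketched as a regression through the origin on the ration indicators. The names `X_cell`, `G8`, and `b8` mirror the notation of the text, while the data layout remains the assumed reading of Table 1 (ration 1 = {7, 9, 8}, ration 2 = {3}, ration 3 = {6, 4}).

```python
import numpy as np

# Assumed reading of Table 1
y = np.array([7.0, 9.0, 8.0, 3.0, 6.0, 4.0])
ration = np.array([1, 1, 1, 2, 3, 3])

# Cell means model: indicator columns only, no intercept column
X_cell = (ration[:, None] == np.arange(1, 4)).astype(float)

G8 = np.linalg.inv(X_cell.T @ X_cell)   # ordinary inverse: diagonal with entries 1/n_i
b8 = G8 @ X_cell.T @ y                  # unique solution

ration_means = np.array([y[ration == r].mean() for r in (1, 2, 3)])
print("b8:", b8)
print("equals the ration means:", np.allclose(b8, ration_means))
```

Because $X'X$ is diagonal here, each element of the solution depends only on the observations of its own ration.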

Special Cases of the General Linear Model
Two special cases deserve to be mentioned. The first is the simple linear model, where the predictor consists of one continuous variable $x_i$ measured on each individual $i = 1, \ldots, I$, leading to the model:

$$y_i = \beta_0 + \beta x_i + \varepsilon_i$$

The estimation of the parameters $\beta_0$ and $\beta$ is straightforward, since the columns of the model matrix are LIN.
The second special case is simply a reduction of the number of treatments to two. In this simple case, the two-sample t-test assumes the response $y_{ij} \sim N(\mu_i, \sigma^2)$. The model can use dummy variables to indicate group membership (Kiebel & Holmes, n.d.). This parameterization of the model without an intercept term is similar to the one explained above.

Principle of Conditional Error or Extra-Sum of Squares
The principle of conditional error was first introduced by Bose (1949) in the context of estimates of combinations of the parameters of a linear model (Milliken, 1971). The basic idea is to compute the sum of squares corresponding to the hypothesis as the sum of squares due to error for the restricted (or reduced) model minus the sum of squares due to error from the unrestricted model. The degrees of freedom ($df$) are computed as the difference in the degrees of freedom of the corresponding sums of squares. Milliken and Johnson (1992) used this procedure in a strategy to determine the final form of an analysis of covariance model. Other authors refer to the procedure as the model comparison method (Draper & Smith, 1981) or the extra sum of squares principle (Draper & Smith, 1998). The principle is quite general (McDonald & Milliken, 1974) and finds extensive use in econometrics (Greene, 2003; Baltagi, 2008; Wooldridge, 2003), in linear models (Searle, 1971, 1987; Milliken & Johnson, 1984; Gujarati, 1970), as well as in non-linear models (Milliken & Debruin, 1978). The ratio of the difference between the SS due to error of the restricted model ($RSS_0$) and the SS due to error of the unrestricted model ($RSS_1$) to the error SS of the full model, each divided by its degrees of freedom, is:

$$F_{obs} = \frac{(RSS_0 - RSS_1)/(p_1 - p_0)}{RSS_1/(J - p_1)} \qquad (14)$$

where $p_1$ is the rank of the full matrix $X$, $p_0$ is the rank of $X_2$, the matrix of the reduced model, and $J$ is the total number of observations. The numerator is distributed as a non-central chi-square random variable, and the denominator is distributed as a central chi-square. The two chi-square variables are independent (Milliken, 1971).
The full and reduced models can be explained in matrix form by partitioning into two parts as follows:

$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon \qquad (15)$$

Equation (15) defines the full model, where the complete $X$ matrix, and consequently the $\beta$ vector, are each partitioned into two components. The interest is in testing the hypothesis $H_0: \beta_2 = 0$. In the above example, let the matrix $X$ be subdivided into two matrices: one ($X_1$) to represent the column for the intercept, and the other ($X_2$) for the three levels of ration (treatment). In this special case, the $F_{obs}$ in (14) is exactly the standard F-statistic of the usual one-way analysis of variance (ANOVA).

where Q is the quadratic given by K

To put the likelihood ratio test (LRT) in the frame of the principle of conditional error, Equations (22) and (24) are reformulated. The test statistic is chi-square distributed, with the number of degrees of freedom ($df$) equal to the number of restrictions on the so-called parameter space (that is, the number of restrictions imposed by the null hypothesis).
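The extra sum of squares computation for the one-way example can be sketched as follows. The sketch assumes the reading of Table 1 used throughout (ration 1 = {7, 9, 8}, ration 2 = {3}, ration 3 = {6, 4}) and takes the intercept-only matrix as the reduced model, i.e., it assumes the hypothesis being tested is that there is no ration effect. The helper name `rss` is hypothetical.

```python
import numpy as np

# Assumed reading of Table 1
y = np.array([7.0, 9.0, 8.0, 3.0, 6.0, 4.0])
ration = np.array([1, 1, 1, 2, 3, 3])
X1 = np.ones((6, 1))                                        # intercept column
X2 = (ration[:, None] == np.arange(1, 4)).astype(float)     # ration indicators
X_full = np.column_stack([X1, X2])

def rss(M, y):
    """Residual sum of squares of the least squares fit of y on the columns of M."""
    beta = np.linalg.lstsq(M, y, rcond=None)[0]
    r = y - M @ beta
    return r @ r

rss_reduced = rss(X1, y)        # RSS_0: intercept-only (reduced) model
rss_full = rss(X_full, y)       # RSS_1: full model
p1 = np.linalg.matrix_rank(X_full)
p0 = np.linalg.matrix_rank(X1)
J = len(y)

# Equation (14): extra sum of squares F statistic
F_obs = ((rss_reduced - rss_full) / (p1 - p0)) / (rss_full / (J - p1))

# The extra sum of squares agrees with SS Ration = beta'X'y - N * ybar^2
beta_any = np.linalg.pinv(X_full.T @ X_full) @ X_full.T @ y
ss_ration = beta_any @ X_full.T @ y - J * y.mean() ** 2
print(round(F_obs, 4), np.isclose(rss_reduced - rss_full, ss_ration))
```

Because only residual sums of squares enter the statistic, the result does not depend on which solution of the normal equations the fitting routine happens to return.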

Conclusion
In this study, we have shown four ways in which a general linear unbalanced model can be analyzed. These different methods of analysis lead to exactly the same estimates, regardless of how the re-parameterization of the linear model was done and which generalized inverse was used. Importantly, we showed that the results are not software dependent. This was made possible through the use of estimable functions. The study also outlined the concepts of least squares means and the partition of sums of squares in more complex designs of experiments, and again extended the property of estimable functions to derive interaction terms in a model that does not specify such interactions in its original formulation. All these concepts have been explained and illustrated using a practical example.

Table 4. Estimates of $K'\beta$ from set-to-zero constraints
$SS_{Ration} = \beta'X'y - N\bar{y}^2$ is virtually identical for all the $\beta$'s.

Table 5. Application of extra-sum of squares in one-way ANOVA

Table 7. Some choices of LRT formulas in linear models