Comparative Performance of Estimation Maximization Among Residual Estimators: A Structural Equation Modelling Perspective

As statistical methodology has advanced, several methods of estimating residuals have been developed, including the regression method, Bartlett's method and the Anderson-Rubin method. This study applied the estimation maximization (EM) approach alongside these three methods to estimate residuals under the structural equation model. The results showed that the strengths of the existing methods in structural equation modelling are the weaknesses of the EM method, and vice versa. The comparative model fit information showed that the Bartlett's based method gave better residual parameter estimates than the regression based and Anderson-Rubin based methods. However, the EM method gave better residual parameter estimates than all three existing methods.


Introduction
Structural equation models (SEM) have been successfully utilised in different research areas, including educational studies (Miranda & Russell, 2011; Saçkes, 2014), clinical psychology (Little, 2013; Löfholm et al., 2014), developmental psychology (Geiser et al., 2010), organizational studies (Binnewies et al., 2010; Kiersch & Byrne, 2015; Mahlke et al., 2016), and Multi-Trait Multi-Method (MTMM) analysis (Carretero-Dios et al., 2011). Observed variables in SEM research are most frequently measured not on a continuous scale but on a discrete one (i.e., categorical dependent variables), which imposes additional challenges for the estimation process. The estimation methods used in SEM can be grouped into covariance-based approaches, such as Maximum Likelihood (ML), and component-based approaches, such as Partial Least Squares (PLS) and Generalized Structured Component Analysis (GSCA). They can also be grouped into frequentist approaches (such as ML, PLS and GSCA) and Bayesian methods such as Markov Chain Monte Carlo (MCMC).
Covariance-based methods were developed for modelling, evaluating and validating models, whereas component-based methods were meant for computation and prediction (Tenenhaus, 2008). Put simply, the main difference is that covariance-based methods were designed to test models, while component-based methods were meant to account for variances and to predict (Hulland et al., 2010; Tenenhaus, 2008). Meanwhile, frequentist techniques identify the parameter values that are best supported by the measured data, whereas Bayesian methods estimate a distribution for each parameter, a theoretical depiction of the relations that is conditional on the measured data. Beyond these differences in purpose, ML, PLS, GSCA and MCMC also vary in how robust they are under different data scenarios. This is attributable to, but not limited to, the size of the sample, the variables considered, misspecification of the model and the kind of link between the measurement and the manifest observations.
Inferences drawn from modelling outcomes generally depend on the estimation method adopted in SEM. It remains to be established, however, whether a hypothesised model yields correct information; applied or simulation studies can shed light on the effect of parameter misspecification across estimation methods (Asparouhov & Muthén, 2010; Hwang et al., 2010). Moreover, the degree to which parameters are affected by misspecifying a given model depends on the makeup of the sample used (Henseler, 2010; Tanaka, 1987) and the overall complexity of the model (Tanaka, 1987). In SEM, manifest variables can be modelled either as the cause of the measured observations (Bollen & Lennox, 1991) or as a representation of the combined values of those measured observations (Curtis & Jackson, 1962).
An SEM specification must mirror the right conceptual links; however, estimation methods often differ in how they perform depending on the kind of association described. Formative indicator models were, until recently, often deemed unsuitable for the classical maximum likelihood method (Chin, 1998; Ringle et al., 2009). Recent studies have found that ML sometimes over-estimates parameters when the sample size is small. In contrast, Ringle et al. (2009) observed that PLS may under-estimate parameters under reflective models. Meanwhile, GSCA is regarded as an effective method because it can accommodate both formative and reflective items, though this assertion rests largely on conceptually motivated expectations of the methodology rather than on evidence from experimental studies (Hwang & Takane, 2004).
Many estimation methods, and modifications of them, have been studied and used in SEM, including ML, ML with robust standard errors (MLR), generalized least squares (GLS) and weighted least squares (WLS) (Muthén & Muthén, 1998-2010). It is notable, however, that these methods are not effective under certain conditions. For instance, ML and WLS generally fail to give accurate parameter estimates when the sample is not large (Hoogland & Boomsma, 1998; Hu et al., 1992; Olsson et al., 2000). The gain in precision under MLR is generally restricted to the estimates of standard errors rather than the coefficients of the structural or measurement paths. GLS is to a large extent insensitive to model misspecification, which may lead to overly favourable fit (Olsson et al., 1999). In response to these limitations, further estimation methods have been used in SEM, such as PLS (Wold, 1975), generalized structured component analysis (Hwang & Takane, 2004; Kline, 2011) and MCMC (Hastings, 1970). According to Hoyle (2000), the most common method of estimating parameters in SEM is maximum likelihood. ML has been studied across a wide range of fields and data conditions, and its challenges are well documented. One condition under which ML performs poorly is when the sample is not large (Kline, 2011).
Over time, advances have been made in the methods used in SEM, most notably the establishment of different estimation approaches such as LS, WLS, PLS, GSCA and MCMC. However, these methods are not yet fully understood, as their performance on real-life data is normally difficult to predict (Henseler, 2012; Hwang et al., 2010; Malhotra et al., 2010). Some estimation methods, particularly robust ML and WLS, were developed specifically for use in SEM when the assumptions underpinning ML are violated (Henseler, 2012; Hwang et al., 2010; Malhotra et al., 2010). It is almost impossible to compare and examine the performance of all these estimation methods in one study. Therefore, the current study focuses on the differential performance of the regression, Bartlett's, Anderson-Rubin and EM methods in estimating residuals arising from both the measurement and the manifest variables in SEM.

Residual Estimators
Three residual estimators have been proposed for SEM in the past: the regression, Bartlett's and Anderson-Rubin methods. This study additionally incorporates the EM method into the SEM framework.

Regression Method
The most commonly preferred choice of the weight matrix W relies on the work of Thurstone (1935), who derived W by the method of least squares; this was later named the regression method. Consider the measurement model

$$z_i = \Lambda L_i + \nu_i,$$

where, for the $i$-th of $N$ independent observations, $z_i$ is the vector of measured variables, $L_i$ is the vector of latent variables, $\nu_i$ is the vector of measurement errors for $z_i$, and $\Lambda$ is the coefficient matrix relating $z_i$ to $L_i$. With estimated factor scores $\hat{L}_i = W z_i$, W is chosen so that

$$\operatorname{Tr}\, E\left[(\hat{L}_i - L_i)(\hat{L}_i - L_i)'\right]$$

is minimized, which gives

$$W_r = \Sigma_{LL} \Lambda' \Sigma_{zz}^{-1}.$$

Here Β and Γ are the coefficient matrices of the structural part of the model, and Φ, Ψ, $\Sigma_{\nu\nu}$, $\Sigma_{LL}$ and $\Sigma_{zz}$ are covariance matrices, which are assumed to be symmetric and positive definite (non-singular). Bollen and Arminger (1991) and Sanchez et al. (2009) used such weight matrices to develop residual estimates.
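To make the regression weights concrete, the following is a minimal Python sketch rather than the authors' implementation. It assumes the usual measurement model z = ΛL + ν with Σ_zz = ΛΣ_LLΛ' + Σ_νν, for which the regression-method weight matrix is W = Σ_LL Λ' Σ_zz^{-1}, and it computes measurement-error residuals as z − ΛWz. The function names and the numerical values of the loadings and covariance matrices are hypothetical, chosen only for illustration.

```python
import numpy as np


def regression_weights(Lambda, Sigma_LL, Sigma_nu):
    """Regression-method weights W = Sigma_LL Lambda' Sigma_zz^{-1}."""
    Sigma_zz = Lambda @ Sigma_LL @ Lambda.T + Sigma_nu
    # Sigma_zz is symmetric, so solve(Sigma_zz, Lambda Sigma_LL).T
    # equals Sigma_LL Lambda' Sigma_zz^{-1} without forming the inverse.
    W = np.linalg.solve(Sigma_zz, Lambda @ Sigma_LL).T
    return W, Sigma_zz


def measurement_residuals(Z, Lambda, W):
    """Residuals z_i - Lambda W z_i for each row z_i of Z."""
    scores = Z @ W.T            # estimated factor scores, one row per case
    return Z - scores @ Lambda.T


# Hypothetical example: two latent variables, three indicators each.
Lambda = np.kron(np.eye(2), np.full((3, 1), 0.8))   # 6 x 2 loadings
Sigma_LL = np.array([[1.0, 0.3], [0.3, 1.0]])       # latent covariance
Sigma_nu = 0.36 * np.eye(6)                         # error covariance

W, Sigma_zz = regression_weights(Lambda, Sigma_LL, Sigma_nu)

rng = np.random.default_rng(0)
Z = rng.multivariate_normal(np.zeros(6), Sigma_zz, size=300)
residuals = measurement_residuals(Z, Lambda, W)
```

Solving the linear system instead of inverting Σ_zz explicitly is a numerical convenience only; the weights are the same.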

Bartlett's Method
A second choice of weight matrix is Bartlett's method, attributed to Bartlett (1937), who derived the weight matrix from the principles of weighted least squares. Let $z_i$ denote the vector of measured variables, $\Lambda$ the coefficient matrix relating $z_i$ to the latent variables, $\hat{L}_i = W z_i$ the estimated factor scores, and $\Sigma_{\nu\nu}$ and $\Sigma_{zz}$ the covariance matrices of the measurement errors and the measured variables. For this method, W is picked so that the weighted sum of squared measurement errors

$$(z_i - \Lambda \hat{L}_i)' \Sigma_{\nu\nu}^{-1} (z_i - \Lambda \hat{L}_i)$$

is minimized (McDonald and Burr, 1967). Thus, the weight matrix is given by

$$W_b = (\Lambda' \Sigma_{\nu\nu}^{-1} \Lambda)^{-1} \Lambda' \Sigma_{\nu\nu}^{-1}.$$

Bollen and Arminger (1991) used this estimator, which was subsequently well established by Raykov and Penev (2001).

Anderson-Rubin Method
Anderson and Rubin (1956) proposed the third, less well-known choice of W by extending the earlier work of Bartlett. This method again relies on the principles of weighted least squares, but additionally assumes an orthogonal factor model. Thus, it minimises the same weighted least squares criterion as Bartlett's method subject to the constraint $E[\hat{L}_i \hat{L}_i'] = I$, which gives

$$W_{ar} = A^{-1} \Lambda' \Sigma_{\nu\nu}^{-1}, \qquad A^2 = \Lambda' \Sigma_{\nu\nu}^{-1} \Sigma_{zz} \Sigma_{\nu\nu}^{-1} \Lambda.$$

Estimation Maximization Method
In contrast to the aforementioned residual estimators, the EM method first produces ML estimates of the mean vector and the covariance matrix, which are then used for subsequent modelling. Estimation maximization is available in many commercial packages as well as a few free software programs (Schafer & Graham, 2002). The EM technique, formally introduced by Dempster et al. (1977), is a two-phase iterative method. In the first phase, the E or expectation phase, the sufficient statistics are obtained by summing the variables and their products; a set of equations is used to compute the expected value of every missing observation and its contribution to these sufficient statistics. In the second phase, the M or maximization phase, the sufficient statistics from the E phase are used to estimate an updated covariance matrix through a standard formula. The updated covariance matrix is then passed to the next E phase, and the two phases are repeated until the difference between the covariance matrices of adjacent iterations is trivial.
The M phase of the EM method is easy to implement in many settings because its computations are similar to those for complete data. Likewise, the E phase is easy to implement in many settings because it relies on standard complete-data results for the means of conditional distributions.
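The two phases described above can be sketched for the textbook case of multivariate normal data with missing values (coded as NaN). This is a minimal illustration under that assumption, not the routine used in the study or in any particular package; the function name `em_mean_cov` and its arguments are hypothetical.

```python
import numpy as np


def em_mean_cov(X, n_iter=200, tol=1e-8):
    """EM estimates of the mean vector and covariance matrix.

    X is an (n, p) array in which missing entries are NaN and the rows
    are assumed to be draws from a multivariate normal distribution.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)                # starting values
    Sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        sum_x = np.zeros(p)                   # sufficient statistics
        sum_xx = np.zeros((p, p))
        # E phase: expected contribution of each row to the sums of
        # the variables and of their products.
        for i in range(n):
            m, o = miss[i], ~miss[i]
            x = X[i].copy()
            C = np.zeros((p, p))              # conditional covariance
            if m.any() and o.any():
                S_oo = Sigma[np.ix_(o, o)]
                S_mo = Sigma[np.ix_(m, o)]
                K = np.linalg.solve(S_oo, S_mo.T).T   # S_mo S_oo^{-1}
                x[m] = mu[m] + K @ (x[o] - mu[o])     # conditional mean
                C[np.ix_(m, m)] = Sigma[np.ix_(m, m)] - K @ S_mo.T
            elif m.any():                     # row is entirely missing
                x = mu.copy()
                C = Sigma.copy()
            sum_x += x
            sum_xx += np.outer(x, x) + C
        # M phase: standard complete-data formulas applied to the
        # expected sufficient statistics.
        mu_new = sum_x / n
        Sigma_new = sum_xx / n - np.outer(mu_new, mu_new)
        change = np.max(np.abs(Sigma_new - Sigma))
        mu, Sigma = mu_new, Sigma_new
        if change < tol:                      # adjacent iterations agree
            break
    return mu, Sigma


# Hypothetical usage: delete about 20% of one column at random.
rng = np.random.default_rng(1)
data = rng.multivariate_normal([0, 0, 0], np.eye(3) + 0.5, size=500)
data[rng.random(500) < 0.2, 2] = np.nan
mu_hat, Sigma_hat = em_mean_cov(data)
```

With complete data the loop reduces to the ordinary sample mean and the ML (divide-by-n) covariance matrix, which is one way to check an implementation of this kind.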

Simulation Design
The simulation was based on the structural model in Equation (11) and its accompanying covariance matrices, with two of the model matrices set to [1]. Following Equation (11) are the measurement models, in which every latent variable has three indicators and every indicator is linked to one and only one factor. The measurement models have a corresponding covariance matrix for the measurement errors, which are assumed to be uncorrelated.
The study simulated N observations for each of three random variables of the structural model, each independently and identically distributed as N(0, 1). Following this procedure, observations for the dependent latent variable were generated from Equation (11). Similarly, observations for each measured variable were generated from the measurement models. It is worth noting that this process was carried out for three different sample sizes: 300, 600 and 1200.
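The data-generating steps above can be sketched as follows. Because the coefficients of Equation (11) and the loadings of the measurement models are not reproduced here, the values of `gamma`, `loading` and `err_sd` below are hypothetical placeholders, as is the function name. The sketch also assumes that the three simulated variables are two exogenous latent variables and one disturbance, each drawn as iid N(0, 1); only the three indicators per latent variable and the three sample sizes are taken directly from the text.

```python
import numpy as np


def simulate_sem(n, gamma=(0.5, 0.5), loading=0.8, err_sd=0.6, seed=0):
    """Simulate indicators for one endogenous and two exogenous latents."""
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((n, 2))        # assumed exogenous latents
    zeta = rng.standard_normal(n)           # assumed disturbance term
    eta = xi @ np.asarray(gamma) + zeta     # hypothetical structural equation
    latent = np.column_stack([eta, xi])     # n x 3 latent scores
    # Three indicators per latent variable; each indicator loads on
    # exactly one factor, so the loading matrix is block structured.
    Lambda = np.kron(np.eye(3), np.full((3, 1), loading))   # 9 x 3
    errors = err_sd * rng.standard_normal((n, 9))           # uncorrelated
    z = latent @ Lambda.T + errors
    return z, latent, Lambda


# The three sample sizes used in the study.
datasets = {n: simulate_sem(n, seed=n)[0] for n in (300, 600, 1200)}
```

Each simulated data set has nine columns, one per indicator, and can then be passed to whichever residual estimator is being compared.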

Results
Factor scores were obtained at the preliminary stage in order to understand the impact of the number of components for each manifest observation, the type of manifest observation-indicator association and the estimation technique on the parameters in a recursive SEM, the mean absolute deviation of the standard errors and the overall fit of the model. To organise the outcomes, the results are laid out and explained by measurement error, manifest component and estimation method for all four categories of estimators being compared. To a very large extent, most of the fit figures and the estimates (Table 1) under all the methods applied here were close, which makes the choice among the three existing estimators utilized in this study somewhat trivial.
It was, however, observed that when the EM method was applied in place of the estimation technique used in the other three estimators, it yielded higher estimates. The EM method produced fit indices of CFI = 0.984 and SRMR = 0.020. Close examination of both χ² and RMSEA indicates a less than good fit; although the CFI was high, it was the SRMR that indicated a better fit for the EM method than for the other three existing methods.
Thus, the methods applied here provided goodness-of-fit indices that were too close to indicate an obvious choice. Against this backdrop, the standard errors, which reflect the amount of error in estimating the parameters, can be used together with the goodness of fit to identify the residual estimation method that produced the better parameter estimates. This supports the choice of Bartlett's and EM, as they both recorded minimal standard errors. The comparative fit of the EM method was also assessed against the other three existing methods. Together, the AIC, CAIC and BIC strongly preferred the EM method over the other three methods, albeit with varying margins.
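For reference, the three information criteria can be computed from a model's maximized log-likelihood using their standard definitions. The sketch below is illustrative only: the function name and the input values are hypothetical and are not results from this study, and some SEM software reports variants based on the chi-square statistic instead.

```python
import math


def information_criteria(log_lik, k, n):
    """AIC, BIC and CAIC from a maximized log-likelihood.

    log_lik: maximized log-likelihood of the fitted model
    k:       number of free parameters
    n:       sample size
    """
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2 * k,
        "BIC": deviance + k * math.log(n),
        "CAIC": deviance + k * (math.log(n) + 1),
    }


# Hypothetical values for two competing models fitted to the same data.
model_a = information_criteria(log_lik=-1000.0, k=10, n=300)
model_b = information_criteria(log_lik=-990.0, k=12, n=300)
preferred = "a" if model_a["BIC"] < model_b["BIC"] else "b"
```

For all three criteria, the model with the smaller value is preferred; BIC and CAIC penalise additional parameters more heavily than AIC as the sample size grows.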
Moreover, the estimates shown in Table 1 demonstrate that, although the parameters were very close across the estimators, there was an element of robustness in Bartlett's method and, in particular, the EM method.

Discussions
Whereas Muthén (2010) used the maximum likelihood method to analyze categorical data, the present study used quantitative data to examine the asymptotic properties of structural equation models (SEM). Similarly, whereas Asparouhov and Muthén (2010b) and Muthén and Asparouhov (2012) applied Bayesian estimation to categorical data in assessing the asymptotic properties of SEM, the present study applied the estimation maximization method to quantitative data. Depaoli and Clifton (2015) compared Bayesian estimation with weighted least squares, mean and variance adjusted (WLSMV) and posited that the latter was better than the former; the present study used EM to fill the gap between reliance on categorical data and the use of quantitative data. Moreover, Hulland et al. (2010) and Tenenhaus (2008) argued that varied estimation methods can be used when covariances are considered, such as frequentist approaches (ML, PLS, GSCA) and Bayesian methods (such as MCMC); the gap the current study filled was the use of the EM method, which makes it possible to maximize the process of estimation in the presence of outliers. It was difficult to arrive at a definite decision on which residual estimator yielded better residual parameter estimates based on the model fit indices, since the strength of one residual estimator may be the weakness of another. Therefore, SRMR was the key index the present study relied upon in picking the best estimator, as opposed to the root mean square residual (RMS) index used in other studies (Binnewies et al., 2010; Carretero-Dios et al., 2011; Kiersch & Byrne, 2015; Mahlke et al., 2016).
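One common definition of the SRMR is the square root of the mean squared difference between the observed and model-implied covariances, each standardized by the observed standard deviations and taken over the unique elements of the covariance matrix. A minimal sketch under that definition follows; the function name and the example matrices are hypothetical, and software packages differ in the exact standardization they use.

```python
import numpy as np


def srmr(S, Sigma_hat):
    """Standardized root mean square residual.

    S:         observed covariance matrix
    Sigma_hat: model-implied covariance matrix
    """
    S = np.asarray(S, dtype=float)
    Sigma_hat = np.asarray(Sigma_hat, dtype=float)
    sd = np.sqrt(np.diag(S))
    # Residual covariances scaled by the observed standard deviations.
    std_resid = (S - Sigma_hat) / np.outer(sd, sd)
    # Use each unique element once: lower triangle including diagonal.
    rows, cols = np.tril_indices(S.shape[0])
    return float(np.sqrt(np.mean(std_resid[rows, cols] ** 2)))


# Hypothetical observed and model-implied matrices.
S_obs = np.array([[1.0, 0.5], [0.5, 1.0]])
S_model = np.array([[1.0, 0.4], [0.4, 1.0]])
value = srmr(S_obs, S_model)
```

Values closer to zero indicate that the model-implied covariances reproduce the observed ones more closely.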

Conclusions
The Bartlett's estimator was preferred to the regression and Anderson-Rubin estimators, with differing margins. This indicates a slightly heavy tail in the distribution, before the EM method is considered. To a very large extent, most of the fit figures and the estimates under all the methods applied here were close, which makes the choice among the three existing estimators utilized in this study somewhat trivial. The comparative fit of the Bartlett's method was preferred to that of the other two existing methods (i.e., the regression and Anderson-Rubin methods). Overall, the EM method was strongly preferred to the other three methods, with some differences in margin. This study's contribution to knowledge is therefore a demonstration that the EM method could be a better residual estimator within SEM than the other existing methods.