A Diagnosis and Prognosis for Generalized Linear Mixed Models with Correlated Ordinal Responses: Power and Area under ROC Curve

Generalized linear mixed models (GLMMs) are designed to account for the dependency inherent in clustered data by permitting both fixed effects and random effects in the linear predictor. However, GLMMs are still used in only a few applications because of the complexity of the models and concerns about their efficiency. In this article we investigate both the diagnosis and the prognosis of the ordinal-category logit and probit GLMMs, that is, the power of their tests and their ability to predict the correct categories, under each combination of parameters, number of clusters, and cluster size. The cumulative logit GLMM is shown to be superior to the probit GLMM. Furthermore, as the number of clusters and the cluster size increase, the precision of the parameter estimates, as reflected in the power of the tests, improves markedly. Increasing the intra-cluster correlation, however, affects the AUC estimates, which probably means that predicting the correct categories becomes more difficult when many correlated units share a cluster. Its impact on the power of the tests is small. In conclusion, GLMMs can be recommended for applications: their power is very high, approaching 1, for moderate and large numbers of clusters and cluster sizes, and their AUC values are satisfactorily high.


Introduction
For analyses of multilevel categorical response data, observations are often either nested within clusters (e.g., hospitals, households, clinics) or repeatedly assessed over time (e.g., longitudinal ordinal outcomes). For such data, mixed-effects models for ordinal response categories have been developed (Ezzet & Whitehead, 1991; Hedeker & Gibbons, 1994). In addition, statistical models such as generalized linear models (GLMs; Nelder and Wedderburn, 1972; McCullagh and Nelder, 1983) and generalized linear mixed models (GLMMs) are currently of interest, especially in medical diagnosis and engineering (Waegeman, et al., 2006). They are often used together with assessments of model fit and predictive ability based on diagnostic and prognostic techniques. However, GLMMs are still used in only a few applications, and rarely in practice, because of the complexity and the interpretation of the models. Simpler alternatives have recently been studied, such as logit and fuzzy logic models evaluated by ROC curve analysis for predicting multilateral trade credit risks (Tang & Chi, 2005); analyzing real data from the WorldScope database, these authors show that the logit model outperforms the fuzzy logic model. Mixed-effects logistic regression models have also been described for the analysis of longitudinal ordinal outcomes from daily life, where observations are clustered within subjects (Hedeker, et al., 2009); these authors illustrate the models with data from ecological momentary assessment (EMA) studies, where interest lies in subject heterogeneity both between and within subjects.
In this article we study GLMMs by simulation in two respects: first, the diagnostic tests of the model, through statistical power; and second, its prognostic ability, mainly through the area under the ROC curve.
GLMMs extend GLMs, which permit only fixed effects in the linear predictor, by including random effects in the model. Following Searle's classic treatment of linear models (Searle, 1971, 1997) and the variance components of Searle, Casella, and McCulloch (1992), the modern perspective on GLMMs provides a unified and accessible treatment of the newest statistical methods for analyzing correlated and non-normally distributed data (McCulloch and Searle, 2001).
Traditional diagnostic tests draw on different kinds of information, for example medical tests, signs, symptoms, and statistical modeling, and all of the resulting interpretations are commonly investigated and evaluated. Assessment of predictive accuracy is, meanwhile, a critical aspect of evaluating and comparing models. The accuracy of diagnosis is therefore essential, especially in medical care, and the attributes of diagnostic tests should be studied and measured. Sensitivity, specificity, accuracy, and the Receiver Operating Characteristic (ROC) curve or the Area Under the ROC Curve (AUC) are often used for these attributes. The AUC is a standard performance measure in many fields where a binary classification system is needed. In terms of a signal detection experiment, the area under an ROC curve corresponds to the probability of correctly identifying which of two stimuli is "noise" and which is "signal plus noise" (Hanley & McNeil, 1982). In medical imaging studies, the more economical rating method is generally used: images from diseased and non-diseased subjects are thoroughly mixed and then presented in this random order to a reader, who is asked to rate each on a categorical or discrete ordinal scale ranging from definitely normal to definitely abnormal; a five-category scale is generally used. The AUC obtained from a rating experiment has the same meaning as the one derived from the signal detection experiment, but it is obtained in a more indirect way, i.e., by successively considering broader categories of abnormal (e.g., category 5 alone, categories 5 plus 4, or categories 5 plus 4 plus 3, compared with the rest of the categories).
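The two readings of the AUC described above can be checked against each other numerically. The following is a minimal sketch, not the authors' SAS macro: the function names and the six-per-group example ratings are hypothetical, and ties are counted as one half, which is the usual convention for rating data.

```python
import numpy as np

def auc_from_ratings(ratings_pos, ratings_neg):
    """AUC as P(diseased rating > non-diseased rating) + 0.5 * P(tie), over all pairs."""
    pos = np.asarray(ratings_pos, dtype=float)[:, None]
    neg = np.asarray(ratings_neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def roc_points(ratings_pos, ratings_neg, categories=(1, 2, 3, 4, 5)):
    """ROC points from successively broader definitions of 'abnormal':
    category 5 alone, then 5 plus 4, then 5 plus 4 plus 3, and so on."""
    pos = np.asarray(ratings_pos)
    neg = np.asarray(ratings_neg)
    points = [(0.0, 0.0)]
    for cut in sorted(categories, reverse=True):
        false_pos_rate = float(np.mean(neg >= cut))  # 1 - specificity
        true_pos_rate = float(np.mean(pos >= cut))   # sensitivity
        points.append((false_pos_rate, true_pos_rate))
    return points

def trapezoid_area(points):
    """Area under the piecewise-linear curve through the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical five-category ratings (1 = definitely normal, 5 = definitely abnormal).
diseased = [5, 5, 4, 4, 3, 2]
non_diseased = [1, 1, 2, 3, 3, 4]
auc_pairs = auc_from_ratings(diseased, non_diseased)
auc_curve = trapezoid_area(roc_points(diseased, non_diseased))
```

For these illustrative ratings both routes give the same value, 59/72, which is the sense in which the rating method recovers the signal detection interpretation.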
Diagnostic and prognostic models such as GLMs and GLMMs are also increasingly common in the biomedical literature. They can combine multiple prognostic factors, related to each other and to an outcome, through a complex statistical model such as logistic regression, the logit model, the loglinear model, or another form of GLM or GLMM. These models are useful to physicians, clinicians, and other users who are comfortable with their complexity and who rely on them to prognosticate about a clinically important patient-related situation or event under study (Harrell, et al., 1996; Van & Lee, 1990).
In this article we therefore investigate empirically the performance of models for an ordinal response variable, primarily through mixed modeling with GLMMs. The random effects usually apply to a sample; for example, the model may treat observations from a given clinic as a cluster, with a random effect for each clinic and a fixed effect for each treatment. Parameter estimation relies on adaptive Gaussian quadrature (AGQ; Pinheiro & Bates, 1995), a suitable and efficient estimation approach for GLMMs. Simulated data are used to compare the accuracy and efficacy of the GLMMs in terms of statistical power and AUC, based on 1,000 datasets for each sample size (number of clusters and cluster size) and each intra-cluster correlation. All simulations are generated and processed using a macro developed by the authors and run with SAS® version 9.1.

Models for fixed effects (GLMs)
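Let $N$ be the number of independent clusters or subjects, $n_i$ the number of observations in cluster $i$, $i = 1, \ldots, N$, and let $Y_{ij}$ denote response $j$, $j = 1, \ldots, n_i$, which takes one of $c$ ordered levels, $k = 1, 2, \ldots, c$. Furthermore, let $\mathbf{x}_{ij} = (x_{ij1}, \ldots, x_{ijp})'$ denote a column vector of $p$ covariates associated with response $j$ in cluster $i$. We then assume a cumulative logit model of the form
$$\operatorname{logit}\left[P(Y_{ij} \le k \mid \mathbf{x}_{ij})\right] = \alpha_k + \mathbf{x}_{ij}'\boldsymbol{\beta}, \qquad k = 1, \ldots, c-1, \qquad (1)$$
where $\alpha_k$ denotes the cutpoint for response level $k$, $\alpha_1 < \alpha_2 < \cdots < \alpha_{c-1}$, and $\boldsymbol{\beta}$ denotes the fixed effects of the explanatory variables. Thus,
$$P(Y_{ij} \le k \mid \mathbf{x}_{ij}) = \frac{\exp(\alpha_k + \mathbf{x}_{ij}'\boldsymbol{\beta})}{1 + \exp(\alpha_k + \mathbf{x}_{ij}'\boldsymbol{\beta})}.$$
The likelihood function for the ordinal response under the ML approach is
$$L(\boldsymbol{\alpha}, \boldsymbol{\beta}) = \prod_{i=1}^{N} \prod_{j=1}^{n_i} \prod_{k=1}^{c} \left[P(Y_{ij} \le k \mid \mathbf{x}_{ij}) - P(Y_{ij} \le k-1 \mid \mathbf{x}_{ij})\right]^{y_{ijk}},$$
where $y_{ijk} = 1$ if $Y_{ij} = k$ and $0$ otherwise, with $P(Y_{ij} \le 0 \mid \mathbf{x}_{ij}) = 0$ and $P(Y_{ij} \le c \mid \mathbf{x}_{ij}) = 1$. The estimates of $\boldsymbol{\beta}$ (including the $\alpha_k$) obtained by Fisher scoring (Agresti, 2002) are the solutions of the likelihood equations
$$\sum_{i=1}^{N} \mathbf{D}_i' \mathbf{V}_i^{-1} (\mathbf{y}_i - \boldsymbol{\pi}_i) = \mathbf{0},$$
where $\boldsymbol{\pi}_i$ denotes the vector of response probabilities for element $i$, $\mathbf{D}_i = \partial \boldsymbol{\pi}_i / \partial \boldsymbol{\beta}$ denotes the matrix of derivatives for element $i$, and $\mathbf{V}_i$ is the covariance matrix of $\mathbf{y}_i$.
In some situations the probit link function for GLMs probably gives the best power for every goodness-of-fit test statistic (Pongsapukdee & Sukgumphaphan, 2008). In the classical general linear model the response variable is assumed to follow the normal distribution, and the link function is the identity. In GLMs the response variable follows a distribution from the exponential family, and the most often used link functions include the logit, probit, complementary log-log, and log links.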

Models for mixed and random effects (GLMMs)
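Let $\mathbf{u}_i$ denote a column vector of random-effect values for cluster $i$, common to all observations in the cluster, and let $\mathbf{z}_{ij}$ denote a column vector of explanatory variables for the random effects. The linear predictor for a cumulative logit GLMM has the form
$$\eta_{ijk} = \alpha_k + \mathbf{x}_{ij}'\boldsymbol{\beta} + \mathbf{z}_{ij}'\mathbf{u}_i, \qquad (2)$$
where the $\alpha_k$ are the cutpoints and $\boldsymbol{\beta}$ the fixed effects of model (1). The random effect $\mathbf{u}_i$ is assumed to have a multivariate normal distribution $N(\mathbf{0}, \boldsymbol{\Sigma}_u)$. The covariance matrix $\boldsymbol{\Sigma}_u$ depends on unknown variance components and possibly also on correlation parameters. From (1) and (2) it follows that the cumulative logit GLMM and the cumulative probit GLMM have, respectively, the forms
$$\operatorname{logit}\left[P(Y_{ij} \le k \mid \mathbf{u}_i)\right] = \alpha_k + \mathbf{x}_{ij}'\boldsymbol{\beta} + \mathbf{z}_{ij}'\mathbf{u}_i, \qquad (3)$$
$$\Phi^{-1}\left[P(Y_{ij} \le k \mid \mathbf{u}_i)\right] = \alpha_k + \mathbf{x}_{ij}'\boldsymbol{\beta} + \mathbf{z}_{ij}'\mathbf{u}_i, \qquad (4)$$
where $\Phi$ is the standard normal distribution function. These have a form similar to that of an ordinary GLM with unobserved values $\{\mathbf{u}_i\}$ of a particular covariate. The random effects also provide a mechanism for explaining overdispersion in basic models not having those effects (Breslow and Clayton, 1993). Conditional on $\mathbf{u}_i$, the likelihood function for cluster $i$ has the form
$$L_i(\boldsymbol{\beta} \mid \mathbf{u}_i) = f(\mathbf{y}_i \mid \mathbf{u}_i; \boldsymbol{\beta}) = \prod_{j=1}^{n_i} \prod_{k=1}^{c} \left[P(Y_{ij} \le k \mid \mathbf{u}_i) - P(Y_{ij} \le k-1 \mid \mathbf{u}_i)\right]^{y_{ijk}},$$
with $y_{ijk}$ the indicator that $Y_{ij} = k$. Without conditioning on $\mathbf{u}_i$, the marginal likelihood function for cluster $i$ has the form
$$L_i(\boldsymbol{\beta}, \boldsymbol{\Sigma}_u) = f(\mathbf{y}_i; \boldsymbol{\beta}, \boldsymbol{\Sigma}_u) = \int f(\mathbf{y}_i \mid \mathbf{u}_i; \boldsymbol{\beta}) \, f(\mathbf{u}_i; \boldsymbol{\Sigma}_u) \, d\mathbf{u}_i,$$
and the log-likelihood function is
$$\ell(\boldsymbol{\beta}, \boldsymbol{\Sigma}_u) = \log \prod_{i=1}^{N} L_i(\boldsymbol{\beta}, \boldsymbol{\Sigma}_u) = \sum_{i=1}^{N} \log L_i(\boldsymbol{\beta}, \boldsymbol{\Sigma}_u).$$
The log-likelihood function is evaluated numerically and maximized as a function of $\boldsymbol{\beta}$ and $\boldsymbol{\Sigma}_u$; Monte Carlo methods in combination with Newton-Raphson or Fisher scoring may be used iteratively (Agresti, 2002). Here we use instead the Adaptive Gaussian Quadrature (AGQ) approach, which was proposed by Pinheiro & Bates (1995) and can be used efficiently and conveniently in the SAS package. The approximation is a finite weighted sum that evaluates the function at certain points, of the form
$$\int_{-\infty}^{\infty} h(u) \exp(-u^2) \, du \approx \sum_{t=1}^{T} w_t \, h(d_t),$$
where the $w_t$ denote weights and the $d_t$ denote quadrature points, both of which are tabulated. The approximation improves as $T$, the number of quadrature points, increases.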
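The weighted-sum approximation can be illustrated for the simplest case of a single random intercept. The sketch below uses plain (non-adaptive) Gauss-Hermite quadrature rather than the adaptive variant of Pinheiro & Bates, and it assumes the parameterisation logit[P(Y ≤ k | u)] = α_k + x'β + u with u ~ N(0, σ_u²). The function names and the numerical values are hypothetical.

```python
import numpy as np

def category_probs(alpha, eta):
    """Category probabilities when logit[P(Y <= k)] = alpha_k + eta (assumed sign convention)."""
    cum = 1.0 / (1.0 + np.exp(-(np.asarray(alpha, dtype=float) + eta)))
    return np.diff(np.concatenate(([0.0], cum, [1.0])))

def cluster_marginal_likelihood(y, eta_fixed, alpha, sigma_u, n_points=20):
    """Marginal likelihood of one cluster under a random intercept u ~ N(0, sigma_u^2).

    y         : observed categories (1-based) for the units in the cluster
    eta_fixed : fixed-effects part x'beta for each unit
    The substitution u = sqrt(2) * sigma_u * d_t turns the normal integral into
    the Gauss-Hermite form, so the result is sum_t w_t * f(y | u_t) / sqrt(pi).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    total = 0.0
    for d_t, w_t in zip(nodes, weights):
        u = np.sqrt(2.0) * sigma_u * d_t
        conditional = 1.0
        for y_ij, eta_ij in zip(y, eta_fixed):
            conditional *= category_probs(alpha, eta_ij + u)[y_ij - 1]
        total += w_t * conditional
    return total / np.sqrt(np.pi)

# Hypothetical cluster of two units, three categories, cutpoints (-1, 1).
alpha = [-1.0, 1.0]
eta_fixed = [0.3, -0.2]
lik = cluster_marginal_likelihood([1, 3], eta_fixed, alpha, sigma_u=1.5)
```

A useful sanity check is that the marginal likelihoods of all possible response patterns for a cluster sum to one, and that setting sigma_u to zero recovers the product of the fixed-effects category probabilities. The adaptive version additionally recentres and rescales the quadrature points for each cluster, which this sketch does not attempt.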
Several other parameter-estimation procedures exist for GLMMs. Penalized quasi-likelihood (PQL) was proposed by Green (1987). It replaces the integral by an approximation that has closed form; this type of integral approximation, called the Laplace approximation, evaluates the function at certain points around the mode of the integrand, and the resulting criterion involves a term $q(\boldsymbol{\beta}, \mathbf{Y})$ that is similar to the quasi-log-likelihood of GLMs. A Monte Carlo method in combination with Newton-Raphson or Fisher scoring may then be used iteratively. However, this method still needs further study; for example, when the response is binary, PQL tends to produce estimates with negative bias (Breslow and Lin, 1995). Generalized estimating equations (GEE) are an alternative approach, originally proposed by Liang and Zeger (1986) for modeling univariate marginal distributions such as the binomial and the Poisson. GEE is an alternative to ML estimation that does not completely specify the joint distribution and therefore has no likelihood function. The quasi-likelihood parameter estimates are the solutions of the quasi-score equations
$$\sum_{i=1}^{N} \mathbf{D}_i' \mathbf{V}_i^{-1} (\mathbf{Y}_i - \boldsymbol{\pi}_i) = \mathbf{0},$$
where $\boldsymbol{\pi}_i$ denotes the vector of probabilities associated with $\mathbf{Y}_i$, $\mathbf{D}_i = \partial \boldsymbol{\pi}_i / \partial \boldsymbol{\beta}$ denotes the matrix of derivatives for element $i$, and $\mathbf{V}_i$ is the working covariance matrix for $\mathbf{Y}_i$, which depends on a working correlation matrix. The consistency of the estimates follows from general results for unbiased estimating functions (Liang and Zeger, 1995). For more details, Lipsitz et al. (1994) outlined a GEE approach for cumulative logit models with repeated ordinal responses. A final procedure that may be used for GLMMs is the marginal quasi-likelihood (MQL) method, proposed by Goldstein (1991), in which the model is developed for the marginal distribution only.

Simulation and Statistical Analyses
From models (3) and (4) of the preceding section, simulations are conducted for an ordinal response $Y$ with three categories ($K = 3$) and two explanatory variables, $X_1 \sim \text{Bernoulli}(0.5)$ and $X_2 \sim \text{Bernoulli}(0.5)$, with fixed effects (the $X$'s) and random effects ($Y$, $U$), where $Y$ is generated from the true model and $U$ from $N(0, \sigma_U^2)$. The cumulative logit GLMM (3) and the cumulative probit GLMM (4) are fitted to the data under each condition of the number of clusters ($N$ = 50, 100, 250), the cluster size ($n_i$ = 5, 7, 10), and the intra-cluster correlation (ICC = 0.05, 0.20, 0.40), chosen similarly to those of Spiess & Hamerle (2000). Each condition is processed for each fitted model under the AGQ parameter-estimation procedure for at least 1,000 repeated simulations, using the authors' macro program run with SAS version 9.1. Model fit is assessed using likelihood ratio statistics for models (3) and (4). Then the area under the ROC curve (AUC), corresponding to the plot of sensitivity against 1 - specificity (as implemented in some advanced statistical packages such as SAS®), and the power of the tests (power = 1 - probability of a type II error), estimated as the proportion of rejections of $H_0$ when $H_0$ is false in 1,000 simulations, are computed and analyzed for all conditions.
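One replicate of such a design can be sketched as follows. This is an illustration only, not the authors' SAS macro. It assumes a random-intercept cumulative logit model with logit[P(Y ≤ k | u)] = α_k + x'β + u, and it assumes the latent-variable definition ICC = σ_U² / (σ_U² + π²/3), which the text does not state. The values of β and the cutpoints are hypothetical, since the true parameter values are not given.

```python
import numpy as np

def icc_to_variance(icc):
    """Random-intercept variance implied by a latent-scale ICC under the logit link
    (assumption: ICC = sigma^2 / (sigma^2 + pi^2 / 3))."""
    return icc / (1.0 - icc) * np.pi ** 2 / 3.0

def simulate_clustered_ordinal(n_clusters, cluster_size, beta, alpha, icc, rng):
    """Simulate one dataset with two Bernoulli(0.5) covariates and a random intercept."""
    sigma_u = np.sqrt(icc_to_variance(icc))
    u = rng.normal(0.0, sigma_u, size=n_clusters)             # one effect per cluster
    x = rng.binomial(1, 0.5, size=(n_clusters, cluster_size, 2))
    eta = x @ np.asarray(beta, dtype=float) + u[:, None]      # shape (N, n_i)
    cutpoints = np.asarray(alpha, dtype=float)[None, None, :]
    cum = 1.0 / (1.0 + np.exp(-(cutpoints + eta[..., None])))  # P(Y <= k | u)
    draws = rng.uniform(size=eta.shape)
    # Category = 1 + number of cumulative probabilities lying below the uniform draw.
    y = 1 + np.sum(draws[..., None] > cum, axis=-1)
    return x, y

rng = np.random.default_rng(2024)
x, y = simulate_clustered_ordinal(
    n_clusters=50, cluster_size=5,
    beta=[0.5, -0.5],    # hypothetical fixed effects
    alpha=[-1.0, 1.0],   # hypothetical cutpoints for K = 3 categories
    icc=0.20, rng=rng,
)
```

Repeating this call 1,000 times per combination of N, n_i, and ICC, and fitting both models to each dataset, would reproduce the structure of the simulation, though not necessarily its exact settings.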

Results
Table 1 shows the area under the ROC curve (AUC) and the power of the test for each combination of intra-cluster correlation (ICC = 0.05, 0.20, 0.40), number of clusters ($N$ = 50, 100, 250), and cluster size ($n_i$ = 5, 7, 10), under the logit GLMM and the probit GLMM, respectively.
The AUC increases as the ICC and $N$ increase but decreases as the cluster size $n$ increases. These patterns are similar for the logit GLMM and the probit GLMM. All the AUC results are nevertheless satisfactory, since they approach 1 when $N$ and the ICC are large. Moreover, the logit GLMM is slightly superior to the probit GLMM, with larger AUC values (Table 1 and Figures 1-3).
For the power of the tests, with moderate and large $N$ and $n$ the power is very high, approaching its maximum value of 1 for almost all ICC values; the exceptions are small $N$ and $n$. The power increases as the ICC decreases, and it increases as $N$ and $n$ increase. Furthermore, the logit GLMM is probably more suitable than the probit GLMM, given its larger AUC values and higher power (Table 1 and Figures 4-6).
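Because each power value is a proportion of rejections over a finite number of replicates, it carries Monte Carlo error of its own. The sketch below shows how such an estimate and its binomial standard error could be computed; the 0.05 significance level and the function names are assumptions, not taken from the study.

```python
import math

def empirical_power(p_values, level=0.05):
    """Proportion of replicates in which H0 is rejected at the given level."""
    rejections = sum(1 for p in p_values if p < level)
    return rejections / len(p_values)

def monte_carlo_se(power, n_replicates):
    """Binomial standard error of a power estimate based on n_replicates simulations."""
    return math.sqrt(power * (1.0 - power) / n_replicates)

# Hypothetical p-values from four replicates, for illustration only.
power_hat = empirical_power([0.01, 0.20, 0.03, 0.50])
se_at_half = monte_carlo_se(0.5, 1000)
```

With 1,000 replicates the standard error is at most about 0.016, reached when the power is 0.5, and it shrinks as the power approaches 1, so differences between conditions with power near 1 are estimated more precisely than differences at small N and n.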

Discussion
We have illustrated that, for correlated ordinal responses with categorical explanatory variables, the logit GLMM is superior to the probit GLMM for analysis and for the assessment of model fit, in the same way as for GLMs. That is, when the explanatory variables are all categorical, the logit model is preferred to both the probit model and the logistic regression models (Agresti, 2002). In addition, when the explanatory variables are a mixture of categorical and continuous variables in a real data set, the logit model is preferred to the fuzzy models (Tang & Chi, 2005). In certain cases in which the explanatory variables are all continuous, or a mixture of continuous and categorical, the probit model outperforms the logit models (Pongsapukdee & Sukgumphaphan, 2008). Furthermore, the GLMM results reveal that as the number of clusters and the cluster size increase, the precision of the parameter estimates, as reflected in the power of the tests, improves markedly. Increasing the intra-cluster correlation does affect the AUC estimates, which probably means that, when intra-cluster correlation is present, it becomes somewhat more difficult to classify units into the correct categories because many correlated units share the same cluster. Its impact on the power of the tests, a well-known criterion for assessing the efficiency of models, is only small. Hence the logit GLMM tends to be a suitable model and can be recommended for correlated ordinal response categories: the power of the tests is very high for both moderate and large numbers of clusters and cluster sizes (Table 1 and Figures 4-6), and its AUC values are also quite high (Table 1 and Figures 1-3).

Conclusion
The results reveal that as the number of clusters and the cluster size increase, the precision of the parameter estimates, as reflected in the power of the tests, improves markedly. Both the power estimates and the AUC values are satisfactorily high. Increasing the intra-cluster correlation does affect the AUC estimates, but its impact on the power of the tests, a well-known criterion for assessing the efficiency of models, is only small. Hence, given its larger AUC values and the very high power of its tests for moderate and large numbers of clusters and cluster sizes, the logit GLMM is preferred to the probit GLMM for correlated ordinal response categories.
