Application of Discrete Regression Models for Analyzing K-8 Students' Nonchronic Absenteeism in the United States

Absenteeism is a national crisis in the United States and must be addressed at its onset to prevent the burden it later imposes. Pervasive and persistent nonchronic absenteeism can develop into chronic absenteeism, which causes severe damage to students' lives, schools and society. While many articles address issues relating to chronic absenteeism, no research appears to have investigated nonchronic absenteeism. The aim of this article is to investigate factors affecting nonchronic absenteeism among K-8 students in the United States by applying discrete regression models. We first examine discrepancies in K-8 students' nonchronic absenteeism across socio-demographic and parental involvement factors via descriptive analysis, and then employ Poisson and negative binomial regression models to identify significant factors of K-8 nonchronic absenteeism. The findings of this study will be of use to stakeholders in developing appropriate incentive measures for reducing nonchronic absenteeism early and thereby reducing chronic absenteeism.


Introduction
Being absent from school for 18 days or more in a given academic year for any reason, including excused and unexcused absences or suspensions, is referred to as chronic absenteeism (U.S. Department of Education, 2016a). Being absent from school for fewer than 18 days in a given academic year may accordingly be termed nonchronic absenteeism. One can therefore argue that chronic absenteeism is the pervasive and persistent form of nonchronic absenteeism. Chronic absenteeism is widespread in the United States and is often termed a national hidden crisis or unidentified problem (Bruner et al., 2011; Education Commission of the States, 2010; U.S. Department of Education, 2016b). About 10% of kindergarteners and first-graders in the United States are chronically absent (Chang & Romero, 2008). Chronic absenteeism has been highlighted in a great number of articles in the literature (Allen et al., 2018; Bauer et al., 2018; Bruner et al., 2011; Gottfried, 2014; Jordan, 2019). The severity and rates of chronic absenteeism vary between states, school districts, and even schools within a district (Education Commission of the States, 2010).
Many articles provide evidence of chronic absenteeism at different levels of education and of its adverse impact on students' lives and academic performance (Chang et al., 2014; Gershenson et al., 2017; Gottfried, 2010, 2011, 2014; Kiani et al., 2018; National Association of Elementary School Principals, 2016; Washington, 2017; Reid, 2005, 2008), and discuss interventions for reducing absenteeism (Jordan, 2019; Robinson et al., 2018; Rogers & Feller, 2018). According to Gottfried (2014), chronic absenteeism reduces math and reading achievement, educational engagement, and social engagement. Washington (2017) notes that chronic absenteeism can leave third graders unable to master reading, sixth graders failing courses and ninth graders dropping out of high school. Chronic absenteeism can devastate learning and is an early warning sign of academic trouble and dropping out of school (Chang et al., 2014). It has been associated with an increased likelihood of engaging in criminal behavior, sexual risk behaviors and abuse of illicit substances, and of dropping out of school entirely (Henry et al., 2009). A brief review of causes, course and treatment in relation to chronic absenteeism appears in Kiani et al. (2018). Jordan (2019) presents a sample of practical strategies and intervention recommendations for local policymakers and educators for reducing chronic absenteeism, including a substantial sample of the leading work and latest thinking on improving attendance.
Despite such interventions, dealing with chronic absenteeism is a big challenge unless appropriate measures are taken at early stages. This implies the need to study nonchronic absenteeism so as to provide a safeguard against its pervasive and persistent form, chronic absenteeism. However, as of now, no research seems to exist targeting nonchronic absenteeism in the United States at K-8 school levels. A proper understanding of nonchronic absenteeism at K-8 schools at its onset would be of great use in developing and implementing intervention strategies for reducing nonchronic absenteeism, which would eventually contribute to the reduction of chronic absenteeism. Therefore, the aim of this article is to investigate K-8 nonchronic absenteeism in the United States in detail through descriptive and model-based analyses of a nationally representative survey database.

Methods
Using the National Household Education Surveys Program (2016), we evaluate the effect of various potential factors on nonchronic absenteeism of K-8 students in the United States. In this section, we provide a brief description of the data resource, potential factors and statistical data analysis methods in relation to K-8 students' nonchronic absenteeism in the United States.

Data Resource
This study uses a sample of 8,188 K-8 students with nonchronic absenteeism in the United States. The sample is derived from the National Household Education Surveys (NHES) Program public-use data, a significant data resource on K-12 students in the United States. We use these data to explore the significant factors of nonchronic absenteeism of K-8 students because we believe that reducing K-8 students' nonchronic absenteeism contributes in turn to reducing chronic absenteeism. Below we describe the response and the related factors of interest in this study:

Response:
The response is a discrete random variable denoting the number of days of nonchronic absenteeism of a K-8 student in the United States. It takes values 0, 1, …, 17, the range that qualifies as nonchronic absenteeism as per U.S. Department of Education (2016a).

Socio-demographic factors:
The following socio-demographic factors are assessed in relation to nonchronic absenteeism of K-8 students: gender, ethnicity, parental education and poverty (Table 1). The poverty factor in Table 1 is defined and compiled from the total household members and total household income variables using the algorithm in McQuiggan and Megra (2017) and Hanson and Pugliese (2020).

Parental involvement types:
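A set of parent and family involvement (PFI) factors is assessed for a possible relationship with K-8 students' nonchronic absenteeism. The PFI factors examined include volunteering (Vol), attending school meeting (Asm), attending parent-teacher organization meeting (Aptm), fundraising (Fundr) and meeting with guidance counselor (Mwgc).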

Test of Percent Discrepancies of Subjects Due to Factors
Initially, we test for discrepancies in the percent (proportion) of subjects across the groups of each underlying factor by setting the following null and alternative hypotheses:
$H_0$: There is no percent discrepancy among subjects across the groups of an underlying factor.
$H_1$: A percent discrepancy exists among subjects across the groups of an underlying factor.
The chi-square test of homogeneity of proportions has been employed for each of the potential factors with regard to the specified hypotheses, using SAS software.
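As a rough illustration of the test above, the following sketch computes a one-way chi-square statistic, assuming that the null hypothesis of no percent discrepancy means equal expected counts in every group of a factor. The function names `chi2_sf` and `equal_proportion_test` and the counts are hypothetical and are not taken from Table 3. The analysis in this article was run in SAS, so this is only a Python analogue.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability via the series for the regularized
    lower incomplete gamma function (adequate for small tables)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def equal_proportion_test(counts):
    """Chi-square test of equal proportions across the groups of one factor."""
    n, g = sum(counts), len(counts)
    expected = n / g  # same expected count in every group under H0
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, g - 1, chi2_sf(stat, g - 1)

# Hypothetical counts for a two-level factor (assumption, not study data).
counts = [4200, 3988]
stat, df, p_value = equal_proportion_test(counts)
print(f"Chisq={stat:.3f}, DF={df}, p={p_value:.4f}")
```

A small p-value would be read as a statistically significant percent discrepancy between the groups of that factor.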

Test of Median Absenteeism Due to Factors
We also test for discrepancies in K-8 students' median nonchronic absenteeism across potential factors such as gender, ethnicity, parental education, poverty and various parental involvement factors. For each underlying factor, we perform the nonparametric Wilcoxon rank sum test (for comparing medians in two groups) or the Kruskal-Wallis test (for comparing medians in more than two groups), as appropriate, using SAS software, to test the following null and alternative hypotheses:
$H_0$: There is no difference in medians of K-8 students' nonchronic absenteeism across factor levels.
$H_1$: Differences exist in medians of K-8 students' nonchronic absenteeism across factor levels.
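To make the rank-based comparison concrete, here is a minimal sketch of the Kruskal-Wallis statistic with the usual tie correction. The helper name `kruskal_wallis` and the data are hypothetical; the article's own results come from SAS. With two groups, the same statistic is the chi-square form of the Wilcoxon rank sum test without continuity correction.

```python
import math

def kruskal_wallis(groups):
    """Return (H, DF) for the Kruskal-Wallis test with tie correction.
    Assumes the pooled observations are not all identical."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2.0  # midrank of positions i+1..j
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_part = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_part - 3.0 * (n + 1)
    return h / (1.0 - tie_term / (n ** 3 - n)), len(groups) - 1

# Hypothetical days absent for three levels of one factor (not study data).
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat, df = kruskal_wallis(groups)
# For DF = 2 the chi-square upper tail has the closed form exp(-H/2).
p_value = math.exp(-h_stat / 2.0)
print(f"H={h_stat:.2f}, DF={df}, p={p_value:.4f}")
```

The closed-form p-value used here holds only for two degrees of freedom; other group counts need a general chi-square tail function.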

Analysis of Factor Effects Using Regression Models
In order to evaluate whether an underlying factor has any effect on K-8 students' nonchronic absenteeism, adjusted for other factors, we perform analyses using Poisson and negative binomial regression models, along with an assessment of the goodness of fit of such models. Let $y_i$ (taking values $0, 1, \ldots, 17$) denote the number of days of K-8 students' nonchronic absenteeism for individual $i$, $i = 1, 2, \ldots, n$.

Poisson Regression
Poisson regression is one of the most popular models for count data, where the mean and variance are equal. It assumes that an individual response $y_i$, given the vector of covariates $\mathbf{x}_i$, is independently distributed as Poisson($\mu_i$) with probability mass function
$$P(Y_i = y_i \mid \mathbf{x}_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots,$$
where $\mu_i = E(y_i \mid \mathbf{x}_i) = \exp(\mathbf{x}_i'\boldsymbol{\beta}) = \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_{k-1} x_{i,k-1})$ is the mean incidence rate per unit of exposure time (i.e., a given academic year, in this article) and $\boldsymbol{\beta} = (\beta_0, \beta_1, \cdots, \beta_{k-1})'$ is a $k \times 1$ parameter vector with intercept $\beta_0$ and $k-1$ regressor coefficients $\beta_1, \cdots, \beta_{k-1}$. The parameters $\beta_j$, $j = 0, 1, 2, \cdots, k-1$, of the Poisson regression model are estimated by the maximum likelihood estimation (MLE) method by maximizing the log-likelihood function
$$\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left[ y_i \mathbf{x}_i'\boldsymbol{\beta} - \exp(\mathbf{x}_i'\boldsymbol{\beta}) - \ln(y_i!) \right].$$
The goodness of fit of the model is evaluated using the chi-squared statistic given by
$$X^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i},$$
where $\hat{\mu}_i = \exp(\mathbf{x}_i'\hat{\boldsymbol{\beta}}) = \exp(\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_{k-1} x_{i,k-1})$ and the statistic $X^2$ has an approximately chi-square distribution with $n - k$ degrees of freedom (DF). While the mean and variance of the Poisson distribution are theoretically equal, with real-life data the variance is often greater than the mean, a situation referred to as overdispersion. Details about the Poisson regression method, parameter estimation procedure and goodness of fit of the model are available in Cameron and Trivedi (2013), NCSS Statistical Software (2018a) and SAS/ETS® 15.1 User's Guide (2018a). In this article, we estimate parameters and goodness-of-fit statistics using the SAS proc genmod procedure (SAS/ETS® 15.1 User's Guide, 2018a).
Given a continuous predictor $x_j$, if $\hat{\beta}_j > 0$, $j = 1, 2, \cdots, k-1$, then a unit increase in $x_j$ multiplies the mean incidence rate by $\exp(\hat{\beta}_j) > 1$, holding the other predictors fixed. If $x_j$ is a dichotomous variable and $\hat{\beta}_j > 0$, then the mean response at the given level is $\exp(\hat{\beta}_j)$ times the mean response at the base level. If $\hat{\beta}_j < 0$, the interpretation is reversed: the multiplier $\exp(\hat{\beta}_j)$ is less than 1, so the mean response decreases.
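The estimation and goodness-of-fit steps described above can be sketched for the simplest case of an intercept plus one dichotomous predictor. This is a minimal Newton-Raphson illustration under assumed, hypothetical data; the function name `fit_poisson` is invented for the example, and the article itself relies on SAS proc genmod rather than this code.

```python
import math

def fit_poisson(x, y, iters=25):
    """Newton-Raphson MLE for the model log(mu_i) = b0 + b1 * x_i."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector X'(y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix X' diag(mu) X
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: x = 1 marks a factor level, y = days absent.
x = [0] * 6 + [1] * 6
y = [0, 1, 2, 2, 3, 4, 2, 3, 5, 6, 8, 12]

b0, b1 = fit_poisson(x, y)
mu_hat = [math.exp(b0 + b1 * xi) for xi in x]
pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu_hat))
df = len(y) - 2  # n minus the number of estimated parameters
rate_ratio = math.exp(b1)
print(f"rate ratio={rate_ratio:.3f}, Pearson/DF={pearson / df:.2f}")
```

With a single dichotomous predictor the fitted means equal the two group means, so the rate ratio is simply the ratio of those means. A Pearson/DF value well above 1 would point toward overdispersion.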

Negative Binomial Regression
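The negative binomial regression model generalizes the Poisson regression model by introducing a random heterogeneity term $\tau_i$, independent of the vector of regressors $\mathbf{x}_i$, for individual $i$. This leads to the fact that $y_i \mid \mathbf{x}_i, \tau_i$ follows a Poisson distribution with mean $\mu_i\tau_i$:
$$f(y_i \mid \mathbf{x}_i, \tau_i) = \frac{e^{-\mu_i\tau_i}(\mu_i\tau_i)^{y_i}}{y_i!}.$$
In practice, the distribution of $\tau_i$ is assumed to be a gamma$(1/\alpha, \alpha)$ distribution (shape $1/\alpha$, scale $\alpha$) with the density
$$g(\tau_i) = \frac{\alpha^{-1/\alpha}}{\Gamma(1/\alpha)}\,\tau_i^{1/\alpha - 1} e^{-\tau_i/\alpha}, \quad \tau_i > 0,$$
and mean $E(\tau_i) = 1$ to allow a constant (mean) term in the model. Then, it follows that
$$f(y_i \mid \mathbf{x}_i) = \int_0^\infty f(y_i \mid \mathbf{x}_i, \tau_i)\, g(\tau_i)\, d\tau_i = \frac{\mu_i^{y_i}\alpha^{-1/\alpha}}{y_i!\,\Gamma(1/\alpha)} \int_0^\infty \tau_i^{y_i + 1/\alpha - 1} e^{-(\mu_i + 1/\alpha)\tau_i}\, d\tau_i.$$
By re-arranging the terms to match a gamma distribution, it follows that
$$f(y_i \mid \mathbf{x}_i) = \frac{\mu_i^{y_i}\alpha^{-1/\alpha}\,\Gamma(y_i + 1/\alpha)}{y_i!\,\Gamma(1/\alpha)\,(\mu_i + 1/\alpha)^{y_i + 1/\alpha}} \int_0^\infty \frac{(\mu_i + 1/\alpha)^{y_i + 1/\alpha}}{\Gamma(y_i + 1/\alpha)}\,\tau_i^{y_i + 1/\alpha - 1} e^{-(\mu_i + 1/\alpha)\tau_i}\, d\tau_i.$$
By the property of the gamma distribution, the integral in this equation is 1, and it follows that
$$f(y_i \mid \mathbf{x}_i) = \frac{\Gamma(y_i + 1/\alpha)}{y_i!\,\Gamma(1/\alpha)} \left(\frac{1/\alpha}{1/\alpha + \mu_i}\right)^{1/\alpha} \left(\frac{\mu_i}{1/\alpha + \mu_i}\right)^{y_i}.$$
Or, equivalently,
$$f(y_i \mid \mathbf{x}_i) = \frac{\Gamma(y_i + 1/\alpha)}{\Gamma(y_i + 1)\,\Gamma(1/\alpha)} \left(\frac{1}{1 + \alpha\mu_i}\right)^{1/\alpha} \left(\frac{\alpha\mu_i}{1 + \alpha\mu_i}\right)^{y_i}, \quad y_i = 0, 1, 2, \ldots$$
The distribution of $y_i \mid \mathbf{x}_i$ obtained by the mixture of Poisson and gamma distributions is popular because it allows for heterogeneity in modeling count or discrete data: its mean is $\mu_i$ and its variance is $\mu_i + \alpha\mu_i^2$, which exceeds the mean whenever $\alpha > 0$. Relating the mean $E(y_i \mid \mathbf{x}_i) = \mu_i$ to a set of $k - 1$ predictor variables by the expression $\mu_i = \exp(\mathbf{x}_i'\boldsymbol{\beta}) = \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_{k-1} x_{i,k-1})$, the parameters $\boldsymbol{\beta}$ and $\alpha$ are estimated by the maximum likelihood method.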
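As a numerical check on the Poisson-gamma mixture, the sketch below evaluates a negative binomial probability mass function, assuming the common parameterization in which the gamma heterogeneity term has mean 1 and variance alpha. The symbol `alpha`, the function name `nb_pmf` and the parameter values are illustrative assumptions. Under that assumption the probabilities should sum to 1, with mean mu and variance mu + alpha * mu^2.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial pmf with mean mu and variance mu + alpha * mu**2."""
    inv = 1.0 / alpha
    log_p = (math.lgamma(y + inv) - math.lgamma(y + 1) - math.lgamma(inv)
             + inv * math.log(inv / (inv + mu))
             + y * math.log(mu / (inv + mu)))
    return math.exp(log_p)

# Hypothetical mean and dispersion values (assumptions, not estimates).
mu, alpha = 3.0, 0.5
probs = [nb_pmf(y, mu, alpha) for y in range(500)]  # tail beyond 500 is negligible

total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(f"sum={total:.6f}, mean={mean:.4f}, variance={var:.4f}")
```

The variance exceeds the mean for any positive dispersion value, which is why this model can accommodate overdispersed counts that a Poisson model cannot.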

Results and Discussions
In Table 3, we present the frequency (f) and percent (%) of subjects across the groups of each underlying factor, and the value of the chi-squared test statistic (Chisq) along with degrees of freedom (DF) and p-value for the tests of percent discrepancies specified in the Methods section. From Table 3, one may conclude that discrepancies in the percent of subjects across factor levels are statistically significant for all underlying factors except volunteering (Vol) and attending parent-teacher organization meeting (Aptm).
The results of the tests of median absenteeism specified in the Methods section are reported for each of the underlying factors in Table 4. The results include means, variances (Vars), standard errors of means (SEMs) and medians (Meds) of K-8 students' nonchronic absenteeism for the different groups of each underlying factor. Table 4 also reports the value of the chi-squared test statistic for comparing median absenteeism in two groups via the nonparametric Wilcoxon rank sum test (equivalent to the Mann-Whitney U test) or in more than two groups via the Kruskal-Wallis (chi-squared) test, along with degrees of freedom (DF) and p-value for the significance of the test. The results in Table 4 provide evidence that K-8 students' median nonchronic absenteeism differs significantly by ethnicity, parental education (Peduc), poverty, attending school meeting (Asm), attending parent-teacher organization meeting (Aptm), fundraising (Fundr) and meeting with guidance counselor (Mwgc).
Note that the variances (Vars) of K-8 students' nonchronic absenteeism exceed the corresponding means (Means) at every level of every factor, which suggests that negative binomial regression would fit better than Poisson regression for modeling K-8 students' nonchronic absenteeism given the predictors.
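The comparison of variances with means can be expressed as a dispersion index, the ratio of the sample variance to the sample mean. The short sketch below uses hypothetical counts, not values from Table 4, and only illustrates the check.

```python
from statistics import mean, variance

# Hypothetical days absent for one factor level (assumption, not study data).
days_absent = [0, 0, 1, 1, 2, 3, 5, 8, 12]

m = mean(days_absent)
v = variance(days_absent)      # sample variance with n - 1 in the denominator
dispersion_index = v / m       # about 1 for Poisson-like counts

print(f"mean={m:.3f}, variance={v:.3f}, variance/mean={dispersion_index:.2f}")
```

A ratio well above 1 in each group is the descriptive signal of overdispersion that motivates a negative binomial model.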
In Tables 5.1a and 5.1b, we report the goodness-of-fit assessment and the level-specific significance of each factor for the Poisson regression model. In Tables 5.2a and 5.2b, we report the same results for the negative binomial regression model. In Table 6, we report comparative analyses of the goodness of fit of the Poisson and negative binomial regression models in modeling K-8 students' nonchronic absenteeism.
The Deviance and Pearson Chi-Square statistics reported in Table 5.1a both follow an approximate chi-squared distribution with DF=8170, and as such Value/DF is expected to be close to 1 if the Poisson regression model fits nonchronic absenteeism well. The Value/DF for Deviance is 2.76 (>1) and the Value/DF for Pearson Chi-Square is 2.78 (>1), both suggesting that K-8 students' nonchronic absenteeism is subject to overdispersion. This agrees with the results reported in Table 4, where the variance of nonchronic absenteeism exceeds the mean for each factor level. Given overdispersion, the estimated standard errors (SEs) from the Poisson regression model are incorrect and may result in invalid chi-square tests or inference.
The two common remedies for overdispersion are either to use a negative binomial regression or to correct the estimated standard errors by a scale criterion. In SAS, the scaling is achieved by specifying the dscale or scale=d option. This option forces the scaled deviance divided by its DF to be 1 (dividing Value/DF by itself). It also adjusts the standard errors by a correction factor, which is the square root of Value/DF. In this study, the scaling parameter or correction factor is equal to sqrt(2.76)=1.661325 (the SAS reported value is 1.662). After scaling, each corrected SE is equal to the pre-scaling SE multiplied by the square root of the pre-scaling Value/DF.
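The scaling arithmetic can be reproduced in a few lines. The Value/DF of 2.76 is the figure quoted in the text, while the coefficient and unscaled standard error below are hypothetical values chosen only to show the effect on a Wald chi-square statistic.

```python
import math

value_per_df = 2.76                 # Deviance Value/DF quoted for the Poisson fit
scale = math.sqrt(value_per_df)     # correction factor applied to standard errors

# Hypothetical estimate and unscaled standard error (assumptions).
beta_hat, se_unscaled = 0.120, 0.030
se_scaled = se_unscaled * scale

wald_unscaled = (beta_hat / se_unscaled) ** 2
wald_scaled = (beta_hat / se_scaled) ** 2   # shrinks by the factor value_per_df

print(f"scale={scale:.6f}, scaled SE={se_scaled:.5f}")
print(f"Wald chi-square: {wald_unscaled:.2f} -> {wald_scaled:.2f}")
```

Because the Wald statistic shrinks by the dispersion factor, a coefficient that looks significant under the unscaled Poisson model may no longer be significant after scaling.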
In Table 5.1b, the adjusted standard errors are reported for each factor level, so the tests and the inference made via confidence interval estimates are approximately correct. Based on the results of the Poisson regression analysis reported in Table 5.1b, it appears that ethnicity, poverty, volunteering (Vol), attending school meeting (Asm), attending parent-teacher organization meeting (Aptm), participation in fundraising (Fundr) and meeting with guidance counselor (Mwgc) have a statistically significant impact on K-8 students' nonchronic absenteeism. It also appears that all parental education (Peduc) levels are statistically significant for K-8 students' nonchronic absenteeism except for the parental education level below high school.
In order to compare the performance of the scaled Poisson regression model with the alternative approach using the negative binomial regression model, the results of the negative binomial regression analyses are reported in Tables 5.2a and 5.2b. From the results reported in Table 5.2a, the Value/DF for Deviance is 1.15 (close to 1) and the Value/DF for Pearson Chi-Square is 0.99 (close to 1) for the negative binomial regression model. Therefore, it may be concluded that negative binomial regression fits K-8 students' nonchronic absenteeism well. Indeed, the p-value for the Pearson Chi-Square test is 0.78975, obtained as the area to the right of the Chi-Square value of 8066.8 under the chi-squared distribution with DF=8170. This large p-value gives no evidence of lack of fit, which supports the conclusion that the negative binomial regression model fits absenteeism of K-8 students well.
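The quoted p-value can be checked approximately without statistical software. The sketch below uses the Wilson-Hilferty normal approximation to the chi-square distribution, which is an assumption of this example (the article's value comes from SAS) but is accurate for a DF as large as 8170. The function name is hypothetical.

```python
import math

def chi2_sf_wilson_hilferty(x, df):
    """Approximate P(X > x) for X ~ chi-square(df); suited to large df."""
    spread = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - spread)) / math.sqrt(spread)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # standard normal upper tail

# Pearson Chi-Square value and DF quoted in the text for the negative binomial fit.
p_value = chi2_sf_wilson_hilferty(8066.8, 8170)
print(f"approximate p-value = {p_value:.4f}")  # the text reports 0.78975
```

Since this goodness-of-fit test looks for lack of fit, a p-value this far above conventional significance levels is read as no evidence against the fitted model.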
Regarding factor effects, based on the results of the negative binomial regression analysis reported in Table 5.2b, it appears that ethnicity, poverty, volunteering (Vol), attending school meeting (Asm), attending parent-teacher organization meeting (Aptm) and meeting with guidance counselor (Mwgc) have a statistically significant impact on K-8 students' nonchronic absenteeism. It also appears that all parental education (Peduc) levels are statistically significant for K-8 students' absenteeism except for the parental education level below high school. Based on the results of Tables 5.1b and 5.2b, the Poisson regression model with scaling for overdispersion and the negative binomial regression model lead to identical conclusions regarding the significance of factors affecting nonchronic absenteeism of K-8 students in the United States, except that the Poisson model also identifies fundraising as a significant factor. For the significant factors, the p-values differ only slightly between the two models.
In order to decide which of the two regression models fits K-8 students' nonchronic absenteeism better, we compare the goodness-of-fit statistics of the Poisson and negative binomial regression models in Table 6. As reported in Table 6, the Deviance/DF and Pearson Chi-Square/DF are closer to 1 for negative binomial regression than for Poisson regression, suggesting that negative binomial regression is a better fit than Poisson regression. The same conclusion is reached on the basis of the AIC (Akaike information criterion) and the BIC (Bayesian information criterion).
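For reference, the two information criteria can be computed from a model's maximized log-likelihood as sketched below, using their standard definitions. The log-likelihood values are hypothetical placeholders, not the values in Table 6. The parameter counts assume 18 regression parameters (n minus the reported DF, 8188 - 8170) plus one dispersion parameter for the negative binomial model.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2 p."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + p ln(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

n_obs = 8188
# (log-likelihood, number of parameters); log-likelihoods are hypothetical.
fits = {"Poisson": (-21000.0, 18), "Negative binomial": (-17500.0, 19)}

scores = {name: (aic(ll, p), bic(ll, p, n_obs)) for name, (ll, p) in fits.items()}
best = min(scores, key=lambda name: scores[name][0])
for name, (a, b) in scores.items():
    print(f"{name}: AIC={a:.1f}, BIC={b:.1f}")
print("lower AIC:", best)
```

Smaller values indicate the preferred model, and the BIC penalizes the extra dispersion parameter more heavily than the AIC for a sample of this size.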

Concluding Remarks
Many studies address chronic absenteeism of K-12 students in the United States. Chronic absenteeism has been termed an unidentified problem or hidden crisis (Bruner et al., 2011; Education Commission of the States, 2010; U.S. Department of Education, 2016b). Many articles address the adverse impact that chronic absenteeism could have on students' lives and society (Chang et al., 2014; Gershenson et al., 2017; Gottfried, 2014; Henry et al., 2009; Jordan, 2019; Kiani et al., 2018; National Association of Elementary School Principals, 2016; Washington, 2017), as well as its causes (Jordan, 2019; Kiani et al., 2018; National Association of Elementary School Principals, 2016) and interventions for reducing it (Allen et al., 2018; Robinson et al., 2018; Rogers et al., 2018). While all these studies relate to chronic absenteeism, none of them addresses nonchronic absenteeism. We believe that paying attention to nonchronic absenteeism and its reduction would ultimately contribute to the reduction of chronic absenteeism. In particular, an adequate understanding of the factors affecting nonchronic absenteeism at its onset is important for setting up interventions and strategies targeting its reduction. Given this proposition, in this study we use a nationally representative survey database from NHES to study K-8 students' nonchronic absenteeism in the United States.
In this study, negative binomial regression fits nonchronic absenteeism of K-8 students in the United States better than Poisson regression, given the set of predictors considered. According to the negative binomial regression analysis, ethnicity, parental education, poverty, volunteering, attending school meeting, attending parent-teacher organization meeting and meeting with guidance counselor are significant factors affecting K-8 students' nonchronic absenteeism in the United States. The Poisson regression model suggests that fundraising is also a significant factor, in addition to the factors found significant using negative binomial regression. While no substantial research appears to address nonchronic absenteeism, the findings of this study are expected to help provide a safeguard against chronic absenteeism, its pervasive and persistent form, in the United States, provided that intervention strategies are implemented at early stages. Given the findings of this study, we believe that interventions addressing differences in factors such as ethnicity, poverty and parental education, and encouraging and ensuring effective parental involvement in education, could play an important role in reducing nonchronic absenteeism.