Predicting Student Performance in Statewide High-Stakes Tests for Middle School Mathematics Using the Results from Third Party Testing Instruments

In this study regression and neural networks based methods are used to predict statewide high-stakes test results for middle school mathematics using the scores obtained from third party tests throughout the school year. Such prediction is of utmost significance for school districts to live up to the state’s educational standards mandated by the No Child Left Behind Act by helping them take the necessary measures in a timely manner and avoid penalties such as decreased funding, salary cuts, job losses, the state taking over the school administration, etc. Although the predictive analyses were performed in the context of middle school mathematics, the suggested models can readily be applied to other grade levels and content areas as well.


Introduction
The No Child Left Behind Act (NCLB, 2001) supports standards-based education reform, based on the idea that setting high standards and establishing quantifiable goals are likely to improve individual student outcomes in education. NCLB requires states to develop basic skills assessments to be administered to all students in certain grades every year, if those states are to receive federal funding for schools. These yearly standardized tests are a major part of the research used to determine whether or not schools are living up to the standards that they are required to meet (Braden, 2002; Linn, 2000). If these standards are not met, the schools face decreased funding, salary cuts, job losses, and/or the state taking over a school district's administration, along with other punishments that contribute to the increased liability brought by the NCLB.
Conforming to the NCLB, the state of Arizona uses a standardized testing system entitled Arizona's Instrument to Measure Standards (AIMS) to track how well students are performing compared to state standards. Students in grades 3 through 8 and 10 take the AIMS in mathematics, reading and writing. The AIMS test is based on the Arizona state standards, which define what students should be learning each year. AIMS results show the level of proficiency a student demonstrates in each of the subject areas tested. For each student taking the AIMS test in a particular content area, the raw score, the scaled score, the placement rating and a pass/fail index are reported.
In order to keep track of the students' progress, many schools employ third party testing instruments developed in accordance with the state's educational standards and known to be highly correlated with the AIMS test. In this study, the scores obtained from such a third party testing system throughout the school year are used to predict the actual AIMS mathematics test results at grades 6, 7 and 8, using regression and neural networks based methods, for an urban middle school district in Arizona.
The organization of this paper is as follows: first, the related theoretical framework and literature review are given, followed by the rationale and the research questions answered in this paper. In the methodology section, the regression and neural networks based algorithms, the prediction process and the participant profile are elaborated. The analyses and results section separately displays the findings for predicting the scaled score, the placement rating and the pass/fail rating in the AIMS Mathematics test. Finally, the results are discussed in detail and suggestions for future research are given.

Theoretical Framework
Statewide tests, which are conducted near the end of the school year, are considered "high-stakes" tests: single assessments that have a predetermined cut score used to distinguish those who pass from those who fail, with direct consequences associated with passing and failing. For example, major decisions such as retaining students, terminating teachers, and removing funding, accreditation, or administrative control from schools are based on the outcomes of statewide tests. Given that the scores of all students in a school determine the school's success and schools' scores are used to determine state performance, there is substantial pressure on teachers to raise students' test scores.
Although these standardized tests are designed to measure overall academic achievement and are used to make high-stakes decisions, they typically provide too little information too late (McGlinchey & Hixson, 2004). It can be argued that decisions such as retention, which can result in detrimental consequences for students (Jimerson, 2001), should not be based solely on a one-shot assessment. Rather, students and teachers should be assessed and given performance feedback throughout the year, which can improve the probability of schools continuing effective practices and modifying or eliminating ineffective instructional procedures (Good, Simmons, & Kame'enui, 2001). Ensuring that effective instruction is being provided during the school year not only prevents individual students from failing but also prevents entire schools from performing poorly. Furthermore, identifying in advance those students at risk of not passing the statewide test makes it possible to take the necessary measures in a timely manner (Fuchs et al., 2002), reducing a significant amount of the pressure experienced by teachers and students as test dates approach.

Rationale and Research Questions
A third party testing system employed by an urban school district in Arizona was designed to monitor student progress by means of five tests: a pretest administered at the beginning of the school year (Math Pretest); a posttest administered at the end of the school year (Math Posttest); and three additional tests administered at the end of each of the first, second and third quarters (Math1, Math2 and Math3). The system was developed for Arizona's AIMS test in mathematics at the middle school level. The strong positive correlation at each of the grade levels 6, 7 and 8 between the scores obtained from these tests and the actual AIMS test (Table 1; all correlations are significant at the 0.01 level) motivated the authors to go one step further and attempt to predict the actual AIMS test results using the scores obtained from these five tests.
Considering the decisions based on statewide test outcomes, having the ability to identify students who are unlikely to pass the test in advance is certainly considered highly desirable by school and district administrations.
In light of this critical issue, the authors seek to answer the following research questions in this paper: (1) Can the scaled score, the placement rating and the pass/fail rating in the AIMS Mathematics Test be predicted using the Math Pretest, Math1, Math2, Math3 and Math Posttest results obtained from a third party testing system? (2) How do the results of prediction differ based on the method selected for performing the prediction?

Methodology
In this study, regression and neural networks based models are used to predict the scaled score, the pass/fail rating and the placement rating for the AIMS Mathematics Test using the Math Pretest, Math1, Math2, Math3 and Math Posttest scores. Students receive one of four placement ratings in the AIMS test for each subject tested: 1 (falls far below standards), 2 (approaches the standards), 3 (meets the standards), or 4 (exceeds the standards). Placement ratings of 1 and 2 are considered failing, and placement ratings of 3 and 4 are considered passing. Due to the continuous nature of the scaled score, it is predicted only by the linear regression and neural networks based models, whereas the pass/fail and placement ratings are predicted by all models, which are described later in this section.
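The relationship between the placement rating and the pass/fail rating described above can be sketched in a few lines of Python. The function and dictionary names are hypothetical and are introduced only for illustration; the rating labels and the pass threshold come from the text.

```python
# Placement rating labels as described in the text.
PLACEMENT_LABELS = {
    1: "falls far below standards",
    2: "approaches the standards",
    3: "meets the standards",
    4: "exceeds the standards",
}


def pass_fail(placement_rating):
    """Map an AIMS placement rating (1-4) to the pass/fail rating.

    Ratings 1 and 2 count as failing; ratings 3 and 4 count as passing.
    """
    if placement_rating not in PLACEMENT_LABELS:
        raise ValueError(f"unknown placement rating: {placement_rating}")
    return "pass" if placement_rating >= 3 else "fail"
```

Because the pass/fail rating is a deterministic function of the placement rating, a model that predicts the placement rating also implies a pass/fail prediction.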

Regression
In statistics, regression analysis includes techniques for modeling and analysis of several variables, when attention is focused on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps us understand how the typical value of the dependent variable changes when any of the independent variables is varied, while the other independent variables remain fixed.
Usually, regression analysis estimates the conditional expected value of the dependent variable given the independent variables (i.e., the mean value of the dependent variable when the independent variables are kept fixed). Regression analysis is widely used for estimation and prediction (Kutner et al., 2005). It is also used to explore and comprehend the causal relationships that exist among the independent variables in relation to the dependent variable.

Regression Based Predictive Algorithms
Regression Model 1: The Multiple Linear Regression Model. The multiple linear regression model (Kent, 2001; Kutner et al., 2005) assumes that a linear relation exists between the dependent and the independent variables, where the random errors are assumed to be independent, normally distributed random variables with zero mean and constant standard deviation (i.e., the assumptions of normality, linearity, and homogeneity of variance are met).
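For concreteness, the multiple linear regression model with the five third party test scores as predictors can be written as below. The notation is an illustrative assumption and is not taken from the study: $Y_i$ denotes the AIMS scaled score of student $i$, and $X_{i1}, \dots, X_{i5}$ denote that student's Math Pretest, Math1, Math2, Math3 and Math Posttest scores.

```latex
Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3}
      + \beta_4 X_{i4} + \beta_5 X_{i5} + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)
```

The linear form of the mean expresses the linearity assumption, the normal error term expresses normality, and the single variance $\sigma^2$ shared by all students expresses homogeneity of variance.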
Regression Model 2: The Multinomial Logistic Regression Model. Multinomial logistic regression (Kent, 2001; Kutner et al., 2005; Peng, 2002) does not require any assumptions of normality, linearity, and homogeneity of variance for the independent variables. Because this regression model is less stringent, it is often preferred to discriminant analysis when the data do not satisfy these assumptions. Suppose the dependent variable has M nominal (unordered) categories. One value of the dependent variable is chosen as the reference category, and the probability of membership in each of the other categories is compared to the probability of membership in the reference category. For a dependent variable with M categories, this requires the calculation of M-1 equations, one for each category relative to the reference category, in order to describe the relationship between the dependent and the independent variables. Please note that the multinomial logistic regression (MLR) model ignores any ordinal nature that might exist within the levels of the dependent variable and treats each category in a similar manner.
Regression Model 3: The Cumulative Odds (CO)-Ordinal Logistic Regression Model. The CO-ordinal regression model (Kent, 2001; Kutner et al., 2005; Peng, 2002) calculates the probability of being at or below category m of an ordinal dependent variable with M categories. Ordinal logistic regression (OR) differs from multinomial logistic regression in that it takes into account the ordinal nature inherent in the levels of the dependent variable, which might be useful in some cases.
Please note that, due to the continuous nature of the AIMS Mathematics Scaled Score, the Multinomial Logistic Regression and the CO-Ordinal Logistic Regression models cannot be used to predict it. On the other hand, the Multinomial Logistic Regression and the CO-Ordinal Logistic Regression models are likely to produce identical results when there are only two categorical levels in the data (such as the pass/fail rating).
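The two logistic models can be summarized in a common textbook form. The notation below is an illustrative assumption and is not taken from the study: $\mathbf{x}$ is the vector of test scores, category $M$ serves as the reference category in the multinomial model, and the sign convention for the slope in the cumulative odds model varies between references and software packages.

```latex
% Multinomial logistic regression: M-1 equations, each with its own coefficients
\ln \frac{P(Y = m \mid \mathbf{x})}{P(Y = M \mid \mathbf{x})}
  = \beta_{m0} + \boldsymbol{\beta}_m^{\top} \mathbf{x},
  \qquad m = 1, \dots, M-1

% Cumulative odds (ordinal) logistic regression: M-1 intercepts, one shared slope vector
\ln \frac{P(Y \le m \mid \mathbf{x})}{P(Y > m \mid \mathbf{x})}
  = \alpha_m + \boldsymbol{\beta}^{\top} \mathbf{x},
  \qquad m = 1, \dots, M-1
```

The multinomial model estimates a separate coefficient vector for every non-reference category, whereas the cumulative odds form shown here uses one slope vector for all cut points and lets only the intercepts differ, which is how it encodes the ordering of the categories.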
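A short derivation, in illustrative notation that is not taken from the study, suggests why the two logistic models coincide for a two-level outcome such as the pass/fail rating. With $M = 2$ categories (1 = fail, 2 = pass), each model reduces to a single equation:

```latex
% Multinomial model, category 2 as reference:
\ln \frac{P(Y = 1 \mid \mathbf{x})}{P(Y = 2 \mid \mathbf{x})}
  = \beta_{0} + \boldsymbol{\beta}^{\top} \mathbf{x}

% Cumulative odds model, single cut point m = 1:
\ln \frac{P(Y \le 1 \mid \mathbf{x})}{P(Y > 1 \mid \mathbf{x})}
  = \alpha_{1} + \boldsymbol{\beta}^{\top} \mathbf{x}

% With two categories, P(Y \le 1) = P(Y = 1) and P(Y > 1) = P(Y = 2),
% so both left-hand sides are the same log-odds.
```

Both expressions are therefore the ordinary binary logistic regression model, and any difference between fitted results would come from parameterization or software conventions rather than from the model itself.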

Neural Networks
Neural networks (NN) are a powerful alternative to linear and nonlinear regression, especially for predicting and forecasting, but are not widely used in the field of education. A neural network is a mathematical model that imitates the structure and/or the functional aspects of biological neural networks. It is an interconnected group of artificial neurons, which processes information through a series of connections as a means of computing. Modern neural networks are generally used as tools for nonlinear modeling of statistical data; they may reveal the complex nonlinear relationships between the inputs and outputs better than nonlinear regression methods, so as to find the patterns in the data.
Neural networks have successfully been employed in a variety of applications that include system identification and control (vehicle and process control), quantum chemistry, game-theory and decision making processes, pattern recognition (radar systems, speech, face and object recognition), sequence recognition (gesture, handwritten text recognition), medical diagnoses, financial applications (automated trading systems), data mining (prediction, forecasting and modeling), visualization and e-mail spam filtering.
Despite the well-established theory and the various applications of neural networks, their capabilities are yet to be discovered and applied to analyses in the context of education. In this paper, we utilize two commonly used neural network models to make predictions: the Multilayer Perceptron (MLP) Model and the Radial Basis Function (RBF) Model (Kutner et al., 2005).

Neural Networks (NN) Based Predictive Algorithms
The MLP Model. An MLP neural network contains multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (i.e., a processing element) with a nonlinear activation function. MLP utilizes a supervised learning technique called backpropagation for training the network and is a modification of the standard linear perceptron, having the ability to capture the nonlinearities that may exist in the data.
The RBF Model. An RBF neural network uses radial basis functions as activation functions. Such a network is a linear combination of radial basis functions used in function approximation, time series prediction, and control.
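The layered computation of an MLP can be illustrated with a minimal forward pass. This is only a sketch under stated assumptions: it uses a single hidden layer with tanh activations and a linear output unit (a common choice for a continuous target, assumed here), the function and parameter names are hypothetical, and training by backpropagation is not shown.

```python
import math


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer MLP.

    x              -- list of input values (e.g., the five test scores)
    hidden_weights -- one list of input weights per hidden neuron
    hidden_biases  -- one bias per hidden neuron
    output_weights -- one weight per hidden neuron for the output unit
    output_bias    -- bias of the output unit
    """
    # Every hidden neuron is fully connected to the inputs and applies
    # a nonlinear activation (tanh) to its weighted sum.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output unit combines the hidden activations linearly.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
```

In practice the weights and biases would be learned from data; here they are simply passed in so that the structure of the network stays visible.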
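The phrase "linear combination of radial basis functions" can be made concrete with a small prediction function. The sketch below assumes Gaussian basis functions with a single shared width, which is one common choice and is not stated in the text; the function and parameter names are hypothetical, and the fitting of centers and weights is not shown.

```python
import math


def rbf_predict(x, centers, weights, width, bias=0.0):
    """Output of an RBF network with Gaussian basis functions.

    x       -- list of input values
    centers -- list of center vectors, one per basis function
    weights -- one output weight per basis function
    width   -- shared width (sigma) of the Gaussian basis functions
    bias    -- constant added to the weighted sum
    """
    total = bias
    for center, weight in zip(centers, weights):
        # Each basis function depends only on the distance to its center.
        squared_distance = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        total += weight * math.exp(-squared_distance / (2.0 * width ** 2))
    return total
```

An input that coincides with a center activates that basis function fully (value 1), and the activation decays as the input moves away from the center.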

Participants
The participants in this study are the 6th, 7th and 8th grade students from a southwestern urban school district in Phoenix, Arizona. The population and ethnic composition of the participants are given in Tables 2 and 3; 94% of the student population are of Hispanic origin, and 99% of the students qualified for free lunch, an indicator of their low SES.

Analyses and Results
The analyses performed investigated the correlation between the actual and predicted values as well as the percentage of predicted values that correctly matched the actual values. The results will be displayed by grade level.
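The evaluation measures used in this section (the Pearson correlation between actual and predicted values, the percentage of correctly predicted values, and the mean squared error) can be sketched as follows. The function names are hypothetical, and the code is an illustration of the standard definitions rather than the software used in the study.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    covariance = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return covariance / math.sqrt(var_a * var_p)


def percent_correct(actual, predicted):
    """Percentage of predicted categories that match the actual categories."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * matches / len(actual)


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
```

The correlation and the mean squared error apply to continuous predictions such as the scaled score, while the percentage of correct predictions applies to categorical outcomes such as the placement and pass/fail ratings.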

Prediction of the AIMS Mathematics Scaled Score
Table 4 displays the results of predicting the AIMS Mathematics Scaled Score using multiple linear regression as well as the two neural networks based models. All correlations (R values) are significant at the 0.01 level and indicate strong positive correlation between the actual and predicted values for each of the grade levels. It can be seen that, based on the correlational analyses, the multiple linear regression and multilayer perceptron (MLP) models perform comparably and both models outperform the radial basis function (RBF) model.
Figure 1 displays the mean squared errors associated with predicting the AIMS Mathematics Scaled Score using multiple linear regression as well as the two neural networks based models. It can be seen that the mean squared error is greater for the radial basis function model than for the multiple linear regression and multilayer perceptron models, which performed comparably at all grade levels.

Figure 3. Prediction rates as percentages for the AIMS Mathematics pass/fail rating using all regression models

Discussion of Results
A considerable amount of time and taxpayer money is spent every year on the schooling system in the U.S. Therefore, students, parents and educators naturally desire to see favorable outcomes in the statewide high-stakes tests. Thus, an early detection system could be of immense assistance to students, parents and teachers in taking the necessary measures while there is adequate time to prepare for the statewide high-stakes tests, so as to help improve the performance of students in such tests. Many schools employ third party testing instruments to monitor student progress throughout the school year. Using the results of such instruments, which are already available, it is possible to predict student performance in the statewide high-stakes tests and take the necessary measures in a timely manner.
In this paper we have shown that the scaled score, the placement rating and the pass/fail rating in the AIMS Mathematics Test can all be predicted using the Math Pretest, Math1, Math2, Math3 and Math Posttest results obtained from a third party testing system. It should be noted that the ultimate goal in making predictions is to use the data and predictive algorithms for the current school year as a means to predict how students will perform in the next school year. This is possible if and only if the content of the statewide high-stakes tests stays the same and the school district decides to use the same third party testing system throughout the next school year.
Prediction naturally becomes more accurate when more data are available. However, by changing the predictors used as they become available, it is possible to make a reasonable prediction throughout the school year. For instance, when the Math Pretest is available, a predictive model can be created to predict the AIMS results from the Pretest results only; when the Math Pretest and the Math1 test results are available, the AIMS results can be predicted from the Math Pretest and the Math1 test results; when the Math Pretest, the Math1 and the Math2 test results are available, the AIMS results can be predicted from all three; and so on. Ultimately, the purpose is to pinpoint the students who are at risk of failing the high-stakes test and to introduce intervention programs targeting such students before it is too late.
In this study, we also showed that the results of prediction differ based on the method selected for performing the prediction and the nature of the predicted value. When, for instance, the AIMS scaled score, which is continuous in nature, is predicted, the multiple linear regression and multilayer perceptron (MLP) models performed comparably and both models outperformed the radial basis function (RBF) model. On the other hand, when the predicted value is categorical in nature, Multinomial Logistic Regression (MLR) and the CO-Ordinal Logistic Regression (OR) outperformed Multiple Linear Regression. Please note that, regardless of the nature of the predicted value or the method of prediction, we have observed strong, positive and statistically significant Pearson correlation values between the actual and predicted values of the statewide high-stakes test scores in middle school mathematics.
In this study we also introduced neural networks as an alternative to regression based methods in order to predict statewide high-stakes test results for middle school mathematics using the scores obtained from third party testing instruments throughout the school year. The neural network models employed are two commonly used ones, namely the Multilayer Perceptron (MLP) and the Radial Basis Function (RBF) models. Simulations yielded strong, positive and statistically significant Pearson correlation values between the actual and predicted values of the statewide high-stakes test scores in middle school mathematics.
Please note that neural networks and regression based methods are expected to perform comparably when the data satisfy the assumptions required by linear regression; this has indeed been the case in the findings of this study. The assumptions required by linear regression are normality, linearity, and homogeneity of variance, which, in a perfect world, would be fulfilled by the data to be analyzed. In this study, the data satisfied the assumptions of regression and, as expected, regression based methods performed comparably or slightly better. However, it must be noted that neural networks do not require any of the assumptions of linear regression to be met. As a matter of fact, data in general cannot be expected to meet the assumptions of normality, linearity, and homogeneity of variance. When the assumptions of linear regression are not met, neural networks are expected to capture the nonlinear nature of the data far better and thereby yield better results than regression based algorithms.
Last but not least, although the predictive analyses were performed in the context of middle school mathematics, the suggested regression models can readily be applied to other grade levels and content areas as well. The only requirements are that there be a strong positive correlation between the predicted values (such as AIMS scores) and the predictors used (such as the results of third party testing instruments) and that the student profile, instruments, and content of high-stakes tests do not change dramatically from one school year to the next.
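The idea of changing the predictor set as test results become available can be sketched as a small helper that selects which scores a model could use at a given point in the school year. The function name and the example scores in the usage note are hypothetical; the test names and their order come from the text.

```python
# Tests in the order they are administered during the school year.
TEST_ORDER = ["Math Pretest", "Math1", "Math2", "Math3", "Math Posttest"]


def available_predictors(scores_so_far):
    """Return the leading run of tests in TEST_ORDER that already have scores.

    scores_so_far -- dict mapping a test name to a student's score on it.
    Tests are taken in order, so selection stops at the first missing test.
    """
    predictors = []
    for name in TEST_ORDER:
        if name not in scores_so_far:
            break
        predictors.append(name)
    return predictors
```

For example, calling `available_predictors({"Math Pretest": 40, "Math1": 55})` returns `["Math Pretest", "Math1"]`, so at that point a model fitted on those two predictors would be the one to apply.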

Figure 1. Mean squared errors associated with the prediction of the AIMS Mathematics scaled score

Table 1. Correlations between the results from third party tests and the AIMS test in mathematics

Table 2. The participants in this study by school and grade level

Table 4. Results pertaining to the prediction of the AIMS Mathematics scaled score

Table 5. Correlations pertaining to the prediction of the AIMS Mathematics placement rating using all models

Table 6. Prediction rates as percentages for the AIMS Mathematics placement rating using all regression models

Figure 2. Prediction rates as percentages for the AIMS Mathematics placement rating using all regression models