A Comparison of Reading Measures as Indicators of Risk across Grade Levels

The purpose of this study was to assess relationships between early literacy skills of at-risk students assessed with curriculum-based and norm-referenced measures. Participants were students in first, second, and third grade. Trained examiners administered five curriculum-based reading passages and three norm-referenced reading subtests to each student. Strong relationships were evident between performance on the curriculum-based oral reading fluency assessments and the norm-referenced reading subtests of word attack, word identification and passage comprehension skills. Findings are discussed with regard to previous research and recommendations are provided for helping teachers and other professionals engaged in efforts to use assessment information in response to intervention as well as in targeting instructional support needed in reading.


Introduction
More than 25 years ago, researchers collaborating at the University of Minnesota simultaneously demonstrated that assessment practices for identifying students with learning disabilities were seriously flawed and that the use of curriculum-based rather than norm-referenced assessments held promise in efforts to improve their educational lives (Deno, 1985; Deno, Mirkin, & Chiang, 1982; Shinn, 1989; 1998; Ysseldyke, 2005; Ysseldyke & Algozzine, 1979; Ysseldyke, Thurlow, Graden, Wesson, Algozzine, & Deno, 1983). Position papers, scholarly reviews, and reports of research on curriculum-based measurement have consistently proposed its practical value and documented its technical adequacy in efforts to monitor progress in core academic skills (Christ, 2006; Christ & Silberglitt, 2007; Marston, 1989; Reschly, Busch, Betts, Deno, & Long, 2009; Wayman, Wallace, Wiley, Tichá, & Espin, 2007; Woodward & Brown, 2006). Recently, a continuing "…identity crisis or, perhaps more accurately, an identification crisis…" has driven researchers, practitioners, and policy-makers to "response to intervention and progress monitoring" in their efforts to improve assessment and identification practices related to meeting the needs of the very large number of students at risk of failure in America's schools, especially those likely to be classified with learning disabilities (Case, Speece, & Mollow, 2003, p. 557).
According to Vaughn and Fuchs (2003), efforts to improve identification of students with learning disabilities grew from a National Research Council study (Heller, Holtzman, & Messick, 1982) that focused on improving placement practices in special education. Response to intervention (RTI) initiatives have emerged as the promising practice being advocated and embraced as an improved approach for identifying children with learning disabilities (Fuchs & Fuchs, 2006). In this context, regular monitoring of progress, a hallmark of the approach, provides evidence of the extent to which instructional environments are promoting learning and of which students are not learning (Vaughn & Fuchs, 2003). For a problem-solving approach like RTI to be effective, especially for at-risk students, practitioners need continuing evidence that supports the accuracy of predictive relationships between progress monitoring measures and outcome performance in key academic areas (Catts, Petscher, Schatschneider, Sittner Bridges, & Mendoza, 2009; Jenkins, Hudson, & Johnson, 2007; Johnson, Jenkins, & Petscher, 2010; Johnson, Jenkins, Petscher, & Catts, 2009; Reschly et al., 2009; Speece, Ritchey, Silverman, Schatschneider, Walker, & Andrusik, 2010).
Comprehensive analyses and reviews (cf. Marston, 1989; Reschly et al., 2009; Shinn, 1989; 1998; Wayman et al., 2007) have supported the technical adequacy and practicality of curriculum-based measures (CBM). For example, Hosp and Fuchs (2005) investigated the extent to which relationships between CBM and traditional reading assessment scores (e.g., Woodcock Reading Mastery Test-Revised) differed as a function of grade level; they also explored how well cut scores served as benchmarks for mastery on summative measures. Their study included students (n = 310) in first through fourth grade in four schools. Roehrig, Petscher, Nettles, Hudson, and Torgesen (2008) and Wanzek et al. (2010) evaluated the validity and usefulness of Dynamic Indicators of Basic Early Literacy Skills (DIBELS) Oral Reading Fluency (ORF) in predicting performance on summative end-of-grade assessments; the strongest correlations were for third grade. Wayman et al. (2007) reviewed 160 documents addressing reading, writing, and math CBM in kindergarten (K) to Grade 12 and concluded that the data supported the value of the CBM reading-aloud measure for use by classroom teachers as an indicator of the performance and progress in reading for elementary school students, Grades 2 to 5.
Accepting that replication is a hallmark of science, we reasoned that continued study of relationships between CBM and academic skills was warranted, especially in light of the shift in its use from monitoring the progress of students receiving special education to decision-making with different educational consequences for students placed at risk (i.e., placement in special education). We followed the lead of others in documenting relationships at different grades and reporting the extent to which curriculum-based cut-off scores corresponded with benchmark performance on norm-referenced and other measures (cf. Hintze, Owen, Shapiro, & Daly, 2000; Hintze & Silberglitt, 2005; Hosp & Fuchs, 2005; McGlinchey & Hixson, 2004; Reschly et al., 2010; Schilling, Carlisle, Scott, & Zeng, 2007; Silberglitt, Burns, Madyun, & Lail, 2006; Silberglitt & Hintze, 2007; Wanzek, Roberts, Linan-Thompson, Vaughn, Woodruff, & Murray, 2010; Wayman et al., 2007).
We focused on descriptive and predictive analyses of use in high-stakes decisions that carry important social consequences for young students from diverse, at-risk backgrounds. Specifically, we addressed three research questions:
1) To what extent are scores for early literacy skills of at-risk students assessed with CBM statistically similar to those for norm-referenced measures across grades (descriptive validity)?
2) To what extent are scores for early literacy skills of at-risk students assessed with CBM statistically related to those for norm-referenced measures across grades (concurrent validity)?
3) To what extent are scores for early literacy skills of at-risk students on CBM statistically predictive of performance on norm-referenced measures across grades (predictive validity)?

Method
In this study, we investigated the state and usefulness of CBM indicators as benchmarks of performance on norm-referenced subtests of critical reading skills. We reasoned that such an investigation contributes to the existing literature in two important ways. First, most prior research comparing CBM with assessments of specific reading skills focused on predictive relationships with comprehension. Our study expands previous work by exploring the value of CBM as a predictor of specific reading skills, including, but not limited to, comprehension. Second, adding to what is known about curriculum-based and norm-referenced measurement provides a clear replication path in formative efforts to use assessment to improve instruction, such as those defining the promises of RTI.

Participants
The participants were 475 first grade, 360 second grade, and 181 third grade students from seven public schools in an urban, integrated school system enrolling more than 120,000 students each year in the southeastern region of the United States. The ethnic backgrounds of students in the district were rich and diverse, including African-American (43%), American Indian/Multi Racial (3%), Asian (4%), Caucasian (40%), and Hispanic (10%) groups. The children in this study were in seven elementary schools representative of those serving large numbers of students at risk for failure in the participating district, other states, and the nation. The percentage receiving free or reduced lunch was also generally high for students participating in our study. Additional demographic information regarding these students' gender, ethnicity, socioeconomic status (represented by school lunch program participation), and special education status is reported in Table 1. None of these characteristics differed statistically significantly across the grade levels, as indicated by chi-square tests. As a result, demographic variation across these variables was not considered in our subsequent analyses.

Procedures
One CBM set was collected three times during the school year as part of district-mandated formative and summative assessment practices. For research purposes, another CBM and norm-referenced subtests were administered at the end of the year. The specific assessments were identified in collaboration with district-level administrators interested in supporting the overall research effort while minimizing the time that participating students were being tested. All students were tested with the same battery of tests.

Curriculum-based Measures
Teachers trained by the school district administered and used the Oral Reading Fluency (ORF) subtest of Dynamic Indicators of Basic Early Literacy Skills (DIBELS) for progress monitoring benchmark assessments of oral reading proficiency (cf. Good & Kaminski, 2003). DIBELS ORF was used for K-2 benchmark, program monitoring, and end-of-year assessments within the district. It is a standardized, individually administered test of accuracy and fluency with connected text. The passages and procedures are designed to (a) identify children who may need additional instructional support, and (b) monitor progress toward instructional goals. The passages are calibrated for the goal level of reading for each grade level. Student performance is measured by having students read a passage aloud for 1 minute. Omitted words, substituted words, and hesitations of more than three seconds are scored as errors. Words self-corrected within three seconds are scored as accurate. The number of correct words per minute from the passage is the oral reading fluency rate. In general, the DIBELS assessments have excellent technical adequacy (cf. Elliott, Lee, & Tollefson, 2001; Fuchs, Fuchs, & Compton, 2004; Good & Kaminski, 2002, 2003; Good, Kaminski, Simmons, & Kame'enui, 2001; Hintze, Ryan, & Stoner, 2003; Speece, Mills, Ritchey, & Hillman, 2003; Vadasy, Sanders, & Peyton, 2005).
At the end of the school year, teachers also administered two additional curriculum-based reading passages developed by Fuchs and Fuchs (1992). Each student's score was the average number of words read correctly in 1 minute across the two passages. Words read correctly were those pronounced accurately while reading the passage. Repetitions and self-corrections within three seconds were counted as correct. Errors were substitutions, omissions, and hesitations of more than three seconds. Previous studies have shown that this measure is a reliable and valid indicator of oral reading fluency in young children (Deno et al., 1982; Fuchs, Fuchs, & Maxwell, 1988; Hosp & Fuchs, 2005).
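The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the scoring software used in the study: the function names and the example counts are hypothetical, and it assumes the examiner has already tallied the words attempted and the errors (substitutions, omissions, and hesitations of more than three seconds) for each one-minute reading.

```python
from statistics import mean


def words_correct_per_minute(words_attempted: int, errors: int,
                             seconds: float = 60.0) -> float:
    """Return words read correctly, scaled to a one-minute rate.

    Self-corrections and repetitions within three seconds are assumed to
    have been counted as correct by the examiner, so they are not errors.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    if not 0 <= errors <= words_attempted:
        raise ValueError("errors must be between 0 and words_attempted")
    return (words_attempted - errors) * 60.0 / seconds


def cbm_score(passages: list[tuple[int, int]]) -> float:
    """Average words correct per minute across (attempted, errors) pairs."""
    return mean(words_correct_per_minute(a, e) for a, e in passages)


# Hypothetical student: two one-minute passages.
example_passages = [(52, 4), (47, 5)]
example_score = cbm_score(example_passages)
```

The same `cbm_score` helper would apply to the three ORF administrations, since the analyses also averaged those scores.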

Norm-referenced Measures
Trained research assistants supervised by a school psychologist administered subtests of The Woodcock Reading Mastery Test-Revised (WRMT-R; Woodcock, 1998) at the end of the school year. The WRMT-R is an individually-administered norm-referenced measure of reading achievement and we used three of its six subtests in our study. The Word Attack subtest requires the student to read nonsense words (i.e., letter combinations that are not actual words) or low-frequency words from English. Each form of this subtest includes 45 items arranged in order of difficulty to measure students' ability to use phonic and structural analysis skills and knowledge. The Word Identification subtest requires students to identify isolated words in a text. Each form of this subtest has 108 items. The Passage Comprehension subtest measures students' ability to read a short passage (approximately 2 to 3 sentences) and identify the key word missing from the passage. To produce the correct response, the student needs to understand not only the sentence in which the word is missing, but also the entire passage. Each form of this subtest includes 68 items arranged in order of difficulty. We used decoding, word reading, and comprehension as labels for scores on these measures in our research. Grade-based standardized scores with a mean of 100 and standard deviation of 15 were used for each of these subtests, all of which have strong reliability and validity (Hosp & Fuchs, 2005; Woodcock, 1998).

Design and Data Analysis
Our study was a descriptive comparison of reading skills of children at risk for failure. First, we inspected the distribution of each measure and found no markedly skewed distributions. As a result, we used the original test scores without corrective transformations. The average of the two CBM scores, the average of the three ORF scores, and the grade-based standardized scores for each of the three subtests of WRMT-R were used in data analyses. We reasoned that the mean was the best overall marker of performance for measures available from multiple administrations (i.e., ORF). Second, significant differences in the correlations between CBM and the three subtests of WRMT-R were tested with z-tests by transforming r values into z values across the grades and with t-tests within each grade (Blalock, 1960). Finally, predictive discriminant analysis (PDA) was used to measure classification accuracy, or how well CBM predicted the students' performance on WRMT-R subtests. These procedures have been used in previous research to identify the CBM cut-off scores to determine the benchmark associated with mastery versus non-mastery (cf. Hosp & Fuchs, 2005).
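For readers unfamiliar with the across-grade comparison, the standard Fisher r-to-z test for two correlations from independent samples can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of the study's analysis: the function names are hypothetical, the correlation values in the example are invented (Table 2 is not reproduced here), and only the sample sizes (475 first graders, 360 second graders) come from the Participants section. The within-grade t-tests for dependent correlations are not covered by this sketch.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher r-to-z transformation: z' = atanh(r)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)


def compare_independent_correlations(r1: float, n1: int,
                                     r2: float, n2: int) -> tuple[float, float]:
    """Return (z statistic, two-tailed p) for H0: rho1 == rho2.

    Assumes the two correlations come from independent samples,
    such as students in different grades.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than 3 observations")
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / standard_error
    # Two-tailed p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical correlations; sample sizes from the Participants section.
z_stat, p_value = compare_independent_correlations(0.80, 475, 0.70, 360)
```

With these invented inputs the difference would be judged significant at the .01 level; real conclusions depend on the observed coefficients.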

Results
We were interested in relationships between early literacy skills of at-risk students assessed with curriculum-based and norm-referenced measures across grades. We used descriptive and predictive statistics to address each of our research questions.

Descriptive Comparison of Curriculum-Based and Norm-Referenced Measures
Means and standard deviations for CBM and ORF are in Table 2. Average oral reading fluency scores for the participating students were at levels placing them at risk for academic failure (First Grade < 40, Second Grade < 90, Third Grade < 110) based on benchmarks recommended by Good and Kaminski (2003). They were also below levels for students needing intervention (i.e., 10 or more words below the 50th percentile) based on norms and recommendations published by Hasbrouck and Tindal (2006) in an article in The Reading Teacher and in a technical report available online. Means and standard deviations for norm-referenced performance on WRMT-R word reading, decoding, and comprehension subtests (see Table 2) reflected below average and average performance, which was similar (e.g., 96-102, 101-103, 94-95, respectively) across the three grade levels.
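The grade-level risk thresholds quoted above lend themselves to a simple lookup. The sketch below is illustrative only: the dictionary and function names are hypothetical, the cut points are those quoted in this section (First Grade < 40, Second Grade < 90, Third Grade < 110), and it should not be read as a substitute for the published benchmark tables.

```python
# Words-correct-per-minute cut points as quoted in the text above;
# a score strictly below the cut point is treated as at risk.
AT_RISK_BELOW = {1: 40, 2: 90, 3: 110}


def is_at_risk(grade: int, wcpm: float) -> bool:
    """Return True when an oral reading fluency score falls below the cut point."""
    if grade not in AT_RISK_BELOW:
        raise ValueError(f"no cut point defined for grade {grade}")
    return wcpm < AT_RISK_BELOW[grade]


# Hypothetical second grader reading 72 words correctly per minute.
flagged = is_at_risk(2, 72)
```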

Relationships between Curriculum-Based and Norm-Referenced Measures
The correlation coefficients between CBM and ORF, word reading, decoding, and comprehension for each grade are also in Table 2. Because the relationship between CBM and ORF was high across all three grade levels, we used CBM as the only measure to correlate with the three subtests of WRMT-R. Results indicated that CBM was strongly related to all three subtests of WRMT-R and across all three grade levels. Statistically significant (p < .01) differences were noted with z-tests for the relationships between CBM and word reading as well as the relationships between CBM and comprehension across grade levels, but the relationship between CBM and decoding was not statistically significantly different between first and second grade students (z = 1.35, p > .01). Except for the relationship between CBM and decoding for the first and second grade students, the relationship between CBM and all three subtests of WRMT-R was lower as students moved up in grade levels.
We also found that the relationship between CBM and word reading within each grade level was significantly different from the relationship between CBM and decoding. Similar results were evident for the relationship between CBM and decoding and the relationship between CBM and comprehension. The relationship between CBM and word reading, however, was not statistically significantly different from the relationship between CBM and comprehension within each grade level. The differences indicated that the relationship between CBM and word reading was stronger than that between CBM and decoding. The relationship between CBM and comprehension was also stronger than that between CBM and decoding. Word reading and comprehension were comparable with respect to their relationships to CBM, and these relationships were both stronger than the relationship between CBM and decoding.

Predictive Analysis for Curriculum-Based and Norm-Referenced Measures
Hit rate, sensitivity, and specificity of the PDA are displayed in Table 3. Overall, CBM separated students who mastered the skills from their counterparts who did not master the skills measured by WRMT-R at each grade level (i.e., greater numbers of true negatives and true positives). The numbers of false negatives were small, indicating that CBM predicted students who did not master the skills measured by WRMT-R very well because very few of them were predicted as having mastered the skills. In contrast, the numbers of false positives were relatively large, suggesting that CBM predicted less accurately students who mastered the skills measured by WRMT-R because quite a few of these students were predicted as not having mastered the skills. This result, that CBM predicted better for students who did not master skills measured by WRMT-R, was also indicated by larger sensitivity indices in comparison with specificity indices. Although not identical, these scores were similar to those reported by Hosp and Fuchs (2005) for each grade and each reading skill. Since diagnostic accuracy statistics are always influenced by the cut-scores used to define levels of risk, continued study will help to clarify the usefulness of CBM in predicting norm-referenced test performance.
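The three classification indices can be computed directly from the four cells of a prediction table. The sketch below is illustrative and rests on two assumptions: following the description above, a "positive" is taken to mean non-mastery (the at-risk outcome), and the counts in the example are invented rather than taken from Table 3. The class and attribute names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationTable:
    """Counts from crossing CBM-predicted status with observed WRMT-R status.

    Assumed convention: "positive" means non-mastery (at risk).
    """
    true_pos: int   # predicted non-mastery, observed non-mastery
    false_neg: int  # predicted mastery, observed non-mastery
    false_pos: int  # predicted non-mastery, observed mastery
    true_neg: int   # predicted mastery, observed mastery

    @property
    def sensitivity(self) -> float:
        """Share of observed non-masters correctly predicted."""
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        """Share of observed masters correctly predicted."""
        return self.true_neg / (self.true_neg + self.false_pos)

    @property
    def hit_rate(self) -> float:
        """Share of all students classified correctly."""
        total = self.true_pos + self.false_neg + self.false_pos + self.true_neg
        return (self.true_pos + self.true_neg) / total


# Hypothetical counts: few false negatives, more false positives.
table = ClassificationTable(true_pos=90, false_neg=10, false_pos=30, true_neg=70)
```

In this invented example sensitivity exceeds specificity, which mirrors the pattern described above of better prediction for students who did not master the skills.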

Discussion
We followed procedures similar to those used in previous research and obtained similar results, although the populations represented by the samples were different. For example, Silberglitt et al. (2006) included children in upper grades, and the distributions of gender and ethnicity (Caucasian and African American) in Hosp and Fuchs (2005) were somewhat balanced. The majority of our lower-grades sample was male and African American, with Hispanic students representing the second largest group and Caucasian students the third largest group, in schools enrolling large numbers of students at risk for school failure. Moreover, our sample was predominantly children eligible for the free or reduced-price lunch program. While Hosp and Fuchs (2005) did not report the socioeconomic status of children participating in their study across grade levels, the percentage of students receiving free or reduced lunch in all but one of their schools (82%, 42%, 41%, and 34%) was lower than in our sample (79-86%). Students in Silberglitt et al. (2006) were predominantly "94.3% White, not of Hispanic Origin" and the "percentage of students meeting the federal definition of poverty…districts varied from 5.07 to 18.63" (p. 529). These demographic differences should be considered when interpreting relationships of CBM and other measures of achievement.
Although our samples differed, we obtained results similar to those of Hosp and Fuchs (2005) with regard to statistically significant relationships between CBM and (a) word reading, (b) decoding, and (c) comprehension at each grade level. These outcomes suggest that CBM is a good indicator of students' performance on standardized tests in reading. This conclusion is supported by the results from PDA. Both studies obtained high hit rates, sensitivity, and specificity in using CBM to predict students who mastered or did not master the reading skills measured by WRMT-R. Similar to Hosp and Fuchs, we found statistically significant differences between the relationship between CBM and word reading and the relationship between CBM and decoding for the first and third grade students. The finding that the relationship between CBM and word reading and the relationship between CBM and comprehension were not statistically significantly different in Grades 2 and 3 (cf. Hosp & Fuchs) was also supported in our study.
We also obtained some different results. For example, in Hosp and Fuchs' (2005) study, the relationship between CBM and decoding was higher in Grades 2 and 3 than in Grade 1, but we found it to be higher in Grades 1 and 2 than in Grade 3. The relationship between CBM and word reading did not differ significantly across Grades 1-3 in Hosp and Fuchs' study, but in our study it was highest in Grade 1, second highest in Grade 2, and lowest in Grade 3. A similar divergence was also noted for the relationship between CBM and comprehension. For each grade level, the relationship between CBM and decoding and the relationship between CBM and comprehension were not statistically significantly different in Hosp and Fuchs' study but were statistically significantly different in our study.

Implications for Future Research and Practice
Data are keys to successful efforts to use response to intervention (RTI) to improve the lives of children. One of the critical elements of RTI is the use of a technically adequate system of progress monitoring to inform decision-making (Deno et al., 2009; Fuchs, Mock, Morgan, & Young, 2003). More importantly, "[a] distinct feature of progress-monitoring methods is that educators evaluate student performance on material that represents or is closely associated with either the skills or general outcome that students should achieve by the end of the year" (Stecker, Lembke, & Foegen, 2008, p. 51). In our study, we obtained similar results to those reported in earlier research and extended them to students not previously studied. Clearly, continued research is warranted to support efforts to improve early literacy outcomes, especially for students placed at risk by diverse ethnic, cultural, and language backgrounds. In this context, then, learning more about how well widely-used assessments, such as DIBELS ORF and other CBM, predict performance on other indicators of achievement, including norm-referenced measures often used to identify students in need of special education and end-of-grade tests used for high stakes decision making, is clearly important. Additionally, because many schools using curriculum-based progress-monitoring assessments serve a large number of students who are traditionally considered at risk for continued school failure, it is essential to study their predictive usefulness with different student groups. While there is evidence that oral reading fluency measures are unbiased predictors of reading comprehension for African American and White students (Hintze, Callahan, Matthews, Williams, & Tobin, 2002; Hixson & McGlinchey, 2004; Wanzek et al., 2010), continued study of the usefulness of these measures with students with diverse ethnic, cultural, and language backgrounds is clearly warranted (Klein & Jimerson, 2005; Roehrig et al., 2010).
Tiered continuous-improvement frameworks, such as RTI, direct schools and teachers to draw strong connections between student performance data and instructional decision making (Cummings, Atkins, Allison, & Cole, 2009; Mercier Smith, Fien, Basaraba, & Travers, 2009; Stuart & Rinaldi, 2009). Typically, in reading this means that all students participate in a comprehensive core program (Tier I) and receive strategically-focused (Tier II) or intensively-individualized (Tier III) instruction when evidence suggests inadequate progress is being made. The take-away message for schools and teachers from this study is that simple progress monitoring measures (e.g., one-minute oral reading fluency assessment) are excellent predictors of performance on individually- and group-administered standardized tests that provide a strong foundation for collaborative planning and instruction frameworks that guide RTI and other models focused on improving education. Working together in teams, teachers and other professionals should use progress monitoring measures to decide if individuals or groups of students are performing at levels likely to prevent progress. Then they should document the specific nature and level of the problem and the intervention to be directed to improving it, provide an intervention to correct the problem, and peruse (i.e., examine or consider with focused attention and in detail) data to determine if the problem has improved, stayed the same, or worsened. The "decide-document-provide-peruse" decision making should occur regularly to support and ensure success for all students.

Conclusion
Reading is a critical tool skill for children to learn early in school. Reading has received and continues to receive attention at the national level as the focus of major initiatives designed to correct extant problems and ensure renewed progress. Efforts to enable school psychologists and other professionals to efficiently and effectively monitor children's early reading progress are essential to the success of practice-altering promises such as RTI. Our data support the validity of oral reading fluency measures as indicators of the progress and performance of elementary school students in first through third grade. Our outcomes are, in general, consistent with those reported by Hosp and Fuchs (2005) and also documented and discussed by others using different outcome measures (see Roehrig et al., 2007; Schilling et al., 2007; Wayman et al., 2007; Wanzek et al., 2010; Wiley & Deno, 2005). Different from Hosp and Fuchs's (2005) findings, our study suggested that the relationship between CBM and norm-referenced tests was lower as students moved up in grade levels. This difference warrants further investigation and future studies with students of higher grade levels. Teachers and school administrators should be cautious when applying this result, especially with older students. Moreover, we have added to the extant knowledge base by including younger students and those from diverse cultural and ethnic backgrounds that place them at risk for early and continued failure in school. In the end, our results reiterate that CBM is appropriate for formative and summative monitoring of early reading skill progress and outcome. Additionally, our research further illustrates that cut scores are useful in identifying students who require additional instruction in reading, a finding with broad value in efforts to improve educational outcomes for all with enhanced instruction grounded in improved progress monitoring and RTI.