Making Sense of Reading Scores with Reading Evaluation and Decoding System (READS)

It is imperative that teachers assess their learners’ reading proficiency. However, most assessments developed and conducted by teachers merely discriminate which learners are performing better than others. These assessments provide nothing more than norm-referenced data. Nonetheless, this is the only information that teachers have pertaining to the reading proficiency of their learners. A test score should preferably supply teachers with analytical information on what learners can or cannot do. Test scores should also allow teachers to determine where learners are positioned in their reading development. It is long overdue for teachers and others in the education enterprise to take a closer look behind test scores at their learners’ precise abilities. As such, it was felt that a system should be developed that not only provides test scores but also matches learners’ performance against a benchmark and reveals their precise reading abilities. This article traces the development of a Reading Evaluation and Decoding System (READS) comprising Encoder, Analyser and Decoder components. A prototype system was first devised based on the Malaysian school curriculum. Next, a model Encoder was developed and piloted on more than three thousand students. Their scores were then used to develop the Analyser. Finally, the Decoder was developed based on data gathered from the respondents. The three components of READS were then calibrated and refined through further tests for accuracy. The study found that READS is a reliable system to evaluate learners’ performance and decode their reading abilities.

Despite the equal importance attached to both receptive skills (reading and listening) and productive skills (speaking and writing), reading is traditionally considered to be a vital requisite for the effective acquisition of knowledge and a major source of input for both writing and speaking. It appears that reading instruction and assessment have always been a prominent component in the Malaysian school curriculum. However, there is a major drawback in contemporary assessment systems, as the grades or test scores obtained constitute the only source of information that teachers have concerning the reading abilities of their learners. Related to this shortcoming is the fact that the scores often provide ambiguous and superficial descriptions of reading capabilities. Currently, national standardised assessment tools within the formal school system in Malaysia assign candidates a single grade, irrespective of the fact that the assessment framework appraises a candidate's language proficiency in each of the 4 language skills. Candidates are assigned a generic grade ranging from A to F that encompasses all 4 skills. As these grades are converted from raw scores, they imply that some learners perform better than others. However, the opaque nature of the grades and the ambiguity of the prevailing descriptors constrain the ability of pedagogues and learners alike to interpret the actual reading ability of learners accurately. This dichotomy in performance and ambiguity in outcomes is alluded to by Kubiszyn and Borich (2000:33), who noted that "Too often teachers, parents, and others report that they know little or nothing about learners after testing than before", which to a certain extent undermines the contention of Temple, Ogle, Crawford and Freppon (2008) that assessments are able to provide teachers and parents with feedback on learners' progress and identify the areas where the learners require further support.
Assessments are not merely tools for gathering data for scoring purposes but are vital conduits that inform and guide the teaching and learning process (Farr, 2003). In reiterating this feedback function, Weeden, Winter and Broadfoot (2002:18) opine that "the purpose of assessment is to improve standards, not merely to measure them". Johnson and Costello (2005) note that assessments must incorporate feedback mechanisms that can facilitate future improvements in performance, since assessment has implications for what is learned. In a similar vein, Baker (ed.) (2004) and Baguley (2001) stress that performance measurements are empirical indicators of current performance and predictors of future outcomes. It can be surmised that assessment output provides a snapshot of not only prevailing ability but also latent potential, which can plausibly be manifested in future outcomes through the application of facilitative learning initiatives and palliative pedagogical measures (Lenski, Ehlers-Zavala, Daniel and Sun-Irminger, 2006; Cobb, 2005; Tomlinson, 2009). Implicit in this comprehensive approach is the principle that effective practices derived from informative assessments can improve instruction and learning, as noted by Wilcox (2005).
It is evident from the foregoing that, within the reading domain, ESL teachers in Malaysia, as elsewhere, should be equipped with effective evaluative tools that can elicit accurate information about their learners' actual reading abilities. This is especially pertinent to pedagogues, as such information serves as a vital source of feedback through which teachers can continuously monitor progress within the reading classroom and ensure that instruction is consonant with their learners' abilities and potential (Allen, 2000). Such an approach would also be in line with the assessment procedures described by Routman (2003), wherein data are collated and evaluated, and existing methodologies, materials and activities are modified on the basis of that evaluation.
This paper explains in detail the conceptualisation and implementation of a Reading Evaluation and Decoding System (READS) designed to provide a comprehensive and accurate picture of a learner's actual capabilities within the reading classroom.

About READS
The purpose of this study is to establish a Reading Evaluation and Decoding System (READS) which evaluates learners' reading abilities, analyses the results and utilises them to obtain an in-depth and holistic understanding of an individual's reading capabilities. Zarrillo (2007) states that such processes of data collation and interpretation constitute assessment. Theoretically, the approach adopted in this study is also consistent with that of Routman (2003), in which data are collected and evaluated before the findings are utilised to make the necessary adjustments in teaching instruction, materials and activities so as to meet the needs and demands of learners.
READS comprises three components, as shown in Figure 1. In principle, READS allows a teacher to administer the standardised test, compile the scores and subject them to analysis. The analysis obtained can then be cross-referenced with the Reading Matrix to determine the band to which the reader belongs. Subsequently, teachers, learners and other relevant stakeholders in the learning enterprise can refer to the Descriptors of Reading Abilities guideline to decode the learners' ESL reading abilities. Through this procedure, students identified as 'below standard' or at 'academic warning' can then be provided with early intervention through the application of appropriate reading methodology and materials (Wasburn-Moses, 2006).

Snippet of READS Developmental Processes
The development of READS encompassed three stages. The first stage involved the development of the Encoder or Test Instrument, while the second stage involved the development of the Analyser or Reading Matrix. Finally, in the third stage, the Decoder or Descriptors of Reading Abilities were established.

Getting Started: How to Develop the Encoder
The first stage involved the establishment of the Reading Performance Indicators before the development of a Prototype Test. The Prototype Test was then refined and modified once its validity and reliability had been determined, before it was used as the Test Instrument. Figure 2 illustrates how the Encoder or Test Instrument was developed.

Establishing the Reading Performance Indicators (RPI)
Essentially, the Reading Performance Indicators (RPI) refer to the detailed descriptions of the specific ESL reading skills that the learners need to master at different levels. The RPI were developed on the premise that a standardised reference guide was needed as a framework from which a reliable and valid Test Instrument could be conceptualised. The input for the RPI was mainly sourced from the relevant primary and secondary English Language Syllabuses, Barrett's Taxonomy of Reading Comprehension, textbooks and past year examination papers.
The Malaysian English Language Syllabuses (primary and secondary) were selected as one of the main sources of input in line with Rabbini's (2002) postulation that a syllabus acts "… as a guide for both teacher and learner by providing some goals to be attained". However, for the purpose of this study, only the reading objectives were focused upon. In the Malaysian English Language Syllabuses, the components on reading outline the reading skills learners need to acquire in order to achieve comprehension. These include skills such as skimming, scanning, summarising, predicting, inferring and other relevant sub-skills of reading (Ministry of Education Malaysia, 2003).
The skills and sub-skills suggested by the syllabuses were then matched with three of the major classifications of comprehension abilities found in the Taxonomy of Reading Comprehension proposed by Barrett (cited in Alderson and Urquhart, 1984). This taxonomy was chosen because its conceptualisation of the cognitive and affective aspects of reading comprehension was deemed to be comprehensive, constructive, functional and eminently suitable for use in the development of a reading test. The three comprehension sub-skills selected to develop the Test Instrument were literal, reorganisation and inferential comprehension.
The sub-skills of reading in the textbooks and past year examination papers were also carefully examined in the development of the RPI. During the developmental process, constant cross-referencing with the syllabus revealed many similarities since, on most occasions, the syllabus is presented in the recommended textbooks (Harmer, 1991). Next, information extracted from the Malaysian English Language Syllabus and Barrett's Taxonomy of Reading Comprehension was re-categorised to develop the RPI. The results of the reclassification are illustrated in Table 1.

Developing the Prototype Test
The Reading Performance Indicators formed the foundation on which a Prototype Generic Standardised Reading Comprehension Test, also known as the Prototype Test, was developed.
The development of the Prototype Test was predicated on the principle that reading standards are dynamic, as reading proficiency invariably improves with proper instruction. Hence, learners at any educational level are postulated to be at different stages on the reading development continuum, i.e., learners come with diverse abilities, interests and attitudes (Ediger, 2009) and progress at differing rates. The questions assessed three different types of reading skills, namely literal, reorganisation and inferential.

Establishing Content of Prototype Test
The Compendium Volume III (Ministry of Education Malaysia, 1991) stipulates that if a multiple-choice format is used in a test, there should be a minimum number of multiple-choice items, with the recommended number being fifty. For this Prototype Test, we decided to use 60 multiple-choice questions to increase the reliability of the test.
The 60 multiple-choice reading comprehension questions comprised 15 Year 6 level (Primary School Assessment Examination) questions, which constituted 25% of the test; 30 Year 9 level (Lower Secondary Assessment) questions, which constituted 50% of the test; and 15 Year 11 level (Malaysian Certificate of Education) questions, which constituted the remaining 25%. The proportion of the questions was based on the distribution of the difficulty level, with 25% being designated as easy, 50% as average and 25% as difficult (Mok, 2000).
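The composition described above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the dictionary keys and the function name `allocate_items` are hypothetical labels introduced here, while the 60-item total and the 25/50/25 split come from the text.

```python
from fractions import Fraction

TOTAL_ITEMS = 60

# Shares stated in the text: 25% easy, 50% average, 25% difficult.
# The key labels are hypothetical names chosen for this sketch.
BLUEPRINT = {
    "Year 6 (easy)": Fraction(25, 100),
    "Year 9 (average)": Fraction(50, 100),
    "Year 11 (difficult)": Fraction(25, 100),
}


def allocate_items(total, blueprint):
    """Return the number of items per level for a given total and share table."""
    if sum(blueprint.values()) != 1:
        raise ValueError("shares must add up to 100%")
    counts = {}
    for level, share in blueprint.items():
        count = share * total
        if count.denominator != 1:
            raise ValueError(f"{level}: share does not give a whole number of items")
        counts[level] = int(count)
    return counts


print(allocate_items(TOTAL_ITEMS, BLUEPRINT))
# {'Year 6 (easy)': 15, 'Year 9 (average)': 30, 'Year 11 (difficult)': 15}
```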

Validating the Prototype Test
Content validity is determined by expert judgement (Gay and Airasian, 2003). Five content experts, comprising three experienced examiners of Year 6, Year 9 and Year 11 English Language papers and two senior university lecturers (Teaching of English as a Second Language), analysed the suitability of the questions to establish the content validity of the test instrument. The parameters of content validity examined included: type of texts, length, difficulty level and questions, vocabulary, the rubrics and distractors. The findings indicated that the content validity of the test was high and the questions were deemed to be appropriate. Another pilot study comprising 100 selected respondents from Year 7 to Year 11 of a specific school was conducted to determine the construct validity of the Prototype Test.

Ensuring the Reliability of Prototype Test
Gay and Airasian (2003) state that internal consistency reliability provides valuable information regarding item consistency in a single test. Since the Prototype Test is a multiple-choice test, it was decided that the Kuder-Richardson formulae, particularly KR20, were more appropriate. This is because the Kuder-Richardson methods are more sensitive to sources of internal consistency (Oosterhof, 2001), a fact corroborated by Popham (2002), who noted that when a test consists of multiple-choice items, the most commonly used internal consistency procedure is the Kuder-Richardson method. On the other hand, Coefficient Alpha, as developed by Cronbach, was not used because it is more suitable for computing the internal consistency of a set of test items comprising short-answer or essay questions, in which each item is scored using a range of points rather than marked as correct or incorrect. The KR20 of this test was found to be within the range of 0.78 to 0.85 for all educational levels, which is consistent with the findings of Diederich, as cited in Oosterhof (2001:74), who proposes that "if a teacher's test requires a full class period to complete, its Kuder-Richardson reliability should be between 0.60 and 0.80".
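For readers unfamiliar with KR20, the standard formula is KR20 = (k / (k - 1)) × (1 - Σ p·q / σ²), where k is the number of items, p is the proportion of test takers answering an item correctly, q = 1 - p, and σ² is the variance of the total scores. The sketch below is a minimal illustration of that formula, not the procedure actually used in the study. It assumes that the population variance is used for σ² (some texts use the sample variance), and the function name `kr20` and the toy data are invented for the example.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson 20 for dichotomous (0/1) item scores.

    responses: list of rows, one row per test taker, each row a list of
    0/1 scores, one per item. Assumes population variance of total scores.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("KR20 needs at least two items")

    totals = [sum(row) for row in responses]
    total_variance = pvariance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    # Sum of p*q over items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for item in range(n_items):
        p = sum(row[item] for row in responses) / n_people
        pq_sum += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - pq_sum / total_variance)


# Toy data (hypothetical): 4 test takers, 3 items.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(toy))  # 0.75
```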

Scoring Procedure for Prototype Test
A pilot study was conducted in two schools on 120 respondents (60 respondents from each school) in order to select the best scoring procedure. The test was conducted on learners with similar reading abilities using different scoring or marking schemes. This was designed to identify the most appropriate scoring procedure to be used for the Test Instrument. The test conducted in School A was marked using the 'traditional' marking scheme, where one mark is awarded for every correct answer and zero for an incorrect answer, while the test conducted in School B was marked using the 'partial-credit' marking scheme (OECD, 2006), where students who provide 'almost correct' answers receive partial credit. In this scheme, 3 marks were awarded for the most accurate answer, 2 marks for the almost accurate answer, 1 mark for the next most accurate answer and 0 for an incorrect answer. It was found that when the test was marked using the 'partial-credit' marking scheme, the KR20 was only 0.465, indicating very low reliability. This implied that the 'partial-credit' marking scheme was inappropriate for use in this test. Furthermore, the distractors had not been constructed to be marked using the 'partial-credit' marking scheme. Ultimately, the 'traditional' marking scheme was adopted for use after preliminary grading yielded a KR20 score of 0.703.
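The difference between the two marking schemes can be made concrete with a short sketch. It is illustrative only: the function names, the two-item answer key and the ranking of options in `option_marks` are hypothetical and do not come from the actual Test Instrument; only the 1/0 and 3/2/1/0 mark values are taken from the text.

```python
def score_traditional(answers, key):
    """One mark for each answer that matches the key, zero otherwise."""
    return sum(1 for answer, correct in zip(answers, key) if answer == correct)


def score_partial_credit(answers, option_marks):
    """Sum the marks (3, 2, 1 or 0) attached to each chosen option.

    option_marks holds one dict per item, mapping each option to its marks.
    """
    return sum(marks.get(answer, 0) for answer, marks in zip(answers, option_marks))


# Hypothetical two-item test.
key = ["A", "C"]
option_marks = [
    {"A": 3, "B": 2, "C": 1, "D": 0},
    {"C": 3, "B": 2, "A": 1, "D": 0},
]
answers = ["A", "B"]  # first answer correct, second 'almost correct'

print(score_traditional(answers, key))              # 1
print(score_partial_credit(answers, option_marks))  # 5
```

Under the partial-credit scheme every distractor must carry a defensible rank, which is consistent with the observation above that distractors not written with such a ranking in mind are poorly suited to this scheme.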

Time Allocation for Prototype Test Administration
A pilot study was conducted to determine the time taken to complete the Prototype Test. Ninety selected Year 10 respondents from a specific school, comprising 30 high performers, 30 average performers and 30 low performers, sat for the Prototype Test. The time taken by the respondents to complete the Prototype Test is recorded in Table 2. Based on the data, the time allotted to complete the Test Instrument was set at the average of the total time taken by the three groups surveyed, i.e., 70 minutes.

The Encoder
After the necessary modifications were made to the Prototype Test, the Test Instrument or Encoder was used to measure the ESL reading performance of secondary school learners from Year 7 to Year 11. In order to benchmark the learners' reading abilities, the Test Instrument was administered and the learners' scores were then matched to the Reading Matrix to identify the Performance Standards of the learners. The next section explains how the Reading Matrix was developed.

How to Develop the Analyser
A Reading Matrix or Analyser refers to a chart which indicates the reading abilities of learners at a particular educational level, i.e. in this study, from Year 7 to Year 11 in Malaysian secondary schools. The Reading Matrix was developed through a combination of two perspectives on Progression through the levels, as propounded by Horton (1990) and Green (2002). The fundamental criteria of Progression, namely levels of proficiency and age, which are vital components in determining progression, were incorporated into our model.
The purpose of this Reading Matrix is to gauge learners' reading abilities through a number of bands. These bands are indicators which are elaborated by detailed descriptors of what the learners can or cannot do. In order to obtain accurate and optimal information pertaining to their learners' performance, it is recommended that the Reading Matrix be used in conjunction with the Test Instrument and the Performance Standards (together with the Descriptors of Reading Abilities of Band 1 to Band 6). This will generate reliable data that can be further utilised as a basis to revise prevailing methodologies, refine materials and modify learning activities so as to meet actual learning needs and demands.

Components of the Reading Matrix
The Reading Matrix indicates whether the learners are below or above the relevant reading benchmarks (refer to Figure 3).

The Performance Standards
The four levels of reading performance of Year 10 respondents were developed based on the Performance Bands and Reading Performance Indicators. The rationale for this choice is based on Tomlinson's (2009) observation that a cohort of children will not show the same reading levels, with some reading at grade level, others a little above or below grade level, and still others at levels far exceeding or well below the relevant benchmark.
In this study, learners in Year 10 who are in Band 5 would be classified as 'meet standard'. A Year 10 learner in Band 6 would be classified as 'above standard'. Learners in Year 10 who are in Band 4 would be classified as 'below standard', and learners who are in Band 1, Band 2 or Band 3 would be classified as 'academic warning'.
The following are the four levels of reading performance specifically developed to suit Malaysian secondary school learners, as adapted from the Prairie State Achievement Examination (PSAE) standards (Illinois State Board of Education, 2004). The Reference Standards of the PSAE are more holistic in orientation in that they include learners' knowledge and skills in comprehending a variety of literary and informational texts, whereas the Malaysian Performance Standards focus only on informational texts. Table 3 illustrates the descriptors of the developed Performance Standards for Year 10 respondents.
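The Year 10 classification rule just described can be written as a small lookup. This is a sketch of the rule as stated for Year 10 only; the function name `year10_standard` is a hypothetical label, and nothing here should be read as the rule for other year levels.

```python
def year10_standard(band):
    """Map a Year 10 learner's band (1 to 6) to a Performance Standard."""
    if band not in range(1, 7):
        raise ValueError("band must be an integer from 1 to 6")
    if band == 6:
        return "above standard"
    if band == 5:
        return "meet standard"
    if band == 4:
        return "below standard"
    return "academic warning"  # Bands 1, 2 and 3


for band in range(1, 7):
    print(band, year10_standard(band))
```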

Determining the Cut Scores for Performance Bands
After the Prototype Test was administered and the scores obtained, they were grouped into clusters in order for Performance Bands to be determined. As these Performance Bands should reflect as closely as possible the actual performance of those taking the Prototype Test, great care was taken to ensure that the cut score for each band was accurate and able to identify the actual reading capabilities of the learners.
According to the American National Association of State Board of Education (1999), it is implausible that a single 'best' method could exist, given that determining acceptable performance levels and rendering them as cut scores inevitably involves arbitrary decisions arising from subjective assessments. Consequently, no absolute, unequivocal cut scores can plausibly be yielded, which in turn implies that there exists no single correct or true score (Wylie and Tannenbaum, 2006). To establish the cut scores of the Performance Bands, a comparison of the reading performance of the high, average and low performers of each educational level (Year 7 to Year 11) was conducted. It was determined that 6 bands be developed to represent the learners' reading abilities, so that the number of bands approximates the 6 years a learner would need to complete the normal public school examination cycle: Primary School Assessment Examination, Lower Secondary Assessment and Malaysian Certificate of Education. Figure 4 reveals that learners in Year 6 should be in Band 1, learners in Year 7 should be in Band 2, learners in Year 8 should be in Band 3, learners in Year 9 should be in Band 4, learners in Year 10 should be in Band 5 and learners in Year 11 should be in Band 6. This section explains how the cut scores for the Performance Bands (Band 1 to Band 6) were established. A total of 1430 respondents comprising high, average and low performers from Year 7 to Year 11 sat for the Prototype Test, as shown in Table 4.
Essentially, the cut scores for Band 1 to Band 6 were developed based on the z-score. The combined mean score of Year 7 to Year 11 respondents was 29.4 and the standard deviation (sd) was 11.9. The mean and the relevant raw scores were then rounded to the nearest whole number. The scores for the various performance bands were then calculated based on the z-score, as shown in Table 5. Finally, the respondents were categorised into the 6 bands thus generated based on their reading performance in the Prototype Test.
As an illustration, a raw score of 41, as shown in Figure 5, would be assigned a z-score of +1.00 sd because it is one standard deviation above the mean. In contrast, a raw score of 6 would be given a z-score of -2.00 sd because it is two standard deviations below the mean. Figure 5 illustrates how the cut scores were developed based on the z-score. After determining the cut scores of the performance bands, we could then identify the specific reading abilities of the learners by referring to the Reading Matrix.
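The two worked values above can be reproduced from the reported mean (29.4) and standard deviation (11.9). The sketch below assumes that each raw score is computed as mean + z × sd and only then rounded to the nearest whole number, which is an interpretation of the rounding step that happens to reproduce both figures in the text. The function names are hypothetical, and the band boundaries themselves (Table 5) are not reproduced here.

```python
MEAN = 29.4  # combined mean score, Year 7 to Year 11 (from the text)
SD = 11.9    # standard deviation (from the text)


def raw_score_at(z):
    """Raw score lying z standard deviations from the mean, rounded to a whole mark."""
    return round(MEAN + z * SD)


def z_score(raw):
    """Number of standard deviations a raw score lies from the mean."""
    return (raw - MEAN) / SD


print(raw_score_at(+1))  # 41
print(raw_score_at(-2))  # 6
print(round(z_score(41), 2), round(z_score(6), 2))  # 0.97 -1.97
```

Because the raw scores are rounded to whole marks, the z-scores recovered from 41 and 6 are only approximately +1.00 and -2.00.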

How to Develop the Decoder
The Descriptors of Reading Abilities for Band 1 to Band 6 function as the Decoder. These descriptors were developed based on the respondents' reading performance in the Test. The development of the Descriptors of Reading Abilities drew on North's 'Reading Scale for the Council of Europe Framework' as cited in Alderson (2000:132-134).
The Descriptors of Reading Abilities were primarily devised to describe a learner's reading performance and provide pedagogues and other relevant stakeholders with a holistic picture of what learners can or cannot do in a certain band. Thus, they serve as a useful diagnostic tool for determining the learners' reading abilities. The performances of the learners in the written reading comprehension test were described in terms of their ability to answer literal comprehension, reorganisation and inferential comprehension questions, and their scores were then represented within a Band 1 to Band 6 scoring range. Finally, the descriptors also enable the relevant stakeholders within the feedback loop to evaluate their own contributions, or lack thereof, in a more objective and empirical fashion. In short, the information obtained on reading abilities can provide teachers with powerful insights on instruction (Bishop, Reyes & Pflaum, 2006).
A quantitative analysis of respondents' test results was conducted to identify the respondents' reading abilities. Learners from Year 11 of 5 selected schools were identified and the descriptors were developed accordingly. The reading abilities were interpreted based on both quantitative and qualitative data. The qualitative data were gathered through a series of interviews. Two respondents from each band were selected to be queried as to what they were capable of. The researcher went over all the questions with the respondents over a number of days. Their responses were tabulated, analysed and interpreted. An extract of the main findings is given in Appendix 1. This information was then combined with information obtained from the quantitative data. The combination of the two forms of data resulted in a more comprehensive set of Descriptors of Reading Abilities. Appendix 2 shows an extract of the newly formulated descriptors.

How to Use READS
The procedures for the application of READS are outlined as follows:
Step 1: Conduct the Test. Learners are given 70 minutes to complete the Test.
Step 2: Use the test scores to identify the learners' reading abilities. The total score of each test is 60 marks. From the test scores, the learners are categorised into the various bands (Band 1 to Band 6) (refer to Table 6).
Step 3: Identify the learners' reading abilities by using the Reading Matrix. Match the learners' reading performance against the Reading Matrix and then correlate it with the Performance Standards and Descriptors of Reading Abilities, ranging from Band 1 to Band 6.
Step 4: ESL teachers can refer to the Performance Standards to find out what learners from different performance levels of reading achievement can or cannot do. Next, refer to the Descriptors of Reading Abilities of Band 1 to Band 6 to identify the learners' specific reading abilities.

How to Use the Reading Matrix
Table 7 and Table 8 explain how to use the Reading Matrix to identify a learner's reading ability.
For example, the reading ability of Amy, who is a Year 10 learner, should correspond to Band 5 to "meet standard", but in this case Amy is "above standard" by one band because she is in Band 6 (refer to the Performance Standards and Descriptors of Reading Abilities of Band 6). By referring to the Performance Standards, it is noted that learners who are "above standard" at their educational levels demonstrate advanced knowledge and skills in reading. Thus, the English language teacher should expose her to reading texts one level higher than her reading ability. This should be done to augment her future progress.
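Amy's case can be expressed as a comparison between her actual band and the band expected for her year. The sketch below assumes the year-to-band correspondence reported for Figure 4 (Year 6 in Band 1 through Year 11 in Band 6); the function names `expected_band` and `band_gap` are hypothetical and introduced only for illustration.

```python
def expected_band(year):
    """Band a learner is expected to be in for a given school year (Year 6 to Year 11)."""
    if year not in range(6, 12):
        raise ValueError("year must be an integer from 6 to 11")
    return year - 5  # Year 6 -> Band 1, ..., Year 11 -> Band 6


def band_gap(year, band):
    """Bands above (+) or below (-) the expected band; 0 means the benchmark is met."""
    if band not in range(1, 7):
        raise ValueError("band must be an integer from 1 to 6")
    return band - expected_band(year)


# Amy: a Year 10 learner placed in Band 6.
print(expected_band(10))  # 5
print(band_gap(10, 6))    # 1, i.e. one band above the Year 10 benchmark
```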

Conclusion
With this Reading Evaluation and Decoding System, ESL teachers can now identify where their learners are, i.e. their current reading abilities. The test scores provide invaluable information that can inform teachers, schools, parents and learners about the learners' specific reading abilities. At the same time, the system can also help diagnose reading achievement problems and inform decision making in the classroom, the school, the district and the state.
... explicitly stated along with one's own personal experience as a basis for conjecture and hypothesis)
i. making inferences; ii. drawing conclusions in simple texts
i. Inferring supporting detail; ii. Inferring main ideas; iii. Inferring sequence; iv. Inferring comparisons; v. Inferring cause and effect relationships

Above Standard
• Learners able to achieve the learning outcomes related to the sub-skills of reading to be achieved by learners in Year 11.
• Learners able to fulfil the requirements specified in the Malaysian English Language Syllabus and Barrett's Taxonomy of Reading Comprehension.

Meet Standard
• Learners able to achieve the learning outcomes related to the sub-skills of reading to be achieved by learners in Year 10.
• Learners able to meet the requirements specified in the Malaysian English Language Syllabus and Barrett's Taxonomy of Reading Comprehension.

Below Standard
• Learners not able to achieve the learning outcomes related to the sub-skills of reading to be achieved by learners in Year 10.
• Learners have gaps in reading at their educational level (Year 10), partially meeting the requirements specified in the Malaysian English Language Syllabus and Barrett's Taxonomy of Reading Comprehension.

Academic Warning
• Learners not able to achieve the learning outcomes related to the sub-skills of reading in Year 7, Year 8 and Year 9.
• Learners have major gaps in reading at their educational level (Year 10), not meeting the requirements specified in the Malaysian English Language Syllabus and Barrett's Taxonomy of Reading Comprehension.

Table 1. Sub-skills of Reading

Table 2. Pilot Test to Find Out Time Taken to Complete Test for Year 10 Learners

Table 3. Performance Standards for Year 10 Respondents

Table 4. Number of Respondents for Each Educational Level

Table 5. Establishing Scores for Bands

Table 6. Performance Bands and the Scores

Table 7. Amy's Reading Performance
i. Levels refer to the educational levels of the learners, i.e. Year 7 to Year 11 in a Malaysian secondary school.
ii. Bands refer to the performance indicators which indicate the learners' reading abilities.