The Development and Evaluation of an Achievement Test for Measuring the Efficacy of Task-Based Writing Activities to Enhance Iranian EFL Learners’ Reading Comprehension

The present study examined the reliability of an achievement test designed to measure the efficacy of task-based writing activities in improving Iranian EFL learners’ reading comprehension at the intermediate level in a private language institute in Ilam, Iran, namely the Alefba Language Institute. To this end, the techniques for evaluating the reliability of criterion-referenced tests proposed by Brown and Hudson (2002) were employed. To calculate the reliability of the developed test, or, in Brown and Hudson’s (2002) term, its “dependability”, two coefficients, the agreement coefficient (ρo) and the kappa coefficient (κ), were computed following the procedures Brown and Hudson describe. The results indicate that the developed achievement test is a reliable measure for assessing the efficacy of task-based writing activities in advancing the selected participants’ reading comprehension skill. Implications of the present findings and suggestions for further research are also discussed.


Introduction
The focal objective of English language teaching is to provide students with the skills and knowledge they need to be successful in the world awaiting them. Thus, students need a good command of the four major skills, namely listening, speaking, reading, and writing. The field of English language teaching (ELT) has witnessed dramatic changes throughout its search for the most efficient method of language teaching. Almost none of the current or traditional methods of language teaching is free from drawbacks. Even communicative language teaching, a method employed throughout the world, has a number of significant shortcomings. In order to acquire the target language effectively, learners need to engage actively in processing the meanings of whatever they hear and read. Today, task-based teaching is highly regarded in EFL contexts. It is considered a successful method that can replace communicative language teaching and compensate for its weaknesses (Klapper, 2003). In Willis's (1998) words, a task can be considered a purpose-oriented activity with a particular goal, involving students in an attempt to achieve an outcome and, finally, a product that is appreciated by others. A variety of tasks can motivate students to process meaning and accomplish a desired goal purposefully, since task-based instruction provides them with a framework of structures, forms, and words (Willis, 1998).
Reading comprehension, the construction of meaning from text, is the major focus of this study. It is generally considered one of the most central cognitive skills (Mason, 2004). Reading comprehension is fundamental to acquiring knowledge in different subject areas in both elementary and secondary schools and is an essential prerequisite for continued learning in adulthood (Alvermann & Earle, 2003). Available findings in the area of reading instruction reveal that an emphasis on traditional approaches and strategies has been the source of many problems. Task-based instruction has been a useful method for teaching different language skills, including reading comprehension. Furthermore, one task-based technique recently used to enhance students' reading comprehension is task-based writing activities (Tilfarlioglu & Basaran, 2007).
Generally, language learning entails the conscious and unconscious acquisition of not only receptive skills such as listening and reading but also productive skills such as speaking and writing. To enable learners to do well in the target language, receptive and productive skills have to be appropriately integrated (Doff, 1998; Nunan, 1998; Woodward, 1991). One of the methods suggested for integrating all the skills to improve language learning is the use of task-based writing activities (Tilfarlioglu & Basaran, 2007). Tilfarlioglu and Basaran (2007) state that such activities are done with the purpose of producing something, reaching a conclusion, and creating a whole picture of something within a preset frame. The efficacy of such techniques and methods has been investigated in many studies (Tilfarlioglu & Basaran, 2007; Behlol & Kaini, 2011; Nazeryan, Jahandar, & Khodabandehlou, 2013, to mention only a few). One way to assess the efficacy of a method is to develop tests that measure its outcomes. Accordingly, the primary purpose of this study is to develop and evaluate an achievement test of reading comprehension that measures the effect of task-based writing activities on EFL reading comprehension ability.

Review of the Related Literature
During the last decades, there has been a plethora of studies and reports on enhancing reading comprehension. In one study, Astika (2004) states that keeping students engaged in the process of reading through tasks is effective in equipping them with the skills to tackle the reading problems they encounter during their courses. Ataei (2000) conducted a large-scale study examining the effectiveness of EAP (English for Academic Purposes) programs as implemented in their target settings, as well as theoretical issues concerning EAP instruction. In relation to reading ability and reading skills, he acknowledged the need to involve readers in extensive reading and highly recurrent tasks in college courses (Carrell & Carson, 1997, cited in Ataei, 2000, p. 125). Griva (2003) argued that a majority of the proficient readers in an academic context "appreciate the need to orientate themselves to the specific requirements of a reading, and that they need to participate interactively in the process of reading". Moreover, Spector-Cohen, Kirschner, and Wexler (2001) noted that "the nature of the tasks should be directly related to reading materials selected, so that the classroom experience can be utilized by the learner as a springboard for further task."
In a closely related study, Hokmi (2005) investigated the effects of teaching reading comprehension within the paradigm of task-based language teaching. The results suggest that reading for message influences students' reading ability positively. He concluded that "assigning students real-world tasks conveys the value of reading for message and influences the reading comprehension positively". In other words, if students are involved in the process of learning, they will find something that will be advantageous for later use and, consequently, better comprehension takes place. Willis and Willis (2007) support these arguments, asserting that "one of the prominent features of the task is involvement in real language use, in which there is an immediate problem to solve" and that this kind of language use reflects the type of language that learners would need in real-world situations. One technique of task-based instruction that has received attention during the last decade is task-based writing activities. Task-based writing activities have the advantage of enabling learners to see their progress, since their own hands shape the 'end-product'. Learners can review the 'end-product' and make any necessary corrections to it whenever they want (Tilfarlioglu & Basaran, 2007).
In another study, Tilfarlioglu and Basaran (2007) investigated the effectiveness of task-based writing activities in promoting reading comprehension ability. They conducted an experimental study to examine the implications of task-based learning. To this end, they selected two groups of 28 students and administered a pretest and a posttest. A comparison of the mean scores of the experimental and control groups revealed that the treatment had a positive impact on reading comprehension. They concluded that the results of the study provide a theoretical justification for the claims of the proponents of task-based learning.
With regard to test development, Samaie and Khosravian (2014) recently examined the validity of an achievement test as a measure of Iranian EFL learners' reading comprehension strategies. They selected several reading strategies, namely making connections, visualizing, inferring, and questioning the author, to help learners in the process of comprehension. For the sake of practicality, they focused only on construct validity, based on Bachman and Palmer's (1996) framework. To develop the test, they used Bachman and Palmer's (1996) model, which includes three stages: design, operationalization, and administration. The results revealed that their achievement test worked well in assessing the selected reading comprehension strategies of Iranian EFL learners.

Statement of the Problem
Reading failure brings about serious problems. Many students with poor reading skills suffer from low self-esteem, break school rules, and so forth (Juel, 1996). Moreover, reading problems are as pervasive as they are serious. Given the consequences of reading failure, researchers wishing to make a difference in the lives of students and teachers must develop instructional methods that are both effective and practical for classroom use. Furthermore, for effective target language acquisition, it seems essential for learners to be actively engaged in processing the meanings of what they read or hear. Accordingly, the present study attempted to develop an achievement test and evaluate its usefulness, to see whether it could measure the effectiveness of task-based writing activities as an appropriate approach for enhancing reading comprehension ability.

Research Question
This study sought to answer the following research question: Is the developed achievement test of reading comprehension a reliable measure for assessing the efficacy of task-based writing activities in improving Iranian EFL learners' reading comprehension?

Hypotheses
Based on the research question of the study, the following null hypothesis was tested: Ho: The developed achievement test of reading comprehension is not a reliable measure to assess the efficacy of task-based writing activities in improving Iranian EFL learners' reading comprehension.

Significance of the Study
Considering students' future need to read and sometimes translate English books, journals, and magazines, it is clear that strong reading comprehension skill plays a central role not only in academic and professional success but also in a productive social and civic life. This skill builds the capacity to learn independently, to absorb information on a variety of topics, to enjoy reading, and to experience literature more deeply. Given the importance of this skill, as of the other skills, an efficient teaching methodology consistent with changes in theories of language teaching should be applied to reading instruction. As stated earlier, task-based instruction (TBI) is a successful method in the field of second/foreign language instruction. It is believed to have the potential to become an alternative to the Communicative Approach (Klapper, 2003), or at least to be integrated into more traditional methods (Nunan, 1989; Pica, 2000). Assessing students' comprehension of texts is equally significant for both teachers and learners, and the quality of a test should be evaluated using available frameworks to ensure its usefulness. Hence, the purpose of the present study is twofold: developing an achievement test that measures the efficacy of the specified variable of the study, and investigating the reliability of that test.

Method
Evidence on the reliability of the developed test was gathered through the differential-groups design, which involves selecting a sample made up of two groups: masters and non-masters.

Participants
The study was conducted in a private language institute in Ilam, Iran, namely the Alefba Language Institute. The sample was drawn from intermediate EFL students studying English at the institute. Twelve female students aged 17 to 24 participated in the study. They were selected through convenience sampling, which involves selecting those who are available to the researcher at the time.

Instrumentation
The study used a reading comprehension achievement test designed by the teacher-researcher. The test consisted of four reading passages, each followed by five multiple-choice comprehension questions (see Appendix B). The reading passages were chosen to be consistent with the texts practiced in class and to have an appropriate level of difficulty, and their topics were drawn from those covered during the term. The following sections present the processes of test development and test evaluation.

Test Development
According to Bachman and Palmer (1996), test development is the entire process of creating and using a test, beginning with its initial conceptualization and design and ending with one or more archived tests and the results of their use. They proposed three stages for test development: design, operationalization, and administration.

Design Stage
In this stage, a detailed description of the components of the test design was prepared to ensure that performance on the test tasks would correspond to language use and that the test scores would be maximally useful for their intended purpose. Bachman and Palmer (1996) divide this stage into six components:
1) A description of the purpose of the test: in this study, the purpose of the test was to assess the effectiveness of task-based writing activities in promoting Iranian EFL learners' reading comprehension ability.
2) A description of the target language use (TLU) domain and task types: it was assumed that the results of this study could be generalized to the TLU domain, because all of the students would probably encounter reading comprehension tasks in their daily lives, e.g., when reading newspapers.
3) A description of the population of test takers for whom the test was intended: the test was prepared for Iranian EFL learners in an English language institute, all of whom were female and aged 17 to 24.
4) A definition of the construct(s) to be measured: the construct to be measured by the test developed by the teacher-researcher was the effectiveness of task-based instruction, particularly task-based writing activities, in enhancing students' reading comprehension ability.
5) A plan for evaluating the qualities of usefulness: the usefulness of the test was to be evaluated with Bachman and Palmer's (1996) framework in terms of construct validity, reliability, authenticity, interactiveness, impact, and practicality.
6) A list of required and available resources and a plan for their allocation and management: the required resources included human resources (the test developer, administrator, and rater, roles filled by the researchers themselves); material resources (two classes in a private EFL institute for testing the related materials); test materials (the related texts, paper, and pens); and time resources (time to develop, administer, and score the test).

Operationalization Stage 
Setting: The test was administered in a session held after the last session of the term, in the learners' own classrooms, to prevent any problems related to an unfamiliar setting.


Rubric: Since the test takers were at the intermediate level, it was assumed that they were familiar with this type of test. The tasks were scored objectively by one of the researchers, who was also the teacher.


Input: The input consisted only of the related printed texts.


Expected response: The students were expected to answer the multiple-choice questions simply by marking the correct options on their answer sheets.


Relationship between input and response: There was a direct relationship between the input and the expected response. The responses were formed directly from the information provided in the input; that is, the test takers read the input and answered the questions based on the information given in the text.

Test Administration Stage
The test administration stage involves giving the test to participating students, collecting information, and analyzing this information for the purposes of assessing the usefulness of the test and making inferences or decisions. The test was administered in a session held after the last session of the term, in the learners' own classrooms. It included four parts, each containing five multiple-choice items. All items carried equal weight, one point each, for a maximum score of 20. The time limit was forty minutes, two minutes per item. Because the items were selected-response, the test was scored objectively by one of the researchers, who was also the teacher.
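The scoring rule just described (20 selected-response items, one point each, two minutes per item) can be sketched in a few lines. This is a minimal illustration assuming answers are recorded as option letters; the names `score_sheet`, `NUM_ITEMS`, and the answer key used below are hypothetical and are not part of the authors' materials.

```python
from typing import Sequence

ITEMS_PER_PASSAGE = 5
NUM_PASSAGES = 4
NUM_ITEMS = ITEMS_PER_PASSAGE * NUM_PASSAGES  # 20 items, one point each
MINUTES_PER_ITEM = 2                           # 20 x 2 = 40-minute time limit


def score_sheet(responses: Sequence[str], key: Sequence[str]) -> int:
    """Return the raw score: one point for each response that matches the key.

    Unanswered items can be passed as "" and simply earn no point.
    """
    if len(key) != NUM_ITEMS or len(responses) != NUM_ITEMS:
        raise ValueError(f"expected {NUM_ITEMS} responses and {NUM_ITEMS} key entries")
    return sum(1 for given, correct in zip(responses, key) if given == correct)


if __name__ == "__main__":
    # Hypothetical answer key and answer sheet, for illustration only.
    key = list("abcdb" * NUM_PASSAGES)
    sheet = ["d", "d", "a", "a"] + key[4:]  # first four answers are wrong
    print(score_sheet(sheet, key), "/", NUM_ITEMS)
```

Because every item is worth the same, the raw score is simply a count of matches, which is what makes objective scoring by a single rater straightforward.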

Evaluating the Usefulness of the Test
Bachman and Palmer's (1996) framework was used to investigate the usefulness of the test. This framework emphasizes six qualities: construct validity, reliability (referred to as dependability for criterion-referenced tests), authenticity, interactiveness, impact, and practicality. Although these qualities are of equal importance, only the reliability of the test was explored in this study.

Reliability
When evaluating an instrument, the degree to which its scores are free from measurement error and consistent from one administration to another is very important. Investigating the reliability of a test requires statistical evidence. In the case of criterion-referenced tests (CRTs), such as the one developed in this study, this quality is also called dependability. Brown and Hudson (2002) have introduced two approaches to measuring the dependability of scores resulting from a CRT: "the threshold-loss agreement methods" and the "generalizability theory approaches". The first approach was applied in this study. It involves calculating the agreement coefficient (ρo) and the kappa coefficient (κ), and computing these coefficients requires two administrations of the same test.
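The threshold-loss agreement approach reduces to arithmetic on a 2×2 table of master/non-master classifications from the two administrations. The Python sketch below assumes the standard formulas, ρo = (A + D)/N, ρchance = [(A + B)(A + C) + (C + D)(B + D)]/N², and κ = (ρo − ρchance)/(1 − ρchance), which is Cohen's kappa for a 2×2 table; the cell labels are those given in the docstring, and the names `threshold_loss_agreement` and `Agreement` are illustrative, not taken from Brown and Hudson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    p_o: float       # proportion of examinees classified consistently
    p_chance: float  # proportion of consistent classification expected by chance
    kappa: float     # agreement beyond chance, relative to its maximum possible value


def threshold_loss_agreement(a: int, b: int, c: int, d: int) -> Agreement:
    """Agreement and kappa coefficients for a 2x2 master/non-master table.

    a: master on both administrations
    b: master on the first, non-master on the second
    c: non-master on the first, master on the second
    d: non-master on both administrations
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("no examinees")
    p_o = (a + d) / n
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (p_o - p_chance) / (1 - p_chance)
    return Agreement(p_o, p_chance, kappa)
```

Swapping `b` and `c` leaves all three coefficients unchanged, so the result does not depend on which administration is treated as the first.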
The test was administered twice to the same group of participants, and the researcher recorded the scores and analyzed the results. The interval between the two administrations was three weeks, intended to be long enough that examinees would not remember the material but short enough that they would not have undergone substantial changes.

Results
The current study posed a question regarding the usefulness of the criterion-referenced test developed by the teacher-researcher. To answer the research question, Bachman and Palmer's (1996) model of usefulness was adopted. As mentioned above, among the six aspects of usefulness, namely reliability, construct validity, authenticity, impact, interactiveness, and practicality, only the reliability (dependability) of the developed CRT was taken into account. In what follows, the agreement (ρo) and kappa (κ) coefficients are analyzed. This required two administrations of the test in a test-retest design. The test was administered to 12 intermediate Iranian EFL learners with an interval of three weeks. The results of the two administrations are presented in Table 1. The examinees were classified as masters or non-masters on each administration based on a cut-off score of 15, predefined by the institute curriculum. There were four possible outcomes for the classification of each examinee, as indicated in Table 2. Of the 12 examinees, 8 were classified in Category A, 1 in Category B, 1 in Category C, and 2 in Category D.
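The classification step can be sketched as a mapping from each examinee's pair of scores to one of the four categories. This sketch assumes that a score at or above the cut-off of 15 counts as mastery and that the categories are labeled A (master on both), B (master on the first only), C (master on the second only), and D (non-master on both); these labels, the names `category` and `tally`, and the example score pairs are hypothetical and do not reproduce the study's data.

```python
from collections import Counter
from typing import Iterable, Tuple

CUT_OFF = 15  # mastery cut-off predefined by the institute curriculum


def category(first: int, second: int, cut_off: int = CUT_OFF) -> str:
    """Map a pair of scores to cell A, B, C, or D of the 2x2 table."""
    master_1, master_2 = first >= cut_off, second >= cut_off
    if master_1 and master_2:
        return "A"  # master on both administrations
    if master_1:
        return "B"  # master on the first administration only
    if master_2:
        return "C"  # master on the second administration only
    return "D"      # non-master on both administrations


def tally(score_pairs: Iterable[Tuple[int, int]]) -> Counter:
    """Count how many examinees fall into each category."""
    return Counter(category(s1, s2) for s1, s2 in score_pairs)


if __name__ == "__main__":
    # Hypothetical score pairs (first, second), not the study's data.
    print(tally([(17, 16), (15, 13), (12, 14)]))
```

Only categories A and D count as consistent classifications; B and C record examinees whose mastery status changed between the two administrations.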
To evaluate the reliability of the test, both the agreement and the kappa coefficients must be examined. Brown (1996, cited in Brown & Hudson, 2002) has offered an appropriate way to compute these coefficients based on the methods developed by Subkoviak (1980) and Huynh (1976, 1978). The computations of ρo and κ are presented below.
To compute the kappa coefficient, it is first necessary to calculate ρchance, which, according to Brown and Hudson (2002), is the proportion of consistent classifications that could occur by chance alone. In the present study, this value was calculated as follows:
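Assuming the standard threshold-loss agreement formulas given in Brown and Hudson (2002), where A, B, C, and D are the numbers of examinees in the four categories of Table 2 (8, 1, 1, and 2) and N = 12, the three coefficients work out as below. The value ρchance = 0.625 agrees with the 62 percent chance agreement reported in the Discussion; because B = C, the result does not depend on which off-diagonal cell is labeled B.

```latex
\begin{aligned}
\rho_{o} &= \frac{A + D}{N} = \frac{8 + 2}{12} \approx 0.83 \\
\rho_{\text{chance}} &= \frac{(A+B)(A+C) + (C+D)(B+D)}{N^{2}}
  = \frac{(9)(9) + (3)(3)}{12^{2}} = \frac{90}{144} = 0.625 \\
\kappa &= \frac{\rho_{o} - \rho_{\text{chance}}}{1 - \rho_{\text{chance}}}
  = \frac{10/12 - 0.625}{1 - 0.625} = \frac{5}{9} \approx 0.56
\end{aligned}
```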

Discussion
Since task-based learning is gaining popularity among ELT/EFL researchers and in English teaching circles all over the world (Tilfarlioglu & Basaran, 2007), the present study concentrated on developing an achievement test of reading comprehension to measure the efficacy of task-based writing activities in improving EFL learners' reading comprehension, and on evaluating its reliability. Task-based teaching was introduced in different countries across the world as part of a so-called target-oriented curriculum reform (Vural, 2000; Carless, 2003). Tilfarlioglu and Basaran (2007) state that the task-based structure and format of international exams, such as those of Cambridge University, make it necessary to include task-based activities in the foreign language syllabus. With this in mind, and to achieve the goal of the study, Bachman and Palmer's (1996) model of usefulness and the techniques for evaluating the reliability of CRTs proposed by Brown and Hudson (2002) were employed. To calculate the reliability of the developed test, or, in Brown and Hudson's (2002) term, its 'dependability', two coefficients, the agreement coefficient (ρo) and the kappa coefficient (κ), were computed following the procedures Brown and Hudson describe. The agreement coefficient (see the Results section) suggests fairly high consistency in the way examinees were classified.
Moreover, the kappa coefficient was computed using Brown and Hudson's (2002) formulas. To obtain it, ρchance had to be computed first; according to Brown and Hudson, this is the proportion of consistent classifications that could occur by chance alone. The computed value indicates that 62 percent of the consistent classifications could have occurred by chance alone. Consequently, based on the obtained agreement and kappa coefficients, it can be proposed that the test enjoys a fairly high level of reliability. Therefore, the null hypothesis is rejected.
Concerning the development of the test, as in Samaie and Khosravian (2014), Bachman and Palmer's (1996) model encompassing three stages, namely design, operationalization, and administration, was employed. Bachman and Palmer (1996) maintain that test development is the entire process of creating and using a test, beginning with its initial conceptualization and design and ending with one or more archived tests and the results of their use. All the stages are presented in detail in the Method section.

Conclusion
To investigate the research question, namely whether the developed achievement test is a reliable measure for assessing the efficacy of task-based writing activities in enhancing EFL learners' reading comprehension, Bachman and Palmer's (1996) model of six test qualities, namely reliability, construct validity, authenticity, interactiveness, impact, and practicality, was employed. For the purpose of the study, only reliability was taken into account. The findings show that the developed test enjoys a fairly high level of reliability as a measure of the construct it claims to measure, namely the efficacy of task-based writing activities in enhancing EFL learners' reading comprehension.
Given these results, it is hoped that the test developed and evaluated in the present study will function well as an instrument for efficiently examining the efficacy of task-based writing activities in enhancing EFL learners' reading comprehension. However, the present study has limitations that should be acknowledged and addressed. One limitation concerns the external validity, or generalizability, of the findings, since they were generated with a limited number of participants: only 12 students took part in the study. Moreover, the time and budget constraints should be taken into account. To compensate for these limitations, further research is needed to develop and evaluate higher-quality tests for assessing the efficacy of task-based writing activities in enhancing EFL learners' reading comprehension.

Text Two
Some psychologists maintain that mental activities such as thinking are not performed in the brain alone, but that one's muscles also participate. It may be said that we think with our muscles in somewhat the same way that we listen to music with our bodies.
You surely are not surprised to be told that you usually listen to music not only with your ears but with your whole body.Few people can listen to music that is more or less familiar without moving their body or, more specifically, some part of their body.Often when one listens to a symphonic concert on the radio, he is tempted to direct the orchestra even though he knows there is a competent conductor on the job.
Strange as this behavior may be, there is a very good reason for it.One cannot derive all possible enjoyment from music unless he participates, so to speak, in its performance.The listener "feels" himself into the music with more or less pronounced motions of his body.
The muscles of the body actually participate in the mental process of thinking in the same way, but this participation is less obvious because it is less pronounced.
6. Some psychologists believe that mental activities are performed with the help of ………

Text Three
We believe, nevertheless, that some two thousand million years ago a rare event took place, and that a second star, wandering blindly through space, happened to come within close distance of the sun. Just as the sun and moon raise tides on the earth, so this second star must have raised tides on the surface of the sun. But they would be very different from the puny tides which the small mass of the moon raises in our oceans; a huge tidal wave must have traveled over the surface of the sun, ultimately forming a mountain of prodigious (enormous, extraordinary) height, which would rise ever higher and higher as the cause of the disturbance came nearer and nearer. And, before the second star began to recede (retreat), its tidal pull had become so powerful that this mountain was torn to pieces and threw off small fragments of itself, much as the crest of a wave throws off spray. These small fragments have been circulating around their parent sun ever since. They are the planets, great and small, of which our earth is one.

Text Four
It is a fallacy that intellectual awareness of what is happening can always prevent a man from being indoctrinated. Once he becomes exhausted and suggestible, or the brain enters the paradoxical or ultra-paradoxical phases, insight can be disturbed; even the knowledge of what to expect may be of little help in warding off breakdown. And afterwards, he will rationalize the newly-implanted beliefs and offer his friends sincere and absurd explanations of why his attitude has changed so suddenly. Mental depressives are well aware, in their lucid periods, that as soon as a new attack occurs they will lose all rational insight into the foolishness of their depressive ideas. And political prisoners should equally realize that, after an induced failure in brain function, their normal judgment will be impaired or altogether lost; and that, as soon as they find themselves growing suggestible, they should make every effort to evade further stress. Above all, they must remember that anger can be as potent a means of increasing suggestibility as fear and guilt.

Copyrights
Copyright for this article is retained by the author(s), with first publication rights granted to the journal.
This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

3. The main reasons that the least users choose to wear contact lenses is to ……… a) Protect their eyes in the workplace b) Strengthen their eyesight c) Improve their appearances d) Adapt to misshapen corneas
4. A boxer wears contact lenses to ……… a) Improve their sight. b) Protect his/her eyes from probable harms. c) Improve their looks. d) Protect his/her eyes from flying particles.
5. Contact lenses are ……. in an industry where flying particles may endanger the eyes.
… to the text, while thinking the participation of the muscles of the body is not that much ………
… listening to a piece of music ……… is/are involved. a) Just the ears b) Ears and some parts of the brain c) The entire body d) Ears and some particular muscles
9. A person can mostly enjoy music if he/she ……… in its performance.
… to the music takes part in the performance with more or less pronounced motions of his/her body.
11. According to the passage, some two thousand million years ago, the second star ……… a) Was moving in a certain direction b) Was turning around the sun c) Was approaching the sun by chance d) Was distancing away from the sun
12. The current tides in our oceans are different from those ones on the surface of the sun by the second star in that the present tides are ………
… strong pull of the second star ……… a) Separated the mountain from the sun entirely b) Was broken into pieces c) Ripped up the mountain into pieces d) Tore the sun into fragments
14. Our planet is one of the ……… of the ……… star.
… torn pieces of the mountain have been revolving around the ………
16. According to the passage, intellectual awareness can ……… a) Always prevent a man from being indoctrinated b) Never prevent indoctrination from penetrating a brain c) Not always stop indoctrination d) Only stop a mind from indoctrination
17. According to the text, insight does not seem to be distracted in ………
… political prisoner may come to an impaired or lost judgment due to ……… a) Sudden change of living conditions b) Special mental awareness c) Physical depressions d) An induced failure in brain function
19. Mental depressives lose their ……… as soon as a new attack happens.
… only anger but also fear and guilt can ……… political prisoners' suggestibility.

Table 1. The outcomes of two administrations of the same CRT

Table 2. Results for mastery/non-mastery classification based on the two administrations