The Relationship between Peer Assessment and the Cognition Hypothesis

It is believed that peer assessment equips learners with a skill set that enhances language learning and that teacher assessment alone does not provide. However, the benefits of peer assessment are limited by how well learners can conduct peer assessment tasks. Therefore, improving the efficacy of peer assessment is essential. One way to increase the consistency of peer assessment is to increase learner attention during the assessment task. The Cognition Hypothesis states that L2 learners engaged in complex tasks pay attention to more complex linguistic structures; as a result, learning increases (Robinson, 2001a, 2001b, 2005). The purpose of this study was to investigate whether complex tasks, as outlined by the Cognition Hypothesis, improve the accuracy of peer assessment. Thirty female EFL learners conducted three speaking tasks. Each task had a different level of complexity, and participants were assessed by their peers using a rating scale. The results indicated that the absolute mean deviations for the items on the rating scale decreased as task complexity increased. In other words, as task complexity increased, there was more agreement among the assessors. This indicated that peer assessment was more accurate and consistent for more complex tasks.


Introduction
In any teaching environment, assessment is critical. In the last two decades, there have been conceptual shifts in the practice of assessment. These shifts have moved toward the involvement of the learner in assessment practices (Boud, 1995). Peer assessment, in which learners assess the work of other learners, is a form of learning that allows learners to provide feedback on each other's work.
Numerous studies have supported the claim that peer assessment is beneficial for learning (see Ballantyne, Hughes, & Mylonas, 2002; Boud, 1990). Additional studies have suggested that peer assessment promotes reflective thinking through observation of other learners' performances, which in turn allows learners to understand the requirements of a classroom task (see Falchikov, 1986; Topping, 1998). Birdsong and Sharplin (1986) have shown that peer assessment contributes to higher order reasoning. Peer assessment could also promote self-learning (Oldfield, Mark, & Macalpine, 1995) and deep learning (Entwhistle, 1987; Gibbs, 1992). Kwan and Leung (1996) have suggested that peer assessment encourages cooperative group work. If students are involved in individual assessment and instruction tasks, satisfaction with the class increases (Sluijsmans, Brand-Gruwel, & van Merriënboer, 2002). In sum, there is little evidence that peer assessment elicits negative reactions in the learning process (but see Cheng & Warren, 1997, for negative reactions).
The benefits of peer assessment in the EFL/ESL context are limited by the extent to which learners can implement peer assessment practices. One method of increasing peer assessment consistency is to train the learners. In a foreign language context, studies (Berg, 1999; Stanley, 1992) have shown that training learners in conducting peer assessment increases learning efficacy. However, McGroarty and Zhu (1997) found that training learners for peer assessment does not impact learners' final grades.
Increasing learner focus and attention during peer assessment could be another way to improve peer assessment practices. The Cognition Hypothesis states that requiring L2 learners to engage in complex tasks facilitates L2 learning by promoting interaction, focus on form, and attention to more complex linguistic structures (Robinson, 2001a, 2001b, 2005). If complex tasks that increase attention and focus facilitate learning, could they also increase attention and focus in peer assessment? Robinson (2001a, 2001b, 2003, 2005) distinguished three sources of cognitive demands in a language task: (a) task complexity, which refers to the cognitive factors that relate to how a task is designed; (b) task conditions, which refers to the interactional factors relating to participation (e.g., one-way vs. two-way); and (c) task difficulty, which refers to affective and learner ability variables (e.g., motivation). Based on these complexity criteria, two dimensions were identified: resource-directing and resource-dispersing, as described in Table 1 (adopted from Kim, 2009).
In sum, previous studies regarding the Cognition Hypothesis have focused on the influence of task complexity on L2 production. Most of these studies have concluded that complex tasks increase attention and focus on form, which enhances L2 production. To date, no published study has investigated the effect of complex tasks on peer assessment. Given that peer assessment is beneficial to learning in the EFL/ESL context, improving this practice is essential. One way to do so is to increase learners' attention to peer assessment tasks. This may be accomplished by increasing task complexity. The purpose of this study is to investigate whether increasing task complexity increases the accuracy of peer assessment of L2 oral production.

Method
The study participants consisted of 30 female Iranian EFL learners. Each participant took the Oxford Placement Test (Allen, 2004) and obtained a score between 120 and 134, which designated them as low intermediate users of English; this corresponds with ALTE (2009) level B1. All participants were provided with a thorough explanation of the research, its purposes, and how the findings would be valuable to the field of English language teaching. All participants were free to leave the project at any time, and incentives were not provided for their participation.
The three speaking tasks in this study were designed to be either simple or more complex by adding and/or removing resource-directing and resource-depleting variables. The first and simplest task (Task 1) was a descriptive narration. The three topics selected were: (a) describe a great vacation, (b) describe a great roommate, and (c) describe a great restaurant.
These topics were selected because the participants had previously carried out these tasks during the course of their EFL training. The distribution of the resource-directing and resource-depleting variables, as described in Table 2, makes this task less complex. The topics in Table 2 require the learner to describe a person, an object, or an event. Therefore, the few elements variable in the resource-directing category was given a plus because the learner was required to describe only one object. Furthermore, descriptive tasks do not require reasoning, so the no reasoning demands variable was also given a plus. However, since the task required a description of a person, event, or object in the past without a mutually shared context, a minus was given to the here-and-now variable.

MJAL 3:2 Summer 2011 ISSN 0974-8741
In the category of resource-depleting variables, a plus was given to planning because the researchers allowed the participants to work in groups. Furthermore, a plus was given to the single task variable because the participant only described the topic and was not required to answer any questions during the task. Finally, a plus was given to the prior knowledge variable because participants had at one time completed a task with similar topics.
The second task (Task 2) was to make a persuasive speech on one of three topics. The topics included: (a) persuade someone to learn English, (b) persuade someone to buy a used car, and (c) persuade someone to lose weight. These topics were selected because they were novel topics for the participants. Table 3 describes how Task 2 is more complex than Task 1. According to Table 3, the task complexity variable layout for Tasks 1 and 2 was similar except for two variables. Because the topics for the second task were persuasive and required reasoning, a minus was given to the no reasoning demands variable in the resource-directing category. Also, because the topics were new to the participants, a minus was given to the prior knowledge variable. However, these topics do refer to events happening now. For this reason, the here-and-now variable was given a plus. In sum, because Task 2 has one fewer variable rated plus than Task 1, it is assumed that Task 2 is more complex than Task 1.
The final task (Task 3) was a debate. The topics for this task included: (a) discuss the pros and cons of the quality of life in Iran and in other countries, (b) choose between two perfumes and decide which one to buy, and (c) decide whether it is better to be married or single. As with Task 2, these topics were new to the participants and had not been debated in their EFL training.
The arrangement of variables for this task is almost identical to the arrangement of variables in Task 2, with one exception. During the course of the debate, the participants were asked to challenge and question the speaker. Therefore, the speaker not only had to persuade the other participants, but she also had to answer questions and remark on the comments of other participants. In other words, the speaker had to perform two tasks simultaneously. For this reason, the single task variable was given a minus. Information on the level of task complexity for Task 3 is provided in Table 4.
The final category of the rating scale, effectiveness, was divided into the three sections of language use, vocabulary, and purpose. The original rating scale for this category includes a subsection called topic. However, since the topics were given to the speakers, this subsection was omitted. Language use referred to the use of grammatically correct sentences. Vocabulary referred to the speaker's use of words appropriate for the audience. Purpose was the degree to which a speaker was successful in completing the task that she was given during oral production.
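The plus and minus ratings described above for Tasks 1 to 3 can be tallied in a few lines. The Python sketch below is illustrative only: the ratings are transcribed from the prose descriptions of the three tasks rather than from Tables 2 to 4, and the names TASKS and plus_count are hypothetical. It follows the assumption stated in the text that a task with fewer plus ratings is treated as more complex.

```python
# Plus (True) / minus (False) ratings, transcribed from the prose
# descriptions of the three tasks. Names are illustrative only.
TASKS = {
    "Task 1": {
        "few elements": True,
        "no reasoning demands": True,
        "here and now": False,
        "planning": True,
        "single task": True,
        "prior knowledge": True,
    },
    "Task 2": {
        "few elements": True,
        "no reasoning demands": False,
        "here and now": True,
        "planning": True,
        "single task": True,
        "prior knowledge": False,
    },
    "Task 3": {
        "few elements": True,
        "no reasoning demands": False,
        "here and now": True,
        "planning": True,
        "single task": False,
        "prior knowledge": False,
    },
}


def plus_count(ratings):
    """Number of variables rated plus; fewer pluses means a more complex task."""
    return sum(ratings.values())


# Order the tasks from least complex (most pluses) to most complex.
by_complexity = sorted(TASKS, key=lambda name: plus_count(TASKS[name]), reverse=True)

for name in by_complexity:
    print(name, plus_count(TASKS[name]))
```

Under these transcribed ratings the tally falls by one plus from each task to the next, which matches the intended ordering of Task 1 as the simplest and Task 3 as the most complex.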
The speaker's performance in the areas outlined by the subsections was rated on a 5-point Likert scale. Possible scores ranged from 1 (needs work) to 5 (very good). The total scores of all of the ratings represented the speaker's ability in the speaking task.
Before the participants carried out the oral tasks, the researchers met with them and thoroughly explained the background and purpose of the study. The researchers explained the concepts on the rating scale and provided examples and demonstrations of how to use the rating scale. Then, the 30 participants were divided into three groups of 10, and each group was assigned to a separate class. This division of the participants was imposed by the institution where the study was being conducted.
The participants and the researchers met three times a week, and data collection occurred over several weeks. Tasks were conducted in an order based on the level of difficulty. In other words, Task 1 was done first, then Task 2, and finally Task 3.
The procedure for Tasks 1 and 2 was similar. At the start of both Task 1 and Task 2, the participants were randomly put into groups of 3 and 4. Each group was given one of the topics described in Table 2 and Table 3. The group members were encouraged to discuss their topic. They were given a total of 25 minutes for this purpose. Then, a member from each group was randomly selected to give a presentation on the topic. During the presentation, asking questions or making comments was not allowed. After each presentation, all participants, with the exception of the speaker, were asked to assess the speaker's performance using the rating scale. The rating scales were collected after a member from each group had presented a topic.
The procedure for Task 3 had minor differences. First, props (paper strips scented with different perfumes) were given to the group that debated the choice of perfume; during the presentation, the paper strips were distributed to all participants. Second, the participants were encouraged to ask the speaker questions and make comments during the speech. When the debate was over, the participants were asked to assess the speaker.
Each participant conducted three speaking tasks and was assessed by her peers. Therefore, each participant received three sets of scores, each set corresponding to a speaking task of a different level of difficulty. To investigate whether participants grew more vigilant during the peer assessment task, the degree of agreement among peers for every subsection of the rating scale was calculated. This was accomplished by calculating the absolute mean deviation (AMD) of the scores. A small absolute mean deviation indicated that scores in a subsection were similar. For instance, an AMD of zero would show that all participants gave the same score for a particular subsection of the rating scale. Therefore, the AMD is an indicator of the degree of agreement among participants who have assessed a particular speaker.
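The AMD calculation can be illustrated with a short Python sketch. The article does not give an explicit formula, so the sketch assumes the usual definition, namely the average absolute distance of each peer's score from the mean of all peers' scores on that item. The function name and the ratings are hypothetical and are not data from the study.

```python
from statistics import mean


def absolute_mean_deviation(scores):
    """Average absolute distance of each score from the mean score.

    Assumed definition; the article does not state the formula.
    """
    centre = mean(scores)
    return mean(abs(score - centre) for score in scores)


# Hypothetical ratings given by nine peers to one speaker on one item
# of the 5-point scale (not data from the study).
full_agreement = [4, 4, 4, 4, 4, 4, 4, 4, 4]
some_disagreement = [3, 3, 4, 4, 5, 5, 4, 4, 4]

print(absolute_mean_deviation(full_agreement))
print(absolute_mean_deviation(some_disagreement))
```

When all nine peers award the same score the function returns zero, and the value grows as the ratings spread out, which is why a smaller AMD is read as closer agreement among assessors.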
The AMD of the scores awarded to each participant for each of the three tasks was calculated. Then, the Friedman test (a non-parametric repeated measures comparison test) was used to compare the AMDs. The Friedman test indicated whether AMD distributions from different tasks were statistically different. In other words, the test indicated whether the amount of agreement among the assessors differed significantly across the three tasks. An ANOVA was not used because Levene's test of homogeneity revealed that the variances of the scores were significantly different.
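The comparison described above can be sketched in plain Python. This is not the authors' analysis script (the table notes suggest output from a statistics package); it is a minimal implementation of the standard Friedman statistic with the usual tie correction, assuming that each rating-scale item supplies one row of three AMD values, one per task. Because three tasks give two degrees of freedom, the p-value has the closed form exp(-Q/2). The function names and the sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def friedman_three_tasks(rows):
    """Friedman test for rows of (task 1, task 2, task 3) AMD values.

    Returns the tie-corrected chi-square statistic and its p-value (df = 2).
    """
    n, k = len(rows), 3
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in rows:
        row = list(row)
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
        for value in set(row):
            t = row.count(value)
            tie_term += t ** 3 - t
    q = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    correction = 1 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("every row is fully tied; the statistic is undefined")
    q /= correction
    p_value = math.exp(-q / 2)  # chi-square survival function for df = 2
    return q, p_value


# Hypothetical AMDs for the 13 items of one speaker (not data from the study).
example_rows = [(0.9 - 0.01 * i, 0.6 - 0.01 * i, 0.3 - 0.01 * i) for i in range(13)]
statistic, p_value = friedman_three_tasks(example_rows)
print(statistic, p_value)
```

A p-value below .05 for a speaker would indicate that the AMDs differ across the three levels of task complexity; the direction of the difference is then read from the average AMD per task.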

Results
The AMD for each of the 13 items on the peer assessment rating scale was calculated. As mentioned before, the participants in the study were divided into three groups of 10. Therefore, in every class, nine participants assessed the performance of the speaker for each of the three tasks. An example of this calculation is displayed in Tables 5, 6, and 7, which show the scores given by the participants to Student 2 for each item of the rating scale for Task 1, Task 2, and Task 3, respectively. To compare the AMDs of the three tasks for each speaker, the Friedman test was employed. Table 8 shows that the absolute mean deviations are significantly different for the three levels of task complexity for each participant except for Student 4 (p < .05 for all participants except Student 4).
Table 9 displays the averages of the absolute mean deviation for each of the three levels of task complexity for each participant. Figure 1 displays the graphic representation of Table 9. As is displayed in Figure 1, the average of the AMD for each participant decreases as the complexity level rises.

Discussion
The results indicated that the AMDs of peer-assigned scores decreased as task complexity increased. In other words, the efficacy of peer assessment increased for more complex tasks. As mentioned before, small AMDs are an indication of a high degree of agreement among peer assessors.
The most likely explanation for this outcome lies in the predictions of the Cognition Hypothesis, i.e., the AMDs decreased because complexity requires more attention and awareness. This increase in attention and awareness allowed the learners to be more accurate in their assessments. Thus, the results of this study support the claims of the Cognition Hypothesis.
Motivation is another factor that might have affected the results of the study. Studies conducted outside the field of foreign language learning (Campbell, 1988; Kernan, Bruning, & Miller-Guhde, 1994) have revealed the connection between performance motivation and task complexity. Within the field of language learning, studies have shown a connection between motivation, achievement, and effort (Chambers, 1998; Dörnyei, 1994, 2002; Williams & Burden, 1997; Williams, Burden, & Al-Baharna, 2002). Some of these studies have revealed that cognitively difficult tasks increase the desire for achievement; people therefore put more effort into these tasks, which results in task motivation. It may be the case that, in this study, cognitively complex tasks increased motivation, which in turn increased the learners' precision in assessment.
The practice effect might also have had a role in the outcome of the study. Practice effects occur when a participant in an experiment is able to perform a task and then perform it again at some later time. Generally, the practice effect allows the participants to become better at performing the task. In the data-gathering phase of the study, each participant assessed nine peers three times over several weeks. Therefore, the participants might have gradually gained expertise in assessing their peers with the rating scale.
In sum, several different factors might have influenced the outcome of the study. However, the researchers believe that an increase in learner attention and awareness, as predicted in the Cognition Hypothesis, resulted in the increased accuracy of peer assessors. As mentioned before, motivation could have affected the outcome of the study, but there are not at present any published studies that examine the relationship between task complexity, as defined by the Cognition Hypothesis, and motivation. Therefore, the effects of motivation on assessment were difficult to define in our study. Also, although the practice effect might have influenced the outcome, the study was conducted over several weeks and assessment did not take place every day (it occurred every fourth day), so the practice effects should have diminished. Therefore, based on the Cognition Hypothesis (Robinson, 2005), it is highly likely that the increase in task complexity explains the increase in the precision of assessment.
[...] of Yamashiro and Johnson's (1997) rating scale was used to assess the performances of the speakers (see the Appendix). Yamashiro and Johnson assert that their rating scale can be used for peer assessment and self-assessment of public speaking skills. The rating scale is composed of four categories: (a) voice control, (b) body language, (c) content of oral presentation, and (d) effectiveness.
The category of voice control was further divided into the four sections of projection, pace, intonation, and diction. Projection referred to the loudness of a speaker's voice. Pace referred to the rate of speaking. Intonation referred to using proper pitch patterns and pauses, and diction referred to speaking clearly without mumbling or using an interfering accent.
The category called body language was divided into three sections. These sections were posture, eye contact, and gesture. Posture referred to standing up straight and looking relaxed. Eye contact referred to how much the speaker looked at the audience. Gesture referred to the speaker's use of suitable gestures and avoidance of distracting ones.
The category called content of oral presentation "has obvious parallels with academic essay writing" (Yamashiro & Johnson, 1997, p. 1). This category was divided into three sections: introduction, body, and conclusion. Introduction referred to the speaker's inclusion of a thesis statement and attention-getting devices. Body referred to the speaker's use of academic writing structures and transitions. Conclusion referred to the speaker's inclusion of a restatement, or summation, and a closing statement.
peer; AMD = Absolute mean deviation.
Table 7. Peer Scores and Absolute Mean Deviation for Student 2 in Task
sq = Chi Square; df = Degree of Freedom; Asymp Sig = Significant Value.

Table 1. Robinson's Task Complexity Dimensions
+/- here & now
More pictures to narrate vs. fewer pictures to narrate
Pictures presented in order of narration vs. not in order of narration
Pictures present during narration vs. not present during narration
Resource-depleting
+/- planning: narration with planning time vs. narration without planning time
+/- single task: narrate a picture vs. narrate a picture and write a story
+/- prior knowledge: familiar with story plot vs. not familiar with story plot

MJAL 3:2 Summer 2011 ISSN 0974-8741 The Relationship between Peer Assessment and the Cognition Hypothesis Mona Khabiri, Soroush Sabbaghan and Sahar Sabbaghan
Resource-directing variables require more attention, working memory, and cognitive functions that help learners to focus on linguistic forms. These variables are: [± few elements], [± here and now], and [± no reasoning demand]. As Table 1 shows, a less complex narration task requires [+ few elements], [+ here and now], and [+ no reasoning demand], but a more complex task requires [- few elements], [- here and now], and [- no reasoning demand].

Table 2. Task Complexity for Task 1

Table 3. Complexity in Task 2

Table 4. Complexity in Task 3

Table 8 displays the results of the Friedman test.

Table 8. Results of the Friedman Test

Table 9. Average of Absolute Mean Deviation for the Three Tasks