Development of Objective Standard Setting Using Rasch Measurement Model in Malaysian Institution of Higher Learning

Measuring and evaluating students' achievement is important for making sure that students really understand the course content and for monitoring their achievement level. Performance is reflected not only in the number of high achievers but also in the quality of the grades obtained: does a grade 'A' truly reflect a high-achieving student? As part of quality improvement, standards for scoring students' examinations should be set for the courses offered in institutions. Setting the cut scores of the performance standard is important in making sure that performance meets the expected standard. Commonly, standard setting is done through experts' judgement based on their knowledge and experience of the subject matter, and the estimation involved is often too difficult and confusing for many. This paper discusses the implementation of Objective Standard Setting (OSS) using the Rasch measurement model in a Malaysian institution of higher learning. The method allows academic administrators to establish standards or cut scores scientifically, focusing on the validity of the test being used and the Rasch measurement properties of the resulting scale.


Introduction
Engineering programs in Malaysia have been progressively changing to make sure that higher learning institutions produce quality graduates fit for the working environment. Graduates are expected not only to obtain good grades but also to be prepared for industry through training. This is in line with the aims of the ministry, which wants to produce graduates who are competent and able to obtain employment in the relevant field. In fact, the Ministry of Higher Education (MOHE) has formulated a strategic plan for nurturing knowledgeable, innovative and first-class human capital by 2020. The plan emphasizes improving the quality of teaching and learning, including enhancing the system of student assessment by paying attention to students' achievement beyond the written test (http://www.mohe.gov.my/psptn/).
Besides MOHE, the Malaysian Qualifications Agency (MQA) provides the standards and framework for engineering programs to make sure that institutions of higher learning are able to produce quality graduates (MQA, 2011). MQA also ensures the quality of the courses offered in both public (IPTA) and private (IPTS) institutions of higher learning. The Code of Practice for Programme Accreditation (COPPA) and the Code of Practice for Institutional Audit (COPIA) were developed to assess the courses offered in higher education institutions. Key aspects guaranteed by MQA include continuous quality improvement and student assessment (http://www.mqa.gov.my). Besides ensuring that the goals and learning outcomes of the courses are met, students' achievement is also monitored in order to produce competitive, quality students in the face of global competition, in line with national aspirations and the National Education Philosophy. Measurement, evaluation and assessment are carried out continuously to meet the expected outcomes without forgoing quality.
At the same time, the Faculty of Engineering & Built Environment (FKAB) uses a variety of teaching and learning methods to meet the needs of industry and hence to comply with accreditation requirements. FKAB also improves its teaching and learning methods from time to time, in stages. The measurement of student quality is regularly discussed, especially in determining the achievement lines. Not only IPTA and IPTS but also the Ministry of Education agrees on the importance of achievement lines. In 2012, only 3,000 of 12,000 polytechnic graduates were offered places to further their studies in local institutions of higher learning. This is because their scoring standard, the Cumulative Grade Point Average, is lower than that of the Matriculation and Malaysian Higher School Certificate (STPM) colleges. Because of this issue, the Ministry of Education has called on the management of all polytechnics to revise their examination standards and grading systems to ensure that their graduates are able to further their studies in local universities (Utusan, 2012).
This issue clearly shows that setting the benchmark standard for students to pass an examination is also important for improving the quality of teaching and learning, and thus the quality of students in an educational institution. Quality assessment is expected to produce credible information that facilitates critical decision making (Zamaliah et al., 2011; Stone, 1995). It depends on what is expected and on calibrated instruments that give the test-takers measures at various levels, so that decision makers can judge which achievement is better and which is worse (Wright, 2000). Deciding where the lines of achievement should be has always been an issue, taking into consideration the students' measures and the quality that has been set.

Standard Setting Review
The process of teaching and learning is expanding, and the standard setting procedure is one of the important aspects of improving teaching and learning. To ensure that test results are useful and defensible, standard setting ideally involves policy makers, test developers and measurement experts; it is not just a methodological process (Bejar, 2008).
Standard setting can be defined as a procedure for establishing a single cut score (such as pass/fail or allow/deny) or multiple cut scores (e.g. levels of achievement such as weak, moderate and excellent) in a single test, depending on the test requirements. The cut score functions as the separation between two or more categories required by the test (Cizek & Bunch, 2007). Other definitions describe standard setting not only as determining the level of achievement or mastery but also as a method applied to obtain the corresponding cut score, which can classify students who are under the cut score to a higher level (Bejar, 2008).
There are many procedures for standard setting, such as Angoff, Nedelsky, Ebel, Bookmark and Objective Standard Setting. Abu Kassim (2007) argued that educational standards lack objectivity because of the unsystematic judgement of the experts when constructing the cut scores. The Angoff method averages the subject-matter experts' predicted difficulty of the items (James et al., 1998), a task described by Sick (2009) as "too difficult and confusing". The Objective Standard Setting (OSS) and Bookmark models represent clear departures from the Angoff, Ebel and Nedelsky models (Stone, 2011). Compared with these two models, Angoff fails to define a legitimate and stable construct, whereas they avoid item-by-item judgement, which may be tedious and difficult for judges (Stone, 2011; MacCann & Stanley, 2006).
Objective Standard Setting through the Rasch measurement model has shown psychometric efficacy in the stable formation of more reasonable standards (Stone, 1995). It can lead to an acceptable benchmark for the test takers. This method asks the experts who act as judges to determine the areas of knowledge required directly from the items or questions in the test. These experts also need to set the standard for the course or subject at an appropriate benchmark on the measuring line between essential and higher items. This paper explores objective standard setting with constructed-response items, using a research-based approach to quantify the judges' ratings.

Method
The Objective Standard Setting method is based on the studies conducted by Wright and Grosse (Wright & Grosse, 1993; Wright & Grosse, 1987). This method combines expert judgement on the test content with consideration of the ability of the students and the difficulty of the items.
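For reference, the way a Rasch model places student ability and item difficulty on one logit scale can be written in its simplest, dichotomous form. This is a standard textbook expression, not one given in this paper; the symbols θ_n (ability of student n) and δ_i (difficulty of item i) are notation assumed here, and the constructed-response items used in this study would in practice call for a polytomous extension.

```latex
P(X_{ni} = 1 \mid \theta_n, \delta_i)
  = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}
```

When θ_n equals δ_i the probability of success is 0.5, which is why a cut score expressed as an item difficulty in logits can be compared directly with person measures.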

Objective Standard Setting Methodology
This method involves evaluative decisions and their translation into the quantification of the standard. The evaluative decision steps are as follows:
a) Expert judgement on essential and optional items within the test.
b) The level of mastery required.
The quantification of the standard involves:
a) Calculation of the mean item difficulty for all the essential items within each grade separation; µ difficulty.

Instruments
The students' raw scores were gathered from the final examination. Nineteen items from the Linear Algebra (KKKM1224) final examination paper were used in this study. Linear Algebra is one of the engineering mathematics subjects at UKM; it covers complex numbers, hyperbolic functions, power series expansion, matrices and determinants, and important applications of linear algebra in engineering. The course highlights to students the importance of linear algebra in engineering (Undergraduate Handbook, 2011). All items were given to judges to review their essentiality. Essential items are the major items that are significant for achieving the learning outcomes and meeting the objectives of the course or subject. The details of the test items are summarised in Table 2.
Table 2. Topics for each examination question

Results
The raw scores were then analysed using Winsteps version 3.68.2, a Rasch analysis software package. To ensure that no misfit items affect the result of the standard setting, the test items need to be verified before proceeding with the OSS procedure. In Rasch analysis, a three-step comparison procedure is used to identify misfit items. The procedure starts with the point measure correlation (x), followed by the outfit MNSQ and then the outfit ZSTD (z) value. Each control has an accepted region, as follows: 0.4 < x < 0.8, 0.5 < MNSQ < 1.5 and -2.0 < z < 2.0. An item is a misfit if all three controls fall outside their respective accepted regions.

Figure 1

Raw scores can mislead our perception of students' ability. They do not indicate whether the marks were earned from difficult items, which would confirm valid ability. Figure 2 shows that student 197 with a score of 93, student 65 with a score of 91 and student 67 with a score of 86 are of better ability than the corresponding students with higher scores, namely student 29 with a score of 96, student 138 with a score of 92 and student 16 with a score of 87. The latter students obtained their higher scores from easy items, whereas the former, despite having lower scores, obtained their marks from more difficult items. Rasch positions them as being of better ability, according to ascending order of ability. Raw scores are only an indication of achievement, not a measure of ability. OSS gives a better psychometric measure of the students' ability.
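The three-step misfit check described above can be sketched in a few lines of Python. This is a minimal sketch, not the Winsteps procedure itself: the function name `is_misfit` and the item statistics are hypothetical, and the rule follows the text literally, flagging an item only when all three controls fall outside their accepted regions.

```python
def is_misfit(pt_measure_corr, outfit_mnsq, outfit_zstd):
    """Return True only when all three fit controls fall outside their accepted regions."""
    corr_ok = 0.4 < pt_measure_corr < 0.8   # point measure correlation (x)
    mnsq_ok = 0.5 < outfit_mnsq < 1.5       # outfit mean-square
    zstd_ok = -2.0 < outfit_zstd < 2.0      # outfit z-standardised value (z)
    return not (corr_ok or mnsq_ok or zstd_ok)


# Hypothetical fit statistics, NOT taken from the study's Winsteps output.
items = {
    "item_a": (0.55, 1.02, 0.3),   # all three controls inside their regions
    "item_b": (0.21, 1.74, 2.6),   # all three controls outside their regions
    "item_c": (0.35, 0.90, 1.1),   # only the correlation is outside its region
}
flagged = [name for name, stats in items.items() if is_misfit(*stats)]
print(flagged)  # ['item_b']
```

Under this rule an item such as `item_c`, which fails only one control, is retained; a stricter screening policy would replace the `or` conditions with `and`.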
Setting the standards starts by identifying the essential items among the items in the test. Experts in linear algebra identify the essential items among the 19 items in the test. This is crucial in determining the topics that are deemed important for students to achieve mastery of linear algebra. For the purpose of this paper, an expert identified 16 items as essential. Those items are indicated with 'E' in Figure 3. Usually, a grading system has 5 separations: A, B, C, D, F. However, owing to more lenient grading, this academic institution has 11 separations: A, A-, B+, B, B-, C+, C, C-, D+, D, F. Dividing all the items in the test by the number of grade separations yields approximately 2 items for each grade. The separation should start from the lowest item in the measure table in Figure 3, which is sorted by difficulty logit measure, with the lowest being the easiest item.
Distributing the essential items according to the 11-grade system segregates the items into 8 groups altogether. Table 3 lists the items according to their respective groups and difficulty measures, from the lowest. The logit measures of all the items within each group are summed and divided by the number of items in the group, giving the mean value for that group. The mean value acts as the cut score for that particular group. Similarly, the mean standard error is calculated for each group. Table 4 lists the groups with their means and mean errors. The mean standard error is an allowance, systematically provided, for situations where a decision needs to be made on lowering or raising the cut score for a group. However, referring to Figure 3 again, note that some of the items have the same logit measure and some are only narrowly separated. For example, the two items in Group 2, A1_Determinant matrix and C10_Subset of subspace, both have a measure of -0.35 logit. Rasch analysis thus provides empirical evidence that the items are redundant. Fortunately, the two items are from different topics or dimensions.
The items in Groups 5 and 6, whose measures are so narrowly separated, are vague in their grade separation. Consider the measure for item B6_Vector transition at +0.01, SE 0.04: at the high end the item could be located at +0.05 logit and at the low end at -0.03 logit. In the worst case, the item crosses over into the Group 6 separation instead. This is another consideration to note before embarking on OSS: items need to be calibrated properly. For this exercise, however, that step was skipped because there were insufficient items.
The students are expected to meet a 60% 'mastery' level among the essential items in order to be considered as meeting the objective of the linear algebra subject. This gives a demarcation point of roughly 10 out of the 16 essential items, marked at -0.03 logit measure, or at item A3_Eigenvectors.
The mean logit measures are then mapped onto the person measure table, in which persons are sorted in ascending order of their ability measures. Figure 4, the person measure table with the 60% cut score marked at -0.03 logit and at -0.06 logit, reveals that the 'mastery' point is within the 'C' group; however, five (5) of the students did not reach the -0.03 logit measure even though they are within the 'C' group in the conventional fixed-grade setting. This reveals that, although these five (5) students are within grade 'C', they have not reached 'mastery' of linear algebra; they may only have managed to answer the easy items correctly and had difficulty answering the difficult items.
With the 'mastery' level of linear algebra set at 60%, marked at -0.03 logit, 139 out of 217 students reached the mastery level. This is equal to a 64% pass rate. However, if the academic administrators decide to have a higher number of passes, the SE value can be used. Rasch applies the SE of a measure as the limit for variance. Referring to Table 5, the 60% mastery level measure at -0.03 logit, SE 0.03, has a lower limit of -0.06 logit. This allows 12 more students to be included as having reached mastery. The number of passes then increases to 151 (from 139), or 70% (from 64%). The academic administrators can report this achievement as a high 70% of students attaining mastery without breaching ethics or moral values. This is contrary to the common practice of achieving a higher percentage of passes by arbitrarily increasing the marks of all students.
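The group mean calculation described above can be illustrated with a short Python sketch. The function name `group_cut_scores` and the data layout are assumptions made for illustration. The two measures of -0.35 logit for Group 2 come from the text, but the standard errors and the second group are hypothetical values, since Tables 3 and 4 are not reproduced here.

```python
from statistics import mean


def group_cut_scores(groups):
    """Map each group name to (mean logit measure, mean standard error)."""
    result = {}
    for name, items in groups.items():
        measures = [measure for measure, _ in items]
        errors = [se for _, se in items]
        result[name] = (mean(measures), mean(errors))
    return result


# Each item is a (logit measure, standard error) pair.
groups = {
    # Measures of -0.35 are from the text; the SE values are hypothetical.
    "Group 2": [(-0.35, 0.04), (-0.35, 0.04)],
    # Entirely hypothetical group, included only to show unequal measures.
    "Group X": [(0.10, 0.03), (0.30, 0.05)],
}

for name, (cut, se) in group_cut_scores(groups).items():
    # cut - se and cut + se bound the allowance for moving the cut score
    print(f"{name}: cut score {cut:+.2f} logit, allowance {cut - se:+.2f} to {cut + se:+.2f}")
```

The allowance printed for each group is the band within which a decision to lower or raise the cut score can be made.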
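The translation of the 60% mastery level into a logit cut score can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `mastery_cut` is hypothetical, the required number of items is rounded up (9.6 becomes 10, matching the "roughly 10 out of 16" in the text), and the item measures used in the example are placeholders rather than the study's calibrations.

```python
import math


def mastery_cut(essential_measures, mastery=0.60):
    """Return (number of items required, logit measure of the last required item).

    Items are counted from the easiest essential item upwards.
    """
    if not essential_measures:
        raise ValueError("at least one essential item is required")
    ordered = sorted(essential_measures)                # easiest item first
    n_required = math.ceil(mastery * len(ordered))      # e.g. ceil(0.6 * 16) = 10
    return n_required, ordered[n_required - 1]


# Placeholder difficulty measures for 16 essential items (hypothetical values).
essential = [i / 10 for i in range(-8, 8)]
n_required, cut = mastery_cut(essential)
print(n_required, cut)
```

In the study, the tenth essential item in ascending order of difficulty is A3_Eigenvectors at -0.03 logit, which becomes the mastery cut score.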
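The effect of applying the SE allowance to the cut score can be checked with a small Python sketch. The function name `count_passes` is hypothetical, and the person measures below are synthetic values chosen only to reproduce the counts reported in the text (139 students at or above -0.03 logit, and 12 more between -0.06 and -0.03 logit); they are not the study's data. Treating a measure equal to the cut score as a pass is also an assumption.

```python
def count_passes(person_measures, cut, se=0.0):
    """Count persons whose ability measure is at or above (cut - se)."""
    lower_limit = cut - se
    n_pass = sum(1 for measure in person_measures if measure >= lower_limit)
    return n_pass, n_pass / len(person_measures)


# Synthetic person measures (hypothetical), 217 students in total.
persons = [0.50] * 139 + [-0.05] * 12 + [-1.00] * 66

strict = count_passes(persons, cut=-0.03)            # cut score as marked
lenient = count_passes(persons, cut=-0.03, se=0.03)  # lower limit at -0.06 logit

print(strict[0], round(strict[1] * 100))    # 139 64
print(lenient[0], round(lenient[1] * 100))  # 151 70
```

The same function can be applied to each grade separation by passing that group's mean measure and mean SE.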

Figure 4. Snippet of person measure table

b) Translating the mastery level decision onto the logit measurement.
c) Calculating the standard error (SE) of each grade separation, as the allowance for changes in grade demarcation.

The steps to fix the cut score are outlined as follows:
a) Judges determine the essentiality of the items in the prescribed test (E items), thus establishing the total number of E items; ΣN E.
b) Consider the number of grade separations, and divide the total number of essential items by the number of grade separations, ΣN E /N S, to give the number of items per grade.
c) Calculate the mean difficulty of the essential items within each grade; µ item. Include also the calculation of the mean SE; µ SE.
d) The expert decides on the 'mastery' level, which is normally set at 60%. Therefore, 60% of the total number of essential items gives the number of items actually needed to achieve the mastery level, counting from the easiest essential item in ascending order in the item measure table, and thus gives the logit measurement.
Upon having the logit measure on the acceptable 30% essential items, map it onto the person measure table to consider the number of students accepted at the mastery level.

Participants
The study involved 217 first-year students of the engineering faculty at a Malaysian institution of higher learning. They come from four (4) disciplines of the engineering program: civil and structural, mechanical, chemical and process, and electrical and electronic. The distribution of students by department and gender is shown in Table 1.

Table 3. Groups and respective logit measures

Table 4. Groups with mean and mean error