Chemistry Test Items Development: Assessing Conceptual Understanding among Malaysian Students

TIMSS has reported that Malaysian students' achievement in science exceeded the international average; however, it was still far behind that of neighbouring Singapore, which ranked among the top three. This paper aims to develop and validate an instrument to assess the level of mastery of chemistry content knowledge. The instrument was adapted from previous research on alternative conceptions in chemistry content knowledge. The test was administered to a group of 134 Form Five students who took chemistry as a subject in school. Developing a sound research tool, the Chemistry Content Test (CCT), requires analysis of its 31 items to establish their reliability and validity. This article discusses the steps in the item development process and summarizes how Rasch modelling was applied to analyze the items. The implication of this research is that teachers can use the CCT as a guideline for developing the questions needed for teaching and learning.


Introduction
TIMSS data showed that Malaysian students scored 510 in science on average, which exceeded the international average of 474 (Mullis, Martin, Gonzalez & Chrostowski, 2004). The report also noted that, in comparison to other countries, Malaysia was outperformed by 19 of the 44 participating countries. The top three were Singapore, Chinese Taipei and the Republic of Korea. Thus, it is important to construct test items that are fair and suitable for assessing all Malaysian students, both those from high socioeconomic backgrounds and those from low socioeconomic backgrounds.
The instrument was adapted from previous research on alternative conceptions in chemistry content knowledge. Multiple-choice items use common misconceptions as distracters, which allows researchers to simultaneously test for students' correct chemistry ideas and assess their content knowledge. Thus, developing a good research tool requires analysis of the 31 items to establish their reliability and validity. To develop a research tool that provides an accurate assessment of students' ability in chemistry, this paper uses the Statistical Package for the Social Sciences (SPSS) version 18 and the Rasch measurement model with Winsteps 3.69.0 software (Linacre, 2009). Rasch modelling is used to estimate and compare the items' difficulty and the popularity of each answer choice for students of differing ability (Linacre, 2007).

Purpose of Study
The main purpose of this study is to develop a test that assesses chemistry knowledge. Specifically, the objectives are: i) to obtain empirical evidence for the reliability and validity of the items in the Chemistry Content Test (CCT), ii) to determine item difficulty separation values, and iii) to identify the types of items that are difficult to answer.

Methodology
The preliminary version of the CCT was pilot tested with a group of 134 Form Five students who took chemistry as a subject in school. Linacre (1994) explained that 30 examinees are enough for well-designed pilot studies. The test was administered at the beginning of the school year, on the assumption that the students' level of knowledge was still the same as that of Form Four students at the end of the school year. These students had learned all seven chapters of the Form Four chemistry syllabus. The students were given 35 minutes to complete the test. The data collected were used to investigate the test's reliability and validity.

Design Instrument
The researcher, with the help of chemistry experts, developed a collection of objective test items that met the criterion of face validity. Altogether, 31 multiple-choice items were constructed for the Chemistry Content Test (CCT). The test items covered seven topics in the Form Four chemistry syllabus: (i) Atomic Structure; (ii) Chemical Equation and Formula; (iii) Chemistry Periodic Table of Elements; (iv) Chemical Bonding; (v) Electrochemistry; (vi) Acids and Bases; and (vii) Salts.
To minimize errors and establish face validity, the development of the test items was based on input from experts and teachers. To keep the test free of the confounding factor of reading ability, the researcher provided visual illustrations such as diagrams, tables, and figures that help to clarify ideas. In addition, the items were checked against the Integrated Secondary School Curriculum for the Chemistry Form Four syllabus.

Data Collection
Because an appraisal of whether the data fit the model reasonably well is required, the item fit index was used to show how well the items function in reflecting the trait. Point Measure Correlation (PTMEA CORR.), Outfit Mean Square values (MNSQ), and Standardized Fit Statistics (Z-std) were used in analyzing the CCT items to determine whether the items are good and fit for the test. For the standardized fit statistics, Bond and Fox (2007) proposed that acceptable values fall between -2.0 and +2.0 for sample sizes between 30 and 300. They also note that negative values indicate less variation than modelled: the response string is closer to a Guttman-style response string, in which all easy items are answered correctly and all difficult items incorrectly. Positive values indicate more variation than modelled: the response string is more haphazard than expected.
Separation can be thought of as the number of levels into which the items and persons can be separated. Green and Frantom (2002) suggested that, for an instrument to be useful, the separation should exceed 1.0. These researchers also mention that the separation determines the reliability; consequently, higher separation yields higher reliability. The reliability of the test was determined using the Cronbach's alpha coefficient.
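The link between separation and reliability can be illustrated with a short sketch. It assumes the standard Rasch relationship R = G^2 / (1 + G^2) between a separation index G and the corresponding reliability R; the function names are hypothetical and the input values are illustrative, not results from this study.

```python
import math


def reliability_from_separation(separation: float) -> float:
    """Reliability implied by a separation index G, using R = G^2 / (1 + G^2)."""
    if separation < 0:
        raise ValueError("separation must be non-negative")
    g_squared = separation ** 2
    return g_squared / (1.0 + g_squared)


def separation_from_reliability(reliability: float) -> float:
    """Inverse relation: G = sqrt(R / (1 - R)), defined for 0 <= R < 1."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must lie in [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))


# Illustrative separation values only (not taken from the study).
for g in (0.5, 1.0, 2.0, 3.0):
    print(f"separation {g:.1f} -> reliability {reliability_from_separation(g):.2f}")
```

Under this relationship a separation of 1.0 corresponds to a reliability of 0.5, which is one way to read the threshold suggested by Green and Frantom (2002). Reported values in software output may differ slightly depending on rounding and on which error estimate is used.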
From the same PTMEA CORR. index values, the discrimination index values of the items can be determined. Ong Saw Lan, Zurida, and Foong Soon Sok (2006) stressed that an item with a value above 0.30 is considered to have satisfactory power of discrimination. Rasch analysis also has a mapping facility that shows the distribution of each item in the CCT together with the persons along a continuum. This map gives information about the persons' positions alongside the items' positions. The item index and the mapping facility were examined for the purpose of item revision.

Reliability of CCT
In the analysis, the separation values are reported together with the reliability values. Person and item reliability are assessed using the person reliability and item reliability coefficients, which are equivalent to Kuder-Richardson (KR-20). From the analysis in Table 1, the person reliability is low at 0.34, which suggests that a similar ordering of persons cannot be expected if this sample were given another set of items measuring the same construct. This low reliability is due to the low separation value (0.66), which should be more than 1.0.
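For readers unfamiliar with KR-20, the following is a minimal sketch of the classical formula computed from a matrix of right/wrong scores. It is not the Rasch-based estimate that Winsteps reports; the function name `kr20` and the small response matrix are hypothetical and serve only to show the arithmetic.

```python
from typing import List


def kr20(responses: List[List[int]]) -> float:
    """Classical KR-20 for dichotomous (0/1) item scores.

    responses[i][j] is 1 if person i answered item j correctly, else 0.
    Population variance (divide by N) is used for the total scores so that
    it is consistent with the p * (1 - p) item variances.
    """
    n_persons = len(responses)
    n_items = len(responses[0])
    if n_persons < 2 or n_items < 2:
        raise ValueError("need at least two persons and two items")

    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_persons
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    if var_total == 0:
        raise ValueError("total scores have zero variance")

    # Sum of item variances p * q, where p is the proportion correct.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_persons
        sum_pq += p * (1.0 - p)

    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical data: 4 persons x 3 items in a perfect Guttman pattern.
demo = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(demo), 2))
```

A coefficient close to 1 means the total scores order persons consistently; a value as low as the 0.34 reported above indicates that the ordering would not be expected to replicate.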

Item Fit
Items were checked against the Point Measure Correlation with the acceptable range PMC = x, 0.4 < x < 0.8. To determine whether an item is 'problematic', Rasch analysis requires further verification using the Outfit columns: the Mean Square value, MNSQ = y, 0.5 < y < 1.5, and the Z-std value, Z-std = z, -2 < z < +2. The output table of item measures (Table 2) was used to identify misfitting items. Based on Table 2, analysis of the three attributes (PMC, MNSQ and Z-std) shows that none of the items was a misfit. An item is considered a misfit only when all three criteria are not met. Hence, all items are acceptable for further analysis. The researcher then examined the mapping facility in the Rasch model to determine how the items are distributed relative to the persons along a continuum. The map analysis is shown in Figure 1.
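The decision rule described above can be sketched in a few lines. The thresholds are the ones stated in this section; the function names and the sample statistics are hypothetical and are not values from Table 2.

```python
from typing import List


def failed_criteria(ptmea_corr: float, outfit_mnsq: float, outfit_zstd: float) -> List[str]:
    """Return the names of the fit criteria that an item does not meet."""
    failed = []
    if not 0.4 < ptmea_corr < 0.8:
        failed.append("PTMEA CORR")
    if not 0.5 < outfit_mnsq < 1.5:
        failed.append("Outfit MNSQ")
    if not -2.0 < outfit_zstd < 2.0:
        failed.append("Outfit Z-std")
    return failed


def is_misfit(ptmea_corr: float, outfit_mnsq: float, outfit_zstd: float) -> bool:
    """An item counts as a misfit only when all three criteria are failed."""
    return len(failed_criteria(ptmea_corr, outfit_mnsq, outfit_zstd)) == 3


# Hypothetical item statistics, for illustration only.
examples = {
    "item_a": (0.55, 1.02, 0.3),   # meets all three criteria
    "item_b": (0.21, 1.10, 0.9),   # fails the correlation range only
    "item_c": (0.10, 1.80, 2.6),   # fails all three criteria
}
for name, stats in examples.items():
    print(name, failed_criteria(*stats), "misfit" if is_misfit(*stats) else "acceptable")
```

Under this rule an item such as the hypothetical `item_b` is not a misfit even though its correlation is low, which is why items can fit the model and still be flagged later on discrimination grounds.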

Item Person-Map
Rasch analysis also has a mapping facility that shows the distribution of each item in the CCT together with the persons along a continuum. According to Herrmann-Abell, DeBoer, and Roseman (2009), when item difficulty and person ability match, the person has a 50% chance of answering the item correctly. The item index and the mapping facility were examined for the purpose of item revision. Figure 1 below shows the item person-map for the CCT.

Figure 1. Item Person-Map analysis
The persons' positions are shown on the left side of the vertical line and the items on the right. From Figure 1, the items cover a range of -1.85 to +1.65 logits in difficulty, which is broader than the range of about -2.24 to 0.62 logits for persons. It can also be seen that numerous items are positioned beyond the persons' capabilities. This indicates that the items at the upper end of the map are very difficult for the students; as a result, these items were dropped from the item list. Meanwhile, the lowest position on the map is held by item s2. This item is the easiest question, and it is still within the persons' capability to answer correctly. At one point on the scale, there are 4 items at the same position. The researcher has to consider either dropping or revising one or two of them as redundant.
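The 50% match point and the logit ranges reported for the map can be related through the dichotomous Rasch model. The sketch below assumes the standard logistic form P = 1 / (1 + exp(-(ability - difficulty))), which the text does not write out; the function name is hypothetical, and the logit values are the range end points quoted above.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct answer under the dichotomous Rasch model.

    Both arguments are in logits; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# When ability and difficulty match, the chance of success is 50%.
print(rasch_probability(0.0, 0.0))

# Most able person (0.62 logits) facing the hardest item (+1.65 logits).
print(round(rasch_probability(0.62, 1.65), 3))

# Least able person (-2.24 logits) facing the easiest item (-1.85 logits).
print(round(rasch_probability(-2.24, -1.85), 3))
```

On these assumptions, even the most able person in the sample would have only about a 26% chance of answering the hardest item correctly, which is consistent with the observation that the items at the upper end of the map lie beyond the persons' capabilities.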
Although Table 2 showed that all items fit the model, some of the items were identified as 'problematic' on the basis of the Discrimination Index and needed to be revised or rejected. Before any action was taken to improve the instrument, the map analysis in Figure 1 also provided information for improving the quality of the items. Thus, based on the data from Table 2 and Figure 1, the researcher sorted the items into four categories for further action in developing the instrument: retain, rephrase, simplify, and reject.
Table 3. Categories for items and action to be taken
Table 3 covers all 31 items. The 14 retained items have values within the acceptable ranges discussed earlier. The table also shows that 4 items need to be rephrased to help students understand the questions in the CCT. Another 5 items, located at the upper end of the map, need to be simplified to match the students' capabilities. Lastly, 8 items located at the far upper end of the map were removed from the item list. This is because these items have negative values and therefore discriminate very poorly between students: students with high achievement fail to answer correctly, whereas students with low achievement happen to answer correctly. Some of the items were also removed due to redundancy.

Conclusions and Implications
The objective of this paper is to develop an instrument to measure students' chemistry content knowledge with reference to the Malaysian Secondary Science Curriculum. Applying Rasch modelling in test item development can be a powerful tool for the evaluation and refinement of items. It can result in precise, valid, and relatively brief instruments that minimize response burden. The findings from the analysis showed that some of the test items still need improvement to ensure that the instrument is reliable and useful. After these improvements, the test will be administered to the research sample.

Table 2. Item measures