Going Global: The Successful Link of IELTS and Aptis to China’s Standards of English Language Ability (CSE)

The development of a common language proficiency scale is essential to language teaching, learning, and assessment. While some general English proficiency scales already exist, no such scale has been available in an Asian context. China’s Standards of English Language Ability (CSE), as the first scale of its kind, promises to address this deficiency with a clear focus on the student population, grounded in the well-established framework of communicative language ability. As such, it not only illuminates the learning patterns of English language learners at different stages, but also provides a benchmark for curriculum design, student evaluation, and the improvement of educational programs. More importantly, its recent link with international tests such as Aptis and IELTS marks a significant step towards the internationalization of this scale, making student grades on different tests more comparable. The official mapping of CSE to the international examination system opens China’s education further to the rest of the world, and is expected to facilitate student exchanges and deepen educational ties between countries in the future.


Introduction
English as a foreign language (EFL) education has played a prominent role in China's academic and social life since the country restored its university matriculation examination system in the late 1970s. For decades, efforts to promote English language learning, teaching, and assessment in China have been made at the individual, societal, and governmental levels, exemplified by the numerous training centers established, the status assigned to English as a compulsory course in secondary-level and tertiary-level education, and the miscellaneous tests devised as measures of students' English language ability (Qian & Cumming, 2017; Xiao, Liu, & Hu, 2019). In many ways, achieving a certain level of English proficiency is perceived not only as the mastery of an essential communicative skill, but also as a ladder of opportunity and upward mobility in an increasingly globalized world (Bolton & Graddol, 2012; Hu, 2014). Currently, China boasts the largest number of English language learners in the world (Kunnan, 2014). In the course of their English language learning, these learners are often compelled to take high-stakes tests in order to enter or complete educational programs, which inevitably has consequences for their study and careers (Chen, Zhang, & Hu, 2020; Chen, Zhang, Wei, & Hu, 2019; Cheng & Curtis, 2010).
Among the most influential English language tests in China are the National Matriculation English Test (NMET) for secondary school graduates, the College English Test (CET) series (CET Band 4 and CET Band 6), and the Test for English Majors (TEM) series (TEM Grade 4 and TEM Grade 8), which are all administered on a national basis. This list does not even include tests that are designed for individual purposes and at institutional levels, or those introduced directly from abroad, such as IELTS, TOEFL, and GRE. Without a comprehensive and coherent framework for streamlining the examination process at different stages of English language learning (Cheng & Curtis, 2010; Hu, Chen, & Liu, 2020), this wide array of tests has led to great confusion in China's education market and carries many potential problems.
CSE addresses this problem by defining three stages of English ability, namely, elementary, intermediate, and advanced, with nine levels in total. These levels cover students' educational backgrounds from elementary school to university, thus providing a unified testing system for Chinese English language learners and users. To promote cooperation and mutual recognition between the English tests in China and those that are better known globally, the British Council, along with China's Ministry of Education, published the results of their collaborative research on the successful link of IELTS and Aptis to CSE on January 15, 2019 (British Council, 2019). This was a milestone achievement in China's English language teaching and assessment and, more importantly, an important step in integrating China's English proficiency scale into the international examination system. Combining linguistic theory, empirical evidence, and teacher-student feedback, CSE holds promise as an effective assessment framework that will advance EFL education in China and globally.
This article therefore begins with the rationale behind CSE and the process of its formulation, before presenting a detailed description of its content. Next, some dominant language proficiency scales in the world are reviewed and their distinctive characteristics highlighted. CSE is then compared with these existing scales. This is followed by an account of the recent linking of IELTS and Aptis to CSE, along with its possible implications. Finally, the future development of CSE at home and abroad is discussed.

CSE in China: Rationale, Content, and Application
The search for a common language assessment metric is considered "essential for the development of a meaningful national language policy in foreign language learning and use" (Lambert, 1993, p. 155). However, China has lagged far behind other countries in developing such a common framework. It was not until the early 2000s that a host of scholars and language experts in mainland China began to recognize this problem, and later proposed the establishment of CSE to systematize China's foreign language assessment, as can be seen in their publications in some leading Chinese journals (Han, 2006; Jin & Wu, 2014; Jing, Li, Chen, Li, & Hu, 2015; Yang & Gui, 2007). The development and ultimate release of the CSE scale is the culmination of years of effort to achieve this goal.
In designing such a scale, the National Education Examinations Authority at China's Ministry of Education played a central role in coordinating the work of various departments, and established a panel of experts to oversee the entire process, with different subgroups in charge of different aspects of the scale design, such as listening comprehension and writing. There are three steps that characterize this process.
Step 1 is to collect descriptors based on a search of the relevant literature and a sampling of teaching practices and typical language activities.
Step 2 is to classify these descriptors based on a combination of expert judgements and those of in-service teachers.
Step 3 is to scale the descriptors using questionnaire results and statistical techniques for validation and other purposes.
To ensure that the scale is scientific, practical, and operational, the CSE was based on Bachman's communicative language competence framework (Bachman, 1990) and took a use-oriented approach by using "can-do" descriptions to define what specific tasks language learners and users can perform in real-life contexts.
As the first evaluation system for English language ability in China, the CSE framework places language learners and users on an ascending series of nine levels, with every three levels corresponding to one stage (levels 1–3 to the elementary stage, levels 4–6 to the intermediate stage, and levels 7–9 to the advanced stage) (for the general architecture, see Table 1). Under this general architecture, a descriptive framework is formulated to define different aspects of the learners' language abilities, which include language comprehension, language expression, pragmatic ability, linguistic knowledge, translation and interpreting, and language use strategies. Each of these aspects can be subcategorized to encompass more detailed information. Take language comprehension, for example. It bifurcates into listening comprehension and reading comprehension, which are further divided into oral/written description, oral/written narration, oral/written exposition, oral/written argumentation, oral/written instruction, and oral/written interaction, respectively. Based on the general architecture and the descriptive frameworks, EFL learners and users' overall English ability and their ability in each aspect are described in detail using "can-do" descriptors (Ministry of Education of the People's Republic of China, 2018). The general scale defining the overall language ability of Chinese English language learners and users is presented graphically in Figure 1.
Devising such a proficiency scale has implications for multiple stakeholders, and may serve to bridge three key aspects of EFL education in China: language learning, language teaching, and language assessment. Language learners can judge their own relative English ability with reference to the scales, and make study plans accordingly. They can also choose the most appropriate learning materials, assess their own learning process, and sign up for proficiency tests that best meet their needs.
In terms of language teaching, educational institutions may tap into the information on the scales, such as the learners' needs, motivations, and personalities, to design tailor-made instruction. The scales can also assist teachers in designing curricula and setting attainable teaching goals. As to language assessment, both language teachers and educational institutions can design language assessment projects to evaluate their own teaching, and develop more appropriate tests based on such scales. Finally, as one key aim of CSE is to link with renowned international English tests, the release of CSE may help integrate China's language assessment into the international evaluation system, thus extending the influence of CSE to the rest of the world.
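The nine-level, three-stage architecture described in this section can be expressed as a simple lookup. The following is a minimal sketch for illustration only; the function name `cse_stage` is hypothetical and is not part of any official CSE tool.

```python
def cse_stage(level: int) -> str:
    """Return the CSE stage for a CSE level (1 to 9).

    Assumes the grouping stated in the text: levels 1-3 are elementary,
    levels 4-6 intermediate, and levels 7-9 advanced.
    """
    if not 1 <= level <= 9:
        raise ValueError(f"CSE level must be between 1 and 9, got {level}")
    stages = ("elementary", "intermediate", "advanced")
    # Integer division groups consecutive levels in threes.
    return stages[(level - 1) // 3]


if __name__ == "__main__":
    for lvl in range(1, 10):
        print(lvl, cse_stage(lvl))
```

A learner placed at level 5, for instance, would fall in the intermediate stage under this grouping.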

Influential Language Proficiency Scales in the World
Modern research on the development of different language proficiency scales has had a fairly long history, spanning approximately 60 years (Han, 2006;Ounis, 2017). Most of the research is concentrated in areas like North America, Australia, and western Europe. But the resulting scales developed have often transcended national boundaries and become widely applied in many parts of the world.
As early as the 1950s, the U.S. became the first country in the world to develop a language proficiency scale, aimed at assessing the oral abilities of its military personnel stationed overseas. Developed by the Foreign Service Institute in 1955, this scale came to be known as the FSI scale. It was originally composed of six main levels, with a plus level between every two adjacent levels, resulting in an 11-point scale ranging from No Proficiency to Functionally Native Proficiency. Due to its huge influence, this scale was later utilized by other agencies of the government, such as the CIA and the FBI, and thus took on the name of the Interagency Language Roundtable (ILR) scale. In the 1980s, it was expanded to include not only the speaking dimension, but also listening, reading, and writing (Herzog, 2006). The official guidelines for the ILR were ratified and put into use in 1985. Two advantages characterize the FSI scale. First, the format of one-on-one discussion between the examiner and the examinee was adopted for the first time as a way to assess the examinees' oral proficiency. This test method, the Oral Proficiency Interview (OPI), also became the most important certified test nationwide. Second, FSI was an exemplar for later language proficiency scales as it initiated the use of descriptors to define a person's oral ability in real-life contexts. However, the FSI scale also has its problems. For example, it does not provide descriptions of a person's overall language ability, and the gradations between the elementary and intermediate levels are not balanced.
As an immigrant country, Canada also has its own proficiency scale called the Canadian Language Benchmarks (CLB). This scale was developed partly as a policy initiative to help new immigrants adapt to their new environment, and partly to dispel the confusion of the testing market in Canada (Fleming, 2015). Released in 2000, CLB comprises three levels, elementary, intermediate, and advanced, with each level covering listening, speaking, reading, and writing. According to the CLB scale, each level comprises three aspects of information: global performance descriptors, performance conditions, and competency outcomes and standards. Global performance descriptors are descriptions of language learners' ability in listening, speaking, reading, and writing. Performance conditions specify the communicative goal, language contexts, interlocutors, topics, and the length of the tasks in question. Competency outcomes and standards provide representative examples of tasks that language learners at a certain proficiency level are capable of performing. What is special about this model is that it is based on the communicative language competence framework, thus reflecting the latest research results from second language acquisition and language testing.
The most well-known and widely applied proficiency scale is perhaps the Common European Framework of Reference for Languages: Learning, teaching, and assessment (CEFR). It is a common framework used across Europe, serving an important function in language teaching, language learning, and language assessment. The designers of this framework took an action-oriented approach in delineating the language learners' language use and learning (North, 2000). In this process, they defined the learners as active social agents, who can mobilize their communicative language abilities to fulfil different tasks using the appropriate strategies.
The language tasks were defined as output tasks, input tasks, and agency tasks (North, 2000). CEFR categorizes language abilities into three strata (elementary, intermediate, and advanced), which correspond to levels A1 and A2, B1 and B2, and C1 and C2, respectively, with A1 indicating the lowest level and C2 the highest. CEFR is comprehensive in that it not only provides global performance descriptors for a language learner or user's language ability, but also provides descriptors at each level and for each task. In that sense, CEFR is multidimensional and stratified. Another advantage of CEFR is the systematic combination of real-life experience with quantitative and qualitative approaches in its design, which makes the framework more comprehensive and trustworthy.
Other commonly used proficiency scales include those designed by the American Council on the Teaching of Foreign Languages (ACTFL) and the Association of Language Testers in Europe (ALTE), as well as the International Second Language Proficiency Ratings (ISLPR), among others.

Comparisons of CSE with Existing Proficiency Scales
Despite the widespread use of the above scales and others, there are also problems connected to them. For example, some scales (e.g., ACTFL, CLB) have not been empirically tested (Alhussain, 2019; Tigchelaar, Bowles, Winke, & Gass, 2017), and thus may raise reliability and validity concerns. Even CEFR, authoritative as it is, has been criticized by some scholars for taking a broad-brush approach in its scaling process (Wisniewski, 2017).
Compared with these scales, CSE enjoys several advantages. First, it is clearly based on Bachman's (1990) communicative language competence framework, and draws on Anderson, Krathwohl and Bloom's (2001) redefinition of the cognitive domain as the intersection of the Cognitive Process Dimension and the Knowledge Dimension, which are represented as hierarchical steps. Different cognitive tasks thus serve to differentiate communicative tasks in terms of their complexity. In this sense, CSE is more firmly grounded in existing theories than some other scales. Second, the descriptors of CSE refine the systematic approach by defining listening speed using quantitative values, providing a clear criterion for scaling different levels of listening difficulty. This is a step forward compared with CEFR, which does not specify such values to define different levels of ability (Min, He, & Luo, 2018). Third, the scaling of CSE descriptors in terms of their difficulty is based on a combination of teacher judgement and students' self-reporting in the form of questionnaires. The scores obtained from these questionnaires were analyzed using Rasch modelling, which yields a specific value for each descriptor and for each student's ability (Min, He, & Luo, 2018). Thus, the decision to scale the descriptors was rooted in informed data collection and analysis, making the final products more reliable. In addition, CSE distinguishes itself from current scales in that it focuses specifically on students in schools. Many of the international proficiency scales are designed for the general population, and thus may not be appropriate for certain learning situations or groups. With CSE, the student population is the main concern, reflecting both the reality of EFL education in China and an effort to complement the existing scales.
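For readers unfamiliar with the Rasch modelling mentioned above in connection with the scaling of descriptors, the simplest (dichotomous) form of the model can be written as follows. This is the textbook formulation and not necessarily the variant used in the CSE validation work; questionnaire ratings with more than two response categories would typically call for a polytomous extension. The symbols $\theta_n$ (the ability of respondent $n$) and $b_i$ (the difficulty of descriptor $i$) are notation introduced here for illustration.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

Because ability and difficulty are expressed on the same logit scale, each descriptor receives a difficulty value that can be compared directly with learners' ability estimates, which is what allows descriptors to be ordered into levels.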

Going Global: Linking CSE to IELTS and Aptis
As the first of its kind, the successful link of IELTS and Aptis to CSE marks a significant step towards making the CSE scale part of the global examination system, and promises further steps towards making China's language testing part of a global enterprise.
The results of this linking are presented as a juxtaposition of the cut scores of IELTS and Aptis and their corresponding levels on the CSE scale. According to the results, a score of 6 in IELTS reading is commensurate with CSE level 6; a score of 7 in IELTS speaking is equivalent to CSE level 8; and a score of 14 in Aptis listening corresponds to CSE level 3. Once IELTS scores reach 6 or higher, they are equal to the corresponding CSE levels. The detailed score correspondences are displayed in Figure 2 and Figure 3.

Future Development of CSE in a Global Context
One crucial step following the release of the CSE scale is to design language tests based on it. A brief overview of the major language proficiency scales in the world suggests that they are all accompanied by their respective tests. The FSI scale, for example, is applied in language testing in the form of the Oral Proficiency Interview (OPI). The CLB scale is accompanied by the Canadian Language Benchmarks Assessment (CLBA). In China, where English language teaching and learning is assigned paramount importance, there has been a diverse array of language tests designed for various purposes. Each of these tests may be built on a distinct language assessment framework. In most cases, however, there is no clear theoretical underpinning for such tests (Cheng, 2008). Commonly used English language tests include CET Band 4, CET Band 6, TEM 4, TEM 8, etc., which together make for a confusing language testing market. To dispel this confusion, it is imperative for the government and other authorities to design a test based on the CSE scale, which must be widely applicable so as to meet the diverse needs of test takers in China's language learning market. As China deepens its reform in education, this would improve fairness and efficiency in recruiting students to colleges and universities, stem the proliferation of language tests in the market, and provide a common metric for multiple stakeholders in the testing process. And with China opening up more and more to the rest of the world, such a step would also help cultivate a new generation of capable language learners and users who will rise to the challenges of the 21st century, which could contribute to China's standing and representation in the global landscape.
Another important step is to increase the connectivity of CSE to the global examination system. With IELTS and Aptis already linked to China's CSE, China has made significant strides in integrating its language proficiency standards into this system. The next step, according to the authorities on this subject, is to link TOEFL to China's CSE. This would consolidate the status of CSE in the global examination market, and place these different tests side by side to allow direct comparison. As transnational and intercultural communication becomes the order of the day, student exchanges are expected to become commonplace. As long as English remains the global lingua franca it is today, English language testing is inevitable. Whether students choose one test over another, or compare the results of one test with those of another, a comparable relationship between the different tests is required.
Language tests are never for testing purposes only. Instead, they carry social, economic, and political implications for all those involved in the testing process. CSE's link with other tests in the world thus holds promise to deepen China's ties to other countries in many aspects, such as educational cooperation, economic partnerships, and beyond.

Conclusion
This article sketched the internationalization of China's Standards of English Language Ability (CSE) as it took shape in China and became integrated into the global examination system. An introduction was first given to the rationale behind its design, the content of the scale, and its possible applications in language learning, language teaching, and language assessment. Some of the major language proficiency scales in the world were then presented and evaluated regarding their unique characteristics, with the goal of providing a historical context in which to view CSE. The successful link of IELTS and Aptis to CSE was discussed with reference to the linking results and its possible implications. The future development of CSE was also brought into focus by highlighting its possible trajectory in the global context. As an initial step for China to engage with the world, the development of CSE and its link with IELTS and Aptis have wider relevance to many other forms of international communication between China and other countries in the future.