A Study of Lexical Chunks in Different Moves of Abstracts in Native and Chinese Applied Linguistics Journals

Abstracts of research articles (RAs) are expected to follow established norms, and abstracts that conform to them can improve an article's chances of being accepted. Analyzing how abstracts are written in papers published in well-recognized international journals can therefore serve as a reference that helps writers meet these norms. Adopting Hyland's classification of lexical chunks and Santos' move framework, this paper uses a corpus-based method to compare how native and Chinese writers use lexical chunks in abstracts from the perspective of move analysis. The results show that Chinese writers tend to use more lexical chunks. There is no significant difference in the overall functional distribution, but significant differences are found in two moves, "situating the research" and "discussing the research".


Introduction
Norms are important for academic writing. Hyland et al. (2017) conclude that informality has become a prevailing phenomenon. Huang et al. (2010) find that linguistic problems, negative transfer from the mother tongue, and the excessive time needed for composition are common problems encountered by scholars in both Hong Kong and mainland China, and that the linguistic problems stem from differences in writing style between languages. Genre analysis can reveal what makes academic writing formal. The move is a concept introduced by Swales (1981) for genre analysis. Swales (2004) claims that moves are the rhetorical units through which a discourse achieves its communicative purpose. A move can be further divided into one or more sub-moves. Different discourses have different moves or sub-moves, which show how each discourse is effectively organized to serve its communicative purposes.
Abstracts of RAs have two communicative purposes: to inform and to persuade. Abstracts that fail to follow the norms can give the impression that the writer lacks knowledge and may lead to a paper being rejected (Li, 2019). A thorough study of how abstracts are written is therefore necessary. Santos (1996) constructs a five-move scheme for RA abstracts in applied linguistics: situating the research, presenting the research, describing the methodology, summarizing the results, and discussing the research. Each of the five moves has sub-moves, as shown in Table 1. This scheme gives a macro description of abstract writing, but applying it in the writing process also requires a micro description. In other words, how language carries the scheme needs further investigation.
Analysis of lexical chunks is chosen for the micro description not only because it is a data-driven, and thus bottom-up, method of language study, but also because lexical chunks are closely related to writing style. Some researchers have already identified certain lexical items as critical markers of moves (Swales, 1981; Cortes, 2013). Biber et al. (1999, 2004, 2007) set high frequency as a criterion and use corpora to extract frequent strings of words, which are called lexical bundles or chunks. Biber's series of studies indicates that analyzing the grammar and discourse functions of lexical chunks can distinguish formal written language from informal spoken language. Hyland (2008) shares this view with Biber et al. and further classifies lexical chunks into three categories more closely related to academic writing: research-oriented chunks, text-oriented chunks, and participant-oriented chunks.
This study combines the macro and micro perspectives: lexical chunks, classified according to Hyland's taxonomy, are studied in the moves and sub-moves of abstracts defined in Santos' scheme. Two corpora of abstracts from well-recognized journals are built, one from native speakers and the other from writers in mainland China. Comparing them reveals similarities and differences in three aspects: the overall frequency of lexical chunks, the overall discourse-function distribution of lexical chunks, and the discourse-function distribution in different moves or sub-moves. The comparison should shed some light on the norms of academic abstract writing.

Literature Review
Hyland (2008) groups lexical chunks grammatically and functionally and studies them in RAs from four disciplines: business, biology, information technology, and applied linguistics (AL). Discipline-related variation is found in both grammatical forms and discourse functions. For example, compared with the soft-knowledge fields, science and engineering texts employ more passive + prepositional phrases and anticipatory structures, which serve the communicative purposes of marking logical relations, identifying tabular or graphic displays of data, and downplaying the authors' roles. Moreover, in terms of discourse function, research-oriented chunks are concentrated in the hard sciences, giving a laboratory-focused sense, because it is especially important there to describe research procedures and show how experiments have been done. In contrast, soft-science fields such as AL and business employ text-oriented chunks the most, which indicates an emphasis on presenting arguments and evaluations. The study specifically notes that there is little overlap of lexical chunks across disciplines, so it seems impossible to compile a list, let alone a dictionary, of high-frequency lexical chunks for academic writing in general. It concludes that patterns of lexical chunks are discipline-specific and depend on communicative purposes. This conclusion is later supported by both Beker (2019) and Gilmore and Millar (2018). Beker focuses on academic literature in engineering and finds considerable overlap in strings of language. Gilmore and Millar, examining civil engineering, also find that more than half of the lexical bundles are research-oriented, "reflecting the 'real world' focus of the specific field and its descriptions of physical objects, materials, contexts, processes, and quantities" (Gilmore & Millar, 2018, p. 12).
Besides this horizontal analysis of lexical chunks across disciplines, there is a growing tendency to study them along the internal structure of texts, that is, in the moves or sub-moves of written discourse.
Cortes (2013) compiles a corpus of RA introductions from thirteen disciplines and extracts highly frequent lexical bundles of different lengths. The study sets out to relate lexical chunks to moves by listing each move together with its most frequently used chunks. However, no connection is found between moves and the grammatical or discourse functions of the chunks. Instead, chunk length matters more: the longer a chunk is, the stronger its association with a specific move or step. In other words, some longer chunks appear to be matched exclusively with certain moves. These longer chunks often function as triggers of a move or step, or as complements that complete clauses that may trigger moves. Wright (2019) holds that answering the question of which lexical chunks are frequently used in which moves can support the writing of stand-alone literature reviews. His results indicate that even though more bundles are used in stand-alone literature reviews overall, the distribution of the bundles' discourse functions is similar to that in other RAs, but "referential bundles were far more frequent" (2019, p. 12). Two chunks are singled out because they are frequent in these reviews but less common in other RAs: "it is possible that" and "it should be noted". The reason may lie in the nature of reviews, in which it is common and necessary for writers to show their stance, give comments, and make evaluations.
Kashiha (2019) studies the conclusion sections of academic writing by non-native writers. Far more noun-phrase and prepositional-phrase chunks are found than verb-phrase and dependent-clause ones. Rather than a one-to-one correspondence between moves and lexical chunks, some specific chunks are used repeatedly across several moves, which is characteristic of non-native writers.
Some researchers focus specifically on lexical chunks in the moves of abstracts. Omedian et al. (2018) take lexical chunks in abstracts from six disciplines as their research subject and analyze how their discourse functions are distributed across moves, to find out how the chunks achieve different communicative purposes in different moves. They conclude that "hard" sciences such as biochemistry or medical science promote their articles by foregrounding research methods and instruments, while "soft" sciences such as linguistics do so by highlighting research objectives and contributions.
Liu (2019) provides both interdisciplinary and intradisciplinary perspectives by comparing abstracts in chemistry and linguistics on the one hand, and abstracts by native and non-native writers on the other. The results show that disciplinary features are present in the writing of both native and non-native speakers. Native speakers use more passive bundles in chemistry than in linguistics to create an objective tone. Chinese writers are also aware of disciplinary variation, using bundles similar to those of native speakers in both disciplines, especially in linguistics. Despite these similarities, however, Chinese writers use a less varied range of chunks. What distinguishes Liu's study is its use of statistical tests to determine whether the differences across disciplines and between the two groups of writers are significant.
Li et al. (2020) study five-word lexical chunks in Ph.D. abstracts because five-word chunks carry more information and can therefore more easily form a one-to-one connection with rhetorical moves, which are considered more intensive in Ph.D. abstracts. Based on the extracted chunks, the study adds one more move, structure, to the five traditional moves used in previous studies. Two features of this study are worth mentioning. First, only lexical chunks in sentence-initial position are chosen as research targets, to avoid extensive overlap between chunks such as "is one of the" and "one of the most". Second, after constructing a move scheme "unique for Ph.D. abstracts", the authors apply it to analyze the grammatical structures and functions of the lexical chunks, and then describe linguistic features by investigating which structures are most used in moves with specific functions. This investigation and description yield lexico-grammatical patterns. The findings are inspiring for ESP teaching. For example, noun phrases with of-phrase post-modifier fragments, noun phrase + verb phrase fragments, and noun phrase + be are the most frequent structures in different moves. "Verb tenses and aspects of voice are also found as signals of rhetorical moves". Unlike previous studies, this study finds no prepositional phrase chunks.
Analysis of lexical chunks from the perspective of rhetorical moves has become a trend because it shows writers how to use language items to compose RAs that follow the norms, which makes it both practical and meaningful. Despite all the enlightening previous research, whether there is a one-to-one connection between moves and chunks remains controversial, and whether certain kinds of chunks are preferred remains unsettled. My study attempts to address these unsettled questions using statistical tests. Moreover, comparing RAs by native and non-native speakers will make the existing problems in non-native speakers' RAs explicit, so that corresponding suggestions can be given. Few researchers have examined the English translations of RA abstracts by non-native speakers, although these translations are widely acknowledged to be important, since they make research achievements known to and accepted by researchers not only in the writers' own country but all over the world. Therefore, this study examines the English versions of abstracts by Chinese writers.

Research Methodology
A corpus-based, data-driven approach is adopted to extract high-frequency lexical chunks, and the chi-square test is used to determine whether the differences between groups of data are significant.
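As an illustration of the test, the sketch below computes Pearson's chi-square for a 2×2 contingency table in plain Python, without Yates' continuity correction, and derives the p-value for df = 1 from the complementary error function. The function name and the table layout are assumptions made for illustration. Assuming the comparison of chunk variability reported in the results was run on a table of chunk types and chunk tokens per corpus (33 and 203 for NWC, 43 and 447 for CWC), the sketch reproduces the reported χ² = 4.609 and p = 0.032; the text itself does not state how the table was arranged.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test on the 2x2 table [[a, b], [c, d]].

    No continuity correction is applied. Returns (chi2, p) with df = 1.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Shortcut formula for 2x2 tables, equivalent to summing (O - E)^2 / E.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For df = 1, chi2 is the square of a standard normal variable,
    # so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


if __name__ == "__main__":
    # Rows: NWC, CWC; columns: chunk types, chunk tokens (figures from the results).
    chi2, p = chi_square_2x2(33, 203, 43, 447)
    print(f"chi2 = {chi2:.3f}, df = 1, p = {p:.3f}")  # prints: chi2 = 4.609, df = 1, p = 0.032
```

For comparison, `scipy.stats.chi2_contingency` returns the same statistic when called with `correction=False`; by default it applies Yates' correction to 2×2 tables, which gives a smaller value.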

Corpus Construction
This study aims to find out the differences and similarities between native speakers and Chinese writers and therefore focuses on a single discipline, AL. Two corpora are built: one of abstracts from native writers' academic papers (NWC, for native writers' corpus) and one of the English versions of Chinese writers' abstracts (CWC, for Chinese writers' corpus). For NWC, 40 papers published between 2009 and 2020 are randomly chosen from each of three SSCI journals, namely English for Specific Purposes, Applied Linguistics, and TESOL Quarterly, for a total of 120 papers and 20495 words. For CWC, 60 papers from the same period are randomly chosen from each of three core AL journals, namely Foreign Language World, Modern Foreign Languages, and Foreign Language Teaching and Research, for a total of 180 papers and 21552 words. In this way, NWC and CWC contain a similar number of words and include research from different but well-recognized journals written by different authors. In building NWC, the authors' names and background information are further checked to confirm that they are native speakers. Abbreviations such as SLA and TESOL are treated as single units, and meaningless markers and garbled codes are deleted.

Move Identification
Moves and steps in the abstracts are identified against explicit references. First, Santos' (1996) taxonomy is adopted. Second, the author and a co-researcher who is receiving doctoral training study the example sentences in Santos' article in detail to gain a clear picture of the communicative purposes of the moves and their sub-moves. The coding is then carried out, and it is considered complete, with the final results adopted, only when the two coders agree on more than 80% of the codes. All discrepancies are resolved through discussion between the two coders. Each sentence is coded according to the main communicative purpose of its main clause rather than that of any subordinate clause.
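The agreement criterion can be made concrete with simple percent agreement, the share of sentences to which both coders assign the same move. The sketch below is a minimal illustration, assuming each coder's output is a list of move labels in sentence order; the function names, the labels "M1" to "M5", and the sample codings are hypothetical, and the strict "more than 80%" comparison follows the wording of the procedure above.

```python
from typing import Sequence


def percent_agreement(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Proportion of units (here, sentences) given the same label by both coders."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must label the same non-empty list of sentences")
    matches = sum(1 for x, y in zip(coder_a, coder_b) if x == y)
    return matches / len(coder_a)


def coding_complete(coder_a: Sequence[str], coder_b: Sequence[str],
                    threshold: float = 0.80) -> bool:
    """Coding counts as complete only when agreement is MORE than the threshold."""
    return percent_agreement(coder_a, coder_b) > threshold


if __name__ == "__main__":
    # Hypothetical move codes for ten sentences (M1-M5 = Santos' five moves).
    author = ["M1", "M2", "M2", "M3", "M4", "M5", "M4", "M5", "M1", "M2"]
    co_researcher = ["M1", "M2", "M3", "M3", "M4", "M5", "M4", "M5", "M1", "M2"]
    print(percent_agreement(author, co_researcher))  # 0.9
    print(coding_complete(author, co_researcher))    # True
```

Percent agreement is the simplest possible measure and does not adjust for chance agreement; it is used here only because it matches the 80% criterion as stated.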
After move identification, NWC and CWC are each divided into five sub-corpora, named NWC 1 to 5 and CWC 1 to 5, where Number 1 stands for the first move and Number 5 for the last; for example, NWC 1 is the corpus of first moves in native writers' abstracts. Altogether there are ten sub-corpora, and lexical chunks are extracted from each of them.

Lexical Chunks Collection and Grouping
The N-gram function of AntConc 3.2.4 is used to collect three-word lexical chunks with a minimum frequency of 6 occurrences in at least five different RAs, applied to all ten sub-corpora. Three-word chunks are chosen because, although a correspondence has been found between moves and chunks of five or more words, no such finding exists for shorter chunks, and three-word chunks such as "in terms of" and "in order to" are easy to memorize and readily applicable in writing.
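The extraction criteria can be approximated in plain Python as follows. This is a sketch, not AntConc's implementation: the lowercase regex tokenizer, the splitting at sentence punctuation so that chunks do not cross sentence boundaries, and the function names are all assumptions. Each sub-corpus is assumed to be a mapping from an abstract identifier to its text, so that range (the number of different abstracts containing a chunk) can be counted alongside frequency.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")  # simplified word definition (assumption)
Gram = Tuple[str, str, str]


def trigrams(text: str) -> List[Gram]:
    """Three-word sequences that do not cross sentence-level punctuation."""
    grams: List[Gram] = []
    for sentence in re.split(r"[.!?;:]+", text.lower()):
        tokens = WORD.findall(sentence)
        grams.extend(zip(tokens, tokens[1:], tokens[2:]))
    return grams


def extract_chunks(abstracts: Dict[str, str], min_freq: int = 6,
                   min_range: int = 5) -> Dict[str, Tuple[int, int]]:
    """Map each qualifying chunk to (frequency, range), most frequent first.

    A chunk qualifies if it occurs at least `min_freq` times in total and
    in at least `min_range` different abstracts of the sub-corpus.
    """
    freq: Counter = Counter()
    spread: Dict[Gram, Set[str]] = defaultdict(set)
    for abstract_id, text in abstracts.items():
        for gram in trigrams(text):
            freq[gram] += 1
            spread[gram].add(abstract_id)
    return {
        " ".join(gram): (count, len(spread[gram]))
        for gram, count in freq.most_common()
        if count >= min_freq and len(spread[gram]) >= min_range
    }


if __name__ == "__main__":
    # Toy input, not corpus data: six abstracts sharing one sentence.
    toy = {f"A{i}": "Results differ in terms of accuracy." for i in range(1, 7)}
    print(extract_chunks(toy)["in terms of"])  # (6, 6)
```

The range filter is what keeps a chunk repeated many times by a single author out of the list: six occurrences of "in terms of" inside one abstract give a range of 1 and are discarded.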
After collection, the lexical chunks are grouped according to Hyland's taxonomy, with reference to a summary of the examples given in Hyland's paper. The grouping is considered valid only when the author and the co-researcher reach 80% agreement.

Results and Discussion
In this section, comparisons will be made between NWC and CWC in terms of the overall frequency of the collected lexical chunks, the overall distributions of discourse functions, and the distributions of different discourse functions in specific moves and sub-moves.

Comparison of the Overall Frequency of Lexical Chunks
The results show a total of 33 types of lexical chunks in NWC with 203 tokens, accounting for 0.99% of all tokens in NWC, so the token/type ratio (TTR) is 6.15. In CWC, there are 43 types and 447 tokens of lexical chunks, accounting for 2.07% of all tokens in CWC, and the TTR is 10.40. The higher the TTR, the less varied the lexical chunks. Chinese writers use more chunks than native writers (2.07% vs. 0.99%), and a chi-square test shows that the difference is significant (χ² = 80.705, df = 1, p < 0.001). Chunks in CWC are also less varied than those in NWC (10.40 vs. 6.15), again a significant difference (χ² = 4.609, df = 1, p = 0.032 < 0.05).
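The proportions and ratios above follow from simple arithmetic, sketched below. Two assumptions are made: the total token counts are taken to be the corpus sizes given under Corpus Construction (20495 and 21552 words), and the ratio is computed as tokens per type, as defined above, which is the inverse of the conventional type/token ratio (types divided by tokens). The names `CORPORA` and `chunk_profile` are hypothetical.

```python
from typing import Tuple

# Counts reported in the results; "words" is the corpus size from Corpus Construction.
CORPORA = {
    "NWC": {"types": 33, "tokens": 203, "words": 20495},
    "CWC": {"types": 43, "tokens": 447, "words": 21552},
}


def chunk_profile(types: int, tokens: int, words: int) -> Tuple[float, float]:
    """Return (chunk tokens as % of all words, chunk tokens per chunk type)."""
    share = round(tokens / words * 100, 2)
    tokens_per_type = round(tokens / types, 2)  # higher value = less varied chunks
    return share, tokens_per_type


if __name__ == "__main__":
    for name, c in CORPORA.items():
        share, ratio = chunk_profile(c["types"], c["tokens"], c["words"])
        print(f"{name}: {share}% of tokens, token/type ratio {ratio}")
    # NWC: 0.99% of tokens, token/type ratio 6.15
    # CWC: 2.07% of tokens, token/type ratio 10.4
```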

Comparison of Functional Distributions of Lexical Chunks
Some studies (Hu & Yan, 2017; Liu & Lu, 2020) indicate that participant-oriented chunks are significantly less frequent than research-oriented and text-oriented ones. In contrast, this study finds no participant-oriented chunks in either CWC or NWC. Two reasons may account for this. First, participant-oriented chunks are used to involve authors or readers and show their joint engagement with the text, and they often appear in the analysis of results. For example, "can be seen" and "we can see", two participant-oriented chunks, are used to draw readers' attention to data or a finding. Second, participant-oriented chunks convey the authors' attitudes and opinions, which often appear in the discussion of results. For example, "is due to", "it is possible", "it is important", and "it is necessary" are all participant-oriented, and they can be used to interpret data, provide reasons, or make assertions when discussing data. Abstracts, however, are highly condensed. In the fourth move, summarizing the results, results are presented directly, with no need to involve readers. In the final move, discussing the research, once the importance of the results has been indicated, there is little space left for the authors' further personal opinions. These two reasons explain the absence of participant-oriented chunks in RA abstracts.
Therefore, the discussion in the following sections focuses only on research-oriented and text-oriented chunks (hereinafter "R&T" chunks). Tables 2 and 3 show the distribution and proportion of types and tokens of "R&T" chunks in CWC and NWC respectively. The distributions of "R&T" chunks in CWC and NWC are similar: in both corpora, research-oriented chunks outnumber text-oriented ones in both types and tokens.
There is no significant difference between CWC and NWC in the proportion of types (χ² = 0.258, df = 1, p = 0.593 > 0.05), nor in the proportion of tokens (χ² = 3.611, df = 1, p = 0.057 > 0.05).

Comparison of Functional Distributions in Moves and Sub-Moves
To further compare how CWC and NWC use "R&T" chunks, the following table shows how the two types of lexical chunks are distributed across specific moves (in tokens). The data show significant differences between CWC and NWC in the use of "R&T" chunks in two moves, situating the research and summarizing the results. In situating the research, CWC uses research-oriented chunks more frequently than NWC but text-oriented chunks less frequently (χ² = 6.081, df = 1, p = 0.014 < 0.05). In summarizing the results, CWC uses both types of "R&T" chunks more frequently than NWC (χ² = 6.081, df = 1, p = 0.048 < 0.05).
To find out what causes these differences, the two moves in which significant differences are found are examined more closely: the five most frequent "R&T" chunks of each move in CWC and NWC are listed and compared in detail. Comparing Tables 6 and 7 shows that the five most frequent chunks of CWC in situating the research are all research-oriented, together accounting for 48.6%. Among them, "College English Teaching" describes a topic, "one of the" refers to a number, "the development of" introduces a status, "at home and (abroad)" describes a range, and "the study of" introduces research objects. In NWC, however, only three research-oriented chunks appear in the top five, accounting for 46.4% of the total, and they all describe topics, so they are less varied in function than those in CWC. This reflects NWC's tendency to introduce research topics briefly, whereas CWC provides more details, such as range or quantity, when situating the research. Noteworthy are two text-oriented chunks in NWC, "as well as" and "in order to". "As well as" is popular in both NWC and CWC. In NWC, it is used in all moves except the last one, discussing the research, and is the fourth most frequent of all chunks. In CWC, "as well as" is also common, used mostly in presenting the research to connect two research subjects, and ranks fifth; for example, "this research has used the matched guise method… to explore the effects of teacher-learner homophily on the results of learners acquisition as well as learners' teacher evaluation among Chinese L2 Learners". "In order to" ranks fifth in NWC and is evenly distributed across all moves except discussing the research, serving as a cohesive device that states purposes. However, "in order to" is not frequent in CWC.
A similar chunk, "so as to", does not rank even among the ten most frequent chunks in CWC, indicating that Chinese writers rarely build cohesion by stating purposes.
Chunk | Function | Tokens (%)
English language learners | Research-oriented | 4 (9.8%)
as well as | Text-oriented | 4 (9.8%)
in terms of | Text-oriented | 4 (9.8%)
the use of | Research-oriented | 3 (7.3%)
in order to | Text-oriented | 3 (7.3%)
the role of | Research-oriented | 3 (7.3%)
analysis of the | Research-oriented | 3 (7.3%)
in the context (of) | Text-oriented | 3 (7.3%)
Note. Five lexical chunks share the same frequency and thus jointly occupy the fifth position.
Comparing Tables 8 and 9 shows that in discussing the research, CWC and NWC share a high-frequency chunk, "in terms of", which is used to specify the conditions under which a conclusion is drawn. Here are two examples, one from each corpus: "In terms of instructional supports for international students, both students and faculty reported using the or appreciating visual (NWC)." "The critical thinking ability of the students excess weight increased in terms of analysis, inference and evaluation skills… (CWC)." Compared with NWC, CWC relies more on text-oriented chunks in conclusions, with its four most frequent chunks all being text-oriented. Apart from "in terms of", the other three, "results show that", "results indicate that", and "is found that", are all used to introduce results. In NWC, by contrast, results are presented directly, without such chunks serving as discourse markers that signal a new move. Moreover, there is only one research-oriented chunk in CWC's top five, "Chinese EFL learners" (7.1%), which refers to topics, whereas NWC has more research-oriented chunks in its top five, including "English language learners" (9.8%), "the use of" (7.3%), "the role of" (7.3%), and "analysis of the" (7.3%). This finding seemingly contradicts the chi-square test, which indicates that CWC uses both types of "R&T" chunks more than NWC.
One possible reason for this contradiction lies in CWC's use of research-oriented chunks other than "Chinese EFL learners" outside the top five. Other research-oriented chunks with more varied functions, such as "Chinese and English (English and Chinese)" (5%), "the acquisition of" (2.9%), and "College English Teaching" (2.9%) for topics and "there is a" (4.3%) for quantity, do not rank in the top five but are still commonly used in CWC. Among them, "there is a" is found only in CWC. However, the definite reason for the contradiction is not established in my study.

Conclusion
This study adopts a corpus-based method, taking RA abstracts as its subject, and compares how Chinese writers and native writers use lexical chunks from the perspective of move analysis.
The five main findings are as follows.
(1) Chinese writers tend to use more, but less varied, chunks than native writers.
(2) There are no participant-oriented lexical chunks in the abstracts of either Chinese or native writers' RAs. The "short in length but condensed in information" characteristic of abstracts may explain this: the interaction between writers and readers, which is often realized through participant-oriented chunks that involve readers in the reading process, is usually sacrificed to make room for other information, such as the research topic and sample size.
(3) In terms of the overall distribution of discourse functions, there is no significant difference between Chinese and native writers. In both corpora, research-oriented chunks account for a higher proportion than text-oriented chunks.
(4) In terms of the distribution of discourse functions in specific moves, significant differences are found in two moves. In situating the research, Chinese writers use more research-oriented but fewer text-oriented chunks, and their research-oriented chunks are more varied semantically, which indicates a difference in writing style between the two groups of writers. In discussing the research, Chinese writers use more research-oriented and text-oriented chunks. Notably, Chinese writers tend to use text-oriented chunks that announce results before stating them, while native writers state results directly.
(5) In the two moves where significant differences are found, among the five most frequent lexical chunks, "as well as" appears in both corpora, but it is used to situate the research by Chinese writers and to discuss the research by native writers. "In terms of" is also shared by both groups, who use it in discussing the research to specify the conditions of conclusions. In situating the research, native writers have a high-frequency chunk, "in order to", which is not found in any move of Chinese writers' abstracts. In discussing the research, Chinese writers have a high-frequency chunk, "there is a", which is not found in any move of native writers' abstracts.