Effectiveness of Corpus in Distinguishing Two Near-Synonymous Verbs: Damage and Destroy

This study explores how corpus-based approaches can be used to distinguish English near-synonyms effectively. Specifically, it drew source data from the British National Corpus (BNC) and used Sketch Engine (SkE) as the analysis tool to compare the near-synonymous pair damage and destroy, which is commonly misused by Chinese-speaking learners of English, in terms of frequency, genre distribution, colligation, collocation, and differences in meaning and use. The results show that damage and destroy are near-synonyms because they are related words and share most collocates, but they are not fully intersubstitutable in certain contexts. Words related to the human body or physical health collocate more with damage, whereas words related to military affairs and to one's thoughts or beliefs collocate more with destroy. In addition, the core meaning of damage emphasizes something that can be recovered but no longer works as well as before, while destroy more often denotes something that no longer exists. Furthermore, British speakers tend to use the two near-synonyms together with the same word to create a build-up, because destroy expresses a stronger degree of destruction than damage. The study concludes by suggesting that corpus-based analysis should be promoted in language teaching and learning to improve learners' accurate use of English vocabulary.


Introduction
The English language has a large number of synonyms. Liu & Espino (2012) note that this richness enables English speakers "to convey meanings more precisely and effectively" (p. 198). However, such variation in meaning and usage poses a considerable challenge for English language teaching and learning (Ahmad et al., 2019). Traditionally, dictionaries have been the main reference for language teachers and learners seeking to discriminate between synonyms. Although dictionaries offer the general, core meanings of synonyms, they provide little information on the nuances that separate near-synonyms or on where their interpretations overlap. For example, according to the Oxford Advanced Learner's Dictionary (OALD) (2009), damage denotes "to harm or spoil sth/sb" (p. 500), while destroy means "to damage sth so badly that it no longer exists, works, etc." (p. 543). In this comparison, destroy shares some basic meaning with damage, since one is defined by recourse to the other. This semantic overlap between the definitions creates potential for ambiguity. In addition, the definitions do not sufficiently delimit the contexts in which each verb is used. Consequently, synonym distinction and appropriate lexical choice are daunting for language teachers and learners (Mackay, 1980).
The advent of corpus linguistics has brought a major shift in vocabulary studies. Language educators and researchers can use a corpus, "a large collection of authentic texts that have been selected and organized following precise linguistic criteria" (Sinclair, 1991, 1996; Leech, 1991; Williams, 2003), to carry out linguistic analysis (e.g., of lexis and multiword phrases). The corpus-based approach to language analysis is endorsed by many scholars and researchers (e.g., Shahzadi et al., 2019; Flowerdew, 2013; Albader, 2001; Richard & Tony, 2006). They believe that this approach is more reliable because authentic data, rather than intuition, can help language teachers and researchers find differences in language use. They also argue that an electronic corpus gives access to far larger amounts of text than could be examined manually. Moreover, it is an effective computational means of revealing patterns that may not be obvious to the naked eye. Additionally, corpus analysis is well suited to uncovering the similarities and differences among near-synonyms, and it helps identify more specific criteria and suggestions for the usage of these apparently similar and interchangeable words. Therefore, this approach is applied in the present study to investigate, in the British National Corpus, two English near-synonymous verbs, damage and destroy, which are commonly misused by Chinese-speaking learners of English.

Synonyms
Synonymy is one of the relations that exist between different lexical items. Two types of synonyms, namely "perfect or absolute synonyms" and "near synonyms", are mentioned in previous studies. According to Lyons (1995), "perfect or absolute synonyms" refer to a pair of synonyms in which (a) "all meanings [being compared] are identical"; (b) two words are "synonymous in all contexts", and (c) they are "semantically equivalent on all dimensions of meaning, descriptive and non-descriptive" (p. 61). "Near synonyms" are defined by Cruse (1986) as "lexical items whose senses are identical in respect of 'central' semantic traits, but differ […] in 'minor' or 'peripheral traits'" (p. 237). Many researchers agree that most synonyms are likely to be near-synonyms. For example, Taylor (2003) argued that "it is commonly asserted that 'perfect', or 'full', synonyms do not exist, or if they do, they are exceedingly rare" (p. 265). Divjak & Gries (2006) also noted that "even if synonyms name one and the same thing, they name it in different ways; they present different perspectives on a situation" (p. 24). Thus the term "near synonyms" is more widely used by linguists than "synonyms". Since near-synonyms are "not fully intersubstitutable" (Inkpen & Hirst, 2006, p. 223), it is important to identify how they vary across contexts and perspectives through corpus-based language analysis.

Corpus-Based Approach for Discriminating Near Synonyms
Many scholars have found that near-synonyms differ from a semantic, syntactic, or pragmatic point of view (e.g., Cruse, 1986; Taylor, 2003; Divjak & Gries, 2006). To capture these differences, corpus linguists make extensive use of computers to apply frequency-based or statistical analyses to the linguistic features of near-synonyms, comparing them in terms of collocation, colligation, semantic preference, and semantic prosody. These four parameters, which take different values and range from concrete to abstract, are regarded by Sinclair (1996) as the internal structure of words. His core notion is that lexical meaning does not reside purely at the level of the word, because a word, as a unit of meaning, is related to the other words around it (Sinclair, 2004, p. 27).
The first parameter is collocation, which is defined as "the items in the environment set by the span" (McEnery & Hardie, 2012, p. 107). In corpus studies, collocation is regarded more in terms of probability, where "the strength of a particular collocation is assessed on the basis of how frequently it appears in a large representative sample of discourse" (Walker, 2011). Moreover, many corpus linguists assume that the term only refers to "significant collocations" which co-occur more frequently than "their respective frequencies and the length of the text in which they appear would predict" (Sinclair et al., 2004, p. 10). To assign a precise degree of significance to each co-occurrence, statistical measures such as the MI (mutual information), z, t, log-likelihood, log-log, and MI3 scores are used to measure collocational strength (Richard & Tony, 2006). One example of collocation is that, in a large general corpus, the word powerful is likely to collocate with concrete nouns like cars, computers, and countries, while strong is more closely associated with abstract nouns and concepts, such as sense, feeling, and belief (Castello, 2014).
The second parameter is colligation. The concept refers to "the interrelation of grammatical categories in syntactical structure" (Firth, 1957, p. 12). The difference between colligation and collocation is that the former concerns a word's grammatical functions while the latter concerns a word's lexical inter-relations. An example of colligation is that consequence has a very low likelihood of appearing as the object of a clause, in contrast to preference and use (Hoey, 2005).
The third parameter is semantic preference. It is defined "by a lexical set of frequently occurring collocates [sharing] some semantic feature" (Stubbs, 2002, p. 449). For instance, Partington (2004, p. 148) found that "absence/change of state" is a common feature of the collocates of maximizers such as utterly, totally, completely, and entirely. This finding shows that semantic preference is useful for developing a profile of a word and for understanding how certain collocates can be "bound together in extended units of meaning" (Sinclair, 1996).
The fourth parameter is semantic prosody. Louw (1993), who popularized the term, defined it as a "consistent aura of meaning with which a form is imbued by its collocates" (p. 157). In other words, semantic prosody is the affective meaning that a given word acquires from its typical collocates (Stubbs, 2001). A prosody can be favorable, neutral, or unfavorable (Partington, 2004). Cause, for example, has an unfavorable semantic prosody because it co-occurs regularly with words like accident, cancer, death, etc. (Stubbs, 1996, pp. 173-174).
In addition to the parameters above, non-linguistic features can also be examined in a corpus, such as varieties defined by register and period of time (Biber et al., 1998). For example, Cai (2012) concluded that awesome, fabulous, and fantastic have been used increasingly over time. Regarding genre, fabulous, fantastic, great, terrific, and wonderful were observed more in the spoken genre, whereas awesome and excellent occurred most often in magazines.
Overall, the relevant literature shows that different methods are available for researchers to study near-synonyms and they can choose the approach that best fits their goals.
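To make the notion of collocational strength more concrete, the following is a minimal sketch of two of the measures named above, the MI score and the t-score, in their commonly cited textbook forms. The function names, the optional span scaling, and the toy counts are illustrative assumptions; they are not taken from this study, and individual corpus tools may compute these scores with slightly different conventions.

```python
import math


def expected_cooccurrence(f_node, f_collocate, corpus_size, span=1):
    """Co-occurrence count expected if the two words were independent.

    `span` is an optional window-size scaling factor (an assumption here);
    leave it at 1 to use the plain f_node * f_collocate / N estimate.
    """
    return f_node * f_collocate * span / corpus_size


def mi_score(observed, f_node, f_collocate, corpus_size, span=1):
    """Mutual information: log2 of observed over expected co-occurrence."""
    expected = expected_cooccurrence(f_node, f_collocate, corpus_size, span)
    return math.log2(observed / expected)


def t_score(observed, f_node, f_collocate, corpus_size, span=1):
    """t-score: (observed - expected) divided by the square root of observed."""
    expected = expected_cooccurrence(f_node, f_collocate, corpus_size, span)
    return (observed - expected) / math.sqrt(observed)


# Toy counts, invented purely for illustration (not BNC figures).
print(mi_score(observed=50, f_node=1000, f_collocate=500, corpus_size=1_000_000))
print(t_score(observed=50, f_node=1000, f_collocate=500, corpus_size=1_000_000))
```

MI tends to reward rare but exclusive pairings, whereas the t-score tends to favor frequent pairings, which is one reason studies often report more than one measure.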

Studies on English Near Synonymous Verbs
In the past decades, several corpus-based studies of near-synonymous verbs have been carried out. Among the earlier ones, Church et al. (1994) carried out a corpus-based analysis comparing ask for, request, and demand in terms of substitutability. Biber et al. (1998) differentiated begin and start by their grammatical constructions and their different lexical associations across registers using the Longman-Lancaster Corpus.
Recently, more powerful tools have become available for corpus-based language research. Lee & Liu (2009) adopted VIEW as a tool to compare and contrast the syntactic patterns of affect and influence, gathering data from the BNC and COCA. Using Sketch Engine, Hu & Yang (2015) and Yang (2016) analysed the collocation, concordance, word sketches and sketch difference of the synonyms raise and increase, and learn and acquire, in the British National Corpus. In the same fashion, Shahzadi et al. (2019) used Sketch Engine to examine arrive and reach. Adopting different online tools such as Sketch Engine, BNC Web, and Just the Word, Gu (2017) examined gain and obtain in terms of genre, collocation, colligation, and semantic prosody.
In other studies, the English causative verbs get and have were investigated by Gilquin (2003), and intra- and extralinguistic factors in the contexts of hassle, bother, and annoy were compared by Glynn (2007). In addition, selectional and collocational restrictions on the meanings of create and produce were inspected by Chung (2011) in the Brown Corpus and the Frown Corpus. Drawing on the native-speaker corpus LOB and the learner corpus CLEC, Rui (2016) differentiated between the two English verbs start and begin. Furthermore, Lin & Chung (2021) explored the syntactic and semantic information of the two synonymous verbs propose and suggest in a specific genre drawn from COCA.
However, more studies are needed on a wider range of synonyms and near-synonyms (Cai, 2012; Uba, 2015). Accordingly, this research offers insight into how the two near-synonymous verbs damage and destroy work in terms of frequency, genre distribution, colligation, collocation, and differences in meaning and use.

Data Collection
All data in this study were collected from the British National Corpus (BNC). This is a monolingual, synchronic, general, sample-based corpus containing 100 million words. The corpus comprises 90% written and 10% spoken texts from a wide range of disciplines, dating from 1960 to 1990. The written part includes, for instance, extracts from regional and national newspapers, specialist periodicals and journals for all ages and interests, popular fiction and academic books, published and unpublished letters and memoranda, and school and university essays. The spoken part contains, for example, orthographic transcriptions of unscripted informal conversations as well as spoken language collected in different contexts, ranging from formal business or government meetings to phone-ins and radio shows.

Corpus Tool
Sketch Engine (SkE) is a powerful tool for corpus-based language research (Kilgarriff et al., 2004). It was first used in lexicography and then applied to other fields such as translation, discourse analysis, language teaching, and terminology (Kilgarriff et al., 2014). SkE provides easy access to many ready-to-use corpora, of which the BNC is one. It offers a range of functions. In the present study, the following functions are used: Thesaurus, Concordance, Collocation, Word Sketches, and Sketch Diff. Thesaurus automatically generates a list of synonyms or words belonging to the same category (semantic field). Concordance provides concordance lines showing keywords in context, which helps to establish lexical and structural information about the keyword. Collocation lets the user set the span and the minimum frequency of each collocate, and reports the strength of each collocation. Word Sketches offer a one-page summary of a word's grammatical and collocational behavior. Sketch Diff presents the collocational differences between two words in a straightforward layout.

Analysis Procedure
First, the Thesaurus tool was used to confirm that the two verbs damage and destroy are similar.
The frequencies of damage and destroy in the BNC were gathered using Concordance. The frequency shows how many times each word is used in communication. The genres in which damage and destroy were used were retrieved from the BNC using TEXT TYPES, which allows a researcher to look for the genres and sub-genres in which a word appears.
elt.ccsenet.org English Language Teaching Vol. 14, No. 7; 2021
The colligation of damage and destroy in the BNC was based on Word Sketches, which present the grammatical patterns of the two verbs.
For transitive verbs such as damage and destroy, the researcher focused on the noun collocates and adverb collocates based on syntactic patterns (v+n, n+v, adv+v, v+adv). The positional constraint adopted in this research is a span of five words to the left and to the right of the keyword. Only those collocations with a minimum frequency of 10 within this span (-5, 5) were considered. Once the list of most frequent collocates was retrieved, the collocates were further ranked by their logDice scores, a measure regarded as reasonable, stable, and reliable to interpret (Rychly, 2008).
In order to get a better understanding of the words in question, the use of damage and destroy for the same reference in a given context is compared and examined, given that the collocation of near-synonyms with the same word can best show their differences in nature (Taylor, 2003).
In addition, this study pays attention to subtle meaning differences across damage and destroy based on examination in context.
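The collocation step in the procedure above (a span of five words on either side, a minimum frequency of 10, and ranking by logDice) can be sketched as follows. This is an illustrative approximation rather than Sketch Engine's own implementation: it assumes a plain list of lowercase tokens, counts raw window co-occurrences without part-of-speech or grammatical-relation filtering, and uses the published logDice formula, 14 + log2(2·f_xy / (f_x + f_y)). The function name and the toy sentence are hypothetical.

```python
import math
from collections import Counter


def collocates_by_logdice(tokens, node, span=5, min_freq=10):
    """Rank window collocates of `node` by logDice.

    tokens   -- list of word tokens (assumed already lowercased/lemmatized)
    span     -- number of tokens inspected on each side of the node
    min_freq -- collocates co-occurring fewer times than this are dropped
    """
    token_freq = Counter(tokens)
    cooc = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[tokens[j]] += 1

    f_node = token_freq[node]
    scored = []
    for word, f_xy in cooc.items():
        if f_xy < min_freq:
            continue
        # logDice = 14 + log2(2 * f_xy / (f_x + f_y))
        log_dice = 14 + math.log2(2 * f_xy / (f_node + token_freq[word]))
        scored.append((word, f_xy, log_dice))
    return sorted(scored, key=lambda row: row[2], reverse=True)


# Toy input, invented for illustration; thresholds lowered to suit its size.
toy = "frost can damage crops and frost can damage roots".split()
for word, freq, score in collocates_by_logdice(toy, "damage", span=2, min_freq=2):
    print(word, freq, round(score, 2))
```

Because logDice depends only on the frequencies of the two words and their co-occurrence, and not on total corpus size, scores can be compared across corpora of different sizes, which is part of why it is described as stable.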

Thesaurus for Identifying Damage and Destroy
The thesaurus entry for the verb damage is shown in Table 1.

The Frequencies of Damage and Destroy
The first step is to establish the overall frequency of the two near-synonymous words in the corpus. These frequencies are shown in Table 2.
Table 3 and Table 4 present the genre comparison of damage and destroy in terms of raw frequency and relative text type frequency in the BNC. A frequency limit of 5 is chosen. Rel (%) (the relative text type frequency) is the relative frequency of the query result divided by the relative size of the particular text type. A value above 100% indicates that the word is typical of the text type; a value below 100% indicates the opposite.
Table 3 shows that both damage and destroy occur far more often in written books and periodicals than in any other text type. The frequency of destroy is greater than that of damage in all text types. Moreover, both damage and destroy are used more frequently in written-to-be-spoken text types (TV news scripts) than in the corpus as a whole.
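The Rel (%) figure defined in this section can be illustrated with a small calculation. The sketch below follows the wording of the definition given here (the share of hits falling in a text type, divided by that text type's share of the corpus, expressed as a percentage); the function name, parameter names, and numbers are hypothetical and are not figures from Table 3 or Table 4.

```python
def relative_text_type_frequency(hits_in_type, total_hits, type_size, corpus_size):
    """Relative text type frequency as a percentage.

    hits_in_type -- occurrences of the word inside the text type
    total_hits   -- occurrences of the word in the whole corpus
    type_size    -- number of tokens in the text type
    corpus_size  -- number of tokens in the whole corpus
    """
    share_of_hits = hits_in_type / total_hits      # relative frequency of the query result
    share_of_corpus = type_size / corpus_size      # relative size of the text type
    return 100 * share_of_hits / share_of_corpus


# Invented example: 30% of all hits fall in a text type that makes up 20% of the corpus.
print(relative_text_type_frequency(30, 100, 200_000, 1_000_000))
```

In this invented case the result is above 100, so the word would count as typical of that text type; a text type holding 20% of the corpus but only 10% of the hits would score 50.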

The Genre Difference of Damage and Destroy
The detailed comparison of the frequencies of damage and destroy in different written texts is shown in Table 4.
Table 4 shows that both damage and destroy are used more frequently in world affairs (e.g., business, politics, juridical matters) than in the whole corpus. Damage is 1.08 times as common in the social science texts (e.g., Health, History, and Philosophy of Science) of written English as in the whole corpus, which is noticeably higher than the frequency of destroy in these texts. However, damage is less frequently used in informative texts related to belief and thought and in arts-related texts. Destroy is 3 times more frequent than damage in belief and thought, and 1.8 times more frequent in arts.

The Colligation Difference of Damage and Destroy
In terms of colligation, the patterns of damage and destroy as verbs are summarized on the basis of the Word Sketch in SkE (see Table 5).
Note. object (V + n), subject (n + V), modifier (adv + V, V + adv), pp (V + prep + obj)
Table 5 shows that damage and destroy mainly occur with object nouns. The two words have similar frequency ratios in the patterns "n + V" and "V + prep + obj". The difference in colligation lies in the modifier pattern: damage is more frequently used with adverbs.

The Collocation Difference of Damage and Destroy

Noun Collocates
The first part focuses on the "V + n" pattern, examining noun collocates given the high frequency of this pattern. Table 6 lists the top 32 right-side noun collocates of damage and destroy. From these object-noun collocations, it can be inferred that damage collocates more with nouns denoting the human body or physical health, while destroy is more closely associated with military affairs. In addition, destroy is used more frequently with abstract nouns related to one's thoughts or beliefs, such as myth, dream, and hope. Sketch Diff echoes the findings. Table 7 shows the difference in terms of noun objects. It indicates that it is more usual to say "damage health" and "damage brain", while it is more natural to say "destroy enemy" and "destroy army", or "destroy hope" and "destroy myth". It also explains to some degree the more frequent distribution of damage in the domain of social science (e.g., Health) as well as of destroy in the domain of belief and thought, as indicated in Table 4.
Examples containing both damage and destroy for the same referents were collected. The constraint was that one of the near-synonyms appeared within five words on either side of the other. The frequencies are shown in Table 9.
Example 1
You will sometimes find that if your employer acts towards you in an unsatisfactory way, he is in breach of his implied duty of mutual trust and confidence. Employers should not, without good cause, behave in a manner that is likely to destroy or seriously damage the employment relationship. The law reports provide many illustrations of this kind of conduct.
Example 2
There were no bridges left standing between Verona and Rome and, according to Eric, who from now on was to travel this way once a week, every village south of Florence on the main road to Viterbo and Rome had been either destroyed or severely damaged. The churches had suffered particularly badly. The tally of religious paintings and statues lost was immense. Everything was broken; it is difficult now to imagine the devastation.
In Examples 1 and 2, destroy(ed) and seriously damage/severely damaged are used in a parallel structure. The two instances of destroy can be regarded as rough equivalents of seriously/severely damage, which suggests that destroy on its own expresses a stronger degree of destruction than unmodified damage.
Example 3
The more phosphate that is available, the more algae grow and multiply, sometimes to such an extent that they form a dense mat over the surface of the water. The consequence of this is that aquatic life below is seriously damaged and even destroyed. The algae on the surface prevent sunlight from reaching deeper plants so they don't grow well. When they eventually start decaying, they use up oxygen dissolved in the water. This, of course, makes life for aquatic animals extremely difficult.
Example 4
There are risks involved in credit trading. Apart from the chance that the customer may default on his payments (perhaps even go bankrupt), there is the risk that he may also sell, damage or even destroy the goods. In the typical transaction, it is the finance company which runs these risks. The finance company can to some extent safeguard itself by including certain terms in the contract it makes with the customer.
In Examples 3 and 4, there is a build-up in the degree of "destruction", shown by "The consequence of this is that aquatic life below is seriously damaged and even destroyed." and "there is the risk that he may also sell, damage or even destroy the goods." This again indicates that destroy is stronger in degree than damage.
Example 5
The whole Steam Tank is protected by a thick armoured skin, making it immune to fire from arrows and light missiles and impervious to blows from most warriors. Machines which have been destroyed or damaged in battle have so far been recovered and rebuilt, although since Leonardo's disappearance many of the secrets of their construction have been lost and the engines become increasingly unreliable and inefficient.
Example 6
Rule (although weekly maintenance including 5-10% part water changes are also to be recommended) should keep your aquatic system and livestock in steady conditions and peak of health. Never rinse reusable filter materials in tap water and never use detergents or you will damage or totally destroy the beneficial bacteria that have built up the efficiency of the filter unit. Always rinse gently to remove the build-up of mulm and debris in a clean bucket of aquarium or pond water.
Example 7
They will also generate 50,000 new vehicle journeys a day, adding to pollution and local traffic congestion. A total of 47 of the sites affected are ancient woodland, the RSNC claim. Some of the areas which are not directly destroyed could be damaged to the point where they are no longer viable as areas of conservation importance, says the report. The organization believes that the new plans could be in breach of European law requiring full environmental impact studies to be carried out.
In Example 5 ("Machines which have been destroyed or damaged in battle have so far been recovered and rebuilt, … and the engines become increasingly unreliable and inefficient."), two points can be observed. First, destroy emphasizes something that no longer exists. This can be seen from its association with rebuilt in the sentence. Likewise, in Example 6 ("… you will damage or totally destroy the beneficial bacteria…"), the adverb totally indicates the degree to which something disappears. Second, damage underscores something that can be recovered but does not work as well as before. This is supported by the co-occurrence of recovered, unreliable, and inefficient in Example 5. In addition, in Example 7 ("Some of the areas which are not directly destroyed could be damaged to the point where they are no longer viable as areas of conservation importance"), the clause following damaged further specifies the degree of destruction: "no longer viable" means that something is no longer capable of doing what it is intended to do.
In a nutshell, the comparison of damage and destroy in a given context shows that destroy is used to deepen the degree of destruction when native English speakers use the two near-synonyms together with the same word. Moreover, the fact that destroy is stronger than damage in degree may explain why damage is more frequently used with adverbs, as indicated in Table 5. As Cai (2012) put it, "adverbs, functioning as intensifiers, are used to provide more information about degree or show emphasize, amplify, or down-tone." That is to say, certain words in English, such as fabulous, perfect, destroy, and catastrophe, carry an extremely positive or negative meaning even when unmodified, and so do not need much modification. This fact may also explain why the subjects of destroy tend to be natural disasters that cause much greater destruction, such as earthquake and flood, while those of damage are frost, rain, heat, and so on. It is therefore not surprising that we say "earthquake destroyed sth." rather than "earthquake damaged sth.", or "frost damaged sth." rather than "frost destroyed sth.". Another finding is that the meaning underpinning destroy is something that no longer exists, while damage implies something that can be recovered but does not work as well as before.
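The constraint used to collect the examples in this section (one verb appearing within five words on either side of the other) can be approximated with a short script. The sketch below is only an illustration under simplifying assumptions: it works on plain text rather than on a tagged corpus, it matches a hand-written list of word forms instead of true lemmas, and it cannot tell verbal damage from the noun. The function and constant names are hypothetical and do not reflect how the query was actually run in Sketch Engine.

```python
import re

# Hand-listed inflected forms (an assumption; a tagged corpus would use lemma + POS).
DAMAGE_FORMS = {"damage", "damages", "damaged", "damaging"}
DESTROY_FORMS = {"destroy", "destroys", "destroyed", "destroying"}


def cooccurrences(text, span=5):
    """Return the token stretches where a damage form and a destroy form
    occur within `span` tokens of each other."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok not in DAMAGE_FORMS:
            continue
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if tokens[j] in DESTROY_FORMS:
                start, end = min(i, j), max(i, j)
                hits.append(" ".join(tokens[start:end + 1]))
    return hits


sentence = ("there is the risk that he may also sell, "
            "damage or even destroy the goods.")
print(cooccurrences(sentence))
```

Printing the matched stretch rather than only a count makes it easy to check by eye whether the two verbs really share a referent, which is the property the examples above rely on.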

Adverb Collocates
The near-synonyms in question can be modified by adverbs. The collocation frequency in the "modifier" pattern in Sketch Diff is shown in Table 10. Table 10 reveals that the adverbs modifying damage are mainly adverbs of degree, such as badly, severely, seriously, extensively, irreparably, and permanently, which intensify the degree of destruction expressed by damage. The common adverbs modifying destroy are adverbs of positive denotation such as humanely, adverbs of negative denotation such as ruthlessly, and adverbs of degree such as completely, totally, virtually, and utterly, which provide more information about the degree of destruction. These findings correspond to the results of the comparison of damage and destroy in a given context in the previous section.

Conclusion and Implication
Based on the corpus analysis, the differences between the two near-synonymous verbs damage and destroy lie in frequency, genre, colligation, collocation, and subtle differences in meaning and use. This study echoes previous claims that near-synonyms are "not fully intersubstitutable". The overall findings can be summarized as follows.
First, although destroy occurs more frequently in the BNC than damage, both tend to occur more often in written books and periodicals and in TV news scripts.
Second, damage and destroy differ markedly in the nouns they collocate with. Damage collocates more with nouns denoting the human body or physical health, which partly explains its more frequent usage in social science (e.g., Health). Destroy is more widely used with nouns related to military affairs and to one's thoughts or beliefs. It is often used in the text types of world affairs (e.g., politics) and belief and thought.
Third, British speakers tend to use the two near-synonyms together with the same word to create a build-up, because destroy expresses a stronger degree of destruction than damage. This may be the reason why damage is more frequently used with intensifying adverbs, and why destroy is more generally used with natural disasters such as earthquake and flood, while damage is used more with frost, rain, and so on. It is also noticeable that destroy tends to refer to something that no longer exists, and damage to something that can be recovered but does not work as well as before.
The findings of this study imply that corpora are a good supplementary tool not only for beginners but also for advanced learners when they are confused about lexical choices in their second language, especially among near-synonyms. A corpus study is of great value because traditional dictionaries and language teaching alone do not suffice for vocabulary learning. First, traditional dictionaries and language teachers rarely cover genre distribution. Genre is essential because it helps language learners recognize the particular setting and communicative function of words (e.g., damage and destroy are often used in news reports). Second, dictionaries and teachers may not provide frequently used collocations, such as "destroy myth". As a consequence, language learners are likely to have vocabulary problems in speaking or writing, for example misusing synonyms. Third, dictionaries and teachers have not always noted important contextual information about words, especially the authentic usage of near-synonyms, although they have made an attempt to give illustrative sentences. Native speakers are able to acquire vocabulary naturally in first-language contexts; second language learners, however, especially those with little or no exposure to such contexts, need to be provided with authentic context to a greater extent. Therefore, corpus-based approaches to language teaching and learning should be promoted and more widely adopted so that more people can benefit from them.