Using Wmatrix to Explore Discourse of Economic Growth

Growth is a concept of particular interest in economic discourse. This paper explores a small corpus on economic growth consisting of articles from The Economist. The corpus software used in this study is Wmatrix, a web-based tool that automatically assigns semantic field (domain) tags and permits the extraction of key words and key semantic domains by applying the keyness calculation to tag frequency lists. At the 99% confidence level (p < 0.01), which corresponds to a log-likelihood cut-off of 6.63, the corpus yields 1051 positive keywords (including multiword expressions) and 80 key semantic domains. It is evident that BRICs or the emerging economies/markets, in particular China and India, have been a major concern regarding economic growth over the past years. A number of examples of possible ways forward in teaching methodology are presented.


Introduction
There has been a growing literature on economic discourse over the past three decades. Some studies have dealt with the linguistic and discursive aspects of the widely read masterworks: Adam Smith's The Wealth of Nations (Bazerman, 1993; Henderson, 2006), Marshall's Elements of Economics (Del Lungo Camiciotti, 2005), and Keynes' General Theory (Bondi, 2010). Other studies have instead focused on the distinctive characteristics of key written economics genres used for the transmission and dissemination of knowledge: prediction in schematic structure in research articles (Dudley-Evans & Henderson, 1990), and economic metaphor in media articles or textbooks (Alejo, 2010; Boers, 2000; Charteris-Black, 2000; Henderson, 1982, 1994, 2000; Hu, 2014; Sun & Jiang, 2014; White, 2003).
While research on economic discourse can be based on researchers' intuitive knowledge (Tribe, 2015), there is a growing interest in corpus approaches to the social sciences. Gaballo (2012), for example, explores the discourse of economics and the way it is used and organized in the business section and in the finance and economics section of The Economist. The study brings together the methodologies of critical discourse analysis, systemic functional linguistics and corpus linguistics, allowing the texts to be explored from different perspectives while providing multiple insights.
Economic growth is a major concern in the world economy and, not surprisingly, it is also a concept of particular interest for applied linguists. Unlike previous studies that are mainly concerned with growth metaphor (Crawford Camiciottoli, 2007; Henderson, 1994; White, 2003), this paper aims to explore the discourse of economic growth by conducting keyness analysis with the aid of the corpus tool Wmatrix.
taken by Williams to represent the most distinctive features of contemporary western culture, by integrating synchronic and diachronic perspectives in a full appreciation of meaning. These words are 'key' because they are of social, cultural or political significance and they are the 'dictionary' of a culture and a social group. The first hints to these words can be traced back to Firth (1957), though he used the term "focal or pivotal words" instead of keywords.
The second concept of keywords has its root in John Sinclair's corpus-driven approach to language. In such a perspective, keywords are studied through their typical co-occurrence with other lexico-semantic units. For example, Stubbs (1996, p. 172) refers to 'cultural key words' as "words which capture important social and political facts about a community". The important feature for Stubbs is that the cultural and ideological implications of a keyword can be illuminated by an analysis of its collocation and semantic preference, that is, the tendency of the word to co-occur with other words and with words belonging to a specific semantic category or field (see also Sinclair, 2004). Stubbs (1996, p. 166) traces his efforts back to that of Firth (1935) on "focal and pivotal words" and to Williams' book on key words.
The third concept of keywords has much to do with stylometry, in which keywords are statistically significant lexical items. Guiraud (1954, cited in Culpeper, 2009, p. 32), for example, contrasted 'mots-clés' (keywords, based on relative frequency) with 'mots-thèmes' (based on absolute frequency). It is, however, Mike Scott who has popularized the practice of keyword analysis in this line. Interestingly, Scott (1997) also relates his work to that of Williams in the 1970s in terms of its purpose. Unlike Williams' study, which hardly paid attention to text and genre and left the methodological tools for the analysis of meaning completely undiscussed, Scott's work is text-focused and adopts a standard procedure of keyword extraction. With the aid of the KeyWords facility of Scott's program WordSmith Tools, it is a relatively easy and rapid task to calculate the incidence of every single word in the target data as well as in a comparative data set, to undertake statistical comparisons between the incidences of the same words in order to establish significant differences, and finally to see the resulting keywords ranked according to the degree of significance of the difference.
Keyword analysis and its extension (key cluster analysis) have been used in numerous studies and applied to a wide variety of research questions ranging from language education (Scott & Tribble, 2006) to sociolinguistics (Baker, 2010), economics (Bondi, 2010), and literary stylistics (Mahlberg, 2013), to name only a few. While this line of research as practiced by Scott and many followers has exerted immense influence, it is not without its problems. To begin with, keyword analysis tends to deliver more keyword results than it is possible for the researcher to analyze (Berber Sardinha, 1999). Second, relatively low frequency words may not be identified as key (Baker, 2004). Third, keywords "only focus on lexical differences, rather than semantic, grammatical, or functional differences" (Baker, 2004, p. 354). Scott (2010, p. 52) also believes that there are at least five restrictions and limitations regarding the status of keywords: 1) Keyness is not intrinsic to the word or cluster itself but is context-bound; 2) The size of context is a matter of choice; 3) Keywords act as pointers to specific textual aboutnesses and/or styles, and the reference corpus affects this in ways which are still not fully understood; 4) Keywords are statistically arrived at but are not fully and completely established. Definitive status for a whole set of KWs cannot be claimed; and 5) The procedure is far from being able to handle related forms which readers can easily distinguish.
One way to address the above problems is to conduct key part-of-speech and key semantic analyses which give rise to analytical categories that (1) are fewer than keywords, thus reducing the number of categories a researcher needs to take into account, and (2) group lower frequency words which might not appear as keywords individually and could thus be overlooked (Rayson, 2008).
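The grouping effect described above can be illustrated with a minimal sketch in Python. It assumes the tagged text is available as (word, semantic tag) pairs; the function names and the toy sample are hypothetical and are not Wmatrix output. The point is only that several individually rare words can add up to a sizeable count once they share a semantic tag.

```python
from collections import Counter


def domain_frequencies(tagged_tokens):
    """Count how many tokens fall under each semantic tag."""
    counts = Counter()
    for _word, tag in tagged_tokens:
        counts[tag] += 1
    return counts


def words_in_domain(tagged_tokens, tag):
    """List the word forms (lower-cased) that contribute to one semantic tag."""
    return Counter(word.lower() for word, t in tagged_tokens if t == tag)


# Toy, hand-made sample (an assumption for illustration only).
sample = [
    ("lag", "N3.8-"), ("sluggish", "N3.8-"), ("slow", "N3.8-"),
    ("slowed", "N3.8-"), ("economy", "I2.1"), ("firm", "I2.1"),
]

by_domain = domain_frequencies(sample)
slow_words = words_in_domain(sample, "N3.8-")
```

In this sketch each word in the Speed: Slow group occurs only once, yet the domain as a whole receives a count of four, which is the frequency that would then enter the keyness comparison.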

Data Used in This Study
The corpus data used in this study consist of about 41,800 words from 55 essays downloaded from The Economist by using WebCorp. The original WebCorp project was an experiment to see whether it was possible to develop a system to extract linguistic data from web text efficiently and present this to the linguist in as usable a fashion as it is presented in traditional corpora (Kehoe & Renouf, 2002). WebCorp attempts to treat the web itself as a corpus. Meanwhile, researchers can also use the web as a source of texts to build smaller corpora, as practiced in this study.
The system receives a word or phrase and other requirements from the user (growth in this study), passes these to a commercial search engine (Google, AltaVista, etc.), and extracts the 'hit' pages from the search engine results. Figure 1 shows the interface of WebCorp that was used to extract the essays containing growth from the site www. Each page is accessed and processed, and the extracted concordances are presented to the user in a choice of formats. The Google Search API returned 64 hits (out of an estimated 54,200). WebCorp accessed 64 web pages and generated 700 concordances, but only 55 pages were reliable and used as the source of corpus data.

Corpus Tool Used in This Study: Wmatrix
Wmatrix is a web-based tool for corpus analysis that combines techniques from corpus linguistics and natural language processing. It provides a web interface to the Constituent Likelihood Automatic Word-tagging System (CLAWS) and UCREL Semantic Analysis System (USAS) corpus annotation tools, and standard corpus-linguistic methodologies such as frequency lists, keyword lists, collocation and concordances.
The first stage of annotation involves CLAWS, a part-of-speech tagger which assigns a part-of-speech (POS) tag, or grammatical word class, to every word in running text with about 96-97 percent accuracy (Leech & Smith, 2000), e.g., 'NN1' for singular common noun and 'VM' for modal auxiliaries. The POS-tagged text is then fed into SEMTAG, which assigns semantic tags representing the general-sense field of words from a lexicon of single words and a list of multi-word combinations called idioms, e.g., 'as a rule'. Currently, the lexicon contains nearly 37,000 words and the idiom list contains over 16,000 multi-word units. The idiom list enables the corpus tool to identify idiomatic expressions, usually non-decompositional sequences, and to assign a special set of tags to the words in that particular idiomatic phrase to denote a part-of-speech relation above the level of the word (Rayson, 2008). The semantic tagset, loosely based on the Longman Lexicon of Contemporary English, has a multi-tier structure with 21 major semantic fields and more than 232 subdivisions. Items not contained in the lexicon or idiom list are assigned a special tag, Z99. Antonymity of conceptual classifications is indicated by +/− markers on tags, e.g., N3.8+ (Speed: Fast) as opposed to N3.8- (Speed: Slow). Comparatives and superlatives receive double and triple +/− markers respectively, e.g., larger (N3.2++) and largest (N3.2+++).
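The +/− marker convention just described can be made concrete with a short sketch in Python. It is an illustration under simplifying assumptions, not part of Wmatrix: the function name parse_semantic_tag is hypothetical, and the pattern only covers tags of the form shown in this paper (a capital letter, digits with optional dotted subdivisions, and an optional run of identical + or − markers). Real USAS output may carry further codes that this sketch does not handle.

```python
import re

# Base tag (e.g. N3.8, I2.1, Z99) followed by an optional run of identical
# markers. Both the ASCII hyphen and the Unicode minus sign are accepted.
_TAG_PATTERN = re.compile(r"^([A-Z]\d+(?:\.\d+)*)(\+*|[\-\u2212]*)$")


def parse_semantic_tag(tag):
    """Split a tag into (base, polarity, degree).

    polarity is '+', '-' or None; degree is the number of markers
    (1 = plain, 2 = comparative, 3 = superlative, 0 = no marker).
    """
    match = _TAG_PATTERN.match(tag.strip())
    if match is None:
        raise ValueError(f"unrecognised tag format: {tag!r}")
    base, markers = match.groups()
    markers = markers.replace("\u2212", "-")  # normalise the minus sign
    polarity = markers[0] if markers else None
    return base, polarity, len(markers)


larger = parse_semantic_tag("N3.2++")   # comparative, positive pole
slow = parse_semantic_tag("N3.8-")      # plain, negative pole
unknown = parse_semantic_tag("Z99")     # tag for unmatched items, no marker
```

Separating the base tag from its markers makes it straightforward to pool, for example, all N3.8 items regardless of polarity, or to compare the positive and negative poles of one field.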
Users can load their own corpus data, make frequency lists, generate key words, run concordances and calculate collocations via a web-browser interface. In addition, the corpus data is automatically tagged for part-of-speech and semantic fields. These two extra levels of annotation can then be used to generate tag frequency lists, tag concordances and collocations of tags, and to apply the log-likelihood keyness statistic to the tag level. The log-likelihood statistic (LL) is employed by Wmatrix; only items with an LL value equal to or greater than 6.63 are considered to be statistically significant, since 6.63 is the cut-off for 99 percent confidence of significance.
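For readers who wish to see how such a keyness value is obtained, the following Python sketch computes a log-likelihood score from four numbers: the frequency of an item in the study corpus and in the reference corpus, and the sizes of the two corpora. It uses the two-cell form of the statistic commonly given for corpus comparison; that this matches the exact computation inside Wmatrix is an assumption, and the function names and the example counts are hypothetical.

```python
import math

LL_CUTOFF_P01 = 6.63  # cut-off cited in the text for p < 0.01 (1 d.f.)


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one item, study corpus vs. reference corpus."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the item were spread evenly over both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0).
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


def is_key(ll_value, cutoff=LL_CUTOFF_P01):
    """True if the LL value reaches the chosen significance cut-off."""
    return ll_value >= cutoff


# Hypothetical counts: 120 hits in a 41,800-word study corpus against
# 300 hits in a 1,000,000-word reference corpus.
example_ll = log_likelihood(120, 41_800, 300, 1_000_000)
example_is_key = is_key(example_ll)
```

The same function applies unchanged whether the item counted is a word, a POS tag or a semantic tag, which is what allows the keyness method to be extended from the word level to the tag level.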
Wmatrix enables researchers to annotate their data sets relatively easily and rapidly for both grammatical and semantic categories, and then to identify which categories are key. Recent studies include Afida (2007), focusing on semantic domains in business English, and Archer et al. (2009), focusing on semantic domains in Shakespeare's plays.

Corpus Query and Analysis Procedures
Step 1: Upload the corpus of economic growth via the web-browser interface. The corpus data are then automatically tagged for part-of-speech and semantic fields, generating a set of three frequency lists: a word frequency list, a part-of-speech (POS) frequency list and a semantic field (domain) frequency list, as demonstrated in Figure 2.
Step 2: In order to generate the keyword list, key POS list, and key semantic domain list, choose the reference corpus for each component. In this study, BNC Sampler Written was chosen as the reference corpus, as shown in Figure 3.
The final step is to save all the relevant lists for further analysis. Since the key POS list is not the main concern of this study, only the results from the keyword list and the key semantic domain list will be presented in the next section.

Keywords
The results show that at 99% confidence (p < 0.01), the cut-off of 6.63 indicates that there are 1051 keywords (including multiword expressions such as interest rates, current account, Mr Obama, long term, population growth, health care). Table 1 presents the top keywords that occurred in the corpus. The keywords ranking from 31 to 50 are Chinese, inflation, 90%, 2011, slower, economist, IMF, global, 2012, year, more, spending, than, manufacturing, Ms, Singapore, urbanisation, property, trade, and consumption. The concordance lines of other keywords such as emerging and BRICs demonstrate that emerging is primarily used to modify economies and markets, and that emerging economies/markets correspond to BRICs, including China, India, Brazil and Russia. In fact, Russia, Russian, Brazil, and Brazil's are all keywords in the corpus, except that their keyness is far lower than that of China, China's, India, and Indian, as shown in examples 1-3.
(1) Europe and the U.K. actually contracted, while China (and several other emerging economies) grew notably less briskly. (The Economist 2012-12-17)
(2) Normally, prices slump during a US recession but demand from emerging markets means that hasn't happened; the west has turned from being a price-setter into a price-taker. (The Economist 2013-01-24)
(3) IN THE decade after Jim O'Neill of Goldman Sachs coined the acronym "BRICs" in 2001, grouping together four big countries with the potential for sustained growth, the "B", Brazil, really put itself on the economic map. (The Economist 2013-05-07)
It is evident that BRICs or the emerging economies/markets, and China and India in particular, have been a big concern regarding economic growth over the past years.

Key Semantic Domains
For p < 0.01 with 1 d.f., the cut-off of 6.63 (log likelihood value) indicates that there are 80 USAS tags significantly overused and 68 tags significantly underused. The underused semantic domains include Pronouns, Anatomy and physiology, Kin, Clothes and personal belongings, Knowledgeable, Music and related activities, Religion and the supernatural, Warfare, etc. Given the limited length, negative key semantic domains will not be discussed in the rest of this paper.
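The distinction between overused and underused tags rests on two checks: whether the log-likelihood value reaches the cut-off, and in which corpus the relative frequency is higher. The Python sketch below illustrates this under stated assumptions: the function name keyness_direction and the counts in the usage lines are hypothetical, and the two-cell log-likelihood form is assumed rather than confirmed as the computation performed by Wmatrix.

```python
import math

CUTOFF = 6.63  # cut-off cited in the text for p < 0.01 with 1 d.f.


def keyness_direction(freq_study, size_study, freq_ref, size_ref, cutoff=CUTOFF):
    """Return (LL value, 'overused' | 'underused' | 'not significant')."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # zero counts contribute nothing to the sum
            ll += observed * math.log(observed / expected)
    ll *= 2
    if ll < cutoff:
        return ll, "not significant"
    # Direction: compare relative (per-token) frequencies in the two corpora.
    if freq_study / size_study > freq_ref / size_ref:
        return ll, "overused"
    return ll, "underused"


# Hypothetical tag counts per 1,000 tokens in each corpus.
_, over = keyness_direction(50, 1000, 10, 1000)
_, under = keyness_direction(10, 1000, 50, 1000)
_, flat = keyness_direction(10, 1000, 10, 1000)
```

Because the log-likelihood value itself is never negative, the direction has to be read from the relative frequencies; this is why positive and negative key domains share the same cut-off.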
Table 2 presents the top 20 semantic domains. The occurrence of some semantic domains is not a surprising finding, because one would expect a corpus of economic growth to have much to do with Business, Money, Measurement, Industry, and the like. Domains like Geographical names can be easily identified by manually grouping the keywords, as presented in the previous section. Nevertheless, the ability to extract key semantic domains clearly demonstrates the advantages of comparison at the semantic level. Because one only needs to examine a much smaller number of key domains in the semantic comparison than the number of keywords, the overall trends are easier to identify.
, large, vast, expanding, growing, expanding, profound, substantial, etc., as
The second most significant difference (LL value 1128.48) indicates the overuse of the semantic domain of Business: Generally (I2.1) in the corpus of economic discourse. Upon examining the concordance for this tag (part of which is shown in Figure 1), it can be seen that under this umbrella term are words like economy, economies, economist, business, company, firm, productivity, infrastructure, financial services, etc.
(16) Nevertheless, there are signs that spending, though continuing to rise, is growing more slowly. (The Economist 2013-5-11)
Interestingly, in contrast to the slowdown of emerging markets, America's economy was reported to grow faster:
(17) Evidence is mounting that America's GDP grew faster in the second quarter than the initial estimate of 1.7%, and has accelerated since. Healthy retail sales, rising production orders and low jobless claims all suggest that growth could be around 2.5%. (The Economist 2013-08-17)
Closely related domains also include the 38th most significant category Healthy (B2+) and the 44th most significant category Health and disease (B2). The metaphor that ECONOMY IS AN ORGANISM (or more specifically ECONOMY IS A PERSON) can be easily traced when examining the significant domains.

Discussion
The analysis of the discourse of economic growth in The Economist reveals important aspects of the current state of the world economy. First, it is highly evident that BRICs or the emerging markets/countries (especially China and India) have played an active and leading role in economic growth over the past years. Second, there is concern about the slowdown of the emerging markets, which had created a miracle over the past decades.
The starting point for investigating a specialized corpus is normally to conduct keyword analysis. Influential as it is, keyword analysis has a number of problems. By extending keyness analysis from the word level to the grammatical and semantic levels, some of these problems can be resolved. The apparent merit of Wmatrix lies in the fact that it allows macroscopic analysis (the study of the characteristics of whole texts) to inform the microscopic level (focusing on the use of a particular linguistic feature), thereby suggesting those linguistic features which should be investigated further (Rayson, 2008). Furthermore, Wmatrix enables researchers to identify some of the significant semantic domains, which is virtually impossible or at least difficult at the word level. Third, an important semantic class can group together low frequency words that would otherwise have been missed. For instance, words like lag and sluggish are quite infrequent and are thus not keywords, but they are still grouped into the 11th most significant category, Speed: Slow (N3.8-). The final advantage of Wmatrix, briefly mentioned in Section 4, is its capacity to help researchers quickly access economic metaphors, or the absence of some kind of metaphor, such as the war metaphor in this study.
Nevertheless, Wmatrix is not without its problems. Since the semantic tagger is automatic, there are some mistakes in the process. The accuracy rate quoted for the semantic tagger is 91% (Rayson, Archer, Piao & McEnery, 2004). So when carrying out the manual analysis of the results, the researcher should also take account of possible tagging errors. Some errors, however, seem to have their roots in the initial semantic classification. As shown in Table 2, the 4th most significant category is Cheap (I1.3-). When the concordance output is scrutinized, it is found that of a total of 144 occurrences, 134 are instances of the word economic. Nevertheless, it seems unreasonable to regard economic in the sample concordances as belonging to the same semantic field as the word cheap or duty-free. As illustrated by the context in example (23), the phrase run out of puff might just as well be grouped into Speed: Slow.

Conclusion
This study has contributed to the ESP (or more accurately English for Business Purposes, EBP) field in several ways. Firstly, it introduces EBP teachers to Wmatrix, a useful and powerful corpus tool for extracting keywords and identifying key semantic domains. Secondly, the analysis based on the small corpus reveals an important trend in the world economy. Finally, it helps to bridge the gap between economists and linguists and calls for a more profound dialogue between two seemingly remote disciplines, as advocated by McCloskey (1985) three decades ago.
It has to be noted that this study is by nature descriptive and demonstrative due to its limited data. Building on the methodology used in this paper, further research can explore a much bigger corpus of economic growth, or compare two corpora of economic growth drawn from articles published in different journals or magazines.

Figure 1. The interface of WebCorp

Figure 2. Upload the corpus data and automatically annotate the data via the web-browser interface

Figure 3. Choosing reference corpus

exemplified below:
(4) Wage deals struck by big firms with big unions are imposed on others by bargaining councils. (2013-06-01)
(5) As a result, large parts of the developing world will narrow the income gap between themselves and richer nations. (The Economist 1997-06-12)
(6) But the world will need wisdom and stamina to reap the potentially vast benefits. (The Economist 1997-06-12)
(7) China is a vastly more prosperous and expansive country than it was 20 years ago. (The Economist 2012-05-26)
(8) Asia's third-largest economy expanded by 5% in the year to March… (The Economist 2013-06-29)

(9) The problem doesn't exist in America and other big economies because the large number of companies in such places - the US has about 6m employers to Canada's 1m - means no one firm will stand out when investment statistics are given by sector or region. (The Economist 2013-06-13)
(10) THE "productivity paradox" has been solved. Robert Solow, a Nobel laureate in economics, once famously observed that "you can see the computer age everywhere but in the productivity statistics". (The Economist 2003-09-11)
(11) Film buffs now view Swiss dream-sequences as cheesy, but India's big offshore hubs are more in fashion than ever. They present a mirror image of India's red tape, weak infrastructure and graft. (The Economist 2013-08-10)
The third most significantly used domain in the corpus of economic growth is Money and pay (I1.1). Under this umbrella term are words like income, investment, saving, GDP, advances, capital, remittances, spoils, afford, as shown in the sample concordance lines (the number after each line is its position in the concordance output):
and low spending relative to GDP . Each rise of ten percentage (8)
s found to raise the overall GDP growth rate by one percentage (11)
ates fall first , as medical advances reduce infant mortality and e (12)
free access for exporters to capital and intermediate goods , atte (13)
y heavily on flows of worker remittances : Bolivia , Burkina Faso , El (17)
t how to divide the economic spoils than how to enlarge them . Bu (35)
le exchange rate and can not afford to let its rural masses settl (38)
Needless to say, the semantic domains are closely related. The 7th most significant semantic category is labeled as Measurement-Speed (N3.8). There are 121 occurrences regarding this domain (N3.8), of which 80 instances fall into the 11th most significant category Speed: Slow (N3.8-), and 23 instances fall into the 21st most significant category Measurement-Fast (N3.8++). That is, the words and phrases used to describe slow are roughly three and a half times as frequent as those used to describe fast (80 versus 23 instances), indicating that the recent concern about economic growth is its slowdown. Below are some examples containing words that represent the category Speed: Slow.
(12) When giants slow down: The most dramatic, and disruptive, period of emerging-market growth the world has ever seen is coming to its close (The Economist 2013-7-27)
(13) Inflation is relatively modest, but wages lag far behind. (The Economist 2013-8-10)
(14) In the first quarter of 2013 sluggish sales to the United States, by far Mexico's largest export market, helped reduce growth to a modest 0.8% compared with the same period in 2012. (The Economist 2013-05-25)
(15) Since 1970, however, the observed rate of technological advancement has slowed sharply, despite the temporary bump from the internet (The Economist 2012-9-8)

(18) After a slump in the 1980s and 1990s, middle-income economies rebounded in the 2000s and have maintained a healthy growth rate more recently. (The Economist 2013-04-13)
(19) Over the longer run a healthy rate of nominal GDP growth would absolutely make reductions in public debt easier. (The Economist 2013-04-24)
(20) As a result, the healthy-nominal-GDP route to deleveraging would probably entail a bout of inflation above recent levels. (The Economist 2013-04-24)
(21) AS MOST of Africa begins to prosper, the continent's biggest economy is faltering. Figures released on May 28th showed that GDP in South Africa rose at an annualised rate of just 0.9% in the first quarter. (The Economist 2013-06-01)
(22) Slow growth and a sliding currency are alarming symptoms of a deeper malaise (The Economist 2013-06-01)

Table 1. The top 30 keywords in the corpus of economic growth
Lexis relating to places: China, China's, America, India, Indian, tropical and world

Table 2. The top key semantic domains in the corpus of economic growth

The most significant difference (LL value 1645.65) in the semantic comparison is for the tag N3.2+ representing the semantic field size (big). Under this umbrella term are words like big
poverty rates tumbled . Gaping economic imbalances fuelled an era of fin (115)
ations, enormous quantities of cheap new labour became accessible . A (116)
Still, there are words or phrases that seemingly belong to one semantic domain but are not grouped into a certain category.
(23) Having grown by 2.3% a year between 1995 and 2002, it grew by 4% annually in the following eight years. But Brazil then ran out of puff. It grew by a disappointing 2.7% in 2011, and a dismal 0.9% in 2012. (The Economist 2013-05-07)