Vocabulary Size and Depth of Knowledge: A Study of Bahraini EFL Learners

This study investigates the size and depth of vocabulary knowledge and its relationship to the general language proficiency of EFL learners. The study sample included 120 students from the University of Bahrain. The sample was randomly selected from the student population and split into two groups in terms of their level of English: intermediate and advanced. The study aims to answer four questions: (1) What is the effect of general language proficiency on the sizes of the receptive and productive vocabularies of learners of English at the University of Bahrain? (2) How does general language proficiency affect the depth of vocabulary knowledge of learners of English at the University of Bahrain? (3) What is the relationship between receptive and productive vocabularies and the depth of vocabulary knowledge? and (4) What is the relationship between vocabulary size and the nature of lexical networking? All the students in the sample completed three vocabulary tasks. The first two tasks were Meara and Jones’s Eurocentres Vocabulary Size Test (1990) and Meara and Fitzpatrick’s Lex30 word association task (2000), which were used to measure the sizes of receptive and productive vocabularies. The third task was Gyllstad’s COLLEX test (2007), which was used to investigate the depth of vocabulary knowledge. A quasi-experimental approach was adopted using a quantitative approach to analyze the data. The data of the study were analyzed by comparing the results of the two groups in relation to the three tasks using SPSS 16.0. The findings of the study have revealed that general language proficiency has a positive effect on learners’ receptive vocabulary size, a moderate effect on learners’ productive vocabulary size, and a very low effect on the depth of vocabulary knowledge. In addition, no relationship was shown between the size of vocabulary and the nature of lexical networking. With reference to these results, pedagogical and future research recommendations are made.

concrete words like cup, student, table, and chair have more semantic value in comparison to words like peace, pain, love, and feel, which are known as abstract words. Finally, the last assumption of Richards's framework proposes that knowing a word is to know its different and multiple meanings. For example, the word bow can mean a knot tied with two loops and two loose ends, or bending as a sign of respect.
There has been criticism in the literature that Richards's framework is not an integrated framework of vocabulary knowledge. Besides the lack of convention in the productive and receptive vocabularies, Meara (1996) comments that Richards's assumptions cannot be held as a systematic account of word knowledge for several reasons. One of these is that Richards's endeavor to identify word knowledge is related to another genre, in which it is "an honest attempt to give an account of contemporary linguistic research with inferences and applications to teaching where appropriate" (Meara, 1996, p. 2). Added to that, for example, assumption four is mainly involved in a "short-lived development in syntactic theory, case grammar, which faded shortly" after the appearance of these assumptions (Meara, 1996, p. 2). Assumption eight "does not appear to be based on any specific research" but, instead, is based on "language teachers' points of view", which show a clear gap in the research at that time (Meara, 1996, p. 2). By contrast, Nation's (2001) recent contribution to the literature provides an improved framework of vocabulary knowledge, which is discussed in the next section.

Nation's (2001) Vocabulary Knowledge Framework
The second theoretical framework of this research is known as Nation's vocabulary knowledge framework. According to Schmitt (2010), this framework has been regarded as one of the most widely used by researchers. Nation states that vocabulary knowledge is composed of three categories. As Table 1 demonstrates, these categories are form, meaning, and use, which are essentially related to both receptive (R) and productive (P) aspects of word knowledge.
Table 1. What is involved in knowing a word (Nation, 2001, p. 27)
[...] knowledge of spelling words correctly (Schmitt, 2000).
Prefixes, suffixes, and stems are involved when discussing the knowledge of word parts. It is often possible to work out the meaning of an unknown word when the prefix, suffix, or stem of the word is known. For example, when learners know that the word unbelievable contains the prefix un-, which means not, and the suffix -able, which means can be, then it is easy to know that the meaning of the word is not to be believed. Also, Nation believes that the connection between form and meaning, concept and referents, and word associations is meant to be the knowledge of word meaning. For example, when a word is seen or heard, the meaning of this word is retrieved simultaneously; similarly, when a meaning needs to be expressed, the form of the word also comes to mind. As Richards (1976, p. 81) claims, words are not stored separately in blocks in a human being's mind: "words do not exist in isolation." Words are linked or joined together with their associations in the minds of human beings. Each single word is stored in the mental lexicon of language learners with its associations, that is, words that are related to each other and share common features. For example, words like school, chair, table, classroom, students, and teachers can be stored together in the mental lexicon.
In addition, Nation argues that the knowledge of using a word correctly refers to the grammatical function of that word, its collocations, and an awareness of the constraints on its use. To make it clear, the grammatical function factor plays a vital role when using a word. For example, learners do not utter sentences like I reading a lot or I eaten a lot; instead, they say I read a lot and I eat a lot. Choosing correct word classes is needed to form grammatical sentences. Added to that, register and frequency are two other factors that are essential in using words. Schmitt (2000, p. 31) defines register as the stylistic constraints that "make each word more or less appropriate for certain language situations or language purposes." Frequency, by contrast, concerns how often a word is used; e.g., a word like laugh is used more than guffaw, giggle, and chuckle, and therefore it is a high frequency word. That is, high frequency words are easier to recall and recognize in the mental lexicon. Hence, learners have to be aware of how to use the word with its constraints.

Research Problem
Literature shows that there is a strong emphasis on assessing vocabulary knowledge and its importance in achieving excellence in learning EFL (Meara, 2009; Milton, 2009; Schmitt, 2010; Nation & Webb, 2011). A systematic study examining the extent of the general language proficiency of EFL learners in terms of the size and depth of vocabulary knowledge in the context of Bahrain is lacking. Therefore, this study sheds light on the importance of measuring the size and depth of the vocabulary knowledge of EFL learners at the University of Bahrain (ranked number one in the Kingdom of Bahrain). This study investigates the extent to which the size and depth of vocabulary knowledge affect the general language proficiency of EFL learners who are taking English language courses at the University of Bahrain.

Research Objectives
To answer the research questions stated above, this study aims to do the following:
1) Measure the development of the receptive and productive vocabulary sizes of EFL learners at the University of Bahrain. In particular, the study focuses on the effect of general language proficiency on the development of both the size and depth of vocabulary knowledge, and the nature of mental lexical networking.
2) Evaluate the organization of the mental lexical networking of EFL learners at the University of Bahrain to examine the role of general language proficiency.
3) Investigate the relationship between the size of vocabulary and the organization of lexical networking in relation to general language proficiency.

Research Questions
The research questions for this study stem from the call in literature for more emphasis on measuring the size and depth of vocabulary knowledge. Several lines of evidence report on the importance of vocabulary for EFL learners. Therefore, this study aims to answer the following questions:
1) What is the effect of general language proficiency on the sizes of the receptive and productive vocabularies of learners of English at the University of Bahrain?
2) How does general language proficiency affect the depth of vocabulary knowledge of learners of English at the University of Bahrain?
3) What is the relationship between receptive and productive vocabularies and the depth of vocabulary knowledge?
4) What is the relationship between vocabulary size and the nature of lexical networking?

Method
Measuring the size and depth of vocabulary knowledge requires dealing with scores. Scores enable the researcher to come up with valuable recommendations by analyzing statistical findings. Therefore, this research was conducted using a quantitative methodology. As mentioned before, this study investigates the size and depth of vocabulary knowledge and its relationship to the general language proficiency of EFL learners.

Population and Sample
The population of this study was University of Bahrain English major students at various proficiency levels. 120 L2 learners of English were chosen from the Department of English Language and Literature in the College of Arts. These participants, both male and female, were selected randomly based on their language proficiency level. They were then divided into two groups in terms of their proficiency level. As shown in Table 2, group one consists of 60 first-year intermediate students, aged between 18 and 19: 42 females and 18 males. Group two consists of 60 fourth-year advanced students, aged between 20 and 21: 45 females and 15 males. All participants in the study had started to learn English by the age of six or seven (Rixon, 2013), which generally refers to Grade 1. The proficiency levels of the two groups were represented by two English courses provided by the university: participants at the intermediate level were enrolled on the ENGL130 course (3 sections were involved), whereas participants at the advanced level were enrolled on the ENGL450 course (3 sections were involved).

Research Tools
This study uses three different tools to assess three aspects of vocabulary knowledge: receptive vocabulary, productive vocabulary, and vocabulary depth. Each aspect has its own measurement tool. The first tool is the EVST, used to measure receptive vocabulary knowledge (Meara & Jones, 1990). The second tool is the Lex30 word association task, which is used to measure the productive vocabulary knowledge (Fitzpatrick & Meara, 2004). The third tool of this research is the COLLEX test, used to investigate the depth of vocabulary knowledge (Gyllstad, 2007).

Eurocentres Vocabulary Size Test (EVST)
The EVST was developed by Meara and Jones (1990) to measure the size of learners' receptive vocabulary knowledge based on Thorndike and Lorge's (1944) most frequent 10,000 words of English. This test includes 300 words divided into five blocks. Each block represents 1,000 frequent words: "the first block contains a sample of items from the first 1,000 most frequent words in English; the second block is a representative sample from the second 1,000 most frequent words in English; and so on" (Meara, 1990, p. 3). Each block contains 60 words, although not all of them are real English words. These 60 words are divided into 40 real words and 20 non-words. In order to complete this test, the test takers need to mark the words that are familiar to them and leave the words that they do not know.
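Because each block mixes real words with non-words, every yes/no response on the checklist falls into one of four outcomes: hit, miss, false alarm, or correct rejection. The following is a minimal sketch of that classification, not the scoring procedure of the EVST itself; the function names, the mini-block, and the non-words plond and mertion are hypothetical and serve only as illustration.

```python
from collections import Counter


def classify_response(is_real_word: bool, marked_known: bool) -> str:
    """Label one checklist response with its outcome category."""
    if is_real_word:
        return "hit" if marked_known else "miss"
    return "false alarm" if marked_known else "correct rejection"


def tally_block(items, marked):
    """Count outcomes for one block.

    items  -- list of (word, is_real_word) pairs
    marked -- set of words the test taker marked as familiar
    """
    return Counter(
        classify_response(is_real, word in marked) for word, is_real in items
    )


# Hypothetical mini-block; a real EVST block has 40 real words and 20 non-words.
block = [
    ("table", True),
    ("chair", True),
    ("peace", True),
    ("plond", False),    # invented non-word (assumption)
    ("mertion", False),  # invented non-word (assumption)
]
marked = {"table", "chair", "plond"}

counts = tally_block(block, marked)
print(dict(counts))
```

A test taker who marks many non-words as familiar accumulates false alarms, which is the signal that the number of hits may overestimate real word knowledge.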
Historically, the format of the EVST was derived from one of the simple formats used to measure the size of receptive vocabulary knowledge, and it first appeared in the field of L1 studies (Sims, 1929; Tilley, 1936; Zimmerman, Broder, Shaughnessy, & Underwood, 1977). The format of this test is known as the 'checklist', which includes a list of words with the requirement for test takers to mark those familiar to them. However, later on, researchers realized that the result of this test could be inflated when test takers overestimate their knowledge. Anderson and Freebody (1983) found that the list of words in the EVST lacked the pseudo-words (non-words) needed to maintain accurate results. Thus, they developed the test by adding 20 pseudo-words to the real words. These pseudo-words help to provide an accurate estimate of the knowledge of real words. Later on, Meara and Buxton (1987) used this developed EVST in the L2 field as a first attempt to check its workability. The test was then developed further by separating the words into 40 real words and 20 pseudo-words in each of the five levels, and a computer-based format was also provided (Meara & Jones, 1988, 1990). A reliability test was run using the Statistical Package for the Social Sciences (SPSS), as can be seen in Table 3: the Cronbach's alpha is .887, which indicates a highly reliable test.
In addition to the EVST, which is used to measure the size of receptive vocabulary knowledge, the study also measures the size of productive vocabulary knowledge using a method known as Lex30, a word association task designed by Meara and Fitzpatrick (2000). Lex30 contains a list of 30 stimulus words. To fulfil the task, test takers need to produce a range of one to four responses, or associations, to these stimuli (Appendix 2 contains a full version of Lex30). The following example shows one of the stimulus words, animal, with its possible associations of elephant, tiger, farm, and wild.
animal: elephant - tiger - farm - wild
As Meara (2009) argues, all the stimulus words provided in Lex30 were chosen in accordance with a number of criteria. The first criterion is that all of the 30 stimulus words are highly frequent; they were chosen from the first 1,000-word list by Nation (1984). Nation's 1,000-word list comprises low-level words, and learners are expected to identify and recognize them. The reason behind choosing these words is to make the task flexible so that it can be used with varying ranges of proficiency levels. The second criterion for choosing these words is that they do not elicit a single response, or association, like the words black and dog. To put it simply, words that may lead to a very narrow range of associations are not used in Lex30, whereas Meara and Fitzpatrick (2000) selected words that generate a wide range of associations. The third criterion is that each item on the stimulus words list naturally leads to associations that are not common words, and, by avoiding the usage of common words that are given by native speakers of English, learners who would like to measure their productive vocabulary size have the opportunity to generate a wide range of associations.
When Lex30 was developed, it was criticized for not having a clear vision of validity and reliability (Baba, 2002). However, in 2004, Fitzpatrick and Meara proved that Lex30 is valid and reliable by conducting a test-retest study and two concurrent measures of validity. One of these two measures used native speakers' data, and the other used two parallel tests: the productive version of the VLT (Laufer & Nation, 1995), and a translation task that required learners to translate contexts from Chinese to English, as Table 4 illustrates. The test-retest reliability study involved 16 L2 users of English who were from different L1 backgrounds with different levels of language proficiency: lower intermediate to advanced. Participants completed the task twice, with a three-day gap in between. After collecting and analyzing the data, this study proved that Lex30 has a high degree of reliability. As Meara (2009, p. 46) states, "the correlation between the two sets of scores is .866 (p < .01)", which means that the "subjects taking the Lex30 test more than once at a given point in their L2 development will achieve broadly similar scores each time".
After proving that Lex30 is a reliable task that measures the productive vocabulary size, Fitzpatrick and Meara (2004) then proved that Lex30 is a valid task through two studies. The first study compared the performance of 46 adult L1 speakers of English with 46 non-native speakers. After collecting the data, two points were observed. The first was that the responses of the native speakers to the stimulus words differed from those of the non-native speakers in that the native speakers produced a high rate of low-frequency words. The second observed point was that 18 non-native speakers achieved a higher score than the native speakers. In addition, six native speakers reached a higher score compared with the highest scores of the non-native speakers, and this was due to the nature of the non-native speakers: they were teachers of English at an Icelandic secondary school.
ijel.ccsenet.org International Journal of English Linguistics Vol. 12, No. 1
The second study involved 55 Chinese learners of English, varying from intermediate to advanced. They translated a set of 60 Chinese words from their L1 to English, the target language, and they also completed the productive version of the VLT. When Fitzpatrick and Meara (2004) analyzed the data of this study, they found a high correlation between the tests; most importantly, there was a modest correlation with Lex30.
Although Lex30 has been validated, a reliability test was also run using SPSS. As Table 5 shows, the Cronbach's alpha is .930, which indicates a highly reliable test.
One possible way to deal with the depth of vocabulary knowledge is by assessing learners' collocational knowledge (Gyllstad, 2007). As Milton (2009, p. 149) notes, "depth is generally used to refer to a wide variety of word characteristics, including the shades of meaning a word may carry, its connotations and collocations, the phrases and patterns of use it is likely to be found in, and the associations the word creates in the mind of the user." Moreover, most collocational studies in the literature concentrate on the content words (Gyllstad, 2005, 2007), whereas delexical verbs such as have, take, make, give, go, and do, which are verbs used with nouns as their objects to indicate simply that someone performs an action, are difficult to identify as collocations, even for advanced learners of English (Altenberg & Granger, 2001; Nesselhauf, 2004).
Thus, Gyllstad (2007) developed a method to assess the depth of vocabulary knowledge by evaluating collocational knowledge. This method is known as the COLLEX test. The presented research adapted the fifth version of the COLLEX test to measure learners' depth of vocabulary knowledge (Appendix 3A contains the full version of the test). COLLEX 5 includes a set of 40 collocations, and each set has three options. Learners in this test have to select the most real collocation out of the three options, as shown below:

do a visit / hit a visit / pay a visit
In addition, as Gyllstad (2007, p. 163) notes, two main points are specified with regard to COLLEX 5. The first point is that "only verb + NP items were used, which means that adjective + NP items were discarded" from the other versions. The second point is that "in terms of item-total correlation and item facility values, new items were created by adding a second distractor to each item", which means that the version preceding COLLEX 5 only had two options per item, one of which had to be selected, as shown below:

solve a problem / break a problem
In addition, the high-frequency words used in COLLEX 5 were chosen carefully through the JACET 8000, which is "a radically new word list designed for all English learners in Japan. This list is based on two kinds of corpora: the British National Corpus (BNC) and JACET 8000 sub-corpus" (Ishikawa, Uemura, Kaneda, Shimizu, Sugimori, & Tono, 2003). As Gyllstad (2007, p. 164) proposed, "a total of 112 different words (72 verbs and 40 nouns) were used, and 88 per cent of these words came from the 1-3K bands." Furthermore, as Gyllstad (2007, p. 164) argues, COLLEX 5 includes some lower-frequency words, and the reason behind this is "to make the distractors plausible" (Appendix 3B contains the word frequencies of the test).
In terms of reliability, literature has shown that all versions of the COLLEX test were reliable, as Table 6 demonstrates.
Table 6. COLLEX test reliability (Gyllstad, 2007, p. 154; Schmitt, 2010, p. 234)
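The reliability coefficients reported in this section for the EVST and Lex30 are Cronbach's alpha values obtained from SPSS. As a minimal sketch of what that coefficient measures, the standard formula can be written in a few lines of Python; the function name and the five-person, four-item score matrix are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores -- list of rows, one per test taker, each a list of item scores
    """
    k = len(scores[0])  # number of items
    # Sample variance of each item across test takers
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the total score per test taker
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical item scores (1 = correct, 0 = incorrect); not the study's data.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

Alpha rises when the items vary together across test takers, which is why values such as .887 and .930 are read as indicating internally consistent tests.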

Data Collection Procedures
The study took place in March during the second semester of the 2018−2019 academic year. The process of collecting the data began by asking the participants in the two groups to complete three tests. Confidentiality, privacy, and the right of withdrawal were explained to them. The first test was Meara and Jones's (1990) EVST, which was used to measure their receptive vocabulary. The second test was Meara and Fitzpatrick's (2000) Lex30 word association task, which was used to measure their productive vocabulary. The third test was Gyllstad's (2007) COLLEX test, which was used to investigate the depth of their vocabulary.
Of all three tests, only Lex30 had to be completed within a limited amount of time in order to evaluate the variation of responses between the intermediate and advanced groups. Therefore, participants were given 18 minutes to finish it. The EVST and the COLLEX test were not time limited; however, the participants were asked to finish within 30 minutes to avoid taking a long time.

Scoring and Statistical Tools Used
The criteria for scoring the three tests are totally different. First, as discussed before, Meara and Jones's (1990) EVST contains 300 words divided into five levels, with each level containing 40 real words and 20 pseudo-words. Therefore, each response in this test has four possible outcomes, i.e., hit, false alarm, miss, and correct rejection. The maximum score for each block is 60 marks.
Second, there are a number of steps to score the responses to Meara and Fitzpatrick's (2000) Lex30 word association task, which is a list of 30 stimulus words that require learners to produce a range of one to four responses or associations, as discussed before. The first step is to discard the stimulus words. In the following example, the word animal should be discarded, and the words elephant, tiger, farm, and wild should be analyzed.

animal: elephant - tiger - farm - wild
After discarding the stimulus words, the responses are typed into a machine-readable file. Responses with inflectional suffixes (plural forms, past tenses, past participle aspects, comparatives, superlatives, etc.) and frequent regular derivational affixes (-er, -less, -ness, -able, -ly, -ish, etc.) are lemmatized and counted as base forms of words, whereas responses with unusual affixes are treated as separate words, not lemmatized. At this stage, learners were given their scores out of 120.
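The first scoring steps described above (discarding the stimulus word, reducing responses to base forms, and counting them) can be sketched as follows. This is only an illustration under stated assumptions: the lemma table is a tiny hypothetical stand-in for a real lemmatizer, the second answer-sheet line is invented, and the function names are not part of Lex30.

```python
import re

# Hypothetical lemma table (assumption); a real study would need a fuller lemmatizer.
LEMMAS = {"tigers": "tiger", "farmer": "farm", "wilder": "wild"}


def parse_line(line):
    """Split 'stimulus: r1 - r2 - ...' into (stimulus, responses)."""
    stimulus, _, rest = line.partition(":")
    # Split on a hyphen preceded by whitespace, so hyphenated words stay intact.
    parts = re.split(r"\s-\s*", rest)
    responses = [p.strip().lower() for p in parts if p.strip()]
    return stimulus.strip().lower(), responses[:4]  # at most four responses


def score_sheet(lines):
    """Return the lemmatized responses with the stimulus words discarded."""
    responses = []
    for line in lines:
        _stimulus, answers = parse_line(line)  # the stimulus is not scored
        responses.extend(LEMMAS.get(a, a) for a in answers)
    return responses


# Hypothetical answer sheet with two stimuli instead of 30.
sheet = ["animal: elephant -tigers -farm -wild", "book: read -page"]
responses = score_sheet(sheet)
print(len(responses), responses)
```

With 30 stimuli and up to four responses each, the count produced this way has a ceiling of 120, which matches the maximum score stated above.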
After the responses are typed into a machine-readable file, they are analyzed by considering them as a generated text made by each learner. This is then typed into a computer program that analyzes texts according to the frequency level of each word. The current study classified words according to the largest available lists of word families, the British National Corpus/Corpus of Contemporary American English (BNC/COCA) word frequency lists (Nation, 2016), by using one of the most popular profiling programs known as Compleat Web VP V.2. The BNC/COCA word frequency lists "consist of 32-word family lists. Twenty-eight of the lists contain word families based on frequency and range data. The four additional lists are (1) an ever-growing list of proper names, (2) a list of marginal words including swear words, exclamations, and letters of the alphabet, (3) a list of transparent compounds, and (4) a list of acronyms" (Nation, 2016, p. 132).
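Frequency profiling of this kind amounts to looking up each response in a set of ranked word lists and counting how many fall into each band. The sketch below illustrates the idea only; it is not the Compleat Web VP program, and the three miniature band lists are hypothetical and do not reflect actual BNC/COCA band membership.

```python
from collections import Counter

# Hypothetical miniature frequency bands (assumption); the real BNC/COCA
# lists are far larger and these band assignments are invented.
BANDS = {
    "K1": {"animal", "farm", "read", "table"},
    "K2": {"wild", "tiger"},
    "K3": {"elephant"},
}


def band_of(word):
    """Return the first band containing the word, or 'off-list'."""
    for band, words in BANDS.items():
        if word in words:
            return band
    return "off-list"


def profile(words):
    """Count how many of a learner's responses fall into each band."""
    return Counter(band_of(w) for w in words)


result = profile(["elephant", "tiger", "farm", "wild", "guffaw"])
print(dict(result))
```

A learner whose responses are spread over more of the higher bands is producing less frequent vocabulary, which is the property the profile is used to capture.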
When it comes to the mental lexicon, the word association responses are analyzed using Moreno Espinosa's (2009) adaptation of Fitzpatrick's (2006) model of word association. That is, test takers' responses to the stimulus words in Meara and Fitzpatrick's (2000) Lex30 word association task are classified into five different categories: syntagmatic, paradigmatic, clang, misunderstanding, and uninterpretable responses, as shown in Table 7.
Third, Gyllstad's (2007) COLLEX test (version 5) is a collocational test that contains a set of 40 collocations. Each set has three options, of which one is the correct combination of words that represents the real collocation. The scoring of this test is done by counting the correct choices of the collocations. The maximum mark in this test is 40.
The data were analyzed statistically by comparing the means and calculating the percentage scores of the intermediate and advanced groups using SPSS. The responses to the first and the third tests, the EVST and the COLLEX test, were analyzed using SPSS, whereas the responses to the second test, the Lex30 word association task, were analyzed using both a computer program called VocabProfile and SPSS.
The results for both the EVST and the Lex30 word association task were used to answer the first question of the study, and the result of the COLLEX test was used to answer the second question of the study. After this, a comparison was made between all the previous results to answer the third question of the study. Next, the responses to the Lex30 word association task were compared in order to answer the fourth and final question of the study.

Results and Discussion
The criterion used to assess the results of the three tests is the percentage score, as shown in Table 8.
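Converting a raw mean to a percentage only requires the maximum score of each test: 300 for the EVST, 120 for Lex30, and 40 for the COLLEX test, as stated in the scoring section. A minimal sketch follows; the dictionary and function names are illustrative assumptions.

```python
# Maximum raw scores as described in the scoring section.
MAX_SCORE = {"EVST": 300, "Lex30": 120, "COLLEX": 40}


def to_percent(test, raw):
    """Express a raw score as a percentage of the test's maximum."""
    return round(100 * raw / MAX_SCORE[test], 2)


evst_intermediate = to_percent("EVST", 233)
evst_advanced = to_percent("EVST", 267)
collex_intermediate = to_percent("COLLEX", 21.56)

print(evst_intermediate, evst_advanced, collex_intermediate)
```

Percentages recomputed from rounded means can differ in the last decimal place from figures calculated on unrounded data.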

Question One: What Is the Effect of General Language Proficiency on the Sizes of the Receptive and Productive Vocabularies of Learners of English at the University of Bahrain?
To answer this question, both the receptive and productive vocabularies of the intermediate and advanced groups were assessed using Meara and Jones's (1990) EVST and Meara and Fitzpatrick's (2000) Lex30 word association task. After using these two methods, the participants' responses provided in the Lex30 word association task (Meara & Fitzpatrick, 2000) were classified into different lexical categories using a computer program called Compleat Web VP V.2.
Table 9 shows the findings of Meara and Jones's (1990) EVST for both the intermediate and advanced groups. It illustrates a number of items: the number of participants who took part in the test; the participants' minimum, maximum and mean scores; and standard deviations. As can be seen in Table 9, the mean scores of the intermediate and advanced groups in Meara and Jones's (1990) EVST are slightly different: the overall level of the intermediate group is high (77.67%, M = 233), whereas the overall level of the advanced group is very high (89%, M = 267). The high score level of the advanced students is likely to be the effect of language proficiency. The standard deviation, however, indicates that the advanced-level students are more homogeneous as a group (SD = 14.91) than the intermediate-level students (SD = 46.89).
Table 10 shows the results of Meara and Fitzpatrick's (2000) Lex30 word association task for both the intermediate and advanced groups. It contains several items: the number of participants who took part in the test; the minimum and maximum numbers of responses provided by the participants; the participants' response means and standard deviations; and the difference between the means of the two groups. The mean score of the intermediate group in Meara and Fitzpatrick's (2000) Lex30 word association task was moderate (52.91%), whereas the mean score of the advanced group was very high (81.67%). In addition, an independent samples t-test was conducted to compare the numbers of responses by the two groups included in the analysis.
The results in Table 10 and Table 11 show that there was a significant difference between the number of responses from the intermediate group (M = 63.56, SD = 16.28) and the advanced group (M = 98.6, SD = 19.57), t(118) = 10.665, p < .001. The same words used in Meara and Fitzpatrick's (2000) Lex30 word association task were entered into the Compleat VocabProfile V.2 to classify them into frequency categories based on the BNC/COCA (Nation, 2016) word frequency lists. The results of the Compleat VocabProfile V.2 are shown in Table 12 and Table 13.
First, [...] Meara and Fitzpatrick's (2000) Lex30 word association task. The comparison in Table 12 shows that the mean score of the [...]
In addition, Table 14 demonstrates the distribution of the participants' words provided in Meara and Fitzpatrick's (2000) Lex30 word association task. These words were profiled according to the BNC/COCA (Nation, 2016) word frequency lists using Compleat VocabProfile V.2. The table compares the number of words produced by both the intermediate group and the advanced group in a classification of 25 categories (K1 to K25).
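The independent samples t statistic reported for the Lex30 response counts can be approximately recomputed from the published summary statistics alone. The sketch below assumes the equal-variance (pooled) form of the test, which is consistent with the reported 118 degrees of freedom; the function name is illustrative, and the study's own analysis was run in SPSS on the raw data.

```python
import math


def t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance independent samples t and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m2 - m1) / se, df


# Reported Lex30 summary statistics: intermediate group, then advanced group.
t, df = t_from_summary(63.56, 16.28, 60, 98.6, 19.57, 60)
print(round(t, 2), df)
```

The recomputed value is about 10.66, close to the reported 10.665; the small gap is presumably due to rounding in the published means and standard deviations.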

Discussion of the Results for Question One
Based on a comparison between the sizes of the receptive and productive vocabularies, several major points were shown. First, the use of the EVST revealed that learners of English at the University of Bahrain have a distinctive receptive vocabulary: the intermediate group's mean score for their receptive vocabulary was quite remarkable. The mean score of the participants in this group was high (233 = 77.67%), and the advanced group's mean score was very high (267 = 89%). These figures demonstrate that general language proficiency has a positive effect on the development of receptive vocabulary. Second, the use of the Lex30 word association task supports the fact that general language proficiency plays a significant role in developing the size of learners' vocabulary. When comparing the mean scores of the intermediate and advanced groups, the effect of general language proficiency becomes obvious: the mean score for their productive vocabulary increased from moderate (63.56 = 52.91%) to very high (98.6 = 81.67%). In addition, the classification of words in the Compleat VocabProfile V.2 indicates that this growth in vocabulary size was expanded among different categories of word frequency.
Overall, the intermediate and advanced groups' performance of vocabulary knowledge supports Meara's (2009) and Milton's (2009) assumption; that is, a learner's receptive vocabulary is larger than their productive vocabulary. This corroborates Ab Manan, Azizan, Fatima and Mohd's (2017) study: when they investigated the levels of receptive and productive vocabularies, they found that the former was between 2,000 and 3,000 word families, and the latter was around 2,000 word families. Likewise, Karakoç and Köse (2017) found that the size of their participants' receptive vocabulary (M = 87.18) was larger than the size of their productive vocabulary (M = 45.14).

Question Two: How Does General Language Proficiency Affect the Depth of Vocabulary Knowledge of English Learners at the University of Bahrain?
In order to answer question two, the depth of vocabulary knowledge of both the intermediate and advanced groups was investigated using Gyllstad's (2007) COLLEX test. Table 15 shows the findings of the COLLEX test for both the intermediate and advanced groups. It illustrates several items: the number of participants; the minimum and maximum scores achieved by the participants; and the means and standard deviations of the participants' scores.
Table 15 shows that the intermediate group's mean score was 21.56 (53.89%) with a standard deviation of 6.65, while the advanced group's mean score was 28.08 (70.19%) with a standard deviation of 5.85. The difference between the mean scores was very low (6.52 = 16.30%).

Discussion of the Results for Question Two
Based on the results of the COLLEX test for both the intermediate and advanced groups, the improvement in the learners' depth of vocabulary was very low (difference in mean score: 6.52). Nevertheless, the results show that, with higher general language proficiency, the participants' depth of vocabulary moved from moderate to high: as Table 15 shows, the mean score increased from 21.56 (53.89%) to 28.08 (70.19%).
With reference to the literature, Ebrahimi (2017) investigated depth of vocabulary using a number of tests, one of which was a productive test of collocation. The results of the test showed that learners' knowledge of collocations was quite low. Furthermore, Bagherzadeh Hosseini and Akbarian (2007) evaluated the relationship between collocational competence and general language proficiency and found that these two variables are closely related.

Question Three: What Is the Relationship Between Receptive and Productive Vocabularies and the Depth of Vocabulary Knowledge?
To answer question three, the mean scores of the three methods used in questions one and two were compared to investigate the correlation between the receptive and productive vocabularies and the depth of vocabulary. In other words, answering question three required an investigation of the correlations between the intermediate and advanced groups' mean scores in the EVST, the Lex30 word association task, and the COLLEX test. Table 16 reports the correlations between the mean scores of the receptive vocabulary test (Meara and Jones's (1990) EVST), the productive vocabulary test (Meara and Fitzpatrick's (2000) Lex30 word association task), and the depth of vocabulary test (Gyllstad's (2007) COLLEX test). The correlation between the EVST and the Lex30 word association task was 0.984, the correlation between the EVST and the COLLEX test was 0.947, and the correlation between the Lex30 word association task and the COLLEX test was 0.989.

Correlation Results
The results indicate that the mean scores of the three tests are positively correlated. In other words, the table demonstrates that the sizes of the receptive and productive vocabularies and the depth of vocabulary are significantly correlated.
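For readers who want to see how pairwise correlations of this kind are obtained, the following is a minimal sketch of Pearson's correlation coefficient in plain Python. The study ran its analysis in SPSS 16.0, so this is an illustration of the general technique rather than the authors' procedure. The scores in the `scores` dictionary are hypothetical values for five imaginary learners and are not data from the study; the function name `pearson_r` is likewise an assumption.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cross / sqrt(ss_x * ss_y)


# HYPOTHETICAL scores for five imaginary learners; NOT data from the study.
scores = {
    "EVST": [210, 235, 250, 270, 285],
    "Lex30": [55, 60, 78, 95, 101],
    "COLLEX": [18, 23, 25, 29, 33],
}

# One coefficient per unordered pair of tests; the matrix is symmetric,
# so each pair only needs to be computed once.
correlations = {
    (a, b): pearson_r(scores[a], scores[b]) for a, b in combinations(scores, 2)
}

for (test_a, test_b), r in correlations.items():
    print(f"{test_a} vs {test_b}: r = {r:.3f}")
```

Because the coefficient is symmetric, three tests yield three distinct pairwise values, which is why a correlation table for three measures contains each value twice.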

Discussion of the Results for Question Three
On the basis of the correlation results shown in Table 16, question three can be answered by stating that, as there are significant correlations between the results of the three tests, the relationship between the sizes of the receptive and productive vocabularies and the depth of vocabulary knowledge is positive. This finding is consistent with a number of studies. For example, Shin, Chon and Kim (2011) found moderate correlations between their participants' receptive and productive vocabulary sizes when assessing their learners' vocabulary sizes. A similar finding was reported in Fleckenstein (2018): a strong, positive relationship was revealed when these two aspects of vocabulary knowledge were assessed. On the theme of size and depth of vocabulary knowledge, Bardakçi (2016) found that these two dimensions are closely correlated and that they have significant effects on learners' vocabulary profiles.

Question Four: What Is the Relationship Between Vocabulary Size and the Nature of Lexical Networking?
To answer question four, the Lex30 word association task responses of the intermediate and advanced groups were classified according to Fitzpatrick's (2006) model of analyzing word association categories and sub-categories. As discussed earlier, five classifications were used in this model: syntagmatic, paradigmatic, clang, misunderstanding, and uninterpretable associations. As seen in Table 17, the intermediate group provided 3,814 responses in the Lex30 word association task. These responses were analyzed and classified as follows: 53.04% of the responses were paradigmatic associations, 37.97% were syntagmatic associations, 4.82% were uninterpretable associations, 3.04% were misunderstanding associations, and 1.13% were clang associations. The advanced group provided 5,916 responses: 58.25% were paradigmatic associations, 32.49% were syntagmatic associations, 6.63% were uninterpretable associations, 1.83% were misunderstanding associations, and 0.81% were clang associations.
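The comparison of the two response profiles can be made explicit with a short Python sketch that ranks the categories within each group and checks whether the order is the same. The percentages are those reported in the text for Table 17; the variable and function names are hypothetical, and the sketch is an illustration rather than the procedure used in the study.

```python
# Category shares (percent of responses) as reported in the text for Table 17.
intermediate = {
    "paradigmatic": 53.04,
    "syntagmatic": 37.97,
    "uninterpretable": 4.82,
    "misunderstanding": 3.04,
    "clang": 1.13,
}
advanced = {
    "paradigmatic": 58.25,
    "syntagmatic": 32.49,
    "uninterpretable": 6.63,
    "misunderstanding": 1.83,
    "clang": 0.81,
}


def rank_order(shares):
    """Return the category names sorted from most to least frequent."""
    return sorted(shares, key=shares.get, reverse=True)


# True when both groups rank the five categories in the same order.
same_order = rank_order(intermediate) == rank_order(advanced)

print("Intermediate:", rank_order(intermediate))
print("Advanced:    ", rank_order(advanced))
print("Same ranking in both groups:", same_order)

# Change in each category's share, in percentage points.
for category in rank_order(intermediate):
    change = advanced[category] - intermediate[category]
    print(f"{category:<17}{change:+.2f}")
```

Comparing rank orders rather than raw percentages separates two questions: whether the shares moved at all, and whether any category overtook another.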

Discussion of the Results for Question Four
According to Table 17, the responses of the intermediate and advanced groups were similar to each other. Most responses were paradigmatic, followed by syntagmatic, and the least frequent responses were clang; no shift in this order was identified between the groups. What is interesting here is that, although general language proficiency affected the number of responses, the ranking of the categories did not change within Moreno Espinosa's (2009) adaptation of Fitzpatrick's (2006) word association model. Therefore, question four can be answered by saying that there is no relationship between the size of vocabulary and the nature of lexical networking. However, the figures shown in Table 17 support Meara's (2009, p. 6) assumption that "normal adults produce two main types of association", syntagmatic and paradigmatic, and they "produce more paradigmatic responses than syntagmatic ones." Moreover, these figures are interestingly in line with Moreno Espinosa's (2009) findings when the researcher investigated learners' lexical profiles using the Lex30 word association task (Meara & Fitzpatrick, 2000): the vast majority of responses were syntagmatic, then paradigmatic, and the least were clang.

Conclusion
This study attempted to answer four research questions related to the size and depth of vocabulary knowledge. The first and second research questions were aimed at evaluating the effect of general language proficiency on the development of the vocabulary size and depth of knowledge of EFL learners at the University of Bahrain. The third research question was aimed at investigating the relationship between vocabulary size and depth of knowledge. Finally, the fourth research question was aimed at investigating the relationship between the size of vocabulary and the organization of lexical networking in relation to general language proficiency.
The results indicate that general language proficiency has a positive effect on both the size and depth dimensions of vocabulary knowledge. According to the findings, the size of the learners' receptive vocabulary changed from high (77.67%) to very high (89%), and the size of their productive vocabulary changed from moderate (52.91%) to very high (81.67%). In addition, the learners' depth of vocabulary changed from moderate (53.89%) to high (70.19%). The study also shows that there is a positive correlation between vocabulary size and depth of knowledge. However, there is no evident relationship between the size of vocabulary and the nature of lexical networking; the order of the response categories in the Lex30 word association task did not change or shift.

Limitations
Using hard copies of the tests was a major limitation of the present study, as the researcher had to transfer all the participants' responses into software programs in order to answer the research questions. Unfortunately, the computerized version of the Lex30 word association task used in the study to assess learners' productive vocabulary size was, for various reasons, no longer supported by the owner. As a result, the time required for analyzing the data was extended.

Recommendations
Based on the investigation into vocabulary size and depth of knowledge in the current study, several recommendations can be made to help develop EFL learners' L2 vocabulary learning competence.
The receptive vocabulary sizes of the intermediate and advanced groups were notably large: the intermediate group's mean score was high (77.67%), and the advanced group's mean score was very high (89%). It therefore seems worthwhile to recommend that qualitative studies be carried out to investigate and then generalize the strategies for learning receptive vocabulary used by EFL learners at the University of Bahrain. In terms of productive vocabulary size, the intermediate and advanced groups' mean scores demonstrate the need to focus on teaching more productive vocabulary: the intermediate group's mean score was moderate (52.91%), and the advanced group's mean score was very high (81.67%). It would also be worthwhile to investigate the mechanism that stimulates EFL learners to use their receptive vocabulary in writing and speaking. Furthermore, the difference between the two groups' mean scores for vocabulary depth reveals that EFL learners should be exposed to more semantic relationships, i.e., word associations and collocations.
Further research could also be conducted to assess the exact size of the receptive vocabulary knowledge of EFL learners at the University of Bahrain using Nation and Beglar's (2007) VST. Once vocabulary knowledge has been assessed, instructors and curriculum designers engaged in building and developing English courses will have a valuable overview of learners' vocabulary competence. It could be argued that English learning courses would be more relevant if an assessment of vocabulary was included at the beginning and end of these courses. Whether these courses cover listening, reading, writing, or speaking skills, an overview of the vocabulary assessment has considerable benefits. In addition, the assessment of EFL learners' vocabulary knowledge could also be beneficial when compared with that of other learners around the world. Such a comparison could highlight the strengths and weaknesses of the vocabulary learning process. Furthermore, it would be beneficial to investigate vocabulary knowledge in relation to gender alongside the language proficiency levels of EFL learners in Bahrain. More broadly, greater efforts are needed to build a vocabulary assessment application for the University of Bahrain to be used with the admission aptitude test and personal interview.