Vocabulary Learning Through Audio-Visual Input of Thai Primary School EFL Students


In the EFL context, vocabulary is taught at all levels, from primary school to university. Teachers teach vocabulary from the beginning and throughout their language lessons, assigning activities and testing vocabulary knowledge in examinations. The primary goal of English language learning, however, is to be able to communicate. There is no single clear-cut way of learning vocabulary, and various traditional methods, such as vocabulary notebooks, paper wordlists, and bilingual dictionaries, are widely used. Nevertheless, innovative methods, such as digital flashcards (Yowaboot & Sukying, 2022) and audio-visual input (Peters, 2019; Peters & Webb, 2018), can help EFL students master vocabulary items more effectively.
Approaches that involve deliberately learning words are seen as intentional vocabulary-learning techniques. Research shows that vocabulary can be learned through planned activities, such as word cards or flashcards (e.g., Yowaboot & Sukying, 2022) and word parts (e.g., Bubchaiya & Sukying, 2022; Matwangsaeng & Sukying, 2023; Sukying & Matwangsaeng, 2022). Classifying activities as incidental or intentional is useful because it helps to differentiate between the relative effects of these two broad approaches to learning vocabulary. However, a limitation of this classification is that effectiveness tends to be generalized across each category. For instance, intentional learning activities are described as being more effective and providing the greatest chance that words will be learned (Schmitt, 2010). Schmitt further argues that "intentional vocabulary learning almost always leads to greater and faster gains, with a better chance of retention and of reaching productive levels of mastery than incidental vocabulary learning" (p. 341). There is justification for these statements, since studies show that deliberate methods produce significantly greater vocabulary learning gains than incidental ones (Nation, 2013). However, the extent to which incidental and intentional study each contributes to vocabulary learning is still debated. Further, the extent to which learning is consistent across different incidental and intentional vocabulary learning tasks remains inconclusive.
Recent studies indicate that technology has played an essential role in language enhancement, resulting in more successful learning. Audio-visual input, here in the form of English vocabulary videos, is among the most effective vocabulary learning strategies (e.g., Durbahn, 2019; Peters, 2019; Peters & Webb, 2018; Rodgers & Webb, 2020). Audio-visual input encourages language learning by making it more perceptible and practical. Such materials are widely used in the classroom to improve students' motivation by relating real-life circumstances to learning processes (Ibe & Abamuche, 2019). With this strategy, language students can grasp various inputs in multiple cultural contexts without unnecessary explanation from the teacher (Ogasawara, 1994). It also helps language students become acquainted with the language through media in more authentic contexts, promoting more effective instruction. Language students reportedly learn 83% better through sight than through other senses (Ibe & Abamuche, 2019). More specifically, hearing and seeing together are reported to improve students' memory by 50% (Gautam, 2022). Audio-visual materials can hold students' attention longer than spoken-only lessons. They are helpful, innovative materials that provide multiple sources of pictures and sounds (Sukma, 2018). Teachers may add diversity to their lesson plans, while students may be inspired and become more positive toward English learning. Audio-visual input also helps language students form precise and accurate concepts of vocabulary knowledge.
Most previous studies showed positive effects of audio-visual input and found that students' knowledge developed and increased (e.g., Yawiloeng, 2020). This approach has been used to advance students' vocabulary knowledge, specifically that of primary school students (e.g., Aziz & Sulicha, 2016; Ezebuiro, 2019). However, in the Thai EFL context, students' size and depth of vocabulary knowledge have been found to be small and insufficient for use in context (e.g., Mungkonwong & Wudthayagorn, 2017; Nontasee & Sukying, 2021; Sukying, 2017, 2018). It is therefore imperative to develop students' vocabulary knowledge, which in turn advances vocabulary use. The present research thus aimed primarily to improve students' vocabulary knowledge in a Thai context by using audio-visual input as an instructional intervention. The results are intended to serve as an initial foundation for developing students' vocabulary knowledge and to benefit pedagogical development in Thailand. Previous literature suggests that deliberate vocabulary instruction influences students' vocabulary knowledge acquisition and development. Audio-visual input can help students improve their vocabulary knowledge more effectively than uninstructed exposure. Understanding the form and meaning of a word is the first step in learning it (Laufer & Goldstein, 2004; Nation, 2013), so the form-meaning link is a suitable measure of Thai EFL primary school students' current knowledge. Therefore, this research tested the hypothesis that instruction via audio-visual input is effective for acquiring words, specifically their form-meaning links, among Thai EFL primary school students. The research also investigated the participants' perceptions of audio-visual input for enhancing vocabulary knowledge.
The following research questions were established to guide the research:
1) Does audio-visual input enhance Thai EFL primary school students' receptive and productive knowledge of form-meaning links?
2) What are Thai EFL primary school students' perceptions of using audio-visual input to enhance the form-meaning links of vocabulary knowledge?

Vocabulary Knowledge
The concept of overall vocabulary knowledge is based on word aspects. Nation (2013) proposed a comprehensive list of vocabulary aspects, including the 18 sub-knowledge aspects and the receptive-productive learning process. The process begins with becoming familiar with the word and ends with using the word correctly in context. Therefore, this process represents a receptive and productive vocabulary knowledge continuum, starting with word comprehension and leading to word production. Indeed, vocabulary knowledge can range from knowing that a given form is an existing word to mastery of all aspects. The extent of such knowledge applies to all students, including native (L1) speakers and L2/EFL students (Schmitt, 2010). The framework of vocabulary knowledge by Nation (2013, p. 49) is shown in Table 1.
Note. R = receptive knowledge, P = productive knowledge.

Audio-Visual Input
These days, the incorporation of multimedia in language classrooms has grown in popularity worldwide. Over the past years, such media has been extensively used in English language classrooms (Duffy, 2007;Wei & Fan, 2022). As such, it can be argued that this form of media is suitable for enhancing students' exposure to the language (Watkins & Wilkins, 2011). It can also stimulate students' visual and auditory systems, which will, in turn, contribute to maximizing learning outcomes (Low & Sweller, 2014;Milosevic, 2017;Shah & Khan, 2015).
Scholars have attempted to define the term audio-visual input or audio-visual materials. Kathirvel and Hashim (2020) proposed that audio-visual materials are any tools that can provide a more realistic and dynamic learning experience. Francis (2011) also describes them as instructional materials that convey meanings independent of symbols or language. In the same way, audio-visual inputs are supplementary materials allowing teachers to explain, establish, and link concepts and interpretations (Muliana, 2018). In brief, audio-visual materials are interactive materials that, through the senses of hearing and sight, can create a more dynamic learning experience, stimulate students' learning, and explain meanings and concepts.
Using audio-visual input can benefit teaching (e.g., Batool, Ahmed, Rehan, & Zahra, 2022; Hariffin & Said, 2019; Peters, 2019; Lestari & Selian, 2021; Safitri, Farmasari, & Thohir, 2022; Yawiloeng, 2020). Mansourzadeh (2014) outlined a set of values that pertain to the effective use of audio-visual input in instruction. Among these values is its potential to enhance retention; the repetition inherent in audio-visual input can lead to better retention of information. Using such input can also contribute to more enjoyable and engaging learning experiences, helping to alleviate boredom. Another potential benefit of audio-visual input is its capacity to promote creative responses in students; viewing videos allows them to visualize the relevant content. Furthermore, audio-visual input can be a suitable substitute for real-life experiences, bringing realism and context to the learning process. Finally, this input can stimulate interest and assist self-expression; simply put, it enables students to express their thoughts and feelings about the material.

Previous Studies
Widiastuti (2011) points out how audio-visual input such as videos can benefit English students, especially young students between the ages of three and eight. Video offers a superior mode of conveying meaning compared to other forms of media because it presents language in a contextualized manner, which is not feasible with cassette tapes. With videos, students can observe visual cues such as the speaker's identity, the location, and the ongoing activities of the individuals in the video, which aid understanding. First, videos facilitate an enjoyable language learning experience for young students, which aligns with the aim of cultivating a positive attitude towards language acquisition at an early age. Video content creates an engaging and appealing learning environment that promotes language learning among the students. Second, these materials enable effective learning of non-verbal communication, specifically body language, which is critical for younger language students who are still developing an understanding of the world around them. Finally, the students are afforded opportunities for repetition, an essential element of language learning for young children.
Through repeated exposure to video content, their learning is fostered through imitation and absorption, contributing to an increase in confidence and motivation in language learning.
According to McKeown, Crosson, Moore and Beck (2018), videos can enhance diverse aspects of vocabulary knowledge by enabling students to generalize a meaning of a particular lexical item across different contexts, thus promoting a flexible application in novel contexts. Videos can offer L2 students a meaningful context that is unavailable in the EFL setting; they portray how the target lexical items should be used in the real-world context, thereby facilitating the development of flexibility in communication and speech production skills. Possessing flexibility in their vocabulary knowledge enables them to integrate the meaning of a word into unfamiliar contexts, enhancing their ability to comprehend the overall meaning of such contexts.
Hartono's (2013) study examined students' vocabulary learning with and without audiovisual media, along with the difference between the two groups of students. A quasi-experimental design was adopted. The sample comprised 70 grade 10 students of SMA N 1 Cepiring Kendal, who were equally assigned to an experimental group taught with audiovisual media and a control group taught without such media. The pretest and the posttest, a 20-item fill-in-the-blank test, were administered before and after the treatment and had to be completed within 60 minutes. The findings suggested that using audiovisual media could improve students' vocabulary mastery effectively.
Similar to the prior study, positive results from using audiovisual media, in this case a cartoon film, were found in Munir's (2016) study. Specifically, the study sought to measure the students' scores before and after being taught with a cartoon film and the film's effectiveness in enhancing their vocabulary mastery. A one-group pretest-posttest design was used. Twenty-five grade 4 students at MI AI Hidayah 02 Betak were selected as the sample. The results showed that the students' mean scores on the pretest and posttest were 75.16 and 87.92, respectively. In addition, the difference in their scores was statistically significant, implying that using a cartoon film to teach students was practical and could enhance students' vocabulary knowledge.
Like both of these studies, Grathia's (2017) investigation found the same trend of improved vocabulary mastery through audiovisual media, specifically videos. The study explored the effect of using an English video on students' vocabulary learning. The findings showed a significant increase in the experimental group's score, suggesting that those learning with the English video outperformed those without it. It can be stated that the use of the English video facilitated vocabulary learning. Therefore, audio-visual inputs, such as a video, a cartoon film, or a documentary, benefit L2 learners in acquiring new vocabulary items. Previous studies also suggest a positive attitude toward audio-visual aids in vocabulary learning.
jel.ccsenet.org Journal of Education and Learning Vol. 12, No. 4

Participants and Setting
This research used convenience sampling to recruit one experimental group of 51 participants, who were sixth-grade students at government schools under the Primary Educational Service Office in the Northeastern Region of Thailand. Their ages ranged from 11 to 12 years, and all participants were native speakers of Thai.
All participants had learned English as a foreign language (EFL) and had received English lessons for at least five years of systematic schooling. Their English proficiency was at the A1 level based on Thailand's Common European Framework of Reference for Languages (CEFR) (The Ministry of Education, 2008), indicating they were beginning English students. The expected vocabulary size is specified in the core curriculum: according to the Basic Education Curriculum B.E. 2551 (A.D. 2008), graduates of Grade 6 should have a vocabulary of approximately 1,050-1,200 words. L1-L2 or L2-L1 translation was considered an early learning process that EFL students must understand. Therefore, form-meaning links were closely related to their current need to learn English.

Word Selection
The 125 candidate target words were selected from the Grade 6 Happy Campers book published by Macmillan Education, the school textbook, and then cross-checked against the first 1,000 high-frequency words of the New General Service List (NGSL) (Browne, Culligan, & Phillips, 2013). The target words should be relevant to the level of the students' current vocabulary knowledge (Nation, 2013). Therefore, an English vocabulary try-out was used to check the familiarity of the target words in the research setting. In addition, unknown and known words were removed based on participants' scores (Bruton, 2009). This led to a final list of 80 target words to present in the audio-visual instruction. Next, 30 items were selected randomly from the 80 target words to test the students' form-meaning links.
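The selection steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the function name, the placeholder word lists, the assumption that the frequency-list cross-check acts as a filter, and the fixed seed are all hypothetical.

```python
import random

def select_target_words(textbook_words, frequency_list, removed_after_tryout,
                        n_test=30, seed=None):
    """Filter candidate words, then draw a random subset for the tests.

    Assumption: a candidate is kept only if it appears in the frequency
    list and was not removed after the try-out.
    """
    frequent = set(frequency_list)
    removed = set(removed_after_tryout)
    targets = [w for w in textbook_words if w in frequent and w not in removed]
    if len(targets) < n_test:
        raise ValueError("not enough target words to draw the test items")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    test_items = rng.sample(targets, n_test)  # sampling without replacement
    return targets, test_items

# Placeholder data (hypothetical, not the study's word list).
textbook_words = [f"word_{i:03d}" for i in range(1, 126)]  # 125 candidates
frequency_list = textbook_words                            # assume all are high-frequency
removed_after_tryout = textbook_words[:45]                 # 45 removed after the try-out

targets, test_items = select_target_words(
    textbook_words, frequency_list, removed_after_tryout, n_test=30, seed=2023
)
print(len(targets), len(test_items))
```

Fixing the seed is a design choice that lets the same 30 test items be regenerated later; the source does not say how the random draw was actually performed.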

Research Instruments
Two tests measured students' receptive and productive knowledge of form-meaning links, i.e., the L2-to-L1 Translation Test and the L1-to-L2 Translation Test. A questionnaire was also used to examine their perceptions of learning form-meaning links through audio-visual input. Content validity was rated by five experts (all items > 0.5) (Lynn, 1986). Internal consistency reliability estimates were acceptable for all research instruments (all Cronbach's α values ≥ 0.7) (Dörnyei, 2007). The item difficulty and discrimination analysis indicated that all items fell in the moderate range of 0.2 to 0.8 (Creswell, 2002).
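For readers who want to see how such indices are typically obtained, the sketch below computes Cronbach's α and item difficulty for dichotomously scored items using only the Python standard library. The response matrix is made up for illustration, and the function names are hypothetical; the source does not report which software or exact formulas were used.

```python
import statistics

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row of item scores per test taker)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across test takers.
    item_vars = [statistics.variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the total scores.
    total_var = statistics.variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def item_difficulty(scores):
    """Proportion of test takers answering each item correctly (0/1 scoring)."""
    n = len(scores)
    return [sum(row[i] for row in scores) / n for i in range(len(scores[0]))]

# Hypothetical responses: 5 test takers x 4 items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
difficulty = item_difficulty(responses)
print(round(alpha, 2), difficulty)
```

With this toy matrix every item's difficulty lies between 0.2 and 0.8, the range the text treats as moderate.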

The L2-to-L1 Translation Test
The L2-to-L1 translation test was developed as a multiple-choice test based on the L2 translation test (Sukying & Nontasee, 2022) and produced a reliability of 0.70 on Cronbach's Alpha. In testing form-meaning links, participants must show that they understand the target words or can generate the target form for a specific meaning. The L2-to-L1 translation test is frequently recommended for assessing meaning comprehension and form recognition (Laufer & Goldstein, 2004). This test was a receptive measure of form-meaning links. Participants had to translate the highlighted word from English into Thai. The test consisted of 30 items. Each sentence provided the word's context to minimize confusion about the intended meaning. A correct definition was awarded one point, and no points were given for no answer or an incorrect answer, such as an incorrect form-meaning match. The following is a sample of this test (Table 2).

The L1-to-L2 Translation Test
The L1-to-L2 translation test was a productive measure of form-meaning links and was used to measure students' ability to recall English words (Laufer & Goldstein, 2004). The instructions asked participants to recall the specific form for the relevant meaning, and the definition was provided to them. Participants had to read the 30 Thai meanings and rearrange the English letters to form the correct words. An accurate English word was awarded one point, and no points were given for no answer or an incorrect answer. Below is an illustration of this test (Table 3).

Questionnaire
A five-point Likert scale questionnaire developed based on Sukying's (2020) perception questionnaire of vocabulary knowledge was used to examine EFL primary school participants' perceptions of form-meaning link instructions via audio-visual input (see Table 1). The reliability of the questionnaire in the current research was 0.86 on Cronbach's Alpha.
Form-meaning link instruction through audio-visual input was assumed to foster students' English language proficiency. Therefore, the questionnaire items were designed around vocabulary knowledge of form-meaning links. Further, the items addressed understanding of form-meaning links in both reception and production as learned through the instructional intervention. A five-point Likert scale was used, from strongly disagree (1) to strongly agree (5). The questionnaire was used to check participants' understanding of form-meaning links via audio-visual input. It was first designed in English and then translated into Thai to ensure that the participants understood the questions precisely.

Audio-Visual Input
The participants received regular English instruction as part of their systematic schooling (Ministry of Education in Thailand, 2008), plus additional instruction on English form-meaning links through audio-visual teaching, in order to examine the benefit of this extra instruction for EFL primary school students.
According to the Ministry of Education in Thailand, EFL lessons are a compulsory subject for all students. The participating primary school scheduled four 50-minute English sessions with Thai EFL teachers and one 40-minute session with native English-speaking teachers weekly. Regular English instruction delivers basic English content, such as grammatical tense and vocabulary, through a grammar-translation approach. It includes few productive activities and no detailed teaching of vocabulary knowledge. Therefore, the additional instruction aimed to provide explicit instruction on vocabulary knowledge and an improved understanding of a word's particular form and meaning. The teaching period was eight weeks. Gains and Redman's (2007) lesson plans were modified for this study, and learners were expected to study eight to twelve words on average in fifty minutes from each unit (see Table 5).

The teacher presents the topic of the day that the students will learn.
Presentation
3. The teacher presents the new vocabulary using audio-visual input in the form of a video.
Materials: Audio-visual instruction video
Practice
4. The teacher asks students to do the "Say it" activity. The teacher shows the video without sounds or words, and students must say the correct word.
5. After the teacher shows the word, its meaning, and its pronunciation, students must repeat the word again.
6. The teacher then asks the students to do the "Match me!" activity. In this activity, students recall and retrieve the words and the meaning of each word by matching each word to its proper meaning.
Materials: Audio-visual instruction video
Production
7. The teacher assigns students to do the worksheet.
Materials: Authentic worksheet/online worksheet
Wrap up
8. The teacher and students review what they have learned together.
The instruction on English form-meaning links was given to the participants from the second to the ninth week, after the pre-test, to allow the students to acquire basic knowledge of form-meaning links. The form-meaning links were taught via the audio-visual approach in the form of an instructional video. In total, 80 target words were taught over the eight weeks of audio-visual instruction. The words were grouped into themes of ten items each, and each item was shown for about 4 minutes. For each target word, the video simultaneously presented a related picture, the word's spelling, its pronunciation, and an example sentence (see Table 6).

Data Collection Procedures
The data collection procedures of this research were divided into three main stages.
Stage 1: The pilot study was conducted. The research instruments were examined for content validity by five experts and for reliability with 30 students who were not part of the main study.

Stage 2: The main participants were asked to complete the pre-test in the first week. From the second week to the ninth week, instruction on vocabulary knowledge via audio-visual input was presented to the participants, giving eight weeks of audio-visual instruction in total.

Stage 3: The post-test was administered one week after the treatment, together with a questionnaire asking about the participants' perceptions of the treatment.
The productive test of the form-meaning link was conducted before the receptive test. Each productive test lasted 30 minutes, and 20 minutes were allocated for each receptive test. More time was allocated to the productive test because it requires more demanding knowledge strategies than the receptive test (Hayashi & Murphy, 2011). The data collection procedures are shown in Table 7.

Data Analysis
The test scores were analyzed statistically to answer the research questions. First, paired-samples t-tests were used to determine whether there were significant differences between pre-test and post-test scores and between receptive and productive form-meaning link scores. Second, effect sizes were calculated to gauge the strength of any effect found. Finally, the questionnaire data were analyzed based on the level of agreement or disagreement with each item as rated by participants, and the raw ratings were converted into interpretable results.
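As an illustration of the first two steps, the following sketch computes a paired-samples t statistic and a standardized effect size from pre-test and post-test scores using only the Python standard library. The scores are invented for demonstration, and the effect size shown is the mean difference divided by the standard deviation of the differences (often written d_z); the source does not state which variant of Cohen's d was used, so this choice is an assumption.

```python
import math
import statistics

def paired_t_and_effect_size(pre, post):
    """Return (t, df, d_z) for paired pre-test/post-test scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))  # paired-samples t statistic
    d_z = mean_diff / sd_diff                 # standardized mean difference
    return t, n - 1, d_z

# Hypothetical scores out of 30 for five learners (not the study's data).
pre_scores = [10, 12, 9, 14, 11]
post_scores = [17, 18, 15, 22, 16]

t_value, df, d_z = paired_t_and_effect_size(pre_scores, post_scores)
print(f"t({df}) = {t_value:.2f}, d_z = {d_z:.2f}")
```

Under this definition the two quantities are linked by t = d_z × √n, which can serve as a quick consistency check on reported values when d_z is the effect size in use.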

Results
The results showed that the participants performed better on the receptive test of form-meaning link than on the productive test (see Table 8). Specifically, they had higher scores on the post-test than on the pre-test for both the receptive and productive tests. The gap between the reception and production tests was smaller in the pre-test (16.86%) than in the post-test (23.57%). Between the two time points (pre-test and post-test), scores improved by 23.34% on the reception test and by 16.63% on the production test; the 6.71-point difference between these gains equals the widening of the gap (23.57% - 16.86%). Furthermore, skewness and kurtosis values were around ± 1 and ≤ 0.2, which was taken to indicate a normal distribution, so there was no violation of the statistical assumption.
As shown in Table 9, the pre-test and post-test scores on the reception test of form-meaning link (L2TT) were significantly different, with a large effect size (t = 350.00, p < 0.001, d = 2.52). The pre-test and post-test scores on the production test of form-meaning link (L1TT) were also statistically different, with a large effect size (t = 243.18, p < 0.05, d = 2.29).
Figure 1 illustrates the overall test performance on receptive and productive aspects of form-meaning links. The findings indicated that post-test scores of form-meaning links were higher than the pre-test scores on both the reception and production tests. Specifically, after the audio-visual input, the reception test scores were higher than those for the production test. It was evident that the participants' receptive and productive form-meaning links improved after the instructional intervention, that is, the audio-visual input. In addition, receptive vocabulary knowledge appeared less demanding than productive vocabulary knowledge. These findings suggest that the receptive aspect of the form-meaning link is more straightforward to master than the productive aspect.
As shown in Table 11, the participants who received the instruction on form-meaning links through audio-visual input (N = 51) reported that it improved their vocabulary, with an 81.25% contribution (M = 4.06, SD = 0.93). Table 8 presents the participants' ratings of their perceptions of the treatment, from more effective to less effective.
The results showed that most participants, accounting for 86.27%, viewed audio-visual input as enabling them to learn new vocabulary, which ranked first with the highest mean score (M = 4.31, SD = 0.88). On the other hand, a fair share of them, 72.16%, agreed that using audio-visual input helps improve their recognition and recall of words, which ranked last with the lowest mean score (M = 3.61, SD = 1.04). As shown in their responses, over 50% of the participants had a better understanding of vocabulary knowledge.
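The reported percentages appear to be consistent with each mean rating expressed as a share of the five-point scale maximum (for example, 4.06 / 5 is about 81.2%, close to the reported 81.25%). The source does not state how the percentages were derived, so the conversion below is an assumption offered only as a sketch; the function name is hypothetical, and the small discrepancies presumably reflect rounding of the reported means.

```python
def mean_as_percent_of_max(mean_rating, scale_max=5):
    """Express a mean Likert rating as a percentage of the scale maximum."""
    if not 0 <= mean_rating <= scale_max:
        raise ValueError("mean rating must lie within the scale")
    return mean_rating / scale_max * 100

# Means as reported in the text, paired with the percentages reported alongside them.
reported = [
    ("overall", 4.06, 81.25),
    ("highest-ranked item", 4.31, 86.27),
    ("lowest-ranked item", 3.61, 72.16),
]

for label, mean_rating, reported_percent in reported:
    computed = mean_as_percent_of_max(mean_rating)
    print(f"{label}: computed {computed:.2f}%, reported {reported_percent:.2f}%")
```

If this reading is right, the percentages describe average strength of agreement rather than the proportion of participants who agreed.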

Figure 1. Overall test performance (L2TT and L1TT, pretest and posttest)
The qualitative data obtained from the open-ended question in the questionnaire were analyzed and thematically categorized to identify perceptions of incorporating audio-visual input in learning the form-meaning link of words beyond those listed in the closed-ended items.
In line with the quantitative results, the qualitative findings showed that participants perceived that audio-visual input made their vocabulary learning attractive and pleasurable. The participants pointed out that audio-visuals allowed them to understand and memorize vocabulary items better. The participants' responses are listed as follows:
I feel relaxed when studying, with no tension at all (S1)
The video is enjoyable and easily understandable (S2)
The video makes it easier to memorize the vocabulary (S3)
It is a learning process in a modern world (S4)
The video makes me feel more active and eager to study (S5)
The statements above indicate that primary school participants had a positive perception of using video to enhance vocabulary learning. In this regard, audio-visual input could be valuable for teaching and learning vocabulary in an EFL context.

The Effect of Audio-Visual Input on Students' Form-Meaning Link
The results showed that audio-visual input significantly improved the participants' receptive and productive vocabulary knowledge of the form-meaning link. The participants achieved higher post-test scores than pre-test scores on both the receptive and productive tests. Notably, their performance on the receptive test was higher than on the productive test, implying that they are better at recognizing a particular lexical item's form and meaning than at producing semantically meaningful vocabulary forms.
This aligns with the notion, mentioned in the literature review, that receptive knowledge supports the ability to recall a word in reading. In contrast, productive knowledge goes beyond that level, requiring students to produce a lexical item in writing. This suggests that to use a particular lexical item productively, language users need to master it; otherwise, they cannot use it accurately in practice. This finding can be explained by the fact that the participants are primary students aged between 11 and 12 who possess elementary knowledge of English and have minimal opportunity to apply it in writing, which can hinder their ability to use a word productively at this stage. Their better performance on receptive tasks may be explained by their exposure to specific words through their lessons, which reinforces their ability to recall the form and meaning with greater ease.
The significantly improved form-meaning links could be accounted for by two cognitive processes: noticing and retrieval (Nation, 2013). The participants intentionally learned a new lexical item by noticing it, that is, by devoting their attention to an item they had just encountered in a different context and hence considering it worth learning. This may reflect the attractiveness of audio-visuals for young students. Audio-visuals attract students' attention and deepen their understanding of target words or stimuli; associating new words with a visual image makes remembering the word's meaning easier. Audio and visual information are coded differently, and linking them creates more effective pathways to retrieval, which supports memorization and retention. The presence of both audio and visual cues in this research can facilitate learning a new word when visual and audio representations are continuously present in working memory (Nation, 2013). Yowaboot and Sukying (2022) noted that pictorial images are better than text when the goal is to remember new words because images are strongly associated with words. Moreover, research has shown that words coupled with pictorial images and text lead to a greater depth of processing than text alone (e.g., Magnussen & Sukying, 2021).
Combining multiple media types allows numerous retrieval routes to the information (orthographic spellings and meanings in this research) in a student's long-term memory (Nation, 2013). Importantly, using audio and visual input encourages students to remember words, since this method can serve as a strong consolidation strategy in which information is encoded according to the nature of the stimulus. In line with this, other studies also showed that using images and sound was more effective than other non-verbal media, such as videos, in vocabulary learning (Magnussen & Sukying, 2021).
The current results align with previous studies that audio-visual input can improve learning vocabulary (Yowaboot & Sukying, 2022). The finding showed that audio-visual input fostered a better outcome, especially for the form-meaning link in the post-test; participants could recall the word's spelling and retrieve its meaning.
Since EFL students find words difficult to acquire due to their complex forms and meaning senses, audio-visual input may help students remember and recall new words in an EFL context. This situation could be because they activate students' memories longer than texts alone. Nation (2013) and Magnussen and Sukying (2021) suggested that combining more than one media mode was attributed to the cognitive process involved in the mental model building so that vocabulary students can maintain lexical processing ability.
The better performance on the post-test could have been fostered by the audio-visual input because such materials generally require viewers' attention; without attending to the materials, the participants could not grasp their content. The participants also acquired words through frequent retrieval. As posited, repetition and retrieval of a word allow students to remember it effectively. The audio-visual input could enable this process because it is naturally repetitive, giving the participants frequent exposure to a word and helping them acquire its meaning. This is consistent with Webb (2005), who attributed improved vocabulary knowledge to repetition; multiple repetitions enable students to become increasingly familiar with a word and to store its meaning with less effort.
The 23% and 16% improvements in receptive and productive vocabulary knowledge, respectively, could be explained by the senses of hearing and sight that audio-visual input activates. Gautam (2022) states that both senses can improve students' memory. Similarly, the sense of sight enhances learning (Ibe & Abamuche, 2019). This could be because the participants in this research, as young students, tend to establish clear connections when they hear and see content at the same time. The sense of hearing allows for the creation of sound associations, while the sense of sight, aided by vividness, enables visual associations or visualization. Together, these associations may help students remember the meaning of words through the visual and audio cues stored in their minds.
Overall, the findings showed improvement in both aspects of vocabulary knowledge, i.e., reception and production. This is consistent with prior studies in which audio-visual input was effective in enhancing students' vocabulary knowledge (Batool, Ahmed, Rehan, & Zahra, 2022; Hariffin & Said, 2019; Peters, 2019; Lestari & Selian, 2021; Safitri, Farmasari, & Thohir, 2022; Yawiloeng, 2020) and in improving various aspects of vocabulary knowledge, such as form-meaning link recall and recognition (Hariffin & Said, 2019; Teng, 2022). This could be because the students could access visual and audio content that provided helpful information about new words, thus allowing them to learn vocabulary more effectively, as pointed out by Yawiloeng (2020). Moreover, as Low and Sweller (2014) stated, audio-visual input creates a contextual learning environment, exposing students to words presented vividly and authentically. This could be explained by the notion proposed by Çakir (2006) that exposure to contextual language, authentic input, and non-verbal cues helps students understand the subject matter quickly. Finally, since audio-visual inputs are inherently repetitive, as mentioned by Widiastuti (2011) and Mansourzadeh (2014), they foster learning and information retention, enabling students to remember the form and meaning of words easily.
The favorable perceptions of using audio-visual input in vocabulary learning could be due to the teaching process and the features of audio-visuals. Audio-visual input combines auditory and visual modes, which assist students in learning by offering enjoyable and interactive learning experiences. This aligns with Shah and Khan (2015), who found that audio-visual input helped create a stimulating environment and draw students' attention to a word. The positive perception could also arise because incorporating audio-visuals (pictures and sounds) in teaching can present complex content in an understandable way, as suggested by previous studies (Durbahn, 2019; Magnussen & Sukying, 2021). Indeed, audio-visual resources (e.g., sounds and images) facilitate vocabulary acquisition, particularly for Thai primary school students.
Audio-visual resources or materials help students acquire new lexical items. Many participants agreed that audio-visual input could facilitate their comprehension and foster vocabulary knowledge, such as the form-meaning link. A fair share reported that it enhanced their ability to recognize and recall words. Half of the participants understood vocabulary better when taught with audio-visual input. The qualitative results show the same trend: audio-visual input created a more enjoyable and engaging learning environment, which in turn helped improve the participants' understanding of vocabulary, especially word recognition and recall. Moreover, such an environment increased their motivation to learn vocabulary.
These results are consistent with prior research in which students enjoyed the integration of audio-visual input in vocabulary learning and expressed positive attitudes (Batool, Ahmed, Rehan, & Zahra, 2022; Milosevic, 2017; Thohir, 2022; Yawiloeng, 2020). The explanation for their favorable perceptions may be that audio-visual input, as found by Ogasawara (1994), offers a comfortable and engaging learning environment. As a result, it does not cause the tension and boredom that may emerge in other modes of learning; simply put, it cultivates a sense of comfort and enjoyment in learning, which is one of the goals of teaching English to young students proposed by Widiastuti (2011). Moreover, the audio-visual input selected for this research was sufficiently understandable and tailored to the students' English level, potentially leading to a relaxing and lively learning atmosphere.

Conclusion
The current research provided evidence to support the effect of audio-visual input in enhancing vocabulary learning, and knowledge of form-meaning links in particular. Specifically, the present results showed that primary school students significantly improved their vocabulary knowledge, as the inferential analyses indicated. The findings also showed that primary school students had a positive perception of using audio-visual input to learn vocabulary in English language classrooms. Taken together, the current research indicates that audio-visuals are a valuable tool for vocabulary learning.
This research on vocabulary learning through audio-visual input among Thai primary school students offers some pedagogical and research implications. In terms of pedagogical implications, the results showed that the students' vocabulary knowledge improved in both receptive and productive aspects. This suggests that audio-visual input can effectively improve Thai students' vocabulary knowledge; combining sounds, images, and videos fosters students' ability to understand words and recall their meaning easily because these inputs provide visual and audio cues. Furthermore, it suggests that context should be provided so that students can acquire new lexical items effectively and become aware of different uses or meanings in diverse contexts. Educators can draw on these findings to design materials that incorporate more audio-visual input in classroom activities. Concerning research implications, the results serve as evidence for the effectiveness of audio-visual input in improving vocabulary knowledge, and this evidence can be used to encourage educators to use these materials more frequently.

Limitations and Suggestions for Future Research
Because this research included only one group of participants (an experimental group), the generalizability of the findings may be limited. The results cannot be generalized to a larger population or applied to other populations or participants in different contexts without comparing experimental and control groups. In addition, this research focused only on the form-meaning link, so the results do not show the impact of audio-visual input on other aspects of vocabulary knowledge. Given these limitations, further studies should conduct comparative research that examines the effects across two groups of students, namely an experimental group and a control group. Doing so may help ensure that the results can be applied to a broader range of students and that no other factors are influencing the improvement in their vocabulary knowledge. Future research should also investigate whether audio-visual input fosters advancement in all aspects of vocabulary knowledge. Lastly, future studies may include a retention test to examine the effect of audio-visual input on vocabulary learning over time; doing so may shed light on whether this instructional approach contributes to long-term vocabulary retention.