Learning a Second Language Naturally: The Voice Movement Icon Approach

Second language (L2) instruction greatly differs from natural input during native language (L1) acquisition. Whereas a child collects sensorimotor experience while learning novel words, L2 instruction employs primarily reading, writing, and listening comprehension. We describe an alternative proposal that integrates the body into the learning process: the Voice Movement Icon (VMI) approach. A VMI consists of a word that is read and spoken in L2 and synchronously paired with an action or a gesture. A VMI is first performed by the language trainer and then imitated by the learners. Behavioral experiments demonstrate that words encoded through VMIs are easier to memorize than audio-visually encoded words and that they are better retained over time. The reasons why gestures promote language learning are manifold. First, we focus on language as an embodied phenomenon of cognition. Then we review evidence that gestures scaffold the acquisition of L1. Because VMIs reconnect language learning with the body, they can be considered a more natural tool for language instruction than audio-visual activities.


Introduction
Often, when L2 teachers employ authentic recordings of foreign language materials such as dialogs, students listen to the audio files and fill gaps in an accompanying text (Macaro, 2006). This trains listening skills and enables learners to understand foreign language speakers. Also, listening to authentic language is intended to provide appropriate training for detecting unknown vocabulary items and novel morphological and syntactic structures, and to prepare for language production. However, the validity of listening comprehension activities for language domains has not been questioned in the last decades, nor has their efficiency been empirically tested (Plonsky, 2011). Similarly, it is not clear how language production can benefit from listening comprehension training. Certainly it is more natural to hear spoken language than to only read it, as learners did before the advent of audio-visuals. However, reading can be helpful when hearing is impaired or pronunciation is idiosyncratic.
Nonetheless, audio-visual encoding of language is far from natural input, far from native language learning, where a child also collects multiple sensorimotor experiences linked to a concept. For example, an infant hearing "lemon" has already visually identified the object, i.e., its shape and color, the surface of the fruit, its position. The infant has touched, smelled, tasted and dropped the lemon. By doing so, the child assembles all possible pieces of sensorimotor experience in order to build a mental representation of the fruit and, in real life, to interact with it in an appropriate way. In this context, the name of the fruit, the word, is only one of the manifold components of the concept.
Sensorimotor experience is the natural way of acquiring words in a native language (Meltzoff, Kuhl, Movellan, & Sejnowski, 2009). Formal instruction does not provide the learner with an appropriate environment for learning language in a natural way. The learning process students undergo in the classroom does not match what happens during native language acquisition. Moreover, audio-visual encoding lacks all the body-related components necessary to naturally assimilate the novel phoneme sequence (Mandler, 2012). This might be an a priori explanation of why learning vocabulary from lists can be tedious and inefficient. Learners might be reluctant to reconstruct L1 learning experiences in the classroom. In fact, it is not easy to provide learners with the necessary objects (even if they are not abstract) and make them interact with them.

The Voice Movement Icon (VMI) Approach
During her teaching of Italian at the beginning of the 1990s, the author made extensive use of listening comprehension activities. She made explanations of new words more vivid by performing actions and pantomiming the words instead of translating them into the learners' native language (German). She observed that students not only understood the meaning of the novel words but also memorized them better. Furthermore, if the students themselves performed the gestures while saying words that were difficult to remember, their memory performance was enhanced even more. Words, chunks and phrases were easily retrieved in role plays, and language production was easier. While these observations had only anecdotal value, the author began to systematically encode difficult words with gestures in her classes. Over the years, in different classes and at different levels of language, she observed that retrieval of the gestured words was noticeably higher. Furthermore, even after many months, students could retrieve the words they had encoded with gestures.
In a publication in German for instruction practice, Macedonia (1996) first described the beneficial effects on memory of gestures that are first performed by the teacher and then repeated by the learners in Italian lessons. She named this learning strategy Voice Movement Icons (VMIs) (Macedonia, 1999). The VMI approach is an active encoding strategy used for novel texts in a foreign language. On the word level, a VMI consists of a word in L2 (spoken and read) enriched by a sensorimotor experience, i.e., an action (a few steps for to go) or a gesture (shaping the hand into a letter "C" and moving it to one's mouth, for to drink). A VMI consists of a word that the trainer speaks aloud in L2 and synchronously pairs with an action or a gesture. The translation into L1 can be written or oral and enunciated by the trainer in order to avoid misunderstandings about the meaning of the L2 word. The VMI is first performed by the trainer and thereafter by the learners. The action or gesture is consistent in its shape over time.
A VMI encompasses two phases: perception and reproduction. In the perception phase, learners perceive the acoustic shape of the word, focus on a string of letters, and watch a sequence of body movements and the facial expression of the trainer. On the semantic level, by interpreting the gesture, they decode word meaning. In the reproduction phase, learners read the word, repeat what they have heard and imitate the action or gesture. Within a short time, a VMI clusters the word read, heard and spoken, i.e., the Voice, and the action or gesture, i.e., the Movement, performed by the trainer and then repeated by the learner. The action or gesture gives birth to a sensory-motor program with a certain shape, an image for the word it represents, i.e., an Icon. Note that in order to make sure that learners connect words to actions and gestures in an unequivocal way, the translation of the L2 item into the learners' native language is often provided. In fact, words that belong to the same semantic field, like walk, run, stroll etc., are easy to confuse. VMIs can be grouped into categories depending on the action or gestures performed: iconic, deictic and symbolic VMIs.

Iconic VMIs
An iconic VMI combines a word (spoken and written) with an iconic movement. It fills the novel word, an unknown sequence of graphemes and phonemes, with a sensorimotor program in one of two ways: either the gesture can be an action reproducing the motor program enacted during L1 acquisition, or the word's semantics can be represented by a gesture that is chosen by the trainer and is plausible to the learner. When English-speaking students learning Japanese encounter the word iku (to go), they can walk a few steps through the room. This roughly corresponds to the action, the natural motor program that the learner's body couples with the word during L1 acquisition. This is not new in L2 instruction. In fact, the Total Physical Response proposal (Asher, 1969) cued learners to perform actions in order to memorize vocabulary. However, since it is not always possible to perform real actions related to the words in the classroom, a gesture might need to replace the action. For example, horu (Japanese, to dig) cannot be performed as an action for obvious reasons, so a simulation of digging, i.e., a gesture, is needed. Likewise, iku can be simulated by a gesture: students can use their index and middle fingers to create a motor image of walking. However, action words are better remembered if encoded through action instead of iconic gestures (observation of the author). Concrete words are the best candidates for iconic VMIs. For example, for the word flower, the gesture can represent the shape of the object or parts of it (stalk). Also, the gesture can be the movement that we perform when using the object (offering a flower, smelling it or picking its petals). For a concrete word like book, an iconic gesture can be used, e.g., opening an imaginary book. However, the same iconic gesture can accompany an abstract word with a possible connection to the semantic fields of the concrete gesture. For example, the noun theory can be gestured by opening an imaginary book and reading it. Thus the gesture represents the word's semantics in a non-compelling way that is plausible to learners. Hence, there are no generally applicable VMIs that trainers need to learn in advance. L2 teachers can choose from and vary a large number of possibilities that are understandable and relevant to their target group. Iconic VMIs can incorporate an action itself or gestures that depict a word's semantics or some salient features that are arbitrarily chosen by the teacher.

Symbolic VMIs
A symbolic VMI pairs a word with a more abstract gesture that becomes symbolic. A symbolic gesture does not depict a word's semantics; the shape merely stands for it. For example, consider the gesture for "OK", when we press the tips of our thumb and forefinger together and fan the other fingers. By convention, we know what this means. Gesture research has termed this an emblem (McNeill, 2012). Many emblems are culturally defined within geographic boundaries, as Kendon (2004) describes in his book on Neapolitan gestures. Few are known in large areas of the world, such as the gesture for "quiet" that is made by holding the stretched forefinger in front of pursed lips. In everyday communication we use a multitude of symbolic gestures. They accompany and sometimes also substitute for spoken language, in noisy environments or when speakers are too far apart to hear each other. When using a symbolic gesture in a VMI, it is important to consider gestures already present in the culture of the learners in order not to mismatch meaning. If a gesture is already known in a culture to indicate quiet, the symbolic gesture will match an internal image present in the learners' minds and will be easily understood by the learners. If it is unknown, it might cause a mismatch and/or irritate users and possibly impede learning (Kelly, Creigh, & Bartolotti, 2010).
Symbolic gestures need not be emblems; they can also be iconic gestures with a high degree of abstraction that makes the original shape unrecognizable. Consider to go: we can represent this through an action, with the highest degree of iconicity, by walking a few steps. We can also use a gesture by moving our index and middle finger as if they were legs in motion. In this case we extract some features of the action and represent them. We can also go further in abstraction and just quickly displace our hand away from our body as if we were chasing a flea. Thus, the more our original action gesture goes into abstraction, the more symbolic it becomes. Hence, concrete words can also be paired with symbolic gestures as the gesture becomes more abstract.
Abstract words are inherently harder to represent by gestures. Many words, however, like theory, can be paired with associable images and represented as in the section above. Other abstract words, particularly function words (i.e., adverbs like hence or conjunctions like although), have a grammatical function within the sentence and a high degree of abstraction. Their semantics lack a concrete or metaphoric image and cannot be represented by iconic gestures or emblems. Function words in L2 are the most difficult to remember (Macedonia & Knösche, 2011). In order to create VMIs for such words, when the movement can be neither an action nor an illustrative gesture, the solution is drastic. The gesture must be invented from scratch and therefore it will be arbitrary. For example, for a word like although (see Figure 4), we can raise our right arm or our left leg or both together, or make a little jump, or anything else. This creates a highly symbolic gesture that is coupled to the function word. Note that the gestural shape should remain constant, i.e., not vary with every change in parameters (e.g., location or dimension). Also, the gesture should not be similar to another gesture paired with another word. This would create interference and be disruptive to learning. The advantages of inventing gestures for function words are better retrievability and longer decay times compared to audio-visual encoding (Macedonia & Knösche, 2011).

Deictic VMIs
When accompanied by a deictic (pointing) gesture, demonstratives (this, that) and place adverbs (here, there) constitute a deictic VMI (Figure). Deictic gestures are traditionally performed with the extended (arm and) finger (McNeill, 2012), with a flat hand (Kendon, 2004) and in some cultures with the lips, as in Laos (Enfield, 2001). A deictic VMI can also encode a concrete object present in the room, such as a door. Thus for some objects we can make a deictic gesture. However, the more deictic gestures we use within a lesson, the more VMIs will lose saliency as they are deprived of their own sensorimotor shape. Hence the impact on memory for the words encoded through deictic VMIs will diminish (observation by the author).

How to Use VMIs in Practice
VMIs accompany novel words when foreign-language texts are presented and replace listening comprehension activities during lessons. In practice, the text to be presented is projected on the wall with teacher and students standing. The teacher proceeds by encoding each sentence. She reads it aloud and complements novel words that are difficult to associate by adding an action or a gesture. Take for example a Japanese text taught to English natives, in which somebody is hungry and the speaker proposes going to a restaurant. Onaka suita (literally "belly empty") conveys I am hungry. For beginners both words are new. Hence the teacher creates a VMI for each word, as illustrated in Figure 6. The next sentence in the dialog, restaurant ni iku (literally, "restaurant in go"), means let's go to a restaurant. Here the French loan word restaurant does not need a VMI. The other two items are better accompanied by a gesture or an action. For in the teacher performs a deictic gesture and for go an action, e.g., walking a few steps through the classroom, as illustrated in Figure 2a. For onaka (belly) the trainer drums on her belly; for suita (empty) she looks in front of herself as if she were standing in front of an empty container. The word restaurant is a loan word, hence associable and not encoded with a VMI. In the case of ni (in), the trainer points with her finger onto her open palm, and for iku (go) the trainer walks a few steps.
The relevance of VMIs varies according to the target group, i.e., it depends on the students' competence in the target language. For a novice in the first hour of class, almost every word is important and hence needs to be VMI-encoded, while advanced learners will have only the unfamiliar words encoded as VMIs when the words lack associative bridges to the learners' native language. After observing the teacher, students repeat what they hear and read along with the accompanying movement(s). This procedure is iterated for each sentence a certain number of times depending on the target group. Due to better memory, young students need fewer repetitions than older learners (Nyberg, Lovden, Riklund, Lindenberger, & Backman, 2012). VMIs should be repeated often enough to make the word coupled with the gesture retrievable. In order to assess whether learners have internalized the VMIs, at the end of the lesson the teacher should perform the gesture(s) and the learners should be able to retrieve the word(s) in the foreign language by only watching the teacher. Once vocabulary has been assimilated, the corresponding gestures are put aside. In an advanced stage of foreign-language acquisition, the utilization of VMIs shifts from lexical items (because they have become largely known or associable) to morphological and syntactical structures (Macedonia, 1999). A teacher using VMIs for the first time should mark words within the text that need VMIs and rehearse the VMIs like choreography before presenting the text to a target group (Figure 6).

VMIs Accompany Language but They Are not Co-Speech Gestures
A large body of research investigating the impact of gestures in instruction has grown in the past few years, not only in language research. For example, Valenzeno et al. (2003) have demonstrated that, compared to pure verbal explanation, gestures enhance children's understanding of concepts such as symmetry and asymmetry. Mathematics learning has proven to work better if deictic, iconic and symbolic gestures are used during explanation (Alibali & Nathan, 2011). In L2 instruction, teachers and learners make extensive use of gestures in different domains of language acquisition (Gullberg & McCafferty, 2008; McCafferty & Stam, 2008). For example, L1 speakers trying to enhance understanding complement and accommodate oral production in L2 through gestures (Olsher, 2008). However, VMIs are not co-speech gestures. First, by definition VMIs cluster a word and a gesture. Second, both word and gesture are first observed and then imitated. Hence, action/gesture is one of the components of VMIs. Third, unlike true co-speech gestures, gestures contained in VMIs are not produced spontaneously in order to accompany language. Moreover, most of the gestures used in VMIs are not consistently part of a common gestural inventory shared by teachers and learners. Whereas an emblem like OK is well known to all users, a symbolic VMI first needs to be created by the teacher and used within the group.

Actions or Gestures Used for VMIs Are not Signs
Actions or gestures are VMI components that enrich (written and spoken) words in the foreign language with additional sensorimotor information. They can be considered a meta-code that accompanies the code (the foreign language) and has validity within the learners' group. However, VMIs are not a code used for communication, as signs are. It might be tempting to consider the use of signs within VMIs. This would be convenient in practice, as signs would provide a ready-to-use inventory of gestures. On closer examination, however, this turns out to be problematic for at least the following reasons: a) teachers would need to learn a sign language in order to produce VMIs; b) the amount of effort and time to dedicate to sign-language acquisition must be compared to the effort spent to learn a spoken language; c) furthermore, what level of skill in the sign language should the teacher reach before using signs to implement VMIs? In practice, the acquisition of a sign language in order to generate VMIs would possibly limit the use of this learning strategy. Also, the question arises of which sign language should be chosen. There are several variants of sign languages for English: American, British, South African, New Zealand and Australian. There are German, Swiss and Austrian variants of sign language for German. For the different Chinese languages, the Tower of Babel of signs is even more complicated. For all these reasons, trainers should feel free to choose actions or gestures in a way that helps their target group and enriches the word with a sensorimotor component. The gestures can be iconic, symbolic or deictic; they must be clearly understandable, easy to reproduce and different from each other in order not to create interference. As VMIs do not serve the purpose of communication, they need not be subject to its rules. VMIs serve first as a decoding and then as an encoding tool. They activate multiple senses and facilitate and enhance the storage of foreign language.

Plausibility of Gestural Representation
Gestures incorporated in VMIs must be plausible to learners. For natives of German, the gesture representing the Japanese word ie (house) could produce the shape of a roof with both arms. Also, the gesture could be performed with both index fingers so that the shape of a roof is still recognizable. Learners would also understand the gesture for house if the teacher arches one arm held over her head. This suggests the idea of shelter and is still a plausible motor image for the concept of house. However, there is a limitation concerning acceptable gestures. If learners were told to scratch their heads while saying ie, they probably would think that ie is the word for scratch; then learning that ie means house would irritate or amuse them. Hence, scratching is not a suitable motor image for house. In fact, we have a subconscious idea of an image connected to a concept (J. Engelkamp, 1980; Saltz & Donnenwerth-Nolan, 1981). A number of experiments have demonstrated that if a word and a motor image do not match, cognitive processing is disturbed (for a review, see Macedonia & Von Kriegstein, 2012). In an experiment in L2 vocabulary learning, Macedonia et al. (2011) cued participants to memorize concrete words by pairing them with either iconic or meaningless gestures. As hypothesized, subjects learned significantly more words that were accompanied by iconic gestures. Moreover, brain imaging during recognition of words encoded with meaningless gestures revealed activity in a network denoting disturbance along with effort to integrate mismatching information. In other words, the brain not only stores a word but also motor images representing it. Because of this strong coupling between a word's semantics and a gesture, in practice the same gesture should not be incorporated in two different VMIs. This would lead to interference and possibly disrupt learning.

Benefits of Using the Body as a Learning Tool
In VMI-supported lessons, teacher and learners stand, move around and speak. In audio-visual lessons, students sit, listen and write. The benefit of performing gestures in order to acquire foreign language is that it significantly enhances memory performance. In the last three decades, laboratory research has repeatedly shown that gestures have an impact on memory for verbal information (for reviews, see H. Zimmer, 2001; H. D. Zimmer & Engelkamp, 2003). Unfortunately, this knowledge did not reach L2 research and practice when it was developed in the beginning of the eighties (J. Engelkamp & Krumnacker, 1980; J. Engelkamp & Zimmer, 1984; J. Engelkamp & Zimmer, 1985). So L2 research went its own way and did not focus on memory as an intrinsic component of language learning. Asher (1969) had already proposed using the body in order to support memory in his Total Physical Response. However, he remained on a descriptive level and did not prove empirically that action has a beneficial effect on storage and retrieval of verbal information. At the beginning of the eighties, watching pantomimes was addressed by Carels (1981) as a strategy that supports memory; however, no empirical studies were conducted in order to prove the benefits of gestures in foreign language classes. So over the years the potential of action and gesture remained an opinion, a possibility in the multitude of methods and learning strategies in L2. The body as a learning tool was not considered to be a real option in formal instruction. The first attempt to empirically prove the efficiency of gestures paired with foreign language was made by Quinn-Allen (1995): She presented first-semester French students with fifty French expressions and split the participants into three groups. Experimental group 1 paired expressions like J'en ai ras le bol (I've had it up to here) with emblematic gestures; the emblem for this expression was sweeping a hand over one's head. Also, these students performed the gestures to recall the words after learning. Group 2 learned the items by only reading them. Group 3 saw the gestures not during learning but during recall. Quinn-Allen demonstrated that group 1 performed best in both the short and the long term. In fact, 11 weeks after encoding, participants who had learned the expressions with the emblems had forgotten significantly fewer sentences than the other groups.
In order to better control the materials to be learned, i.e., to avoid associations between the words to be learned and languages already known to the participants, Macedonia (2003) created an artificial corpus of 36 words conforming with Italian phonotactic rules. Participants (young adults, 20.4 years) learned single words (nouns, adjectives, verbs, prepositions): 18 audio-visually and 18 by additionally performing a gesture (iconic or symbolic), hence by using an iconic or a symbolic VMI. In the 14-month longitudinal study, memory performance was assessed through word translations from the native into the target language. At each of five time points, retrieval was significantly better for the words learned through gestures than for those encoded audio-visually.
In a study with university students, Kelly et al. (2009) presented 12 Japanese verbs according to four conditions: (i) speech, (ii) speech + congruent gesture, (iii) speech + incongruent gesture, and (iv) repeated speech. As hypothesized, participants performed best with words enriched by congruent gestures, while words accompanied by incongruent gestures were retained worst. Considering that motor activity per se (not only a plausible gesture) paired with a word could be the factor leading to superior memory performance (Schmidt-Kassow, Kulka, Gunter, Rothermich, & Kotz, 2010), Macedonia and colleagues (2011) trained university students to memorize 92 concrete words of Vimmi, an artificial language created for experimental purposes. Half of them were encoded with iconic VMIs and the other half with VMIs whose gesture was meaningless, e.g., stretching one's arms in front of oneself or shrugging one's shoulders. As expected, memory performance assessed by means of cued recall tests was significantly better for iconic VMIs than for VMIs containing meaningless gestures. This study thus confirmed that the motor image produced by the gesture matters, i.e., that mere physical activity does not suffice to support word recall. These results suggest that the mind stores a sensorimotor image (Paivio, 1969) and imply that if the gesture (at least partially) contains this image, then the gesture helps to better retain the novel foreign word.
However, a good portion of vocabulary consists of abstract words that do not seem to contain a sensorimotor image per se. To explore the question of whether VMIs also have an impact on memory for abstract words, Macedonia & Knösche (2011) trained young adults (18-25 years) on a corpus of 32 sentences of Vimmi, such as miruwe ifra kadu bekoni (the driver presently ignores the warning). The sentences comprised 118 single words belonging to different categories: subject, verb, adverb and object. Subjects were concrete nouns and indicated actors. The other words were abstract. Sixteen sentences were encoded audio-visually and 16 audio-visually complemented by a gesture for each word (VMI). The gestures for the actors were iconic, whereas the gestures for the other words were symbolic. Memory performance was assessed at six different time points with free and cued recall tests. The overall results showed significantly better retrieval in the short term and long term for VMI items. Both concrete and abstract words accompanied by symbolic VMIs were significantly better retrieved than those encoded audio-visually. Thus iconic VMIs support better memory performance than meaningless movements, but symbolic VMIs still enhance memory performance compared to pure audio-visual learning (Macedonia et al., 2011).
In another Vimmi study, Mayer et al. (in preparation) cued subjects (18-25 years) to learn 90 Vimmi words. Thirty were learned audio-visually, 30 through an iconic VMI, and another 30 in a condition in which a cartoon illustrating the word meaning was presented and participants had to trace a prominent line of the drawing in the air with their right index finger. In this third condition, learners enriched the word by a movement they themselves chose to perform. In the short term, there was a significant difference in memory retrieval for the words encoded with both sensorimotor enrichments, i.e., VMI and drawing, compared to words encoded audio-visually. However, after two and six months, VMI-encoded words scored better and the difference between the two sensorimotor enrichments became significant. Hence, in this study, VMIs proved to be better tools to encode vocabulary than drawing salient lines of a picture representing the word's semantics.
More recently, in a study on vocabulary learning by Bergmann and Macedonia (2013), subjects (18-25 years) learned 45 Vimmi items according to three conditions: 15 audio-visually, 15 by imitating gestures performed by a human trainer and 15 by imitating gestures performed by a virtual trainer, a sociable agent. Independently of the trainer cueing participants, human or virtual, VMI-encoded words scored better in the short term and in the long term (after 30 days).
An interesting aspect of VMIs is their impact on memory in low performers, as investigated in a combined behavioral and brain imaging study conducted by Macedonia et al. (2010). Whereas high performers consistently learned well regardless of condition, low performers profited significantly from the use of VMIs. Considering a normal distribution of performance in language classes, VMIs thus provide greater support for learners who, for whatever reason, do not achieve average performance in learning vocabulary.
Altogether, a growing body of evidence in recent years has shown that gestures accompanying novel words enhance memory performance in L2. Although L2 research has not focused on memory in past decades, speaking a foreign language is possible only if learners have an adequate inventory of words at their disposal. Hence, memory matters.

Why the Body Helps the Mind
Teachers and students standing around in the classroom, gesturing and speaking aloud is not our usual image of L2 instruction.We are used to sitting quietly, listening and reading.Hence learners sometimes question whether gestures are redundant in language learning.They fear overloading their memory with sensorimotor information that, in their view, is not necessary to learn verbal information.In fact, a common view prevails that our brain is like a computer; i.e., the more it stores, the slower it works.Accordingly, learning is supposed to be efficient if it fulfills the notion of economy in a reductionist way, i.e., if it is not redundant.It is true that redundancy overloads information systems and slows down their information processing.However, computers and brains are not the same.In an article on similarities in computers and brains, Nagarajan and Stevens (2008) convincingly argue that neither the hardware nor the architecture of the two systems can be compared.Even in those few aspects, where we might see connections, our brains are much more powerful than machines.When tasks are comparable, such as speech or face recognition, brains outstrip computers in speed of processing and reliability.Furthermore, scientific evidence in gesture research has shown that gestures do exactly the opposite of overloading cognition: they lighten the load.For example, participants did better in their task when explaining mathematics (Cook, Yip, & Goldin-Meadow, 2012;Susan Goldin-Meadow, Nusbaum, Kelly, & Wagner, 2001).Even when people refer to objects that are not present, they perform better if they are allowed to gesture (Ping & Goldin-Meadow, 2010).Adding a gesture to a word is not unnatural redundancy, as it happens thousands of times a day.Hence, by using a gesture to learn a word we do not add redundancy to the task and we do not overload cognition: we learn naturally.
The idea that gesture and spoken language have to be separated goes back to the dichotomy between body and mind introduced by Descartes in his Discourse on Method in 1637. In this book, the French philosopher maintains that mind and matter are different, as mind is not governed by physical laws. This dichotomy has persisted over the centuries and was indirectly reinforced through amodal theories of cognition in the 1970s (Fodor, 1976, 1983; Fodor, 1977). These theories postulated that concepts are amodal, i.e., not related to physical modalities; accordingly, concepts are abstract entities and words are unrelated to the body. Words are symbols that label objects in the real world.
However, in the last two decades, experimental psychology and neuroscience have contributed to a deep change in the view on cognition. Laboratory evidence has shown that our body and higher cognitive functions (mind) are tightly connected (for reviews see Barsalou, 2008; Gallese & Lakoff, 2005). This view is called embodiment (Jirak, Menz, Buccino, Borghi, & Binkofski, 2010) and claims that words are grounded in the body. In infancy, word learning is connected with a range of bodily experiences (Pulvermuller, 2005). In fact, when acquiring a novel word in L1, for example banana, a child sees but also grasps, smells, tastes and interacts in many ways with the fruit. This interaction is not amodal, not abstract, not symbolic. It is only possible because the child uses its body to explore the fruit. Correspondingly, the child's brain represents all experiences collected with the banana in an extended network connecting sensorimotor areas with language regions (Pulvermuller, 1999). The connection between sensory experience and language holds in all domains related to our senses. In fact, merely hearing odor words such as cinnamon, jasmine or garlic elicits activity in olfactory regions of the brain (González et al., 2006). It becomes clear that words are not labels for concepts: they are the spoken or written components of concepts, and concepts are grounded in the body (Fischer & Zwaan, 2008).

Interaction between Language Development and Gestures
When children acquire language, caregivers support oral production by reinforcing and correcting it. Words, phrases and sentences are in the focus of attention. However, alongside spoken language, another communication system silently grows that caregivers are not aware of: gestures. Since the 1970s, Piaget's seminal work on linguistic and non-linguistic symbols has motivated developmental scientists to investigate the link between language development and gesture (Iverson & Thelen, 1999). Many studies have documented that language and gestures are two sides of the same coin and that they develop as an integrated system (Goldin-Meadow, 1998). It is striking that milestones in language development and gestures emerge together. Between 4 and 9 months of age, babies start babbling. Parents see no connection between the first attempts to articulate syllables and the rhythmic hand banging that occurs when babies interact with caregivers; however, this hand banging is considered to be a rhythmic precursor to babbling (Masataka, 2001).
Single word comprehension develops at around 10 months (Capirci & Volterra, 2008; Parise & Csibra, 2012). At this age, infants cannot articulate the words they understand. Instead, they point. By doing so, they direct an adult's attention to something or get the adult to retrieve an object (Bruner, 1983). Here, the use of deictic gestures is considered protoimperative. In the interaction between a child and an adult, Tomasello et al. (2007) recognize an even more sophisticated mechanism: an infant's influence on the person with whom she is interacting. Her aim is cooperation and shared intentionality, uniquely human traits inherited with the gestural side of communication at this very early stage of prelinguistic development. Children use pointing with not only protoimperative but also protodeclarative intention. Interestingly, this is not specific to Western cultures. A study conducted by Liszkowski et al. (2012) in countries worldwide (Papua New Guinea, Indonesia, Japan, Mexico, Samoa, Peru, etc.) with children 10-14 months of age has confirmed the hypothesis that pointing is a universal aspect of prelinguistic communication. Furthermore, across cultures, pointing is used to start communicative interaction and not simply to mimic behavior. The authors also report on the directionality of pointing: it is initiated by adults a few months before children start pointing themselves.
Children possibly point at an object or animal, such as a bird, to indirectly ask for instruction. Caregivers usually respond to the gesture by naming the object or the animal. Thus, at this stage of development, pointing seems to fulfill the function of a language instruction tool. The next step in language development is the transition from single words to two-word combinations. Iverson and Goldin-Meadow (2005) investigated this phase in 10 children. They found that gestures were precursors to words. Children first produced the actions; then the words appeared with a time delay. Moreover, children producing gesture-plus-word combinations, e.g., pointing at a bird and uttering the word "bird", were the first to produce the two-word utterance "bird nap".
Children begin naming objects (doggie) and actions (drink) at around 12 months. The first actions performed when interacting with objects, e.g., moving a cup towards the lips, reproduce what children see adults do. These actions have been called "gestural naming" and occur parallel to "recognitory naming", e.g., doggie (Bates & Dick, 2002). Later in development, gestural naming occurs in a more abstract way, for example with objects that cannot actually be used for drinking, or with an empty hand. Gestural naming is a transient phenomenon in normally developing children; it disappears once the child learns and uses more and more spoken language (at about the age of 18 months).
In a study on the acquisition of specific object names, Zammit and Schafer (2011) observed 10 mothers in interaction with their children. When naming objects, mothers showed different behaviors: they either only said the word or accompanied the word with a deictic or an iconic gesture. Iconic gestures facilitated the comprehension of words. Thus, the authors conclude that gestures support linguistic development. It is interesting to observe that parents are aware of instructing their children in language when they talk to them. However, parents are not aware of doing so when they gesture. Short phrases make their appearance at around 18 months and are accompanied by pointing at objects and naming them. Behavior observed at this age includes pantomiming of complex sequences of gestures, e.g., stirring with an imaginary spoon in a non-existent cup and drinking out of it. However, these gestures appear during play without the aim of interacting (Iverson & Goldin-Meadow, 1998); they are simulations of actions on an abstract level, possibly an indicator that thought, language and gesture are becoming an integrated system.
At the age of 24 to 36 months, children's grammar greatly improves with more complex syntactic constructions and with inflection. This verbal phase is accompanied by further development of complex gestural behavior. It consists of deictic (pointing), iconic and symbolic gestures with communicative and/or symbolic content, with or without objects (Bates, O'Connell, Vaid, Sledge, & Oakes, 1986). Now, even though children already use spoken language, they combine words with gestures and thereby convey more differentiated meaning in communication (Butcher & Goldin-Meadow, 2000).
With time, the repertoire of language and gestures gradually grows and cultural influences slowly bias the use of gestures. In a study by Huttunen et al. (2013), British and Finnish children aged 2 to 5 years had to accomplish a picture-naming task. Both groups used more pointing than iconic gestures at the age of 2. Thereafter, gesture use decreased as the spoken lexicon developed. However, over the duration of the experiment, British children gestured more than Finnish children. The authors of the study connect this result to possible cultural differences in gesturing, both in the children's environment and in parental use of gestures, as previously indicated in a study by Rowe et al. (2008). Also, depending on language structure, gestures seem to support and compensate for features that might be underspecified. This was demonstrated in a study conducted by Demir et al. (2012), in which Turkish- and English-speaking children were asked to describe short vignettes.
The tight connection between language and gesture during development has also been well investigated in clinical populations, e.g., autistic disorders (Iverson, 2010), specific language impairment (Iverson & Braddock, 2011), mental retardation such as Down syndrome (Capone & McGregor, 2004) and Williams syndrome (Bello, Capirci, & Volterra, 2004). Here the common denominator is that language delay accompanies poor use of gestures. A recent paper by Ozcaliskan et al. (2013) compared simple and complex sentence production of children with prenatal/perinatal brain lesions (PL) at age 2 with sentence production of typically developing children (DL). Children with brain injury showed delays in the development of speech and gesture. Moreover, in complex sentence constructions, PL children did not use gesture and speech combinations before producing speech alone, as DL children did. Instead, PL children, although with delay, produced complex sentences using only speech. The authors of the study advance the hypothesis that these children might be impaired in producing motorically demanding gestures. Interestingly, studies with both Down and Williams syndrome children suggest that they compensate for language deficits by making increased use of gestures during communication (Bello et al., 2004; Stefanini, Caselli, & Volterra, 2007).
The above literature documents the silent emergence of gesture in language development and provides strong evidence for the interconnectedness of these two aspects in our communication system. In this perspective, VMIs can be seen as a natural supporting tool for the growth of L2.

Conclusion and Implications for Foreign Language Instruction
The use of gestures in formal L2 instruction is limited to spontaneous pantomiming and deictics during explanation. We have presented VMIs, a learning strategy combining novel words in L2 with iconic, symbolic or deictic gestures. VMIs are neither spontaneous co-speech gestures nor signs. Instead, they are performed by the teacher in L2 during encoding and then actively repeated by the learners. Besides helping to encode a word's semantics, VMIs enhance vocabulary storage in terms of quantity and retention over time. Several studies in L2 word acquisition have demonstrated that the gestural component enriches the audio-visual input and leads to better memory performance.
The reason why gesture is beneficial to L2 learning may lie in the fact that language is a cognitive skill rooted in our bodies. For a long time, we were not aware of this. Neuroscientific experiments in particular have contributed to this view, demonstrating that the brain represents concepts and words as a product of their encoding, i.e., of bodily experiences acquired when a child interacts with the world. Unfortunately, L2 instruction still implicitly clings to theories of cognition that separate body from mind. Therefore, L2 instruction has not yet seriously considered the use of our body as a learning tool, despite the fact that first language acquisition and gesture are clearly tightly connected and do not merely emerge at more or less the same time during development. Indeed, gestures are communicative precursors to language that catalyze it in the prelinguistic phase and serve as an instruction and communication tool between parents and children. Because VMIs intrinsically contain gesture, they relink the body to L2 learning and overcome the dichotomy between body and mind. In other words, VMIs make L2 learning more natural and therefore more efficient. This should be taken into consideration in formal instruction.

Figure 1. Voice Movement Icon (VMI) for English observe (for German learners)

Figure 2a/b. Action and gesture for to go. Unlike to dig, the verb to go allows the performance of both an action (a) and a gesture (b) within the VMI.

Figure 3a/b. Iconic gestures for flower and book or theory

Figure 4a/b. Symbolic gestures for the function words although and already

Figure 5a/b. Deictic VMIs for there and here