A Review of Visual Metaphor Based on Visual Typologies and Verbalization Forms

This study reviews the literature on visual metaphor in the fields of linguistics and advertising. The fifty papers reviewed were collected from CNKI. The review provides an overview of previous literature in terms of multi-modal studies of visual metaphor in advertising, the typologies of visual rhetoric and the verbalization of visual metaphor. Based on the research gaps identified, it proposes suggestions for future research to enrich the theoretical framework of visual metaphor. The review found that the study of visual metaphor remains insufficient in three respects. Firstly, the relevant studies are mostly in the field of marketing and have rarely been extended to linguistics. Secondly, most studies have addressed the cognitive effects of visual metaphor in advertising, while the form in which visual metaphor is cognitively processed has received less attention. Thirdly, although some studies have shown that verbalization is necessary for the comprehension of visual metaphors, there is still no clear conclusion regarding the specific verbalization forms of different types of visual metaphors. Specifically, three syntactic structures have been hypothetically proposed for fusion-structured visual metaphors, namely "A is like B", "A is B" and "A with B", but no empirical evidence indicates which is the most effective in the conceptual representation of fusion-structured visual metaphors in advertising. Through an analysis of the differences between these syntactic structures, the author proposes that the verbalization form "A is B" should be the most effective in representing fusion-structured visual metaphors in the context of advertising, because of its basic metaphorical structure, easily comprehensible form and strong transformational effect.


Introduction
Advertising plays an increasingly influential role in the modern world and, with the advance of the information era, has penetrated almost all fields of human life. Over the years, words have ceased to be the only means of conveying information, and the image has gradually become an indispensable carrier of information. In advertisements, less attention is given to verbal elements and more to visual ones (Madupu, Sen, & Ranganathan, 2013, p. 58). To cope with rising market competition, advertisers try to improve the quality of their ads in various ways, and many now prefer to leave the interpretation open to individuals rather than telling them directly how to interpret the ads (Phillips & McQuarrie, 2003, p. 6). In this setting, rhetorical figures, as an aspect of advertising style, have become popular in ads, because their implicit depiction can provoke more elaboration and more positive attitudes and can enhance ad recall (McQuarrie & Mick, 1999; Tom & Eves, 1999; Toncar & Munch, 2001; Mothersbaugh, Bruce, & Franke, 2002; McQuarrie & Mick, 2003). In particular, visual metaphor, which shows that one object is like another by comparing two images that may be completely different (Hu, Sun, Cao, Yang, & Wang, 2014, p. 607), has been used extensively in modern advertising, because it facilitates more elaboration and more favorable attitudes towards the ads and elicits a strong viewer response (van Mulken, Le Pair, & Forceville, 2010; Madupu et al., 2013; Chang & Yen, 2013; van Mulken, van Hooft, & Nederstigt, 2014).
The rising use of visual metaphor in ads has encouraged a growing number of scholars to study visual metaphor in the advertising context in terms of perception, recognition and comprehension. Some of them have suggested that the comprehension of metaphor is multi-modal in nature (Jung-Beeman, 2005; Eviatar & Just, 2016) and that verbalization is needed for the comprehension of visual metaphors in some cases (Ojha, Indurkhya, & Lee, 2017, p. 258). However, previous studies have paid little attention to the verbalization forms of visual metaphors with the three visual structures (juxtaposition, fusion, replacement) and to how these forms are processed cognitively; that is, little is known about the verbal representation of visual metaphor conceptualization. This paper, focusing on the conceptual representation of fusion-structured visual metaphors, surveys previous studies of visual metaphor from three aspects: the multi-modal studies of visual metaphor in advertising, the typologies of visual metaphor, and the verbal representation forms (syntactic structures) of visual metaphor. Based on the literature review, the research gaps in the existing studies and directions for future research are then pointed out. In addition, the author puts forward a hypothesis about which verbalization form (syntactic structure: "A is B", "A is like B", "A with B") is the most effective in reflecting the conceptual representation of fusion-structured visual metaphors in the context of advertising; that is, the verbalization form that can achieve the best cognitive effect in the conceptual processing of fusion-structured visual metaphors in the advertising context is preliminarily identified. The findings of this paper are of theoretical and practical significance to the study of visual metaphor, linguistics and advertising.

Method
In this paper, both quantitative and qualitative approaches were employed to provide an overview of research on visual metaphor from three aspects: multi-modal studies of visual metaphor, visual typology and verbalization forms. In the quantitative analysis, fifty academic articles published from 1977 to 2019 were collected from CNKI, covering a variety of research fields, such as advertising, marketing, rhetoric, linguistics, and cognitive psychology. Moreover, five advertising pictures selected from the Internet and magazines were used as specific examples to show different kinds of visual metaphors in detail. In the qualitative study, based on an analysis of previous research findings and related theories, the research gap in existing research was pointed out, and the most effective verbalization form in reflecting the conceptual representation of fusion-structured visual metaphors in the context of advertising was then hypothesized.

Studies of Visual Metaphor in Advertising
Metaphor is a cognitive phenomenon of thinking, that is, a transformational way of thinking from the source domain ('vehicle' in the field of rhetoric) to the target domain ('tenor' in the field of rhetoric) (Lakoff & Johnson, 1980, pp. 12−30). Visual metaphor is the extension of metaphor to the level of visual perception, which was defined by the domestic scholars Hu et al. (2014, p. 607) as "showing one object is like another by comparing two images that may be completely different." At present, visual metaphor has become one of the common rhetorical devices in the field of visual rhetoric. At the same time, with the increasing use of visual metaphors in the field of advertising (McQuarrie & Mick, 2003), research on visual metaphor has diversified at home and abroad.
In this section, three main perspectives of the studies of visual metaphor in advertising are introduced: 1) the multi-modal studies of visual metaphor in the field of advertising; 2) the typologies of visual metaphor; and 3) the verbal representation forms (syntactic structures) of visual metaphor.

Research of Visual Metaphor in Advertising in Multi-Modality
In the past few decades, most studies of metaphor have focused on the verbal and cognitive dimensions. A few theories have been proposed to explain how metaphors are constructed in people's minds. According to the interaction theory proposed by Max Black (1979, pp. 19−43), two items exist in the context of a metaphorical statement, identified as the 'primary' item and the 'secondary' one, and the two interact with each other. However, Black only briefly touched upon the idea that metaphor is a cognitive matter rather than a form of language (Forceville, 1996). Lakoff and Johnson (1980, p. 15), who proposed the conceptual metaphor theory, later argued that "metaphor is not just in language but also in thought and action, and its essence is understanding and experiencing one kind of thing in terms of another".
Following the two major theories discussed above, Forceville (1996) held that metaphors occur first on the level of cognition and are then manifested on the verbal level as well as on the pictorial level and possibly on other levels of communication, such as sounds and gestures (Forceville, 2007, p. 16). He also promoted a more inclusive theory of metaphor as a cognitive phenomenon, contributing a theory of pictorial metaphor, the manifestation of metaphor in pictures, which was later redefined as visual metaphor. Over time, many scholars analyzed a substantial number of visual metaphors used in advertising in the light of this theory, most of them exploring the effect of visual metaphors on advertising comprehension and attitude towards the ad (McQuarrie & Mick, 1999; Tom & Eves, 1999; Phillips, 2000; Toncar & Munch, 2001; Mothersbaugh et al., 2002; Phillips & McQuarrie, 2004; McQuarrie & Phillips, 2005; van Mulken et al., 2014). They found that advertising with visual metaphors was advantageous in many ways, such as attracting attention, increasing interest and enhancing persuasive effect (McQuarrie & Mick, 1999; McQuarrie & Phillips, 2005; van Mulken et al., 2014).
Another contribution of Forceville (2007) is that he divided visual metaphors by modality into two kinds: mono-modal metaphors and multi-modal metaphors. Therefore, apart from carrying out studies on advertising effect, many researchers in the past two decades have turned their attention to visual metaphors of both kinds in advertising, and a fair number of theoretical studies have resulted.
In the studies of mono-modal visual metaphors in advertising, the main focus has been the effect of the complexity of visual structures on cognitive elaboration, comprehension, and appreciation. According to van Mulken et al. (2010, p. 3418), comprehending the metaphor resulted in more appreciation of advertisements with visual metaphors. A curvilinear pattern was then proposed by van Mulken et al. (2014) to predict the effectiveness of ads from the relation between complexity, appreciation and comprehension. Some scholars proposed that viewers enjoy solving the metaphor riddle, which can increase appreciation, but that there is a tipping point beyond which appreciation tends to decrease because the message demands too much cognitive effort to comprehend (Phillips, 2000). Van Mulken et al. (2014) later verified this tipping point in an inverted curvilinear pattern (see Figure 1) for conceptual complexity and appreciation, which showed that fusion-structured visual metaphors of moderate complexity were appreciated more than the more complex ones (replacement), although they were not as well comprehended as the simple ones (juxtaposition).
Multi-modal metaphors derive from the "verbo-pictorial metaphors" first classified by Forceville (1996, p. 148). In such a metaphor, one domain (source or target) is represented visually and the other verbally. The verbo-pictorial metaphor was later redefined as a sub-type of multi-modal metaphor, in which the target and source domains are rendered in two different modes, such as written and spoken language, gestures, images, and sounds (Forceville, 2007, p. 16). This use of multi-modal information is popular in print advertising, where verbal and pictorial elements are the two most fundamental components. Therefore, a growing stream of scholarship has contributed new insights into multi-modal metaphors in advertising, focusing specifically on how metaphorical meaning is constructed in pictorial metaphors in a multi-modal context.
It is generally accepted that pictorial and verbal elements interact dynamically in multi-modal metaphor, that is, that the comprehension of metaphor is a multi-modal process, based on the view that "metaphor comprehension is a complex process which requires special brain regions to deal with different perceptual and imagery tasks" (Ojha, 2013, p. 105). Several neuroimaging studies of verbal metaphor have confirmed the role of visual areas in the comprehension of verbal metaphors. Experimental findings of brain imaging studies on verbal metaphor further demonstrated the involvement of other cognitive functions, such as imagery and an integration mechanism for multi-modal stimuli (Brownell, Simpson, Bihrle, Potter, & Gardner, 1990; Winner & Gardner, 1977; Rinaldi et al., 2004). Based on such research, several metaphor processing models have proposed that the comprehension of verbal metaphors needs an imagistic module (Carston, 2010). In addition, Ojha (2013, p. 141) found that, as in verbal metaphor comprehension, the language-related areas of the brain are activated in the comprehension of visual metaphors. On the basis of the research above, it can be concluded that the comprehension of metaphors, whether verbal or visual, necessarily involves a multi-modal interaction.
Since metaphor comprehension is multi-modal in essence, Ojha, Indurkhya and Lee (2017, p. 2) suggested that linguistic or symbolic resources are required to interpret visual metaphors. Nowadays, many visual metaphors are presented with verbal messages in the main body or in the headline. Academic studies of visual metaphors have therefore tended to explore the effect of verbal elements on the comprehension of visual metaphors in ads. A few researchers found that "verbal anchoring" (a verbal message) in advertising with visual metaphors, viewed as a cue for judging how much processing effort should be invested, could affect the comprehension and appreciation of visual metaphors to varying degrees. Alba and Hutchinson (1987, p. 143) suggested that verbal anchoring makes images easier to understand, because explicit verbal cues reduce the amount of elaboration required to interpret the information by providing a link to knowledge stored in memory. Phillips (2000, p. 15) claimed that "verbal copy was an anchor, which could help explain the meaning of complex image advertising to viewers". One point emphasized in that study was that an implicit headline, which provides a clue to the comprehension of a visual metaphor, could lead to high appreciation by improving comprehension, whereas a headline that explains the ad metaphor completely would increase comprehension but decrease ad liking, because it reduces viewers' pleasure in interpreting the ad message. According to Ang and Lim (2006, p. 50), the use of a non-metaphoric headline with a metaphoric illustration provided a suitable degree of interpretation of the ad, which would result in the most favorable attitudes and behavioral intentions. Lagerwerf, van Hooijdonk and Korenberg (2012, p. 1850) proposed that when verbal anchoring was added to a visual metaphor, comprehension was improved by a strengthened connection between the two visual elements of the metaphor.
Based on the above, the comprehension of visual metaphors is a kind of multi-modal processing, which indicates that the verbal message indeed plays a role in comprehending visual metaphors. However, it is not the only influencing factor, as different types of visual structure also have a certain effect on the cognitive processing of visual metaphors. These structures have been distinguished in detail by the different typologies of visual metaphor.

Typologies of Visual Rhetoric
The study of visual metaphor is not limited to the influence of modalities and verbal interaction on the comprehension of visual metaphors. As mentioned before, visual structures also play a role in the cognitive processing of visual metaphors, and the related findings derive from the development of the typologies of visual rhetoric. A variety of typologies built on different dimensions have been proposed to sort visual rhetorical figures into meaningful categories, which has contributed much to differentiating the processing effects of different structures of visual metaphor.

Forceville's Typology
When it comes to the classification of visual rhetoric, the first typology that needs to be mentioned is the one proposed by Forceville (1996). He pioneered the study of pictorial metaphors in advertising and provided the most basic classification model for it. In his book Pictorial Metaphor in Advertising, he first put forward the view that metaphors could occur in pictures and, more specifically, that they were often used in billboards and print advertising (Forceville, 1996, p. 1), which indicated that metaphor as a conceptual phenomenon (Ortony, 1979; Lakoff & Johnson, 1980) could be manifested in more than language.
According to Forceville (1996), four types of visual metaphor can be derived from verbal metaphor forms, depending on the nature of the source domain: MP1s, MP2s, PS and VPMs. From the verbalization form "A is B", he distinguished two types of visual metaphor, 'MP1s' and 'MP2s'. MP1 refers to metaphors with one pictorially present element; that is, only one domain of the metaphor is pictorially presented, and the other is absent and needs to be suggested or established through the pictorial context (Forceville, 1996, p. 163).
For example, Figure 2 shows an advertisement for honey. In this picture, only the target domain (a bottle of honey) appears, while the source domain (fruit that grows on a tree) is absent and has to be inferred by viewers from the context. In comparison, MP2s are a type of visual metaphor in which both elements are pictorially present, so that the two domains, physically integrated into a single gestalt, can be easily identified (Forceville, 1996, p. 163). As illustrated in Figure 3, an advertisement for chili sauce integrates the product with a fire extinguisher to indicate that the overly spicy sauce, with its increasing heat, resembles a fire that needs to be put out by a fire extinguisher. As for the verbalization form "A is like B", he proposed that it could be manifested pictorially as PS, a pictorial simile, in which two separate entities are pictorially depicted in their entirety rather than partially, as in MP2s (Forceville, 1996, p. 163). Therefore, removing the pictorial context (if it exists) will not hinder viewers' recognition. Figure 4 shows an Audi advertisement in which a car and a gecko are placed side by side, inviting viewers to equate the car with the gecko in the sense that the car is as light as the gecko. In the above three types, the two domains (source and target) are explicitly represented or unambiguously established by the picture. The last type he proposed is the pictorial metaphor in which the two domains (source and target) are rendered in two different modalities, one in text and the other in picture, called the verbo-pictorial metaphor (VPM) (Forceville, 1996, p. 163). In the Coca-Cola advertisement in Figure 5, the metaphor is conveyed by both text and picture, that is, Coca-Cola is a hero. The target domain (cola) is shown in the picture, while the source domain (hero) is given in the text.
Later, based on these initial categorizations, a new definition of three pictorial metaphors was put forward (Forceville, 2002) as follows: pictorial simile, context metaphor (MP1s) and hybrid metaphor (MP2s).

Phillips and McQuarrie's Typology
Following Forceville's (1996, 2002) framework, several scholars put forward new classifications. Phillips and McQuarrie (2004, p. 114) argued that advertising designers tend to choose pictorial elements from a "palette" with an internal structure, in which the location of the relevant elements has a corresponding impact on the viewer's cognitive process. Therefore, in order to differentiate and organize the diversity of pictorial strategies in advertising, they constructed a new and applicable typology of visual rhetoric based on structural, pragmatic and conceptual perspectives, which distinguished nine types of pictorial metaphor by crossing visual structure with meaning operation. Moreover, Phillips and McQuarrie (2004, p. 118) suggested that viewers' responses vary with the possible combinations of these two dimensions, which differ in the richness and complexity inherent in the typology.
The visual structure and meaning operation in this typology were constructed on two axes: complexity and richness. Visual structure refers to "the way the two elements that comprise the visual rhetorical figure are physically arranged in the ad" (Phillips & McQuarrie, 2004, p. 116). There are three types, namely fusion, replacement and juxtaposition, which correspond to the hybrid metaphor, context metaphor and pictorial simile respectively in Forceville's (2002) typology. Specifically, fusion refers to two metaphorical elements (source and target) artificially fused into one hybrid entity; replacement refers to images in which the present image points to an absent one, that is, only one of the two metaphorical elements is shown and the other is not; and juxtaposition refers to two elements placed side by side (Phillips & McQuarrie, 2004). These types of visual structure are illustrated in a candy advertisement in Figure 6. Phillips and McQuarrie (2004) demonstrated that complexity, understood as the demand a structure places on viewers' processing of the ad, is related to differences in viewers' cognitive processing. They also suggested that the visual structures could be arrayed by degree of complexity, which decreases along a continuum from replacement to fusion to juxtaposition. The dimension of meaning operation refers to "the target or focus of the cognitive processing required to comprehend the picture" (Phillips & McQuarrie, 2004) and includes three possibilities: comparison for similarity, comparison for opposition and connection. These can be arrayed by polysemy, degree of ambiguity, or richness of reference, which decreases from comparison for opposition to comparison for similarity to connection (Phillips & McQuarrie, 2004, pp. 116, 118).
Connection can be expressed as "A is associated with B", which motivates viewers to wonder "how the metaphorical elements can be associated to create a link between them", and its rhetorical purpose is to increase the salience of some aspects of element A that provide the association with element B (Phillips & McQuarrie, 2004, p. 119). Comparison for similarity means that the two metaphorical elements are the same in some way, which can be stated as "A is like B", inviting the viewer to compare the two images and to draw inferences about other similarities (Phillips & McQuarrie, 2004, p. 119; Madupu, Sen, & Ranganathan, 2013). Comparison for opposition can be stated as "A is not like B"; that is, the two metaphorical elements differ in some way, inviting viewers to compare them and to make one or more inferences about their differences (Phillips & McQuarrie, 2004, p. 119). This concise typology, derived partly from previous taxonomies, provided a new perspective that links viewers' cognitive and emotional responses with types of visual rhetorical figures, and it concerns how these types affect viewers' cognitive processing, such as elaboration and belief change (Phillips & McQuarrie, 2004, p. 113). This paper focuses on fusion-structured visual metaphors, because with their moderate complexity they can achieve the best cognitive effect (van Mulken et al., 2014; van Mulken et al., 2010).
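As a compact restatement of the typology described above, the following minimal Python sketch enumerates the nine cells obtained by crossing the three visual structures with the three meaning operations. The category labels and their relative ordering are taken from the text; the identifier names and the ordinal ranks from 1 to 3 are illustrative assumptions and are not part of Phillips and McQuarrie's (2004) framework.

```python
from itertools import product

# Ordered from least to most complex, following the continuum described in the text.
VISUAL_STRUCTURES = ["juxtaposition", "fusion", "replacement"]

# Ordered from least to most rich (polysemous), following the text.
MEANING_OPERATIONS = [
    "connection",
    "comparison for similarity",
    "comparison for opposition",
]


def typology_grid():
    """Return the nine (structure, operation) cells with illustrative ordinal ranks."""
    cells = []
    for (c_rank, structure), (r_rank, operation) in product(
        enumerate(VISUAL_STRUCTURES, start=1),
        enumerate(MEANING_OPERATIONS, start=1),
    ):
        cells.append(
            {
                "structure": structure,
                "operation": operation,
                "complexity_rank": c_rank,  # assumed scale: 1 = lowest complexity
                "richness_rank": r_rank,    # assumed scale: 1 = lowest richness
            }
        )
    return cells


if __name__ == "__main__":
    for cell in typology_grid():
        print(cell)
```

Filtering the returned list on "fusion" gives the three cells of moderate complexity, which is the row this paper concentrates on.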

Gkiouzepas and Hogg's Typology
Another extension of visual structure was made by Gkiouzepas and Hogg (2011), who reconsidered the previous typologies. Departing from the dimensions constructed by Phillips and McQuarrie (2004), they proposed a new conceptual framework for visual metaphor based on two new dimensions, visual scenario and objects' mode of representation, resulting in six types of combinatory mechanisms. This framework provided a systematic and meaningful way of analyzing the relation and position of visual objects in ads.
The first dimension, visual scenario, addresses "how the two metaphorical elements are constructed to relate with each other visually" (Gkiouzepas & Hogg, 2011, p. 105). Three possible types of visual scenario were distinguished: realistic symbiosis, replacement and artificial symbiosis. Realistic symbiosis refers to "two metaphorical elements represent a metaphorical link related to real life and show some unexpected similarities in terms of position, angle of view or color" (Gkiouzepas & Hogg, 2011, p. 105). Replacement means that one of the metaphorical elements is replaced by an element from outside the metaphorical pattern, with both presented visually as entities. In the third type, artificial symbiosis, "two metaphorical elements are artificially put together in the same visual space, which might attribute to the absence of realistic visual background, lack of perspective and differences in some other elements, such as size and position" (Gkiouzepas & Hogg, 2011, p. 106).
The other dimension is objects' mode of representation, which addresses how the entities are expected to appear or be arranged in the real world. In other words, it deals with whether the two metaphorical objects are juxtaposed side by side (as whole entities) or synthesized into a whole (as parts of entities) (Gkiouzepas & Hogg, 2011, p. 105).
The difference between the two visual representations, juxtaposition and synthesis, is similar to Phillips and McQuarrie's (2004) distinction between juxtaposition and fusion and to Forceville's (1996, 2002) distinction between pictorial simile and hybrid metaphor. However, by introducing the new dimension of visual scenario, this conceptual framework highlighted the distinctions between the two visual structures of juxtaposition and fusion (Phillips & McQuarrie, 2004).

Peterson's Typology
Although the typology proposed by Phillips and McQuarrie (2004) is concise and conducive to a deeper exploration of visual metaphor, some advertisements that are easily recognizable as visual metaphors may affect its distinctions (Peterson, 2019, p. 76). With the aim of better showing the variation of visual metaphors in advertising and identifying the structures of visual metaphor more clearly, Peterson (2019) adopted and expanded Phillips and McQuarrie's (2004) typology, developing a new operational typology of visual rhetoric in advertising.
In his taxonomy, seven visual structures were suggested: identification, pairwise juxtaposition, categorical juxtaposition, replacing juxtaposition, replacement, replacing fusion and fusion. Identification shares some features with Forceville's (1996) verbo-pictorial metaphor, in that only one domain or object is pictorially present and the other appears in textual form through the typographic strategy of labeling (Peterson, 2019, p. 77). In the two kinds of juxtaposition structure, pairwise juxtaposition and categorical juxtaposition, both the source and target domains are presented as wholes. In the former, the two entities are located separately and viewers need to link them in an ambiguous visual relationship; in the latter, the entity of the source domain is presented relative to a set of entities of the target domain, whose members are each unique but belong to one category, so viewers must identify the concept that these varied entities represent (Peterson, 2019, p. 77). Another important category is replacement, within which, based on the absence of one domain, three types were classified: replacing juxtaposition, replacement and replacing fusion. Replacing juxtaposition means that one member of a range of similar entities is replaced by another entity (Peterson, 2019, p. 79), which is similar to the replacement-juxtaposition and artificial symbiosis-juxtaposition in the typology of Gkiouzepas and Hogg (2011). Replacing fusion is an artificial hybrid image in which part of an entity is replaced by part or the whole of another entity (Peterson, 2019, p. 80). The fusion structure is identical to that of the previous taxonomies; that is, two metaphorical entities are fused to present an original hybrid image.
Various visual metaphors were distinguished in the typological studies above. All of them are committed to classifying visual metaphor by the spatial distribution of visual elements at the visual level. However, it is also of great importance to differentiate the expression of visual metaphor at the linguistic level, that is, the verbalization forms of different kinds of visual metaphor, which could provide a new perspective for the classification of visual metaphors.

Verbalization of Visual Metaphor
As Forceville (1996) mentioned in his book Pictorial Metaphor in Advertising, it is necessary to "translate" pictorial metaphors into language, as he argued that different verbalizations reflect different ways in which we experience the metaphors. If we feel that one verbalization form describes what we 'perceive' more appropriately than others, it tells us how we convert our 'perceptions' into categories and concepts (Forceville, 1996, p. 133).
From the previous studies, verbalization forms of visual metaphor could be identified as several syntactic structures, and the differences among them were examined in many studies.

Syntactic Structures of Visual Metaphor
Forceville (1996, 2002) classified four types of pictorial metaphor: context metaphor, hybrid metaphor, pictorial simile and verbo-pictorial metaphor (VPM). According to Forceville (1996), the context metaphor and hybrid metaphor were derived from the verbal metaphor "A is B", and the pictorial simile was derived from the verbal simile "A is like B".
Nevertheless, Mashal et al. (2014, p. 221) claimed that when people are asked to describe a visual hybrid in verbal form, they prefer to transfer their conceptualization into a grammatical construction, usually a kind of noun phrase or sentence. As for context metaphors and hybrid metaphors, which parallel the replacement-structured and fusion-structured visual metaphors respectively in Phillips and McQuarrie's (2004) typology, Forceville (1996) maintained that the original verbalization form (A is B), the "stronger" form, can "force" viewers to analyze cross-domain mappings so as to fully interpret the two domains, since other syntactic structures may interfere with the verbalization by introducing irrelevant information. As for the pictorial simile (juxtaposition structure), he proposed that it is more appropriately verbalized in the form of the verbal simile "A is like B", the "weaker" form, which "invites" viewers to construe cross-domain mappings by themselves (Forceville, 1996). The verbalization forms of juxtaposition-structured and replacement-structured visual metaphors were thus settled as "A is like B" and "A is B" respectively, as Forceville (1996) suggested. In recent years, however, some other verbalization forms for fusion-structured visual metaphors have been hypothetically proposed, namely "A with B" and "A is like B" (Forceville, 1996; Mashal et al., 2014; Yeshayahu & Gil, 2017; Ojha, Gola, & Indurkhya, 2018). Mashal et al. (2014) studied the conceptual hierarchy effect of humans-animals-plants-non-animate objects by using visual hybrids that were represented in the verbal form "A with B". Ojha et al. (2018) found that for the pictorial simile and the pictorial metaphor (hybrid metaphor), the verbalized simile (A is like B) and the verbalized metaphor (A is B) were always used to convey a similar meaning. In other words, their study indicated that fusion-structured visual metaphors can also be represented in the verbalization forms "A is like B" and "A is B".
Although these three syntactic structures "A is B", "A is like B" and "A with B" can be represented as verbalization forms for fusion-structured visual metaphors, there are some differences among them.

Differences in Syntactic Structures of Fusion-Structured Visual Metaphors
This section sets out the differences among the three syntactic structures of fusion-structured visual metaphors, with the aim of proposing a hypothesis about the most effective verbalization form (syntactic structure) for reflecting the conceptual representation of fusion-structured visual metaphors in the context of advertising, which could also lead to an optimal cognitive effect in processing this kind of visual metaphor.
With regard to the syntactic structure "A is B", as illustrated by Lakoff and Johnson (1980), a wealth of metaphorical sentences displaying a wide variety of grammatical structures can generally be traced back to a conceptual metaphor of the basic form "A is B". Shu (2000, p. 102) also held that "A is B" is a basic form of metaphor, supporting Lakoff and Johnson's (1980, p. 16) insight that "the essence of metaphor is understanding and experiencing one kind of thing in terms of another". By contrast, "A is like B" is the typical form of simile, a figure of speech which compares one thing with another on the basis of some common characteristics. It is generally composed of three parts: tenor, vehicle and simile marker, in which the simile marker, often the word "as" or "like", acts as a bridge connecting the tenor and the vehicle (Liang, 1996, p. 72). As for "A with B", it is a noun phrase containing two modifications, in which the word "with" functions as a preposition expressing accompaniment (Cui, 1991, p. 39). It is also a basic grammatically asymmetrical structure, the genitive or attributive possessive case, in which the head and the modifier have different grammatical roles and are related to different linguistic properties (Mashal et al., 2014, p. 217). Compared with the first two structures, this grammatical asymmetry usually leads to an Ontological Hierarchy effect (Yeshayahu & Gil, 2017, p. 1191), rather than a mapping effect between two domains.
In essence, the relationship between the two syntactic structures "A is B" and "A is like B" is the relationship between metaphor and simile, which has long been a controversial topic in fields such as philosophy, psychology and linguistics. Views of the production and comprehension of metaphor and simile fall into two categories: comparison theory and categorization theory. Comparison theory holds that metaphor is an abbreviated simile (Aristotle, in Barnes, 1984), and therefore that there is only a slight difference between a metaphor and its corresponding simile. As Aristotle (in Barnes, 1984) said, "The simile is a metaphor differing only by the addition of one word (like), and they express almost the same figurative meaning". Those who hold a categorization view, by contrast, claim that metaphor is a categorization assertion and simile is a similitude assertion, which means that the comprehension of metaphor is a process of categorization whereas the comprehension of simile is a process of comparison (Glucksberg & Keysar, 1990; Glucksberg & Keysar, 1993). Aristotle's (in Barnes, 1984) comparison view, which suggests that "A is B" and "A is like B" are equal, has drawn extensive commentary and criticism in recent years (Chiappe & Kennedy, 2000, p. 371). The occurrence or absence of the word "like" can evoke different cognitive processing and thereby determine differences in cognitive function. According to Chiappe and Kennedy (2001, p. 250), metaphors are preferred over similes when the similarity between source and target domain is salient. Glucksberg (2003, p. 95) pointed out that metaphor comprehension should be faster than simile comprehension. In Harris, Friel and Mickelson's (2006, p. 14) view, metaphors are less likely than similes to be used to compare the similarities between two things.
Ricoeur (1977) held that cognitive insight could be strengthened by concentrated consciousness and high perceptual involvement in metaphor processing. Cognitive effort, in turn, may produce psychological response and perceptual involvement to different degrees. According to De Sousa (1980), increasing perceptual awareness is more likely to produce a strong affective component. Therefore, it can be argued that the high affective response generated by a metaphor is related to the great cognitive effort it demands. Differences also occur in the "transformation effect", which corresponds to the conceptual blending between two domains in Fauconnier's (1997) theory. According to Verbrugge (1980), metaphor processing involves a fused identification of the target and source which is more likely to produce a list of common features; in simile processing, by comparison, the two domains are identified as equivalent. He therefore supported the notion that metaphors should provoke a stronger fusion between two domains than similes, in which target and source remain separate. That is, metaphors, which have a strong transformational effect between two domains, are at least more likely than similes to catalyze the thought of readers.
By distinguishing the differences among the three syntactic structures, the hypothesis can be proposed that "A is B" is the most effective conceptual representation of fusion-structured visual metaphors because of its "easily comprehensible form" and "strong transformational effect". It provokes a strong fusion between two domains, which is consistent with the defining characteristic of fusion-structured visual metaphors: they are artificially generated by fusing two metaphorical elements into a hybrid entity.

Research Gaps
Based on the above studies of visual metaphor in three aspects (multi-modal studies of visual metaphor, typologies of visual metaphor and verbalization forms of visual metaphor), several specific research gaps can be identified which need further exploration to remedy the deficiencies of the existing research.
Firstly, visual metaphor has become one of the most widely used rhetorical devices in advertising. Advertising, as a medium of information dissemination, has a certain commercial value. Therefore, most of the research on visual metaphor in advertising is concentrated in the field of marketing and evaluates the effects of metaphorical advertisements by studying the influence of visual metaphor on the cognitive behavior of consumers, that is, the cognitive processing effects of visual metaphor, such as ad comprehension and ad likability. However, few scholars have paid attention to the interdisciplinary nature of visual metaphor or integrated it into other fields, such as linguistics.
Secondly, some scholars have shown that the comprehension of metaphor is a multi-modal process and that the language-related regions of the brain are activated during visual metaphor comprehension. But the concrete cognitive processing form of visual metaphor in the brain, that is, the conceptual representation of visual metaphor, still needs further investigation.
Thirdly, visual metaphors have been classified into various types by visual typologies based on the distribution of visual elements. Although verbalization plays an important role in visual metaphor comprehension (Forceville, 1996; Ojha et al., 2017), few studies have classified different types of visual metaphors from the perspective of verbalization form. Specifically, little attention has been paid to the most suitable verbal representation (syntactic structure) for each of the three visual structures of visual metaphor (juxtaposition, fusion, replacement).
Lastly, although previous studies have hypothetically proposed three verbalization forms (syntactic structures) for fusion-structured visual metaphors, in which target and source domain are fused together pictorially (A is B, A is like B, A with B) (Forceville, 1996; Mashal et al., 2014; Ojha et al., 2018), no empirical evidence suggests which is the most effective in reflecting the conceptual representation of fusion-structured visual metaphors in the advertising context, that is, which can achieve the best effect on the conceptual processing of fusion-structured visual metaphors in advertising.
Thus, to our knowledge, several important gaps remain in the research on visual metaphor, and they provide an impetus for future research.

Conclusion
Over the years, visual metaphor has become a hot topic in academic research, and great achievements have been made. Through the literature review, this paper found that the conceptual representation of visual metaphor, such as the specific verbalization form for visual metaphor comprehension, remains unknown in current studies. Furthermore, based on the analysis of differences between the three hypothesized verbalization forms of fusion-structured visual metaphor ("A is B", "A is like B" and "A with B"), this paper proposed that the syntactic structure "A is B" should be the most effective form of conceptual representation of fusion-structured visual metaphors, owing to its easily comprehensible form and strong transformational effect between two domains.
However, the identified research questions need to be empirically tested in future studies, which may take the form of three experiments. With the aim of finding out the most effective conceptual representation of fusion-structured visual metaphors in the advertising context, three studies have been designed: 1) a self-reported questionnaire, in which participants will be instructed to rate online the appropriateness of the three verbalization forms for describing advertising pictures; 2) a behavioral experiment, in which participants will be encouraged to identify the two domains in advertising pictures before verbalizing them in one of the verbalization forms (A is B, A is like B, A with B), with the reaction time of participants and the frequency of each verbalization form recorded; 3) a comprehension-based judgement task, in which advertising pictures and corresponding verbalization forms in the three syntactic structures (A is B, A is like B, A with B) will be used as stimulus materials, and participants will judge via button-pressing whether the verbal phrases are useful in describing the corresponding advertising pictures. Reaction time and accuracy rate will be analyzed to identify which verbalization form can achieve the best cognitive effect in the conceptual processing of fusion-structured visual metaphors in the advertising context.
In conclusion, the main contribution of this paper is that it integrates the research on visual metaphor into linguistics, which could enrich the theoretical framework of visual metaphor and lead to a deeper understanding of visual metaphors and their corresponding syntactic structures. In future studies, scientific experimental methods can be used to test the hypothesis proposed in this paper.

Suggestions for Future Research
By reviewing recent research on visual metaphor at home and abroad from three aspects, this paper found that although these studies have made many achievements, there is still room for improvement. Therefore, several suggestions are put forward for future research in the following three respects: research objects, research perspectives and research methods. First of all, the objects of research should not be limited to print advertising with visual metaphor, but should be extended to multi-modal metaphors in advertising, such as dynamic TV advertising, film advertising and so on. Secondly, in order to break through the existing theoretical framework of visual metaphor, future research should endeavor to combine visual metaphor with disciplines other than advertising, marketing and communication. Finally, in order to reach reliable conclusions, more scientific methods or techniques should be employed, such as behavioral experiments, eye-movement technology and ERP experiments, to explore the mental state and cognitive processing strategies of viewers when they process visual metaphors.