Building the Tourist Destination Through Multiple Semiotic Modes

Over time, multiple semiotic modes have contributed in different ways to the construction and exchange of cultural meanings in Tourism Discourse. This has required the analysis and understanding of the modes employed and the recontextualization and adaptation of texts and images, especially to the new web genres. Nowadays, the tourist experience is mediated by personal, digital, and mobile technologies, which redirect the tourist gaze and become the mediator between the traveler and the tourist destination. Consequently, the tourism text must be considered as a single unit, where different semiotic resources intermingle to enhance its communicative strength. The present study will attempt to propose a methodology to read and write tourism texts in a comprehensive and effective way. It will start by focusing on the relationship between text and image to see how they co-exist in the page and in the way the page is arranged. Then, it will apply a functional approach to the analysis of such semiotic units. The result will show how the boundaries between image and text have become blurred, and textuality is built less through verbal syntax and more through rhetorical visual design.


Introduction
In tourism discourse, meanings are built and represented cross-culturally and intersemiotically. Therefore, on the one hand we find the destination image perceived by the tourist, and on the other hand, the destination identity that the institution or the industry wants to convey. However, there is often a gap between these two perspectives which needs to be reduced for tourism communication to be successful and to lure tourists by beating competing destinations. The tourism text, built through multiple semiotic modes (i.e., graphic, phonic and visual signs) and considered as a whole and consistent semantic unit (Halliday & Hasan, 1976; Peirce, 1931−1935; de Saussure, 1916), must meet these needs. In order to do this, processes of recontextualization and adaptation of texts and images to diverse modes and media, especially to the new web genres, are required. Applying Jakobson's (1959) taxonomy, this may be defined as intersemiotic translation, or the translation and interpretation of signs across semiotic modes, including non-verbal ones. However, the present paper does not deal so much with resemiotization, meaning "how semiotics are translated from one into the other as social processes unfold" (Iedema, 2003, p. 29), or with chains of cultural units composing other chains of cultural units, translating signs into other signs (Eco, 1979, p. 71). It rather deals with a semiotic system considering all of these features as a unit, as well as its adaptation to diverse media when needed.
Tourism texts play a fundamental role in transforming sites into sights (Cronin, 2000, p. 22), persuading and seducing people, converting potential tourists into real ones. They are characterized by being dialogical, as the writer always has a reader in mind (Bondi, 1999); specialized, as their structures and communicative purposes are shared by both experts and non-experts of the tourism discourse community; and promotional, to different degrees according to their main purpose. Thus, they may be more informative or promotional; they are descriptive but at the same time they present a combination of features from evaluative discourse, procedural discourse, e.g., recipes, and signs discourse, e.g., road maps, and from all text types: they are a combination of arguments, narratives, descriptions, explanations, and instructions. They can be said to be the result of a process of interdiscursivity: they borrow from such domains as geography, history, arts, cuisine, business transactions, and a mix of specialized languages with no clear-cut boundaries (Calvi, 2005; Denti, 2012). There are no pure texts. They represent "mixed or embedded" genres (Bhatia, 2008) of tourism discourse, as they combine several aspects of different genres, including technology and modes, and are synergically integrated with visual, text-external to handle and manipulate texts by reflecting on the following questions: What can I do with tourism texts? How can I communicate efficiently and effectively in the real world of tourism negotiations? The term negotiation is here considered in a broad sense referring to the representation, management, and exchange of meanings across cultures.
The definition of text and then of multimodal text is the starting point for the present analysis: a text is a consistent semantic unit (Halliday & Hasan, 1976) made of meanings that, in order to be communicated, need to be encoded and expressed through a system of graphic, phonic, and visual signs.
Language is influenced by dynamic elements present in its socio-cultural context: participants in the speech act, topic and setting, the addresser's communicative goals. A text is both an object, a product of its environment, of its context of culture and context of situation, and an instance of social meaning in a specific situation. The relation between text and context is a systematic and dynamic one (Halliday, 1985, pp. 10−11).
With reference to the context of culture, when creating a tourism text and choosing a medium, the awareness of the cultural parameters (Hofstede, 1980, 2001) of the contexts involved is crucial. They affect information quality, quantity, explicitness, transparency, clarity, functional and aesthetic aspects, as well as perceptions, feelings, time orientation (e.g., a schedule in an itinerary) and power distance, i.e., interpersonal relationships (Hofstede, 1980, 2001; Reisinger & Turner, 2003). The same parameters are found in tourist images, in the choice of hue, brightness, and color saturation, which shape perceptions, physiological and emotional reactions, and behavioral intentions.
When taking into consideration the context of situation, the text needs to be considered from three perspectives: its formal and functional aspects, its social, institutional, and professional contexts, and as social practice. The focus is on the genre participants, social and professional structures, and relationships (Bhatia, 2002, pp. 16−18).
When multimodality is added to the analysis, the above-described framework must be broadened to consider the diverse semiotic modes which define the linguistic multimodal message. Saussure (1916) and Peirce (1931−1935) were among the founders of semiotics. Saussure (1916) identified two elements composing the sign: the signifier and the signified. Peirce (1931) defined the sign as a signifier representing something to someone, specific to a certain context, not an absolute but a relative meaning (Eco, 1980, p. 27). Thus, while a text is a sign that we can decode through semiotics, an image is something direct and physically detectable and perceivable (Bateman, 2014, pp. 13, 15; Mitchell, 1986), a perfect reproduction of reality and more reliable than what people hear (Barthes, 1985).
In tourism texts, verbal and visual items may be arranged in the page according to what Schriver (1997) calls rhetorical clusters, which originate the following categories of layout: illustrations with annotations and explanations, procedural instructions with visual elements, body text with footnotes, and the front matter of a section. The first category refers to illustrations, labels, figure numbers, captions, and credits, which are normally placed in the back cover of a travel guide or in the last page of an app, or in a specific section of an app or website. The second concerns, for example, an overview, an introduction to the description of a place, the step-by-step procedures of itineraries, and visual examples such as maps, and captions. The third category identifies the body text and paragraph style, as well as footnote texts, links to other pages, headings, lists, citations (both anonymous and named). The front matter of a section is the part of the page that displays the title, the main point, and a photograph with its caption. This taxonomy may help to recognize the units present on the page and its overall meaning.
The linkage between text and image varies. Most of the time these taxonomies do not differ significantly. Indeed, drawing clear-cut boundaries between the categories is difficult, as they often overlap and sometimes are not clearly recognizable. According to their form, the relationship between text and image is ancillary, when they are next to each other; correlative or integrative, when labels or captions link them; and substantive, when they are visually combined (Pegg, 2002).
Another aspect of this relationship arises from the image polysemy. This requires texts to fix their interpretation by providing the anchorage function (e.g., through labels such as the name of a place or of an object), to control or amplify the image meaning. Sometimes, it is the image that provides illustration to the text, leaving less interpretation to the reader (Barthes, 1977;Nöth, 1995, p. 454). This unequal relationship becomes equal when text and image are combined and complement each other through the relay function and a constant cross-reference. When text and image appear to be in conflict, their relationship may be weak and the visual elements mainly adorn, induce emotion or control, and engage the reader (Marsh & White, 2003).
ijel.ccsenet.org International Journal of English Linguistics Vol. 11, No. 4
The analysis of the discourse functions starts from the Speech Act Theory (Austin, 1962). According to Austin (1962) and Searle (1969), a speech act is defined as locutionary when the denotative meaning of the utterance is performed. All speech acts are locutionary (e.g., the description of a monument, of an object on display, or of a city in an app, in a website, onsite). The act is illocutionary when it has communicative force and expresses feelings, stance, evaluation, and commitment (e.g., the involvement of the writer in the narration of a trip in a travel guide or in a blog). It is perlocutionary when it asks for action on the part of the receiver, for example ordering, requesting, warning, prohibiting, daring (e.g., in ads or in an itinerary). The same functions are achieved through images, as the analysis will show. Visual modality (i.e., the use of illumination, brightness, texture, color saturation, representation of detail, depth) corresponds in the text to modal verbs and expressions (Kress & van Leeuwen, 1996/2006). Their blend makes the picture more real or more ethereal, depending on whether its colors are exaggerated and saturation is high, or blurred and low. Modality is higher the closer colors get to the standard (natural) combination, and lower the further they move from it (Kress & van Leeuwen, 1996/2006). The analysis moves to Jakobson's (1959) higher level of discourse. The referential (or informative) function focuses on the content of the communicative act, objects, facts, or events in the context, and is characterized by the use of the third-person singular or plural pronoun, nominalization, premodification, passives and stative verbs, but can also be less formal and employ the first-person pronoun and dynamic active verbs.
The expressive (or emotive) function highlights the writer/speaker's point of view and emotions, by using the first-person singular or plural pronoun, interjections and personal style; the conative (or vocative) one aims to influence the addressee's internal states and emotions, by exploiting the second-person pronoun, the vocative and the imperative mood (e.g., slogans, titles, recommendations). When the focus is on the form of the message (Jakobson's poetic function), its phonological and graphological features and the use of figurative language strongly contribute to meaning making, as in ads, brochures, and itineraries. Phatic and metalingual expressions are also employed as attention-seeking devices with a conative function. Jakobson's macrofunctions naturally lead the analysis to the identification of Werlich's (1983) descriptive, narrative, instructive, expository and argumentative text types.
Another approach included in this study is Halliday's (1978, 1985) systemic functional linguistics, and the concepts of field, tenor and mode, i.e., the structure of the context, and the metafunctions of language, i.e., the ideational, the interpersonal and the textual metafunction (Halliday & Hasan, 1985). The field refers to the shared knowledge of writer and reader on the topic, identified through lexis and grammar. The parameters of the ideational metafunction include the text types and the specialization of language (experiential domain), the general or in-detail orientation towards general or particular readers, time, place, and mode of reading (goal orientation and social activity). At visual level, the ideational metafunction identifies representational choices: the sequence of elements in the page, their linearity, symmetrical layout, neutral background, the distance between objects and their size (Kress & van Leeuwen, 1996/2006; Denti, 2012, p. 92). They represent different types of relationships, where the participants in the semiotic act can be interactive and active, or the represented element. Within this framework, participants linked by a vector represent narrative processes, transportation, transformation, and temporary spatial arrangements (Kress & van Leeuwen, 1996/2006). They function as action verbs in language. When participants are represented in terms of their class, structure or meaning, they are identified as generalized, timeless and stable conceptual patterns (Kress & van Leeuwen, 1996/2006). The tenor concerns the human participants in the interaction, the interpersonal relationship between the writer or publisher and the text receiver. Attitude and politeness build role relationships: agentive roles, social roles and social distance. These involve their status, power relations, and discourse roles, as well as their attitude towards the topic and their interlocutors, a formal or informal, close or distant relationship.
The use of direct questions, personal pronouns, exhortative and laudatory lexis, epistemic and deontic modality, and impersonal constructions influences these parameters (Fodde & Denti, 2008). At visual level, the interpersonal metafunction expresses the relationship between the represented (people, places and things) and real people who communicate through the image (Denti, 2012, p. 95). This relationship may be covert or overt, as in writing. For example, the direct look of the represented towards the viewer realizes a pronoun you, while an indirect look and distance realize a third-person pronoun. The frame size, by depicting head and shoulders or the full figure, defines personal, social and public distance (Kress & van Leeuwen, 1996/2006). While the presence or absence of a built-in perspective defines the image as subjective or objective, defining the position of the viewer within the photograph, the selection of an angle represents the subjective attitude of the photographer and of the viewer towards the represented, their power relation and involvement.
The mode identifies the textual metafunction by analyzing language roles, its relationship with images, its channel (written, spoken or a combination), directionality, medium and preparation (spontaneous or prepared, in real time or after reflection). Textual and typographical features affect the ideational and interpersonal metafunctions of language, drawing the reader's attention to specific verbal or visual elements.
Another aspect to be considered is the functional sentence perspective (Danes, 1974), i.e., the thematic structure of the written text. The normal word order of the English language (i.e., Subject-Verb-Object) may be modified with deviation devices in order to foreground specific elements of the text. Such devices are fronting, when an element different from the subject is moved to the opening of the sentence or an element or the subject is moved to the end of the sentence; inversion of subject and verb; cleft or pseudo-cleft sentences, when an anticipatory subject it or a wh-pronoun is used to foreground another clause element; end focus, when the focal element is at the end of the sentence, in written discourse; right or left dislocation, used to postpone or anticipate identification in informal spoken discourse; active-passive voices, when the focus moves from what or who causes the event to the event itself. This is also achieved through the choice of frame size, perspective and angle in pictures. Thematization also involves diverse progression patterns of themes and rhemes. The same effects are achieved through the choice of the subject/object portrayed, the position in the page layout, the perspective, the angle, the frame size, and brightness.
Coherence and cohesion close this approach to the multimodal text. They are essential in building textuality.
Coherence involves the types of logical relationship between sentences: phenomenon-reason, phenomenon-example, cause-effect, problem-solution, instrument-achievement, time. Cohesion entails such devices as reference (i.e., personal pronouns, deictics, the definite article), substitution (through one, it, so, not, same, do/did), ellipsis, conjunctions, lexical reiteration and collocation, which aim to link one element in the text to another for its interpretation. The intersemiotic cohesion between text and image helps to understand how all semiotic modes contribute to meaning making (Royce, 2007; Kress & van Leeuwen, 1996/2006). Intermodal cohesion is reached through verbal participants and processes, grammatical and lexical cohesive devices, which link visual processes and participants. Whenever a correspondence can be drawn between the visual and the verbal units, intersemiotic complementarity is sought.

The Corpus
As mentioned in the Introduction, tourism discourse entails the contribution of many diverse domains and genres, as well as changing degrees of participants' involvement in the dialogue between the industry, the tourists and the locals.
The present paper is part of a research project on tourism discourse which has been going on for two decades. Most materials of the corpus were gathered on the Italian island of Sardinia. In order to carry out the objectives of the present study, four genres were considered: two apps (Cagliari App and Visit Sulcis Iglesiente App), a website (https://opapisa.it/en), three travel guides (the Rother Guide, the DK Eyewitness Travel Guide and the Sunflower Landscapes Guide), and an advertisement (an old advert for Gamboa Rainforest Resort, http://www.gamboaresort.com). They are identified in the following examples:
• Figure 9, the Rother Guide
• Figure 10, the DK Eyewitness Travel Guide
• Figure 11, the Sunflower Landscapes Guide
• Figure 12, an old advertisement for Gamboa Rainforest Resort.
While the travel guides and the advert express the industry's point of view, the apps and the website are examples of institutional communication. Moreover, while the website was chosen as a homage to the hosts of the conference where the paper was first presented, the advert was selected among others for the strong link and impact of its image and pun in its overall promotional communication.
Travel guides are probably the least persuasive and the most univocal mode of representation of tourism discourse, while ads are the opposite. The tourist has normally already made his/her choice and is looking for broad information on the destination. As already mentioned, tourist guides are highly interdiscursive: they borrow from travel books for a more subjective perspective, from geography texts for the description of places, from commercial and pragmatic manuals, when providing practical information. They have a more cultural or pragmatic approach according to the reader addressed and the type of information to be supplied. Normally, travel guides and ads address a general indeterminate reader. The guides analyzed in the present study synthesize pragmatic information on picnicking, touring and walking around Sardinia. Their dominant text types are descriptive, narrative, and instructive, as itineraries are present, and recommendations given (Werlich, 1983). Rhetoric strategies abound in travel guides, and are the core element of ads, especially when the writer/advertiser wants to emphasize cultural diversity between the visited and the visitor: colloquialism and irony, stereotyping through citations, appellative clichés and comparisons (Margarito, 2000;Fodde & Denti, 2005).
Travel guides have always had a sort of maternal function as they accompany the reader through his/her journey. Nowadays, they have been partially replaced by apps, i.e., mobile device programs that mediate between the provider and the user, are smaller in size and constantly updated, and facilitate the sharing of data and services. Moreover, both apps and websites enable tourists to express their requests and complaints more easily. Tourists take cameras, video recorders, and mobile phones not only to have evidence and souvenirs of what they see but also to share them on social networks and on the web. They can exchange tailored information, upload pictures, and write short texts to complete the information already provided by the industry before, during or after the trip. By doing this, the traveler has become a produser, a user-led content co-creator (Bruns, 2008). The volumes and pervasiveness of information have sharply increased due to technology, and new web genres have developed (Denti, 2018). These web genres are characterized by hypertextuality, multimodality, and hypermediality, the combination of the previous two; they are also characterized by granularity, diverse consumption modes, and co-articulation (Campagna et al., 2012, p. 11). Hypertextuality indicates that texts are uncovered as the reading progresses, through links to both internal and external content, joined through cognitively important relationships, thus avoiding excessively long and heavy texts. It is up to the user to decide which information to access. This means that the user chooses which path to follow, by reading linearly or through the navigation mode, and how to put sequences of information together from a logical, temporal, and experiential point of view (co-articulation). It is an individual (mostly unpredictable) chain of information, which may coincide with the meaning making intended by the designer or the institution, or may differ from it.
The user may follow overt or covert suggestions, which are not hierarchically or sequentially limited but logically stretched (Denti, 2018). While in apps the user moves within certain boundaries, in websites information accessibility through hypertexts is basically limitless. Multimodality provides "the opportunity to combine different semiotic resources into a single communicative act" (Campagna et al., 2012, p. 11). Not only do pictures and maps play a relevant role in apps and websites but also icons, which stand for broader meanings while making it possible to comply with space constraints: texts are identified in distinct units of words or images, particularly small in apps (granularity).
Different types of apps are available: apps providing navigation/directional services, social networking, mobile marketing, security and emergency, transactions, entertainment, or information. They display maps, images and texts which interconnect and intertwine to build a device characterized by real-time updates and ratings, sharing experiences and perceptions, positioning, hotel, restaurant, flight, cruise, tour bookings, organizing a visit through travel guide apps, which also provide reality services, easy-to-read maps, distance time, itineraries, updated information, reality guides, and so on. These apps are tourism specific (Denti, 2018). Cagliari App and Visit Sulcis Iglesiente App belong to this type. They are mainly tour guides but show a mixture of features of diverse app types. They mainly focus on topics such as places of art and culture, archaeology, religion, history, leisure, nature, accommodation and food, transport, events, routes and itineraries. Their language is very iconic, supported by brief, if any, text, mainly of the instructive text type, sometimes descriptive, extremely hypertextual, endowed with maps alternating with lists accessible for more specific and detailed information. Descriptions are very concise and links to related points of interest are frequent, accompanied by picture galleries and pragmatic information (e.g., addresses, opening hours, etc.). The app is so interactive that the user can add the contact of his/her personal places of interest. Preferences can be marked, and some pages can be shared through email, FB and/or Twitter.
Cagliari App is also linked to its website where additional information can be accessed. For example, people can download audio guides to the places to be visited. However, the app is recommended to "be constantly updated about the events in the city" and to "easily find the places you're looking for in your mobile" (http://www.cagliariturismo.it/en).

The Analytical Procedures
The first step of the methodology applied aims at identifying the overall layout and the relationship between text and image. Text and image visually co-exist in the page and in the way the page is arranged.
This layout may be studied by applying Schriver's (1997) rhetorical clusters to identify its functions: labelling, illustrating, giving an overview or a procedure, introducing the topic through a title, the main point, and a photograph. Another way is to look at the form and closeness (Pegg, 2002) of the image/text relationship, or at the way they combine: do they have the same status? Is the text fixing the image meaning or is the image elucidating the text?
Once the relationship is determined, and the semantic unit defined, the analysis moves to the identification of the overall text speech acts (Austin, 1962), microfunctions (Searle, 1969), macrofunctions (Jakobson, 1959), and text types (Werlich, 1983), to highlight its communicative functions. Images can fulfil these same functions by showing the landscape beauty or an old woman symbolizing wisdom and traditions, by displaying the objects to be observed in a museum, or by inviting the viewer to follow the tourists looking at a map, following an itinerary or undertaking a specific activity such as climbing.
The following step of the methodology entails the application of Halliday's (1978, 1985) concepts of register: field, tenor and mode. This will clarify the specialization of the language, the functions and the relationships of the participants. The mode basically refers to textual and typological features that have already been investigated when defining the semantic text.
The last steps of the methodology involve the observation of thematization in the written text along with the images, and of intersemiotic coherence and cohesion.

Data Analysis
The present section applies the methodology just described to some cases of tourist texts.
(1) The first example here analyzed refers to Figures 1, 2, 3. They are sequential extracts of the Opening of the Opera del Duomo Museum page from Pisa's institutional website. Thus, the page is considered as a whole. Its layout shows the rhetorical clusters of procedural instructions with visual elements and body text with paragraph style. In fact, the page opens with the images of some details of marble engraving, which change every time the page is refreshed, followed by the date and a headline in large capital letters, announcing the Opera del Duomo Museum's opening.
Marble is a symbol of purity, stability, tradition, and immortality. These pictures illustrate but also decorate, try to move and motivate the reader to visit the museum. The use of capital letters is a rhetorical tool to express relevance and aims to attract the reader's attention. The headline has a labelling and anchorage function of the images that precede and follow it. The page develops with the icon ope in the top left corner, the logo of the museum, and the site menu and research icons in the top right corner. These are fixed and always accessible, thus linking the visitor to the home page and to the topic choices within the website. The verbal text is divided into eight paragraphs, which are mostly short and of the same length to achieve an easy linear reading. No other hyperlinks are present, which "forces" the reader to remain on the path designed by the writer. The page is closed by another image window, where several pictures from the museum slide by one after the other. The linkage between text and picture is substantive: the overall information is supplied in a circular flow, starting and ending with images. However, while for the first image under the title, the following paragraphs provide an anchorage function, the sliding images have an illustrative function. On the one hand, they leave less interpretation to the viewer; on the other hand, they give multiple examples of the beautiful statues, paintings, and objects on display. This creates expectations in the future visitor and helps build familiarity with the sight.
After these first considerations, both images and texts were examined to identify their linguistic functions. In terms of speech acts, both the verbal and visual elements of this webpage are locutionary acts, giving information about the history of the building and the museum, the improvements that have been carried out to offer "an easier reading" to the tourist, and the itinerary through its 26 rooms and 380 works. The same act is sought by the denotative meaning of the images as they appear. The numerous positive, laudatory adjectives, such as suitable, exceptional, new (as opposed to obsolete), spectacular, wonderful, confer an illocutionary force to the text. The same effect is reached through the detailed descriptions: "The itinerary (…) including new works restored such as the Triptych of the Madonna enthroned with saints, tempera and gold on panel by Spinello Aretino, and the crown, the scepter, the globe (…)".
The perlocutionary effect is sought indirectly through the development of the itinerary, and through the representative, expressive, and verdictive speech acts, which build interest in the reader, as in the sentence "The Board of the Opera della Primaziale Pisana saw then the opportunity to fill that lack of space that the development of the Fabbriceria had long needed, to give suitable accommodation to works of art of exceptional value and to collections that were then in temporary storage" (emphasis of the author). It is also achieved by comparing the old approach in the design and arrangement of the museum, expressed by negative adjectives such as obsolete, minimum, to the new one, a spectacular setting, following a clear, modern and effective vision, "a new and contemporary philosophy", an itinerary which will not only accompany the tourist through their visit, but will also lead them to the icing on the cake: "the wonderful cloister that overlooks the Bell Tower (…) the Madonna with Child, the Evangelists and the Prophets (…)", well represented in the photographs as well.
Correspondingly, at macro level, the language has referential, expressive and conative functions, with the focus alternating on information, on the writer's point of view and involvement in the text, and on persuading the reader to visit the museum. The conative function is more visible when the aims of the project and its addressees are indicated, with the latter including scholars, restorers, and art lovers, but also less-expert people who can enjoy its attraction thanks to the "easier reading": the objects are displayed according to the monument to which they belong, described through informative tools and multimedia stations.
The same functions at micro and macro level are achieved by the photographs, through the contrast between dark areas and light, shadows, texture, depth, color saturation, and nuances, which make the image more detailed and naturalistic (i.e., more similar to reality; Kress & van Leeuwen, 1996/2006). These elements contribute to building the interpersonal metafunction. The first picture under the headline shows the Madonna with Child accompanied by some saints. The position of the photographer, and thus of the viewer, is right in front of them, in the room, at eye-level. They are distant, in full-size, the main attraction at the end of an empty room. The perspective is deep central. The background is black and helps to foreground the statues illuminated by the light. The side walls are in different nuances of grey and light pink; the light plays with shadows, emphasizing the statues on the left and on the right. The effect is that of inviting the visitor to get closer to fully appreciate the pieces on display. Several photographs have this structure. Others display the object closer to the viewer, through a full-size shot or even a close-up, seeking a closer and stronger bond with him/her. Thus, s/he can enjoy the beauty and the precision of the details of marble engravings, or the drama of the Crucified Christ, which reduce social distance. The same aim is achieved using an inclusive we in the text, relating to the suggestions about the itinerary: "At the end of this itinerary we find the (…)". The reader acquires an active role. Together with the writer/photographer, they are involved in touring the museum and providing a verbal and visual description of it. However, the third-person singular is more frequent, which characterizes the objective impersonal style of the text, also perceived in the images.
Studying the ideational metafunction, the most frequent text types are narrative, descriptive, and argumentative, built through verbal and non-verbal markers, as well as temporal, spatial and rhetorical chaining strategies. As an example, the first three paragraphs start with a temporal marker and focus on the history of the museum. The fourth and fifth are examples of the argumentative text type, through the use of adversative conjunctions and different points of view. The following paragraphs are mainly descriptive.
In the photographs displayed on the page, arguments and counter-arguments relative to the museum's past and present vision cannot be seen, but the pictorial recommendations follow the sequence of the texts. The language is non-specialized, as the descriptions basically focus on the museum, as well as its history and development, and not on the objects displayed. As already mentioned, the reader/viewer is both a particular one (i.e., an expert scholar, restorer, art lover) and a general one.
In terms of functional sentence perspective (Danes, 1974), the thematic structure of the written text shows a hypertheme followed by some subthemes. Some subthemes are repeated: the Opera del Duomo, or synonymic expressions such as the museum or spectacular setting, or meronymies such as the set-up, the exhibition; the Board of the Opera della Primaziale Pisana, or the Board; the itinerary. As for the images, the same pattern is applied: the yard and the main hall represent the hypertheme, while the other ones are more detailed subthemes: the Madonna and Child, the saints, gold objects, antique books, parts of the ancient buildings, the Crucified Christ, the Evangelists and the Prophets, and other statues. Few deviation devices are used, basically fronting with time markers. Only one passive is present.
Some exophoric reference is found, and some written information seems to be intended as shared knowledge but remains vague for non-local, non-expert people. Images cross-refer with the text, but the reader's cooperation is required to associate the verbal and the visual items, as there are no numbers, captions, or titles. Cohesion in texts is also built through the use of definite articles, deictics (this and that), coordinating, subordinating and relative conjunctions (and, but, where, whose, that, which), the repetition of certain words (Opera della Primaziale, museum) but also synonymic or meronymic expressions, as already mentioned.
In terms of coherence, the logical relationships of cause-effect and problem-solution (first and second paragraphs), phenomenon-reason (third and fourth), problem-solution and phenomenon-example (fifth paragraph) are present, with a prevalence of the last one both in the sections and between textual and visual items.
As the text is mainly informative, objective, and impersonal, its style does not show any relevant features other than those already discussed.
(2) The second example of the present analysis is the institutional Cagliari app (Figures 4–7). First, the linkage between images and texts is close and extending, as the former have the function to exemplify and describe, expand, and reiterate meaning: in Figure 5 the whole meaning is constructed through the map, the icons, functioning as hyperlinks to the sights, and the oblique yellow lines identifying itineraries. The aim is to engage tourists in a simple, clear and effective way, by inviting them to follow these instructions.
[Figures 8–10; Figure 10: Eyewitness Travel Guide]
The following examples are extracted from Cagliari App. Here, texts have been dramatically reduced in size but the use of colors, of maps, icons and images has increased for the sake of clarity and effectiveness. This purpose is enhanced by using a low angle and the balance between light and shadow, which intensify the illocutionary force of the information given, and its historical meaning. Intermodal cohesion is strong, reached through the repetition and cross-reference of verbal and visual elements.
Example 3 is an instance of contrast or weak relationship between text and image. The illocutionary force and perlocutionary effect of the recipe lose efficacy. The reader may be drawn to find out more about the coastal area, but not to cook the octopus. Moreover, the picture does not identify this beach as unique and exceptionally worth visiting. The strategy of separating text and image is not a winning one.
In example 4, the maps are accompanied by some texts, albeit short, which reinforce their instructive function, unlike the maps on Cagliari App, which may rely only on the visual. It is in example 6 that the traveler may feel really taken care of, as s/he can drive along the line following the directions on the adjacent page.
Example 5 is the model of a complete combination between verbal and visual items in terms of attention-seeking, clarity, detailed descriptions, and time sequences, i.e., the mutual determination of meaning.
The pun and the frog in example 7 are an exemplification of the perfect integration between text and image, thanks to the morphological deviation of froget which functions as a cohesive device with the frog. This illocutionary and perlocutionary speech act is boosted by the color contrast and the position of the frog in the frame, functioning, on the one hand, as a cohesive device with the rainforest and, on the other, hypnotizing the reader.

Conclusions
Text and image constitute different ways to build meaning while fulfilling the same communicative function. However, the relationship between them varies. Their juxtaposition, including the layout, changes according to the functions sought. This may also require recontextualization, when moving from one medium to the other, which allows for various flexible visual combinations and different meanings. The examples presented above are intended to show how the theoretical framework proposed can be applied to break the text down into its constitutive pieces and combine them to reach the intended addressee and aim(s). The taxonomies applied do not differ significantly. Indeed, drawing clear-cut boundaries between them is difficult as they often overlap and sometimes are not clearly recognizable.
One of the outcomes of this study has been the observation that language and image are not separate semiotic entities but fade into each other, due to a logic of space and visual design. Ideas are presented through words and images, but cohesion and coherence in their presentation are achieved less through verbal syntax and more through rhetorical organization and visual design elements such as layout and consistent color schemes.
Among the texts analyzed, images often belong to the close relation type and mostly aim to reiterate meaning by exemplifying and describing. They often relate and complement by providing the photograph of the place described or the map of an itinerary.
What is evident is that visual and verbal items are increasingly integrated in the semiotic text, especially in tourism discourse. This requires that a tourism text be both read and created as a unit in which diverse semiotic modes are considered synergistically. The objective of the present paper was to propose a possible methodology to both read and design multimodal tourist texts, applying functional linguistics at verbal and visual level.
The same approach has been applied in a bachelor's degree course of English to sensitize the students in relation to the numerous linguistic strategies, cultural facets, and pragmatic effects involved in the understanding and creation of multimodal and multimedia tourism texts, trying to reduce the gap between what is normally taught in class and the real world. However, this will be the topic of further research.
In conclusion, the study and teaching of functional linguistics through ESP (and vice versa), and through tourism discourse in particular, cannot avoid looking at how all semiotic modes contribute to meaning making. This entails examining how the visual and verbal processes and participants, as well as the grammatical and lexical cohesive devices of the multimodal text, combine to reach a single common purpose. The need for a better understanding of this synergism has increased as the features of the new web genres and the breadth of the audience constitute both spatial constraints and challenges.