1 Introduction

Destination marketing in the 21st century has been marked by the emergence of new digital channels, visual content overtaking text as the marketing medium, and the attention shift of the target audience from DMOs’ own content to user-generated content (UGC). Marketers, whose goals remain to develop a successful brand identity and differentiate themselves from their competitors through positioning, face an increasingly complex situation in determining the optimal content marketing strategy, because (a) the dominant image of a destination among consumers is being determined by UGC, which is outside of their (direct) control, and (b) that destination image is being communicated through non-textual media such as visitor photos and videos, which require a distinct analysis approach compared to text. Tourism marketing has always been highly visual since imagery can best evoke the experience of visiting the destination for consumers. First the Web and then social networks have increased the scale and the reach of visual marketing content, as well as introducing a new and more significant determinant of destination image for a global audience: UGC, i.e. the photos and videos visitors themselves post online. Whereas stakeholders, e.g. DMOs, have long made use of textual analytics to understand how their destinations are being presented online (whether in the news, in reviews on travel websites or in posts on social networks), the content of purely visual media could not be easily included in marketing analysis, as accurate descriptions were largely only feasible through manual, expert annotation, which could not scale up.

Computer-assisted understanding of the content of images and videos has long been a challenging research area due to the complexity of defining to a computer what is “seen” in the image. However, in just the last few years, the combination of deep learning (complex neural network architectures) and huge scale media collections for training (e.g. crawled from the Web) has led to a paradigm shift in the capability of so-called computer vision systems, including (in our case) the field of visual classification (i.e. labelling a media item with the correct concept from some controlled vocabulary of concepts). Accuracy on the ImageNet dataset (a benchmark in the computer vision community, with 1000 visual concepts) has jumped from 63.3% (AlexNet in 2013) to 91.1% (BASIC-L in 2023) while Web platforms host models and make them accessible to anyone via APIs (HuggingFace.co reports 5270 models for image classification at the time of writing). The advances in computer vision mean that tourism marketers now have access to AI-based systems that can automatically classify tourist photography and use this classification in gaining a deeper understanding of what visitors give particular focus to when at their destination. Digital marketing online, especially on social networks, is highly competitive as consumers are overwhelmed by information and can switch easily between sources. Marketers could benefit from the new insights image classification can offer them, especially comparing how their destination and its competitors are being presented visually, how that user-driven presentation compares to their targeted content marketing, and where, if at all, their destination is differentiating itself from the others in the global online marketplace. 
However, this requires a visual classification which is provably accurate in the domain of destination image measurement, an approach to collect and classify large sets of images as well as a methodology to represent the resulting classification in a form for marketing analysis. The contribution of this paper is the visual classification model, evaluated against a specifically prepared ground truth data set and provably accurate for destination image, as well as the methodology for compositional data analysis (CoDa) of the resulting classifications as a representation of a destination’s visual brand.

The rest of this paper continues with the current state of the art in the domain of tourism (marketing) and image classification (Sect. 2). We then present the concept of “visual destination brand”, explain it in the context of destination marketing and how it can be measured through image classification (Sect. 3). Then we introduce our experiment for acquiring the visual destination brand of 9 European cities, both projected (DMO) and perceived (user) (Sect. 4). The extracted brands are compared in Sect. 5 with a focus on identifying significant variations within and between destinations. Finally, Sect. 6 concludes with lessons learnt for tourism marketing from our experiment.

2 State of the Art

There has been a lack of studies regarding the role of visual content in destination marketing [1], particularly with respect to branding and positioning, two key aspects of a marketing content strategy. While tourist photography has long been seen as a valuable source of insights into what visitors focus on at a destination (which in turn acts as an indicator of what is most appealing to visitors), researchers had to classify photos manually, following self-determined classification vocabularies, either alone (in the role of experts) or through solicitation from the people who took the photos. This method could not scale up to handle the amount of online visual content available for any destination. Past work focused on the use of photography in the measurement of destination image, which is defined as “beliefs, ideas and impressions that a person has of a destination” [2]. Researchers agree that one component of destination image is the cognitive, which relates to the attributes that the consumer thinks of (or visualises) when they think of the destination – those with external, tangible representations may also be called functional [3]. While there is discussion about the other components, such as the affective and the conative, and while the holistic measurement of destination image requires combining all of the components, it has been shown that the other components are themselves influenced by the cognitive component [4]. The use of (user and/or DMO) photography to determine destination image is seen as valid since “both organic and induced secondary sources of information… significantly influences the cognitive component” [4]. Destination image is increasingly being formed through online visuals [5], and photos on social media are a valid source of consumers’ image [6]. UGC photos are the materialisation of what a visitor deems as important, even iconic, at a destination [7], and thus a means to reconstruct the destination image [8].
The cognitive component can be measured as a set of (visually identifiable) destination attributes, and an individual destination can be represented by analysing the frequency, co-occurrence, clustering etc. of the aggregated attributes of a dataset [9].

The destination image is typically seen as synonymous with or similar to destination brand, which is the application of a branding strategy to market a tourism destination. Branding actions by the brand owner can be called brand identity, e.g. “a unique set of brand associations that the brand strategist aspires to create or maintain” [10]. On the other hand, brand image can be defined as “perceptions about a place as reflected by the brand associations held in tourist memory” [11]. The key characteristic of successful branding is uniqueness, i.e. the branded concept, such as a tourism destination, is differentiated in the minds of consumers from competing concepts, which gives it a USP that leads consumers to choose it in place of any other option at the time of purchase. A key methodology for branding is positioning, which is “about identifying the key characteristics that visitors had in mind [when choosing the destination] and reinforcing these” [12]. Positioning is seen as a source of competitive advantage [13]. For positioning to be effective, the range of differentiated characteristics should be limited [14]. Visual content forms part of the destination branding [15]. Perceptions of the destination brand value play a major role in boosting tourism to the destination [16].

With the emergence of deep learning-based advances in computer vision, e-tourism research has considered the use of pre-trained AI models in classifying larger scales of tourism photography and using this classification for analysis [1]. Generally, the same approaches have been followed, just with larger image sets, e.g. deriving the perceived destination image of a place through publicly available social media (e.g. Beijing [17]; Hong Kong [18]; Seoul [19]). Research has shifted from initially using Flickr to Instagram and TikTok, both highly visual social networks (e.g. [20, 21]).

Measurement of destination image and/or brand is significant for marketers as a favourable image or brand is seen as positively influencing intention to visit [22] and WoM recommendation [23]. Social networks are a valid source of data as they act as one of the major channels today to influence consumers’ brand perceptions [23]. Studies have demonstrated a relationship between UGC and destination brand [24]. Where both DMO and UGC communication were analysed, the latter was found to have a stronger positive influence on the destination image [25]. In marketing, it is generally accepted that the closer the perceived and projected destination images are, the better. Therefore, marketers seek to match the images [26]. Destination marketers want to know if their projected brand has been incorporated into consumers’ perceived images [27].

A major shortcoming of past tourism research using AI-based image classification for destination image measurement has been that the researchers have used the classifier directly in its available state rather than fine-tuning it for the tourism context. The model training has used broad, generic, large-scale image datasets for classification into an equally broad, generic, object- or scene-focused classification scheme. For example, most off-the-shelf AI models for image classification are available pre-trained with ImageNet (1000 labels), as this dataset acts as the standard benchmark for accuracy in the computer vision community. However, the resulting models annotate photos just as the training data was annotated, which was not for the purpose of destination image measurement, and hence may focus on less relevant labels (e.g., a touristic photo in the Sahara highlights the presence of desert, but an ImageNet model will label the camel, which reflects the attribute of animals instead; in fact, ImageNet has no label for desert, so no desert photo would be labelled as such). Two studies on the accuracy of ImageNet-based classifiers showed that the results for destination image measurement were much less accurate than the reported benchmark [19, 28]. When off-the-shelf models are used to label thousands of photos, a post-classification clustering step is needed to reduce the broadly distributed results to a smaller number of categories. This step, however, proves non-deterministic: changes in the clustering approach lead to different clusters, and data from other destinations produce distinct clusters that cannot be compared across destinations. As we will introduce in the next section, this is why we decided to fine-tune a state-of-the-art model specifically on destination visual attributes.

3 Visual Destination Brand

This paper makes use of the concept of visual destination brand. This term is coined by the authors to refer to the projected brand identity or the perceived brand image (depending on the source analysed), as a factor of the content of the photos posted online and shared publicly by visitors to a destination. Given the limitations in previous work discussed in the previous section, the authors have implemented their own visual classifier, which has been demonstrated to be accurate in destination image measurement. The currently available model (bit.ly/destinationclassifier) uses the Vision Transformer architecture, which is regarded as state of the art in computer vision. The model was fine-tuned with a training dataset of 4,949 tourism photos found via Google Images. To independently test its accuracy without the risk of overfitting (reusing the data the model was trained on), the authors additionally created a “ground truth” dataset made up of 100 photos per attribute acquired via hashtags from the YFCC100M dataset (openly available photography from Flickr), on which the model scored 94% top-1 accuracy (bit.ly/visualdestination). Full details have been provided in [34]. As there is no single, unique correct list of destination attributes for analysis of destination image [29], we determined an appropriate classification scheme by surveying the attributes identified by the most cited (i.e. most authoritative) papers which developed lists through either expert interviews or consumer surveys, particularly [4], who referred to the earlier works and aimed to list “all factors influencing image assessments”, as well as [9], where the attribute list was specifically developed for the task of photo classification (so e.g. visually distinguishable attributes were important). Table 1 shows our attribute list (rightmost column), which is aligned with all attributes previously determined as relevant.

Table 1. Visual destination attributes of our classifier, aligned to past work.

The classification of an image dataset by the classifier results in a set of frequencies for each attribute and, from these, a ratio of how present each attribute is in the dataset (i.e. frequency of occurrence/total number of images classified). A ratio is preferred to absolute frequency so that results may be compared between classifications which were produced from a different number of input images. In our previous work, our intuition was to model the set of ratios as a multidimensional vector, as then we could analyse and visualise results in the same manner as “embeddings” are handled as outputs of AI models, e.g. cosine distance is used as the measure for closeness rather than Euclidean. In this work, we reflect that these vectors are also compositional data since they express information about the relative importance of many parts to a whole. The characteristic of compositional data is that it sums to a constant, so any change in one value within the composition (the vector, in our case) must be offset by opposite changes of the same total size in the other values. As such, Compositional Data Analysis (CoDa) is the right statistical approach to this data [30]. The vectors are transformed to centred logarithms of ratios and then CoDa-specific approaches may be used to quantify differences between the compositions, e.g. between projected and perceived destination images [31]. In the rest of this paper, we extract visual destination brand vectors for 9 European capitals using our visual classifier and show how compositional data analysis can be employed to identify the relative differences between how those cities are presented visually online, both by content marketers (DMOs) and by visitors (UGC).

4 Experiment

Following the theory of destination positioning, destination marketers should identify the most significant attributes of the destination (as seen by potential visitors), compare these with their competitors, and focus only on those which genuinely distinguish them. Our experiment will show how this can be done using visual destination brand vectors and compositional data analysis. The vectors will be constructed from the classification of image datasets extracted from Instagram. We consider Instagram as a valid source since it is a leading visual-centric social network for destination marketing which contributes to the co-creation of destination brand [32]. It is more recently used in tourism research as a source of destination image measurement (e.g. [21]). Besides identification of differences between city images among DMO or UGC sources, we will also compare both (projected vs perceived image) as managers can use the incongruence between them to improve their promotion of the destination [31].

We identified the ten most visited European capital cities (based on Eurostat data for the year 2019 to exclude pandemic related outliers, number of nights spent in tourist accommodation): Paris, Budapest, Rome, Madrid, Berlin, Vienna, Stockholm, Prague, Lisbon and Athens. We believe it is valid to compare directly between these cities as they largely market to the same target audience: city travellers visiting or in Europe. As many of these travellers will plan travels between different cities in the same trip (or on different trips), the European cities compete directly with one another for traveller choice (“intention to visit”). We acknowledge that there are always other factors that influence the final destination selection (e.g. price and accessibility) but our choice of the most popular European cities means that costs both to get to the city and to be in the city can be very similar for tourists and all of our chosen destinations are similarly well connected internationally in transportation networks.

The resulting datasets are as follows, with an indication of which DMO account/UGC hashtag was used for collection (the official city tourism site was found via Google and the Instagram link followed; the recommended tourism hashtag is usually given in the Instagram account bio) and how many photos were acquired within the calendar year 2023. Photos were collected with the Python library Instaloader; the image files were used exclusively for this research and deleted afterwards. We had to remove Prague from the list of cities for two reasons. First, the Prague DMO account @cityofprague posts content the least regularly of all the considered DMO accounts, meaning that the DMO photo dataset would have been smaller than the rest. Second, the Prague DMO is alone in our sample in not promoting a related destination tourism hashtag, so we would have had to choose from users’ own selected hashtags, of which #cityofprague seemed to be popular for visitor photography but was used on a smaller scale than the hashtags of the other cities. In the end, we therefore compare 9 popular European cities (Table 2).

Table 2. Photo datasets collected from Instagram.

We use the datasets to measure the respective visual destination brand by labelling each photo in the dataset with one of our 18 visual destination attributes (using our visual classifier implementation). For each dataset, a vector is constructed by taking, for each attribute, the number of photos in the dataset labelled with that attribute divided by the total number of photos in the dataset, then mapping each attribute value to one feature (or dimension) in the vector. Since these vectors represent compositional data (the sum of all of the values in the vector is constrained to a constant value; since they are ratios, in our case all vectors sum to 1), we follow the approach in compositional data analysis to convert them to centred logarithms of ratios [33], which removes the sum constraint, introduces linearity in the differences between values, and makes the data amenable to standard statistical techniques. Once we have determined the centred log-ratio vectors for the visual destination brands, we can apply statistical methods and data visualisations to test for:

  (a) Destination positioning – how distinct is the projected image by DMO marketing of each destination from the other European capital cities?

  (b) Alignment to perceived image – how aligned is the projected image by DMO marketing of a destination to the perceived image of that destination as reflected in UGC photography by visitors?
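
As an illustration only, the vector construction and centred log-ratio transform described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation used in the experiment: the function names and the three-attribute example are hypothetical, and the pseudocount is our own assumption to keep every part strictly positive, since the text does not state how attributes with zero photos are handled.

```python
import math
from collections import Counter

def brand_composition(labels, attributes, pseudocount=0.5):
    """Share of photos per attribute, closed so that the shares sum to 1.

    The pseudocount is an assumption of this sketch: the log-ratio
    transform needs strictly positive parts, and zero handling is not
    specified in the text.
    """
    counts = Counter(labels)
    smoothed = [counts.get(a, 0) + pseudocount for a in attributes]
    total = sum(smoothed)
    return [value / total for value in smoothed]

def clr(composition):
    """Centred log-ratio: log of each part minus the mean of the logs
    (equivalently, the log of each part over the geometric mean)."""
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical three-attribute example (the classifier uses 18 attributes).
attributes = ["beach", "gastronomy", "museum"]
labels = ["beach"] * 6 + ["gastronomy"] * 3 + ["museum"]

composition = brand_composition(labels, attributes)
brand_vector = clr(composition)
```

The composition sums to 1 while the transformed vector sums to 0, which is the sense in which the transform removes the sum constraint. Positive entries mark attributes that are more present than the geometric mean of the composition.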

5 Results and Interpretation

We cluster the DMO vectors according to cosine distance from one another, so that we can identify which cities position themselves through their DMO-driven marketing in a manner visually similar to others and, as the corollary, which cities if any already demonstrate successful destination positioning (offering a marketed image of the city which is distinct from the others). Cosine distance is used instead of Euclidean as we want to consider similarly significant attributes as closer than similarly insignificant attributes (i.e. the values 3.2 and 3.5 should be treated as closer than 0.9 and 1.2, whereas the Euclidean distance is the same for both pairs). We use k-means clustering, normalise the vectors and choose the optimal number of clusters using the Calinski-Harabasz index (a metric which is optimal for compositional data) for values between 2 and 6 (we do not want to spread clusters too thin, so use 2/3 of the total number of data points as a maximum value). The index value indicates optimally 6 clusters for 9 cities, suggesting that the DMOs do distinguish their destinations visually in their content marketing. Budapest and Rome are clustered together (cluster 0), Athens and Paris as well (cluster 2), and a third cluster pairs Berlin and Stockholm (cluster 3). Madrid (cluster 1), Lisbon (cluster 4) and Vienna (cluster 5) are all individual clusters, suggesting greater distinctiveness in the distribution of visual attributes in their (Instagram) content marketing.
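
The distance argument and the cluster-count criterion above can be illustrated with a small Python sketch. It is offered under assumptions rather than as the code behind the reported clustering: the two-dimensional pairs (with a second coordinate fixed at 1.0) are one possible reading of the example in the text, the toy points are hypothetical, and the index function follows the standard Calinski-Harabasz definition.

```python
import math

def unit(vector):
    """Scale a vector to length 1 (the normalisation step before k-means)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    return 1.0 - sum(x * y for x, y in zip(unit(a), unit(b)))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def calinski_harabasz(points, labels):
    """Between-cluster dispersion over within-cluster dispersion, each
    divided by its degrees of freedom. Higher means better separated."""
    n, dims = len(points), len(points[0])
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    overall = [sum(p[d] for p in points) / n for d in range(dims)]
    between = within = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        within += sum((x - c) ** 2 for p in members for x, c in zip(p, centroid))
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical 2-D pairs: both differ by 0.3 in the first coordinate only.
large_pair = ([3.2, 1.0], [3.5, 1.0])
small_pair = ([0.9, 1.0], [1.2, 1.0])

# Candidate cluster counts for 9 cities: 2 up to two thirds of the data points.
n_cities = 9
candidate_ks = list(range(2, (2 * n_cities) // 3 + 1))

# Toy points comparing a well-separated and a poorly separated labelling.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good_score = calinski_harabasz(toy_points, [0, 0, 1, 1])
bad_score = calinski_harabasz(toy_points, [0, 1, 0, 1])
```

In this sketch both pairs are equally far apart in Euclidean terms, but the pair with the larger values is closer in cosine distance. On unit-length vectors the squared Euclidean distance equals twice the cosine distance, which is one reason normalising before standard k-means gives a cosine-style grouping.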

We compare the geometric means of the values of the attributes for all the cities’ visual destination brands in each cluster in order to interpret the visual meaning of each cluster (which attributes are relatively more present in each). Figure 1 shows a geometric mean bar plot for the six clusters, which highlights the relative differences between them across the 18 attributes. It visualises how far the attribute values in each cluster deviate from the geometric mean of the attribute values over all of the data, therefore highlighting those values which are relatively higher or lower than the rest. We include in our calculation a weighted mean, which reduces the variation calculated for attributes which are overall less significant as part of the branding (e.g. while 10% is double 5% and 30% is equally double 15%, the latter is clearly more significant for the branding). Cluster 0 cities (Budapest, Rome) show relatively more entertainment and monument content. Cluster 1 (Madrid) has relatively much more shops & markets content and much less water content (i.e. lakes, seas) than the others. Cluster 2 (Athens, Paris) contains the brands which are closest to the overall mean, suggesting they may not stand out from the other destination marketing. Cluster 3 (Berlin, Stockholm) contains brands which show relatively more modern buildings and trees. Cluster 4 (Lisbon) and cluster 5 (Vienna) show the most relative variation in attributes in their marketing. Lisbon highlights animals, beach, landscape and water more than other destinations; Vienna highlights its historical buildings, gastronomy and museums more.

Fig. 1. Geometric bar plot of the 6 city clusters comparing the visual attributes.
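
The quantity behind such a bar plot can be sketched as follows. This is a hedged illustration of the general idea (the log of each cluster's geometric mean over the overall geometric mean, per attribute) and not the exact calculation used for Fig. 1: the weighting mentioned in the text is not reproduced, and the function names, compositions and cluster labels are hypothetical.

```python
import math

def geometric_mean(values):
    """Geometric mean via the mean of logs (values must be positive)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def geometric_mean_log_ratios(compositions, groups):
    """For each group and each part: log(group geometric mean / overall
    geometric mean). Positive bars mean the part is relatively more
    present in that group than in the data as a whole."""
    n_parts = len(compositions[0])
    overall = [geometric_mean([c[j] for c in compositions]) for j in range(n_parts)]
    bars = {}
    for group in sorted(set(groups)):
        members = [c for c, g in zip(compositions, groups) if g == group]
        bars[group] = [
            math.log(geometric_mean([c[j] for c in members]) / overall[j])
            for j in range(n_parts)
        ]
    return bars

# Hypothetical compositions over three attributes, in two clusters.
compositions = [
    [0.6, 0.3, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
groups = [0, 0, 1]
bars = geometric_mean_log_ratios(compositions, groups)
```

A bar near zero means the cluster matches the overall mean for that attribute, which is how a cluster such as Athens and Paris can be read as close to the average presentation.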

We have noted that how the DMO markets the city may not align with what visitors actually attach importance to in their visit. We make the informal assumption that the photos visitors choose to post to a social network such as Instagram are knowingly selected out of a much larger set of options, where the purpose of selection may be to show the destination to their friends and followers, but there can also be the intention that their public photos will be seen broadly and globally. The latter is much more the case with those who deliberately use the tourism-related hashtag to acquire more visibility, as is the case with our UGC datasets. This assumption is based on our own usage of Instagram while travelling as well as observation of friends and family. As a result, we may assume that the set of UGC photos posted of the city by visitors using the tourism hashtag reflects a curated set of visual attributes of the city that they have deemed of significance (for themselves, and to others). Centring the log-ratios in the visual destination brand vectors means subtracting the logarithm of the geometric mean of the vector values (as a measure of central tendency) from the logarithm of each individual value. Positive values therefore indicate the attributes which are relatively more present in the visual destination brand than the others and, as the transform is logarithmic, a larger positive value corresponds to relatively greater significance of the attribute in the photos of the dataset. Therefore, we can compare the DMO and UGC vectors of each city to identify where incongruence occurs, which can indicate where DMOs overemphasize an attribute which is comparably less significant to visitors or where DMOs need to emphasize more strongly an attribute which is found to have greater significance among visitors to that destination.
We measure the Aitchison distance between both vectors to assess the similarity in direction (distribution of features), apply Fisher’s exact test pairwise to the features of both vectors (to find significant variation in the same feature) and draw a geometric mean bar plot of both vectors to visualise where they vary most from each other. Due to space constraints, Table 3 is restricted to reporting, for each city, the Aitchison distance between the vectors and the visual attributes which show a significant difference, as indicated by the centred log-ratio value (the higher the positive value, the more significantly present):

Table 3. Attributes with significant differences between DMO and UGC branding.
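
For readers who wish to reproduce this kind of comparison, the two measures can be sketched in plain Python. This is a sketch under assumptions rather than the analysis code behind Table 3: the Aitchison distance is computed as the Euclidean distance between centred log-ratio vectors, and we assume that the per-attribute Fisher's exact test is run on a 2x2 table of photos with and without the attribute in the DMO and UGC datasets. All counts and compositions below are hypothetical.

```python
import math

def clr(composition):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr vectors of two compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        return math.comb(row1, k) * math.comb(row2, col1 - k) / math.comb(n, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that ties with the observed table count.
    return sum(prob(k) for k in range(low, high + 1)
               if prob(k) <= observed * (1 + 1e-7))

# Hypothetical DMO and UGC compositions over three attributes.
dmo = [0.6, 0.3, 0.1]
ugc = [0.2, 0.3, 0.5]
distance = aitchison_distance(dmo, ugc)

# Hypothetical counts for one attribute: 30 of 200 DMO photos versus
# 90 of 300 UGC photos carry the label.
p_value = fisher_exact_two_sided(30, 200 - 30, 90, 300 - 90)
```

Because the centred log-ratio is unchanged when a composition is rescaled, the distance is the same whether the inputs are ratios or raw counts, provided no part is zero.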

Athens’ DMO marketing aligns closest to the UGC photography from the destination, whereas it is Paris and Rome which show the most incongruence between projected and perceived images. Focusing on attributes which show relatively more importance to visitors in UGC content than in the DMO marketing (i.e. higher positive value), we can identify “gastronomy” and “shops markets” in Berlin, “gastronomy” and “museum” in Rome as well as “roads traffic” in Vienna as attributes DMOs should focus more strongly on presenting in their content marketing as they resonate with visitors.

Finally, we provide an interpretation of this data for the purposes of destination brand management. DMOs want to provide a consistently distinct visual communication about their destination through the strategy of positioning to successfully form a destination brand with the consumers. We have seen that European city marketing does lead to distinct positioning of their destinations as a visual brand, with Madrid promoting comparatively more its shopping offer, Berlin and Stockholm modern buildings and trees, Lisbon animals, beach, landscape and water, and Vienna historical buildings, gastronomy and museums. Insofar as this branding is aligned with what visitors find to be significant to visit at the destination, marketers believe this branding activity maximises consumer intention to visit as well as other success metrics like intention to recommend (including eWOM). We found that DMO and UGC visual destination brands were often aligned, which is perhaps to be expected (DMOs repost some of the UGC content and visitors photograph aspects they consider significant, which also may come from the projected image by the DMO). However, some attributes were more strongly present in UGC content, which suggests that DMOs could incorporate more of those visual aspects in their content marketing strategy, such as the gastronomic offer in Berlin and Rome, shopping in Berlin, museums in Rome and roads/traffic (i.e. street scenes, the red and white trams) in Vienna.

6 Conclusions

In this paper, we have shown how a visual destination brand can be extracted and analysed to better understand how destinations are being presented in terms of their visual attributes. While studies have shown that images and videos are becoming increasingly decisive for consumers’ image of a destination, tourism research lacks shared methodologies and approaches for measuring and processing those inputs as part of a destination branding strategy. We presented our approach, which is based on an image classifier specifically fine-tuned for destination visual attributes to measure the destination brand, the brand representation as a multidimensional vector and the use of compositional data analysis to cluster and compare vectors. Unlike previous work where we had looked at very distinct destinations (e.g. Maldives and New York), this time we chose 9 European cities which market themselves to a similar target audience. We found that their DMO marketing does position each destination differently, with Madrid, Lisbon and Vienna being the most distinct due to relatively more content highlighting different attributes of each destination. DMOs also seek to align projected with perceived image, with a number of cities (Berlin, Rome and Vienna) still having opportunities to strengthen their branding with attributes that are found to be significant to their visitors. The same approach may be applied to any destination(s). Future work would be to validate the set of visual attributes with visitors (do they align with how people make mental associations with a destination?), compare the automatic labelling of the classifier with human annotators’ decisions, as well as correlate the extracted vectors with quantifiable destination metrics, which, in the end, is the most important outcome for any destination marketer: e.g., which visual destination brand leads to more visitors?
Such experiments are much more difficult as one cannot easily separate out all the other factors which might influence a dependent variable such as visitor numbers. However, the means to (accurately) extract a visual destination brand and represent it in a manner viable for statistical analysis is an important first step and will hopefully support further research on understanding visual brand and its relevance to tourism research and marketing.