1 Introduction and goals

Destination image perception is a fundamental matter for destination managers, and consequently it has been widely investigated. Extant research underscores that communication directed towards potential visitors must be ‘congruent,’ as consistency in messaging increases the chances of the destination being chosen. This congruence must be achieved across all communication channels the organization uses (see Van Rompay et al. 2010, who study the congruence of text and pictures on web sites). In the case of destinations, it is crucial that this congruence is achieved over time, as the local attractions evolve from basic assets to more complex and rich pullers (Richards 2018).

Despite the important role image congruence and its evolution play in destination management, studies on both subjects in multi-asset destinations are limited, as is the investigation into multi-channel UGC (user-generated content). The literature review section examines the wealth of studies on perceived destination image, which generally rely on a single type of UGC and support their findings with either photos or text, and exposes the gap in understanding the question in relation to multi-source UGC.

The objective of this article is therefore to study the congruence between the image promoted by a local DMO (Destination Management Organization) and the image perceived by visitors, also taking longitudinal issues into account. This is researched with a quantitative, massive-data approach. Thus, this work contributes to the literature on UGC, destination image and communication congruence from an empirical point of view. The authors processed 9213 photos, 24,610 video seconds and 359,350 textual mentions from different social networks in order to understand discrepancies among different sources of UGC. Framed by congruence and evolution theories, and layering a longitudinal analysis, the shift in perception of attractions is examined from the point of view of multi-channel UGC. Therefore, the conclusions of this research will have theoretical as well as managerial implications as they relate to destination image strategy. While it is not the central research topic of the paper, the COVID-19 pandemic has clearly hit the studied destination. Therefore, some comments on that matter are put forth in Sect. 3 (Context) as well as in the final section as a possible topic for further research.

2 Literature and theory review

How a tourist destination is photographed, videographed, or written about on websites or social media networks via UGC has been so amply studied since the 2000s that only a state-of-the-art article may render the width of the topic, see for example Camprubí and Coromina (2016) or Picazo and Moreno-Gil (2017). For their part, Lee and Rha (2018) confirm the interest of studies like the present one when they state that “tourism research is expanding into research on communication, CSR, and marketing.”

UGC has become a valuable tool for studying different aspects of a destination’s image, for example to better understand the destination’s image formation (Serna et al. 2013; Llodrà Riera et al. 2015; Micera and Crispino 2017), to compare the image projected by DMO versus the one perceived by visitors (Stepchenkova and Zhan 2013; Nechita et al. 2019; Mariné-Roig and Ferrer-Rossell 2018), to better understand managerial challenges in the destination (Alcázar et al. 2014; Stepchenkova et al. 2015; Mariné-Roig and Anton Clavé 2016; Rahman et al. 2016), to better market the place (Zhang et al. 2015; Doosti et al. 2016; Moro and Rita 2018; Luna-Cortés 2018; Tan 2018; Mohamed et al. 2019) and to improve visitors’ segmentation (Moliner-Velázquez et al. 2021).

These research efforts have not only resulted in recommendations that industry professionals may follow, but have on occasion generated new theories or refined existing ones (Munar 2011; Llodrà Riera et al. 2015) by comparing different image formation dynamics.

In their field work, researchers have used different techniques and data sources, for example website mining (Hellemans and Govers 2005; Költringer and Dickinger 2015; Zhang et al. 2015; Mohamed et al. 2019), picture analysis (Negri and Vigolo 2015; Hunter 2016) or text analysis, be it from blogs (Akehurst 2009; Çakmak and Isaac 2012; Tseng et al. 2015) or from Tripadvisor (Kladou and Mavragani 2015; Chiu and Leng 2017; Garay Tamajón and Cànoves Valiente 2017; Wong and Qi 2017). Due to its speed and dynamism, Twitter has been studied as a communication tool, for instance when catastrophes affect destinations (Barbe et al. 2018; Oliveira and Huertas 2019). The use of Facebook has been more varied and has served multiple purposes: compare Park et al. (2016), who study the communication of tourism policies, with Jadhav et al. (2018), who investigate tourists’ behavior. The fact that online UGC has been available for some years now means that longitudinal studies can start emerging (Wong and Qi 2017; Gálvez-Rodríguez et al. 2020). In a similar vein, ‘traditional’ and ‘digital’ word-of-mouth are compared by Tan and Lin (2021).

However, most studies use a single type of UGC when studying destinations, and only a few research projects have combined more than one type of source. Table 1 gives a short overview of some of these publications. Until 2016, most research relied on Flickr and Panoramio to make comparative studies. The former has lost its significance and the latter was retired in 2016, so these sources are no longer available (or interesting) to today’s researchers.

Table 1 Studies dealing with multiple UGC types or sources

Thus, studies bringing together congruence and evolutionary theories with the analysis of multiple UGC sources are scarce, which constitutes a research gap that this article addresses.

As for the selected theoretical frameworks, the first thing to notice is that the presented empirical evidence is from the city of Cartagena de Indias on the Colombian Caribbean coast, chosen because it is a multi-asset destination with several main attractions and so a suitable case study. Thus, this work is ultimately framed within Case Research Theory (Yin 1984; Stake 1995) as it discusses a concrete case from an empirical point of view with the aim of verifying (or modifying, as might be the case) existing theories (Eisenhardt 1989). Accordingly, in this work, empirical findings from a context are studied in-depth to confirm theories and propose modifications. Within Case Research Theory, different types of cases are posited: exceptional cases, regular cases, comparative cases, etc. This paper assumes the framework outlined by Neape et al. (2006) as the discussed case is representative. In fact, Cartagena is an adequate proxy for significant samples of Latin American destinations, boasting historical attractions as well as beaches, allowing the results to be generalized to other destinations in the region. This study can also be underscored as one of the first of its kind conducted in a South American city, a geographical area that has been underexamined.

Another important framework for this research is ‘congruence’. This concept appears to play a crucial role in the frictionless consumption process at different touchpoints, be it in the choices made by consumers or in the organizations’ communication directed towards them. Van Rompay et al. (2010) state that, in multimedia messages, congruence favors “processing efficiency”, which in turn accounts for favorable attitudes from the consumers’ perspective. According to these authors, there must be a “consistency among meanings associated with [different] elements within an (online) environment” (p. 23) to create congruence. In our study, searching for congruence means researching whether the UGC by visitors to Cartagena displays similar content across different social media channels, i.e., whether the images perceived by tourists who post photos, videos, and/or text are congruent with each other or not. While the field work of Van Rompay et al. (2010) is based only on pictures and text, we extend our quest for congruence by adding video sources to our study.

Thus, our field work investigates the level of congruence among three types of UGC. For instance, a high level of congruence would mean that tourists perceive the destination in the same way, irrespective of the media they use, which can indicate an overall high congruence in the perception of the place and favorable attitudes toward it. Correspondingly, a high level of congruence across different media outlets also points to a sound strategy and use of these communication channels on the part of the local DMO.

A potential threat to image congruence is the fact that attractions in destinations and the destination’s image are intrinsically dynamic factors. This is explained, among others, by Richards (2018), when he specifically asserts that destinations’ attractions evolve over time going from basic attractions (for example sun and sand), to tangible cultural attractions (built military heritage in the case of Cartagena), and finally to intangible attractions. This is a value-catching process and is pushed by both supply and demand. The data in this paper spans a wide timeframe; this allows for longitudinal reflections on the evolution of attractions, the place’s projected image and the destination’s perceived image.

3 Context: Cartagena de Indias, Colombia: from a growing to a Covid-plagued destination

Located on the Colombian Caribbean coast, Cartagena is one of the country’s well-established destinations, receiving around 3 million visitors annually, approximately 10% to 12% of them foreigners. Founded in the 1530s by the Spanish conquerors, Cartagena soon became a key point in the colonization process of Hispanic America, leading to the construction of a consistent urban defense system with walls and fortresses (Bassols and Soutto-Colón 2020). Today, these ancient military structures, along with the historic and picturesque city center, play an iconic role in the city’s tourist landscape, and they were designated a UNESCO World Heritage Site in 1984. Thanks to its beaches, the city was promoted as a sun and sand destination until around 2010. Since then, it has been increasingly sold by the local tourist board as a ‘heritage destination’. This has allegedly turned Cartagena into a ‘multifaceted’ destination, i.e., a tourist space where different attraction typologies geared towards different visitor segments coexist.

This ‘multifaceted’ or ‘multi-asset’ character of the destination raises important questions about branding and management strategies (Bassols and Leicht 2020). It also raises questions about how the city is perceived by its visitors, in terms of what they think the destination stands for (i.e., beach or culture or other attractions). Therefore, a close study of the UGC by the tourists might provide essential clues to guide future developments of multifaceted destinations.

One study on visitors to Cartagena worth mentioning is by Pinillos Castillo and Hernández Vargas (2017), concerning the motivations of tourists from different markets of origin (Table 2). The study, conducted among 320 tourists visiting Cartagena in 2016, found that a majority (55%) were mainly interested in ‘culture’, whereas the remaining 45% were primarily interested in beaches and leisure. North Americans, and especially Europeans, showed a much wider gap (57% for culture versus 43% for beaches), whereas Latin Americans, with the exception of Brazilians, narrowed this gap (53% versus 47%). Thus, Table 2 shows that ‘cultural motivations’ are at the fore.

Table 2 Main attractions for foreign tourists visiting Cartagena in 2016

The general figures from the last decade show that Cartagena is a growing destination, or at least it was until March 2020, when the COVID-19 pandemic struck. The fact that flights were called off for 5 months in Colombia (April to August 2020) as a precautionary measure against the pandemic caused foreign tourism to collapse completely. The reaction from the national government in supporting the tourism industry has been relatively slow, according to several interviewed stakeholders. Nationwide, this situation has reduced turnover by around one third for hotels and restaurants, and by about two thirds for the transportation industry. The impact on the largest tourist destinations in Colombia has been huge in terms of employment losses. However, the stakeholders see that Cartagena’s position as a well-established destination is not endangered in the mid to long term. Companies were able to react and adapt to the new situation. Since September 2020, the beaches and the historic centre have reopened at a slow pace, with ‘biosecurity’ as an important argument. Meanwhile, the city has found its way back to its main vocation in past decades: being the largest national holiday destination. In fact, businesses were relatively happy with occupancies and expenditure levels in the Easter holiday of 2021. Although international tourism has been at a low, some local companies say that events or incentive trips are being delayed but not canceled. Short-term predictions are difficult to make, as the situation remains changeable and COVID-related travel warnings may appear (or disappear) within a few weeks. In fact, as stated by several stakeholders, short-term planning remains a challenge in the city for both the public and the private sectors.

4 Methodology and field work

4.1 Establishing the attractions’ typologies

A list of attraction types for photos and videos corresponding to Cartagena’s main attractions was set up according to FONTUR and Mincit (2014), following Richards’ (2018) typological division. This resulted in the following 8 categories: Civil Architecture, Religious Architecture, Public Spaces, (standalone) Monuments, Hotels (and their different ambiances), Beaches, and People (this one split into ‘Locals’ and ‘Foreigners’). Figure 1 shows the process of establishing these typologies for pictures and videos (for how these categories were matched to the text categories, see Sect. 4.5). Out of the 8 established typologies, 5 come from the classifications put forth by the literature and the local official sources, and the three categories of ‘Hotels’, ‘Locals’ and ‘Foreigners’ were created after examining the video and picture results to further refine the retrieved data. In the case of ‘People’, due to the large number of pictures and videos containing people, this category is an intermediate one that is divided into two final categories: ‘Locals’ and ‘Foreigners’. This differentiation allows for a better understanding of this category, particularly the qualitative difference between a visitor engaging with locals and their culture and a visitor engaging with other tourists, possibly from his/her own traveling group (see the final sections for this discussion). With the category ‘Hotels’, the aim is to better understand the engagement of visitors with their lodgings, an innovative view seldom taken before. In sum, 5 categories come from the literature and 3 from the need to refine and understand the data further (Fig. 1).

Fig. 1 Grounding the eight categories of attractions according to sources

4.2 Data collection

The authors of this paper intend to quantitatively analyse large swaths of data collected using web-scraping methods. Images for analysis were downloaded from Facebook, Flickr, Instagram, and the extinct Panoramio, and videos were taken from the sites Vimeo and YouTube. Texts were taken from Twitter and Instagram with the software developed by Mabrian Technologies. Purposely, different social picture and video networks were chosen to research any possible gaps or incongruences among these. Data were collected over several years from a broad time spectrum—from 2006 to 2018. Two intensive collection moments were 2016 for video and 2018 for text. This broad span of 12 years enriches the discussion, as it allows for some longitudinal considerations.

4.3 Data collection for photos

Concerning the pictures, the following seven social networks were considered from the outset: Facebook, Flickr, Instagram, the extinct Panoramio, Pinterest, TripAdvisor and Twitter. These sites were trawled for pictures of Cartagena by randomly collecting the output produced by each social network’s algorithm in response to highly relevant keywords (‘Cartagena’, ‘Cartagena de Indias’, ‘Cartagena, Colombia’, etc.). The field work was carried out over 6 semesters, from 2015 to 2018, by volunteer tourism undergraduate students at the Universidad Autónoma del Caribe in Barranquilla, Colombia. A total of 15 students were involved in the project at different times. Each student was trained carefully before starting to tag, so that the tagging criteria would be consistently upheld for the duration of the whole project. This initial collection yielded 10,089 pictures with 28,219 tags. In terms of dates, the pictures cover a timeframe from 2006 to 2018, with the majority of them taken between 2010 and 2016.

However, some limitations imposed on the collected data meant some of the mentioned social networks were rejected. First, we did not want single pictures in the database, but rather pictures taken while the users were en route in the city, thus conveying an attitude of in-depth exploration. For this reason, only sets of a minimum of 4 pictures taken on the same day were entered into the database. In some instances, 3 pictures were accepted. This meant the pictures from Pinterest, TripAdvisor and Twitter were taken off the final list, as their users seldom uploaded more than one or two pictures on a given day. Additionally, there was much less material to collect in these three social networks, Pinterest being the least rich of the three sources, so the final result consisted of 9213 pictures (Table 3). The number of pictures from each source was not established beforehand, and instead reflects the best efforts of the data collectors and the ease with which pictures could be found and tagged.
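The admission rule described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the procedure actually followed by the data collectors: the record fields (`network`, `user`, `date`, `url`), the function name and the sample records are all hypothetical, and a "set" is assumed here to mean pictures uploaded by the same user on the same network and day.

```python
from collections import defaultdict

MIN_SET_SIZE = 4  # threshold stated in the text; 3 was accepted in some instances


def filter_picture_sets(pictures, min_size=MIN_SET_SIZE):
    """Keep only pictures that belong to a same-user, same-day set of at least min_size."""
    groups = defaultdict(list)
    for pic in pictures:
        # Assumption: a set is identified by network, user and day.
        groups[(pic["network"], pic["user"], pic["date"])].append(pic)
    kept = []
    for members in groups.values():
        if len(members) >= min_size:
            kept.extend(members)
    return kept


# Hypothetical records, not the study's data.
sample = (
    [{"network": "Flickr", "user": "u1", "date": "2014-07-02", "url": f"f{i}"} for i in range(5)]
    + [{"network": "Twitter", "user": "u2", "date": "2016-01-10", "url": "t0"}]
    + [{"network": "Instagram", "user": "u3", "date": "2017-03-05", "url": f"i{i}"} for i in range(3)]
)

kept = filter_picture_sets(sample)
print(len(kept))                            # 5: only the Flickr set qualifies
print(len(filter_picture_sets(sample, 3)))  # 8: the relaxed threshold also admits the Instagram set
```

With such a rule, networks whose users rarely upload more than one or two pictures per day contribute almost nothing, which is consistent with the exclusion of Pinterest, TripAdvisor and Twitter reported above.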

Table 3 Photos admitted to the picture database

In the aforementioned database, each photo was entered with its basic data (URL, date of access, etc.) and subsequently the main motive(s) in the picture were tagged.

4.4 Data collection for videos

As for videos, a total of 74 videos were taken from Vimeo and YouTube. In order to make them comparable to the pictures, it was assumed that a 1-s video is equal to 1 picture. The videos were collected in 2015 and 2016, again by volunteer students, and the video processing was otherwise identical to that described above for photos. The most popular video social networks, YouTube and Vimeo, were chosen, though the former provided more material than the latter. The length of the tagged videos and other figures regarding the processed video data can be seen in Table 4.
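The one-second-equals-one-picture assumption can be illustrated with a short sketch. The segment representation (start second, end second, category), the function name and the sample values are hypothetical and only serve to show how tagged video time could be turned into picture-equivalent tag counts; the source does not describe the tagging tool at this level of detail.

```python
from collections import Counter


def seconds_to_picture_tags(segments):
    """Count one picture-equivalent tag per tagged second of video."""
    counts = Counter()
    for start_s, end_s, category in segments:
        if end_s < start_s:
            raise ValueError("segment ends before it starts")
        counts[category] += end_s - start_s  # each second counts as one picture
    return counts


# Hypothetical tagged segments: (start second, end second, category).
segments = [
    (0, 12, "Foreigners"),
    (12, 20, "Civil Architecture"),
    (20, 25, "Beaches"),
    (25, 30, "Foreigners"),
]

counts = seconds_to_picture_tags(segments)
total = sum(counts.values())
for category, n in counts.most_common():
    print(f"{category}: {n} picture-equivalents ({100 * n / total:.1f}%)")
```

Counting by seconds means that long takes of a single motive weigh more heavily than they would as a single photograph, which is worth bearing in mind when video and photo shares are compared.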

Table 4 General data for the tagged videos and video social networks

As said further above, in order to ensure comparability, the same eight categories (Civil Architecture, Religious Architecture, Public Spaces, Monuments, Hotels, Beaches, Locals and Foreigners) were created for both video and pictures.

4.5 Data collection for text

The third type of collected data is text. Texts were extracted from the social network sites Twitter and Instagram using Mabrian Technologies software, which output a total of 359,350 textual mentions for Cartagena. Due to a software limitation, only the mentions included in the time span from June 2017 to June 2018 could be collected, i.e., a full year. The software did not output figures corresponding to customer keywords, and instead performed a text analysis with its own keywords (see Table 7) as well as a semantic analysis, in order to assign a tourism typology to each mention, dividing the resulting output into 9 categories.

In order to have comparable results among images and text, the 9 categories output by the text software were grouped into 2 overarching classes: General Tourism and Niche Tourism, and so were the 8 image categories. These two overarching typologies are recognized by many authors as perhaps the most primary tourism dichotomy (Novelli 2005; Butcher 2019). Thus, the 9 categories output by the text analysis system could be matched to the 8 categories created for photos and videos. To do so, the following categories of text tags and image tags were grouped into overarching categories in the following way:

  • Overarching categories for text tags:

    • General Tourism: ‘Sun and Sand’, ‘Night Leisure’, ‘Shopping’, and ‘Families’.

    • Niche Tourism: ‘Cultural Tourism’, ‘Active Tourism’, ‘Food Tourism’, ‘Wellness’, and ‘Nature Tourism’.

  • Overarching categories for image tags:

    • General Tourism: ‘Hotels’, ‘Beaches’, ‘Public Spaces’, and ‘Foreigners’.

    • Niche Tourism: ‘Monuments’, ‘Civil Architecture’, ‘Religious Architecture’, and ‘Locals’.
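The grouping listed above can be expressed as two lookup tables, which makes it straightforward to compute the ‘general’ and ‘niche’ shares of any set of tag counts. The sketch below is illustrative only: the category names follow the lists above, whereas the function name and the sample counts are hypothetical and are not the study's figures.

```python
# Mapping of the 9 text categories to the two overarching classes (as listed above).
TEXT_GROUPS = {
    "Sun and Sand": "General", "Night Leisure": "General",
    "Shopping": "General", "Families": "General",
    "Cultural Tourism": "Niche", "Active Tourism": "Niche",
    "Food Tourism": "Niche", "Wellness": "Niche", "Nature Tourism": "Niche",
}

# Mapping of the 8 image categories to the same two classes (as listed above).
IMAGE_GROUPS = {
    "Hotels": "General", "Beaches": "General",
    "Public Spaces": "General", "Foreigners": "General",
    "Monuments": "Niche", "Civil Architecture": "Niche",
    "Religious Architecture": "Niche", "Locals": "Niche",
}


def overarching_shares(tag_counts, groups):
    """Return the percentage of tags falling into 'General' and 'Niche'."""
    totals = {"General": 0, "Niche": 0}
    for category, n in tag_counts.items():
        totals[groups[category]] += n
    grand_total = sum(totals.values())
    return {name: 100 * value / grand_total for name, value in totals.items()}


# Hypothetical image tag counts, not the study's data.
example_counts = {"Beaches": 25, "Foreigners": 15, "Civil Architecture": 45, "Locals": 15}
shares = overarching_shares(example_counts, IMAGE_GROUPS)
print(shares)  # {'General': 40.0, 'Niche': 60.0}
```

Because both data families are reduced to the same two classes, the resulting percentages can be compared directly even though the underlying category systems differ.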

4.6 Data analysis methods: exploring congruence

In order to analyze the results, the authors used several quantitative analysis methods. For the analysis of the photo tag categories and the social networks sites, bivariate statistics such as cross-tables and chi-square tests were put to use. In order to determine the level of congruence between photos and videos, these were analyzed by applying Spearman's rank correlation on the rankings of the represented types of attractions for both types of data (Tables 5 and 6). Another cross-table was also set up to compare the number of text mentions belonging to the ‘general’ and ‘niche’ overarching typologies (Table 7).

Table 5 Classification of picture tags into categories, absolute numbers and percentages
Table 6 Combined results summing up each category’s tags across the different video social networks used
Table 7 Text mentions related to their tourism typology

5 Results

This section presents the results of processing the photos, videos and text, each separately. In the case of photos and videos, the tables are similar due to the fact that, as said above, the same 8 categories were applied to both photos (Table 5) and videos (Table 6), thus allowing for better comparisons between them and setting the text results slightly apart.

In Table 5, the columns display the 8 attraction categories. The rows correspond to the four researched social network sites. A chi-square test was performed (χ2 = 4919.7, df = 21, p value < 0.001), which indicates that the distribution of categories differs significantly among at least some of the social network sites.
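For readers less familiar with the test, the following sketch shows how a chi-square statistic and its degrees of freedom are obtained from a cross-table of counts. It is a plain-Python illustration with hypothetical numbers, not the software or the data used in this study; the function name is an assumption. For a table with 4 rows (networks) and 8 columns (categories), the degrees of freedom are (4 - 1) × (8 - 1) = 21, which matches the value reported above.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a cross-table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical 2 x 2 table, not the study's data.
stat, dof = chi_square_independence([[10, 20], [20, 10]])
print(round(stat, 3), dof)  # 6.667 1

# A 4 x 8 table (four networks, eight categories) has 21 degrees of freedom.
_, dof_4x8 = chi_square_independence([[1] * 8 for _ in range(4)])
print(dof_4x8)  # 21
```

The sketch stops at the statistic; obtaining a p value additionally requires the chi-square distribution, for which a statistics library would normally be used.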

The table shows the most and least tagged categories for each given social network. The category ‘Civil Architecture’ appears as the most favored category, mainly because Panoramio (37.22%) and Flickr (26.68%) display this category far above the other ones. ‘Monuments’ (16.10%) comes second because of its high share in Facebook (23.22%) and Flickr (17.14%). The categories ‘Hotels’ and ‘Religious Architecture’ are the least favored categories in each of the four networks, save for Panoramio which, significantly, shows ‘Foreigners’ as the least relevant category. Flickr is the most ‘balanced’ network, i.e., the one whose category percentages lie closest to the averages. Finally, Facebook’s and Instagram’s most favored category is by far ‘Foreigners’, at almost 40% in the former and 30% in the latter. These two networks also display the lowest percentage of ‘Locals’ pictured.

The video results, similar to the pictures’ analysis, were collated and the mentions per type of attractions were counted. As mentioned above, the same 8 categories were created to tag the videos. The results of the video tagging in Table 6 show that ‘People’ (i.e., the combined categories of ‘Locals’ and ‘Foreigners’) is clearly the most favored motive, with almost 44% of the tags (25.61% for ‘Foreigners’ plus 18.21% for ‘Locals’), followed by ‘Civil Architecture’ (15.47%). ‘Religious Architecture’ is the least favored, with 4.46% of the tags.

The rankings of the listed types of attractions for photos (Table 5) and videos (Table 6) are correlated but only up to a certain point: Spearman's rank correlation coefficient = 0.691, showing thus a significant (though not overwhelming) level of congruence across these information sources. Comparisons between videos and photos also show that the results for certain categories are quite disparate: for the case of videos, the overarching category ‘People’ obtains almost 44% of the video tags whereas for the photo tags this category merely represents 25.81%. ‘Civil Architecture’ comes second but far behind with 15.47% of video tags. ‘Religious architecture’ is the least tagged category for both photos and videos, receiving 4.46% of video tags and 5.32% of the photo tags. In short, video accentuates differences in some instances.
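As an illustration of how such a coefficient is obtained, the sketch below computes Spearman's rank correlation as the Pearson correlation of the ranks, averaging ranks over ties. The shares used are hypothetical and are not the values of Tables 5 and 6, so the result differs from the 0.691 reported above; the function names are assumptions made for this example.

```python
def average_ranks(values):
    """Rank values from largest (rank 1) to smallest, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical percentage shares for eight categories, not the study's data.
photo_share = [27.0, 5.0, 10.0, 16.0, 4.0, 12.0, 11.0, 15.0]
video_share = [15.0, 4.0, 9.0, 10.0, 6.0, 12.0, 18.0, 26.0]

rho = spearman_rho(photo_share, video_share)
print(round(rho, 3))  # 0.667
```

Because only the order of the categories enters the calculation, two sources can yield a fairly high coefficient while still differing markedly in the actual percentages of individual categories, as is the case for ‘People’ here.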

Finally, Table 7 shows the results for text analysis. Here we see that ‘Cultural’ motivations are at the very fore (31%), followed from a distance by ‘Sun and Sand’ (13%). As stated in the previous section, the output types of tourism categories were grouped into the overarching ‘niche’ and ‘general’ to make it possible to conduct a comparison with the photos and videos. The table presents the identified types of tourism in Cartagena, alongside their mentions on social media and their classification into ‘general’ or ‘niche’.

6 Discussion and implications

This section discusses the data in an incremental way, i.e., first pitting the pictures of the four different networks against each other, then discussing stills vs video, and finally discussing image vs text.

6.1 Cross-comparing the pictures: a longitudinal view

The authors use a longitudinal methodology to make comparisons between different social media platforms, rendered possible by the large time span of the collected data. First, it is interesting to see that for users of the extinct Panoramio, ‘Civil Architecture’ is by far the winning category (37.22%), alongside a very high share of ‘Beaches’, the highest among the four social networks (15.43%). Panoramio has the oldest set of pictures on average and became extinct some years ago, so it shows the typical behavior of tourists visiting the city in the first half of the 2010s, when built heritage was by far the largest tourist-puller to the destination, complemented by sun and sand. Flickr, with pictures from a larger time span, follows this trend but in a more nuanced way, balancing tangibles and intangibles: ‘Civil Architecture’ (26.68%) is closely followed by the overarching category ‘People’ (25.62%), which includes the sets of ‘Locals’ (13.13%) and ‘Foreigners’ (12.49%) in quite even numbers. In fact, this is the best balanced relationship between ‘Locals’ and ‘Foreigners’ in the four social networks, the other three showing very disparate values either in favor of ‘Locals’ (Panoramio) or ‘Foreigners’ (Instagram, Facebook). The most recent sets of pictures, on Facebook and Instagram, specifically show ‘Foreigners’ as the dominant category, and thus come closer to the results of the video data set for this category (Facebook 39.87%, Instagram 30.15%, Video 25.61%), demonstrating how, in the course of the last decade, tourist motivations and habits have substantially shifted. Interestingly, on Flickr, Facebook and Instagram, ‘Civil Architecture’ comes before ‘Beaches’, so the current photo ranking of attractions in Cartagena is (1) ‘Civil Architecture’ (27.34%), closely followed by (2) ‘People’ (25.81%), with ‘Beaches’ coming third (12.2%).

In sum, for picture shooters, Cartagena is perceived as a ‘dual’ destination with two main attractions: ‘Civil Architecture’ and the overarching category ‘People’. These findings present interesting managerial conclusions, discussed in the next Section.

6.2 Pictures vs. video

These two types of UGC are the most easily compared of the three, as the tags and categories were created for both by the authors following Fig. 1. However, pictures and video show a different pattern for their categories: in video, ‘People’ is the dominating category, specifically ‘Foreigners’ (25.61%), followed by ‘Locals’ (18.21%) and ‘Civil Architecture’ (15.47%). As seen above, the photo sets display quite a different pattern altogether, with ‘Civil Architecture’ and the overall category ‘People’ quite even. An explanation here might be that videographers interact much more with ‘moving’ motives like people, than with static motives.

‘Religious Architecture’ is the loser in both pictures and videos. This category was included in the tables because, some years ago, the local DMO started promoting religious tourism and we wanted to check on its progression. This introduction was a somewhat contradictory move, taking into account the large share of visitors interested in picturing people and even beaches before religious motives (Table 5). The present research confirms that the strong cultural and sun and sand image of this destination cannot be changed easily by just ‘adding’ yet another product: if this product is not adequate to the destination, it will fail, as has happened with religious tourism in Cartagena. This finding is in line with the widely accepted literature indicating that the overall image of a destination evolves very slowly, rendering short-term changes difficult (Berrozpe et al. 2017).

In both sets, another losing category is ‘Hotels’, which means that, notwithstanding the high standard of some of the city’s accommodation facilities, visitors give much more importance to the destination’s attractions than to the superstructure of the place, regardless of how nice or well-built it is. This is a finding full of implications in terms of place management and business strategy.

As for ‘Beaches’, this is the most consistent category between the photo set and the video set. This category has an intermediate position in both rankings, showing a consensus that, for the whole studied period, this is a complementary product, and not the main attraction in the destination. The DMO has also communicated it this way in recent years, so we see here congruence between supply and demand.

6.3 Image vs. text

Comparing both pictures and videos with text is more difficult, as the latter’s data categorization differs from that of the other two data sets (see above). Also, the text data set is much larger (359 K text mentions) than the picture tags (23 K) or video tags (36 K). As explained above, the 9 categories output by the text analysis system were matched to the 8 categories created for photos and videos by generating the two overarching categories ‘general tourism’ and ‘niche tourism’ (see Sect. 4.5).

Grouped in this way, 59% of the mentions in the text data set fall into the ‘niche’ typology, whereas 41% belong to ‘general tourism’. If we replicate this with the other two data sets, pictures show 59% of ‘niche’ preferences vs 41% of ‘general’ preferences, a full coincidence with the text data set. As for the video tags, 52% are ‘general’ and 48% are ‘niche’, interestingly inverting the trend. We see that, for the first two groups, Cartagena is nowadays attractive mainly because of both its tangible and intangible cultural resources rather than its basic attractions (‘Beaches’ and ‘Public Spaces’). Videographers show almost split preferences here, but with a slight predilection for the ‘general’ attractions.

7 Conclusions, limitations and further research

This research finds several gaps and differences in congruence among the three different types of UGC studied and within the four sets of pictures examined as well. Photographers at the destination still appreciate the many edifices and monuments Cartagena has to offer, with the exceptions of instagrammers and facebookers, for whom picturing ‘Foreigners’ is, by far, the most exciting activity (i.e., interacting with other travelers, possibly from their traveling group). These differences are also partly explained by the characteristics of each picture network and their users: Flickr and Panoramio are mainly used by prosumers, whereas Instagram, Facebook and the video networks are used by regular consumers.

Text writers, for their part, seem much more interested in the cultural aspects of the place and the same goes for Panoramio and Flickr, which are networks for people interested in the big architecture and monuments the destination offers. These three groups display similar behavior and preferences, contrasting with the other three groups: videographers, facebookers and instagrammers. So, we see here the first general incongruences and specific congruences.

As for the coincidence in the overarching categories ‘general’ and ‘niche’, in the last paragraph of Sect. 6 we established the congruence of text and pictures, which show the same split between the ‘niche’ and ‘general’ tourism types (59% vs 41%). Videos show a different pattern here (48% vs 52% for ‘general’), so when looking at the data from these two overarching categories, we also find incongruences.

Looking at the above results longitudinally, it must be said that the DMO pushed beaches as Cartagena’s main product in the 2000s. In the 2010s, the DMO sold built, tangible heritage as the main product, and it mainly continues to do so. Therefore, while a significant section of visitors sees their tastes catered for, another, growing section is not adequately served: those wanting to engage deeply with the locals and their culture (revealed under the category ‘Locals’) and those seeing Cartagena as just a place to relax, revealed in the category ‘Foreigners’ in stills and video. This trend seems to have grown in recent years, as shown by the ‘youngest’ photo social networks, i.e., Facebook and Instagram, as well as the video networks.

Pitting these results against Richards’ theory, we must say the destination seems completely stuck in its ‘tangible heritage’ phase. The city seems to cling forever to its material heritage without showing any major signs of evolution. Even the preference for picturing or recording ‘Foreigners’ (to the detriment of cultural motives) in the most recent data sets may point to a backward trend towards the 2000s, when the city was promoted (and perceived) as just a sun and sand destination. At the moment, it is not clear how the destination is evolving as, for the DMO and most visitors, the city’s tangible assets are its main ones. As a matter of fact, the growing category of ‘Locals’, i.e., visitors engaging with local culture and residents, seems to go unnoticed and uncatered for and, according to Richards, this is the segment with the biggest potential in places like Cartagena. As a general remark here, we may see the power the DMO has over the destination image: most visitors’ content reflects the DMO’s current image strategy, hence showing congruence.

The results also call for a broader concept of ‘congruence’. Van Rompay et al. (2010) use this concept in a static way, i.e., as the possibility of studying ‘one moment in time’ in a communication setting. However, things become less clear in the case of longitudinal congruence studies. Particularly when focusing on a destination’s evolution and shifts, congruence may be there at some point in time (as in the first half of the 2010s in Cartagena) but may lessen at other times, as in recent years. Therefore, congruence is also determined by the ‘time’ factor; it is not static but evolves within a given communication setting. Also, the more features or UGC types introduced in a study, the more exact the findings, but also the higher the likelihood of a low-congruence result. If the reader tries to unify ‘Locals’ and ‘Foreigners’ above into a single category of ‘People’ (as we did in a previous version of this article), the final results show a higher degree of congruence. Therefore, ‘congruence’ as such depends on the number of features considered as well as the number of UGC sources taken: Van Rompay et al. (2010) took pictures and text; we have added video, and this supplementary source brings incongruence to the final results. We therefore hypothesize that adding a fourth source (say, audio) may produce still more incongruence, but adding more UGC sources to such research remains a possibility for further study.

There are several limitations to the present work, notably the different dates of the three types of data collected, as stated above. The fact that video and text never overlap in dates might pose a limitation. Another issue is the origin of the visitors producing UGC: Cartagena’s foreign arrivals are some 10% to 12% of the total arrivals at the destination, so the vast majority of its visitors are nationals. Even the most optimistic estimates based on the recent years of the tourism boom in the city do not put the foreign visitors’ share above 15%. However, this is compensated by a far greater engagement of foreigners with social networks: for instance, in the case of text mentions, 60% were collected from nationals, while around 40% are from foreigners. Also, in the photo and video data sets, the presence of foreigners is much more significant than their share of arrivals at the destination. Apart from the fact that foreigners engage much more with social media than nationals, it also seems plausible that foreigners look for more “added-value” products than beaches, whereas nationals use Cartagena more as a relaxation destination. These facts might have introduced some biases into the data. However, note that our goal in this paper is not to study segments or sub-segments of visitors to the destination but to research the overall congruence of the place’s image. Combining this with the study of visitor segments would have made for a very cumbersome paper, so this is a possible future research avenue, focused more on consumer behavior than on destination image. The same goes for other side issues such as cross-posting: our massive-data approach disregards them, but future works based on qualitative methodologies (netnography, etc.) might take these matters into account and provide further insights.

As for the field work, it is worth noting that, in recent years, social media providers have shown a trend towards hiding relevant picture information (geolocation, date, hour…) and closing their APIs (Application Programming Interfaces). Users’ demographic information is increasingly difficult to come by, thus making segmented studies almost impossible. As providers seem more and more reluctant to share their information, working with social media in the future will be more cumbersome and harder than it was for this study.

In any case, the usefulness of cross-studies like this one is confirmed, as is the usefulness of longitudinal studies regarding these issues. Looking into multiple-source UGC may yield interesting views of the destination, which can benefit all the stakeholders. Of course, another way of gaining good insights into the goals of the present research is by studying other destinations (possibly similar ones) using the frameworks and methodologies we have applied here. According to Case Research Theory, such studies would enrich the conclusions put forth by this paper, or moderate them, as the case might be. This is a future research avenue.

Finally, given the current pandemic context affecting tourism worldwide, a thought-provoking piece of future research would be to check whether the post-COVID-19 UGC is similar to or differs from the pre-COVID-19 UGC. This would be a highly interesting contribution to the current discussion on COVID-19 and its impact on tourism, specifically on destination image. However, this will be possible only in the mid-term, once international tourism has resumed in Cartagena.