1 Introduction

The outbreak of the COVID-19 pandemic severely damaged the international tourism industry, but it can also be a transformational opportunity [1, 2]. As tourists’ travel behaviors change, destination marketing and management face new challenges. Some destination marketers are increasingly using digital marketing methods (e.g., short promotional videos on YouTube) to ensure that potential visitors will remember and visit the destination when tourism recovers [3]. For example, on the Douyin (Chinese version of TikTok) platform, the number of users interested in travel has exceeded 270 million [4]. In particular, the total content of short-form travel videos (STVs) has increased by 65%, and the consumption of STVs has increased by 117% compared with the period before the COVID-19 pandemic [4]. These studies and practices demonstrate the potential and importance of STVs in destination marketing digital transformation strategies during the COVID-19 pandemic.

Considering the above, this study constructs a research mechanism based on SOR theory by using a scenario-based experiment related to the Ganzi (甘孜) destination. It aims to develop a research model that is applicable to STVs content and to design a new survey and quality control method appropriate for STVs research. A further purpose is to explore the possibility of using STVs for cross-border promotion of rural tourism.

2 Literature Review and Hypothesis Development

2.1 STVs and Destination Marketing

The importance of STVs for destination marketing and tourism is increasing in the COVID-19 era. For example, Du et al. [5] used qualitative analysis methods (such as interviews) to verify the continuing growth of the influence of the short-form video platform TikTok in shaping destination image and changing tourists’ behavioral intentions. Although the role of STVs content in terms of entertainment, multi-sensory experiences, and re-experienced memories has been highlighted [5], a quantitative research model has not been constructed. In addition, it has been found that influencers and celebrities [6] who produce tourism content on short video platforms stimulate users’ travel interests and travel intentions. This provides evidence that short video marketing has an important impact on destination marketing. However, the impact of the unique attributes of short video content has been overlooked.

Moreover, although these studies argue that STVs or vlogs have an important impact on destination marketing, the concept of STVs is not clearly defined. It is not appropriate to define STVs only by a duration of 15 s to 60 s [5]. Additionally, the framework and factors of the research models were not adjusted according to the characteristics of STVs as visual stimuli. Therefore, the results may be affected by subjective cognitive and memory bias [7].

Among the few destination-specific studies, Cao et al. [8] emphasize that short videos are an effective tool for destination marketing and tourism marketing. In particular, the immersive experience of STVs content can positively impact destination brand attitude. However, the mechanism by which STVs affect destination image and tourists’ travel intentions has not yet been developed. Cao et al. [8] used only one user-generated short video for the survey, which was not representative. Furthermore, the influence of the platform and the users’ possible inherent attitudes toward the destination were also not fully controlled, which may have impacted the study results.

To clarify the impact of STVs on destination image and potential tourists’ travel intentions, STVs can be defined in terms of the following three points based on prior studies: (i) Cheng et al. [7] emphasized that natural, authentic, and unique elements of destination-related content are essential to the influence of STVs. Therefore, the inclusion of tourism elements unique to the destination (such as scenery, ethnic culture, and food) is defined as the first point; (ii) Depending on the scenarios of destinations and SNS platforms, STVs content needs to be integrated through post-editing (e.g., background music and special effects) [5, 7, 8]. Hence, multi-angle expressiveness (such as content presentation that incorporates background music and video post-editing) is defined as the second point; and (iii) Prior studies highlighted the importance of sociality [7, 8] and shorter duration [5] for the attraction of STVs. Therefore, suitability for short video platforms and a duration within 1 min are defined as the third point. Based on these key characteristics of STVs, this study introduces a framework to illustrate how STVs content stimulates the formation of tourists’ emotional resonance and cognitive resonance and how it influences destination image perception and travel decision-making processes.

2.2 SOR Model

The Stimulus-Organism-Response (S-O-R) model [9] originated in the field of environmental psychology. It explains that when an individual is exposed to environmental stimuli (S), the individual's emotional state is affected (O), which in turn affects the individual's response (R). Existing consumer behavior models, such as the technology acceptance model (TAM) and the theory of planned behavior (TPB), have limitations in explaining the emotional side of consumers, while SOR theory, as a complement, is better suited to research on users’ emotions [10]. Thus, it has been widely used in consumer marketing [11] and tourism research [12]. Recent literature has explored the use of SOR theory to examine the impact of STVs on users’ travel behaviors [10, 13]. However, mature research models and scales have not yet been developed for STVs content.

Therefore, the research model of this study is based on SOR theory, with reference to the four realms of an experience [14] and the theory of resonance in travel vlogs [7], and elaborates on the Organism and Response stages: (i) Organism. According to the characteristics of STVs, key factors related to emotional resonance and cognitive resonance are selected as the subjects of the Organism stage, and finally, the attitude towards STVs is formed to influence the Response. (ii) Response. In the context of tourism, the Response stage is divided into two parts: destination image and travel intention.

2.3 Emotional Resonance and Cognitive Resonance

Cheng et al. [7] emphasized that emotional resonance (such as inspiration) and cognitive resonance generated by users while viewing STVs can have an impact on their engagement behavior and travel intention. The research model and related factors were adjusted by considering the research context of STVs and the possible heuristic resonance [8, 15] from visual stimuli. According to Cheng et al. [7], emotional resonance is based on factors that arouse the audience's feelings, passions, and desires. When short video consumers watch STVs, self-reference and sense of presence are the main psychological feelings that may be evoked [8, 15]. These two feelings are very important for emotional resonance because they allow STVs consumers to escape from reality for a short period of time and create emotional involvement [7]. Thus, self-reference and sense of presence were included in the emotional resonance dimension. Self-reference is defined as the recall of one's past travel experiences inspired by the viewing experience of STVs. Sense of presence is defined as a psychological state in which the travel experience or destination in STVs is experienced, in a sensory or non-sensory way, as real.

Meanwhile, Klimmt and Vorderer [16] emphasized that the user's memory and familiarity with scenes and spatial associations can have an impact on the sense of presence created by media products. Moreover, since factors related to inspiration in STVs (e.g., narrative structure) influence the sense of presence [8], and self-reference belongs to the inspiration-related factors [15], this study hypothesizes that:

  • H1: Self-reference positively affects sense of presence.

Cognitive resonance is based on the object’s attraction to audiences’ values, beliefs, and understandings [7]. According to Cheng et al. [7], aesthetic fatigue may weaken the impact of cognitive resonance. Furthermore, Cheng et al. [7] also demonstrated the importance of source credibility. Considering the research context and the user’s perspective of STVs, this study argues that the value of STVs content is also reflected in the entertainment experience [16, 17]. Therefore, aesthetic experience [14, 15], credibility, and entertainment experience were included in the cognitive resonance dimension of STVs. Perceived esthetics is defined as users’ aesthetic perceptions of the overall elements of STVs content; video credibility is defined as users’ demonstrated trust in STVs content; and perceived entertainment is defined as users’ perceptions of entertainment-related elements in STVs content.

According to the findings of prior studies, cognitive experiences related to entertainment and aesthetics are closely linked to immersion [14], and the effect of immersive video on perceived entertainment and credibility is mediated by the sense of presence [17]. Therefore, the sense of presence stimulated by STVs content may have an impact on perceived entertainment, video credibility, and perceived esthetics. Meanwhile, the sense of presence may become a key mediating factor connecting emotional resonance and cognitive resonance. Based on the above, the following hypothesis is proposed:

  • H2: Sense of presence positively affects (a) perceived esthetics, (b) video credibility, and (c) perceived entertainment.

2.4 Attitude Towards STVs and Response

Attitude towards STVs is the final factor formed at the organism stage. According to SOR theory, it is the key factor that influences the user's subsequent response.

The effects of the three cognitive resonance factors (perceived esthetics, video credibility, and perceived entertainment) on users’ attitudes have been examined in many prior studies. For example, users’ attitudes towards videos on YouTube are influenced by the credibility of the video content [18], the perceived esthetics of travel-related content has a direct impact on users’ attitudes [15], and entertainment content on video platforms has a significant impact on users’ attitudes [19]. Therefore, the following hypotheses are proposed:

  • H3: Perceived esthetics affects attitude towards STVs.

  • H4: Video credibility affects attitude towards STVs.

  • H5: Perceived entertainment affects attitude towards STVs.

In the context of tourism, the response stage was divided into two parts: destination image and travel intention. Destination image is generally defined as an individual's overall impression of a place [20], and many studies have shown that destination image has an impact on users’ travel intentions [e.g., 21]. In this study, destination image is defined as the overall perception of the destination shown in the STVs after viewing them.

According to prior studies, users’ attitudes towards destination marketing campaigns or content can impact destination image and travel intention [22]. Therefore, as a potential destination marketing tool or content, STVs may impact the destination image and travel intention of potential tourists. Research on the impact of STVs marketing on destination image and travel intention is currently increasing [10, 13]. Additionally, the perceived risk associated with the COVID-19 pandemic has an increasing impact on users’ travel intentions. Rather [23] verified that perceived risk during an epidemic may moderate users’ travel intentions. Therefore, it can be hypothesized that:

  • H6: Attitude towards STVs positively affects (a) destination image and (b) travel intention.

  • H7: Destination image positively affects travel intention.

  • H8: Perceived risk of traveling during COVID-19 moderates (a) the relation between destination image and travel intention and (b) the relation between attitude towards STVs and travel intention.

Based on the above, Fig. 1 shows the hypothesis model of this study.

Fig. 1. Research model.

3 Methodology

3.1 Research Design

This study was conducted in Japan, and the survey respondents were Japanese people who had travel experience in the last three years (2019–2022) but had not been to the Ganzi destination. The research design was divided into two main parts. The first was the selection of the destination and respondents. The destination was set as Ganzi Tibetan Autonomous Prefecture, located in Sichuan Province, China. Ganzi is currently not well known to international tourists, which provides good conditions for research design and exploration. Meanwhile, although Ganzi was once among the least known and poorest regions in China, during the COVID-19 pandemic it was able to lift itself out of poverty by promoting the local tourism industry through STVs [24]. Therefore, using Ganzi as a destination is meaningful for solving the problem of cross-border tourism promotion in regions (especially rural areas) during the COVID-19 pandemic. The respondents of this study were Japanese because Japan is one of the world’s top outbound tourist source nations [25]. The second part was the selection of sample videos and research methods (e.g., quality control methods), which were improved to exclude the influence of platform attributes and users’ inherent perceptions.

3.2 Data Collection and Quality Control

In August 2022, two questionnaires were distributed with the help of the local research firm FREEASY in Japan. The first was a screening questionnaire to select the main questionnaire respondents. The second was the main questionnaire with a scenario-based experiment. In addition to the measurement items, respondents were provided with five sample STVs and a question related to the content of the videos. Considering that this study was a scenario-based online experiment, many quality control methods were used in the research design and survey implementation.

Stage 1 (screening questionnaire) was as follows. To accurately exclude non-target respondents, the respondents were not asked directly whether they had been to Ganzi, but several destinations were mixed in the options from which the respondents could choose. In addition, respondents were asked about the frequency of their use of short videos and their travel during COVID-19. As a result, 792 respondents were identified as suitable for the main questionnaire, from which 600 responses were collected.

Stage 2 (selection and processing of sample STVs) was as follows: (i) To balance the influence of the attributes of STVs authors, user-generated STVs and professional-generated STVs were selected according to the characteristics of the destination image, not based on their popularity on the short video platform. (ii) To avoid the influence of users’ inherent perception of China, only 2 sample videos were slightly adjusted. Meanwhile, to ensure the viewing experience and accurately test the impact of the original STVs content without foreign-language interference, some of the video content (location tags in Chinese) and background music containing Chinese were post-edited, blurred, or converted into Japanese. (iii) To avoid the influence of platform attributes and video resolution, proprietary video links were used, and video resolution was controlled to be consistent during playback.

Stage 3 (the main questionnaire) was as follows: (i) Respondents had to click on the links of the five sample STVs and watch all videos completely (the system monitored this automatically). In addition, only the respondents who correctly answered the question related to the sample STVs content could continue to fill out the questionnaire. (ii) Each ID account could submit only once. (iii) Reverse-worded (RW) items were used (e.g., the items of “video credibility” in Table 1). (iv) The standard deviation of all responses had to be greater than 0.5.

After stages 2 and 3, 456 valid responses were obtained from the 600 responses.
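The Stage 3 checks can be illustrated with a minimal Python sketch. It is not the survey firm's actual procedure: the function names, the toy responses, and the choice of the population standard deviation computed on raw item scores are all assumptions, since the text only states that the standard deviation of all responses had to exceed 0.5.

```python
from statistics import pstdev

SCALE_MAX = 7  # seven-point Likert scale used in the questionnaire


def reverse_code(score, scale_max=SCALE_MAX):
    """Map a reverse-worded item back to the regular direction (1<->7, 2<->6, ...)."""
    return scale_max + 1 - score


def passes_quality_checks(answers, watched_all, quiz_correct, min_sd=0.5):
    """Return True if one response survives the Stage 3 checks.

    answers      -- raw item scores (1..7) of one respondent
    watched_all  -- True if all five sample videos were played completely
    quiz_correct -- True if the video-content question was answered correctly
    """
    if not (watched_all and quiz_correct):
        return False
    # Straight-lining check: near-identical answers give a very small SD.
    # Assumption: population SD on raw scores; the paper does not specify this.
    return pstdev(answers) > min_sd


# Hypothetical toy responses, not the survey data.
responses = {
    "r1": ([4, 4, 4, 4, 4, 4], True, True),   # straight-liner, SD = 0
    "r2": ([6, 5, 7, 2, 6, 5], True, True),   # varied answers
    "r3": ([6, 5, 7, 2, 6, 5], True, False),  # failed the content question
}
valid = [rid for rid, args in responses.items() if passes_quality_checks(*args)]
print(valid)
```

Reverse-worded items would be recoded with `reverse_code` before scale scores are computed; whether the standard deviation rule was applied before or after recoding is not stated in the text.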

3.3 Measurement and Analysis

Measurement.

Measurement items were developed from existing literature using a seven-point Likert scale (1 = strongly disagree; 7 = strongly agree). The scales used in this study were those that have been validated in prior literature and modified specifically to fit the research context of STVs. Meanwhile, because the original questionnaire items were in English, all items (Japanese version) were confirmed by several scholars proficient in English and Japanese (including native speakers) to ensure the accuracy of the translation. Items measuring self-reference and perceived esthetics were adapted from Hsiao et al. [15], whereas items for sense of presence were derived from Cao et al. [8]. The measurement items for video credibility were adapted from Li [13]. The items for perceived entertainment were developed from Cheng et al. [7] and Chen et al. [19], while items for attitude towards the short-form travel videos were derived from Xiao et al. [18]. The measurement items for destination image and travel intention were adapted from Ong et al. [22]. Furthermore, items measuring the perceived risk of traveling during COVID-19 were adapted from Rather [23].

Analysis.

Descriptive analysis was performed using SPSS software. Subsequently, the measurement and structural models were examined using partial least squares structural equation modeling (PLS-SEM). PLS-SEM is appropriate for predictive studies and has the flexibility to handle complex models, small sample sizes, and non-normal data [26]. PLS-SEM was selected because of the predictive nature of this study and the complexity of the research model. The measurement and structural models were evaluated via SmartPLS, using the PLS algorithm and bootstrapping (5000 subsamples).
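The resampling idea behind bootstrapping can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it resamples a single correlation between two synthetic score vectors and reports a percentile interval, whereas SmartPLS re-estimates the full PLS path model on each subsample. The data, the function names, and the seed values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; equals the standardized slope with one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the correlation between x and y."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        # Resample respondents with replacement, keeping (x, y) pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Synthetic, made-up scores for two constructs (not the survey data).
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(60)]
y = [xi + data_rng.gauss(0, 0.5) for xi in x]

lower, upper = bootstrap_ci(x, y)
print(f"r = {pearson(x, y):.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

A path would be read as significant in this simplified setting when the interval excludes zero. The same logic applies to each path coefficient when the whole model is re-estimated per subsample.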

4 Results

4.1 Descriptive Statistics and Measurement Model

After 15 days of data collection, 456 valid responses were obtained from 600 Japanese people who had travel experience in the last three years (2019–2022) but had not been to Ganzi.

The respondents included 220 males and 236 females. By age, 15.1% were 15–19, 13.2% were 20–29, 15.1% were 30–39, 18.2% were 40–49, 19.3% were 50–59, and 19.1% were 60–70.

The measurement model was evaluated in terms of internal consistency reliability, convergent validity, and discriminant validity of the constructs. As presented in Table 1, one indicator loading (0.670) fell below the threshold of 0.70 but was retained [27] for theoretical reasons. All other indicator loadings were between 0.748 and 0.932, above the threshold of 0.70 [26]. Cronbach’s α and composite reliability (CR) exceeded 0.70, indicating sufficient construct reliability [26]. All values of average variance extracted (AVE) were between 0.634 and 0.817, surpassing the stipulated threshold of 0.50 [28] and showing a satisfactory degree of convergent validity. Moreover, the square root of AVE for each construct was higher than the correlations between the constructs [28]. Based on the above, the measurement model satisfied internal consistency reliability, convergent validity, and discriminant validity.
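For readers unfamiliar with these criteria, the following Python sketch shows how AVE, CR, and the square-root-of-AVE comparison are commonly computed from standardized loadings. The loadings and the construct correlation are hypothetical illustration values, not those reported in Table 1, and the function names are assumptions.

```python
import math


def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings) ** 2
    error = sum(1 - l ** 2 for l in loadings)
    return total / (total + error)


def fornell_larcker_ok(ave_a, ave_b, corr_ab):
    """Discriminant validity holds if both sqrt(AVE) values exceed |correlation|."""
    return min(math.sqrt(ave_a), math.sqrt(ave_b)) > abs(corr_ab)


# Hypothetical loadings for two constructs and their correlation.
loadings_a = [0.70, 0.80, 0.90]
loadings_b = [0.75, 0.85, 0.88]
corr_ab = 0.60

print(f"AVE(A) = {ave(loadings_a):.3f}")
print(f"CR(A)  = {composite_reliability(loadings_a):.3f}")
print("Discriminant validity:",
      fornell_larcker_ok(ave(loadings_a), ave(loadings_b), corr_ab))
```

With these example values, construct A clears the 0.50 AVE threshold and the 0.70 CR threshold, and the pair passes the square-root-of-AVE comparison.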

Table 1. Measurement model for constructs.

4.2 Structural Model and Hypothesis Testing

The structural model was evaluated using a series of statistical indices based on PLS estimation. The standardized root mean square residual (SRMR) was 0.070, lower than the stipulated criterion of 0.08, indicating a good model fit [29]. The inner and outer values of the variance inflation factor (VIF) were between 1 and 4.309, below the threshold of 5, indicating no multicollinearity problem [26]. The effect sizes (f²) of most paths exceeded 0.15, indicating medium-to-large effects [30]. The proposed research model showed moderate explanatory power (R²: sense of presence [SP] = 0.441, perceived esthetics [PES] = 0.293, video credibility [VC] = 0.149, perceived entertainment [PEN] = 0.371, attitude towards STVs [AT] = 0.596, destination image [DI] = 0.633, and travel intention [TI] = 0.312) [30]. Moreover, the results of the blindfolding test showed that all values of Stone-Geisser’s Q² for endogenous constructs (0.456–0.675) exceeded the minimum requirement of zero, implying good predictive relevance [30]. The results of the hypothesis testing are presented in Table 2. Eleven of the 12 hypothesis paths were significant and supported. The mediating effects of five factors (SP, PES, PEN, AT, and DI) were significant and supported. In addition, the effects of the three control variables (frequency of viewing short videos, gender, and age) on AT were not significant.
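The two indices that rely on R² can be made concrete with a short Python sketch. It uses the standard textbook definitions, f² = (R²_included − R²_excluded) / (1 − R²_included) and VIF = 1 / (1 − R²_j), with the conventional 0.02 / 0.15 / 0.35 cut-offs. The "excluded" R² of 0.50 is a hypothetical value chosen for illustration; only the R² of 0.596 for AT and the maximum VIF of 4.309 come from the text.

```python
def f_squared(r2_included, r2_excluded):
    """Effect size of one predictor: change in R² relative to unexplained variance."""
    return (r2_included - r2_excluded) / (1 - r2_included)


def effect_label(f2):
    """Conventional cut-offs: 0.02 small, 0.15 medium, 0.35 large."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"


def vif(r2_j):
    """Variance inflation factor from the R² of predictor j regressed on the others."""
    return 1 / (1 - r2_j)


# R² of AT is reported as 0.596; the value without one predictor is hypothetical.
f2 = f_squared(0.596, 0.50)
print(f"f² = {f2:.3f} ({effect_label(f2)})")

# A VIF of 4.309 corresponds to an auxiliary R² of about 0.768.
implied_r2 = 1 - 1 / 4.309
print(f"implied auxiliary R² = {implied_r2:.3f}, VIF = {vif(implied_r2):.3f}")
```

The second print shows why a VIF just under 5 is still treated as acceptable: the other predictors explain roughly 77% of that indicator's variance, short of the 80% that a VIF of 5 would imply.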

Table 2. Results of hypothesis testing.

5 Discussion

The findings suggest that the influence mechanism (organism and response) of STVs content (stimulus) on users’ travel-related behavioral intentions can be divided into three main phases.

Emotional resonance is the first phase of the organism stage and embodies the process of evoking the user's emotional resonance in a short period of time. It is also the most important phase, because it motivates users to continue watching STVs content and determines whether cognitive resonance and travel behavioral intention can subsequently be evoked. This is consistent with the fast-paced, fragmented nature of short videos. Users complete the initial reception of STVs content in this phase. In addition, self-reference (SR) is the key factor in stimulating users’ emotional resonance and immersion experience, while SP is the key mediating factor connecting emotional resonance and cognitive resonance.

Cognitive resonance is the second phase of the organism stage. PES, VC, and PEN jointly build the cognitive resonance framework and work together to influence AT. PES and PEN have a mediating effect, and in this phase, users complete the deep reception and reprocessing of STVs content.

In the third phase (response), AT is the final output factor that affects users’ travel intentions and destination image. Meanwhile, although perceived risk (PR) has a moderating effect on the path from DI to TI, it does not affect the path from AT to TI. This finding reconfirms the potential effectiveness of STVs marketing during the COVID-19 pandemic [5].

6 Conclusions

6.1 Theoretical and Practical Implications

Theoretical Implications.

Building on prior studies, and to exclude interfering factors that may affect the results, this study attempted to refine the research design and methods for STVs by determining the selection criteria for sample videos, conducting a scenario-based online experiment, and developing possible quality control methods. Furthermore, this study attempts to provide a convenient and reliable research method for researchers working on STVs and SNS content, and contributes a feasible foundational theoretical framework for future research related to short video content.

Practical Implications.

These findings highlight the potential of STVs as important tools for destination marketing and cross-border tourism promotion during the COVID-19 pandemic. First, according to the findings, to engage users in the first moment of viewing STVs content, it is important to inject more emotional resonance into STVs content to quickly create a connection with users. For example, destination marketers can produce more STVs from a first-person perspective or actively use user-generated STVs that show a real travel experience. Through this type of content production, users’ travel-related memories can be better evoked, thereby enhancing the impact of SR. In addition, video content can be processed through technical means, such as post-editing and video effects, allowing users to gain a stronger sense of immersion and enhancing the impact of SP. Meanwhile, destination marketers must focus on improving the aesthetics, entertainment, and credibility of STVs content when producing STVs. These factors related to cognitive resonance can lead to the formation of positive attitudes towards STVs content, thus creating a good destination image and inspiring users’ travel intentions. The above findings provide useful references for regions and countries that use STVs to design and implement cross-border tourism promotion. Second, taking Ganzi as an example, this study attempted to demonstrate the possibility of using STVs for international promotion and destination image building in impoverished regions.

6.2 Limitations and Future Studies

This study has some limitations because the research model and survey methods related to STVs are still in the basic stage, and the reusability of the methodologies needs to be further explored or improved in the future.

For future research, it is necessary to conduct a larger experiment with different control groups (e.g., different destinations and respondents from various countries), because this study was conducted with only 1 sample group of 5 STVs due to objective conditions. Also, to determine whether STVs indeed have a significant impact, their impact needs to be compared with that of other media (e.g., images or text) against a confirmed baseline or ground truth. Meanwhile, as the practice and theory related to STVs are still in the development stage, more relevant influencing factors, such as video quality, narrative transportation, and perceived interactivity, may need to be identified to extend the research model and theoretical framework. In particular, with the easing of COVID-19 restrictions, the impact of STVs should be further explored through offline surveys.