1 Introduction

Over the past decades, the global tourism industry has experienced major losses and damages caused by various unfortunate events, including natural disasters, epidemic crises, and man-made disasters [18]. Currently, the industry is in dire straits due to the COVID-19 outbreak, which was declared a pandemic by the World Health Organization (WHO) on March 12, 2020 [8, 26]. Governments at regional and national levels across the world have announced and implemented policies, such as travel bans, community closures, stay-at-home orders, voluntary or mandatory quarantines, and business-specific retrenchments [7], to combat the negative effects of COVID-19. In Japan, a state of emergency was declared on April 7, 2020, in seven prefectures, requesting people to refrain from going out, and its scope was expanded to the whole country on April 16 [1]. A state of emergency empowers governors in affected regions to call for restrictions on movement and commerce while offering minimal means of enforcement [22].

1.1 Current State of Japan’s Tourism Industry

First, we present the current condition of the tourism industry in Japan, comparing the situation before and after the outbreak of the COVID-19 pandemic. The annual number of hotels and guests in 2020 decreased by 48.9% compared to the previous year. The rapid spread of this infectious disease has prompted people to rethink and change their lifestyles in several ways [27]. As a result, online tourism emerged as a new form of e-tourism, which involves the use of information and communication technology (ICT) to experience virtual tourism. Furthermore, the number of monthly Japanese departures has consistently declined by 98.1%–99.8% year-on-year since April 2020. In response, the Japanese travel agency HIS, which handles the second largest number of overseas travel transactions in Japan and derives a high percentage of its business from overseas travel, also started offering online travel experiences. Among them, the “Online Tour,” in which a host guide visits the site and communicates with participants in real time over a web conference system (real-time communication, RTC), is considered to be a close substitute for group tours. HIS was established in 1980, and as a venture company in Japan’s tourism industry, it has repeatedly undertaken novel attempts, such as providing low-cost airline tickets, establishing a hotel business in Australia, and founding an airline company, Skymark Airlines, in 1996. This research is based on paid online tours, which have already attracted more than 100,000 participants [9].

2 Theoretical Background

2.1 What is “Online Tour”?

There are three types of online tour platforms that can be found in Japan:

  1. Video streaming sites, such as YouTube.

  2. Web conferencing applications, such as Zoom.

  3. Original websites.

The concept of e-tourism is presented as a bundle consisting of three distinctive areas: business management, information systems, and tourism [4]. An analysis [13] that aimed to develop and present a conceptual framework of the e-tourism system, based on the factors and conditions in the emergence and development of e-tourism, found that some authors include “electronic excursions, also called virtual, as well as electronic delivery of tourism services” in the concept of “e-Tourism” [15]. The “online tour” we discuss in this paper is therefore considered to be a type of e-tourism because it constitutes a digital excursion with interactive communication, conducted by travel agents as business management (Table 1).

Table 1. Specific examples of online tours using web conferencing systems

Changes in lifestyle have also been observed due to the lack of physical travel. In particular, e-tourism, which does not involve going to the site, has attracted attention as an alternative to physical travel. However, the online tours currently available in Japan are considered to provide intellectual and/or emotional access to cultural properties (tourism objects) under the International Cultural Tourism Charter [10], but they do not satisfy the urge for physical access. Therefore, they are not considered to be a complete substitute for travel.

Compared to existing modes of travel, online tours are considered to have the following three advantages:

  • No travel costs (time and money).

  • All participants can sightsee from the same point of view.

  • Fewer restrictions on participants (Fig. 1).

Fig. 1. Typical picture of online tours

2.2 Research Purpose

Due to the COVID-19 pandemic and restrictions on movement, travelers are looking for new ways to travel to relieve boredom and anxiety [21]. We focused on new ways of traveling for those who were unable or unwilling to go on actual trips due to COVID-19. Understanding tourists’ experiences and revealing their perceptions based on user-generated content (UGC) can be useful [28]. For this purpose, we collected voluntary non-paid customer reviews available on the HIS website as comments on each online tour offered by HIS. To analyze the huge amount of data, text mining [17, 20] was chosen as our research method to help establish trends and patterns on specific topics. In addition, LDA, a topic model method, has been proposed for the analysis of UGC that uses unstructured data, such as reviews, in marketing research [2]. The data mining approach, including text mining, has a major weakness: the temporal distribution of individual sequences is lost [25]. Nevertheless, by looking at the ranking of topics estimated from multiple corpora separated by attributes, readers can grasp the common characteristic topics of online tours.

The investigation adopted HIS’ online tour as a case study and analyzed it using text mining and topic models. The current status of e-tourism using online conference systems in Japan is explored, and the possibility of the online tour being a new travel style in the future is discussed.

3 Research Methodology

3.1 Data Acquisition

In this study, we retrieved data from the HIS website three times. The HIS website was chosen because it is one of the largest online tour providers and the only website which currently has reviews from participants with their attributes. The data acquisition is summarized in Table 2. The first set of data was obtained on April 17, 2021, and processed, while the second set of data was obtained on April 25, 2021, and used for the analysis of the target locations, using the URL of the processed data. The third set of data was acquired on July 13, 2021, assuming that topic modeling using LDA would be performed.

Table 2. Summary of data acquisition

After the 1st data acquisition, the following steps were performed on the acquired online tour titles and the contents of each page.

  (i) Because this study focused on new travel methods, knowledge-based webinars (seminars), lessons, fortune-telling, shopping, and English conversations were excluded from the analysis since they were not centered on physical sightseeing.

  (ii) Domestic travel was excluded because overseas travel was more restricted than domestic travel, and the residence of the participants was unknown.

  (iii) For the remaining 516 cases, we checked the titles and contents of the websites, added the country names to the data, and assigned region names to the country names by referring to the destinations data in the International Airline Passenger Survey [11] by the Japanese government.

Table 3. Breakdown of data

In Table 3, 25% of the respondents were male, and 75% were female. In terms of age, the largest group of respondents was in their 40s (33%); 79% were in their 30s to 50s (middle generation), 9% were in their 20s or younger, and 12% were in their 60s or older. Thus, middle-aged women are the most likely to participate in online tours and write reviews.

The breakdown of gender and age was almost the same as that in the second data acquisition. In terms of the evaluation scores (score5 is the best and score1 is the worst), score5 was the most frequent, accounting for 78%; score4 and score5 together accounted for 95%, while score1 and score2 together accounted for only 1.5%. Therefore, the reviews on the online tours were generally favorable. The largest number of participants (48%) were solo, followed by family (41%), friends, and dates (9%). Therefore, the shares of solo participants and group participants are approximately the same, and group participants are most often families.

3.2 Target Locations for Online Tours

Compared to physical trips, online tours cost less in terms of time and money and impose fewer restrictions on participants. Accordingly, it was anticipated that there would be a high demand for remote destinations, unlike physical trips, for which demand is high for nearby destinations. Therefore, the following hypotheses were tested:

  • Hypothesis 1: Actual destinations and online tour destinations tend to be different.

  • Hypothesis 2: The number of tours and that of word-of-mouth comments will increase in distant areas where the actual travel costs are higher.

The two hypotheses were tested by comparing the implementation of online tours (number of online tours, reviews, tours with multiple reviews) with the number of Japanese departures, distance, and time difference (Fig. 3).

Comparison with the Number of Departures

In Table 4, the number of departures and the number of online tours are shown based on the destinations of the International Airline Passenger Survey [11].

Table 4. Number of departures and online tour implementation

First, we considered testing the ratio of the populations with the χ-square test. However, Pearson’s χ-square test could not be performed because 0 also has a meaning in our analysis. Next, normality was checked using the Shapiro-Wilk test, and it could not be confirmed for all the data. Therefore, the Wilcoxon signed rank test, a nonparametric test of the difference in the representative values (medians) of two paired groups, was conducted to check whether the representative values of the data differed. Because the magnitudes of the online tour counts and the numbers of departures differed greatly, the test was performed after rescaling the values so that their totals were the same.
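The rescaling and ranking steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the study: the counts are hypothetical (they are not the values in Table 4), the function names are our own, and only the signed-rank sums are computed, without a p-value.

```python
from typing import List, Tuple


def rescale_to_total(values: List[float], target_total: float) -> List[float]:
    """Scale values proportionally so that they sum to target_total."""
    current_total = sum(values)
    return [v * target_total / current_total for v in values]


def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def signed_rank_sums(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (W+, W-) for paired samples, discarding zero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical counts for four destinations (NOT the values in Table 4).
departures = [500.0, 300.0, 150.0, 50.0]
online_tours = [40.0, 35.0, 20.0, 5.0]

# Bring the online tour counts onto the same total as the departures.
scaled_tours = rescale_to_total(online_tours, sum(departures))
w_plus, w_minus = signed_rank_sums(departures, scaled_tours)
print(scaled_tours, w_plus, w_minus)
```

The smaller of the two rank sums is normally used as the test statistic. In practice, a library routine such as `scipy.stats.wilcoxon` would also supply the p-value; the exact comparison of floating-point values used here for ties is a simplification.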

Result

Table 5 shows that there is no difference in the representative values between the number of departures by destination and the data on the implementation status of online tours.

Therefore, hypothesis 1 was rejected.

Table 5. Summary of test results using Table 4

Comparison with Distance and Time Difference

A test of zero correlation was conducted to see whether the number of online tours, the number of reviews, and the number of online tours with multiple reviews were related to the distance and time difference between the capital city of each country and Tokyo. The Pearson correlation coefficient was calculated.
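As a rough illustration of this calculation, the sketch below computes the Pearson correlation coefficient and the t statistic commonly used to test the null hypothesis of zero correlation, t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The distances and tour counts are hypothetical and the function names are our own; the study's actual data are summarized in Table 6.

```python
import math
from typing import List


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def zero_correlation_t(r: float, n: int) -> float:
    """t statistic for H0: correlation = 0 (undefined when |r| = 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical data: distance from Tokyo (km) and number of online tours.
distance_km = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
num_tours = [5.0, 3.0, 4.0, 1.0, 2.0]

r = pearson_r(distance_km, num_tours)
t = zero_correlation_t(r, len(distance_km))
print(round(r, 3), round(t, 3))
```

The t statistic would then be compared with the t distribution with n − 2 degrees of freedom to decide whether the correlation differs from zero.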

Result

Table 6 shows that there is a weak negative correlation between the linear distance between the capitals and the number of online tours. There is also a weak negative correlation between the time difference and the number of online tours.

Table 6. Correlation between distance/time difference and online tour implementation
Fig. 2. Scatter plot of distance from Tokyo and number of online tours

Fig. 3. Plot of number of online tours by time difference

3.3 Topic Model

To clarify the impressions and evaluations of the online tour participants, reviews of the online tour were analyzed. The researchers applied a topic model, a probabilistic language model that expresses the process of word generation probabilistically, assuming that each document in a document set is generated based on latent topics. LDA is a topic model method proposed by Blei et al. [3]. The model assumes that the distribution of topics in each document and the distribution of words in each topic are generated by the Dirichlet distribution. In this study, the LDA model is used as a data-mining method for online tour reviews. In LDA, the input vector of words is usually the bag-of-words, which is a word occurrence matrix that does not consider lexical relations in the document. In this study, we weighted the bag-of-words by the TF-IDF value, which is an index that considers the frequency of occurrence and rarity of words, to improve the accuracy. The LDA module of Gensim, a Python machine learning library, was used for the analysis. The number of reviews to be analyzed was 2904, excluding webinars, lessons, fortune-telling, shopping, and English conversations. The reviews were subdivided by rating, gender, age, and type of use, and LDA was used to estimate topic models from each corpus.
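The TF-IDF weighting of a bag-of-words can be sketched as below. This is a plain-Python illustration of one common variant (term count multiplied by the natural logarithm of the inverse document frequency), not the study's pipeline; Gensim's own weighting may differ in logarithm base and normalization, and the tokenized reviews are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_weights(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each document's bag-of-words by tf * ln(N / df)."""
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document
    weighted = []
    for tokens in docs:
        term_freq = Counter(tokens)  # the plain bag-of-words counts
        weighted.append(
            {w: c * math.log(n_docs / doc_freq[w]) for w, c in term_freq.items()}
        )
    return weighted


# Hypothetical tokenized reviews (already split into words).
reviews = [
    ["guide", "fun", "guide"],
    ["guide", "video"],
    ["fun", "travel"],
]

weights = tfidf_weights(reviews)
for doc_weights in weights:
    print(doc_weights)
```

Words that occur in many reviews, such as "guide" in this toy corpus, receive a smaller weight per occurrence than rare words such as "video", which is the effect the weighting is meant to achieve before topic estimation.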

In LDA, the analyst needs to set the number of topics in advance. Two metrics, perplexity and coherence, were used to determine the number of topics. Perplexity is a measure of the generalization performance of a model and is obtained by normalizing the predicted likelihood of a set of words in a trained model. While perplexity has been used to evaluate many topic models, it has been pointed out that even models with excellent perplexity do not necessarily have high interpretability, and that perplexity may not agree with human evaluation. For this reason, coherence has been proposed as an evaluation index to measure whether the extracted topics are easy to understand [5]. Because “ease of interpretation from the human point of view” is an ambiguous criterion, many coherence calculation methods have been proposed to improve calculation efficiency and accuracy. In this study, the authors adopted c_v [19], which is reported to have the best accuracy among the coherence calculation methods. Figure 4 shows the relationship between perplexity and coherence in Score5. The lower the value of perplexity, the better the prediction performance of the model. Therefore, we chose the number of topics that gave low perplexity and high coherence [16]; Fig. 4 illustrates this choice for the score5 example.
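To make the definition of perplexity concrete, the sketch below uses the standard form: the exponential of the negative log-likelihood per word on held-out documents. The function name and the toy inputs are our own assumptions; the study obtained its values from Gensim, whose reporting conventions may differ.

```python
import math
from typing import List


def perplexity(doc_log_likelihoods: List[float], doc_lengths: List[int]) -> float:
    """exp(-(sum of document log-likelihoods) / (total number of words))."""
    total_log_likelihood = sum(doc_log_likelihoods)
    total_words = sum(doc_lengths)
    return math.exp(-total_log_likelihood / total_words)


# Toy check: a model that assigns probability 1/4 to every word
# (a uniform guess over a four-word vocabulary) has perplexity 4.
lengths = [3, 2]
log_likelihoods = [n * math.log(0.25) for n in lengths]
uniform_perplexity = perplexity(log_likelihoods, lengths)
print(uniform_perplexity)
```

Read this way, perplexity is the effective number of equally likely words the model is choosing among, which is why lower values indicate better predictive performance.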

Fig. 4. Coherence and perplexity for the score5 example

Result

The number of topics in each corpus is summarized in Table 7 based on the relationship between perplexity and coherence. The total number of topics covered in this study was 79, out of which three were uninterpretable. A total of 23 different topics were estimated. In Table 8, the most common topic was guide, followed by explanation/question-and-answer, enjoyment, real travel, and telepresence.

Table 7. Number of topics per corpus
Table 8. Characteristic topics

4 Conclusion

4.1 General Discussion

The weak negative correlations of the number of online tours with distance and time difference suggest that the implementation of online tours is consistent with the first law of geography: “everything is related to everything else, but near things are more related than distant things” [22]. This law translates into the concept of “distance decay,” where demand peaks near the source and decreases with increasing distance [14]. Both the linear distance between the capitals and the time difference showed weak negative correlations with the number of online tours, and the correlation was stronger for the time difference than for the distance. Therefore, the time difference was considered more important than the linear distance for online tours using RTC technology. The target destinations of online tours did not show a different trend from those of real trips. However, in the topic model, five topics related to world tours, including multiple countries, were identified. This might suggest that a reduction in travel costs is also important.

According to the topic model, the guide was considered to be the most important component of the online tour because this topic was the most frequently mentioned. As a study on the development of an English tour guide project in the context of cultural tourism in Taiwan [24] pointed out for tourism in general, the quality of the guide is important for the success of online tours as well.

In terms of the impact of VR on impulsive desire to visit a destination, higher telepresence reduced participants’ virtual distance by 45%, enhanced affection by 62%, and increased impulsive desire by 75% through emotional processes [12]. Therefore, although the online tour is not VR, it can be expected to increase users’ impulsive desire for the destination if it provides telepresence.

The topics of communication environment, filming, and YouTube were seen in Score3, and video/image quality and communication environment were seen in Score2&1. In the online tour, the image quality was lower than that of YouTube or TV because of the specifications of the application. In addition, because the tour relies on real-time communication, the image quality changes depending on the communication conditions and the performance of the device; therefore, the image quality may not meet the participants’ expectations.

4.2 Contribution, Limitation and Recommendations for Future Research

As online tours are a relatively new business model, they are constantly changing. This study analyzed only one company and dealt only with the information available on its website at the time of data acquisition. In addition, under the influence of COVID-19, online tours are not always prepared to cater to the expectations of travelers. For example, on the day of the first data acquisition, the lockdown was still in effect in Germany [6], and there were no online tours available. Whether online tours will be truly established in the tourism industry as a new travel style and marketing tool for tourists can only be determined after the pandemic is over. To keep track of the implementation of online tours in the future, research on e-tourism will be important, because it will record the new marketing and profit-making business in the tourism industry, just as guidebooks and social media marketing appeared in the past.

The biggest challenge of this study was the discrepancy between the characteristics of the Japanese language and the method of analysis. Morphological analysis divides the negative phrase “not good” in a review into “good” and “not.” As a result, the negative expression “not good” is counted as a positive word “good” and a negative word “not.” Consequently, the words “good” and “fun” appeared in score2&1 as words with high probability. In addition to these words, the word “no” also appeared as a high-probability word. Having read and confirmed the reviews in advance, the researcher could tell that the true messages of the comments were “the signal is not good,” “the picture quality is not good,” or “[the travel was] not fun.” Therefore, analyses of this kind should account for negated expressions such as “not good” and “not bad” in advance.
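One possible way to handle such negations before topic estimation is sketched below. It is a hypothetical preprocessing step, not something performed in this study: it assumes that the morphological analyzer has already produced a token list, that the negation token follows the word it negates (as in Japanese word order), and that the set of negation tokens shown here is adequate.

```python
from typing import List

# Assumed negation tokens; a real analysis would need a vetted list.
NEGATION_TOKENS = {"not", "ない"}


def merge_negations(tokens: List[str]) -> List[str]:
    """Attach a negation token to the preceding word, e.g. good + not -> not_good."""
    merged: List[str] = []
    for token in tokens:
        if token in NEGATION_TOKENS and merged:
            merged[-1] = "not_" + merged[-1]
        else:
            merged.append(token)
    return merged


# Hypothetical analyzer output for "the signal is not good".
example_tokens = ["signal", "good", "not"]
merged_tokens = merge_negations(example_tokens)
print(merged_tokens)
```

With this kind of merging, "not_good" is counted as its own word, so a low-rated review would no longer contribute to the probability of "good" in the estimated topics.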

Fortune-telling and shopping (Live Commerce), which were excluded from the research, are expected to become established as online tourism experiences in the future. The evaluation of the differences between in-person visits and online experiences should be explored. This study focused on group tours, but if online tours could be considered as a substitute for individual or actual travel, it would be necessary to conduct a survey including customized tours according to the wishes of users.

Since guide and telepresence were mentioned among the topics representing online tours, it is conceivable that tours that include communication with a guide will appear in the future in virtually constructed sightseeing areas such as VR Chat and Second Life. The use of HMDs in such tours may provide a higher sense of realism than the current flat display tours and enable physical access through somatic senses. This will create a new market, which, in turn, will affect the existing market; hence, keeping a close watch on them is necessary.

The main contribution of this study is that the possible characteristics of online tours, in which the host guide goes to the site and communicates via RTC using a web conference application, were estimated with a text mining approach. Furthermore, weaknesses of the text mining approach arising from the linguistic characteristics of Japanese were identified.