Introduction

Internet use is growing day by day. The rapid growth in the number of internet users has opened new research areas for practitioners, researchers, and governments in many fields, including tourism. Over the last 25 years, the share of the world’s population using the internet has risen from 0.4% to 59% (Digital, 2020; Navío-Marco et al., 2018). Social media platforms have become a popular place for internet users to share reviews of products and services (Steven & Hoffman, 2012). However, measuring customers’ experience of services is a challenge because services are intangible (Pleger, 2000). The challenge is greater when the customer journey is long and involves multiple touchpoints during the service consumption process (Zomerdijk & Voss, 2010). This is one of the critical areas of service science to which we hope to contribute with our study, in general and for the tourism industry in particular. The rapid growth of the tourism industry plays a vital role in the world’s economy. The industry not only generates substantial revenue but also provides livelihoods (Chatterjee et al., 2018). According to the World Travel and Tourism Council (WTTC), the tourism industry has contributed 10.4% of the world’s GDP, 319 million jobs, and 1.65 trillion in exports (Manzo, 2019). According to the World Tourism Organization (UNWTO), revenue from international tourists increased from $811 billion to $1.65 trillion between 2005 and 2018 (UNWTO, 2019).

Among service industries, tourism is fairly complex because the overall service experience is shaped by encounters with many different service providers and activities, such as hotels, restaurants, local cultural interactions, travel, shopping, online ticketing, online information guides, and sightseeing. Compared with other industries, the tourism industry depends heavily on information and customer satisfaction (Yu & Goulden, 2006; Wang et al., 2009). Therefore, the quality and context of the information that the tourism industry provides to tourists play a crucial role, and tourists draw on this large body of information (Grover & Kar, 2017; Johnson et al., 2021) before going on their trips. Digital information is narrowing the gap between tourism destinations and tourists (Kim et al., 2004; Assaf & Tsionas, 2018; Navio et al., 2018). However, digital information also shapes service expectations, which in turn affect the service experience in tourism.

The number of internet users is increasing day by day, and users share their views on tourism products and services online. Because digital information is readily available, tourists consult it when deciding on a trip, so it indirectly affects tourism products and services. In addition, 80% of tourists use online reviews to choose the destination for their trip (Almeida-Santana & Moreno-Gil, 2017; Bell, 2016). Therefore, social media data and online reviews of tourism products are the first choice of researchers and governments for tourism analysis. Internet users share information as photographs, text, or videos (Yoo et al., 2009; Yu, 2012; González-Rodríguez et al., 2021), referred to as user-generated content (UGC). The text that users post on social media contains information about tourism destinations (Serna et al., 2013; Serna et al., 2016), complaints (Maurer & Schaich, 2011), demographics (Fujii et al., 2017), and tourist preferences (Marrese-Taylor et al., 2013). UGC therefore plays a vital role in the travel process and in the consumption of tourism products (Nezakati et al., 2015; Wen et al., 2020; Kushwaha et al., 2021; González-Rodríguez et al., 2021). Tourists use tourism-related UGC on social media such as Facebook and Twitter to make their travel decisions and plans (Xiang & Gretzel, 2010). Recommendations about a destination that tourists receive on social media from friends and family are seen as trustworthy and inspire them to visit that destination in the future (Yoo & Gretzel, 2011). Mining UGC in social media enables researchers to analyse honest signals from consumers who are unaware that their feedback is being studied (Kar & Dwivedi, 2020). Further, UGC makes it possible to measure and analyse service experience just after the service encounter or service consumption, which is otherwise difficult for researchers to capture (Kar, 2020).

Tourism is an industry rich in travel information, so it is worth understanding how consumer behaviour and technologies vary within it. This interaction of tourism with digital technologies is often called digital tourism (Adeola & Evans, 2019). Access to information through the internet plays a vital role in the information exchange among tourists, transportation services, hotels, government organizations, and destination marketing organizations (Ruzic & Biloz, 2010). This information also plays a notable role in the choice of destination for a trip (Narangajavana et al., 2017). Tourists take a trip voluntarily, to experience new things and fulfil a personal desire rather than out of obligation, and this affects the experience of the consumption process (Prebensen et al., 2018). Tourist attractions and destination images are driving factors for most trips (Karayilan & Cetin, 2016; Suanmali, 2014). Word of mouth and word of web surrounding tourist attractions and destinations motivate tourists to travel (Holloway & Humphreys, 2019). Tourists choose known or unknown destinations through good advertising backed by good travel experiences, or through word-of-mouth publicity (Lohmann & Netto, 2016).

Further, the tourism industry is growing because of market globalization. In parallel, the development of new technologies and intense competition not only create opportunities for growth but also bring some unintended consequences (Hollebeek & Macky, 2019). Researchers regard the tourism industry as a leading adopter of new technologies (Le et al., 2019; Hultman et al., 2015). In addition, tourism service providers use branding strategies to show the uniqueness of a tourism destination in a highly competitive global environment. As a result of this development strategy, the number of tourism destinations is continuously increasing, and these technologies help make the tourism experience unforgettable (Kumar et al., 2019). During the trip, tourists experience tangible (room, hotel staff, etc.) and intangible (behaviour, security, etc.) products and services at the destination. These are consumed and not remembered for long; the experience of the tourism products and services, by contrast, stays in the tourist’s mind for a long time (Le et al., 2019).

The literature on customer experience also indicates that very few studies have been undertaken on customer experiences (Becker and Jaakola, 2020). Against this background, the focus of this study is to find the factors responsible for the overall service experience of tourists during their visit to India. The service experience can be defined as communications with many entities, i.e. the industry and its employees, tangible services, and other consumers (Berry et al., 2002; Meyer & Schwager, 2007; Voss et al., 2008). This study uses a novel research methodology to achieve this objective: it mines honest signals from service consumers based on their service encounters, using tourists’ conversations about tourism products and services to measure customer service experience (CSE). This user-generated content (UGC) has been analysed using a combination of sentiment mining and topic modelling (Zhang et al., 2017; Kar, 2020). The identified topics have then been mapped to constructs in the existing service science and tourism services literature. The relationship between these constructs and CSE has been validated using inferential statistics. This study therefore addresses the following research questions.

RQ1: What are the factors which digital tourists discuss as being critical for impacting customer service experiences during service consumption?

RQ2: How do different factors faced during service encounters vary while impacting customer service experience in tourism?

The organization of the remaining part of the study is as follows: Sect. 2 describes the related literature review of tourism and social media. Section 3 consists of research questions and hypotheses for theory building for measuring tourism service experience. Section 4 shows the research methodology. Section 5 demonstrates the findings of our study. Section 6 includes the implications of the results, limitations of the study, and future directions. Subsequently, at the end, the study is concluded in Sect. 7.

Literature Review

In this digital era, most customers turn to social media sites to share their ideas (such as WOM), access information, shape their tourism experience, and make product purchase decisions. Among these customers, 60% use social media platforms to share their ideas with others, 70% to access information, 49% to make product purchase decisions, and 45% for word of mouth (Bravo et al., 2021; Kim & Ko, 2012; Tan & Lee, 2019). The internet continues to grow and is readily accessible via smartphones and tablets, making information available to all customers.

Nowadays, all tourism-related information is digitally available, and travellers can plan and build their trips on their own. Travellers not only access online information about tourism but also obtain feedback from other travellers who share their experiences (Bi et al., 2020; Zeng & Gerritsen, 2014). Tourists can share their experiences (Xiang & Gretzel, 2010) on online platforms such as Facebook, Twitter, and Instagram (Boyd & Ellison, 2008; Munar & Jacobsen, 2014; Volo, 2010) and give recommendations to other travellers (Bjørkelund et al., 2012; Confente, 2015). VirtualTourist, TripAdvisor, Expedia, and Lonely Planet are the most successful professional platforms providing valuable travel and tourism information (Akehurst, 2009; García-Pablos et al., 2016).

Airports, restaurants, and hotels (Kotus et al., 2015) are the infrastructure and facilities that tourists access most frequently. Transportation (Zhong et al., 2019) helps to connect scenic places (Mou et al., 2020) at the destination. In addition, the attractiveness and accessibility of scenic spots improve tourists’ revisit intention (Qian et al., 2021). The safety and security of tourists are also attracting researchers’ attention in the tourism context (Mawby et al., 2021; Preko, 2021). The United Nations World Tourism Organization (UNWTO) has proposed that all countries should prepare a policy for tourists’ safety and security (WTO, 1991). Because of the pandemic, tourists are reluctant to travel and stay in hotels at destinations (Gursoy & Chi, 2020). Cleanliness therefore plays an essential role in helping the hospitality industry overcome tourists’ fear during the pandemic (Shin & Kang, 2020). Prices of tourism products are also affected by the pandemic. Price is directly tied to hospitality and outdoor tourism activities (Falk & Scaglione, 2021; Yang et al., 2016).

All internet-based communication directed at customers about tourism services or goods is called electronic word of mouth (e-WOM) (Kankhuni & Ngwira, 2021; Pourfakhimi et al., 2020; Yen & Tang, 2019). Online social media such as Twitter, Facebook, and Google Plus also provide a platform for generating e-WOM (Trusov et al., 2009; Cantallops & Salvi, 2014; Filieri et al., 2015; Kanje et al., 2020; Bravo et al., 2021). Informal communication directed at consumers about service consumption, products, or their sellers is called word of mouth (WOM) (Westbrook, 1987). Although tourists take advice from private sources before going on a trip, online social media is trusted more than traditional WOM because the source is considered independent of commercial influence (Litvin et al., 2018). Social media has therefore become a major source of trustworthy opinions from friends, relatives, and colleagues (Cantallops & Salvi, 2014; Xiang et al., 2015; Pop et al., 2021). Online social media and data mining tools and techniques now play an essential role in the tourism sector (Xiang et al., 2015). The operations most frequently used in social media analytics are data collection, data cleaning, and mining. Understanding and evaluating the outcomes of social media analytics in tourism is crucial for gaining insights (Feldman, 2013). Tourism experience is produced by the social and cultural activities of travellers or tourists (Larsen, 2007; Tussyadiah & Fesenmaier, 2009). Thus, online social media plays a notable role in developing tourism experience.

Social media have the power to create a new experience after the trip by adding meaning to it (Tussyadiah & Fesenmaier, 2009; White & White, 2007). Customer experience plays a vital role in business success and competitive advantage (Kim & Chen, 2019; Lemon & Verhoef, 2016). Recent studies indicate that positive customer experience influences tourists’ satisfaction, emotions, and behavioural intention (Tsaur et al., 2007; Lee and Cheng, 2012; Ridhwan & Hargreaves, 2021). Customer experience is an abstract concept: it is what a customer experiences when interacting with a product, an organization, or parts of it (Shaw & Ivens, 2005). In the tourism context, customer experience creates an advantage as well as sustainability for the tourism destination (Song et al., 2015; Rather & Sharma, 2019). Customer service experience (CSE) reflects the quality of the service that service providers deliver to customers (Bateson, 1995). Customers do not buy goods or services; rather, they buy the experiences and benefits that those goods and services provide (Gronroos, 2001).

Model Development

This study attempts to build theory inductively through two independent but connected studies. The first is an exploratory study, which supports model development based on an inductive approach. It is followed by a confirmatory study, which supports model validation.

Exploratory Analysis

For the first study, we used exploratory data analysis of user-generated content (UGC) on the Twitter platform to find potential constructs and form hypotheses for improving the customer service experience of tourists at the destination. The factors on which we develop the hypotheses were identified by text mining a large volume of Twitter text data. In this exploratory stage, we obtained the data from Twitter using hashtags and @mentions related to Indian tourism and heritage. Further analysis requires clean, ready-to-use data. Therefore, we pre-processed the data to remove stop words, numbers, special characters, URLs, images, and videos. We did not consider retweets because they add no extra meaning to the data. The exploratory study is elaborated in the Research Methodology section, and its outputs are highlighted in the findings.
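The pre-processing described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the stop-word list is a tiny hypothetical sample, retweets are detected by the conventional "RT @" prefix (an assumption), and images and videos are assumed to appear in tweet text only as links, so they are removed together with URLs.

```python
import re

# Tiny illustrative stop-word list (assumption); a real run would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "in", "at",
              "of", "to", "and", "for", "on", "it", "this"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def is_retweet(text):
    """Treat tweets that start with the conventional 'RT @' prefix as retweets."""
    return text.lstrip().startswith("RT @")


def clean_tweet(text):
    """Lower-case, drop URLs, numbers, and special characters, then remove stop words."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_LETTER_RE.sub(" ", text)  # removes digits, punctuation, '#', '@'
    return [t for t in text.split() if t not in STOP_WORDS]


def preprocess(tweets):
    """Return token lists for all non-retweet tweets that still contain text."""
    cleaned = (clean_tweet(t) for t in tweets if not is_retweet(t))
    return [tokens for tokens in cleaned if tokens]
```

For example, `clean_tweet("The Taj Mahal is stunning!! 10/10 https://t.co/abc #IncredibleIndia")` keeps only the content-bearing tokens, and `preprocess` drops any tweet beginning with "RT @".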

In the next stage, we analysed the UGC using topic modelling for text summarization and formed clusters of highly associated words in the topics. After pre-processing, we ran topic modelling with the latent Dirichlet allocation (LDA) algorithm and obtained 20 topics of 20 terms each from the Twitter text data. We then built a network diagram from the topic modelling output. The network diagram shows the association among the words in the topics and groups highly associated words into clusters. From these clusters we identified ten potential constructs and formed hypotheses on them during the model building stage, based on inductive theory building. The output of this stage is described in detail in the findings section.
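The step from topic terms to clusters of associated words can be illustrated with a small sketch. It assumes that the LDA output is already available as one list of terms per topic, measures association simply as the number of topics in which two words appear together, and forms clusters as connected components of the resulting network. The function names, the `min_weight` threshold, and the association measure are illustrative assumptions, not the procedure reported in the study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(topics):
    """Count, for every word pair, the number of topics in which both words appear."""
    edges = Counter()
    for terms in topics:
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return edges


def clusters(edges, min_weight=2):
    """Group words linked by edges of at least min_weight (connected components)."""
    neighbours = {}
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over strongly associated words
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups
```

With four toy topics such as `["hotel", "room", "clean", "stay"]`, `["hotel", "room", "price"]`, `["food", "restaurant", "taste"]`, and `["food", "restaurant", "price"]`, the pairs hotel/room and food/restaurant each co-occur twice and form two separate clusters, which a researcher could then label as candidate constructs.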

Model Specification

Here, we propose a theoretical model based on the inductive method of research, in which we revisit the data carefully and identify the variables from the summarization of the social media conversations. On the basis of the text summarization methods, ten independent variables could be determined and related to the previous literature. These independent variables are accessibility, accommodation, assurance, attraction, cleanliness, hospitality, Jugaadu service flexibility, price, restaurant quality, and security, as illustrated in Fig. 1. An overview of these factors and how they have been identified from existing literature is given in Table 1.

Fig. 1 A holistic framework for the evaluation of customer service experience

Table 1 Factors affecting TSE identified

Customer Service Experience (CSE) CSE is the dependent variable in our study, and its antecedents were selected from UGC after applying text mining methods. This is inductive research in which the dependent variable was derived using sentiment mining approaches within text mining. We define CSE as the intangible emotional state of mind towards the service immediately after the encounter, or during consumption of the service at different touchpoints. It is measured by the sentiments surrounding the generated topics. Topics are generated through text summarization using the latent Dirichlet allocation (LDA) method on 791,804 tweets collected across India. These tweets were segregated into five zones of India, and topic modelling was then applied. CSE cannot be documented by sharing a survey questionnaire with tourists after they encounter a service, because the researcher is distanced from the tourist at the point of service encounter. However, when tourists share their experiences on social media sites, the data are available and documented on these platforms and become an honest description of the service encounter (Kar, 2020). Using social media analytics in this way makes it possible to develop a better understanding of the individual customer’s service experience during the service encounter. This approach mines honest signals from customers, since they are unaware that their responses are being used to understand a phenomenon better (Pentland, 2008).
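To make the idea of measuring CSE from the sentiment surrounding a topic concrete, the sketch below scores a tokenized tweet with a simple word-list polarity. The text does not specify the sentiment technique at this point, so the lexicon approach, the two word lists, and the scoring formula are all assumptions chosen for illustration only.

```python
# Hypothetical miniature sentiment lexicons (assumption, for illustration only).
POSITIVE = {"amazing", "clean", "friendly", "beautiful", "safe"}
NEGATIVE = {"dirty", "rude", "unsafe", "overpriced", "delayed"}


def polarity(tokens):
    """Return (positive hits - negative hits) / token count, in [-1, 1]; 0.0 if empty."""
    if not tokens:
        return 0.0
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    return score / len(tokens)
```

Averaging such scores over the tweets assigned to a topic and a zone would give one rough sentiment value per topic and zone.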


Accessibility Accessibility refers to tourism services that can be used by all people at all tourism destinations (Darcy & Dickson, 2009). Accessibility should include accessible online information, transportation, accommodation facilities, local information, and professional service providers (Toth & David, 2010). The perceived quality and availability of transport (Mo et al., 1993) affect behavioural intention, tourism experience, and revisits (McKercher, 1998; Hanqin & Lam, 1999; Thompson & Schfield, 2007) at the tourism destination. If tourists perceive that accessibility is good enough at a tourism destination (Cole et al., 2019), it should have a positive impact on CSE. As illustrated in Fig. 1, we propose H1:

H1: Accessibility of the destination has a positive impact on the CSE.


Accommodation When tourists plan a trip, accommodation is a basic need at the destination (Aktaş et al., 2010; Marrese-Taylor et al., 2013). Accommodation can be resorts, hotels, or other options such as homestays, cottages/commercial homes, and hostels/apartments (Hosany et al., 2006; Hui et al., 2007; Xia et al., 2009; Hsu, 2012). Accommodation is a prime factor: it provides services and facilities that influence tourists’ choice and selection of tourism products (Chu & Choi, 2000). Service quality and accommodation facilities help attract tourists to a destination, inspire WOM recommendations (Nessel et al., 2021), and strengthen tourists’ intention to revisit the same place after consuming tourism products. In addition, several other attributes of the accommodation (Kim et al., 2020; Nessel et al., 2021) make tourists happy and satisfied, such as the cleanliness of the dining area, bathroom, and room, flexible check-in and check-out, hospitality, and location. Extending this discussion, tourists who use these services are likely to have a better CSE. As illustrated in Fig. 1, we propose H2 as follows:

H2: Accommodation has a positive impact on the CSE.


Assurance Perception of quality and assurance is a tourist’s attitude that comes from comparing the perception of actual services with the expectation of services (Berry, 1991; Mackay & Crompton, 1990; Rossi & Tarquinio, 2017). Assurance gives tourists confidence that there will be no gap between the perceived and the expected service, and that quality-related activities will be performed well (Juran, 1993). SERVQUAL is the best-known quality measurement tool for the relevant dimensions of perceived quality (Parasuraman et al., 1985, 1988). Tourists form expectations of a tourism product through word of mouth, advertising, and previous experiences (Oliver, 1997). If the actual service performance meets or exceeds the expectation, then tourists get a better CSE. As illustrated in Fig. 1, we set the following hypothesis H3:

H3: Assurance about the services at the destination has a positive impact on the CSE.


Attraction An attraction is a place that tourists want to visit. Cultural events (Amoiradis et al., 2021), music, monuments, and gastronomy are significant elements of attraction that draw tourists to the destination (Amalu et al., 2020; Kim & Chen, 2010; Kruger et al., 2013). The behaviour, habits (Herbert, 2001), festivals (Gannon et al., 2019), and lifestyle of the people, which together can be defined as culture, also play a significant role in tourism service experience and in revisits to the same destination (Funk & Bruun, 2007; Zhuang et al., 2019). If the destination is attractive to tourists and lets them experience the local culture, it has a significant impact on CSE. As illustrated in Fig. 1, we set the following hypothesis H4:

H4: Attractions at the destination have a positive impact on the CSE.


Cleanliness Globally, the cleanliness of the city (Ozturk & Hancer, 2008; Alegre & Garau, 2010), restaurants, and accommodation is a current topic in both the hospitality and the tourism industry. The whole world is currently facing a coronavirus that spreads from human to human. Therefore, cleanliness (Magnini & Zehrer, 2021) touches every area of the hospitality (Alonso et al., 2020) and tourism industry, including transportation, food services, and accommodation. The cleanliness of restrooms at accommodation, visitor sites, and restaurants is the primary area of concern in the hospitality and tourism industry (Bagnera et al., 2020). If restrooms are not cleaned thoroughly, diseases like coronavirus can spread among tourists (Hu et al., 2009). If cleanliness at the destination is high, CSE is positively impacted. As illustrated in Fig. 1, we set the following hypothesis H5:

H5: The cleanliness of the destination has a positive impact on the CSE.


Hospitality Hospitality is the welcoming and smiling nature that residents show to tourists in their community (Megeirhi et al., 2020; Aerts et al., 2018), for example in hotels, apartments, and guest houses. The friendliness and hospitality of local people at the destination contribute to tourists’ tourism service experience (Henry, 2018; Jenkins, 1999; Yoon & Uysal, 2005). Excellent service from providers and healthy discussions with local people and other tourists have a significant impact on tourists’ customer service experience (Prebensen et al., 2013). As illustrated in Fig. 1, we set the following hypothesis H6:

H6: Hospitality has a positive impact on the CSE.


Jugaadu Service Flexibility (JSF) JSF is a new construct that we are able to define based on the data obtained from UGC. The existing services and tourism literature does not yet contain this construct, although the broader flexibility literature has captured its essence. JSF is the openness towards modifications in service delivery demonstrated by the service provider at the tourist’s point of service during the service consumption journey. For example, if a tourist travelling by bus has excess baggage or an additional co-traveller for whom no ticket has been booked, JSF is demonstrated when the service provider creates a temporary arrangement that lets the tourist meet his or her immediate needs (Iravani et al., 2005; Haldar et al., 2016). Similarly, in a hotel, a tourist may arrive early or leave the room a few hours late, and the hotel manager may accommodate such requests in exchange for a small additional service fee (Pleger, 2000; Kar & Rakshit, 2015; Prabhu & Jain, 2015). However, this may not have been planned for when the hotel was booked via an online platform like MakeMyTrip or Oyo, where service level agreements are somewhat fixed and less open to very high levels of personalization (Yu et al., 2013). Across tourists, such requests may arise frequently, so the service provider plans for such service flexibility on a regular basis. As of now, JSF is not defined or identified in tourism literature or in service science literature; it extends the concept of Jugaad in management literature, which is defined particularly in the Indian context. It is further important to note that, while Jugaad and service flexibility are somewhat established in the existing flexibility literature (Victorino et al., 2013; Kar & Rakshit, 2015; Shalender & Singh, 2015; Wudhikarn et al., 2015; Trento et al., 2016; Sushil, 2016; Halder et al., 2016; Sharma et al., 2016; Shalender et al., 2017; Tan, 2021), the interaction between these constructs in defining a particular service orientation of the service provider, as a factor that impacts the overall experience of the service encounter during the journey, has never been established. Hence, extending this line of initial findings based on data, we felt that JSF would also impact customer service experience, and we set the following hypothesis H7:

H7: JSF has a positive impact on the CSE.


Price Price is the cost of the tourism product. A tourism product is a collection of psychological and physical experiences that tourists gain during the trip; it can include services such as tour packages, tourist attractions, transport, entertainment, and accommodation, which together make up the tourists’ tourism service experience. Tourists expect to get a tourism service experience during the trip because they spend a certain amount of money for it (Moon & Han, 2019). Therefore, price is a prime factor in evaluating the tourist’s actual tourism service experience at the destination (Lawson et al., 1995). Hence, extending this argument, we expect that price is inversely related to the overall customer service experience. As illustrated in Fig. 1, we set the following hypothesis H8:

H8: The price of services has a negative impact on CSE.


Restaurant A restaurant is a place where tourists can enjoy local and global food. Food is necessary to sustain life, and it can also be an attraction and a reason for tourists to travel to the same destination again (Henderson, 2004). Food and beverages play an essential role in tourists’ loyalty and revisit intention (Yuksel & Yuksel, 2002). Familiar food at remote destinations helps tourists sustain their daily life and feel at home at the destination (Quan & Wang, 2004). Neglecting proper food and beverage arrangements generates a negative perception of the destination among tourists, and the destination faces economic loss (Yuksel, 2000). Hence, extending this discussion, we felt that the restaurant would also impact customer service experience. As illustrated in Fig. 1, we set the following hypothesis H9:

H9: The quality of the restaurants has a positive impact on CSE.


Security In the tourism industry, the safety and security of tourists are essential factors affecting the overall customer service experience. Security and safety play a vital role in choosing a destination. When tourists visit a destination and face security threats such as terrorism (Sönmez, 1998; Hamadeh & Bassil, 2017; Walters et al., 2019; Seabra et al., 2020; Ulqinaku et al., 2020), crime, and corruption, they will not only drop the idea of revisiting that place but also make negative recommendations to other tourists (Fourie et al., 2020). Tourists create positive WOM on social media platforms if they have a positive impression of the tourism destination. It is also true that tourists check negative factors such as tricksters, beggars, terrorism, and thieves at destinations before going on their trips (Khuong & Phuong, 2017). Tourists want to visit safe places. If tourists perceive a risk of death or injury, they do not select those sites for a visit (Neumayer, 2004). Security is a way to protect tourists from various risks (functional and financial risk, time and social risk, physical and psychological risk) (Tasci & Boylu, 2010). Hence, extending this discussion, we felt that security would also impact customer service experience. As illustrated in Fig. 1, we set the following hypothesis H10:

H10: The security at the destination has a positive impact on CSE.

Model Validation

For model validation, we tested the hypotheses and calculated the significance of the ten potential constructs for the CSE of tourists in five zones of India using inferential statistical analysis. In this study, we divided India into five zones: North, South, East, West, and North-East. We applied multiple regression analysis to the model to test the significance value (p-value) of each construct. Model validation for big data-driven research enables us to establish relationships among the significant constructs (Kar & Dwivedi, 2020). The model created in Section 3.2 was used for validation within each zone. Statistical validation for each zone not only highlights the significance of each factor but also helps establish why and how such factors impact CSE for tourists.
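A per-zone multiple regression can be sketched as below. This is a minimal pure-Python ordinary least squares fit under stated assumptions: the data layout (a mapping from zone name to predictor rows and CSE values) and all function names are hypothetical, the normal equations are solved directly, and no p-values are computed, since these would additionally require the coefficient standard errors and the t-distribution that a statistics package normally provides.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Fit y = b0 + b1*x1 + ... by least squares; return (coefficients, R^2)."""
    design = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def fit_by_zone(data):
    """data maps zone name -> (rows, y); returns zone name -> (coefficients, R^2)."""
    return {zone: ols(rows, y) for zone, (rows, y) in data.items()}
```

Fitting each zone separately, rather than pooling all tweets, keeps the coefficients comparable across North, South, East, West, and North-East. The sketch does not guard against perfectly collinear predictors, which would make the system singular.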

Research Methodology

This study focuses on analysing the factors that affect Indian tourism, with Twitter as the primary source of data. Twitter holds a vast amount of data related to Indian tourism in the form of hashtags, mentions, and user profiles. A user profile yields a maximum of around 3200 tweets at a time, whereas hashtags and at-mentions yield at most the latest 15 days of tweets through the API when accessed from Python or R. Social media analytics (Rathore et al., 2017) was used to extract and draw insights from the user-generated content (UGC). In the research methodology, sentiment analysis was applied to capture the emotions of tourists towards Indian tourism, and topic modelling was used to extract the discussions among tourists across India. Finally, we used multiple linear regression because the study has ten independent variables and one dependent variable. Multiple linear regression indicates how strong the relationship is between the independent variables and the dependent variable, and what percentage of the variation in the dependent variable is explained by the independent variables. In multiple linear regression, it is important to check the correlation between the independent variables: if two independent variables are highly correlated with each other, only one of them needs to be retained in the model. Multiple linear regression also helps to check the overall fit of the model.
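The regression and correlation checks described above can be illustrated with a minimal sketch. It is written in Python with the standard library only, whereas the study itself used R; the function names (`pearson`, `solve`, `fit_ols`) and the toy data are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares.

    Returns (coefficients with the intercept first, R^2).
    """
    design = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return beta, 1.0 - ss_res / ss_tot


# Toy data (hypothetical): two predictors, response built as 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * a + 3 * b for a, b in rows]
beta, r2 = fit_ols(rows, y)
print("coefficients:", beta, "R^2:", r2)
print("correlation between predictors:", pearson([r[0] for r in rows], [r[1] for r in rows]))
```

A high correlation between two predictors in such a check would be the signal to retain only one of them before fitting the model.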

Data Collection

Collecting data from Twitter also poses challenges, which we attempted to address. The Twitter API is a global public stream API that provides a small stream of data, up to 1% of the publicly available tweets from the latest 7 days. Therefore, tweets must be downloaded from Twitter at regular intervals of 7 or 15 days. Language-based filtering is very difficult to set in the Twitter API. If Twitter introduced community-based filtering in the API, it would increase the quality and comparability of Twitter text data. Further, the Twitter API supports only the 8-bit Unicode Transformation Format (UTF-8), so text in any other encoding must be converted to UTF-8 before the tweets are processed. When hashtags are selected for downloading tweets, it is very difficult to check their authenticity. Our solution to this problem was to check the data under each hashtag manually or programmatically (for example, in Python or R) using topic modelling or a word-cloud.

The data for this study were extracted from Twitter over a period of 10 months, from April 2019 to January 2020. Popular hashtags, at-mentions, and user profiles related to tourism in India were used to extract the tweets. India is a big country with 28 states and 9 union territories. Each state has its own cultural heritage, monuments, and cities that are famous for tourism. The names of popular cities and monuments renowned for tourism were used as hashtags for collecting tweets. Some states are more recognized for tourism than others. Depending on their popularity, some states (such as Goa, Delhi, and Uttarakhand) have a huge amount of data on Twitter compared with others (such as Andaman and Nicobar, Chhattisgarh, and Dadra & Nagar Haveli). Initially, over 1.2 million tweets were collected. However, many of these tweets would not have contributed significantly to our analysis; they typically had fewer than five words, or contained only links or hashtags, only multimedia files, or only retweets. After removing such tweets, 791,804 tweets were retained for the final analysis and segregated zone-wise, as illustrated in Fig. 2. We grouped states and union territories into five zones, namely North, East, West, South, and North-East, based on the segregation followed by the Ministry of Tourism, Government of India. The North zone contains Himachal Pradesh, Punjab, Haryana, Jammu & Kashmir, Rajasthan, Ladakh, Delhi, Chandigarh, Uttar Pradesh, and Uttarakhand. The East zone contains Bihar, Orissa, Jharkhand, West Bengal, and Chhattisgarh. The West zone contains Gujarat, Goa, Dadra & Nagar Haveli, Maharashtra, Daman & Diu, and Madhya Pradesh. The South zone contains Andhra Pradesh, Tamil Nadu, Telangana, Karnataka, Kerala, Pondicherry, Andaman, and Lakshadweep. The North-East zone contains Arunachal Pradesh, Assam, Manipur, Nagaland, Meghalaya, Mizoram, Sikkim, and Tripura.

Fig. 2 Total tweets across India
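The zone grouping described above can be expressed as a simple lookup. The following Python sketch is illustrative only: the zone membership follows the lists in the text, while the names `ZONES`, `zone_of`, and `count_by_zone`, and the assumption that each tweet arrives already tagged with a state name, are hypothetical.

```python
from collections import Counter

# Zone membership as listed in the text.
ZONES = {
    "North": ["Himachal Pradesh", "Punjab", "Haryana", "Jammu & Kashmir", "Rajasthan",
              "Ladakh", "Delhi", "Chandigarh", "Uttar Pradesh", "Uttarakhand"],
    "East": ["Bihar", "Orissa", "Jharkhand", "West Bengal", "Chhattisgarh"],
    "West": ["Gujarat", "Goa", "Dadra & Nagar Haveli", "Maharashtra", "Daman & Diu",
             "Madhya Pradesh"],
    "South": ["Andhra Pradesh", "Tamil Nadu", "Telangana", "Karnataka", "Kerala",
              "Pondicherry", "Andaman", "Lakshadweep"],
    "North-East": ["Arunachal Pradesh", "Assam", "Manipur", "Nagaland", "Meghalaya",
                   "Mizoram", "Sikkim", "Tripura"],
}

# Reverse lookup: lower-cased state name -> zone.
STATE_TO_ZONE = {state.lower(): zone for zone, states in ZONES.items() for state in states}


def zone_of(state):
    """Return the zone for a state or union territory, or None if it is not listed."""
    return STATE_TO_ZONE.get(state.strip().lower())


def count_by_zone(tagged_tweets):
    """Count tweets per zone from (state, text) pairs; unknown states are skipped."""
    counts = Counter()
    for state, _text in tagged_tweets:
        zone = zone_of(state)
        if zone is not None:
            counts[zone] += 1
    return dict(counts)
```

For example, `count_by_zone([("Goa", "..."), ("Delhi", "...")])` would tally one tweet each for the West and North zones.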

Data Preparation

After data collection, data preparation is the next step because the data must be clean for data mining. Data transformation and cleansing are essential steps in the data mining process and include normalization and the removal of punctuation, stop words, and extra white space. Normalization is the process of transforming words into a more generalized form. The same word can be written in different ways, for example in uppercase and lowercase. Normalization reduces the size of the vocabulary and removes variations of the same word in large volumes of text. Conversion of text from uppercase to lowercase is one of the important normalization techniques. Stemming is a technique that reduces words to their base form. Almost all tweets had tags starting with a hashtag (#) or an at sign (@). These tags were not relevant for text mining, so we deleted them from the tweets. The data also contained a massive number of retweets and duplicate tweets, which did not contribute new ideas. Therefore, we removed all retweets and duplicate tweets from the analysis. Data preparation was carried out in R, for which an English dictionary is available for natural language processing.
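A minimal sketch of these preparation steps is given below. It is written in Python rather than the R used in the study, it omits stemming, and the stop-word list, the retweet test (`RT @` prefix), and the point at which the five-word threshold is applied are all assumptions made for illustration.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption, not the dictionary used in the study).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "to", "and",
              "for", "on", "at", "it", "this"}

_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def strip_noise(text):
    """Lowercase, drop links, hashtags and mentions, replace punctuation, and tokenize."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # links
    text = re.sub(r"[#@]\w+", " ", text)       # hashtags and at-mentions
    text = text.translate(_PUNCT_TO_SPACE)     # punctuation -> spaces
    return text.split()                        # split() also collapses extra white space


def clean_tweet(text):
    """Return the normalized tweet with stop words removed."""
    return " ".join(t for t in strip_noise(text) if t not in STOP_WORDS)


def prepare(tweets, min_words=5):
    """Drop retweets, very short tweets, and duplicates; return cleaned tweets."""
    seen, kept = set(), []
    for raw in tweets:
        if raw.lstrip().lower().startswith("rt @"):  # assumed retweet marker
            continue
        if len(strip_noise(raw)) < min_words:        # too short after removing tags and links
            continue
        cleaned = clean_tweet(raw)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            kept.append(cleaned)
    return kept
```

Deduplicating on the cleaned text rather than the raw text is a design choice of this sketch: it also catches tweets that differ only in case, punctuation, or tags.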

Exploratory Data Analysis

Data related to the overall customer service experience of tourists who visit India’s tourist places were obtained from Twitter. We divided these data into five zones of India: North, South, East, West, and North-East. We grouped states according to the geographical region in which they fall. The distribution of the tweets is illustrated in Fig. 2.

The data analysis was undertaken using text summarization and sentiment mining. In the services management literature, text mining has been used extensively to assess different parameters of services such as quality, engagement, and impact (Kumar et al., 2021). Sentiment analysis was performed on the tweets to measure the polarity of discussions among tourists across India using natural language processing (Fazzolari & Petrocchi, 2018) and a semantic approach (Chang & Chen, 2019; Kar, 2020). The “syuzhet” package of R was used for this purpose. To classify the tweets, this package uses the NRC lexicon after text processing. The NRC lexicon associates words with eight emotions (fear, anger, disgust, joy, sadness, surprise, trust, and anticipation) and with negative and positive sentiment.
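The idea behind lexicon-based scoring can be sketched as follows. This is not the syuzhet package or the actual NRC lexicon: the toy lexicon below and its word-to-label assignments are invented for illustration, and the polarity rule (positive count minus negative count) is an assumption.

```python
from collections import Counter

# Toy lexicon (hypothetical entries): word -> set of emotion and sentiment labels.
TOY_LEXICON = {
    "beautiful": {"positive", "joy"},
    "love": {"positive", "joy", "trust"},
    "enjoy": {"positive", "joy", "anticipation"},
    "attack": {"negative", "fear", "anger"},
    "kill": {"negative", "fear", "sadness"},
    "protest": {"negative", "anger"},
}


def score(tweet):
    """Count how many tokens of the tweet carry each label in the toy lexicon."""
    counts = Counter()
    for token in tweet.lower().split():
        for label in TOY_LEXICON.get(token, ()):
            counts[label] += 1
    return counts


def polarity(tweet):
    """Classify a tweet as positive, negative, or neutral from its label counts."""
    counts = score(tweet)
    diff = counts["positive"] - counts["negative"]
    if diff > 0:
        return "positive"
    if diff < 0:
        return "negative"
    return "neutral"
```

Aggregating `polarity` over the tweets of each zone would give the kind of positive, negative, and neutral breakdown reported in the exploratory findings.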

In this study, since there is no questionnaire, there is no need to apply exploratory factor analysis (EFA) or confirmatory factor analysis (CFA) to find the constructs or items, respectively. The identification and measurement of constructs were carried out using text mining and natural language processing. We identified constructs through network analysis of the topic modelling output and the association among the topics. The text summarization for topic modelling was undertaken using the LDA algorithm (Kar, 2020; Kar & Dwivedi, 2020) to identify potential constructs for the theoretical model. Ten constructs that affect the overall service experience of tourists were identified in our study from the literature review. Each construct is measured using a bag-of-words technique in which the term frequency of each construct is calculated. The collected data were analysed using R. Sentiment analysis was used to capture the negative and positive opinions of tourists towards Indian tourism destinations. Topic modelling was used to identify the areas of discussion among tourists after visiting a destination. Latent Dirichlet allocation (LDA) (Blei et al., 2003) is a popular algorithm for finding topics in large datasets of unstructured data. Topics in LDA form a probability distribution over every word found in the dataset, based on the proximity, association, and occurrence of these words. After the topic modelling output was obtained, it was necessary to find the interconnections among the topics. A network diagram is a visualization technique that uses a text summarization method to show co-occurring topics together. The network analysis among the topics is used to evaluate the criterion variables used in the inferential analysis.
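The bag-of-words measurement of constructs can be illustrated with a short sketch. The baskets below are hypothetical and cover only three of the ten constructs; in the study, the baskets of words were derived from the LDA output and coder consensus, and the exact scoring procedure is not specified beyond term frequency.

```python
from collections import Counter

# Hypothetical baskets of words for three constructs (illustration only).
CONSTRUCT_BASKETS = {
    "cleanliness": {"clean", "dirty", "garbage", "hygiene"},
    "security": {"safe", "police", "theft", "crime"},
    "price": {"price", "cheap", "expensive", "cost"},
}


def construct_frequencies(tokens):
    """Return, for each construct, the total frequency of its basket words in tokens."""
    tf = Counter(tokens)
    return {name: sum(tf[w] for w in words) for name, words in CONSTRUCT_BASKETS.items()}


example = "clean beach but expensive food and expensive taxi".split()
print(construct_frequencies(example))
```

Scores of this kind, computed per tweet or per group of tweets, are one plausible way to obtain numeric values for the independent variables used later in the regression.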

While topic modelling could have been carried out using latent semantic analysis (LSA) and its variants, some inherent shortcomings of that methodology led us to adopt LDA. The preference for LDA over the LSA and pLSA algorithms arose from the following limitations. LSA offers no probability model for term occurrence. Further, LSA results are very difficult to interpret, and the method assumes a Gaussian model. Since there is no generative model, the value of the dimension k in LSA is selected arbitrarily. LSA is also unable to handle polysemous words. pLSA has documented limitations as well: it has high computational complexity, its EM algorithm provides only a local maximum, and, like LSA, it cannot create an appropriate generative model. pLSA is likewise unable to handle polysemous words. Due to these limitations, the current analysis was undertaken using LDA.

Reliability and Validity

A team of three members independently analysed the clusters of topics identified after community detection. Based on these clusters and a revisit of the literature, we mapped them to existing themes (Kassarjian, 1977). Inter-coder reliability was established for the basket of words by two researchers who work in the domain. The baskets of words were derived from the output of latent Dirichlet allocation. Two hundred and forty-eight decision points were identified; initially, there was consensus on 212 of them in terms of factor mapping, so preliminary consensus exceeded 85% (Kar & Dwivedi, 2020). Content reliability was established for the rest of the basket of words through consensus where the initial coding differed between the cluster of words and the constructs identified from the literature. Face validity was also established based on the bag of words in each cluster identified from the association among topics. For topics where consensus on the mapping to existing constructs was not achieved, the construct was dropped.
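As a quick arithmetic check of the reported consensus, assuming it is the simple proportion of decision points on which the coders initially agreed (the text does not state that a chance-corrected statistic was used):

```latex
\[
\text{initial agreement} = \frac{212}{248} \approx 0.855 \quad (85.5\%)
\]
```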

Confirmatory Data Analysis

For the confirmatory data analysis, multiple regression analysis was applied to the data to investigate the determinants that influence customer service experience in the Indian tourism context. Multiple regression techniques help us to understand the relationship between dependent and independent variables (Mason & Perreault Jr, 1991). In this study, there are ten independent variables (accessibility, accommodation, assurance, attraction, cleanliness, hospitality, Jugaadu service flexibility, price, restaurant, and security) and one dependent variable (customer service experience). The dependent variable, CSE, is used to determine the level of service experience of tourists. Guidelines for inferential model validation were followed from Gefen et al. (2000). As described above, the potential variables for the theoretical model were identified through LDA-based topic modelling (Kar, 2020; Kar & Dwivedi, 2020), and the network analysis among the topics was used to evaluate the criterion variables used in the inferential analysis. Since the outputs of the topic modelling had high inter-judge reliability and validity, the inferential analysis proceeded, in line with the literature, on the assumption that multi-collinearity effects, if any, are low, and thus that multiple regression analysis is sufficient for inferential analysis.

Results

To understand the perception of tourists towards Indian tourism, we analysed the tweets using social media analytics. We grouped the tweets into five zones of India and then applied sentiment analysis, network analysis, and inferential analysis to extract insights from them. Sect. 5.1 presents the descriptive analysis of the collected Twitter data and answers the first research question (RQ1) with the help of exploratory data analysis. Sect. 5.2 answers the second research question (RQ2) with the help of statistical and inferential analysis.

Exploratory Findings

First, we analysed the sentiment scores of the data for each zone, as shown in Fig. 3. In every zone, positive tweets outnumber negative and neutral ones. The zones show a similar pattern of positive and neutral opinion about the tourist destinations. For negative sentiment, the East and North-East zones contain more negative views about the tourist destinations than the other zones.

Fig. 3 Positive, negative, and neutral sentiments across India

Figure 4 shows the negative and positive words used by tourists in discussions on the social media platform. Most of the positive tweets contain words such as “beautiful”, “thank”, “peace”, “great”, “love”, “enjoy”, and “best”. This suggests that tourists are enjoying their trips and participating in events at the destination. Tourists who are satisfied with the services at a destination are inspired to spread positive WOM and to revisit it. India’s tourism destinations also face some negativity due to poor infrastructure, crime, and illegal activities. Many of the negative tweets contain disgust words such as “protest”, “kill”, “rape”, and “attack”. Poor hospitality creates a high level of dissatisfaction among tourists, and security is the main concern in their discussions.

Fig. 4 Negative and positive words in tourists’ discussions across India

Through topic modelling, we provide an overview of the large volume of UGC in a single word-cloud. Using a word-cloud package in R, we prepared a word-cloud of 20 topics for India. Figure 5 shows that “people”, “visit”, “beautiful”, “temple”, “time”, “day”, and “place” are the most discussed words among tourists on the social media platform across India. This indicates that tourists consider occasions and ease when choosing a destination for a trip. They enjoy their trips at the destination, which affects the overall customer service experience. Hospitality is also a key point when tourists visit a destination, because people feature prominently in their discussions.

Fig. 5 Word-cloud of topics across India

We prepared a network diagram from the LDA output with 20 topics and 20 terms per topic. Twelve clusters were identified in the network analysis, and Fig. 6 shows them with colour-coding. Nine of these clusters corresponded to the independent variables that served as the analysis factors for Indian tourism: accessibility, price, assurance, accommodation, attraction, hospitality, JSF, cleanliness, restaurant, and security. The other three clusters consisted of outlier words that were not connected to the main cluster of words across their topics.

Fig. 6 Network diagram based on output of LDA algorithm
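The clustering of topics in such a network can be approximated with a simplified sketch. Instead of the community detection used in the study, the Python code below links two topics when their top-term lists share at least `min_shared` terms and then reports connected components as clusters; the function name, the threshold, and the toy topics are all assumptions made for illustration.

```python
from itertools import combinations


def topic_clusters(topics, min_shared=2):
    """Group topics whose top-term lists share at least `min_shared` terms.

    `topics` maps a topic name to its list of top terms. Clusters are the
    connected components of the resulting topic graph.
    """
    names = list(topics)
    neighbours = {name: set() for name in names}
    for a, b in combinations(names, 2):
        if len(set(topics[a]) & set(topics[b])) >= min_shared:
            neighbours[a].add(b)
            neighbours[b].add(a)

    clusters, seen = [], set()
    for start in names:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Toy topics (hypothetical terms).
toy_topics = {
    "t1": ["hotel", "room", "stay"],
    "t2": ["room", "stay", "checkin"],
    "t3": ["police", "safe", "crime"],
    "t4": ["temple", "visit", "beautiful"],
}
print(topic_clusters(toy_topics))
```

Topics that end up in a cluster of their own, with no shared terms linking them to the rest, play the role of the outlier clusters mentioned above.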

Model Validation

To assess the predictive power of the ten factors as independent variables for CSE as the dependent variable, we applied multiple regression to the data extracted from Twitter. Table 2 shows the results of the multiple regression analysis, including the standardized beta coefficient, significance value, and t-value of the model for each zone of India and for India as a whole.

Table 2 Results of segregated regression analysis

To check the effect of these ten criterion variables on the dependent variable, we also applied multiple regression to the complete dataset (the combined data of all zones). The results showed that accessibility, accommodation, attraction, cleanliness, hospitality, Jugaadu service flexibility, price, and security have a statistically significant effect on customer service experience because their significance values are less than 0.05. Therefore, hypotheses H1, H2, H4, H5, H6, H7, H8, and H10 were accepted. Assurance (H3) and restaurant (H9) did not have a statistically significant effect on customer service experience because their significance values were greater than 0.05. Therefore, hypotheses H3 and H9 were not accepted. The acceptance and rejection of the hypotheses are shown in Table 3.

Table 3 Model and hypothesis summary

Independent Variables: accessibility, accommodation, assurance, attraction, cleanliness, hospitality, JSF, price, restaurant, and security

Dependent Variable: Customer service experience (CSE).

All ten variables (accessibility, accommodation, assurance, attraction, cleanliness, hospitality, JSF, price, restaurant, and security) affect the customer service experience. The data on Indian tourism are grouped into five zones (North, East, West, South, and North-East), so the effect of the variables on customer service experience varies by zone. Table 4 reports the values of R, R², and adjusted R². The relationship between customer service experience and the criterion variables can be considered strong in each zone (East: R = 0.682; North: R = 0.554; North-East: R = 0.715; South: R = 0.727; West: R = 0.771) as well as in the complete dataset (India: R = 0.673).

Table 4 Statistics for the inferential model explainability

Discussion

The results of the model validation using multiple regression analysis are insightful. A factor is treated as statistically significant in a zone when its significance value is less than 0.05; where the value is higher than 0.05, its impact on customer service experience is considered low and the corresponding hypothesis is not supported in that zone. Accessibility and assurance are statistically significant in the North zone, supporting H1 and H3, but not in the East, North-East, South, and West zones. Accommodation is statistically significant in the East, South, and West zones, supporting H2, but not in the North and North-East zones. Attraction is statistically significant in the East, North, and West zones, supporting H4, but not in the North-East and South zones. Cleanliness is statistically significant in the East, North-East, and South zones, supporting H5, but not in the North and West zones. Hospitality is statistically significant in the North-East, South, and West zones, supporting H6, but not in the North and East zones. JSF is statistically significant in the East, North, South, and West zones, supporting H7, but not in the North-East zone. Price is statistically significant in the North, North-East, and South zones, supporting H8, but not in the East and West zones. Restaurant is statistically significant in the North, North-East, and West zones, supporting H9, but not in the East and South zones. Security is statistically significant in the North, North-East, and West zones, supporting H10, but not in the East and South zones. Table 2 shows that JSF in the East and North zones, hospitality in the North-East and West zones, and accommodation in the South zone are the most significant factors for CSE.

In this study, we introduce a construct, JSF, which is the most influential of the ten factors in the Indian tourism context. JSF is statistically significant in four of the five zones, which shows the importance of this factor for CSE. In this digital era, most tourists search for information about the destination before going on a trip and make arrangements for the journey, such as booking hotels and tickets to tourist places where applicable. Therefore, flexibility in services inspires destination loyalty among tourists as well as recommendations of the place (Singh & Shalender, 2014).

Theoretical Implications

To the best of our knowledge, customer service experience has not previously been modelled using social media analytics, and inferential modelling that combines social media data with customer service experience has not been documented. This is the first study that attempts to do so.

When all ten criterion variables were regressed on customer service experience, the results revealed a statistically significant relationship between the criterion variables and customer service experience for each zone of India and for India as a whole. In the East zone, only accommodation, attraction, cleanliness, JSF, and restaurant are statistically significant factors. The strength of the relationship between the criterion variables and customer service experience is 0.682 (R-value). The R² value of 0.466 means that only 46.6% of the variation in customer service experience is explained by the criterion variables. In this model, JSF (t = 6.34) is the most statistically significant factor for measuring customer service experience. Similarly, the R² values for the North zone, North-East zone, South zone, West zone, and the complete dataset (India) are 0.307, 0.511, 0.528, 0.594, and 0.452, respectively. The most influential factors in the North, North-East, South, and West zones are JSF, hospitality, accommodation, and hospitality, respectively. The result from the complete dataset reveals that hospitality (t = 4.43) is the most influential factor among all the factors.
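For reference, the link between the reported R and R² values can be written out as follows. This is the standard relationship for multiple regression rather than a calculation taken from the study; the adjusted R² expression is the usual one for a model with p predictors and n observations, and n is left symbolic because the per-zone sample sizes used in the regression are not restated here.

```latex
\[
R^2 = (R)^2, \qquad \text{e.g. North zone: } 0.554^2 \approx 0.307
\]
\[
R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}, \qquad p = 10
\]
```

Small differences in the third decimal place between a squared R and a reported R² are consistent with R having been rounded before reporting.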

Price, cleanliness, JSF, and hospitality play major roles in the customer service experience. Price includes the price of food, transportation, services, and accommodation, and it has a negative influence on CSE. Hospitality is another important factor in Indian tourism that plays a necessary role in CSE: the attitude and friendly nature of the local people attract tourists. Attraction has also become one of the most crucial factors influencing CSE. Most attractions are human-made, such as monuments, temples, gardens, and festivals. JSF has a positive impact on CSE because it gives tourists extra support with the purchased tourism product, e.g. flexibility in check-in and check-out at hotels and no transport costs for extra luggage.

In this study, we combined social media analytics with multivariate analysis to identify the hidden factors that are necessary to enhance customer service experience in India. Sentiment analysis and topic modelling were used to capture the sentiment of tourists and the themes of discussion in the data collected from Twitter. Our approach uses the power of qualitative analysis, in the form of content analysis, to analyse text data. To our knowledge, this approach of converting tourists’ discussions into factors that are then validated by multivariate analysis has not been implemented in the existing literature. We collected the data from a social media site, Twitter, instead of from surveys, which is a further point of novelty. The diversity of the important constructs that impact CSE also indicates the location specificity of tourists’ needs. Tourists’ expectations need to be met by local governments to promote tourism, and a national mandate would be less impactful in this context.

Practical Implications

In this study, we found that accessibility, accommodation, attraction, assurance, cleanliness, hospitality, JSF, price, restaurant, and security influence customer service experience. India is a geographically large country with 28 states and 9 union territories, so the effect of these factors on a state or union territory depends on the geographic location of the tourism destination. We therefore divided India into five zones and calculated the effect of each factor zone-wise. JSF is the most valuable of all the factors because it is significant in four of the five zones. The implication is that tourists who go to hotels, restaurants, and tourist places want flexibility in purchased services, such as check-in and check-out at hotels. Managers of hotels, restaurants, transport authorities, and tourist places may therefore be advised to provide flexibility in offline and online services. Flexibility (Momaya et al., 2016) in service at the destination motivates tourists to share positive word of mouth and recommendations and inspires them to revisit the same destination in the future.

When tourists plan a trip, they first calculate the overall cost of tourism products, which includes the price of food, the cost of accommodation, and transportation charges. If the price of tourism products increases, customer service experience decreases because tourists judge the tourism product against its price. Therefore, it is essential to design the pricing structure and objectives so as to set the role of price in the tourism market (Chua et al., 2015). In the tourism industry, safety and security are a prime concern for tourists, and social media plays a significant role in providing security information and raising awareness of crime. Tourists are likely to post online reviews of their tourism experience on social media, which help other tourists in decision-making about the destination. Eighty percent of tourists read online reviews before booking hotel rooms, and 53% would not want to stay in a hotel without consulting reviews (Tsao et al., 2015). India is a large country full of attractions such as tourist-friendly traditions, cultural heritage, colourful festivals, beaches, forests, mountains, and handicrafts; a focus on enhancing CSE at the destination could make these attractions more accessible so that tourists can readily enjoy them. Destination attraction has a direct effect on tourists and inspires word-of-mouth recommendation and destination loyalty.

Limitation and Future Research Direction

According to our review of the previous literature, this research is the first to explore the effects of these constructs on customer service experience using UGC in the context of the Indian tourism industry. It also opens new opportunities for future research. First, although we investigated social media posts regarding these ten constructs from the Twitter platform, this research by no means gives a universal solution. Therefore, this study calls on researchers to extend the investigation to other constructs and social media platforms. Second, our research examines user-generated content on social media for the Indian tourism industry. It would be beneficial to check whether our results generalize to the government-handled social media accounts of Indian tourism. Third, this study focuses only on the text content of social media posts regarding these constructs, and only content in the English language was extracted. It would be beneficial to check the impact of regional languages and image posts on customer service experience. Fourth, an in-depth analysis of the impact of image content and regional languages on tourists’ experience of tourism would be an exciting addition to the present research.

Conclusion

This research tries to provide information that can help the Indian tourism industry improve the overall customer service experience. Highly satisfied tourists generate higher revenues. India is a big country of 28 states and 9 union territories, each with its own heritage, culture, and geographic location. Therefore, each state has its own set of influencing factors for enhancing the overall customer service experience of tourists. In this study, only ten predictors were considered, and all ten variables predict the overall customer service experience according to the zones. Price, cleanliness, JSF, and hospitality are the most influential factors in the East, North, North-East, South, and West zones of India. The results obtained in the study are based on current information only. This study can help the Indian government in making policies for tourism in India. The government can set price standards for Indian and foreign tourists, which can help to improve the level of customer service experience, because our results show that price is a statistically significant factor in three of the five zones. Cleanliness and hospitality are other vital areas in which the government can make policies for conserving the natural and cultural resources of tourism. Education programmes can play an essential role in conveying the importance of the cultural and natural resources of tourism to people (Ryan, 2002). Hospitality is a fundamental cultural activity in the tourism industry for improving the overall customer service experience because it motivates tourists to give e-WOM publicity to the destination after the trip. Proper arrangements for restaurants, accommodation, waste management, and security can improve the satisfaction level of tourists, so the government can make policies for the security and cleanliness of the destination. Improvements in cleanliness and security at the destination, i.e. proper arrangement of toilets, accommodation, and restaurants, can also help the Indian tourism industry to compete with other nations. Whenever tourists decide to go on a trip, personal safety and security are the top priority. Jugaad provides an indigenous innovation for local use and gives extra support in tourism services at minimal cost. In the Indian context, we adopt services and products that are simple and useful. Jugaad (Prabhu & Jain, 2015) can play a vital role in government policy-making and in the development of the tourism industry (Parente & Prescott, 2002). This study therefore concludes that price, cleanliness, JSF, and hospitality are the most influential factors for improving the customer service experience. Although this study tries to understand the sentiment of tourists about Indian tourism, a lot of information is still available on social media in the form of images and text in regional languages. That unprocessed data can give us better accountability and a better understanding of the emotions of people, which in turn can be capitalized on in the decision-making process.