
1 Introduction

With the rapid development of Internet technology, tourism services are changing substantially. Modern travelers can access a large amount of travel information within seconds via the Internet, and this high availability of information brings considerable benefits to the tourism domain. Travelers use websites such as TripAdvisor, Expedia, and TourismThailand to plan their own trips. However, searching for tourism information is time-consuming, and finding or identifying the most relevant information about tourist attractions, routes, and points of interest, as well as planning each day of a trip, remains difficult. One solution to this problem is a recommendation system (RS) based on a knowledge graph (KG).

An RS is an information filtering system that addresses information overload by filtering vital fragments of information out of a large amount of dynamically generated information according to the user's preferences, interests, or observed behavior regarding items [1]. RSs are used in a variety of areas, including movies, restaurants, social tags, and products in general. Tourism is one of the most promising application areas for RSs. The authors of [2] collected tourism information from social media, extracted data, and computed similarity based on the relations among the information; their method achieved high effectiveness. In [3], an RS was constructed to generate road-trip itineraries, with activity durations and times of day that are more relevant to the user, using community opinion and location data. In [4], the authors applied machine learning algorithms, such as K-nearest neighbors, decision trees, and switching and weighted methods, to overcome the cold start problem in the tourism field.

The concept of a KG was popularized by Google. Its key technologies include the extraction of entities with their attribute information and of the relationships between entities [5]. In [6], a KG that exploits semantic information was presented for recommending travel attractions in a new city that the user is going to visit. In recent years, network representation learning has been proposed and has attracted considerable research interest, most notably through models such as word2vec [7], node2vec [8], and entity2rec [9]. It aims to learn low-dimensional representations of the vertices in a network while preserving the structure and inherent properties of the graph. We therefore propose a novel approach to organize and share tourism information at a large scale, using a KG to connect all tourism-related information, e.g., tourist attractions, locations, categories, and times, and to make it easily and universally accessible. To the best of our knowledge, we are the first to analyze the evolution of a KG and use this information to design a KG-based tourist attraction recommendation system in Thailand.

The data set used in this research consists of five years of tourism data for Bangkok, Thailand, collected from TripAdvisor and TourismThailand since 2014. This research has three objectives: (1) to analyze the tourism data set in order to explore the tourism domain, (2) to design and implement a tourist attraction KG that can represent the relationships among data and monitor and retrieve required information, and (3) to propose a KG-based tourist attraction recommendation model that adopts a flexible random walk procedure based on Node2Vec.

2 Design and Implementation of Tourist Attraction Knowledge Graph

The conceptual framework is shown in Fig. 1. Tourism data for Bangkok were collected from TripAdvisor and TourismThailand by a data aggregator; the last retrieval date was April 16, 2018. Specifically, we use the Knowledge of Information Retrieval tool (KnIR), developed in Java, as the data aggregator and then store the data in a centralized database.

Fig. 1. The conceptual framework

From the aggregated data, we define six fact attributes for tourist attractions: location, category, open time, user ID, rating, and comment. Our aim is to generate a graph that represents the relationships between tourist attractions and other related information as triples, such as (Chatuchak Weekend Market, Category of, Street Markets). Neo4j is used to generate the tourist attraction knowledge graph, as shown in Fig. 2.
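To make the triple representation concrete, the following is a minimal sketch, not the KnIR or Neo4j pipeline used in this work, of how one aggregated attraction record might be flattened into (head, relation, tail) triples and turned into Cypher `MERGE` statements. Only the relation "Category of" comes from the example above; the record classes, the relations "Located in", "Open time", and "Reviewed", and the node label `Entity` are hypothetical. Rating and comment are not turned into nodes here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Review:  # hypothetical record layout for one user review
    user_id: str
    rating: int
    comment: str


@dataclass
class Attraction:  # hypothetical record layout for one attraction
    name: str
    location: str
    category: str
    open_time: str
    reviews: List[Review] = field(default_factory=list)


def to_triples(a: Attraction) -> List[Triple]:
    """Flatten one attraction record into (head, relation, tail) triples."""
    triples = [
        (a.name, "Category of", a.category),  # relation name from the paper's example
        (a.name, "Located in", a.location),   # hypothetical relation name
        (a.name, "Open time", a.open_time),   # hypothetical relation name
    ]
    # One triple per review; rating and comment are not modeled as nodes here
    triples += [(r.user_id, "Reviewed", a.name) for r in a.reviews]
    return triples


def to_cypher(triple: Triple) -> Tuple[str, Dict[str, str]]:
    """Build a parameterized Cypher MERGE statement for one triple.

    Cypher does not accept relationship types as parameters, so the type is
    derived from the relation name and inlined; node names stay parameters.
    """
    head, relation, tail = triple
    rel_type = relation.upper().replace(" ", "_")
    query = ("MERGE (h:Entity {name: $head}) "
             "MERGE (t:Entity {name: $tail}) "
             f"MERGE (h)-[:{rel_type}]->(t)")
    return query, {"head": head, "tail": tail}
```

Each `(query, params)` pair could then be executed with a Neo4j driver session, e.g. `session.run(query, params)` in the official Python driver. Using `MERGE` rather than `CREATE` keeps an attraction or category that appears in many records as a single node.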

Fig. 2. The schema of the tourist attraction knowledge graph

Our data set contains 1,411 attractions and 66,461 reviews written by 41,765 users. There are 23 categories and 35 subcategories. The knowledge graph contains 87,544 triples in total.
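As a rough illustration of the sparsity problem addressed in Sect. 3, these counts can be turned into a density figure for the user-attraction interaction matrix. The calculation assumes, for illustration only, that every review links a distinct user-attraction pair, so the result is an upper bound:

$$ \text{density} \le \frac{66{,}461}{41{,}765 \times 1{,}411} = \frac{66{,}461}{58{,}930{,}415} \approx 1.13 \times 10^{-3} $$

In other words, at most about 0.11% of all possible user-attraction pairs are observed. On average, each user wrote about \( 66{,}461 / 41{,}765 \approx 1.59 \) reviews, while each attraction received about \( 66{,}461 / 1{,}411 \approx 47.1 \) reviews.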

3 Tourist Attraction Recommendation Model Based on Knowledge Graph

In this section, we design a flexible random walk procedure based on Node2Vec for tourist attraction recommendation. The network representation learning method Node2Vec is used to model tourist attractions. The node sequence model takes an attribute sub-graph as input to generate node sequences and outputs feature vectors of attractions and tourists, obtained by learning sequence features. Next, cosine similarity is used to calculate the correlation scores between tourists and attractions. Finally, we normalize the correlation scores to generate the recommendation list. This model is designed to alleviate the sparsity problem of tourist knowledge graphs. The framework is shown in Fig. 3.

Fig. 3. The framework of the system

3.1 Generate Node Sequences

First, the KG is divided into several sub-graphs, and the corresponding sequence \( S = [u_{1}, u_{2}, \ldots, u_{n}] \) is generated using a random walk strategy. Formally, given a source node \( u \), we simulate a random walk of flexible length \( l \), which is related to the scale of the sub-graph. Let \( u_{j} \) denote the \( j \)th node in the walk, starting with \( u_{0} = u \). Nodes \( u_{j} \) are generated by the following distribution:

$$ P\left( u_{j} = x \mid u_{j-1} = v \right) = \begin{cases} \dfrac{\pi_{vx}}{N} & \text{if } (v, x) \in E \\ 0 & \text{otherwise} \end{cases} \tag{1} $$

where \( E \) represents the set of edges in the knowledge graph, \( \pi_{vx} \) is the unnormalized transition probability between nodes \( v \) and \( x \), and \( N \) is the normalizing constant. Suppose the walk has just traversed edge \( (t, v) \) and now resides at node \( v \). To select the next node, the walk computes the transition probabilities \( \pi_{vx} \) on edges \( (v, x) \). The unnormalized transition probability is given by

$$ \pi_{vx} = \alpha_{pq}(t, x) \cdot w_{vx} \tag{2} $$

where \( w_{vx} \) is the weight of edge \( (v, x) \), which defaults to 1 in unweighted graphs, and \( \alpha_{pq}(t, x) \) is defined as

$$ \alpha_{pq}(t, x) = \begin{cases} \dfrac{1}{p} & \text{if } d_{tx} = 0 \\ 1 & \text{if } d_{tx} = 1 \\ \dfrac{1}{q} & \text{if } d_{tx} = 2 \end{cases} \tag{3} $$

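where \( d_{tx} \) is the shortest path distance between nodes \( t \) and \( x \), which takes a value in {0, 1, 2}. The parameters \( p \) and \( q \) are the return parameter and the in-out parameter, respectively. A small \( p \) makes the walk more likely to step back to \( t \), while \( q < 1 \) biases it toward nodes farther from \( t \) (DFS-like exploration) and \( q > 1 \) toward nodes close to \( t \) (BFS-like exploration).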
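The following is a minimal sketch of the biased walk defined by Eqs. (1)-(3), assuming an undirected sub-graph stored as a symmetric adjacency dictionary (`node -> {neighbor: w_vx}`). The function names are hypothetical, and how the walk length \( l \) is tied to the scale of a sub-graph is not specified here, so `length` is simply passed in by the caller.

```python
import random
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor x: edge weight w_vx}


def alpha(graph: Graph, t: str, x: str, p: float, q: float) -> float:
    """Search bias alpha_pq(t, x) of Eq. (3); x is always a neighbor of v."""
    if x == t:           # d_tx = 0: return to the previous node
        return 1.0 / p
    if x in graph[t]:    # d_tx = 1: x is also adjacent to t
        return 1.0
    return 1.0 / q       # d_tx = 2: move away from t


def transition_probs(graph: Graph, t: str, v: str, p: float, q: float) -> Dict[str, float]:
    """P(u_j = x | u_{j-1} = v) of Eq. (1), for a walk that arrived at v from t."""
    pi = {x: alpha(graph, t, x, p, q) * w for x, w in graph[v].items()}  # Eq. (2)
    norm = sum(pi.values())  # normalizing constant N
    return {x: val / norm for x, val in pi.items()}


def random_walk(graph: Graph, start: str, length: int, p: float, q: float,
                rng: random.Random) -> List[str]:
    """Simulate one biased walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        v = walk[-1]
        neighbors = list(graph[v])
        if not neighbors:  # dead end: stop early
            break
        if len(walk) == 1:
            # No previous node t yet, so sample by edge weight alone
            weights = [graph[v][x] for x in neighbors]
        else:
            probs = transition_probs(graph, walk[-2], v, p, q)
            weights = [probs[x] for x in neighbors]
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk
```

For example, on the toy graph with edges A-B, A-C, B-C, and C-D (all weights 1), a walk that has just moved from A to C with \( p = 2 \) and \( q = 0.5 \) gets unnormalized weights 0.5, 1, and 2 for A, B, and D, i.e. probabilities 1/7, 2/7, and 4/7.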

3.2 Learning Sequence Features

In Sect. 3.1, we divided the KG into several sub-graphs; here we use one of these sub-graphs as an example. Suppose \( S = [u_{1}, u_{2}, \ldots, u_{n}] \) is a node sequence of the sub-graph. We build the node sequence representation model using a three-layer neural network, whose objective is to maximize the following log-likelihood function:

$$ \sum_{j=1}^{n} \log \Pr\left( u_{j} \mid X^{u_{j}} \right) \tag{4} $$

where \( n \) denotes the number of nodes in the sequence \( S \), and \( X^{u_{j}} \) denotes a feature vector composed of the context nodes of \( u_{j} \).
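The sketch below only evaluates the objective of Eq. (4); it does not train anything. It assumes a CBOW-style word2vec architecture (input layer of context nodes, a hidden layer that averages their input vectors, and a full-softmax output layer), which is one reading of "three-layer neural network" rather than a confirmed detail of this model. The window size and the embedding dictionaries are hypothetical inputs; practical implementations replace the full softmax with negative sampling or hierarchical softmax.

```python
import math
from typing import Dict, Iterator, List, Sequence, Tuple

Embeddings = Dict[str, List[float]]  # node -> vector


def context_pairs(seq: Sequence[str], window: int) -> Iterator[Tuple[str, List[str]]]:
    """Yield (u_j, context of u_j): the nodes within `window` positions of u_j."""
    for j, u in enumerate(seq):
        lo, hi = max(0, j - window), min(len(seq), j + window + 1)
        yield u, [seq[k] for k in range(lo, hi) if k != j]


def log_prob(target: str, ctx: List[str], emb_in: Embeddings, emb_out: Embeddings) -> float:
    """log Pr(target | context) for a CBOW-style network with a full softmax."""
    dim = len(emb_in[ctx[0]])
    # Hidden layer: average of the context nodes' input vectors (X^{u_j})
    h = [sum(emb_in[c][i] for c in ctx) / len(ctx) for i in range(dim)]
    # Output layer: one score per node, turned into a log-softmax
    scores = {n: sum(a * b for a, b in zip(h, vec)) for n, vec in emb_out.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return scores[target] - log_z


def objective(seq: Sequence[str], window: int, emb_in: Embeddings, emb_out: Embeddings) -> float:
    """Eq. (4): sum over positions j of log Pr(u_j | X^{u_j})."""
    return sum(log_prob(u, ctx, emb_in, emb_out)
               for u, ctx in context_pairs(seq, window) if ctx)
```

With all-zero embeddings every node is equally likely, so each term equals \( -\log V \) for a vocabulary of \( V \) nodes; training would adjust the vectors to increase the sum.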

3.3 Recommend List Generation

In Sects. 3.1 and 3.2, tourists and attractions in the knowledge graph were represented as feature vectors. Here, v(attract) denotes the feature vector of an attraction and v(user) denotes the feature vector of a user. Once the feature vectors of attractions and users lie in the same vector space, we use the following formula to measure the correlation between a tourist and an attraction and generate the prediction score:

$$ \mathrm{Rel}(attract, user) = \mathrm{sim}\left( \mathbf{v}(attract), \mathbf{v}(user) \right) \tag{5} $$

where sim is the cosine similarity.
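Eq. (5) can be sketched in a few lines of Python. The function names are hypothetical, and returning 0 for a zero vector is our own convention, since cosine similarity is undefined in that case.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors (assumption)
    return dot / (norm_a * norm_b)


def rel(v_attract: Sequence[float], v_user: Sequence[float]) -> float:
    """Eq. (5): Rel(attract, user) = sim(v(attract), v(user))."""
    return cosine(v_attract, v_user)
```

Because cosine similarity ignores vector length, an attraction and a user are scored only by the direction of their embeddings, not by how often they appear in the walks.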

Finally, the system ranks attractions by prediction score and returns the resulting recommendation list.
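The normalization and ranking step can be sketched as follows. The text does not name the normalization method, so min-max scaling to [0, 1] is an assumption here, as is the optional `exclude` set (for example, attractions the user has already visited); the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale correlation scores to [0, 1] (assumed normalization)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}  # all scores equal: treat as equally relevant
    return {k: (s - lo) / (hi - lo) for k, s in scores.items()}


def recommend(scores: Dict[str, float], k: int,
              exclude: Iterable[str] = ()) -> List[Tuple[str, float]]:
    """Return the top-k (attraction, normalized score) pairs, best first."""
    norm = normalize(scores)
    skip = set(exclude)
    ranked = sorted((a for a in norm if a not in skip),
                    key=lambda a: norm[a], reverse=True)
    return [(a, norm[a]) for a in ranked[:k]]
```

Min-max scaling does not change the ranking produced by the raw cosine scores; it only maps them to a common [0, 1] range, which is convenient if scores from different users or sub-graphs are later compared or combined.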

4 Conclusion and Future Work

In this paper, we designed and implemented a tourist attraction knowledge graph that can represent the relationships among data and monitor and retrieve required information. We also proposed a knowledge-graph-based tourist attraction recommendation model that adopts a flexible random walk procedure based on Node2Vec. First, we collected tourism data for Bangkok from TripAdvisor and TourismThailand using the KnIR tool as the data aggregator and used Neo4j to generate the tourist attraction knowledge graph. Next, the network representation learning method Node2Vec was applied to obtain the feature vectors of attractions and tourists, and cosine similarity was used to calculate the correlation scores between tourists and attractions. Finally, we normalized the correlation scores to generate the recommendation list. This model is designed to alleviate the sparsity problem of tourist knowledge graphs and is intended to be applicable to large-scale knowledge graphs. In future work, we will extend our knowledge graph to cover all cities in Thailand and develop a recommendation system for the whole country.