Voice of urban park visitors: exploring destination attributes influencing behavioural intentions through online review mining

In this paper, we identify the destination attributes of a popular urban park and investigate their specific roles in forming visitors' behavioural intentions using text mining approaches. The principles of natural language processing and psychometric procedures were combined to achieve the research objectives. Initially, park visitors' online reviews were collected and analysed to identify possible latent dimensions for questionnaire design. Then, exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were used for crucial factor selection and verification. Lastly, a structural equation model (SEM) was constructed to investigate the impacts of these park attributes on the behavioural intention of visitors.


Introduction
As guardians of the city's natural environment, urban parks are indispensable public resources for the sustainable development of urban ecosystems [1]. However, park managers and planners often lack an adequate understanding of visitors' experiences. Post-occupancy evaluation (POE) bridges this gap by forming a feedback loop that connects the two parties. User feedback can inform the functional comfort and environmental stress of a built environment [2], which has recently become a focus of POE research [3]. Nowadays, the massive volume of user-generated content (UGC) on travel websites can be regarded as passionate, insightful and spontaneous reviews by visitors [4]. It can be taken as a rich source of online user feedback that helps in understanding users' real preferences and needs [5,6].
Moreover, in empirical research, a pilot survey is crucial for scale design. It must identify the core dimensions of influential factors comprehensively and concisely [7]. However, traditional pilot survey methods often rely solely on on-site interviews or observations. Due to the limited sample size, it is challenging to locate visitors' overall concerns accurately, and some essential dimensions may be overlooked. Furthermore, most original items were adapted from prior literature in similar disciplines, which may lead to inconsistencies with the current research scenario [8]. Therefore, in the pilot survey, we suggest identifying the necessary attribute factors from the rich source of UGC.
Beibei Jiang, jiangbeibei@tom.com, College of Landscape Architecture, Sichuan Agricultural University, Chengdu 611130, Sichuan, China
Evaluating park usage and understanding its driving factors are important for expanding park usage and, subsequently, human well-being. Past studies investigated the impacts of various physical and cultural variables on park use through visitor surveys and direct observations of park users, which are typically site-specific and time-consuming. Many studies measured and analysed visitors' behavioural intentions for various kinds of parks using freely available geotagged check-in data from social media. Some authors used multiple linear regressions to explore how park attributes, location, context and transport influenced visitors' behavioural intentions. Despite potential biases in the use of social media data, analyses based on a park typology showed that visitors' behavioural intentions varied significantly between different kinds of parks. Although cultural relics parks and large urban parks had distinctive visitors' behavioural intentions, neighbourhood parks had higher visitation rates per unit of area. Park size and entrance fees were related to visitors' behavioural intentions for all types of parks. For parks that mostly serve neighbourhood inhabitants, the distance from the urban centre significantly affected park usage [5,6,7,8].
In this study, we explored the key attributes of an urban park that may influence visitors' behavioural intentions and the relative impact weight of each park attribute on behavioural intention. The means-end theory was applied in this research context, and natural language processing (NLP) and psychometric procedures were used in combination in the research process.
Many prior studies have explored the role of a broad range of factors affecting park use behaviours, such as environmental quality, income, accessibility, demographic characteristics and individual preferences [7]. Nevertheless, few of those studies were based on established cognitive or behavioural theories. Subsequently, Zhang and Tan studied the determinants of park-use behaviour based on the Theory of Planned Behavior (TPB) [9]. Han explored the relation between halal-friendly destination attributes and revisit intention based on the complexity theory [9]. Thus far, to the best of our knowledge, little research has examined the relationship between destination attributes and behavioural intention based on the means-end theory in park POE.
Destination attributes are recognized as an amalgam of different elements of the destination [10]. Visitors can perceive a variety of natural or artificial destination attributes [11], which help form a visitor's on-site experience. In many prior studies, destination attributes are viewed as the primary and critical antecedent of behavioural intention [9,12,13].
Behavioural intention is assumed to be the likelihood of taking specific actions based on one's subjective tendencies [14]. Revisit intention and recommendation intention are both essential components of behavioural intention [15]. In many studies, behavioural intention was the last factor in the perception chain. It is a crucial indicator for evaluating visitors' loyalty to a destination [9,12,13]. Favourable behavioural intentions usually represent the conative loyalty of visitors, which is critical for a destination's long-term viability and sustainability. Loyal visitors are more likely to return to the destination and give favourable word-of-mouth (WOM) to their relatives, friends or other potential visitors [16,17]. In practice, the actual behaviours of loyal visitors are often difficult to measure; thus, most studies employed behavioural intention as a proxy for them [18].
The means-end theory assumes that people are goal-oriented and that they achieve individual values by purchasing particular attributes of a product or service [19]. It provides a practical perspective for exploring the impacts of different park attributes on a visitor's behaviour. For example, a visitor may be especially satisfied due to a unique set of features provided by the facilities or services of a park. Therefore, this study tries to extend the means-end theory to the context of park POE to explore the attributes that may significantly influence visitors' behaviours.
The rest of the paper is organized as follows: In the second section, we discuss the methodology behind the proposed method, as well as the entire research process. In Sect. "Numerical results", we present the numerical results and compare them to some well-known algorithms, with specific attention given to the demographic characteristics in Sect. "Demographic characteristics" and the exploratory factor analysis and the confirmatory factor analysis in Sects. "Exploratory factor analysis" and "Confirmatory factor analysis", respectively. Structural equation modelling analysis is discussed in Sect. "Confirmatory factor analysis". Finally, after the algorithm discussion in Sect. "Discussion", the final section provides the conclusion and future research remarks.

Methodology
In this research, Wangjiang Pavilion Park, which lies next to the south bank of the Jinjiang River and Sichuan University in Chengdu, was taken as a case study. The park was built to commemorate the Tang Dynasty poet XueTao, who cherished bamboo all her life and praised its personified noble quality. So far, more than 200 species of bamboo have been planted in the park, and the 39-m-high Wangjiang Pavilion is a landmark of the area [20]. The park has now become a popular place for travel and recreation. According to official information, the number of daily visits has grown steadily, exceeding an average of 3000 per day and reaching over 7000 per day on weekends.
The entire research process includes two significant steps:
1. Collect online reviews and use text analysis to identify the latent dimensions, specifically the following:
• text pre-processing,
• word frequency analysis,
• word co-occurrence network analysis,
• sentiment analysis and
• latent Dirichlet allocation (LDA) topic analysis.
2. Conduct the questionnaire survey.
• Perform exploratory factor analysis and confirmatory factor analysis.
• Build a structural equation model to evaluate the impacts of park destination attributes on visitors' behavioural intentions.
The overall research flow chart is shown in Fig. 1.

Text analysis and latent dimension identification
Text analysis is a type of natural language processing (NLP) technology. NLP has gone through an array of technologies such as naive Bayes, TF-IDF, word2vec, LDA, LSTM, fastText, BERT and even the latest ALBERT [21,22], which have vastly improved the quality and efficiency of text analysis. The analysis processes of recent research on visitors' behaviours can be classified into two categories. The first category was based entirely on online review analysis, which extracted features only from the text for investigation [6,23,24]. However, such processes have several drawbacks:
• They are less pertinent than methods using questionnaires.
• For unstructured text such as reviews or blogs, user ratings are unavailable.
• Self-selection bias may be included in the analyses [5].
The second category relied solely on questionnaires [7,25,26], which could not make full use of big textual data to obtain the most representative features of the population. Therefore, we proposed a two-step approach in this research.
Step one is to use text analysis to identify the latent dimensions of online reviews in the pilot survey.
Step two is to survey the target population and build a structural equation model (SEM) for empirical analysis. In this way, we can take full advantage of both methods.
In this study, we employed multiple text analysis methods, including word frequency statistics, word co-occurrence network, sentiment analysis and latent Dirichlet allocation (LDA). The target population is park visitors (including tourists and residents). We collected 2,435 reviews from nine Chinese travel websites or blogs by searching for the keyword "Wangjiang Pavilion Park". The nine UGC source websites, together with their rankings and numbers of reviews, are presented in Table 1.
We used Baidu Lexical Analysis of Chinese (LAC) for word segmentation. LAC is a lexical analysis model based on a recurrent neural network (RNN). Its accuracy on the test set can reach 95.5% [27], and it can complete tasks such as word segmentation, part-of-speech (POS) labelling and word stemming. Jieba (a word segmentation package for Python 3.7) has much better runtime performance than LAC, but its accuracy is lower. Thus, we first used LAC to generate a lexicon and then used Jieba for sentence-by-sentence word segmentation. Two students manually inspected the extracted keywords and removed stop words and meaningless words [28]. We also merged synonyms to improve the word segmentation performance. Some frequently used keywords are presented in Table 2, along with their respective numbers of occurrences. We used Python 3.7 to calculate the word co-occurrence matrix, which records the number of times each pair of keywords appeared together in the same review. Then, we used the NetDraw module in the UCINET software to draw the word co-occurrence network graph and observe its clusters and correlations [29]. The keyword co-occurrence network is depicted in Fig. 2.
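The keyword cleaning and co-occurrence counting described above can be illustrated with a minimal sketch. It assumes the reviews are already segmented into tokens; the stop-word set, the synonym map and the toy reviews are hypothetical placeholders and are not the lists used in this study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical cleaning resources; the lists used in the study are not given.
STOP_WORDS = {"the", "a", "very"}
SYNONYMS = {"bamboo grove": "bamboo forest"}  # variant -> canonical keyword


def clean(tokens):
    """Merge synonyms, drop stop words, and keep each keyword once per review."""
    merged = (SYNONYMS.get(tok, tok) for tok in tokens)
    return sorted({tok for tok in merged if tok not in STOP_WORDS})


def cooccurrence(segmented_reviews):
    """Count keyword frequencies and review-level keyword pair co-occurrences."""
    freq, pairs = Counter(), Counter()
    for tokens in segmented_reviews:
        keywords = clean(tokens)
        freq.update(keywords)
        # Sorted keywords make (a, b) and (b, a) map to the same key.
        pairs.update(combinations(keywords, 2))
    return freq, pairs


# Toy, already-segmented reviews (invented for illustration).
reviews = [
    ["bamboo forest", "very", "cool", "ticket"],
    ["bamboo grove", "ticket", "the", "pavilion"],
    ["pavilion", "bamboo forest"],
]
freq, pairs = cooccurrence(reviews)
print(freq.most_common(3))
print(pairs[("bamboo forest", "ticket")])
```

The pair counts form a sparse, symmetric co-occurrence matrix that could be exported as an edge list for a network drawing tool.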
As a result, we identified five node groups, where semantically related nodes were labelled with the same colours:
1. Group one: nodes connected to the bamboo forest landscape (green);
2. Group two: nodes related to culture and heritage (purple);
3. Group three: nodes associated with people's activities in the park (yellow);
4. Group four: nodes related to the adjacent Sichuan University (orange);
5. Group five: nodes associated with the entrance ticket (red).
Since visitors only give specific comments on the topics that concern them most, the keyword distribution showed a "long tail" pattern [30]. Some low-frequency words may still contain valuable information. Therefore, we inspected them and found some essential keywords such as "older people", "cosplay", "pleasantly cool" and "fresh air".
Also, we used the Bi-LSTM model of the Baidu Senta platform for sentiment analysis. Senta has been pre-trained on large corpora and can give sentiment scores ranging from 0 to 1 for specific texts [31]. The sentiment analysis showed that the keyword "ticket fee" carried a strong sentiment score; we therefore included "ticket charge" as an item in the questionnaire.
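One possible way to locate keywords with strong sentiment is sketched below. It does not call Senta; it assumes each review already has a score between 0 and 1 from a sentiment model, and it measures intensity as the mean distance from the neutral value 0.5. Both the intensity measure and the toy data are assumptions made for illustration, not the procedure used in this study.

```python
from collections import defaultdict
from statistics import mean


def keyword_intensity(scored_reviews, keywords):
    """Mean distance from the neutral score 0.5 over reviews mentioning each keyword."""
    hits = defaultdict(list)
    for text, score in scored_reviews:
        for kw in keywords:
            if kw in text:
                hits[kw].append(abs(score - 0.5))
    return {kw: mean(vals) for kw, vals in hits.items()}


# Invented (text, score) pairs standing in for the output of a sentiment model.
scored = [
    ("ticket fee is too high", 0.05),
    ("ticket fee is not worth it", 0.10),
    ("nice bamboo and a quiet walk", 0.70),
]
intensity = keyword_intensity(scored, ["ticket fee", "bamboo"])
ranked = sorted(intensity, key=intensity.get, reverse=True)
print(ranked)
```

Under this assumed measure, a low-frequency keyword can still rank first if the reviews that mention it are strongly polarised.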
Finally, we used the LDA algorithm to extract topics from online reviews. LDA is an unsupervised Bayesian algorithm [32], which automatically extracts text topics without manual labelling. Therefore, it is suitable for identifying latent dimensions from a large volume of unstructured texts [4,33]. We performed this analysis using the LDA module from the Gensim package. However, the user needs to set the number of topics in advance. Therefore, we used the coherence model to obtain the optimal number of topics [34]. The coherence values were calculated for topic numbers from 2 to 100 (Table 3).
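The topic number selection can be sketched as follows. The selection step is a plain function, and the Gensim-backed scorer is imported lazily so that the selection logic can be run on its own. The coherence measure ("c_v"), the number of passes and the random seed are assumptions, since the settings used in this study are not stated; the toy scores are invented.

```python
def best_topic_number(coherence_for, candidates):
    """Return the candidate k with the highest coherence, plus all scores."""
    scores = {k: coherence_for(k) for k in candidates}
    return max(scores, key=scores.get), scores


def gensim_coherence(texts, passes=10, seed=0):
    """Build a k -> coherence function backed by Gensim (imported lazily)."""
    from gensim.corpora import Dictionary
    from gensim.models import CoherenceModel, LdaModel

    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(doc) for doc in texts]

    def coherence_for(k):
        lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                       passes=passes, random_state=seed)
        # "c_v" is an assumed choice of coherence measure.
        cm = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                            coherence="c_v")
        return cm.get_coherence()

    return coherence_for


# With real data (texts = list of token lists), the 2..100 range would be:
#   k, scores = best_topic_number(gensim_coherence(texts), range(2, 101))

# Stand-in scores so the selection step can be run without Gensim.
toy_scores = {2: 0.41, 3: 0.48, 4: 0.52, 5: 0.47}
best_k, _ = best_topic_number(toy_scores.get, toy_scores)
print(best_k)
```

Because LDA training is stochastic, fixing the seed and averaging the coherence over several runs may give a more stable choice.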

Questionnaire design and survey
The questionnaire was built based on the pilot survey results. A 5-point Likert scale was employed for measuring the model items. The initial version of the questionnaire was written in English. Then, it was translated into Chinese using the back-translation method and reviewed by a native Chinese speaker to ensure that the meaning was clearly delivered. The questionnaire has three parts. The first part briefly presented the survey guidance on the privacy and anonymity protection of the respondent. The second part contained survey questions for measuring the items in the model. Among them, the behavioural intention construct, containing three items, was adapted from prior research [9,12]. The items of the park destination attributes were based on the results of the pilot survey and used operational definitions. The final part was a demographic survey of participants, including age, gender and education.
This study adopted convenience sampling. A total of 312 visitors were invited to the survey, and 306 of them completed the questionnaire. The investigator first gave a brief introduction to the research and then surveyed the respondent with the respondent's permission. The data collection process lasted for 3 weeks, from March to April 2019. After screening, a total of 299 valid cases were retained. According to the rule of 10:1 observations per indicator [35], the 299 samples met the relevant criteria.

Exploratory factor analysis
We used IBM SPSS 17.0 to perform exploratory factor analysis (EFA) on park destination attributes [36], applying principal component analysis (PCA) and a rotated component matrix to determine the underlying factors and groups. The Kaiser-Meyer-Olkin (KMO) value was 0.815 (> 0.8), and Bartlett's test of sphericity was significant (p < 0.001), which validated the adequacy of EFA [37]. After removing items with cross-loading or low factor loading (< 0.40) [38], 14 attribute items were retained (Table 5). The eigenvalues of the four extracted factors were all greater than 1, accounting for approximately 78.601% of the total variance. The items were grouped and named according to the rotated component matrix. As shown in Table 5, factor one is denoted as "Ecological Environment" and consisted of four items, accounting for 18.021% of the total variance. Factor two is denoted as "Culture and Heritage" and consisted of four items, accounting for 17.167% of the total variance. Factor three is named "Service Facilities" and consisted of three items, accounting for 14.847% of the total variance. Factor four is designated as "Bamboo Forest Landscape" and consisted of three items, accounting for 14.183% of the variance. The factor loadings of all 14 items exceeded the threshold of 0.50 [39]. Cronbach's α was used to check the internal consistency of the items in each group. The values were α1 = 0.889, α2 = 0.872, α3 = 0.897 and α4 = 0.870, all greater than 0.70, indicating that they met the relevant criteria [40].
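The internal consistency check can be illustrated with a minimal sketch of Cronbach's α, computed as k/(k - 1) × (1 - Σ item variances / variance of the summed scale). The Likert responses below are invented, and the function is a generic textbook implementation rather than the SPSS routine used in this study.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of respondent scores per questionnaire item."""
    k = len(items)
    # Total score of each respondent across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Invented 5-point Likert responses: 3 items x 6 respondents.
factor_items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [5, 5, 2, 4, 3, 4],
]
alpha = cronbach_alpha(factor_items)
print(round(alpha, 3), alpha > 0.70)
```

A value above 0.70 for each factor corresponds to the criterion applied in the text.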
Therefore, we proposed the following four hypotheses:
H1 Bamboo forest landscape attributes impact behavioural intention significantly and positively.
H2 Cultural and heritage attributes impact behavioural intention significantly and positively.
H3 Service facilities attributes impact behavioural intention significantly and positively.
H4 Ecological environment attributes impact behavioural intention significantly and positively.

Confirmatory factor analysis
We used IBM AMOS 22.0 for confirmatory factor analysis (CFA) to verify the reliability and the discriminant and convergent validity of the measurement model [41]. Table 6 gives the results of the CFA. The standardized factor loading of each item was greater than 0.70 and less than 0.95, which met relevant standards [42]. The goodness-of-fit statistics of the measurement model were: χ² = 105.266, df = 71, χ²/df = 1.483, p = 0.005, RMSEA = 0.04, CFI = 0.985, GFI = 0.952, NFI = 0.956. These values indicated that the proposed measurement model fitted the data appropriately [43].
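Two of the reported fit statistics can be reproduced from the reported χ², df and the 299 valid cases. The sketch below assumes the common point-estimate formula RMSEA = sqrt(max(χ² - df, 0) / (df × (N - 1))); the exact convention applied by the software may differ.

```python
from math import sqrt


def chi2_df_ratio(chi2, df):
    """Normed chi-square."""
    return chi2 / df


def rmsea(chi2, df, n):
    """Point estimate of RMSEA using the (N - 1) convention (an assumption)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


chi2, df, n = 105.266, 71, 299  # values reported in the text
ratio = chi2_df_ratio(chi2, df)
fit = rmsea(chi2, df, n)
print(round(ratio, 3), round(fit, 3))
```

Both values agree with the reported χ²/df and RMSEA after rounding.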
Moreover, the composite reliability (CR) values exceeded the required cutoff of 0.70, which indicated that these items were internally consistent and reliable [39]. The average variance extracted (AVE) values all exceeded the 0.50 threshold, which met the relevant criteria [39]. As shown in Table 7, each correlation coefficient was below the square root of the corresponding AVE [39], supporting discriminant validity. The measurement model is given in Fig. 4.
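The CR, AVE and discriminant validity checks can be sketched from standardized loadings using the usual formulas CR = (Σλ)² / ((Σλ)² + Σ(1 - λ²)) and AVE = Σλ² / n. The loadings, factor labels and correlation below are hypothetical; the actual values are those reported in Tables 6 and 7.

```python
from math import sqrt


def composite_reliability(loadings):
    """CR from standardized loadings, with error variance 1 - loading^2."""
    total = sum(loadings) ** 2
    error = sum(1 - l ** 2 for l in loadings)
    return total / (total + error)


def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def fornell_larcker_ok(ave_by_factor, correlations):
    """True if every inter-factor correlation is below both factors' sqrt(AVE)."""
    return all(
        abs(r) < min(sqrt(ave_by_factor[a]), sqrt(ave_by_factor[b]))
        for (a, b), r in correlations.items()
    )


# Hypothetical standardized loadings for two factors labelled A and B.
loadings = {"A": [0.80, 0.85, 0.78], "B": [0.82, 0.79, 0.88]}
ave = {f: average_variance_extracted(l) for f, l in loadings.items()}
cr = {f: composite_reliability(l) for f, l in loadings.items()}
print(cr, ave, fornell_larcker_ok(ave, {("A", "B"): 0.45}))
```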
As shown in Table 8 and Fig. 5, the bamboo forest landscape attributes are positively and significantly correlated with behavioural intention (H1: β1 = 0.261, t = 4.114, p < 0.001); the cultural and heritage attributes are positively and significantly correlated with behavioural intention (H2: β2 = 0.252, t = 4.165, p < 0.001); the service facilities attributes are positively and significantly correlated with behavioural intention (H3: β3 = 0.157, t = 2.798, p = 0.005); and the ecological environment attributes are positively and significantly correlated with behavioural intention (H4: β4 = 0.308, t = 5.007, p < 0.001). The outcomes of the SEM analysis indicated that the park attributes impacted behavioural intention significantly.
Among them, the ecological environment attributes have the largest effect (β4 = 0.308). Their sub-items include micro-climate environment, acoustic environment, air quality and bird habitat. These are related to the rich ecological functions of the bamboo forest [44]. The second is the bamboo forest landscape attributes (β1 = 0.261). Their sub-items are bamboo varieties, bamboo garden and bamboo architecture. The third is the cultural and heritage attributes (β2 = 0.252). Their sub-items include park history, cultural figures, ancient architecture and leisure culture. Among them, the cultural figure mainly refers to the poet XueTao. The fourth is the service facilities attributes (β3 = 0.157). Their sub-items are ticket fees, activity platform for older people and activity platform for young people. According to the on-site observation, young people mainly like to engage in cosplay activities in the park, and middle-aged or older people prefer various exercise activities such as strolling, Tai Chi, yoga, dancing and Yoyo.
The R² of behavioural intention is 0.32. To sum up, the proposed structural equation model has a good ability to explain the correlations among the variables.
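The ranking of the four attributes can be reproduced directly from the reported standardized path coefficients. The sketch below also expresses each coefficient as a share of their sum; this normalisation is only one assumed way to present a relative impact weight and is not a statistic reported in this study.

```python
# Standardized path coefficients reported in the text.
betas = {
    "Bamboo forest landscape": 0.261,
    "Culture and heritage": 0.252,
    "Service facilities": 0.157,
    "Ecological environment": 0.308,
}

total = sum(betas.values())
# Assumed presentation: each coefficient as a share of the summed coefficients.
weights = {name: beta / total for name, beta in betas.items()}
ranking = sorted(betas, key=betas.get, reverse=True)

for name in ranking:
    print(f"{name}: beta={betas[name]:.3f}, share={weights[name]:.1%}")
```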

Discussion
By incorporating NLP technologies into the pilot survey, we extracted the latent dimensions from online reviews. The outcomes of the SEM analysis confirmed the validity of the text analysis methods. Sentiment analysis was found quite useful for locating key concerns of visitors. Although some keywords were distributed in the low-frequency zone, their emotions were very intense and could be recognized by sentiment analysis. Although manual analysis of online reviews may be more accurate than using machine learning methods [5], it is almost impossible to accomplish the work when faced with the massive amount of online user-generated content. Therefore, this study used a pre-trained Baidu Senta Bi-LSTM model to perform this task [31].
Furthermore, many studies relied on users' online ratings as a primary quantitative indicator [5,24,45]. Nonetheless, ratings are often unavailable for the soaring number of unstructured online texts. Hence, unsupervised topic models have been increasingly adopted [4,6]. However, the analysis granularity of LDA was still relatively coarse in this study, and it was impossible to identify more subtle items. Therefore, we also used word frequency analysis and word co-occurrence networks [29] to extract latent dimensions and found this combination effective.
Consequently, it is necessary to incorporate multiple text analysis methods into one study. Also, we performed a manual inspection of keywords and merged some synonyms to improve the expression of the word frequency distribution. Therefore, to increase efficiency, the introduction of a synonym merging mechanism may be worthy of attention in future studies.
The trends in POE research indicated that end-user feedback was critical for improving the quality of the built environment [3]. Tveit argued that people's perception of the landscape was at the heart of the European Landscape Convention [46]. The outcomes of this study indicated a significant association between the inner behavioural intentions of visitors and the outer park attributes. From the visitors' overall perception, we can trace the attributes that may significantly affect their behaviours. Meanwhile, the relative impact weight of each destination attribute on behavioural intention was revealed through the SEM analysis. Moreover, the approach makes it possible to explore more specific attributes in further detail.
[Figure: The diagram of the SEM analysis]
Furthermore, the market competition in scenic spots is becoming increasingly fierce. It is imperative to enhance the differentiated competitiveness of the destination. The common sustainable indicators of urban parks have been thoroughly investigated [47], but research on how to explore the unique attributes of a typical park destination is relatively limited. The two-step analysis approach proposed in this research can help to explore the key attributes of a successful case and to provide references for future designs.

Conclusion
This study attempted to explore the significant park attributes influencing visitors' behavioural intentions in the park POE. Wangjiang Pavilion Park was taken as a case study. We combined both the natural language processing (NLP) technology and psychometric test procedure into the research process, which allowed us to listen to the voice of visitors more effectively. Text analysis was initially conducted on online reviews to identify the latent park attributes. The extracted attributes were further examined and grouped by exploratory factor analysis (EFA). Then, the reliability, discriminant and convergent validity of these factors were verified by confirmatory factor analysis (CFA). Last, a structural equation model was constructed to include the selected variables and calculate the relations and impacts among them. The overall outcomes revealed four main dimensions: bamboo forest landscape, ecological environment, culture and heritage and service facilities. The SEM analysis indicated that the impacts of these four dimensions on behavioural intention were positive and significant, which is favourable for increasing visitors' repeat visitation and recommendation behaviours.
In future research, incorporating multiple text analysis methods into one study deserves further attention. Since we manually inspected the keywords and merged some synonyms to improve the expression of the word frequency distribution, a synonym merging mechanism could also be explored to increase efficiency.
Also, in future designs, the key attributes of a successful case can be explored using the two-step analysis approach proposed in this research.