Abstract
Web search query data reflect social hot spots and can serve as novel economic indicators. Because query data are high-dimensional, it is critical to select keywords that have plausible predictive ability and thereby reduce dimensionality. This paper presents an integrative method that combines Hurst Exponent (HE) and Time Difference Correlation (TDC) analysis to select keywords with strong predictive ability. The method, called the HE-TDC screening method, requires a predictive keyword to satisfy two conditions: a high correlation with the target series to be predicted, and a fluctuation memory similar to that of the target series. An empirical study applies the method to predicting the volume of daily tourism visitors in the Jiuzhai Valley scenic area. The results show that keywords selected with the HE-TDC method produce a model with better robustness and predictive ability.
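The two screening conditions named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the Hurst exponent is estimated or which thresholds are used, so the classical rescaled-range (R/S) estimator, the function names (hurst_rs, best_lagged_corr, screen_keywords) and the cutoffs r_min and h_tol are all assumptions made here for illustration.

```python
import numpy as np


def hurst_rs(series, min_window=8):
    """Estimate the Hurst exponent with classical R/S analysis (assumed estimator)."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    # Window sizes double from min_window up to half the series length.
    sizes = []
    w = min_window
    while w <= n // 2:
        sizes.append(w)
        w *= 2
    log_w, log_rs = [], []
    for w in sizes:
        ratios = []
        for start in range(0, n - w + 1, w):
            chunk = x[start:start + w]
            # Cumulative deviation from the chunk mean, then range over std.
            profile = np.cumsum(chunk - chunk.mean())
            spread = chunk.std()
            if spread > 0:
                ratios.append((profile.max() - profile.min()) / spread)
        if ratios:
            log_w.append(np.log(w))
            log_rs.append(np.log(np.mean(ratios)))
    if len(log_w) < 2:
        raise ValueError("series too short for an R/S estimate")
    # The slope of log(R/S) against log(window size) is the estimate of H.
    slope, _intercept = np.polyfit(log_w, log_rs, 1)
    return float(slope)


def best_lagged_corr(keyword, target, max_lag=7):
    """Return (lag, r) with the largest |r| when the keyword leads the target by lag."""
    k = np.asarray(keyword, dtype=float)
    y = np.asarray(target, dtype=float)
    best_lag, best_r = 0, 0.0
    for lag in range(max_lag + 1):
        # Pair keyword[t] with target[t + lag].
        r = np.corrcoef(k[:len(k) - lag], y[lag:])[0, 1]
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, float(r)
    return best_lag, best_r


def screen_keywords(keywords, target, r_min=0.6, h_tol=0.1, max_lag=7):
    """Keep keywords that pass both checks; r_min and h_tol are hypothetical cutoffs."""
    h_target = hurst_rs(target)
    selected = []
    for name, series in keywords.items():
        _lag, r = best_lagged_corr(series, target, max_lag)
        if abs(r) >= r_min and abs(hurst_rs(series) - h_target) <= h_tol:
            selected.append(name)
    return selected
```

Here keywords is assumed to be a dictionary that maps each keyword to its search-volume series, aligned in time with the target series. A keyword survives only if its best lagged correlation is strong and its estimated Hurst exponent lies close to that of the target, which mirrors the two conditions stated above.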
Author information
Additional information
Geng Peng received a BS from North China Electric Power University in 1992, an MS from Wuhan University of Science and Technology in 1998, and a PhD from Tianjin University in 2001. He is an associate professor at the School of Economics and Management, University of Chinese Academy of Sciences (UCAS). He has more than 15 years of teaching experience, and his current research areas include e-commerce and Internet data analysis.
Ying Liu received a BS from Jilin University in 2006, and an MS and a PhD from the University of Chinese Academy of Sciences in 2008 and 2011, respectively. He is an associate professor at the School of Economics and Management, UCAS. His research interests focus on e-commerce, the Internet economy and Internet data analysis.
Jiyuan Wang received a BS from Nanjing University of Aeronautics and Astronautics in 2012. He is a PhD student at UCAS and the University of Groningen in the Netherlands. His main research interests focus on econometrics.
Jifa Gu received his BS from Peking University and his PhD from the Institute of Mathematics, USSR Academy. He is a professor at the Institute of Systems Science, Chinese Academy of Sciences (CAS), and an academician of the International Academy of Systems Science and Cybernetics. His main interests are operations research, systems engineering and systems science.
About this article
Cite this article
Peng, G., Liu, Y., Wang, J. et al. Analysis of the prediction capability of web search data based on the HE-TDC method ‒ prediction of the volume of daily tourism visitors. J. Syst. Sci. Syst. Eng. 26, 163–182 (2017). https://doi.org/10.1007/s11518-016-5311-7
DOI: https://doi.org/10.1007/s11518-016-5311-7