Data mining for unemployment rate prediction using search engine query data

Special Issue Paper


Unemployment rate prediction has become critically significant because it helps governments make decisions and design policies. In previous studies, traditional univariate time series models and econometric methods for unemployment rate prediction have attracted much attention from governments, organizations, research institutes, and scholars. Recently, novel methods that use search engine query data have been proposed to forecast the unemployment rate. In this paper, a data mining framework using search engine query data for unemployment rate prediction is presented. Under the framework, a set of data mining tools including neural networks (NNs) and support vector regressions (SVRs) is developed to forecast the unemployment trend. In the proposed method, search engine query data related to employment activities are first extracted. Second, a feature selection model is used to reduce the dimension of the query data. Third, various NNs and SVRs are employed to model the relationship between unemployment rate data and query data, and a genetic algorithm is used to optimize the parameters and refine the features simultaneously. Fourth, the most appropriate data mining method is selected as the predictor by cross-validation. Finally, the selected predictor, with the best feature subset and proper parameters, is used to forecast the unemployment trend. The empirical results show that the proposed framework clearly outperforms the traditional forecasting approaches, and that support vector regression with a radial basis function (RBF) kernel performs best for unemployment rate prediction. These findings imply that the data mining framework is efficient for unemployment rate prediction and that it can strengthen the government's quick-response and service capabilities.
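
The select-then-validate flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the data are synthetic, a correlation filter stands in for the feature selection model, an RBF kernel-weighted average stands in for the NNs and SVRs, and a small grid over the hypothetical kernel width `gamma` stands in for the genetic algorithm. All function names are invented here.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def select_features(X, y, k):
    """Filter step: indices of the k columns most correlated with y."""
    n_cols = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_cols)]
    return sorted(sorted(range(n_cols), key=lambda j: -scores[j])[:k])

def rbf_predict(train_X, train_y, x, gamma):
    """RBF kernel-weighted average of training targets (a stand-in, not an SVR)."""
    weights = [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(row, x)))
               for row in train_X]
    total = sum(weights)
    if total == 0.0:  # every weight underflowed: fall back to the mean
        return sum(train_y) / len(train_y)
    return sum(w * t for w, t in zip(weights, train_y)) / total

def mean_predict(train_X, train_y, x, _param):
    """Naive baseline: always predict the training mean."""
    return sum(train_y) / len(train_y)

def cv_rmse(predict, X, y, param, folds=5):
    """Root mean squared error over contiguous cross-validation folds."""
    n = len(X)
    sq_err = 0.0
    for f in range(folds):
        lo, hi = f * n // folds, (f + 1) * n // folds
        tr_X, tr_y = X[:lo] + X[hi:], y[:lo] + y[hi:]
        for i in range(lo, hi):
            sq_err += (predict(tr_X, tr_y, X[i], param) - y[i]) ** 2
    return math.sqrt(sq_err / n)

# Synthetic stand-in data (assumption): 96 periods, 6 query-volume series,
# of which only columns 0 and 1 actually drive the "unemployment rate".
rng = random.Random(0)
n_periods, n_queries = 96, 6
X = [[rng.uniform(0, 1) for _ in range(n_queries)] for _ in range(n_periods)]
y = [5.0 + 3.0 * row[0] + 3.0 * row[1] + rng.gauss(0, 0.1) for row in X]

# Reduce the dimension of the query data.
kept = select_features(X, y, k=2)
X_sel = [[row[j] for j in kept] for row in X]

# Score each candidate by cross-validation and keep the best one.
candidates = [("mean baseline", mean_predict, None)]
candidates += [(f"rbf gamma={g}", rbf_predict, g) for g in (1.0, 10.0, 100.0)]
results = [(cv_rmse(fn, X_sel, y, p), name) for name, fn, p in candidates]
best_rmse, best_name = min(results)
print(kept, best_name, round(best_rmse, 3))
```

The contiguous folds are adequate for this independent synthetic sample. For real, serially correlated unemployment data, a rolling-origin (out-of-sample) evaluation would likely be the safer choice.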


Keywords: Unemployment rate prediction; Data mining; Search engine query data; Government service



Copyright information

© Springer-Verlag London 2012

Authors and Affiliations

  1. School of Information, Renmin University of China, Beijing, China
  2. School of Economics and Management, Tsinghua University, Beijing, China
