Relationship Between the Popularity of Key Words in the Google Browser and the Evolution of Worldwide Financial Indices

  • R. Ortells
  • J. J. Egozcue
  • M. I. Ortego
  • A. Garola
Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 187)


The purpose of this contribution is to evaluate whether there is sufficient statistical basis to establish a relationship between the popularity of certain terms in the Google browser and the evolution of several worldwide economic indices in the subsequent week. A linear model is proposed that tries to predict the evolution of 19 financial indices from around the world from the number of times a selected group of 200 key words were looked up online during the previous week. The model takes a compositional approach for two reasons. First, the information contained in the values of the financial indices is compositional in nature. The strongest argument for this is that if all index values in a given week were multiplied by a common factor, the information would remain unchanged. In fact, the value of an index is irrelevant by itself; it is its evolution relative to the other indices that indicates whether it is performing well. The numerical values of the 19 indices for a given week can therefore be understood as a vector of the simplex and analyzed accordingly. Second, the explanatory variable has to be understood as a vector of the simplex as well, for a similar reason: if the number of times each word is looked up in a given week were multiplied by a common factor, the information contained in the vector would be exactly the same. Moreover, the absolute number of searches is irrelevant by itself, since the interest lies in the relationships among variables. In other words, although their components do not add up to a constant, the vectors of both the explanatory and the predicted variables carry information about parts of a whole, so a compositional approach seems necessary to address the problem successfully. The analysis consists of an exploratory analysis of both the response (indices) and explanatory (searches) variables, followed by a compositional multiple linear regression between the two sets of variables.
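The scale-invariance argument and the lagged regression described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the sizes (5 terms, 3 indices, 52 weeks) are placeholders for the 200 key words and 19 indices, the helper names `closure` and `clr` are hypothetical, and the centred log-ratio (clr) representation is an assumed choice, since the abstract does not state which log-ratio coordinates are used.

```python
import numpy as np

def closure(x):
    """Rescale each row so its parts sum to 1 (a point of the simplex)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centred log-ratio: log of each part minus the row mean of the logs."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

# Synthetic, strictly positive placeholder data (assumption, not real data).
rng = np.random.default_rng(0)
searches = rng.uniform(10.0, 1000.0, size=(52, 5))      # weekly search counts
indices = rng.uniform(1000.0, 20000.0, size=(52, 3))    # weekly index values

# Multiplying a whole week by a common factor changes neither the closed
# composition nor its clr representation.
same_closure = np.allclose(closure(searches), closure(7.5 * searches))
same_clr = np.allclose(clr(searches), clr(7.5 * searches))

# Lagged regression: searches of week t explain indices of week t + 1.
X = clr(searches[:-1])
Y = clr(indices[1:])
X1 = np.column_stack([np.ones(len(X)), X])   # add an intercept column

# clr rows sum to zero, so X1 is rank deficient; lstsq returns the
# minimum-norm least-squares solution, and the fitted values are unique.
coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)
fitted = X1 @ coef
```

Because the fitted values are again vectors whose components sum to zero, they can be mapped back to the simplex by exponentiating and applying `closure`. Working in isometric log-ratio coordinates instead would avoid the rank deficiency at the cost of choosing a basis.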


Keywords: Financial markets; Google searches; Stock market indices; Compositional data; Multiple linear regression



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • R. Ortells (1)
  • J. J. Egozcue (2)
  • M. I. Ortego (3)
  • A. Garola (4)

  1. The London School of Economics and Political Science (LSE), London, UK
  2. Universitat Politècnica de Catalunya (UPC), Barcelona, Spain
  3. Universitat Politècnica de Catalunya (UPC), Barcelona, Spain
  4. Universitat Politècnica de Catalunya (UPC), Barcelona, Spain
