Quality & Quantity, Volume 50, Issue 4, pp 1695–1713

How to measure the quality of financial tweets



Twitter text data can offer a different perspective on financial tangibles, such as share prices, and on intangible assets, such as company reputation. While Twitter data are becoming widely available to researchers, methods for selecting reliable Twitter data are, to our knowledge, not yet available. To overcome this problem, and to allow Twitter data to be used for descriptive and predictive purposes, in this contribution we propose an effective statistical method that formalises and extends the h index, a quality index employed in the evaluation of academic research, which we rename the T index. We test our proposal on a list of twitterers described by the Financial Times as “the top financial tweeters to follow” for the year 2013. Using our methodology, we rank these twitterers and provide confidence intervals to decide whether they are significantly different. Moreover, through a sentiment analysis, we employ the content of the tweets to estimate graphical models useful in the context of financial systemic risk. To this aim, we focus on the Italian banking system and show how listed banks are connected on the basis of tweet data.
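As a rough illustration of the kind of quantity involved, the sketch below computes an h-type index from a list of counts and attaches an interval to it. The abstract does not say which counts enter the T index or how its confidence intervals are derived, so two things here are assumptions: the input is taken to be one count per tweet (for example, retweets), and the interval is a plain percentile bootstrap swapped in for illustration, not the authors' procedure. The function names `h_index` and `bootstrap_interval` are hypothetical.

```python
import random


def h_index(counts):
    """Largest h such that at least h items have a count of at least h."""
    ranked = sorted(counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def bootstrap_interval(counts, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the h-type index.

    Assumption: resampling tweets with replacement stands in for the
    sampling variability of the index; this is not the paper's method.
    Requires a non-empty list of counts.
    """
    rng = random.Random(seed)
    n = len(counts)
    # Recompute the index on each resample, then sort the results.
    stats = sorted(
        h_index(rng.choices(counts, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return stats[lo_idx], stats[hi_idx]


# Hypothetical per-tweet counts for one account (made-up numbers).
example_counts = [10, 8, 5, 4, 3, 0]
print(h_index(example_counts))
print(bootstrap_interval(example_counts))
```

Under this sketch, two accounts would be ranked by their index values, and overlapping intervals would suggest that the difference between them may not be significant.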


Keywords: Big data; h index; Monte Carlo methods; Systemic risk modeling



Copyright information

© Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  1. Department of Economics and Management, University of Pavia, Pavia, Italy
