Mixed Clustering Methods to Forecast Baseball Trends

  • Héctor D. Menéndez
  • Miguel Vázquez
  • David Camacho
Conference paper
Part of the Studies in Computational Intelligence book series (SCI, volume 570)


Sports betting has become one of the most profitable businesses in the world, generating millions of dollars every year. One of the games most influenced by it is baseball. Baseball changed significantly after the introduction of statistical methods to tune team strategy. This effect, known as Moneyball, started in 2002 when the Oakland Athletics began to choose players according to their statistics. After this successful approach, several teams adopted the same strategy, building statistically strong teams. Statistical information about players and matches has become highly important, leading to different datasets such as Retrosheet, which collects detailed information about players, teams and matches from 1956 to the present. This work aims to generate a forecasting model for baseball that predicts the results of new matches from previous statistical information. We combine time-series and clustering algorithms to generate a model that learns how teams and matches evolve and tries to predict the final results. Although this model is not completely accurate, it is a good starting point for future models.
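The abstract does not say which time-series or clustering algorithms are combined, so the following is only a rough sketch of the general idea, not the authors' method. It assumes a rolling window over each team's recent games as the time-series component, plain k-means as the clustering component, and an empirical win rate between cluster pairs as the predictor. All function names and the toy data are hypothetical.

```python
from collections import defaultdict
from math import dist


def rolling_features(games, window=3):
    """Summarise a team's last `window` games as (win rate, mean run differential).

    `games` is a chronological list of (runs_scored, runs_allowed) tuples.
    """
    recent = games[-window:]
    wins = sum(1 for scored, allowed in recent if scored > allowed)
    diff = sum(scored - allowed for scored, allowed in recent)
    return (wins / len(recent), diff / len(recent))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; centroids start at evenly spaced sorted points."""
    ordered = sorted(points)
    step = (len(ordered) - 1) / max(k - 1, 1)
    centroids = [ordered[round(i * step)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


def matchup_table(history, label_of):
    """Count home wins and games for each (home cluster, away cluster) pair."""
    counts = defaultdict(lambda: [0, 0])
    for home, away, home_won in history:
        key = (label_of[home], label_of[away])
        counts[key][0] += int(home_won)
        counts[key][1] += 1
    return counts


def predict_home_win(home, away, label_of, counts):
    """Empirical home-win rate for this cluster pairing, or 0.5 if never seen."""
    wins, total = counts.get((label_of[home], label_of[away]), (0, 0))
    return wins / total if total else 0.5


# Toy data, invented for illustration: recent (runs scored, runs allowed) per team.
recent_games = {
    "A": [(5, 2), (7, 3), (4, 1)],
    "B": [(6, 2), (3, 1), (8, 4)],
    "C": [(1, 4), (2, 6), (0, 3)],
    "D": [(2, 5), (1, 3), (3, 7)],
}
teams = sorted(recent_games)
points = [rolling_features(recent_games[t]) for t in teams]
labels, _ = kmeans(points, k=2)
label_of = dict(zip(teams, labels))

# Past matches as (home team, away team, home team won).
history = [("A", "C", True), ("B", "D", True), ("C", "A", False), ("D", "B", True)]
counts = matchup_table(history, label_of)
print(predict_home_win("A", "D", label_of, counts))
```

In this sketch, teams A and B never played D or C respectively in every pairing, yet a prediction for A hosting D is still available because it is made at the cluster level. A real model would recompute the features and clusters as the season progresses and would need far more data than this toy example.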


Keywords: Clustering, Time Series, Forecast, Baseball





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Héctor D. Menéndez (1)
  • Miguel Vázquez (1)
  • David Camacho (1)
  1. Departamento de Ingeniería Informática, Escuela Politécnica Superior, Universidad Autónoma de Madrid, Madrid, Spain
