
Performance prediction in major league baseball by long short-term memory networks

  • Regular Paper
International Journal of Data Science and Analytics

Abstract

Player performance prediction is an important problem in every sport, because it gives managers valuable information about the future when they make important decisions. In the baseball industry, various prediction systems and many studies already attempt to provide accurate predictions and support domain users. However, there are few studies of prediction methods or systems based on deep learning. Deep learning models have proven highly effective in many fields, so we believe they can also be applied to prediction problems in baseball. Hence, this paper investigates the predictive ability of deep learning models. As a starting point, we select the number of home runs as the target, because it is one of the most critical indicators of a hitter's power and talent. We use the sequential model Long Short-Term Memory (LSTM) as our main method for predicting home runs in Major League Baseball. We compare its performance with that of several machine learning models and a widely used baseball projection system, the sZymborski Projection System (ZiPS). Our results show that LSTM outperforms the other methods and makes more accurate predictions. We conclude that LSTM is a feasible approach to performance prediction problems in baseball and could provide valuable information that meets users' needs.
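The abstract does not specify the network configuration, input features, or history length, so the following is only a rough illustration of the general idea, not the authors' model. It sketches how one player's season-by-season record could be cut into fixed-length histories and passed through a single-layer LSTM (the common formulation with input, forget, and output gates) whose final hidden state is mapped to a next-season home-run estimate. The feature layout (home runs stored at index 0), the three-season window, the hidden size, the toy data, and the names `make_windows` and `TinyLSTM` are all hypothetical. The weights are random and untrained, so the outputs are not real predictions.

```python
import math
import random


def make_windows(seasons, window=3):
    """Turn one player's chronological season records into (history, target) pairs.

    Assumption (hypothetical layout): each season is a list of numeric features
    and the home-run count is stored at index 0.
    """
    pairs = []
    for t in range(window, len(seasons)):
        history = seasons[t - window:t]  # the `window` seasons before season t
        target = seasons[t][0]           # home runs hit in season t
        pairs.append((history, target))
    return pairs


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class TinyLSTM:
    """Hypothetical single-layer LSTM with a linear read-out to one scalar (untrained)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.n_hidden = n_hidden
        # One weight matrix per gate ("i", "f", "o") plus the candidate ("c"),
        # each acting on the concatenation [x_t, h_{t-1}].
        self.W = {gate: mat(n_hidden, n_in + n_hidden) for gate in "ifoc"}
        self.b = {gate: [0.0] * n_hidden for gate in "ifoc"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, sequence):
        h = [0.0] * self.n_hidden  # hidden state
        c = [0.0] * self.n_hidden  # cell state
        for x in sequence:
            z = list(x) + h
            pre = {gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
                   for gate in "ifoc"}
            i = [sigmoid(v) for v in pre["i"]]       # input gate
            f = [sigmoid(v) for v in pre["f"]]       # forget gate
            o = [sigmoid(v) for v in pre["o"]]       # output gate
            cand = [math.tanh(v) for v in pre["c"]]  # candidate cell values
            c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, cand)]
            h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        # Linear read-out of the final hidden state as the next-season estimate.
        return sum(w * hi for w, hi in zip(self.w_out, h)) + self.b_out


# Hypothetical toy record: [home_runs, plate_appearances / 100] per season.
player = [[12, 4.5], [18, 5.6], [25, 6.1], [22, 5.9], [30, 6.4]]
pairs = make_windows(player, window=3)
model = TinyLSTM(n_in=2, n_hidden=4)
for history, target in pairs:
    print(round(model.forward(history), 3), target)
```

In practice the weights would have to be fitted on many players' windows (for example by backpropagation through time) and the fitted model compared against baselines such as the machine learning models and ZiPS mentioned in the abstract; that training and evaluation loop is omitted here.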


Fig. 1


Author information

Corresponding author

Correspondence to Hsuan-Cheng Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Sun, HC., Lin, TY. & Tsai, YL. Performance prediction in major league baseball by long short-term memory networks. Int J Data Sci Anal 15, 93–104 (2023). https://doi.org/10.1007/s41060-022-00313-4

  • DOI: https://doi.org/10.1007/s41060-022-00313-4
