Abstract
Player performance prediction is an important problem in every sport because it gives managers forward-looking information for making key decisions. In the baseball industry, various projection systems and a large body of research already attempt to provide accurate predictions and support domain users. However, few studies have examined prediction methods or systems based on deep learning. Deep learning models have proven highly effective in many fields, so we believe they can be applied to the prediction problem in baseball. The predictive ability of deep learning models is therefore the research problem of this paper. As a starting point, we select the number of home runs as the target because it is one of the most important indicators of a hitter’s power and talent. We use a sequential model, Long Short-Term Memory (LSTM), as our main method for predicting home runs in Major League Baseball. We compare its performance with several machine learning models and a widely used baseball projection system, the sZymborski Projection System (ZiPS). Our results show that LSTM outperforms the other methods and makes more accurate predictions. We conclude that LSTM is a feasible approach to performance prediction in baseball and can provide valuable information that meets users’ needs.
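To make the sequential setup concrete, the following is a minimal sketch of how an LSTM can read a hitter's past seasons in order and emit a single home run estimate. It is not the authors' implementation: the feature choice (scaled home runs, at-bats, and age), the hidden size, the linear read-out, and all function names (`lstm_step`, `init_params`, `predict_home_runs`) are assumptions made for illustration, and the weights are untrained, so the numeric output carries no meaning.

```python
import math
import random

GATES = ("input", "forget", "output", "candidate")

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lstm_step(x, h, c, params):
    """One standard LSTM time step. params maps gate name -> (W, U, b)."""
    def gate(name, act):
        W, U, b = params[name]
        pre = [wx + uh + bi
               for wx, uh, bi in zip(matvec(W, x), matvec(U, h), b)]
        return [act(p) for p in pre]

    i = gate("input", sigmoid)         # how much new information to write
    f = gate("forget", sigmoid)        # how much old cell state to keep
    o = gate("output", sigmoid)        # how much cell state to expose
    g = gate("candidate", math.tanh)   # candidate cell update
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def init_params(n_in, n_hidden, rng, scale=0.1):
    # Small random weights and zero biases (hypothetical initialisation).
    def mat(rows, cols):
        return [[rng.uniform(-scale, scale) for _ in range(cols)]
                for _ in range(rows)]
    return {name: (mat(n_hidden, n_in), mat(n_hidden, n_hidden),
                   [0.0] * n_hidden)
            for name in GATES}

def predict_home_runs(seasons, params, w_out, b_out):
    """Run the LSTM over per-season feature vectors, oldest first, then
    map the final hidden state to one number with a linear read-out."""
    n_hidden = len(w_out)
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    for x in seasons:
        h, c = lstm_step(x, h, c, params)
    return sum(w * hj for w, hj in zip(w_out, h)) + b_out

if __name__ == "__main__":
    rng = random.Random(0)
    params = init_params(n_in=3, n_hidden=4, rng=rng)
    # Hypothetical scaled features per season: [home runs, at-bats, age].
    seasons = [[0.40, 0.80, 0.650], [0.50, 0.85, 0.675], [0.45, 0.82, 0.700]]
    print(predict_home_runs(seasons, params, w_out=[0.1] * 4, b_out=0.0))
```

In practice the weights would be fitted on historical player seasons with a deep learning library, and the resulting predictions would be compared against baseline models and a projection system on held-out seasons.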
Cite this article
Sun, HC., Lin, TY. & Tsai, YL. Performance prediction in major league baseball by long short-term memory networks. Int J Data Sci Anal 15, 93–104 (2023). https://doi.org/10.1007/s41060-022-00313-4
DOI: https://doi.org/10.1007/s41060-022-00313-4