Employee Turnover Prediction with Machine Learning: A Reliable Approach

  • Yue Zhao (email author)
  • Maciej K. Hryniewicki
  • Francesca Cheng
  • Boyang Fu
  • Xiaoyu Zhu
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 869)


Supervised machine learning methods are described, demonstrated and assessed for the prediction of employee turnover within an organization. In this study, numerical experiments for real and simulated human resources datasets representing organizations of small-, medium- and large-sized employee populations are performed using (1) a decision tree method; (2) a random forest method; (3) a gradient boosting trees method; (4) an extreme gradient boosting method; (5) a logistic regression method; (6) support vector machines; (7) neural networks; (8) linear discriminant analysis; (9) a Naïve Bayes method; and (10) a K-nearest neighbor method. Through a robust and comprehensive evaluation process, the performance of each of these supervised machine learning methods for predicting employee turnover is analyzed and established using statistical methods. Additionally, reliable guidelines are provided on the selection, use and interpretation of these methods for the analysis of human resources datasets of varying size and complexity.
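As a rough illustration of the kind of evaluation described above, the following minimal sketch applies one of the ten listed methods, K-nearest neighbor, to a toy turnover dataset and scores it with k-fold cross-validation. It is not the authors' implementation: the feature names (`tenure`, `satisfaction`), the data-generating assumptions, and the helper functions `make_synthetic_hr_data`, `knn_predict` and `cross_val_accuracy` are all hypothetical and chosen only to keep the example self-contained.

```python
import random
from collections import Counter
from math import dist


def make_synthetic_hr_data(n, seed=0):
    """Build hypothetical toy rows: ((tenure_years, satisfaction), left_flag)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        left = rng.random() < 0.3
        # Assumption for illustration only: leavers tend to have
        # shorter tenure and lower satisfaction than stayers.
        tenure = rng.gauss(2.0 if left else 6.0, 1.0)
        satisfaction = rng.gauss(0.3 if left else 0.7, 0.1)
        rows.append(((tenure, satisfaction), int(left)))
    return rows


def knn_predict(train, x, k=5):
    """Predict the majority label among the k training rows closest to x."""
    neighbors = sorted(train, key=lambda row: dist(row[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k_neighbors=5, n_folds=5, seed=0):
    """Return the held-out accuracy of the KNN classifier on each fold."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, test in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        correct = sum(knn_predict(train, x, k_neighbors) == y for x, y in test)
        scores.append(correct / len(test))
    return scores


if __name__ == "__main__":
    data = make_synthetic_hr_data(200)
    scores = cross_val_accuracy(data)
    print("fold accuracies:", [round(s, 3) for s in scores])
    print("mean accuracy:", round(sum(scores) / len(scores), 3))
```

The features are left unscaled here, so the tenure column dominates the distance computation; a real comparison across methods would normally standardize features and repeat the procedure for each classifier on the same folds.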


Keywords: Machine learning; Artificial intelligence; Data mining; Data analytics; Data visualization; Feature selection; Model stability; Employee turnover; Human resources management



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Yue Zhao (1), email author
  • Maciej K. Hryniewicki (2)
  • Francesca Cheng (2)
  • Boyang Fu (3)
  • Xiaoyu Zhu (4)

  1. Department of Computer Science, University of Toronto, Toronto, Canada
  2. PricewaterhouseCoopers, Toronto, Canada
  3. University of Münster, Münster, Germany
  4. Fifth Third Bank, Cincinnati, USA
