Early Identification of At-Risk Students Using Iterative Logistic Regression

  • Li Zhang
  • Huzefa Rangwala
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10947)


Higher education institutions face the challenge of low student retention rates and a high number of dropouts. 41% of college students in the United States do not finish their undergraduate degree program within six years, and 60% of them drop out in their first two years of study. It is crucial for universities and colleges to develop data-driven artificial intelligence systems that identify at-risk students as early as possible and provide them with timely guidance and support. However, most current classification approaches to early dropout prediction are unable to use all of the information in historical data from previous cohorts to predict dropouts among current students in a few semesters. In this paper, we develop an Iterative Logistic Regression (ILR) method to address the challenge of early prediction. The proposed framework makes full use of historical student records and effectively predicts students at risk of failing or dropping out in future semesters. Empirical results on a real-world dataset show significant improvement on the performance metrics over existing methods. The application enabled by the proposed method provides additional support to students who are at risk of dropping out of college.


Iterative Logistic Regression, Educational data mining, Early dropout prediction



This research was supported by National Science Foundation Grant 1447489. The experiments were run on ARGO, a research computing cluster provided by the Office of Research Computing at George Mason University, VA (URL:



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, George Mason University, Fairfax, USA
