Predicting first-time-in-college students’ degree completion outcomes

Higher Education

Abstract

About one-third of college students drop out before finishing their degree. The majority of those remaining will take longer than 4 years to complete their degree at “4-year” institutions. This problem emphasizes the need to identify students who may benefit from support to encourage timely graduation. Here we empirically develop machine learning algorithms, specifically Random Forest, to accurately predict if and when first-time-in-college undergraduates will graduate based on admissions, academic, and financial aid records two to six semesters after matriculation. Credit hours earned, college and high school grade point averages, estimated family (financial) contribution, and enrollment and grades in required gateway courses within a student’s major were all important predictors of graduation outcome. We predicted students’ graduation outcomes with an overall accuracy of 79%. Applying the machine learning algorithms to currently enrolled students allowed identification of those who could benefit from added support. Identified students included many who may be missed by established university protocols, such as students with high financial need who are making adequate but not strong degree progress.
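The modeling approach summarized above can be illustrated with a minimal sketch. It assumes scikit-learn's `RandomForestClassifier` and entirely synthetic data; the feature names, the three outcome labels, the rule that generates the labels, and the 0.5 flagging threshold are all hypothetical stand-ins chosen for illustration, not the authors' actual variables, data, or pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical feature names, loosely mirroring the predictors named in the abstract.
FEATURES = [
    "credit_hours_earned",
    "college_gpa",
    "high_school_gpa",
    "expected_family_contribution",
    "gateway_course_grade",
]
# Hypothetical outcome labels covering "if and when" a student graduates.
OUTCOMES = ["on_time", "late", "not_graduated"]


def make_synthetic_students(n, seed=0):
    """Generate made-up student records; the labeling rule is arbitrary."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.normal(45, 12, n),      # credit hours earned so far
        rng.uniform(1.5, 4.0, n),   # college GPA
        rng.uniform(2.0, 4.5, n),   # high school GPA (weighted scale)
        rng.exponential(8000, n),   # estimated family contribution, dollars
        rng.uniform(0.0, 4.0, n),   # grade points in a gateway course
    ])
    # Invented score: more credits and better grades push toward on-time graduation.
    score = (0.04 * (X[:, 0] - 45) + 1.0 * (X[:, 1] - 2.75)
             + 0.3 * (X[:, 4] - 2.0) + rng.normal(0, 0.5, n))
    y = np.where(score > 0.4, 0, np.where(score > -0.4, 1, 2))
    return X, y


X, y = make_synthetic_students(2000)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Overall accuracy on held-out students (analogous in spirit to the reported metric).
accuracy = accuracy_score(y_test, model.predict(X_test))

# Impurity-based importances: one value per feature, summing to 1.
importances = dict(zip(FEATURES, model.feature_importances_))

# Treat the held-out rows as "currently enrolled" students and flag those whose
# predicted probability of not graduating exceeds a hypothetical threshold.
proba = model.predict_proba(X_test)
not_grad_col = list(model.classes_).index(OUTCOMES.index("not_graduated"))
flagged = np.flatnonzero(proba[:, not_grad_col] > 0.5)

print(f"accuracy: {accuracy:.2f}")
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name:30s} {value:.3f}")
print(f"students flagged for outreach: {flagged.size} of {len(X_test)}")
```

Ranking students by predicted probability, rather than by the hard class label alone, is one plausible way to surface those making adequate but not strong progress; whether the study used this approach is not stated in the abstract.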

Availability of data and material

The data from this research are unavailable for sharing.

Code availability

The code from this research is available upon request to the corresponding author.

Acknowledgements

A.B., E.A., and M.D. were supported by NSF award 1742461 and M.D. by NSF award 1820862. We would like to thank our partners in the Office of Institutional Research (especially Derrick Isler) and the Financial Aid Office (Wally Anderson and Bruce Blackmon) for providing us with data and support for interpreting shared information. We thank our academic advisors for their expert guidance and feedback on our predictive models.

Funding

This research was supported by the UNC System Office’s Student Success Innovation Lab under Grant GR-000007639.

Author information

Contributions

All authors contributed to the study conception and design. Funding acquisition was obtained by Elise Demeter, Lisa Walker, and John Smail. Data collection and analysis were performed by Erfan Al-Hossami, Mohsen Dorodchi, and Elise Demeter. The first draft of the manuscript was written by Elise Demeter, Erfan Al-Hossami, and Aileen Benedict, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Elise Demeter.

Ethics declarations

Ethics approval

This research was performed according to procedures approved by the UNC Charlotte Institutional Review Board.

Consent to participate

A waiver of consent was granted for this research by the UNC Charlotte Institutional Review Board as this study involved no direct contact with participants.

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

ESM 1

(DOCX 18 kb)

About this article

Cite this article

Demeter, E., Dorodchi, M., Al-Hossami, E. et al. Predicting first-time-in-college students’ degree completion outcomes. High Educ 84, 589–609 (2022). https://doi.org/10.1007/s10734-021-00790-9

  • DOI: https://doi.org/10.1007/s10734-021-00790-9
