Research in Higher Education, Volume 58, Issue 4, pp. 449–467

The College Completion Puzzle: A Hidden Markov Model Approach

  • Dirk Witteveen
  • Paul Attewell


Higher education in America is characterized by widespread access to college but low rates of completion, especially among undergraduates at less selective institutions. Using Hidden Markov modeling, we analyze longitudinal transcript data to examine the processes that lead to graduation. We identify several latent states associated with patterns of course taking, and show that a trained Hidden Markov model can predict graduation or nongraduation from only a few semesters of transcript data. We compare this approach with more conventional methods and conclude that certain college-specific processes associated with graduation should be analyzed in addition to socio-economic factors. The Hidden Markov trajectories indicate that graduating and nongraduating students take the more difficult mathematical and technical courses at the same rate. However, undergraduates who complete their bachelor’s degree within 6 years are more likely to alternate between semesters with a heavy course load and less course-intensive semesters. The course-taking patterns also indicate that nongraduates withdraw from courses more often than average, yet when graduates do withdraw, they tend to do so in exactly those semesters of their college career in which they take more difficult courses. These findings, as well as the sequence methodology itself, emphasize the importance of careful course selection and counseling early in a student’s college career.
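The abstract does not spell out how the trained model turns a few semesters into a prediction. As a rough illustration only, the sketch below assumes one common way to use HMMs for classification: one model stands for graduates and one for nongraduates, a partial transcript is assigned to whichever model gives it the higher likelihood under the forward algorithm (equal prior odds assumed), and the Viterbi algorithm recovers the most likely latent-state trajectory. The three observation codes, the two hidden states, and every probability are hypothetical, hand-picked values, not estimates from the study's data (in practice they would be estimated, e.g. with Baum-Welch). The names `HMM`, `log_likelihood`, `viterbi`, and `predict_graduation` are likewise made up for this example.

```python
import math
from dataclasses import dataclass

# Hypothetical per-semester observation codes (illustrative coding, not the study's):
LIGHT, HEAVY_STEM, WITHDRAWAL = 0, 1, 2


@dataclass
class HMM:
    start: list[float]        # start[i] = P(hidden state i in the first semester)
    trans: list[list[float]]  # trans[i][j] = P(state j next semester | state i now)
    emit: list[list[float]]   # emit[i][k] = P(observation k | state i)


def log_likelihood(obs: list[int], m: HMM) -> float:
    """Scaled forward algorithm: returns log P(obs | m)."""
    n = len(m.start)
    alpha = [m.start[i] * m.emit[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * m.trans[i][j] for i in range(n)) * m.emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        total += math.log(scale)             # log P(obs) = sum of log scale factors
        alpha = [a / scale for a in alpha]   # rescale each step to avoid underflow
    return total


def viterbi(obs: list[int], m: HMM) -> list[int]:
    """Most likely hidden-state path for obs, computed in log space."""
    n = len(m.start)
    delta = [math.log(m.start[i] * m.emit[i][obs[0]]) for i in range(n)]
    back = []  # back[t-1][j] = best predecessor of state j at semester t
    for o in obs[1:]:
        ptr, new = [], []
        for j in range(n):
            scores = [delta[i] + math.log(m.trans[i][j]) for i in range(n)]
            best = max(range(n), key=scores.__getitem__)
            ptr.append(best)
            new.append(scores[best] + math.log(m.emit[j][o]))
        back.append(ptr)
        delta = new
    state = max(range(n), key=delta.__getitem__)
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last semester
        state = ptr[state]
        path.append(state)
    return path[::-1]


def predict_graduation(obs: list[int], grad: HMM, nongrad: HMM) -> bool:
    """Pick the model with the higher likelihood (equal prior odds assumed)."""
    return log_likelihood(obs, grad) > log_likelihood(obs, nongrad)


# Hand-picked toy parameters; hidden state 0 = "light" semester, 1 = "intense" semester.
GRAD = HMM(start=[0.5, 0.5],
           trans=[[0.3, 0.7], [0.7, 0.3]],      # tends to alternate light/intense
           emit=[[0.75, 0.2, 0.05], [0.2, 0.7, 0.1]])
NONGRAD = HMM(start=[0.5, 0.5],
              trans=[[0.8, 0.2], [0.2, 0.8]],   # tends to stay in one state
              emit=[[0.5, 0.2, 0.3], [0.2, 0.5, 0.3]])

first_semesters = [LIGHT, HEAVY_STEM, LIGHT, HEAVY_STEM]
print(predict_graduation(first_semesters, GRAD, NONGRAD))  # prints True
print(viterbi(first_semesters, GRAD))                      # prints [0, 1, 0, 1]
```

The toy graduate model was given alternating transitions and low withdrawal probabilities so that it mirrors, in caricature, the pattern described in the abstract; with real transcripts, the number of states and the observation coding would be modeling choices in their own right.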


Keywords: College completion; Course-taking; Academic momentum; Quantitative methodology; Longitudinal analysis



We thank the National Science Foundation (Grant DRL 1243785) and the Bill & Melinda Gates Foundation (Grant OPP 1012951) for supporting this study. We also thank Andrew Rosenberg (Queens College, CUNY) for his extensive technical support and his feedback on programming Hidden Markov models.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. The Graduate Center, The City University of New York, New York, USA
