Abstract
Higher education in America is characterized by widespread access to college but low rates of completion, especially among undergraduates at less selective institutions. Using Hidden Markov modeling, we analyze longitudinal transcript data to examine the processes leading to graduation. We identify several latent states that are associated with patterns of course taking, and show that a trained Hidden Markov model can predict graduation or nongraduation from only a few semesters of transcript data. We compare this approach to more conventional methods and conclude that certain college-specific processes associated with graduation should be analyzed in addition to socio-economic factors. The Hidden Markov trajectories indicate that graduating and nongraduating students take the more difficult mathematical and technical courses at an equal rate. However, undergraduates who complete their bachelor’s degree within 6 years are more likely to alternate between semesters with a heavy course load and less course-intensive semesters. The course-taking patterns also indicate that nongraduates withdraw from coursework more often than average, yet when graduates withdraw, they tend to do so in exactly those semesters of their college career in which they take more difficult courses. These findings, as well as the sequence methodology itself, emphasize the importance of careful course selection and counseling early in students’ college careers.
Notes
Panel weights are not conventionally used in constructing the HMM itself (i.e., the hidden states). An HMM models variation over time within individuals’ sequences rather than the representativeness of a sample with respect to a larger population.
Around 8190 of these sampled students were 18 or 19 years old when they first entered a four-year college (790 students were 20 years or older). At the urging of one reviewer, we reran the Hidden Markov model omitting students who were 20 years or older. These reworked analyses yielded similar results in terms of state descriptions, transition probabilities, and prediction accuracy, and are available upon request.
References
Achieve Inc. (2004). Ready or not: Creating a high school diploma that counts. An American diploma project. http://www.achieve.org/files/ADPreport.pdf. Accessed 24 Nov 2015.
Adelman, C. (1999). Answers in the toolbox: Academic intensity, attendance patterns, and bachelor’s degree attainment. Washington, DC: US Department of Education.
Adelman, C. (2004). Undergraduate grades: A complex story (Chapter 6). In C. Adelman (Ed.), Principal indicators of student academic histories in postsecondary education, 1972–2000. Washington, DC: US Department of Education.
Adelman, C. (2006). The toolbox revisited: Paths to degree completion from high school through college. Washington, DC: US Department of Education.
Adelman, C. (2009). The spaces between numbers: Getting international data on higher education straight. Washington, DC: Institute for Higher Education Policy.
Aud, S., Wilkinson-Flicker, S., Kristapovich, P., Rathbun, A., Wang, X., & Zhang, J. (2013). The condition of education 2013. National Center for Education Statistics (NCES) 2013-037. Washington, DC: US Department of Education.
Baum, L. E., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41(1), 164–171.
Bean, J. P., & Metzner, B. S. (1985). A conceptual model of nontraditional undergraduate student attrition. Review of Educational Research, 55(4), 485–540.
Bowen, W. G., Chingos, M. M., & McPherson, M. S. (2009). Crossing the finish line: Completing college at America’s public universities. Princeton, NJ: Princeton University Press.
Bozick, R. (2007). The role of students’ economic resources, employment, and living arrangements. Sociology of Education, 80(3), 261–285.
Chen, X. (2005). First generation students in postsecondary education: A look at their college transcripts. National Center for Education Statistics (NCES) 2005-171. Washington, DC: US Department of Education.
Complete College America. (2011). Time is the enemy. Washington, DC: Complete College America. http://www.completecollege.org/docs/Time_Is_the_Enemy.pdf. Accessed 24 Nov 2015.
Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis (1st ed.). New York: John Wiley.
Duda, R. O., Hart, P. E., & Stork, D. G. (2000). Pattern classification (2nd ed.). New York: John Wiley.
Elzinga, C. H., Hoogendoorn, A. W., & Dijkstra, W. (2007). Linked Markov sources: Modeling outcome-dependent social processes. Sociological Methods and Research, 36(1), 26–47.
Heck, R. H., Price, C. L., & Thomas, S. L. (2004). Tracks as emergent structures: A network analysis of student differentiation in a high school. American Journal of Education, 110(4), 321–353.
Hess, F., Schneider, M., Carey, K., & Kelly, A. P. (2009). Diplomas and dropouts: Which colleges actually graduate their students (and which don’t). Washington, DC: American Enterprise Institute.
Horn, L., & Kojaku, L. K. (2001). High school curriculum and the persistence path through college. National Center for Education Statistics (NCES) 2001-163. Washington, DC: US Department of Education.
Ip, E. H., Snow Jones, A., Heckert, A., Zhang, Q., & Gondolf, E. D. (2010). Latent Markov model for analyzing temporal configuration for violence profiles and trajectories in a sample of batterers. Sociological Methods and Research, 39(2), 222–255.
Langeheine, R., & Van de Pol, F. (2002). Latent Markov chains. In J. Hagenaars & A. McCutcheon (Eds.), Applied latent class analysis (pp. 304–334). Cambridge, UK: Cambridge University Press.
McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: John Wiley.
Murphy, K. P. (2002). Dynamic Bayesian networks: Representation, inference and learning (PhD dissertation, Department of Computer Science). Berkeley, CA: University of California.
Murphy, K. P. (2005). Hidden Markov model (HMM) Toolbox for Matlab (Original Toolbox of 1998). http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html. Accessed 1 Aug 2016.
National Center for Education Statistics. (2011). 2004/2009 Beginning postsecondary students longitudinal study restricted use data file [in Stata]. Washington, DC: US Department of Education, NCES 2011-244 [distributor].
Perna, L. W. (2010). Toward a more complete understanding of financial aid in promoting college enrollment. In J. Smart (Ed.), Higher education: Handbook of theory and research (Vol. 25, pp. 129–180). New York, NY: Springer.
Perna, L. W., & Li, C. (2006). College affordability: Implications for college opportunity. Journal of Student Financial Aid, 36(1), 7–24.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.
Radford, A. W., Berkner, L., Wheeless, S. C., & Shepard, B. (2011). Persistence and attainment of 2003–2004 beginning postsecondary students: After six years. National Center for Education Statistics (NCES) 2011-151. Washington, DC: US Department of Education.
Schneider, M., & Yin, M. L. (2012). Completion matters: The high cost of low community college graduation rates. Washington, DC: American Enterprise Institute for Public Policy Research.
Schuh, J. (2005). Finances and retention: Trends and potential implications. In A. Seidman (Ed.), College student retention: Formula for student success (pp. 277–294). Westport, CT: American Council on Education and Praeger.
Scott, S. L. (2002). Bayesian methods for hidden Markov models. Journal of the American Statistical Association, 97(457), 337–351.
St. John, E. P., Cabrera, A. F., Nora, A., & Asker, E. H. (2000). Economic influences on persistence reconsidered. In J. M. Braxton (Ed.), Reworking the student departure puzzle (pp. 29–47). Nashville, TN: Vanderbilt University Press.
Stamp, M. (2015). A revealing introduction to hidden Markov models (Course). http://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf. Accessed 24 Nov 2015.
Tinto, V. (1993). Leaving college: Rethinking the causes of student attrition (2nd ed.). Chicago, IL: University of Chicago Press.
Vermunt, J. K., Tran, B., & Magidson, J. (2008). Latent class models in longitudinal research. In S. Menard (Ed.), Handbook of longitudinal research: design, measurement, and analysis (pp. 373–385). Burlington, MA: Elsevier.
Wine, J., Janson, N., & Wheeless, S. (2011). 2004/09 Beginning postsecondary students longitudinal study (BPS:04/09) full-scale methodology report. National Center for Education Statistics (NCES) 2012-246. Washington, DC: US Department of Education.
Wyner, J. S., Bridgeland, J. M., & DiIulio, J. (2007). Achievement trap: How America is failing millions of high-achieving students from lower-income families. Lansdowne, VA: Jack Kent Cooke Foundation. http://www.jkcf.org/news-knowledge/research-reports/. Accessed 24 Nov 2015.
Acknowledgements
We thank the National Science Foundation (Grant DRL 1243785) and the Bill & Melinda Gates Foundation (Grant OPP 1012951) for their support for this study. We also thank Andrew Rosenberg (Queens College, CUNY) for his extensive technical support and his feedback on programming Hidden Markov models.
Appendix
Appendix 1: Trellis diagram of a Hidden Markov Chain
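In a trellis, each column holds the possible hidden states for one semester and each edge is a state transition. One standard computation over this structure is the Viterbi algorithm, which finds the single most likely hidden-state path (a student's state trajectory) for an observed transcript. The sketch below is a minimal pure-Python, log-space Viterbi decoder for a discrete HMM; it is an illustration, not the authors' procedure. The argument layout (prior, transmat, obsmat) mirrors the matrices named in Appendix 2, and the toy two-state parameters in the usage line are hypothetical.

```python
import math

def viterbi(obs, prior, transmat, obsmat):
    """Most likely hidden-state path for a discrete HMM (log-space Viterbi).

    obs      -- observation symbols (ints in 0..O-1), one per semester
    prior    -- prior[q]: probability of starting in state q
    transmat -- transmat[p][q]: probability of moving from state p to state q
    obsmat   -- obsmat[q][o]: probability of emitting symbol o in state q
    """
    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    Q = len(prior)
    # delta[q]: best log-probability of any trellis path ending in state q
    delta = [log(prior[q]) + log(obsmat[q][obs[0]]) for q in range(Q)]
    backptr = []  # backptr[t-1][q]: best predecessor of state q at time t
    for o in obs[1:]:
        step, new_delta = [], []
        for q in range(Q):
            # pick the best incoming trellis edge into state q
            best_p = max(range(Q), key=lambda p: delta[p] + log(transmat[p][q]))
            step.append(best_p)
            new_delta.append(delta[best_p] + log(transmat[best_p][q]) + log(obsmat[q][o]))
        backptr.append(step)
        delta = new_delta
    # trace the best path back from the best final state
    state = max(range(Q), key=lambda q: delta[q])
    path = [state]
    for step in reversed(backptr):
        state = step[state]
        path.append(state)
    path.reverse()
    return path, max(delta)

# Hypothetical two-state model with three observation symbols.
path, logp = viterbi([0, 1, 2], [0.6, 0.4],
                     [[0.7, 0.3], [0.4, 0.6]],
                     [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
```

For this toy model the decoded path is [0, 0, 1] with probability 0.01512. Working in log space avoids numerical underflow on long transcripts.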
Appendix 2: Building an HMM in Matlab
Notes: O = number of distinct observation symbols (the categories of all variables combined); Q = number of expected (hidden) states; s_prior0 = random initial state distribution; s_transmat0 = random transition probabilities; s_obsmat0 = random observation probabilities; s_prior1 = estimated initial state distribution; s_transmat1 = estimated transition probabilities; s_obsmat1 = estimated observation probabilities; LL_S = log-likelihood (of the Graduation-HMM).
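As a rough illustration of the training steps summarized in the notes above (random initial matrices, EM re-estimation, a final log-likelihood), the following is a minimal pure-Python Baum-Welch sketch for a discrete HMM. It is not the authors' Matlab code: the function names, the toy sequences, and the stopping rule (max_iter, tol) are assumptions. Here random_hmm plays the role of s_prior0/s_transmat0/s_obsmat0, baum_welch returns the analogues of s_prior1/s_transmat1/s_obsmat1, and its log-likelihood corresponds to LL_S when the model is trained on graduates only.

```python
import math
import random

def normalize(row):
    """Scale a row to sum to 1 (uniform if the row is all zeros)."""
    s = sum(row)
    return [x / s for x in row] if s > 0 else [1.0 / len(row)] * len(row)

def random_hmm(Q, O, rng):
    """Random stochastic prior, transition, and observation matrices."""
    prior = normalize([rng.random() for _ in range(Q)])
    transmat = [normalize([rng.random() for _ in range(Q)]) for _ in range(Q)]
    obsmat = [normalize([rng.random() for _ in range(O)]) for _ in range(Q)]
    return prior, transmat, obsmat

def forward_backward(obs, prior, transmat, obsmat):
    """Scaled forward-backward pass; returns (alpha, beta, scales)."""
    Q, T = len(prior), len(obs)
    alpha, scales = [], []
    a = [prior[q] * obsmat[q][obs[0]] for q in range(Q)]
    for t in range(T):
        if t > 0:
            a = [sum(alpha[-1][p] * transmat[p][q] for p in range(Q)) * obsmat[q][obs[t]]
                 for q in range(Q)]
        c = sum(a)
        scales.append(c)                 # log P(obs) = sum of log scales
        alpha.append([x / c for x in a])
    beta = [[1.0] * Q for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(transmat[p][q] * obsmat[q][obs[t + 1]] * beta[t + 1][q]
                       for q in range(Q)) / scales[t + 1] for p in range(Q)]
    return alpha, beta, scales

def baum_welch(seqs, prior, transmat, obsmat, max_iter=50, tol=1e-6):
    """EM over many sequences; returns re-estimated matrices and the
    log-likelihood computed in the last E-step."""
    Q, O = len(prior), len(obsmat[0])
    prev_ll = -math.inf
    for _ in range(max_iter):
        pi_acc = [0.0] * Q
        tr_acc = [[0.0] * Q for _ in range(Q)]
        ob_acc = [[0.0] * O for _ in range(Q)]
        ll = 0.0
        for obs in seqs:  # E-step: accumulate expected counts
            alpha, beta, scales = forward_backward(obs, prior, transmat, obsmat)
            ll += sum(math.log(c) for c in scales)
            for t, o in enumerate(obs):
                for q in range(Q):
                    g = alpha[t][q] * beta[t][q]  # P(state q at t | obs)
                    ob_acc[q][o] += g
                    if t == 0:
                        pi_acc[q] += g
                    if t + 1 < len(obs):
                        for r in range(Q):
                            tr_acc[q][r] += (alpha[t][q] * transmat[q][r] * obsmat[r][obs[t + 1]]
                                             * beta[t + 1][r] / scales[t + 1])
        # M-step: turn expected counts into probabilities
        prior = normalize(pi_acc)
        transmat = [normalize(row) for row in tr_acc]
        obsmat = [normalize(row) for row in ob_acc]
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return prior, transmat, obsmat, ll

# Toy usage: integer-coded semester symbols for three hypothetical students.
rng = random.Random(0)
seqs = [[0, 1, 2, 1], [0, 0, 1, 2, 2], [1, 2, 2]]
prior0, transmat0, obsmat0 = random_hmm(Q=2, O=3, rng=rng)
prior1, transmat1, obsmat1, ll_s = baum_welch(seqs, prior0, transmat0, obsmat0)
```

EM only finds a local optimum, so in practice one would typically restart from several random initializations and keep the run with the highest log-likelihood.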
Appendix 3: Assessing the HMM
Notes: The ‘Start’ and ‘End’ markers delimit the test on nongraduating students (N = 1045). The dhmm_logprob function in the HMM toolbox was used to produce two log-likelihoods: one that matches each individual test case with the prior, transition, and observation matrices of the Graduation-HMM (LL_S), and one that matches each individual test case with the prior, transition, and observation matrices of the Non-Completion-HMM (LL_F). The :,i index can be replaced with any selection of semesters from the transcripts (e.g., semesters 1 through 4). Finally, the algorithm classifies each test case by comparing the two log-likelihoods: a case with LL_F > LL_S is classified as nongraduating.
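A minimal sketch of this classification rule, assuming two already-trained discrete HMMs given as (prior, transmat, obsmat) triples. The log_likelihood function computes log P(transcript | model) with the scaled forward algorithm, the quantity the toolbox's log-probability function returns; predict_nongraduate and its n_semesters argument are hypothetical names standing in for the :,i semester selection. The two toy models are invented for illustration.

```python
import math

def log_likelihood(obs, prior, transmat, obsmat):
    """log P(obs | HMM) for a discrete HMM via the scaled forward algorithm."""
    Q = len(prior)
    alpha = [prior[q] * obsmat[q][obs[0]] for q in range(Q)]
    ll = 0.0
    for t in range(len(obs)):
        if t > 0:
            # advance one semester along the trellis, then emit obs[t]
            alpha = [sum(alpha[p] * transmat[p][q] for p in range(Q)) * obsmat[q][obs[t]]
                     for q in range(Q)]
        c = sum(alpha)
        if c == 0.0:
            return -math.inf  # transcript is impossible under this model
        ll += math.log(c)     # accumulate the log scaling factors
        alpha = [a / c for a in alpha]
    return ll

def predict_nongraduate(transcript, grad_hmm, noncomp_hmm, n_semesters=None):
    """Hypothetical helper: True if LL_F > LL_S on the first n_semesters
    symbols (the whole transcript when n_semesters is None)."""
    obs = transcript if n_semesters is None else transcript[:n_semesters]
    ll_s = log_likelihood(obs, *grad_hmm)     # Graduation-HMM
    ll_f = log_likelihood(obs, *noncomp_hmm)  # Non-Completion-HMM
    return ll_f > ll_s

# Hypothetical trained models over two observation symbols.
trans = [[0.9, 0.1], [0.1, 0.9]]
grad_hmm = ([0.5, 0.5], trans, [[0.8, 0.2], [0.7, 0.3]])
noncomp_hmm = ([0.5, 0.5], trans, [[0.2, 0.8], [0.3, 0.7]])
print(predict_nongraduate([1, 1, 1, 0], grad_hmm, noncomp_hmm, n_semesters=3))
```

Truncating with n_semesters is what allows a prediction from only the first few semesters; accuracy on the held-out test sets can then be tabulated separately for graduates and nongraduates.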
Appendix 4: Encoding and Decoding Transcripts
The input for each student-semester observation (\(m_{1 \ldots n}\)) is based on a vector (\(v_{1 \ldots n}\)) that has all feature values encoded using the following algorithm:
The student-semester observations (\(m_{1 \ldots n}\)) include categorical and continuous values. Categorical features include, for example, the binary features “did the student take a STEM class?” and “did the student drop any courses?” Continuous features include “number of attempted credits” and “cumulative GPA.” To simplify the modeling process, we represent all features as independent categorical features. The first step in this process is the discretization of continuous features: each continuous value is represented as a categorical feature as described in Table 1. At this point, each vector \(v_i\) is a vector of \(k\) categorical features, the \(j\)-th of which can take one of \(m^{(j)}\) values. The second step converts each vector \(v_i\) into \(v'_i\), a single categorical variable that can take one of \(m = \prod_{j=1}^{k} m^{(j)}\) values. This transformation is accomplished by a bijection, \(f(v_i) = v'_i\). Since the HMM assumes that all elements in the original vector \(v\) are independent, no information is lost via this transformation.
Subsequently, in the analysis phase, the encoded student-semester observation can be decoded through the inverse function \(f^{-1}(v'_i) = v_i\). Since \(f\) is bijective, no information is lost in this inversion.
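One simple way to realize such a bijection is a mixed-radix (positional) number: with \(k\) features of sizes \(m^{(1)}, \ldots, m^{(k)}\), every feature vector maps to a unique integer in \(0, \ldots, m-1\), and repeated division by the sizes inverts it. The Python sketch below illustrates both steps under stated assumptions; it is not the authors' code. The cut points CREDIT_CUTS and GPA_CUTS are hypothetical placeholders for the bins in Table 1, and the feature order is assumed.

```python
from bisect import bisect_right

# Hypothetical cut points standing in for Table 1 (actual bins not shown here).
CREDIT_CUTS = [6, 12, 15]   # attempted credits -> 4 categories
GPA_CUTS = [2.0, 3.0, 3.5]  # cumulative GPA    -> 4 categories

def discretize(value, cuts):
    """Step 1: map a continuous value to a category index 0..len(cuts)."""
    return bisect_right(cuts, value)

def encode(v, sizes):
    """Step 2, the bijection f: feature vector -> one symbol in 0..prod(sizes)-1,
    where v[j] must lie in 0..sizes[j]-1 (mixed-radix number)."""
    code = 0
    for value, size in zip(v, sizes):
        if not 0 <= value < size:
            raise ValueError(f"category {value} out of range for size {size}")
        code = code * size + value
    return code

def decode(code, sizes):
    """The inverse f^-1: recover the feature vector from the symbol."""
    v = []
    for size in reversed(sizes):
        code, value = divmod(code, size)
        v.append(value)
    return v[::-1]

# One hypothetical student-semester: took STEM (1), dropped nothing (0),
# 13 attempted credits, cumulative GPA 3.2.
sizes = [2, 2, len(CREDIT_CUTS) + 1, len(GPA_CUTS) + 1]  # m = 2*2*4*4 = 64
v = [1, 0, discretize(13, CREDIT_CUTS), discretize(3.2, GPA_CUTS)]
symbol = encode(v, sizes)  # 42
assert decode(symbol, sizes) == v
```

The product of the sizes (here 64) is the number of observation symbols the discrete HMM must model, which is why coarse bins keep the observation matrix manageable.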
Cite this article
Witteveen, D., Attewell, P. The College Completion Puzzle: A Hidden Markov Model Approach. Res High Educ 58, 449–467 (2017). https://doi.org/10.1007/s11162-016-9430-2