Abstract
Grounded in theory and research on the role of adolescent family experiences in young adult educational attainment, this study took the novel step of synthesizing results from prior studies and using a machine learning (ML) approach to address three questions: (1) By incorporating adolescent family experience factors examined across prior studies in a single analysis, how accurately can we predict young adult educational attainment? (2) Which family experience factors are the best predictors of young adult educational attainment? (3) What complex patterns among family experience predictors merit further examination? Based on a review of 101 publications that used National Longitudinal Study of Adolescent Health data to investigate links between adolescent family experiences and young adult attainment, we identified 53 family experience independent variables. We used an ML-based approach to train and test models with these 53 Wave I family variables (adolescents in Grades 7–12) as predictors of both college enrollment (N = 4598) and graduation (N = 4180) at Wave IV (young adult mean age = 28.88, SD = 1.76). Our models (1) obtained prediction accuracies of 73.43% and 72.33% for college enrollment, and 79.10% and 79.07% for college graduation, (2) identified the best predictors of college enrollment and graduation, including family socioeconomic characteristics and parent educational expectations, and (3) highlighted nonlinear patterns for further examination. This study advanced understanding of how adolescent family experiences may influence educational attainment and provided a paradigm for developmental research to synthesize existing findings into novel discoveries with large-scale datasets.
Highlights
- A machine learning paradigm for cross-study synthesis with large-scale datasets.
- Synthesized 101 education attainment studies with 53 adolescent family predictors.
- Family experiences predicted young adult attainment with 72.33–79.10% accuracy.
- Identified 12–19 key family predictors of young adult education attainment.
- Partial dependence plots highlight nonlinear patterns in prediction.
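The three analytic steps named above (held-out prediction accuracy, ranking predictors, and partial dependence) can be illustrated with a small toy sketch. This is not the authors' pipeline and does not use Add Health data: the data are synthetic, the one-split decision stump is a stand-in model chosen only to keep the example dependency-free, and every name (`make_data`, `fit_stump`, the "expectation" predictor) is hypothetical.

```python
import random

def make_data(n, rng):
    """Synthetic stand-in data: one informative predictor, one pure-noise predictor."""
    rows, labels = [], []
    for _ in range(n):
        expectation = rng.random()  # hypothetical "parent expectation" score in [0, 1)
        noise = rng.random()        # predictor unrelated to the outcome
        label = 1 if expectation > 0.5 else 0  # outcome follows a threshold
        if rng.random() < 0.1:                 # 10% of labels are flipped
            label = 1 - label
        rows.append([expectation, noise])
        labels.append(label)
    return rows, labels

def fit_stump(rows, labels):
    """Toy model: the single (feature, threshold) split with the best training accuracy."""
    best = (0.0, 0, 0.0, 1)  # (accuracy, feature index, threshold, label above threshold)
    for j in range(len(rows[0])):
        for t in [i / 20 for i in range(1, 20)]:
            for above in (0, 1):
                hits = sum((above if r[j] > t else 1 - above) == y
                           for r, y in zip(rows, labels))
                acc = hits / len(rows)
                if acc > best[0]:
                    best = (acc, j, t, above)
    _, j, t, above = best
    return lambda r: above if r[j] > t else 1 - above

def accuracy(model, rows, labels):
    """Share of cases the model classifies correctly."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, j, rng):
    """Drop in accuracy after shuffling column j, which breaks its link to the outcome."""
    col = [r[j] for r in rows]
    rng.shuffle(col)
    shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, col)]
    return accuracy(model, rows, labels) - accuracy(model, shuffled, labels)

def partial_dependence(model, rows, j, grid):
    """Average prediction when column j is set to each grid value for every case."""
    return [sum(model(r[:j] + [v] + r[j + 1:]) for r in rows) / len(rows)
            for v in grid]

rng = random.Random(0)
train_x, train_y = make_data(600, rng)  # cases used to fit the model
test_x, test_y = make_data(400, rng)    # held-out cases used only for evaluation

model = fit_stump(train_x, train_y)
test_acc = accuracy(model, test_x, test_y)
importances = [permutation_importance(model, test_x, test_y, j, rng) for j in range(2)]
pd_curve = partial_dependence(model, test_x, 0, [0.1, 0.3, 0.7, 0.9])

print("held-out accuracy:", test_acc)
print("permutation importance (expectation, noise):", importances)
print("partial dependence on expectation:", pd_curve)
```

In this toy setting the noise predictor receives zero importance because the fitted stump never uses it, and the partial dependence curve jumps from 0 to 1 across the threshold, which is the kind of nonlinear pattern a partial dependence plot is meant to surface.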
Data Availability
This research uses data from Add Health, a program project directed by Kathleen Mullan Harris and designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill, and funded by grant P01-HD31921 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, with cooperative funding from 23 other federal agencies and foundations. Special acknowledgment is due Ronald R. Rindfuss and Barbara Entwisle for assistance in the original design. Information on how to obtain the Add Health data files is available on the Add Health website (http://www.cpc.unc.edu/addhealth). No direct support was received from grant P01-HD31921 for this analysis.
Funding
This material is based on work supported by the National Science Foundation under IGERT Grant DGE-1144860, Big Data Social Science.
Author information
Contributions
X.S. led the design of the study, the data analyses, and the writing. N.R. and S.M.M. collaborated on the design and writing of the study.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Ethical Approval
This study conducts secondary data analysis with a publicly available dataset. This article does not contain any studies with human participants or animals performed by any of the authors.
Informed Consent
The following declaration is obtained from the Add Health website (http://www.cpc.unc.edu/addhealth): “Add Health participants provided written informed consent for participation in all aspects of Add Health in accordance with the University of North Carolina School of Public Health Institutional Review Board guidelines that are based on the Code of Federal Regulations on the Protection of Human Subjects 45CFR46: http://www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.html”.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Sun, X., Ram, N. & McHale, S.M. Adolescent Family Experiences Predict Young Adult Educational Attainment: A Data-Based Cross-Study Synthesis With Machine Learning. J Child Fam Stud 29, 2770–2785 (2020). https://doi.org/10.1007/s10826-020-01775-5
DOI: https://doi.org/10.1007/s10826-020-01775-5