Abstract
Missing data is a pervasive problem in social science research. Allison (2002: 1) has written that “sooner or later, usually sooner, anyone who does statistical analysis runs into problems with missing data. In a typical dataset, information is missing for some variables for some cases. … Missing data are a ubiquitous problem in both the social and health sciences … [Yet] the vast majority of statistical textbooks have nothing whatsoever to say about missing data or how to deal with it.” Treiman (2009: 182) has noted that “missing data is a vexing problem in social research. It is both common and difficult to manage.” In this chapter we undertake two separate analyses, one for females and the other for males, of the likelihood of the respondent reporting having had a teen birth. We use several independent variables in our analyses that have been shown in prior studies to be important predictors of adolescent fertility. We handle the problem of missing data using several different approaches.
This research uses data from the National Longitudinal Study of Adolescent Health (Add Health), a program project directed by Kathleen Mullan Harris and designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill, and funded by grant P01-HD31921 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, with cooperative funding from 23 other federal agencies and foundations. Special acknowledgment is due Ronald R. Rindfuss and Barbara Entwisle for assistance in the original design. Information on how to obtain the Add Health data files is available on the Add Health website (http://www.cpc.unc.edu/addhealth). No direct support was received from grant P01-HD31921 for this analysis.
Notes
1. There are other “traditional” methods that researchers have used for handling missing data. Among them are dummy variable adjustment and hot and cold deck imputation. Although we will not use any of these methods in the analyses we undertake in this chapter, we mention each of them here, as follows.
Dummy variable adjustment is an approach widely used in the social sciences; it is also known as the missing indicator method. According to Treiman (2009: 184), “for each independent variable with substantial missing data, the mean (or some other constant) is substituted, and a dummy variable, scored 1 if a value has been substituted and scored 0 if otherwise, is added to the regression equation.” Some prefer this method because it also serves as a test of the MCAR assumption: “If any of the dummy variables has a (significant) nonzero coefficient, the data are not MCAR” (Treiman 2009: 184). Although some have argued that this approach corrects the missing data for nonrandomness, it has been shown to produce biased estimates (Treiman 2009). Acock has added that “it gives a false sense of statistical power” (Acock 2005:101–7). Also, if this method is applied to multiple independent variables, multicollinearity may become a problem when many respondents fail to provide data on two or more of the same variables (Acock 2005). In sum, this method might seem appealing because it uses all the cases, but it has been shown to produce biased estimates irrespective of whether the data are MCAR, MAR, or MNAR (Acock 2005; Allison 2002).
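The mechanics of dummy variable adjustment can be illustrated with a minimal Python sketch. It is not part of the analyses in this chapter; the function name `dummy_variable_adjustment`, the variable `parent_educ`, and the data values are hypothetical and chosen only for illustration. The sketch shows how the substituted constant and the 0/1 indicator are constructed; it does not remove the bias discussed above.

```python
from statistics import mean


def dummy_variable_adjustment(values, fill=None):
    """Return (filled_values, missing_indicator) for one independent variable.

    `values` is a list in which None marks a missing observation.
    Missing entries are replaced by `fill` (default: the observed mean),
    and the indicator is scored 1 where a value was substituted, 0 otherwise.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values from which to compute a constant")
    constant = mean(observed) if fill is None else fill
    filled = [constant if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return filled, indicator


# Hypothetical data: years of parental education with two missing responses.
parent_educ = [12, None, 16, 10, None, 14]
filled, missing_flag = dummy_variable_adjustment(parent_educ)
print(filled)        # missing entries replaced by the observed mean
print(missing_flag)  # 1 marks the cases where a value was substituted
```

Both the filled variable and its indicator would then be entered into the regression equation; a nonzero coefficient on the indicator is what Treiman describes as evidence against MCAR.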
Hot deck imputation is a method used by the U.S. Census Bureau to construct complete data public use samples. According to Treiman (2009: 185), the “sample is divided into strata… Then each missing value within a stratum is replaced with a value randomly drawn (with replacement) from the observed cases within the stratum. As a result, within each stratum the distribution of values for the imputed cases is (within the limits of sampling error) identical to the distribution of values for the observed cases. When the imputation model is correctly specified (that is, when all variables correlated with the missingness of values on a given variable are used to impute the missing values), this method produces unbiased coefficients but biased standard errors. It also tends to perform poorly when a substantial fraction of cases have at least one missing value …”
Cold deck imputation follows the same approach as hot deck imputation, but it replaces the missing values with values from another data set rather than from the same data set. The hot deck and cold deck methods may seem appealing because they use all the cases, but they have been shown to produce biased estimates irrespective of the reason why the data are missing.
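The stratified random draw that Treiman describes can be sketched in a few lines of Python. This is an illustrative sketch only, not the Census Bureau's procedure and not a method used in this chapter; the function `hot_deck_impute`, its parameters, and the sample records are hypothetical. Passing a separate `donor_records` list turns the same routine into a simple cold deck variant, since the donor values then come from another data set.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, stratum_key, target_key, donor_records=None, seed=0):
    """Fill missing `target_key` values (None) with a value drawn at random,
    with replacement, from observed cases in the same stratum.

    If `donor_records` is given, donors come from that external data set
    (cold deck); otherwise they come from `records` itself (hot deck).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    source = records if donor_records is None else donor_records

    # Collect the observed values available in each stratum.
    donors = defaultdict(list)
    for rec in source:
        if rec[target_key] is not None:
            donors[rec[stratum_key]].append(rec[target_key])

    completed = []
    for rec in records:
        new_rec = dict(rec)  # copy so the input records are left unchanged
        if new_rec[target_key] is None:
            pool = donors.get(new_rec[stratum_key])
            if not pool:
                raise ValueError(f"no donors in stratum {new_rec[stratum_key]!r}")
            new_rec[target_key] = rng.choice(pool)
        completed.append(new_rec)
    return completed


# Hypothetical sample stratified by sex, with income missing for two cases.
sample = [
    {"sex": "F", "income": 30},
    {"sex": "F", "income": None},
    {"sex": "F", "income": 50},
    {"sex": "M", "income": 40},
    {"sex": "M", "income": None},
]
completed = hot_deck_impute(sample, "sex", "income")

# Cold deck variant: donor values are taken from a different (hypothetical) data set.
external = [{"sex": "F", "income": 99}, {"sex": "M", "income": 77}]
cold = hot_deck_impute(sample, "sex", "income", donor_records=external)
```

Because each missing value is filled with a single draw that is then treated as if it had been observed, a sketch like this does nothing to reflect the uncertainty of the imputation, which is consistent with the biased standard errors noted in the quotation above.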
2. We used three auxiliary variables. Two questions were asked of the parents: “How important is religion to you?” and “Do you have enough money to pay your bills?” One question was asked of the students: “How much do you want to go to college?” All three auxiliary questions were answered on a 1–4 or a 1–5 point scale from low to high.
References
Acock, A. (2005). Working with missing values. Journal of Marriage and Family, 67, 1012–1028.
Allison, P. D. (2002). Missing data. Thousand Oaks: Sage Publications.
Bean, F. D., & Swicegood, G. (1985). Mexican American fertility patterns. Austin: University of Texas Press.
Enders, C. K. (2010). Applied missing data analysis. New York: Guilford Press.
Francis, S. A. (2010). Using the primary socialization theory to predict substance use and sexual risk behaviors between Black and White adolescents. Substance Use & Misuse, 45, 2113–2129.
Greene, M. E., & Biddlecom, A. E. (2000). Absent and problematic men: Demographic accounts of male reproductive roles. Population and Development Review, 26, 81–115.
Harris, K. M. (2008). The National Longitudinal Study of Adolescent Health (Add Health), Waves I & II, 1994–1996; Wave III, 2001–2002 [machine-readable data file and documentation]. Chapel Hill: University of North Carolina at Chapel Hill.
Klepinger, D. H., Lundberg, S., & Plotnick, R. D. (1995). Adolescent fertility and the educational attainment of young women. Family Planning Perspectives, 27, 23–28.
Lee, K. J., & Carlin, J. B. (2010). Multiple imputation for missing data: Fully conditional specification versus multivariate Normal imputation. American Journal of Epidemiology, 171, 624–632.
Long, J. S., & Freese, J. (2014). Regression models for categorical dependent variables using Stata (3rd ed.). College Station: Stata Press.
Mirowsky, J., & Ross, C. E. (2003). Education, social status, and health. New York: A. de Gruyter.
Perreira, K., Harris, K., & Lee, D. (2007). Immigrant youth in the labor market. Work and Occupations, 34, 5–34.
Poston, D. L., Jr. (2002). Son preference and fertility in China. Journal of Biosocial Science, 34, 333–347.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–590.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. Boca Raton: Chapman & Hall/CRC.
Sen, A. (1999). Development as freedom. New York: Knopf.
StataCorp. (2016). Stata survey data reference manual, release 14. College Station: StataCorp.
Sterne, J. A., White, I. R., Carlin, J. B., Spratt, M., Royston, P., Kenward, M. G., Wood, A. M., & Carpenter, J. R. (2009). Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. British Medical Journal, 338, b2393.
Treiman, D. J. (2009). Quantitative data analysis: Doing social research to test ideas. San Francisco, CA: Jossey-Bass.
Vaquera, E. (2006). The implications of choosing ‘no race’ on the salience of Hispanic identity: How racial and ethnic backgrounds intersect among Hispanic adolescents. Sociological Quarterly, 47, 375–396.
von Hippel, P. T. (2007). Regression with missing Ys: An improved strategy for analyzing multiple imputed data. Sociological Methodology, 37, 83–117.
Wahl, A. M. (2010). Gender, acculturation and alcohol use among Latina/o adolescents: A multi-ethnic comparison. Journal of Immigrant & Minority Health, 12, 153–165.
Zhang, L., Poston, D. L., Jr., & Chang, C. F. (2014). Chapter 9: Male and female fertility in Taiwan. In D. L. Poston Jr., W. S. Yang, & D. N. Farris (Eds.), The family and social change in Chinese societies (pp. 151–161). New York: Springer.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Conde, E., Poston, D.L. (2020). Approaches for Addressing Missing Data in Statistical Analyses of Female and Male Adolescent Fertility. In: Singelmann, J., Poston, Jr, D. (eds) Developments in Demography in the 21st Century. The Springer Series on Demographic Methods and Population Analysis, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-030-26492-5_4
DOI: https://doi.org/10.1007/978-3-030-26492-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26491-8
Online ISBN: 978-3-030-26492-5
eBook Packages: Mathematics and Statistics (R0)