Approaches for Addressing Missing Data in Statistical Analyses of Female and Male Adolescent Fertility

Developments in Demography in the 21st Century

Abstract

Missing data is a pervasive problem in social science research. Allison (2002: 1) has written that “sooner or later, usually sooner, anyone who does statistical analysis runs into problems with missing data. In a typical dataset, information is missing for some variables for some cases. … Missing data are a ubiquitous problem in both the social and health sciences … [Yet] the vast majority of statistical textbooks have nothing whatsoever to say about missing data or how to deal with it.” Treiman (2009: 182) has noted that “missing data is a vexing problem in social research. It is both common and difficult to manage.” In this chapter we undertake two separate analyses, one for females and the other for males, of the likelihood of the respondent reporting having had a teen birth. We use several independent variables in our analyses that have been shown in prior studies to be important predictors of adolescent fertility. We handle the problem of missing data using several different approaches.

This research uses data from the National Longitudinal Study of Adolescent Health (Add Health), a program project directed by Kathleen Mullan Harris and designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill, and funded by grant P01-HD31921 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, with cooperative funding from 23 other federal agencies and foundations. Special acknowledgment is due Ronald R. Rindfuss and Barbara Entwisle for assistance in the original design. Information on how to obtain the Add Health data files is available on the Add Health website (http://www.cpc.unc.edu/addhealth). No direct support was received from grant P01-HD31921 for this analysis.

Notes

  1.

    There are other “traditional” methods that researchers have used for handling missing data. Among them are dummy variable adjustment and hot and cold deck imputation. Although we will not use any of these methods in the analyses we undertake in this chapter, we mention each of them here, as follows.

    Dummy variable adjustment is an approach widely used in the social sciences; it is also known as the missing indicator method. According to Treiman (2009: 184), “for each independent variable with substantial missing data, the mean (or some other constant) is substituted, and a dummy variable, scored 1 if a value has been substituted and scored 0 if otherwise, is added to the regression equation.” Some prefer this method because it is also a test of the MCAR assumption. “If any of the dummy variables has a (significant) nonzero coefficient, the data are not MCAR” (Treiman, 2009: 184). Although some have argued that this approach corrects the missing data for nonrandomness, it has been shown that it produces biased estimates (Treiman, 2009). And Acock has added that “it gives a false sense of statistical power” (Acock 2005:101–7). Also, if this method is applied to multiple independent variables, one may well have problems with multicollinearity if many respondents fail to provide data on two or more of the same variables (Acock 2005). In sum, this method might seem appealing since it uses all the cases, but it has been shown to produce biased estimates irrespective of whether or not the data are MCAR, MAR or MNAR (Acock 2005; Allison 2002).

    Hotdeck imputation is a method used by the U.S. Census Bureau to construct complete data public use samples. According to Treiman (2009: 185) the “sample is divided into strata… Then each missing value within a stratum is replaced with a value randomly drawn (with replacement) from the observed cases within the stratum. As a result, within each stratum the distribution of values for the imputed cases is (within the limits of sampling error) identical to the distribution of values for the observed cases. When the imputation model is correctly specified (that is, when all variables correlated with the missingness of values on a given variable are used to impute the missing values), this method produces unbiased coefficients but biased standard errors. It also tends to perform poorly when a substantial fraction of cases have at least one missing value …”

    Cold deck imputation follows the same approach as hotdeck imputation, but it replaces the missing values with those from another data set rather than from the same data set. The hotdeck and cold deck methods may seem to be appealing because they use all the cases, but they have been shown to produce biased estimates irrespective of the reason why the data are missing.

  2.

    We used three auxiliary variables. Two questions were asked of the parents, namely, “How important is religion to you?” and “Do you have enough money to pay your bills?” And one question was asked of the students, namely, “How much do you want to go to college?” All three auxiliary questions were answered on a 1–4 or a 1–5 point scale from low to high.
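
For readers who want to see the mechanics of two of the traditional methods described in note 1, the following is a minimal Python sketch of dummy variable adjustment and hot deck imputation. It is an illustration only and is not part of the analyses in this chapter, which do not use these methods. The function names, the toy `income` and `region` values, and the fixed random seed are all hypothetical, and the sketch assumes that the mean is the substituted constant and that strata are given by a single grouping variable.

```python
import random
from statistics import mean


def dummy_variable_adjustment(values):
    """Missing indicator method: substitute the observed mean for each
    missing value and return a 0/1 flag marking the substitutions."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    filled = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags


def hot_deck_impute(values, strata, rng):
    """Hot deck: replace each missing value with a random draw (with
    replacement) from the observed values in the same stratum."""
    donors = {}
    for v, s in zip(values, strata):
        if v is not None:
            donors.setdefault(s, []).append(v)
    imputed = []
    for v, s in zip(values, strata):
        if v is None:
            if s not in donors:
                raise ValueError(f"no observed donors in stratum {s!r}")
            imputed.append(rng.choice(donors[s]))
        else:
            imputed.append(v)
    return imputed


# Hypothetical toy data: None marks a missing response.
income = [10.0, None, 14.0, 30.0, None, 34.0]
region = ["a", "a", "a", "b", "b", "b"]

filled, flags = dummy_variable_adjustment(income)
hot = hot_deck_impute(income, region, random.Random(0))  # seed is arbitrary
```

In a regression, both `filled` and `flags` would enter as predictors under dummy variable adjustment, whereas the hot deck version simply replaces the incomplete variable. A cold deck variant would draw the donor values from a different data set rather than from `income` itself.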

References

  • Acock, A. (2005). Working with missing values. Journal of Marriage and Family, 67, 1012–1028.

  • Allison, P. D. (2002). Missing data. Thousand Oaks: Sage Publications.

  • Bean, F. D., & Swicegood, G. (1985). Mexican American fertility patterns. Austin: University of Texas Press.

  • Enders, C. K. (2010). Applied missing data analysis. New York: Guilford Press.

  • Francis, S. A. (2010). Using the primary socialization theory to predict substance use and sexual risk behaviors between Black and White adolescents. Substance Use & Misuse, 45, 2113–2129.

  • Greene, M. E., & Biddlecom, A. E. (2000). Absent and problematic men: Demographic accounts of male reproductive roles. Population and Development Review, 26, 81–115.

  • Harris, K. M. (2008). The National Longitudinal Study of Adolescent Health (Add Health), Waves I & II, 1994–1996; Wave III, 2001–2002 [machine-readable data file and documentation]. Chapel Hill: University of North Carolina at Chapel Hill.

  • Klepinger, D. H., Lundberg, S., & Plotnick, R. D. (1995). Adolescent fertility and the educational attainment of young women. Family Planning Perspectives, 27, 23–28.

  • Lee, K. J., & Carlin, J. B. (2010). Multiple imputation for missing data: Fully conditional specification versus multivariate normal imputation. American Journal of Epidemiology, 171, 624–632.

  • Long, J. S., & Freese, J. (2014). Regression models for categorical dependent variables using Stata (3rd ed.). College Station: Stata Press.

  • Mirowsky, J., & Ross, C. E. (2003). Education, social status, and health. New York: A. de Gruyter.

  • Perreira, K., Harris, K., & Lee, D. (2007). Immigrant youth in the labor market. Work and Occupations, 34, 5–34.

  • Poston, D. L., Jr. (2002). Son preference and fertility in China. Journal of Biosocial Science, 34, 333–347.

  • Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–590.

  • Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

  • Schafer, J. L. (1997). Analysis of incomplete multivariate data. Boca Raton: Chapman & Hall/CRC.

  • Sen, A. (1999). Development as freedom. New York: Knopf.

  • StataCorp. (2016). Stata survey data reference manual, release 14. College Station: StataCorp.

  • Sterne, J. A., White, I. R., Carlin, J. B., Spratt, M., Royston, P., Kenward, M. G., Wood, A. M., & Carpenter, J. R. (2009). Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. British Medical Journal, 338, b2393.

  • Treiman, D. J. (2009). Quantitative data analysis: Doing social research to test ideas. San Francisco, CA: Jossey-Bass.

  • Vaquera, E. (2006). The implications of choosing ‘no race’ on the salience of Hispanic identity: How racial and ethnic backgrounds intersect among Hispanic adolescents. Sociological Quarterly, 47, 375–396.

  • von Hippel, P. T. (2007). Regression with missing Ys: An improved strategy for analyzing multiple imputed data. Sociological Methodology, 37, 83–117.

  • Wahl, A. M. (2010). Gender, acculturation and alcohol use among Latina/o adolescents: A multi-ethnic comparison. Journal of Immigrant & Minority Health, 12, 153–165.

  • Zhang, L., Poston, D. L., Jr., & Chang, C. F. (2014). Chapter 9: Male and female fertility in Taiwan. In D. L. Poston Jr., W. S. Yang, & D. N. Farris (Eds.), The family and social change in Chinese societies (pp. 151–161). New York: Springer.

Author information

Corresponding author

Correspondence to Dudley L. Poston Jr.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Conde, E., Poston, D.L. (2020). Approaches for Addressing Missing Data in Statistical Analyses of Female and Male Adolescent Fertility. In: Singelmann, J., Poston, Jr, D. (eds) Developments in Demography in the 21st Century. The Springer Series on Demographic Methods and Population Analysis, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-030-26492-5_4
