Approaches for Addressing Missing Data in Statistical Analyses of Female and Male Adolescent Fertility

Conde, Eugenia; Poston, Dudley L.

doi:10.1007/978-3-030-26492-5_4

Part of the book series: The Springer Series on Demographic Methods and Population Analysis (PSDE, volume 48)

Abstract

Missing data is a pervasive problem in social science research. Allison (2002: 1) has written that “sooner or later, usually sooner, anyone who does statistical analysis runs into problems with missing data. In a typical dataset, information is missing for some variables for some cases. … Missing data are a ubiquitous problem in both the social and health sciences … [Yet] the vast majority of statistical textbooks have nothing whatsoever to say about missing data or how to deal with it.” Treiman (2009: 182) has noted that “missing data is a vexing problem in social research. It is both common and difficult to manage.” In this chapter we undertake two separate analyses, one for females and the other for males, of the likelihood of the respondent reporting having had a teen birth. We use several independent variables in our analyses that have been shown in prior studies to be important predictors of adolescent fertility. We handle the problem of missing data using several different approaches.

This research uses data from the National Longitudinal Study of Adolescent Health (Add Health), a program project directed by Kathleen Mullan Harris and designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill, and funded by grant P01-HD31921 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, with cooperative funding from 23 other federal agencies and foundations. Special acknowledgment is due Ronald R. Rindfuss and Barbara Entwisle for assistance in the original design. Information on how to obtain the Add Health data files is available on the Add Health website (http://www.cpc.unc.edu/addhealth). No direct support was received from grant P01-HD31921 for this analysis.

Notes

1.
There are other “traditional” methods that researchers have used for handling missing data. Among them are dummy variable adjustment and hot and cold deck imputation. Although we will not use any of these methods in the analyses we undertake in this chapter, we mention each of them here, as follows.
Dummy variable adjustment is an approach widely used in the social sciences; it is also known as the missing indicator method. According to Treiman (2009: 184), “for each independent variable with substantial missing data, the mean (or some other constant) is substituted, and a dummy variable, scored 1 if a value has been substituted and scored 0 if otherwise, is added to the regression equation.” Some prefer this method because it is also a test of the MCAR assumption. “If any of the dummy variables has a (significant) nonzero coefficient, the data are not MCAR” (Treiman 2009: 184). Although some have argued that this approach corrects the missing data for nonrandomness, it has been shown to produce biased estimates (Treiman 2009). And Acock has added that “it gives a false sense of statistical power” (Acock 2005:101–7). Also, if this method is applied to multiple independent variables, one may well have problems with multicollinearity if many respondents fail to provide data on two or more of the same variables (Acock 2005). In sum, this method might seem appealing since it uses all the cases, but it has been shown to produce biased estimates irrespective of whether the data are MCAR, MAR or MNAR (Acock 2005; Allison 2002).
Hotdeck imputation is a method used by the U.S. Census Bureau to construct complete data public use samples. According to Treiman (2009: 185) the “sample is divided into strata… Then each missing value within a stratum is replaced with a value randomly drawn (with replacement) from the observed cases within the stratum. As a result, within each stratum the distribution of values for the imputed cases is (within the limits of sampling error) identical to the distribution of values for the observed cases. When the imputation model is correctly specified (that is, when all variables correlated with the missingness of values on a given variable are used to impute the missing values), this method produces unbiased coefficients but biased standard errors. It also tends to perform poorly when a substantial fraction of cases have at least one missing value …”
Cold deck imputation follows the same approach as hotdeck imputation, but it replaces the missing values with those from another data set rather than from the same data set. The hotdeck and cold deck methods may seem to be appealing because they use all the cases, but they have been shown to produce biased estimates irrespective of the reason why the data are missing.
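The mechanics of dummy variable adjustment and hotdeck imputation, as described in this note, can be illustrated with a minimal Python sketch. It is an illustration only, not the procedure used by the chapter authors, Treiman, or the U.S. Census Bureau. The function names `missing_indicator_adjust` and `hot_deck_impute` are hypothetical, and the sketch assumes that missing values are coded as `None` and that each case carries a single stratum label.

```python
import random
from collections import defaultdict
from statistics import mean


def missing_indicator_adjust(values):
    """Dummy variable (missing indicator) adjustment for one predictor.

    Missing entries (None) are replaced by the mean of the observed
    values; a parallel 0/1 flag records where a value was substituted.
    Both columns would then enter the regression equation.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    filled = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags


def hot_deck_impute(values, strata, rng=None):
    """Hotdeck imputation within strata.

    Each missing value is replaced by a value drawn at random, with
    replacement, from the observed cases in the same stratum.
    """
    rng = rng or random.Random()
    # Collect the observed (donor) values for every stratum.
    donors = defaultdict(list)
    for v, s in zip(values, strata):
        if v is not None:
            donors[s].append(v)
    imputed = []
    for v, s in zip(values, strata):
        if v is None:
            if not donors[s]:
                raise ValueError(f"no observed donors in stratum {s!r}")
            imputed.append(rng.choice(donors[s]))
        else:
            imputed.append(v)
    return imputed
```

For a made-up predictor `[10.0, None, 30.0, None, 50.0]`, `missing_indicator_adjust` fills both gaps with the observed mean of 30.0 and returns the flags `[0, 1, 0, 1, 0]`. With strata `["a", "a", "b", "b", "b"]`, `hot_deck_impute` fills the first gap with 10.0 (the only donor in stratum "a") and the second with either 30.0 or 50.0. Cold deck imputation would follow the same pattern, except that the donor values would come from a different data set.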
2.
We used three auxiliary variables. Two questions were asked of the parents, namely, “How important is religion to you?” and “Do you have enough money to pay your bills?” And one question was asked of the students, namely, “How much do you want to go to college?” All three auxiliary questions were answered on a 1–4 or a 1–5 point scale from low to high.

References

Acock, A. (2005). Working with missing values. Journal of Marriage and Family, 67, 1012–1028.
Allison, P. D. (2002). Missing Data. Thousand Oaks: Sage Publications.
Bean, F. D., & Swicegood, G. (1985). Mexican American fertility patterns. Austin: University of Texas Press.
Enders, C. K. (2010). Applied missing data analysis. New York: Guilford Press.
Francis, S. A. (2010). Using the primary socialization theory to predict substance use and sexual risk behaviors between black and White adolescents. Substance Use & Misuse, 45, 2113–2129.
Greene, M. E., & Biddlecom, A. E. (2000). Absent and problematic men: Demographic accounts of male reproductive roles. Population and Development Review, 26, 81–115.
Harris, K. M. (2008). The National Longitudinal Study of Adolescent Health (Add Health), Waves I & II, 1994–1996; Wave III, 2001–2002 [machine-readable data file and documentation]. Chapel Hill: University of North Carolina at Chapel Hill.
Klepinger, D. H., Lundberg, S., & Plotnick, R. D. (1995). Adolescent fertility and the educational attainment of young women. Family Planning Perspectives, 27, 23–28.
Lee, K. J., & Carlin, J. B. (2010). Multiple imputation for missing data: Fully conditional specification versus multivariate Normal imputation. American Journal of Epidemiology, 171, 624–632.
Long, J. S., & Freese, J. (2014). Regression models for categorical dependent variables using Stata (3rd ed.). College Station: Stata Press.
Mirowsky, J., & Ross, C. E. (2003). Education, social status, and health. New York: A. de Gruyter.
Perreira, K., Harris, K., & Lee, D. (2007). Immigrant youth in the labor market. Work and Occupations, 34, 5–34.
Poston, D. L., Jr. (2002). Son preference and fertility in China. Journal of Biosocial Science, 34, 333–347.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–590.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. Boca Raton: Chapman & Hall/CRC.
Sen, A. (1999). Development As Freedom. New York: Knopf.
StataCorp. (2016). Stata survey data reference manual, release 14. College Station: StataCorp.
Sterne, J. A., White, I. R., Carlin, J. B., Spratt, M., Royston, P., Kenward, M. G., Wood, A. M., & Carpenter, J. R. (2009). Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. British Medical Journal, 338, b2393.
Treiman, D. J. (2009). Quantitative data analysis: Doing social research to test ideas. San Francisco, CA: Jossey-Bass.
Vaquera, E. (2006). The implications of choosing ‘no race’ on the salience of Hispanic identity: How racial and ethnic backgrounds intersect among Hispanic adolescents. Sociological Quarterly, 47, 375–396.
von Hippel, P. T. (2007). Regression with missing Ys: An improved strategy for analyzing multiple imputed data. Sociological Methodology, 37, 83–117.
Wahl, A. M. (2010). Gender, acculturation and alcohol use among Latina/o adolescents: A multi-ethnic comparison. Journal of Immigrant & Minority Health, 12, 153–165.
Zhang, L., Poston, D. L., Jr., & Chang, C. F. (2014). Chapter 9: Male and female fertility in Taiwan. In D. L. Poston Jr., W. S. Yang, & D. N. Farris (Eds.), The family and social change in Chinese societies (pp. 151–161). New York: Springer.

Author information

Authors and Affiliations

The Samuel DuBois Cook Center on Social Equity, Duke University, Durham, NC, USA
Eugenia Conde
Department of Sociology, Texas A&M University College Station, College Station, TX, USA
Dudley L. Poston Jr

Corresponding author

Correspondence to Dudley L. Poston Jr.

Editor information

Editors and Affiliations

Department of Demography, University of Texas at San Antonio (UTSA), San Antonio, TX, USA
Joachim Singelmann
Department of Sociology, Texas A&M University College Station, College Station, TX, USA
Dudley L. Poston, Jr

About this chapter

Cite this chapter

Conde, E., Poston, D.L. (2020). Approaches for Addressing Missing Data in Statistical Analyses of Female and Male Adolescent Fertility. In: Singelmann, J., Poston, Jr, D. (eds) Developments in Demography in the 21st Century. The Springer Series on Demographic Methods and Population Analysis, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-030-26492-5_4

DOI: https://doi.org/10.1007/978-3-030-26492-5_4
Published: 25 February 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26491-8
Online ISBN: 978-3-030-26492-5
eBook Packages: Mathematics and Statistics (R0)
