Abstract
The National Health and Nutrition Examination Survey (NHANES) contains objectively measured physical activity data collected using hip-worn accelerometers from multiple cohorts. However, using the accelerometry data has proven daunting because (1) there are currently no agreed-upon standard protocols for data storage and analysis; (2) the data exhibit heterogeneous patterns of missingness due to varying degrees of adherence to wear-time protocols; (3) sampling weights need to be carefully adjusted and accounted for in individual analyses; (4) there is a lack of reproducible software that transforms the data from its published format into analytic form; and (5) the high-dimensional nature of accelerometry data complicates analyses. Here, we provide a framework for processing, storing, and analyzing the NHANES accelerometry data for the 2003–2004 and 2005–2006 surveys. We also provide an NHANES data package in R to help disseminate high-quality, processed activity data combined with mortality and demographic information. Thus, we provide the tools to transition from “available data online” to “easily accessible and usable data”, which substantially reduces the large upfront costs of initiating studies of association between physical activity and human health outcomes using NHANES. We apply these tools in an analysis showing that accelerometry features have the potential to predict 5-year all-cause mortality better than known risk factors such as age, cigarette smoking, and various comorbidities.
Acknowledgements
We would like to thank the CDC, specifically the National Center for Health Statistics, for collecting, organizing, and making public this unique data resource. We would also like to thank them for permission to repost the publicly available NHANES and NDI data in analytic format. Finally, we would like to thank the thousands of anonymous participants in NHANES, whose data led to the exciting findings in this paper.
Funding
This research was supported by the National Heart, Lung, and Blood Institute (R01 HL123407), the National Institute of Neurological Disorders and Stroke (R01 NS060910), and a National Institute on Aging Training Grant (T32 AG000247).
About this article
Cite this article
Leroux, A., Di, J., Smirnova, E. et al. Organizing and Analyzing the Activity Data in NHANES. Stat Biosci 11, 262–287 (2019). https://doi.org/10.1007/s12561-018-09229-9
DOI: https://doi.org/10.1007/s12561-018-09229-9