Introduction

The unique vulnerability of children to environmental hazards has been documented in many scientific studies and government reports, including a landmark 1993 US National Academy of Sciences report on pesticide exposures [1]. Yet studies of children’s health have not fully accounted for the range of environmental influences during pregnancy and the postnatal period that can substantially affect health from childhood through adult life. The Developmental Origins of Health and Disease hypothesis was first formulated by Barker and colleagues in the context of nutritional influences [2,3,4,5]. It is now widely accepted, however, that biological, psychosocial, chemical, and physical exposures are equally influential [6].

Until now, progress toward elucidating the role of the environment in childhood obesity and other chronic conditions has been slow and incremental. Most studies have examined relatively small populations of children [6]; have considered only one chemical exposure at a time; have had little statistical power to examine interactions among chemical, social, and behavioral factors; and have had limited ability to examine gene–environment interactions [7]. Little is known about possible interactions and synergies among chemicals or between chemicals and other environmental hazards, even though the environment of a child includes mixtures of chemical and biological toxicants. Gene–environment interactions and epigenomic effects of exposures are just beginning to be explored.

For example, the recent explosive increase in the prevalence of obesity reflects a complex interplay among (1) changes in individual behaviors; (2) changes in community structure, lifestyle, and the “built environment”; and (3) exposures to certain synthetic chemicals (e.g., endocrine disruptors) that might disrupt energy balance [6]. Control of the obesity epidemic will require understanding each of these factors and the interplay among them. While previous cohort studies have contributed greatly to identifying many individual-level factors that contribute to the development of obesity in children and its persistence into adulthood in the US and other countries [8,9,10,11,12,13,14,15,16,17,18,19,20,21], these studies have several limitations [6]:

  • Previous studies have not fully capitalized on the life-course approach to chronic disease epidemiology [22].

  • Although some studies have collected genetic data on participants and been able to identify polymorphisms that increase the risk of obesity, they have not simultaneously collected the data on environmental exposures needed to carefully examine the interactions of genetic and environmental factors with diet, physical activity, or epigenetic changes—all of which might predict risk of obesity.

  • Many prior cohorts have been limited in their capacities to identify risk factors for obesity that may be unique among Hispanics, a population for which obesity prevalence is increasing especially rapidly [23, 24].

  • Past studies have not assessed features of the built environment that encourage healthy diet and physical activity among children living in urban areas [25].

In the context of the broad array of environmental stressors, increasing human and laboratory evidence suggests that exogenous chemicals influence developmental metabolic programming and provoke oxidative stress, a major pathophysiologic mechanism that underlies cardiometabolic risks [26]. These include (1) phthalates (used to soften plastics and as scents), which increase expression of peroxisome proliferator-activated receptors [27] that play key roles in lipid and carbohydrate metabolism [28]; (2) bisphenols (found in aluminum can linings and thermal paper receipts), which are mildly estrogenic, increase fat in adipocytes [29], and disrupt pancreatic β-cell function; (3) polycyclic aromatic hydrocarbons (PAHs, present in air pollution), which promote inflammation and increase visceral fat in animal models [30, 31]; and (4) organophosphate pesticides (OPs), which are thyroid hormone antagonists that contribute to pre-diabetes, lipid metabolism abnormalities, and obesity in animals [32].

Longitudinal studies of prenatal exposure in humans, especially for phthalates and bisphenols, have not yielded the expected findings [33,34,35,36,37]. This may reflect (1) limited exposure assessment based on a small number of spot samples collected late in pregnancy, which restricts insight into how exposure effects depend on the earliest stages of fetal development; (2) lack of fetal growth data to evaluate intrauterine effects; (3) use of body mass index (BMI) rather than specific measures of fat mass; (4) failure to measure replacements for bisphenol A and di-2-ethylhexyl phthalate (DEHP), particularly bisphenol S (BPS), diisononyl phthalate (DINP), and diisodecyl phthalate (DIDP), which have come into use over the past decade; and (5) lack of mechanistic insight.

The purpose of this manuscript is to describe the NYU Children’s Health and Environment Study (CHES), which was designed to overcome these limitations and identify environmental and genetic causes of normal and abnormal growth, development, and health from fetal life onward.

Scope of research

The general aims of NYU CHES are to:

  • Evaluate influences of prenatal non-persistent chemical exposures on fetal and postnatal growth.

  • Evaluate prenatal non-persistent chemical exposures in relation to epigenetic marks and gene expression.

  • Identify metabotypes related both to exposures and cardiometabolic outcomes, and assess exposures and metabotypes in relation to oxidative stress and perturbations of adiponectin, leptin, and sex hormones.

  • Pool our data with that of other cohorts in the US National Institutes of Health (NIH) Environmental influences on Child Health Outcomes (ECHO) program and answer collaborative research questions on the impact of the preconceptual, prenatal, and postnatal environment on childhood obesity, neurodevelopment, pre/peri/postnatal outcomes, upper and lower airway outcomes, and positive health.

Study population and design

Overview

NYU CHES is a clinically enrolled, prospective cohort study that follows participants from fetal life onward. Since March 2016, NYU CHES staff have enrolled pregnant women into a biobank study at three NYU Grossman School of Medicine affiliates that together serve a wide array of populations: NYU Langone Hospital—Manhattan, Bellevue Hospital, and NYU Langone Hospital—Brooklyn. Formerly known as Tisch Hospital, NYU Langone Hospital—Manhattan is a major acute care center for the New York City metropolitan area. Bellevue is the flagship hospital of the largest municipal hospital system in North America (the New York City Health and Hospitals Corporation). NYU Langone Hospital—Brooklyn’s Family Health Center is the second largest federally qualified health center in the nation.

Eligibility and enrollment

Eligible women were ≥ 18 years old, < 18 weeks pregnant, had a pregnancy that was not medically threatened, and planned to deliver at one of the study hospitals. Study staff were bilingual, and study materials were available in English, Spanish, and Chinese. To proceed from the biobank study into NYU CHES, participants had to complete an initial questionnaire on sociodemographic characteristics and medical history, as well as behaviors, exposures, and experiences in the first trimester.

Study cohort

Pregnant women and their children

Between March 22, 2016 and April 15, 2019, we enrolled 2469 pregnant women into a pregnancy biobank, of whom 2193 completed a questionnaire and continued into NYU CHES (Fig. 1). Of the 276 pregnant women in the biobank who were not enrolled in NYU CHES because they did not complete a questionnaire at enrollment (despite all our efforts), we are aware of 171 live births, 16 elective terminations, and 1 stillbirth. Early miscarriage may explain the missing questionnaire in 41 of the 276. These women represent an important population to study together with the participants who completed a questionnaire and later miscarried (n = 88, see below), because environmental exposures in early pregnancy may have different effects on earlier versus later miscarriage. Of the 2193 pregnant women who continued into NYU CHES, 88 miscarried (as noted above), 28 had elective terminations, 20 experienced stillbirth, and 57 were lost to follow-up. The 2000 live deliveries resulted in 2037 children; the mothers of 1624 (80%) of these children consented to join the postnatal phase of the study and gave permission to share their children’s identifiable data with the ECHO program.
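The participant flow above can be checked with simple arithmetic. The Python sketch below only re-derives totals from counts stated in the text; the variable names are ours, and reading the 37 extra children as the 37 multiple births reported later assumes that every multiple birth was a twin delivery, which the text does not state.

```python
# Participant flow through the first 2000 live deliveries (counts taken from the text).
biobank_enrolled = 2469
no_questionnaire = 276
ches_enrolled = biobank_enrolled - no_questionnaire

# NYU CHES pregnancies that did not end in a live delivery, plus losses to follow-up
not_delivered_in_study = {
    "miscarriage": 88,
    "elective termination": 28,
    "stillbirth": 20,
    "lost to follow-up": 57,
}
live_deliveries = ches_enrolled - sum(not_delivered_in_study.values())

children_born = 2037
# Children beyond one per delivery; this equals the number of multiple births
# only under our assumption that each multiple birth was a twin delivery.
additional_children_from_multiples = children_born - live_deliveries

postnatal_consented = 1624
postnatal_consent_pct = 100 * postnatal_consented / children_born

print(ches_enrolled, live_deliveries, additional_children_from_multiples, round(postnatal_consent_pct, 1))
# prints: 2193 2000 37 79.7
```

The reported 80% is this 79.7% rounded to the nearest whole percent.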

Fig. 1 NYU Children’s Health and Environment Study (NYU CHES): first 2000 births

The cohort is multiracial and multiethnic and has substantial socioeconomic diversity (Table 1). Compared with participants from the other two study sites, NYU Langone—Manhattan mothers are older; are less likely to be Hispanic; are more likely to have private health insurance, to be employed, and to be married or partnered; have higher income and education; and have lower BMI. Across all recruitment locations, the vast majority of participants are married or partnered and have never smoked. Manhattan mothers were more likely to have used alcohol before pregnancy (20%), whereas 10% of participants at the other two recruitment locations continued to use alcohol during pregnancy.

Table 1 Descriptive characteristics of NYU CHES study participants by recruitment location

To evaluate whether study participants are representative of their respective hospital populations, we compared their aggregate demographic characteristics to those of all patients seen during the same time period at NYU Langone—Manhattan and Brooklyn, the two recruitment locations for which we had hospital-wide information (Table 2). Data for all patients who had their first prenatal visit at these two locations between March 22, 2016 and April 15, 2019 and met the study eligibility criteria were aggregated. Because we did not approach every patient who received prenatal care at these sites, the percentage of study participants should not be misinterpreted as a participation rate. Descriptive data suggest that maternal age, race and ethnicity, insurance status, and parity were nearly identical among participants and the overall clinic populations. Participants did, however, appear more likely to be married than the overall population of pregnant women seen at these clinical sites.

Table 2 Representativeness of NYU CHES participants from NYU Langone—Manhattan and Brooklyn

To compare NYU CHES births with those in all of New York City, we obtained aggregate-level data on women who gave birth in New York City in 2016 and who sought prenatal care in the first trimester using publicly available data from the Bureau of Vital Statistics at the New York City Department of Health and Mental Hygiene [38]. Comparison of NYU CHES live births to the population of births in New York City reveals substantial similarity, supporting the broader generalizability of findings from NYU CHES (Table 3). Although NYU CHES participants with live births are substantially more likely to be Hispanic, less likely to be Asian or Non-Hispanic Black, and more likely to be married than those who gave birth across New York City, distributions of maternal age, education, parity, and pre-pregnancy BMI category are similar, as are their children’s delivery method, gestational age at birth, sex, and birth weight.

Table 3 Comparison of NYU CHES participants with women who delivered live births in New York City, 2016

Of the 2193 pregnant women participating in NYU CHES, 20 had stillbirths, 28 had elective terminations, and 88 experienced miscarriages. The cohort includes 37 multiple births (Table 4). Cesarean section rates were comparable at the two NYU Langone hospitals and higher than at Bellevue (32–37% vs. 22%). Preterm birth was more common at Bellevue than at the other sites (10% vs. 7–8%), in keeping with known associations of sociodemographic factors with preterm birth [39, 40]. Low birth weight, however, was least common at NYU Langone—Brooklyn (6%), compared with NYU Langone—Manhattan and Bellevue (9% and 8%, respectively).

Table 4 Birth outcomes in the initial phase of NYU CHES

Data collection in the prenatal phase

Visits in pregnancy were nested in clinical care, with questionnaire and specimen collection conducted during three routine prenatal visits in the following intervals: < 18 weeks; 18–25 weeks; and > 25 weeks gestation. These were followed by visits at birth and at regular postpartum intervals (which occur separately from clinical care). Maternal blood, urine, and saliva samples were collected at each prenatal visit, vaginal samples were collected at least once in pregnancy, and fecal samples were collected in a subsample postnatally. At birth, cord blood samples included whole blood, serum, plasma, and PAXgene tubes for RNA analysis. Placental cores (2 × 2 cm) and segments of the umbilical cord were also collected. Maternal urine samples were available for all three time points during pregnancy in a majority of participants, and nearly four-fifths of participants had placental and/or cord blood samples available (Table 5).

Table 5 Questionnaire and prenatal/neonatal specimen availability by time point, live deliveries (n = 2000)

Questionnaires and medical chart abstraction

Questionnaires administered in the first two pregnancy intervals and at birth collected information on participant and partner demographics as well as participants’ reproductive health and history, medication and substance use, employment, address, and home life (Table 6). The questionnaires included validated psychosocial scales such as the Pregnancy-Related Anxiety Scale [41] and, for depression, the Patient Health Questionnaire-9 (PHQ-9) [42]. Women were also asked about their sleep during pregnancy using the Pittsburgh Sleep Quality Index (PSQI) [43]. Once during pregnancy, participants completed the Diet History Questionnaire II (DHQ II), a publicly available food frequency questionnaire developed by the US National Cancer Institute [44]. Of the four versions of the DHQ, we used the version that asks about diet in the past year, including portion size (available and validated in English and translated into Spanish). Maternal physical activity was assessed in all three pregnancy intervals using the International Physical Activity Questionnaire-Short Form (IPAQ) [45].

Table 6 Variables measured in NYU CHES through child age 2 years

Substantial data are obtained directly from electronic health records, including maternal age at enrollment, parity/gravidity, gestational hypertension, preeclampsia, weight and blood pressure at each prenatal visit, and mode of delivery. Gestational diabetes will be evaluated using the American College of Obstetricians and Gynecologists guidelines for glucose tolerance testing. Fetal growth data and estimated gestational dating collected during routine ultrasounds are also extracted from electronic health records, including crown-rump length, biparietal diameter, head circumference, transverse cerebellar diameter, femur length, and abdominal circumference, as appropriate (Table 6).

Specimen analyses

With ECHO support, we are analyzing urine at each time point in pregnancy for OP metabolites, phthalate metabolites, bisphenols, PAHs, biomarkers of oxidative stress (F2-isoprostane and 8-hydroxydeoxyguanosine), cotinine, creatinine, and metabolomic indicators (Table 6). We are analyzing adiponectin and leptin in cord blood, as well as cytokines in serum at each pregnancy time point and in cord blood. Thyroid function (thyroid stimulating hormone, free and total thyroxine, free and total triiodothyronine, and thyroid peroxidase antibody) is being measured in serum samples collected at < 18 weeks, and maternal sex hormones (fractionated estrogens, free and total testosterone, estradiol, estrone, estriol, sex hormone binding globulin, and dehydroepiandrosterone) are being measured in the third pregnancy interval. Genome-wide methylome and transcriptome analyses will be performed on cord blood samples.

Data collection in the postnatal phase

ECHO also supports the postnatal phase of the study through age 2 years with in-person assessments that occur at ages 12–23 and 24–35 months outside of regular medical care. Each visit is coupled with a comprehensive questionnaire completed by mothers before or during the visit. In addition, evaluations include questionnaires administered online or by phone at 4–7, 8–11, and 18–23 months (Table 6). Mothers are encouraged to use information from visits with their primary care providers to report their children’s anthropometric measures in every questionnaire. They also report on history of hospitalizations and illnesses. Questionnaires evaluate breastfeeding and infant feeding history, sleep patterns, childcare, and media exposures. Early child development is assessed using the Ages and Stages Questionnaire–Third Edition (ASQ–3) [46] and the World Health Organization questionnaire on achievement of major gross motor milestones [47]. Parental rating of the child’s behavior is assessed at 12–17 months using the Brief Infant–Toddler Social and Emotional Assessment (BITSEA) [48], and at 24–35 months using the Child Behavior Checklist (CBCL) [49]. Children’s language development is assessed using the Language Development Survey (LDS) at age 24–35 months [49]. When the children are 24–35 months old, we obtain information on parenting using the Parenting Scale [50] and on family relationships using the Family Environment Scale (FES) [51].

In-person visits at ages 12–23 and 24–35 months take approximately 60 min and include collection of child urine samples and anthropometric measures; a dual-energy X-ray absorptiometry scan to evaluate bone, fat, and muscle mass; and measurement of anogenital distance, a well-known proxy for sex steroid exposure in utero [52]. With ECHO support, 12–23 and 24–35 month urine samples will be assayed for the same analytes as the prenatal samples. During these in-person visits, we also ask mothers to complete a 24-hour recall of their child’s food intake using the Automated Self-Administered 24-Hour Dietary Assessment Tool (ASA24). Maternal depression is assessed with the Edinburgh Postnatal Depression Scale (EPDS) in the 4–7 month questionnaire and with the Brief Symptom Inventory (BSI) in the 18–23 month questionnaire [53, 54]. At age 24–35 months, maternal stress is assessed using the Perceived Stress Scale (PSS) [55]. Food insecurity is assessed at 4–7 months using the US Department of Agriculture’s Core Food Insecurity Module.

Parents also consent to prospective, passive data collection from the electronic health record for pediatric visits. This includes anthropometric measurements; outpatient medical diagnoses; medications/immunizations; laboratory values including hemoglobin and blood lead; and inpatient and emergency room visit records, including diagnoses, procedures, and medications.

Because NYU CHES is a participating cohort in the NIH ECHO Program, our data collection is being aligned and supplemented, as necessary, to fulfill the ECHO-wide Cohort Data Collection Protocol (EWCP). Harmonized data will then be consolidated across the 70 ECHO cohorts nationally, comprising a sample of at least 50,000 children. This will enhance the statistical power and generalizability of findings related to environmental and preventable predictors of childhood disease and disability. The EWCP includes measures of neurodevelopment, asthma, obesity, pre-, peri-, and immediate postnatal outcomes, as well as positive health. To align our cohort with the EWCP and ensure its completion as children progress through life stages beyond infancy, we have planned follow-up visits at 36–59 and 60–83 months of age.

Data management and statistical power

Data preparation

The questionnaire data are collected and managed using REDCap electronic data capture tools hosted at NYU Langone Health [56, 57]. An extensive set of validation rules and skip logic is implemented to ensure data quality. Measurement data are either recorded in REDCap or exported from the instrument-specific software to a secure location on the study server. Information from participants’ electronic health records is obtained from Epic (all sites) and QuadraMed (Bellevue). The data from each source are checked and cleaned: all missing data, implausible values, logical errors, and outliers are reviewed. The data from all sources are then harmonized and consolidated.
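Flagging records for manual review can be sketched as below. This is a minimal Python illustration, not the study's REDCap configuration: the record fields, the function name `review_flags`, and the BMI bounds are hypothetical; only the age and gestational-age thresholds come from the eligibility criteria described earlier in this section.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EnrollmentRecord:
    # Hypothetical field names for illustration; not the NYU CHES data dictionary.
    participant_id: str
    maternal_age_years: Optional[float]
    gestational_weeks_at_enrollment: Optional[float]
    prepregnancy_bmi: Optional[float]


REQUIRED_FIELDS = ("maternal_age_years", "gestational_weeks_at_enrollment", "prepregnancy_bmi")


def review_flags(rec: EnrollmentRecord) -> List[str]:
    """Return flags describing why a record should be reviewed by study staff."""
    flags: List[str] = []
    # 1. Missing data
    for name in REQUIRED_FIELDS:
        if getattr(rec, name) is None:
            flags.append(f"missing: {name}")
    # 2. Logical errors: values that contradict the stated eligibility criteria
    if rec.maternal_age_years is not None and rec.maternal_age_years < 18:
        flags.append("logic: maternal age under 18 years")
    if rec.gestational_weeks_at_enrollment is not None and rec.gestational_weeks_at_enrollment >= 18:
        flags.append("logic: enrolled at or after 18 weeks of gestation")
    # 3. Implausible values (illustrative bounds, not study-specified)
    if rec.prepregnancy_bmi is not None and not 12 <= rec.prepregnancy_bmi <= 80:
        flags.append("implausible: pre-pregnancy BMI outside 12 to 80 kg/m2")
    return flags


example = EnrollmentRecord("P0001", maternal_age_years=17, gestational_weeks_at_enrollment=12.5, prepregnancy_bmi=None)
print(review_flags(example))
# prints: ['missing: prepregnancy_bmi', 'logic: maternal age under 18 years']
```

Returning human-readable flags rather than silently dropping records mirrors the review step described in the text, in which problematic values are examined before harmonization.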

Privacy protection

The study data are securely stored on the NYU Langone Health network behind a firewall. The data are accessible exclusively to NYU CHES staff authorized by the Principal Investigator. A set of physical, technical, and administrative controls is implemented to ensure data protection. Data access rights are assigned to personnel according to their role in the study.

The links between participants’ unique identification numbers and potentially identifying and protected health information (PHI) are securely stored on the NYU Langone Health server and are accessible only to authorized study personnel. With participant permission, specimens will be banked indefinitely with identifying links. A data transfer agreement is required for release of any data to external investigators. Depending on Institutional Review Board approval stipulations, analytic datasets are either de-identified or limited data sets.

Statistical power

Power analyses are presented in Table 7, which summarizes the minimally detectable difference in a continuous outcome, in standard deviation (SD) units, between exposed and unexposed groups at various exposure prevalences. For example, when 10% of the cohort is exposed, the full NYU CHES cohort, with its target of 2000 live births, will have 80% power to detect a difference as small as 0.209 SD between exposure groups at a type I error level of 5% using a two-sample t test. Treating exposure as a continuous variable, a sample size of 2000 will have 80% power to detect a change of 0.063 SD in the outcome per 1-SD increase in exposure. This effect size corresponds to 0.13 BMI units in children, which is much smaller than the main effects previously reported for other environmental exposures such as dietary factors [58]. Because many outcome variables will be measured repeatedly in NYU CHES, statistical power to detect exposure effects may be higher still. For a binary exposure with 20% prevalence, the minimally detectable odds ratios (ORs) in the full cohort (n = 2000) are 1.65 and 1.96 when the incidence in the exposed group is 5% or 10%, respectively; for a 1-SD increase in a continuous exposure, these minimally detectable ORs improve to 1.34 and 1.24. Power for gene–environment interaction analysis was estimated assuming an additive mode of inheritance. With a type I error level of 5%, a sample size of 2000, a main effect of the gene of 0.20 SD increment in the outcome per SD change in the exposure, and a main effect of the environment of 0.05 SD increment in the outcome per SD change in the exposure, we have 80% power to detect a minimal interaction effect of 0.09 if the allele frequency is 40%. For allele frequencies of 30%, 20%, and 10%, the minimal detectable interaction effects are 0.10, 0.11, and 0.15, respectively.
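The two-group and continuous-exposure figures can be approximated with the standard normal-approximation formula for the minimally detectable difference in SD units, MDD = (z₁₋α/₂ + z₁₋β) × √(1/n₁ + 1/n₀), where n₁ and n₀ are the exposed and unexposed group sizes. The Python sketch below is our reconstruction, not the authors' code: it assumes a two-sided test and a common outcome SD in both groups, and for a continuous exposure it uses the large-sample approximation SE ≈ 1/√n for a standardized regression slope. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mdd_binary_exposure(n_total: int, prevalence: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Minimally detectable difference (outcome SD units) between exposed and unexposed groups.

    Normal approximation to the two-sample t test, assuming a common outcome SD.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = z.inv_cdf(power)
    n_exposed = n_total * prevalence
    n_unexposed = n_total - n_exposed
    return (z_alpha + z_beta) * sqrt(1 / n_exposed + 1 / n_unexposed)


def mdd_continuous_exposure(n_total: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Minimally detectable standardized slope (outcome SD per 1-SD change in exposure).

    Uses SE(slope) ~ 1/sqrt(n) for a standardized exposure and outcome with a small true effect.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / sqrt(n_total)


for p in (0.10, 0.20, 0.30, 0.50):
    print(f"prevalence {p:.0%}: MDD = {mdd_binary_exposure(2000, p):.3f} SD")
print(f"continuous exposure: {mdd_continuous_exposure(2000):.3f} SD per SD")
```

Under these assumptions the sketch gives 0.209 SD at 10% exposure prevalence and 0.063 SD per SD of a continuous exposure, matching the values in the text; values at other prevalences may differ slightly from Table 7 if an exact t-based calculation was used.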

Table 7 Minimally detectable difference in standard deviation between groups according to exposure prevalence

Strengths and limitations

A major strength of the cohort is its socioeconomic and racial/ethnic diversity, a byproduct of recruitment at a diverse array of clinical venues. Our cohort also includes a wide range of data from various sources: biological samples, questionnaires, physical measurements from in-person visits, and electronic health records. We also have complete address histories for all NYU Langone participants, which will allow us to geocode addresses and use spatial mapping to assess spatially defined factors. Existing geospatial linkages include databases containing information on air pollution, noise, and neighborhood characteristics, including access to parks and healthy foods. The collection of specimens in three phases of pregnancy is another strength: it allows us to examine trajectories of exposures over time and to average them to account for variability due to the non-persistence of certain chemical exposures. Our cohort is also well positioned to examine mixtures of non-persistent environmental chemicals at each time point in pregnancy and infancy. In addition, infant samples are rarely collected, and ours permit examination of that window of vulnerability in relation to trajectories of ex utero growth.
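One way to average a participant's measurements across the three pregnancy visits is sketched below. The text does not state which averaging method will be used; this minimal Python example assumes a geometric mean (a common choice for right-skewed urinary biomarker concentrations), skips missing visits, and assumes that values below the limit of detection have already been imputed. The function name is hypothetical.

```python
from math import exp, log
from typing import Optional, Sequence


def geometric_mean_exposure(concentrations: Sequence[Optional[float]]) -> Optional[float]:
    """Average a participant's urinary concentrations across pregnancy visits.

    Missing visits (None) are skipped; returns None if no visit has a measurement.
    Concentrations must be positive (values below the limit of detection are
    assumed to have been imputed beforehand).
    """
    available = [c for c in concentrations if c is not None]
    if not available:
        return None
    if any(c <= 0 for c in available):
        raise ValueError("concentrations must be positive to take logarithms")
    # Averaging on the log scale limits the influence of a single high spot sample
    return exp(sum(log(c) for c in available) / len(available))


# Hypothetical concentrations at < 18, 18-25, and > 25 weeks (third visit missing)
print(geometric_mean_exposure([2.0, 8.0, None]))  # approximately 4.0
```

An arithmetic mean of the same values would be 5.0; the geometric mean is pulled less strongly toward the higher sample.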

Although we conceived NYU CHES to examine environmental exposures and child obesity, our active implementation of the EWCP also leaves the cohort well poised to examine neurodevelopmental and respiratory effects of prenatal and infant exposures to phthalates, bisphenols, OP pesticides, and PAHs.

Selective nonresponse during follow-up and participant retention are persistent challenges for urban cohorts, and the population of New York City is particularly mobile. Nonetheless, a comparison of mothers who consented to participate in postnatal follow-up with those who did not (for reasons such as moving out of New York City, lack of interest, or loss of contact with the study) suggests that participation in the postnatal phase of NYU CHES has not been strongly selective. As shown in Table 8, participating and non-participating mothers had similar age at enrollment, race/ethnic background, marital status, income, and employment status. Minor differences in education (more highly educated mothers were more likely to participate in follow-up), alcohol use (mothers who drank during pregnancy were more likely to participate), and parity (nulliparous mothers participated at a lower rate than parous mothers) will be addressed in future analyses using appropriate epidemiological methods (e.g., inverse probability weighting).
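Inverse probability weighting, one of the methods mentioned for handling differential participation, can be sketched as follows. In this minimal Python example the toy cohort, the covariates (education and parity), and the function name are all hypothetical; participation probabilities are estimated as observed proportions within covariate strata, whereas a real analysis would more likely fit a regression model over many baseline covariates.

```python
from collections import Counter

# Hypothetical toy cohort: (education, parity, joined_postnatal_phase)
cohort = (
    [("higher", "parous", True)] * 9 + [("higher", "parous", False)] * 1
    + [("higher", "nulliparous", True)] * 7 + [("higher", "nulliparous", False)] * 3
    + [("lower", "parous", True)] * 7 + [("lower", "parous", False)] * 3
    + [("lower", "nulliparous", True)] * 5 + [("lower", "nulliparous", False)] * 5
)


def participation_weights(records):
    """Inverse probability of participation weights, estimated within covariate strata.

    P(participate | stratum) is the observed participation proportion in that stratum,
    so each participant's weight is stratum size / number of participants in the stratum.
    """
    totals = Counter((edu, parity) for edu, parity, _ in records)
    joined = Counter((edu, parity) for edu, parity, took_part in records if took_part)
    # Strata with no participants cannot be reweighted (positivity violation), so skip them
    return {stratum: totals[stratum] / joined[stratum] for stratum in totals if joined[stratum] > 0}


weights = participation_weights(cohort)
participants = [(edu, parity) for edu, parity, took_part in cohort if took_part]

unweighted_lower = sum(1 for s in participants if s[0] == "lower") / len(participants)
weighted_total = sum(weights[s] for s in participants)
weighted_lower = sum(weights[s] for s in participants if s[0] == "lower") / weighted_total

print(f"share with lower education: unweighted {unweighted_lower:.3f}, weighted {weighted_lower:.3f}")
# prints: share with lower education: unweighted 0.429, weighted 0.500
```

Because the weights are estimated within the same strata used for comparison, the weighted participants reproduce the full toy cohort's education distribution (0.500), whereas the unweighted participants under-represent the lower-education group (0.429).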

Table 8 Descriptive characteristics of NYU CHES study participants by postnatal enrollment status

Currently, the study collects information on fathers/partners through maternal report. These data provide valuable information, including sociodemographic characteristics, weight, and height, but the study remains limited by the lack of direct assessments of fathers/partners and of their evaluations of children’s growth and development. We also have not collected health histories or specimens from any biological parent of the child other than the birth mother.

Collaboration

Regular data transmission to the NIH ECHO Data Analysis Center will permit centralized data analyses around common hypotheses, as well as preparation of de-identified and restricted datasets. For more information, see www.nih.gov/echo. For investigators interested in our primary data or in ancillary studies, NYU CHES has developed publication, specimen, and data-sharing policies intended to encourage maximal use of this rich resource. More information can be obtained by contacting NYULHEnvPeds@nyulangone.org.