Background

Diesters of phthalic acid (phthalates) and some phenols are man-made chemicals widely used in personal care and consumer products. Some of these compounds are endocrine disruptors and can affect various health outcomes in animals[1, 2], which raises concern about their potential health impacts in humans. Widespread exposure has been documented in pregnant women[3–5], who deserve specific consideration because of concerns about the effects of exposure to endocrine disruptors during intra-uterine life[6, 7].

A common approach for investigating human exposure to these compounds is the measurement of urinary concentrations of phthalate metabolites[8] and of the sum of concentrations of free and conjugated forms of phenols[9].

Phthalates and phenols are non-persistent compounds in humans, with a half-life in non-pregnant subjects generally estimated to be below 24 hours[10, 11]. A high day-to-day variability in their urinary concentrations has been documented among non-pregnant[12] and pregnant[13] women, as well as within-day variations for some phthalate metabolites[14, 15] and for bisphenol A (BPA)[16], but not for triclosan[17]. Other potential sources of variability in biomarker levels that can be seen as nuisances include the duration of storage of the biological sample at room temperature and between-subject variations in gestational age at urine sampling or in time elapsed since the last urine void.

Making sampling conditions identical across study participants is a way to limit this nuisance and make biomarker levels a better proxy of short-term personal exposure to these compounds in descriptive or etiologic studies[18]. However, for large-scale observational studies, some degree of deviation from the sampling protocol (e.g., in hour of urine collection) is hardly avoidable for some participants. Unless one is interested in the average exposure of the population as a whole (in which case maximizing variability in hour of sampling might be a good option for quickly metabolized compounds), this variability is a potential source of nuisance. Because excluding participants who did not strictly adhere to the sampling protocol might induce selection bias, alternative approaches that statistically standardize measured concentrations are worth investigating.

Such approaches have to our knowledge rarely been applied. In purely descriptive studies, assayed biomarker levels are often left untransformed. When studying the impact of biomarker levels on health outcome, adjusting for sampling conditions influencing biomarker levels is sometimes performed. This approach may not be efficient because adjusting for sampling conditions in a regression model aiming at characterizing the effect of exposure on disease risk is unlikely to correct for the effect of sampling conditions on biomarker levels. As an illustration, in a study aiming at characterizing the association between serum concentration of 25-hydroxyvitamin D and cancer risk where between-subject differences in season of collection of blood sample existed, Wang et al.[19] considered several ways to handle differences in this sampling condition influencing the biomarker level. They have shown that, because of seasonal variations in 25-hydroxyvitamin D, choosing season-specific cut-offs to categorize the levels of this biomarker was a more efficient approach than adjusting for the date of sampling in a regression model where cancer occurrence was the dependent variable. Choosing season-specific cut-offs for categories of biomarker levels fluctuating with season is, in terms of identifying the group with the highest estimated exposure, equivalent to correcting biomarker levels by a value depending on the season of sampling. This approach has the advantage of being applicable independently of any information on health outcome, e.g., in descriptive (biomonitoring) studies.

Here, we generalize this approach to the situation where several sampling conditions are considered simultaneously, using a 2-step standardization method based on regression residuals. The specific objectives of our study were to determine the influence of sampling conditions on the urinary concentrations of select phthalate metabolites and phenols among pregnant women, and then to describe the concentrations of phthalate metabolites and phenols standardized for sampling conditions.

Methods

Study population

We conducted a case-control study nested within the Eden[20, 21] and Pélagie[22, 23] mother-child cohorts (Figure 1). These cohorts aim to study the effects of fetal and early life events and exposures on health at birth and later in life. Women from the Pélagie cohort (n = 3,421) were enrolled before 19 weeks of gestation (counted from the first day of the last menstrual period) from April 2002 to February 2006 in 3 districts of Brittany (France). Women from the Eden cohort (n = 2,002) were enrolled before the end of the 28th week of gestation, from April 2003 to March 2006, at the obstetrical departments of the University Hospitals of Nancy and Poitiers, France. Pregnant women were followed up until delivery, and children are being followed up. Participants provided informed consent for data and biological sample collection for themselves and their offspring. These cohorts received the approvals of the appropriate ethical committees (Comité Consultatif pour la Protection des Personnes dans la Recherche Biomédicale, le Kremlin-Bicêtre University Hospital, and Commission Nationale de l’Informatique et des Libertés). During pregnancy, women completed questionnaires on socio-demographic characteristics, occupation, and lifestyle. The nested case-control study included all women (n = 72) who delivered boys with an external genitalia malformation identified at birth by pediatricians (cases). Three women (controls) were matched to each mother of a case for sex of the baby (i.e., male), center, date of recruitment and gestational duration at the time of collection of the urinary sample, corresponding to 216 controls (Figure 1)[24]. The case-control study aimed at characterizing the impact of phthalates and phenols on congenital malformations[25], but this report focuses on issues related to exposure assessment.

Figure 1 Flow Chart of Study Population, Composed of Pregnant Women from Eden and Pélagie Cohorts, France, 2002 – 2006.

Urine collection and analysis

For the Pélagie cohort, women collected a first morning urine void at home between 6 and 19 gestational weeks, as early as possible after recruitment, and mailed it by normal post to the research laboratory, where samples were stored and frozen at −20 °C (median storage duration at room temperature, 2 days; Table 1). Mailed vials contained nitric acid in order to avoid bacterial proliferation. For the Eden cohort, women were asked to collect a first morning urine void at home just before the prenatal study visit, between 24 and 30 gestational weeks, using a polypropylene container (FP40VPS, manufactured by CEB, Angers, France). Women who had forgotten to bring a urine sample collected it in the hospital during the prenatal visit. Samples were aliquoted and frozen at −80 °C (median storage duration at room temperature, 4 h; Table 1). Time of urine sampling was recorded only for women of the Eden cohort.

Table 1 Characteristics of French Pregnant Women at the Time of Urine Sampling, and of their Offspring (Eden and Pélagie Cohorts, 2002 – 2006)

In 2008, frozen urine samples were shipped on dry ice to the National Center for Environmental Health laboratory at the Centers for Disease Control and Prevention (CDC) in Atlanta, Georgia (USA). The involvement of the CDC laboratory was determined not to constitute engagement in human subject research. Concentrations of 11 phthalate metabolites (see Additional file 1: Table S1) were measured using on-line solid-phase extraction coupled to high-performance liquid chromatography-electrospray ionization-isotope dilution tandem mass spectrometry[26]. Molar concentrations of 4 metabolites of di(2-ethylhexyl) phthalate (DEHP, see Additional file 1: Table S1) were summed as total DEHP (mol/l). In addition, we applied correction factors of 0.66 and 0.72 to the monoethyl phthalate (MEP) and monobenzyl phthalate (MBzP) concentrations, respectively, because the analytic standards used were of inadequate purity[27].
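The molar summation and purity corrections described above can be illustrated with a minimal Python sketch. It is not the authors' code. The four metabolite names are an assumption (Table S1 is not reproduced in this text), the sample values are hypothetical, and the function names are introduced only for illustration. Concentrations are handled in μmol/l for readability.

```python
# Molecular weights (g/mol). The metabolite names are an ASSUMPTION: the text
# only says "4 metabolites of DEHP" and refers to Table S1 for the list.
MOLECULAR_WEIGHT = {"MEHP": 278.34, "MEHHP": 294.34, "MEOHP": 292.33, "MECPP": 308.33}

# Purity correction factors reported in the text for MEP and MBzP.
PURITY_CORRECTION = {"MEP": 0.66, "MBzP": 0.72}


def to_micromolar(mass_conc_ug_per_l, molecular_weight):
    """Convert a mass concentration (ug/l) to a molar concentration (umol/l)."""
    return mass_conc_ug_per_l / molecular_weight


def total_dehp_micromolar(mass_conc_ug_per_l):
    """Sum the molar concentrations of the DEHP metabolites given in the dict."""
    return sum(
        to_micromolar(conc, MOLECULAR_WEIGHT[name])
        for name, conc in mass_conc_ug_per_l.items()
    )


def apply_purity_correction(name, mass_conc_ug_per_l):
    """Multiply by the purity correction factor (1.0 if none applies)."""
    return mass_conc_ug_per_l * PURITY_CORRECTION.get(name, 1.0)


# Hypothetical sample (ug/l), chosen so that each metabolite is about 0.1 umol/l.
sample = {"MEHP": 27.834, "MEHHP": 29.434, "MEOHP": 29.233, "MECPP": 30.833}
total = total_dehp_micromolar(sample)
mep_corrected = apply_purity_correction("MEP", 100.0)
```

Summing on the molar scale rather than the mass scale keeps metabolites of different molecular weights comparable, which is why the conversion precedes the sum.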

Urinary concentrations of 9 phenols were estimated for the population of Eden only (n = 191) by using a modification of a method involving isotope dilution on-line SPE coupled to high-performance liquid chromatography-tandem mass spectrometry[9]. Phenols were not measured for the Pélagie samples because acidification with nitric acid affected the performance of the analytical method. Assessment of concentrations was not possible for 1 urine sample of a control woman (urine container broken). Total parabens (PB) concentration was calculated by summing butyl-, ethyl-, methyl- and propyl-paraben molar concentrations.

Statistical analysis

Imputation of missing data

Concentrations below the limit of detection (LOD) were replaced by LOD/√2[28]. Missing values in sampling time of day for the Eden cohort (n = 25) were imputed using linear regression adjusted for date and sampling season, parity, education level, occupation, active smoking and center. When it was used as an adjustment factor, sampling time was assumed to be 7:00 A.M. for women of the Pélagie cohort; models describing the influence of sampling time on biomarker concentrations were estimated excluding Pélagie subjects.
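As an illustration of the substitution rule for non-detects (values below the LOD replaced by the LOD divided by the square root of 2), a minimal Python sketch follows. The function name, the LOD and the measured values are hypothetical.

```python
import math


def impute_below_lod(concentration, lod):
    """Return the measured value, or LOD / sqrt(2) when it is below the LOD.

    `None` stands for a non-detect reported without a numeric value.
    """
    if concentration is None or concentration < lod:
        return lod / math.sqrt(2)
    return concentration


# Hypothetical concentrations (ug/l) and a hypothetical LOD of 0.4 ug/l.
measured = [3.1, None, 0.2, 12.4]
lod = 0.4
imputed = [impute_below_lod(c, lod) for c in measured]
```

Detected values pass through unchanged; both non-detects receive the same substituted value.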

Correction for case-control sampling

To make the distribution of biomarker concentrations relevant for the source population (i.e., mothers of male newborns from our cohorts), we corrected for the over-representation of cases induced by our case-control design using a reweighting approach[29]. Center-specific weights corresponded to the inverse of the inclusion probability of controls, so as to give cases and controls the same relative weight as in the original cohorts (about 1 case for 37 male newborns). Unless otherwise specified, this correction was applied in all regression models.
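The reweighting idea can be sketched as follows, assuming that all cases are included (weight 1) and that each control receives the inverse of the probability that an eligible non-case in its center was sampled as a control. The counts are hypothetical, chosen only to reproduce a ratio of about 1 case for 37 male newborns, and the function name is introduced for illustration.

```python
def control_weight(n_eligible_noncases, n_controls_sampled):
    """Inverse inclusion probability of a control within one center."""
    return n_eligible_noncases / n_controls_sampled


# Hypothetical center: 20 cases, 720 eligible non-cases, 60 sampled controls.
n_cases, n_noncases, n_controls = 20, 720, 60

case_w = 1.0  # every case of the cohort is in the case-control sample
ctrl_w = control_weight(n_noncases, n_controls)

# Weighted share of cases in the reweighted sample; it should match the
# share of cases in the source cohort (20 out of 740, i.e. 1 in 37).
weighted_case_share = (n_cases * case_w) / (n_cases * case_w + n_controls * ctrl_w)
```

Without the weights, cases would make up 20 of 80 subjects in this hypothetical center, which is the over-representation the correction is meant to undo.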

Standardization for sampling conditions

We used a 2-step standardization method based on regression residuals to standardize biomarker levels on sampling conditions, that is, to limit the impact of between-subject variations in sampling conditions. The principle is to take away from the observed biomarker concentration a value depending on how much the sampling conditions for subject i differ from the standard sampling conditions, i.e., those that should have been observed for all subjects in ideal conditions. This 2-step standardization method is detailed in the Additional file 2 statistical appendix and outlined below:

First step of standardization: Influence of sampling conditions on biomarker concentrations: The first step consists of a description of the association between sampling conditions and the level of each biomarker, adjusted for potential confounders. Sampling conditions considered were hour, season and day of sampling, gestational age at collection, and duration of storage of the urine sample at room temperature before freezing (in multiples of 24 hours for the Pélagie cohort, where sampling hour was unknown). We also considered urinary creatinine concentration, seen as a marker of urinary dilution. Creatinine also depends on individual or behavioral characteristics such as muscle mass, and authors have proposed to use specific gravity as a more relevant marker of urinary dilution[13, 30]; however, this parameter was not available in our study. The association of sampling conditions with the log-transformed concentration ln([Conc]) = Y of each compound was studied using linear regression models adjusted for all sampling conditions simultaneously (measurement models). Since individual characteristics were possibly associated with both sampling conditions and biomarker concentrations, measurement models were further adjusted for maternal age, body mass index before pregnancy, parity, education, current occupation, active smoking, year of sampling and center.
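A measurement model of this kind can be sketched as a weighted linear regression of the log-transformed concentration on sampling conditions. The sketch below is only an illustration, not the authors' Stata analysis. It keeps two sampling conditions (hour and creatinine), omits the individual covariates, uses synthetic data generated from hypothetical coefficients, and solves the weighted normal equations in pure Python.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def weighted_least_squares(rows, y, weights):
    """Minimise sum_i w_i * (y_i - x_i . beta)^2 via the normal equations."""
    p = len(rows[0])
    xtwx = [
        [sum(w * r[j] * r[k] for r, w in zip(rows, weights)) for k in range(p)]
        for j in range(p)
    ]
    xtwy = [sum(w * r[j] * yi for r, yi, w in zip(rows, y, weights)) for j in range(p)]
    return solve_linear_system(xtwx, xtwy)


# Synthetic data: columns are intercept, sampling hour, creatinine (g/l).
true_beta = [3.0, -0.10, 0.50]  # hypothetical coefficients
rows = [[1, 7.0, 0.8], [1, 7.5, 1.2], [1, 8.0, 1.0], [1, 9.0, 1.5], [1, 10.0, 0.9]]
ln_conc = [sum(b * x for b, x in zip(true_beta, r)) for r in rows]
weights = [1.0, 12.0, 12.0, 1.0, 12.0]  # cases weigh 1, controls more (hypothetical)

beta = weighted_least_squares(rows, ln_conc, weights)
```

Because the synthetic outcome is generated without noise, the fit recovers the hypothetical coefficients; with real data, the estimated coefficients for the sampling conditions are the quantities carried into the second step.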

Second step of standardization for sampling conditions: Using the estimated parameters of the measurement models, we predicted the concentrations that would have been observed assuming that all samples had been collected under the same standard conditions. These conditions were assumed to correspond to the median values for hour of sampling (7:30 A.M.), urinary creatinine concentration (1.2 g/l), and time elapsed between sample collection and freezing (5 hours); the day of sampling was assumed to be Monday, the trimester of sampling April-June, and the gestational age at collection was assigned as the category corresponding to between 6 and 10 gestational weeks in the Pélagie cohort and to between 24 and 25 weeks in the Eden cohort. For each biomarker, this standardized concentration ($[\mathrm{Conc}_i]_{\mathrm{standardized}}$) was estimated from the measured concentration $[\mathrm{Conc}_i]_{\mathrm{measured}}$ in each subject i using formula (1):

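$$\ln\left([\mathrm{Conc}_i]_{\mathrm{standardized}}\right) = \ln\left([\mathrm{Conc}_i]_{\mathrm{measured}}\right) - \sum_j \left[\beta_{\mathrm{samp\,cond}\,j} \times \left(X_{j,i} - X_{j,\mathrm{std}}\right)\right] \tag{1}$$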

where $\beta_{\mathrm{samp\,cond}\,j}$ is the regression parameter quantifying the effect of sampling condition j on the biomarker’s concentration, as estimated in the above-defined measurement model, $X_{j,i}$ corresponds to the value of this condition for subject i, and $X_{j,\mathrm{std}}$ corresponds to the chosen standard value for sampling condition j. This formula is justified in the Additional file 2 statistical appendix.
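The second step amounts to subtracting, on the log scale, the estimated effect of each sampling condition multiplied by the subject's deviation from the standard value of that condition. A minimal Python sketch follows. The coefficients and the subject's values are hypothetical, whereas the standard values for hour (7:30 A.M., coded 7.5) and creatinine (1.2 g/l) are those stated in the text. Categorical conditions such as day or season would enter as indicator variables, which is not shown here.

```python
import math


def standardize(conc_measured, betas, observed, standard):
    """Standardize a concentration for sampling conditions.

    ln(standardized) = ln(measured) - sum_j beta_j * (x_j_observed - x_j_standard)
    """
    ln_standardized = math.log(conc_measured) - sum(
        betas[name] * (observed[name] - standard[name]) for name in betas
    )
    return math.exp(ln_standardized)


# Hypothetical measurement-model coefficients (per hour, per g/l of creatinine).
betas = {"hour": -0.10, "creatinine": 0.50}
# Standard conditions taken from the text.
standard = {"hour": 7.5, "creatinine": 1.2}
# Hypothetical subject: late, dilute sample with a measured value of 10 ug/l.
observed = {"hour": 9.5, "creatinine": 0.8}

standardized = standardize(10.0, betas, observed, standard)
unchanged = standardize(10.0, betas, standard, standard)
```

In this hypothetical case both deviations lower the measured value (later sampling and more dilute urine), so the standardized concentration is higher than the measured one; a sample already collected under standard conditions is left unchanged.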

Finally, relative variations between median measured and standardized biomarker concentrations were calculated. All calculations were conducted using Stata/SE 10.0.
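The relative variation between median measured and standardized concentrations can be computed as in the sketch below. It uses unweighted medians for simplicity, which is an assumption; the medians reported in the study were corrected for case-control sampling. The values are hypothetical.

```python
from statistics import median


def relative_change_in_median(measured, standardized):
    """Relative variation (%) of the median after standardization."""
    m_before = median(measured)
    m_after = median(standardized)
    return 100.0 * (m_after - m_before) / m_before


# Hypothetical concentrations (ug/l) before and after standardization.
measured = [10.0, 20.0, 40.0]
standardized = [12.0, 25.0, 38.0]
change = relative_change_in_median(measured, standardized)
```

A positive value means that the median standardized concentration is higher than the median measured concentration.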

Results

The study included 287 pregnant women (Table 1, Figure 1). Eight of the 11 phthalate metabolites and 5 of the 9 phenols were detected in at least 95 % of the population (see Additional file 1: Table S1). Pearson coefficients of correlation between log-transformed biomarker levels were below 0.70, except for the correlations between 2,4-dichlorophenol (2,4-DCP) and 2,5-DCP (Pearson correlation coefficient of 0.87) and between mono-n-butyl phthalate (MBP) and mono-3-carboxypropyl phthalate (MCPP, correlation 0.76; see Additional file 1: Table S2).

Sampling conditions and phthalates biomarkers

MBP and MCPP had significantly lower concentrations after 2005 than in 2003 – 2004, whereas concentrations of mono carboxyoctyl phthalate (MCOP) increased with sampling year. Concentrations of phthalate metabolites tended to decrease with maternal age, in particular for mono-isobutyl phthalate (MiBP). A higher educational level was associated with lower concentrations of MiBP, MBzP and MCOP (see Additional file 1: Table S3).

For the Eden cohort, 95 % of urine samples were collected before 10:30 A.M. Apart from MEP, phthalate metabolite concentrations tended to decrease with increasing sampling hour (adjusted P ≤ 0.05 for metabolites of DEHP, MBP, MCPP, mono carboxynonyl phthalate (MCNP) and MCOP; Table 2). Concentrations of all phthalate metabolites increased with urinary creatinine level. Gestational age at sampling was not associated with the urinary concentrations of phthalate metabolites. Only the concentration of MBP decreased with the time elapsed between sample collection and freezing. No phthalate metabolite was associated with either day or season of sampling (Table 2).

Table 2 Adjusted Association Between Log-Transformed Phthalate Monoester Metabolites Urinary Concentrations and Sampling Conditions among French Pregnant Women From Eden and Pélagie Cohorts, 2002 – 2006 a

Sampling conditions and phenols biomarkers

A higher educational level was associated with higher parabens concentration (see Additional file 1: Table S4). Sampling hour was negatively associated with BPA concentration (Table 3). Urinary creatinine was positively associated with the concentrations of all phenols except triclosan. BPA concentration increased with gestational age at sampling. It also tended to increase with the duration of sample storage at room temperature (Table 3).

Table 3 Adjusted Association Between Log-Transformed Phenol Urinary Concentrations and Sampling Conditions among French Pregnant Women From Eden Cohort, 2002 – 2006 a

Standardization of biomarker concentrations on sampling conditions

Table 4 shows the relative change in biomarker concentrations corrected for case-control sampling: after an additional standardization for sampling conditions, the strongest relative variations were observed for the concentrations of MCNP, of the metabolites of DEHP and of MCOP (+80 %, +56 % and +44 %, respectively); median phenol concentrations varied between −38 % for 2,5-DCP and +15 % for BPA. The correlation coefficients between log-transformed biomarker levels before and after standardization ranged from 0.88 for MBzP to 0.99 (P < 0.01) for triclosan (Table 4).

Table 4 Phthalates Metabolites and Phenols Urinary Concentrations Among Pregnant Women From Eden and Pélagie Cohorts, France, 2002-2006

Discussion

Within our population of pregnant women, the hour of urine collection (in the morning) was negatively associated with the concentration of most phthalate metabolites (apart from MEP) and also of BPA (a result based on the Eden cohort only). Standardization for sampling conditions modified the median concentrations by −38 % for 2,5-DCP up to +80 % for MCNP, but standardized levels were relatively strongly correlated with unstandardized ones (correlation coefficients ranged from 0.88 for MBzP to 0.99 for triclosan).

Concentrations of some phthalate metabolites and of BPA decreased with increasing hour of collection in the morning. These changes may be due to exposure being more frequent at specific hours of the day (e.g., during the evening meal and less frequently in the night and early morning), and to the toxicokinetics of phthalates and phenols in each individual. In the case of bisphenol A, for example, Teeguarden et al. concluded that spot urine samples reflect exposure in the prior meal, or prior 4- to 6-hour period, but not during the whole 24-hour period preceding urine sampling[31]. Other studies in observational settings reported strong variations in biomarker urinary levels throughout the day[14, 16]. We found no relation between gestational age at sampling and phthalate concentrations. However, our study design was poorly suited to investigating this relation because we examined 2 distinct and relatively short periods of gestation for phthalate metabolite concentrations and only 1 for phenol concentrations, so that variability in sampling week was limited.

The decrease in the concentrations of MBP with the increasing duration of storage of urine samples at room temperature could be explained by microbial degradation or by irrecoverable adsorption of the monoester metabolites to other urinary components or sediments[32]. The overlap between the 2 cohorts in the distributions of duration of storage was limited, so that the estimate of the influence of storage duration is mostly based on subjects from the Eden cohort for durations shorter than 24 h, and on subjects from the Pélagie cohort for durations of 24 h or more. The increasing concentration of BPA with increasing duration of storage at room temperature was unexpected; it might be due to a leakage of BPA from the plastic containers (or their caps) used to collect urine samples, as might happen if some women had used polycarbonate containers instead of the polypropylene containers planned for the study. Our analysis allowed us to identify this potential issue, and the statistical approach used attempted to correct for any resulting error.

The decrease in urinary concentrations of MBP, a metabolite of di-n-butyl phthalate (DnBP), observed from year 2005 onwards and the simultaneous increase in concentrations of MCOP, a metabolite of diisononyl phthalate (DiNP), could reflect changes in phthalate usage in Europe. DiNP is used as a substitute for DEHP in many applications (ECPI, 2006): between 1999 and 2004, the proportion of DEHP in total phthalate usage decreased, and the proportion of DiNP and diisodecyl phthalate (DiDP) increased (ECPI, 2006). However, we did not observe a temporal decrease in DEHP metabolites.

Our results suggest that, as in other countries, French pregnant women are exposed to a range of non-persistent endocrine disruptors. MEP, MBP and MiBP were the phthalate metabolites found at the highest concentrations. The concentrations of these phthalate metabolites and of MCPP and the DEHP metabolites had the same magnitudes as those observed among pregnant women elsewhere[4, 5, 13, 33, 34]. MiBP concentrations reported in the USA among pregnant women[13, 33] were lower than in our study (see Additional file 1: Figure S1). These geographical differences could be due to the fact that di-isobutyl phthalate, of which MiBP is a major metabolite, is used in Europe as a substitute for DnBP, banned by the European Union in personal care and cosmetic products[35].

Concerning phenols (see Additional file 1: Figure S2), after adjustment for creatinine, BPA concentrations were higher in our French population (median, 2.5 μg/g) than those observed among pregnant women from Rotterdam (median, 1.6 μg/g)[5], and from Cincinnati, Ohio (median at 16 gestational weeks, 1.7 μg/g)[36]. Mean values were lower in the Eden cohort (3.6 μg/g) than in a study in Norway[4], where higher levels (creatinine-adjusted mean of 5.9 μg/g) could be the consequence of a high consumption of canned fish and seafood[4].

The concentrations of biomarkers obtained from biochemical assays cannot always be used in a straightforward way as an exposure variable in epidemiological studies[19, 37] and may require additional modeling steps, as is the case for other exposure metrics. This probably also applies to descriptive (biomonitoring) studies. Indeed, some degree of heterogeneity in sampling conditions is unavoidable in observational settings. The 2-step standardization method based on regression residuals that we proposed constitutes a way to reduce undesirable variability in biomarker urinary concentrations due to sampling conditions, and allows more relevant comparisons between subjects and possibly between studies. This source of variability can be seen as a source of measurement error in exposure, which may affect studies of the association between biomarker levels and health by biasing the regression model estimates in either direction[38] and/or altering confidence intervals. With the exception of creatinine, which can be seen as a proxy for a sampling condition (time elapsed since the last void) and is very often corrected for in descriptive or etiologic studies, our report is to our knowledge one of the first attempts to limit variations in phthalate and phenol biomarker levels due to variations in urine sampling conditions in a descriptive setting using a statistical approach.

In a further step, we suggest using the standardized biomarker concentration to characterize the relation between biomarker levels and specific health outcomes assessed in the same population[24]. A further development of our approach that may be useful for such etiological studies would be to account for the variability in the regression coefficients corresponding to the effect of sampling conditions on biomarkers estimated in the measurement model (Eq. A.1, see Additional file 2 statistical appendix). In particular, regression models in which the standardized concentrations are used as covariates should take this variability into account. Incidentally, it can be noted that using unstandardized (raw) levels in models not accounting for measurement error due to variability in sampling conditions can also affect variance estimates and bias; we believe that an approach like ours, aiming at making sources of measurement error explicit and at correcting for them, is a step in the right direction. We chose to standardize each biomarker level on all sampling conditions simultaneously, but in future studies authors may prefer to standardize only for those sampling conditions that turn out to be associated with the considered biomarker with a p value below a given level (say, p < 0.2).

As an alternative to using standardized biomarker levels, some authors include sampling characteristics as covariates in regression models describing associations between biomarker levels and health outcome. This 1-step approach may not efficiently standardize for sampling conditions, as the parameter associated with a sampling condition will reflect the association between the sampling condition and the health outcome, and not with the biomarker level[19]. Our 2-step approach makes it possible to consider separately the influence of sampling conditions on biomarker levels in a first step and to correct for it in a second step that does not consider the health outcome; in a final step, the association between standardized biomarker concentrations and the health outcome can be characterized. There is a vast body of literature on how to handle and try to correct measurement error in covariates or health outcomes[39]. However, it is focused on situations in which there is some knowledge either of the standard errors attached to the error-prone variables or of the misclassification rate, on situations in which validation data are available (both true and error-prone variables having been assessed in a subpopulation), or on situations in which instrumental variables are available. These situations do not correspond to ours, in which we know and measure some factors causing measurement error (the sampling conditions), and empirically estimate the influence of these factors on the mismeasured concentrations.

The impact of sampling conditions on biomarker levels was empirically estimated based on the association observed in our data. An alternative would be to use a toxicokinetic model; however, such models are not currently available for most of the studied metabolites[40, 41], in particular for pregnant human subjects. Only limited data on the half-life or other toxicokinetic parameters of the studied compounds are available, and they come from populations distinct from pregnant women, who differ from non-pregnant women in terms of metabolism for specific xenobiotics[42]. The lack of repeated assessment and of information on timing of exposure did not allow us to develop such toxicokinetic modeling within our population.

Our two-step approach is to our knowledge original although it follows a logic previously used in some biomarkers studies[19], and also in other areas of the epidemiologic and clinical literature, for example in studies of lung function, in which results of a lung function test (e.g., FEV1) are standardized on gender and age, to limit the impact of these (nuisance) factors.

Each pregnant woman provided only a single urine sample, which probably limited the accuracy of our estimates of the influence of sampling conditions. In practice, many studies in the general population rely on a single urine sample, and standardizing for sampling conditions should also be attempted in this setting. We assumed that adjustment for individual characteristics such as age, occupation or smoking, made women with different sampling conditions more comparable. However, this approach might be limited by the existence of unmeasured lifestyle or occupational factors simultaneously associated with exposure and sampling conditions. For instance, if women who collected a urine sample early in the morning used more phthalate-containing cosmetics than those who provided a urine sample later in the morning, we might attribute to variations in sampling hour differences actually due to real exposure contrasts. Time since last exposure (and amount of exposure) are also parameters likely to influence biomarker levels. These were not available in our study; their assessment is challenging in observational studies focused on compounds with several sources and whose presence in consumers’ products is not known by study participants. Moreover, time since last exposure is likely to be shorter for subjects frequently exposed to these compounds (and hence also possibly more highly exposed to these compounds), so that standardization for time since last exposure might artificially decrease the between-subject contrasts in exposure.

The efficiency of our approach may differ between compounds. In the case of standardization for sampling hour, for example, the approach is more likely to be efficient for compounds in which biomarker levels in urine follow a similar temporal pattern throughout the day for most participants; such a situation is close to what has been described for MEP[14]. For other compounds for which temporal patterns strongly differ between participants, as has been described for mono-(2-ethyl-5-hydroxyhexyl) phthalate[14], our approach is likely to be less efficient; in such cases, there may be no efficient statistical alternative to collecting several urine samples per day or 24-hour urine samples[14], at least in a sub-population, after which measurement error models[39, 43] or toxicokinetic models (if available) could be used. Similarly, if there is no consistent pattern of variation in exposure levels throughout the week, as seemed to be the case in our population for most compounds, our approach is unlikely to correct for daily variations in exposure and thus to make biomarker levels more representative of the weekly exposure average.

For the above-mentioned reasons, some error in our estimates of the influence of sampling conditions on biomarker concentration is expected, so that we cannot exclude that the standardized concentrations sometimes entail more bias than the original measure[44]. Consequently, studies on exposure-response relations using an approach such as ours should also report the association between the uncorrected biomarker concentrations and the health outcome, in addition to the association relying on standardized biomarker concentrations[24]. Furthermore, information on sampling conditions such as those considered here (in addition, whenever relevant, to batch number, assay date, and information on any deviation from the planned protocol) should be collected for all study subjects so that their possible impact can be characterized and if required corrected for.

Conclusions

In conclusion, hour of sampling was associated with the urinary concentrations of select phthalate metabolites and phenols. This confirms the relevance of relying on repeated biomarker assays in studies aiming to characterize the health effects of compounds with a short half-life, such as phthalates and phenols. Our approach to standardizing concentrations of biomarkers in urine specimens collected under varying conditions (e.g., time of day) could be relevant for future studies aiming at describing the urinary concentrations of biomarkers, or their influence on human health outcomes.