Analysis of Human Co-exposure to Lead and Cadmium Using Human Biomonitoring (HBM) Data in a Bayesian Copula-Based Regression Framework

Sy, Moustapha; Conrad, André; Jung, Christian; Lindtner, Oliver; Greiner, Matthias

doi:10.1007/s12403-023-00573-w

Original Paper
Open access
Published: 07 June 2023

Volume 16, pages 503–516, (2024)

Exposure and Health

Moustapha Sy ORCID: orcid.org/0000-0001-9461-7316¹,
André Conrad²,
Christian Jung¹,
Oliver Lindtner¹ &
…
Matthias Greiner¹

Abstract

The identification of human co-exposure to industrial chemicals or environmental substances is of high interest in human health risk assessment. Due to their ubiquity and persistence in the environment, heavy metals such as cadmium (Cd) and lead (Pb) are of particular concern. Approaches to adequately investigating combinations of these and other often highly correlated variables are lacking. This study proposes a modeling approach to investigate the co-exposure to Cd and Pb and to better understand the variations of blood Cd and Pb (CdB and PbB, respectively) together with potentially determinant factors. A copula-based regression model was built, using Bayesian inference and Markov Chain Monte Carlo simulation, to relate CdB and PbB of 3- to 14-year-old children participating in the German Environmental Survey for Children (GerES IV) with socio-demographic and ancillary exposure-relevant information. A minor to negligible dependence between CdB and PbB was observed, suggesting that Cd and Pb are subject to differing exposure sources/pathways or kinetics within the human body. Despite the resulting low association between CdB and PbB, the developed approach provides a methodological basis for enhancing the assessment of the cumulative exposure to multiple substances and for deepening the understanding of the determinants of these exposures.

Introduction

In recent years, the scientific community involved in human health risk assessment has directed growing attention toward the identification of human co-exposure to industrial chemicals or environmental substances and their potential effects on human health (Rotter et al. 2018). Despite this increased attention, most studies focus mainly on the exposure to a single element, whereas human exposure is generally the outcome of diverse conditions and is therefore often a combination of several substances co-occurring in the environment (IPCS 2009). The assessment of the co-exposure to multiple substances or substances in mixtures is generally known as combined or cumulative exposure (Price and Chaisson 2005).

At the European level, initiatives such as the EuroMix project, a tiered strategy for risk assessment of mixtures of multiple chemicals, tackle this challenge. The EuroMix project aims to establish and disseminate new, efficient, and validated test strategies for the toxicity of chemicals in a mixture, in order to deliver refined information for future safety assessment of chemicals (Beronius et al. 2020; Rotter et al. 2018). In addition, EuroMix aims for a refined strategy for grouping chemicals into cumulative assessment groups and prioritizing related data gaps. An additional goal is to derive a harmonized approach to assessing risk, including information on possible additive, synergistic or antagonistic effects of the chemicals in the mixtures at real-life exposure levels (Beronius et al. 2020). To this end, human biomonitoring (HBM) studies can be a valuable research method and source of information. HBM investigates human exposure to chemicals and their effects through systematic standardized measurement of the concentration of those compounds or their metabolites in human specimens (Angerer et al. 2007). HBM reflects the past and current aggregated internal exposures from all routes and sources at either an individual or a population level. Moreover, HBM studies are a reliable way of gaining insight into (individual) co-exposures to multiple substances. The application of HBM in exposure and risk assessment gained momentum at the European level with the European Human Biomonitoring Initiative HBM4EU (Ganzleben et al. 2017).

One important challenge in HBM studies is the interpretation of the HBM data from an exposure assessment and a risk management perspective with the aim, for instance, of identifying the relevant exposure sources or pathways (Angerer et al. 2007). Such interpretation generally consists of linking biomarkers of exposure with external exposure estimates and information on exposure-relevant factors through multivariate analysis. The goal of this linkage is to understand the variation, for example, of external exposures, individual food consumption patterns, physiological parameters, socio-demographic factors, and their influences on concentrations of chemicals and their metabolites in human samples, such as blood and urine. Regarding the analysis of co-exposure to multiple substances, approaches that jointly relate multiple input variables to multiple output variables are therefore of high importance.

Weighted quantile sum (WQS) regression, which was recently introduced for the analysis of health effects of chemical mixtures, is, to the best of our knowledge, one of the few available methods that account for highly correlated data in co-exposure settings (Carrico et al. 2015; Keil et al. 2020; Lee et al. 2019; Tanner et al. 2019). However, in WQS regression, the co-exposure to multiple substances and the associated correlations are considered to predict health effects in a single-output regression structure. The development of robust multi-output regression approaches that can extract insightful patterns from multivariate data is thus critical. In the field of machine learning, techniques and algorithms were recently developed for multi-output regression and classification problems. These methods rely on diverse methodologies including, for instance, multiple independent single-target methods and input or output space expansion approaches. Tsoumakas et al. (2014) introduced the random linear target combinations (RLC) method, an output space expansion approach that uses random linear combinations to generate new output variables. Spyromitros et al. (2016) presented two methods based on input space expansion: stacked single-target (SST), which consists of fitting single-output regressions on the input variables expanded by predictions of the output variables, and the ensemble of regressor chains (ERC), which is built on single-output regressions fitted on the input variables sequentially expanded by the output variables. Despite their efficiency from a predictive perspective, these algorithm-based methods are generally ill-suited to understanding the dependence structure between output variables within the data. Copula-based regression models appear as an adequate alternative since they provide, while relating multiple outputs and input variables, a mathematical representation of the dependence structure of the variables of interest.

Copulas are flexible probabilistic tools for modeling the joint distribution of a random vector and are particularly convenient for capturing the dependence structure among the vector components (Park et al. 2021; Smith 2013; Song et al. 2009). Copula-based regression modeling was recently implemented in diverse applications such as the electricity market, the prediction of crash counts, web marketing, and econometrics (Park et al. 2021; Pitt et al. 2006; Sahu et al. 2003; Smith et al. 2012). In addition, the combination of copula-based regression and the Bayesian approach offers an appropriate framework that makes it possible not only to estimate the parameters of the model but also to conveniently accommodate the complexity stemming from the copula representation. The copula regression model is difficult to estimate by classical maximum likelihood estimation when the multivariate dimension is high, as the likelihood becomes intractable (Smith et al. 2012).

The main objective of the present study was to propose a flexible and robust framework of mathematical modeling, which could help advance the assessment of human co-exposure to multiple substances. Because of their wide distribution, accumulation and persistence in the environment and in biological systems, cadmium (Cd) and lead (Pb) were selected as candidate substances. Both substances are relatively well known with regard to their exposure pathways and potential adverse health effects to humans (Heinemeyer and Bösing 2020; IARC 2006; Jan et al. 2015; Tchounwou et al. 2012; Timothy and Williams 2019; WHO 2009).

In this work, a copula-based regression model was developed to analyze the internal co-exposure of children aged three to fourteen years to cadmium and lead. The copula regression model was applied to HBM data (Cd and Pb in whole blood) collected in the German Environmental Survey for Children (GerES IV) and to individual information (e.g., age and other co-variates such as tobacco smoking at home or regional aspects) collected using standardized interviews and questionnaires. This analysis intends to better understand the variations of Cd and Pb concentrations in human blood in conjunction with potential determinants of the external exposure to these heavy metals. The unknown quantities of the copula regression model were estimated, under the Bayesian framework, by performing Markov Chain Monte Carlo (MCMC) simulation. The influence of the characterization of correlations provided by the implemented model was assessed using GerES IV data, and its predictive performance was evaluated in comparison with machine learning algorithms by using simulated data.

Materials and Methods

Data

The modeling framework that was implemented here was built by considering:

(i) The concentrations of Cd and Pb in whole blood samples of children aged from 3 to 14 years participating in GerES IV as outputs, and
(ii) The ancillary exposure information collected for these participants as input variables assumed to influence the co-exposure to Cd and Pb.

GerES IV was conducted from 2003 to 2006 by the German Environment Agency (UBA) with the aim of providing representative data for health-related environmental monitoring and reporting at a national level (Schulz et al. 2012). GerES IV is the environmental module of the German Health Interview and Examination Survey for Children and Adolescents (KiGGS baseline study) that was carried out by the Robert Koch Institute (RKI) (Kurth et al. 2008). GerES IV included children aged 3 to 5 years for the first time and updated the information collected in GerES II for children aged 6–14 years (Schulz et al. 2007). In total, 150 sampling points in Germany were included and 1790 children participated in GerES IV. In addition, GerES IV was organized in four modules (a base module providing measurements of substances in human samples and interviews, as well as three additional modules in which indoor air measurements, stress hormones and noise pollution, and sensitization to indoor mold were analyzed, respectively), which resulted overall in more than 1000 variables with various pieces of information (Schulz et al. 2007, 2012). Methods for the analysis of Pb and Cd in blood (PbB and CdB) are described elsewhere (Becker et al. 2008).

In our analysis, an initial data extraction procedure was applied to the global GerES IV dataset contained in the public use file made available by UBA, which consisted of:

(i) The selection of input variables by using preliminary multivariate analyses (linear regression for continuous variables or analysis of variance for categorical variables) to investigate the dependence between output variables and the considered input variables. The input variables with the most significant influence on the variations of PbB and CdB concentrations were selected. These developments are not presented here, but further information is provided in supplementary materials S1 and S2.
(ii) The removal of individual records with missing data and individual records for which the concentrations of CdB or PbB were below the respective limits of quantification.

This selection phase resulted in a dataset with 480 individual records and 5 input variables (age, number of smokers living in the same dwelling as the participant, sex, living in former East or West Germany, and the child's smoking status). The potential impact of this selection phase on the analysis is discussed in Sect. Study Limitations. Figure 1 illustrates the relationship between CdB and PbB.

Modeling

Copula-Based Regression Model

The modeling framework presented here relies on the concept of copulas. Copulas are multivariate functions defined, according to Sklar (1959), as joint multidimensional distribution functions having uniformly distributed margins (${U}_{j}$ with $j = 1,\cdots ,q$) on [0,1] as described by Eq. (1).

$$\begin{array}{*{20}c} {{\mathbb{C}}: \left[ {0,1} \right]^{q} \to \left[ {0,1} \right]} \\ {{\mathbb{C}}\left( u \right)\, = \,{\mathbb{C}}\left( {u_{1} , \cdots , u_{q} } \right)\, = \,{\mathbb{P}}\left( {U_{1} \le u_{1} , \cdots , U_{q} \le u_{q} } \right)} \\ \end{array}$$

(1)

where $u = ({u}_{1}, \dots , {u}_{q})$ is the vector of the uniformly distributed margins.

A broad variety of families of copula functions exists (Nelsen 2006). Depending on the nature of the problem (e.g., dependence between the considered variables, computational complexity, etc.), an adequate choice of copula function is required. However, methods to adequately select between copula functions are lacking (Manner 2007). Within this variety, elliptical copulas and Archimedean copulas are widely used in high-dimensional problems, since they can be constructed straightforwardly from their parametric forms. Archimedean copulas (e.g., Clayton, Frank, Gumbel, Joe, etc.) are known to be easily derived and to accommodate different types of dependence, and elliptical copulas, which are based on elliptical distributions (e.g., the Gaussian and Student's t), offer classical correlation structures useful to fully describe the dependencies between variables in multivariate cases (Atique and Attoh-Okine 2018). Elliptical copulas are of particular interest in this analysis because they can be constructed from a continuous multivariate distribution as follows (Smith et al. 2012; Song et al. 2009). Let $Y=({Y}_{1}, \cdots , {Y}_{q})$ be a vector of random variables with a multivariate distribution function ${F}_{Y}(y;\theta )$, marginal distribution functions ${F}_{j}({y}_{j})$, and marginal densities ${f}_{j}({y}_{j})$ for $j = 1,\cdots ,q$, where $y=({y}_{1},\dots ,{y}_{q})$ and $\theta$ represents the set of parameters of the copula function. This multivariate distribution yields the copula function below.

$${\mathbb{C}}\left( {u;\theta } \right)\, = \,{\mathbb{P}}\left( {F_{1} \left( {Y_{1} } \right) \le u_{1} , \cdots , F_{q} \left( {Y_{q} } \right) \le u_{q} } \right)\, = \,F_{Y} \left( {F_{1}^{ - 1} \left( {u_{1} } \right), \cdots ,F_{q}^{ - 1} \left( {u_{q} } \right);\theta } \right)$$

(2)

The copula framework that we adopted was built using a multivariate Gaussian distribution for several reasons, including the parsimony of its associated parameter inference (it has a less complex structure than the Student's t copula), the possibility it offers to capture the dependency structure of the data using the classical correlation matrix, and the symmetry observed between both variables of interest as shown in Fig. 1 (Joe 2014; Park et al. 2021). The Gaussian copula has the following general form:

$${\mathbb{C}}\left( {u;R} \right)\, = \,{\mathbb{C}}\left( {u_{1} , \cdots ,u_{q} ;R} \right)\, = \,{\Phi }_{R} \left( {\Phi^{ - 1} \left( {u_{1} } \right), \cdots ,\Phi^{ - 1} \left( {u_{q} } \right)} \right)$$

(3)

where ${\Phi }_{R}$ represents the cumulative distribution function (CDF) of a multivariate Gaussian ${N}_{q}(0,R)$ with a q-dimensional vector of zeros as mean and R as correlation matrix, and ${\Phi }^{-1}$ corresponds to the inverse of the CDF (i.e., the quantile function) of a standard univariate Gaussian $N(0,1)$.
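
As an illustration of Eq. (3), the following minimal Python sketch evaluates a Gaussian copula and draws samples from it. It is not taken from the study: the function names `gaussian_copula_cdf` and `sample_gaussian_copula` and the use of NumPy and SciPy are assumptions made for this example only.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm


def gaussian_copula_cdf(u, R):
    """Evaluate the Gaussian copula of Eq. (3) at a point u in (0, 1)^q."""
    # Map each uniform margin to the standard normal scale (quantile function).
    z = norm.ppf(np.asarray(u, dtype=float))
    # Joint CDF of N_q(0, R) evaluated at the transformed point.
    return multivariate_normal(mean=np.zeros(len(z)), cov=R).cdf(z)


def sample_gaussian_copula(n, R, rng):
    """Draw n vectors with uniform margins and Gaussian dependence R."""
    z = rng.multivariate_normal(np.zeros(R.shape[0]), R, size=n)
    # The standard normal CDF maps each component back to (0, 1).
    return norm.cdf(z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R = np.array([[1.0, 0.5], [0.5, 1.0]])  # illustrative correlation matrix
    u = sample_gaussian_copula(1000, R, rng)
    print(u.shape, gaussian_copula_cdf([0.3, 0.7], R))
```

With R set to the identity matrix, the copula value reduces to the product of its arguments, which is the independence case.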

Marginal Regressions

A critical aspect for building a copula-based regression model concerns the marginal distributions of the multivariate distribution. The concentrations of cadmium and lead in whole blood, which are positive and right-skewed variables, can be modeled using the lognormal distribution that is often well suited for such characteristics (Ott 1990).

In this work, we opted to first log-transform the Cd and Pb concentrations and to model log-Cd and log-Pb using univariate normal distributions, which is equivalent to assuming lognormal distributions for both marginal distributions. Therefore, the marginal regression models are expressed as follows:

$$\begin{array}{*{20}c} {Y_{ij} \, = \,X_{i} .\beta_{j} \, + \,\varepsilon_{ij} } \\ {\varepsilon_{ij} \sim N\left( {0,\sigma_{j} } \right)} \\ \end{array}$$

(4)

The ${Y}_{ij}$ are the logarithms of Cd and Pb concentrations for individual $i$ and substance $j$, with $i = 1,\dots ,n$ and $j \in \{1, 2\}$ denoting Cd and Pb, respectively. ${X}_{i}=(1, {X}_{i,1}, \cdots , {X}_{i,p})$ is the (p + 1)-dimensional vector of co-variates for individual $i$, with the first element being equal to one for the intercept, and p = 5 is the number of considered input variables (see Section Data). The (p + 1)-dimensional regression coefficients ${\beta }_{j}$ and the real-valued residual standard deviations ${\sigma }_{j}$ for each substance are unknown quantities to be inferred from the data.
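
The structure of Eq. (4) can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `design_matrix` and `simulate_log_concentrations` are hypothetical, the covariate values are synthetic, and $\sigma_j$ is treated as a standard deviation.

```python
import numpy as np


def design_matrix(covariates):
    """Prepend a column of ones (intercept) to an (n, p) covariate array."""
    covariates = np.asarray(covariates, dtype=float)
    return np.column_stack([np.ones(len(covariates)), covariates])


def simulate_log_concentrations(X, beta_j, sigma_j, rng):
    """Simulate Y_ij = X_i . beta_j + eps_ij for one substance j (Eq. 4)."""
    noise = rng.normal(0.0, sigma_j, size=X.shape[0])
    return X @ beta_j + noise


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    covariates = rng.normal(size=(8, 5))      # synthetic inputs, p = 5
    X = design_matrix(covariates)             # shape (8, 6)
    beta_j = np.zeros(6)                      # placeholder coefficients
    log_y = simulate_log_concentrations(X, beta_j, 0.5, rng)
    print(np.exp(log_y))                      # back on the concentration scale
```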

Bayesian Inference

The Bayesian approach is, due to its flexibility, suitable to accommodate the complexity of the copula model. This flexibility stems from combining a priori knowledge on the parameters of interest (priors) with the likelihood that the chosen structure produced the observations at hand (likelihood) to generate posteriors, which give an inferred, a posteriori representation of the modeled system in which the ranges of parameter variations are characterized (Gelman et al. 2013).

Likelihood

In statistical modeling and specifically parameter inference, the likelihood describes the probability that a set of observed data was produced by a given parametric model. This joint probability of the observed data is then represented as a function of the model parameters. In this copula-based regression model, the likelihood is given by:

$${\mathbb{P}}\left( {Y|\beta_{j} ,\sigma_{j} ;R} \right) \propto \left| R \right|^{{ - \frac{n}{2}}} .\exp \left( { - \frac{1}{2}\mathop \sum \limits_{i = 1}^{n} Z_{i} \left( {R^{ - 1} - I_{q} } \right)Z_{i}^{t} } \right).\exp \left( { - \frac{1}{2}\mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{q} \frac{1}{{\sigma_{j}^{2} }}\left( {y_{ij} - X_{i} .\beta_{j} } \right)^{2} } \right).\mathop \prod \limits_{j = 1}^{q} \left( {\sigma_{j}^{2} } \right)^{{ - \frac{n}{2}}}$$

(5)

where $\propto$ denotes proportionality up to a known normalizing constant and ${I}_{q}$ is the q-dimensional identity matrix. The ${Z}_{i}=({Z}_{i1},\dots ,{Z}_{iq})$ are q-dimensional vectors of latent variables with ${Z}_{ij}={\Phi }^{-1}\left({F}_{j}({y}_{ij})\right)$. The terms ${\Phi }^{-1}$, ${F}_{j}$, and ${f}_{j}$ correspond to the quantile function of a standard normal distribution, and to the cumulative distribution function and density of the marginal regression models, respectively.

The first two factors of Eq. (5), which are functions of the correlation matrix R, embody the dependence structure of the model. This implies that if no correlation between the variables of interest is considered (i.e., R is the identity matrix), these two factors equal one and the likelihood reduces to the product of the marginal probability density functions, which corresponds to the likelihood of a multivariate distribution with independent margins.
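
A minimal Python sketch of the logarithm of Eq. (5) is given below. It is an illustration under stated assumptions, not the authors' implementation: the function name `copula_log_likelihood` and the array layout are hypothetical, and the additive constant is dropped. With normal margins, the latent variables simplify to standardized residuals, $Z_{ij} = (y_{ij} - X_i.\beta_j)/\sigma_j$.

```python
import numpy as np


def copula_log_likelihood(Y, X, beta, sigma, R):
    """Log of Eq. (5), up to an additive constant.

    Y: (n, q) log-concentrations, X: (n, p + 1) design matrix,
    beta: (p + 1, q) coefficients, sigma: (q,) residual standard
    deviations, R: (q, q) correlation matrix.
    """
    n, q = Y.shape
    resid = Y - X @ beta
    # For normal margins, Phi^{-1}(F_j(y_ij)) is the standardized residual.
    Z = resid / sigma
    _, logdet_R = np.linalg.slogdet(R)
    A = np.linalg.inv(R) - np.eye(q)
    # Dependence part: |R|^{-n/2} exp(-1/2 sum_i Z_i (R^{-1} - I_q) Z_i^t).
    dependence = -0.5 * n * logdet_R - 0.5 * np.einsum("ij,jk,ik->", Z, A, Z)
    # Marginal part: Gaussian residual terms and prod_j (sigma_j^2)^{-n/2}.
    marginal = -0.5 * np.sum(Z ** 2) - n * np.sum(np.log(sigma))
    return dependence + marginal
```

Setting `R = np.eye(q)` makes the dependence part vanish, so the function returns the sum of the independent marginal log-likelihoods, in line with the reduction described above.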

Priors

The specification of the priors for unknown quantities is an essential requirement in Bayesian analysis. The defined priors are described below:

$$\begin{array}{*{20}c} {\pi \left( {\sigma_{j}^{2} } \right) \sim InvGamma\left( {\alpha_{0} ,\gamma_{0} } \right)} \\ {\pi \left( {\beta_{j} } \right) \sim N_{p + 1} \left( {b_{0}^{j} ;B_{0}^{ - 1} } \right)} \\ {\pi \left( R \right) \sim I\left\{ {R_{jk} :R_{jk} \, = \,1 \left( {j\, = \,k} \right), \left| {R_{jk} } \right| < 1 \left( {j \ne k} \right) {\text{and}} R {\text{positive definite}}} \right\}} \\ \end{array}$$

(6)

where ${\alpha }_{0}={\gamma }_{0}=0.01$ are the hyperparameters of the inverse gamma distribution defined for the variance of the residuals. For the ${\beta }_{j}$ regression coefficients, the hyperparameters $b_{0}^{j} \, = \,\left( {0, \cdots ,0} \right)^{\prime }$ corresponding to the mean vector and ${B}_{0}=0.01\times {I}_{p+1}$ corresponding to the precision matrix (the inverse of the covariance matrix) were adopted for the (p + 1)-dimensional normal distribution. For the correlation matrix R, a non-informative prior satisfying the constraints for a correlation matrix was defined. This prior is uniform on the restricted space of correlation matrices (Barnard et al. 2000; Chib and Winkelmann 2001; Park et al. 2021).
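
The priors of Eq. (6) can be sketched as a single log-density function. The sketch below is illustrative only: the name `log_prior` is hypothetical, and it assumes that $\gamma_0$ is the scale (not the rate) parameter of the inverse gamma distribution, which is not specified in the text.

```python
import numpy as np
from scipy.stats import invgamma, norm


def log_prior(beta, sigma2, R, alpha0=0.01, gamma0=0.01, precision=0.01):
    """Log of the joint prior in Eq. (6), up to an additive constant.

    beta: (p + 1, q) coefficients, sigma2: (q,) residual variances,
    R: (q, q) candidate correlation matrix.
    """
    sigma2 = np.asarray(sigma2, dtype=float)
    if np.any(sigma2 <= 0):
        return -np.inf
    # Indicator prior on R: unit diagonal, |R_jk| < 1, positive definite.
    off_diagonal = R - np.diag(np.diag(R))
    is_valid = (
        np.allclose(R, R.T)
        and np.allclose(np.diag(R), 1.0)
        and np.all(np.abs(off_diagonal) < 1.0)
        and np.all(np.linalg.eigvalsh(R) > 0.0)
    )
    if not is_valid:
        return -np.inf
    # Inverse gamma prior on each variance (gamma0 assumed to be a scale).
    lp = np.sum(invgamma.logpdf(sigma2, a=alpha0, scale=gamma0))
    # N(0, B0^{-1}) with B0 = precision * I: independent normal priors
    # with variance 1 / precision on every coefficient.
    lp += np.sum(norm.logpdf(beta, loc=0.0, scale=np.sqrt(1.0 / precision)))
    return lp
```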

Posteriors and MCMC Simulation

By applying Bayes' theorem, the following full joint posterior distribution of all the unknown parameters is derived.

$${\mathbb{P}}\left( {\sigma_{1}^{2} , \cdots ,\sigma_{q}^{2} ;\beta_{1} , \cdots ,\beta_{q} ;R|Y} \right) \propto {\mathbb{P}}\left( {Y|\beta_{j} ,\sigma_{j} ;R} \right)\, \times \,\pi \left( R \right)\, \times \,\mathop \prod \limits_{j = 1}^{q} \pi \left( {\sigma_{j}^{2} } \right)\, \times \,\mathop \prod \limits_{j = 1}^{q} \pi \left( {\beta_{j} } \right)$$

(7)

Due to the complexity of this distribution, an MCMC simulation scheme consisting of 2 chains, 10⁵ total iterations including a burn-in period of 5 × 10⁴ iterations, and a thinning parameter fixed at 10 was implemented. It is organized as follows:

At each iteration and for each simulated Markov chain,

1. The posterior of the standard deviations of the residuals ${\sigma }_{j}$ (with $j=1, 2$) is generated using a random walk Metropolis–Hastings algorithm,
2. The posterior of the regression coefficients ${\beta }_{j}$ is generated using a random walk Metropolis–Hastings algorithm, and
3. The posterior of the correlation matrix is generated using the parameter expansion and reparameterization Metropolis–Hastings procedure, which was developed by Liu and Daniels (2006) and recently applied by Park et al. (2021).

Because of its non-triviality, a more detailed description of the sampling procedure applied for the correlation matrix R is given in Section S1 of the Supporting Information S1.
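
The random walk Metropolis–Hastings updates used in steps 1 and 2 can be sketched generically as follows. This is a minimal illustration, not the study's sampler: the function names, the Gaussian proposal, and the fixed step size are assumptions, and the parameter expansion procedure used for R in step 3 is not reproduced here.

```python
import numpy as np


def random_walk_metropolis(log_post, x0, step, n_iter, rng):
    """Sample from a density known up to a constant via random walk MH.

    log_post: callable returning the log posterior for a 1-D array,
    x0: starting value, step: standard deviation of the Gaussian proposal.
    Returns the draws, shape (n_iter, d), and the acceptance rate.
    """
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    lp = log_post(x)
    draws = np.empty((n_iter, x.size))
    accepted = 0
    for t in range(n_iter):
        proposal = x + step * rng.standard_normal(x.size)
        lp_proposal = log_post(proposal)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_proposal - lp:
            x, lp = proposal, lp_proposal
            accepted += 1
        draws[t] = x
    return draws, accepted / n_iter


def burn_and_thin(draws, burn_in, thin):
    """Discard the burn-in draws, then keep every `thin`-th draw."""
    return draws[burn_in::thin]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy target: a standard normal log-density, for demonstration only.
    draws, rate = random_walk_metropolis(
        lambda x: -0.5 * float(np.sum(x ** 2)), 0.0, 1.0, 20000, rng
    )
    kept = burn_and_thin(draws, burn_in=2000, thin=10)
    print(kept.shape, round(rate, 2))
```

For a variance or standard deviation parameter, a log posterior that returns `-np.inf` for non-positive values ensures that such proposals are always rejected.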

Predictability of the Bayesian Copula Model

The predictive performances of the Bayesian copula model were evaluated by following two approaches:

In the first approach, the influence of the characterization of the dependence structure between the Cd and Pb concentrations on the model predictability is evaluated by comparing the results drawn from the Bayesian copula-based model (i.e., the predicted Pb and Cd concentrations) with those of the multivariate model with no correlation, which is equivalent to defining R as an identity matrix.
In the second approach, an experiment was conducted to compare the Bayesian copula model with the random linear combinations, ensemble of regressor chains, and stacked single-target methods, which are machine learning (ML) algorithms developed for multi-target regression (Spyromitros et al. 2016; Tsoumakas et al. 2014). To this end, a synthetic dataset that consisted of three output variables and six input variables was generated. The generation of the synthetic data, the ML algorithms and the metrics used to compare the predictions are described in the supplementary material S1.

Results

Bayesian Inference

Descriptive statistics of the unknown parameters of the Bayesian copula-based regression model were calculated from the posteriors sampled with MCMC and are displayed in Table 1.

Table 1 Descriptive statistics of the parameters' posteriors drawn from the Bayesian copula-based regression (BCR) model applied to the GerES IV data

The median of the correlation coefficient ${R}_{12}$ between Cd and Pb concentrations in children's blood samples, which is central in the Bayesian copula-based regression model, is estimated at 0.067 with a 95% credible interval (abbreviated here as 95% CI; the Bayesian counterpart of the frequentist confidence interval) ranging from − 0.034 to 0.167. Figure 2 illustrates the posterior distribution of ${R}_{12}$. This posterior distribution indicates a minor to negligible dependency between the concentrations of Cd and Pb in whole blood of the children participating in GerES IV.
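
Summaries of this kind are computed directly from the retained MCMC draws. The following Python sketch shows one common way to obtain a posterior median and an equal-tailed 95% credible interval; the function name `posterior_summary` is hypothetical, and the draws used in the demonstration are synthetic, not the study's posterior samples.

```python
import numpy as np


def posterior_summary(draws, level=0.95):
    """Return the median and equal-tailed credible interval of MCMC draws."""
    draws = np.asarray(draws, dtype=float)
    tail = (1.0 - level) / 2.0
    lower, median, upper = np.quantile(draws, [tail, 0.5, 1.0 - tail])
    return {"median": median, "lower": lower, "upper": upper}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    synthetic_draws = rng.normal(0.0, 1.0, size=5000)  # placeholder draws
    print(posterior_summary(synthetic_draws))
```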

The standard deviations obtained for Cd and Pb are in similar ranges, with 95% CIs varying from 0.422 to 0.487 for Cd and from 0.480 to 0.551 µg/L for Pb. However, relative to the order of magnitude of the concentration values, these standard deviations denote a significantly higher variability of Cd concentrations in blood in comparison with Pb.

Because of the logarithmic transformation of Cd and Pb concentrations, the interpretation of the derived regression coefficients ${\beta }_{j}$ can be difficult. Nevertheless, the influence of the input variables was evaluated by analyzing the variations of Cd and Pb concentrations induced by variations of input variables. Specifically, the age of the children generates a slight increase (about 6%) of Cd concentrations and a mild decrease (about 15%) of Pb concentrations, as the regression coefficient ${\beta }_{j1}$ is equal to 0.043 with 95% CI [− 0.008, 0.092] for Cd and − 0.103 with 95% CI [− 0.159, − 0.047] for Pb. For sex and the former Western/Eastern German sample point variable, opposite and mild variations are noticed between Cd and Pb. Indeed, about a 9% decrease between girls and boys and a 14% decrease from former West to East Germany are observed for the concentration of Cd, whereas increases of around 12% between girls and boys and 8% from western to eastern Germany are observed for Pb. Regarding input variables associated with the smoking behavior around the children (the number of smokers in the dwelling and the smoking status), significant increases (about 50% and 25%) are induced on Cd concentrations with ${\beta }_{12}=0.067$ with 95% CI [0.020, 0.112] and ${\beta }_{15}=0.733$ with 95% CI [0.529, 0.928]. For Pb, mild increases (about 10% and 12%) are observed. This observation is consistent with the association between the exposure to Cd and tobacco smoke, which is known to be much stronger than the association between Pb and tobacco smoke (Bernhard et al. 2005).
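
As a general aid for reading coefficients on the log scale, the following relation may help. It is a sketch under assumptions that are not stated in the text: the natural logarithm is used, all other inputs are held fixed, and $\Delta x$ is a hypothetical change in input $k$. Writing $c_{j}$ for the median concentration of substance $j$ implied by Eq. (4),

$$\frac{c_{j} \left( x_{k} + \Delta x \right)}{c_{j} \left( x_{k} \right)} = \exp \left( \beta_{jk} \, \Delta x \right), \qquad \text{relative change} = 100 \times \left( \exp \left( \beta_{jk} \, \Delta x \right) - 1 \right)\,\%$$

The resulting percentage depends on the coding and scaling of each input variable as well as on the size of $\Delta x$.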

Overall, similarities are observed between both marginal regression models considering the parametric uncertainty. Indeed, the parameter-specific ranges of the 95% CIs are of the same order of magnitude for Cd and for Pb. For instance, the ranges for the regression coefficients associated with children's age ${\beta }_{j1}$, the number of smokers in the dwelling ${\beta }_{j2}$, and their smoking status ${\beta }_{j5}$ are equal to 0.100, 0.092, and 0.399 for Cd, and 0.112, 0.103, and 0.458 for Pb, respectively. An identical observation is made for the standard deviations of the residuals ${\sigma }_{j}$, as the 95% CI ranges are 0.06 and 0.07 for Cd and Pb, respectively. Therefore, in both marginal regressions, the same level of imprecision or variation around the central input and output values (mean or median) is observed. This could result from the shrinkage effect stemming from the application of the log transformation to the marginal concentration values, which both come from a right-skewed distribution.

Influence of Characterizing the Dependence Structure

Fixing the correlation coefficient ${R}_{12}$ at zero reduced the Bayesian copula model to independent multivariate regression models (one for each of Cd and Pb), which enabled us to evaluate the impact of modeling the dependence structure between Cd and Pb concentrations. The resulting posterior distributions of the model parameters are summarized by the descriptive statistics presented in Table 2. A significant shrinkage of the parametric uncertainty is noted, since the ranges of the 95% CIs, which vary from 0.001 to 0.01, are tenfold lower than those produced by the Bayesian copula-based regression model. The standard deviations of the residuals are also smaller than those generated by the Bayesian copula-based regression model by a factor ranging from 40 to 50. This illustrates a better coverage of the global uncertainty using the copula-based framework. Indeed, the uncertainty that is supposed to stem from the correlation matrix R, and thus from the dependence structure of the output space, is propagated to the other model parameters and consequently characterized by their posteriors.

Table 2 Descriptive statistics of the parameters' posteriors drawn from the independent multivariate regression models applied to the GerES IV data. The independent multivariate regression models are equivalent to the BCR model with the correlation coefficient ${R}_{12}$ being fixed at 0

An identical observation (i.e., a better characterization of the parametric uncertainty by the copula-based model) is made considering the evaluations using simulated data as shown in Table S1 of the supplementary material S1. The derived 95% CIs drawn from the copula-based model cover the simulated parameter values better than the independent multivariate regression models. Despite these discrepancies between both models, similar predictions on the test set extracted from the GerES IV data based on the medians of model parameters are produced as illustrated in Fig. 3.

Discussion

Lessons Learned from this Analysis

In this study, we proposed a modeling approach with the underlying objective of enhancing the analysis and the assessment of the cumulative exposure to multiple chemicals. This modeling approach relies first on the use of biomarkers of exposure and exposure-related data collected from HBM studies, which generally carry valuable information about the past and current exposure of human populations and potential exposure-related determinants. As candidate substances, cadmium and lead, two heavy metals that have been extensively studied over the last decades, were selected for illustrative purposes. Second, the proposed approach was built on the implementation of a copula-based model to solve the multi-output and multivariate regression problem posed by the simultaneous consideration of Cd and Pb concentrations in whole blood of German children as outputs and the analysis of the influence of ancillary exposure-related information treated as inputs of the marginal regression models. Specifically, the copula framework made it possible to accommodate the complexity added by the characterization of the dependence structure between Cd and Pb concentrations. The developed model was fitted to the study data by using Bayesian inference. The Bayesian inference helped us to characterize the uncertainty of the model parameters, namely the marginal regression coefficients, the standard deviations of the residuals, and the correlation matrix R, which is the key parameter in this copula model.

A minor to negligible dependence was observed between Cd and Pb concentrations in whole blood, with low correlation coefficient values (the 97.5th percentile of the correlation coefficient was smaller than 0.20). This means that high or low exposure to Cd does not imply high or low exposure to Pb, and vice versa. Even though both heavy metals co-occur in environmental and consumer matrices (air, soil, dust, food, drinking water, and other consumer products), the negligible dependence observed in our analysis does not support analyzing the co-exposure to these two elements using the notion of correlation. Exposure sources and pathways specific to each metal, or different kinetics within the human organism, could be investigated to explain this low or absent association. By contrast, slightly stronger associations between PbB and CdB have been reported in the literature and were explained either by local co-contamination from soil (King et al. 2015) or by maternal age (Nakayama et al. 2019).
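A short simulation can illustrate how little a correlation of this size says about joint high exposure. The sketch below is not part of the study analysis: it assumes Gaussian dependence on a standardized scale, takes 0.20 (the upper bound quoted above) as the correlation, and uses hypothetical variable names. It reports the share of variance in one output explained by the other and the probability of being in the top 10% for Pb given a top 10% value for Cd.

```python
import numpy as np

rho = 0.20  # upper end of the reported posterior range, used here as an assumption
rng = np.random.default_rng(42)
cov = np.array([[1.0, rho], [rho, 1.0]])
z_cd, z_pb = rng.multivariate_normal([0.0, 0.0], cov, size=200_000).T

# Flag the top 10% of each standardized concentration.
high_cd = z_cd > np.quantile(z_cd, 0.90)
high_pb = z_pb > np.quantile(z_pb, 0.90)

share_explained = rho ** 2            # R^2 of a linear regression of one on the other
p_both = np.mean(high_pb[high_cd])    # empirical P(high Pb | high Cd)

print(f"variance share explained: {share_explained:.2f}")
print(f"P(high Pb | high Cd): {p_both:.2f} (0.10 under independence)")
```

Even at the upper bound of the interval, only a few percent of the variance is shared, so a high value for one metal is a weak indicator of a high value for the other.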

Higher parametric uncertainty and residual values were obtained from the copula-based model than from the multivariate regression models without dependence characterization. This is explained by the inclusion, within the copula-based model structure, of a supplementary source of uncertainty related to the correlation between Cd and Pb concentrations in blood and its inter-dependencies with the marginal regression parameters ${\beta }_{j}$ and ${\sigma }_{j}$, as illustrated by the correlation-specific terms in the equation of the model likelihood in the Bayesian Inference section. In this regard, Bayesian inference combined with MCMC simulation offered the flexibility required to tackle the complexity introduced in the model by the estimation of the coefficient of the correlation matrix. This observation highlights the importance of characterizing the global (parametric) uncertainty in the context of highly correlated outputs. Indeed, the evaluations based on the simulated data showed a better agreement between the estimated and simulated parameter values and a better coverage of the parametric uncertainty around the various coefficients of the marginal regression models when the copula-based modeling approach was used.

The fitted copula-based model and the estimated parameter values revealed insightful relationships between the input variables and the Cd and Pb concentrations. For Cd, the whole blood concentration increases with children's age, whereas Pb in whole blood is negatively associated with age. Due to its ubiquity in the environment, humans can be exposed to Cd via numerous pathways, including various dietary sources (EFSA 2009). This is supported by first GerES V observations of higher internal Cd exposure being associated with a vegetarian diet in bivariate analysis for participants aged 3 to 17 years (Vogel et al. 2021). The decrease of Pb with age has been observed in several other studies. A possible explanation discussed in the literature is that Pb exposure can be higher in younger children because of the overall oral exposure, which is highly relevant for this subgroup of the population due to hand-to-mouth behavior and the accumulation of Pb in soils and dusts (Burm et al. 2016). The population body burden of lead has decreased strongly in recent decades following the ban of Pb in fuel since the 1970s in Europe, which has considerably reduced the release of Pb into the environment, as indicated by the review and collation of historical data by Bierkens et al. (2011) and by Lermen et al. (2021).

Slight differences in Cd and Pb concentrations were noticed between German boys and girls and between former West and East Germany. Physiological processes can induce differences between males and females in the growth of the human body, particularly during adolescence (Ramos et al. 1998; Schedler et al. 2019). Moreover, the body burden of Pb is associated with skeletal growth, which could explain the observed differences in Pb concentrations between German boys and girls (EFSA 2012; O'Flaherty 1991). This aspect might also be discussed in view of the observed decrease with age (Tebby et al. 2022).

Considering that GerES IV was performed only around 15 years after German reunification, differences between former East and West Germany might be partly explained by spatial differences in heavy metal pollution (MSC-E 2020). Higher PbB levels in children and adolescents residing in former East Germany have also been observed in multivariate analyses of data collected in GerES V, performed from 2014 to 2017 (Hahn et al. 2022). It has to be considered that our regression model includes only a fairly small number of predictors. Therefore, we see the former East/West Germany variable as a reasonable, though imperfect, proxy for differences in exposure-relevant behaviors or conditions not included in the regression model. For example, ambient air pollution concentrations and sufficiently detailed information on general food consumption were not part of the GerES IV dataset. GerES V results also indicate a further alignment of internal metal exposures in former East and West Germany (Vogel et al. 2021). Therefore, a more detailed analysis in future studies of the reasons for the exposure differences still observed between former East and West Germany is warranted.

The impact of the smoking-related variables was found to be higher on Cd than on Pb. This observation is in line with general knowledge about the associations between environmental tobacco smoke and exposure to heavy metals (Bernhard et al. 2005), with the previous findings of Conrad et al. (2010), who studied the exposure of German children (about 25%) to environmental tobacco smoke at home, and with the conclusions of a recent study of Swedish adolescents by Almerud et al. (2021).

Study Limitations

The inclusion of the correlation-specific terms generated convergence difficulties for parameters whose estimated value is close to zero. This is the case, for instance, for the correlation coefficient, whose 95% CI covers zero. In such situations, the MCMC chains might require more iterations to approximate the target distribution (Robert and Casella 2004). Collinearity within the input space and the identifiability of the model parameters might be explored as possible explanations. Collinearity within the input space refers to the presence of one or more variables that are linear combinations of other input variables (Dormann et al. 2013; Kutner et al. 2004), which is not the case in our dataset. Typically, this occurs when a categorical variable with K classes is one-hot encoded (i.e., transformed into K binary variables) and all K variables are included in the input space alongside an intercept. The second option refers to the identifiability of model parameters, a classical issue in regression analysis. Model identifiability is a fundamental prerequisite for model identification and concerns the uniqueness of the model parameters determined from observations (Godfrey and DiStefano 1985; Lecca 2020). In our model, the marginal regression models are built on a small number of input variables (five) and a large number of records (more than 300), which rules out the issue of non-unique parameter estimates. This is also confirmed by the results and predictive performance observed with the simulated data, which have a larger input space and a stronger dependence within the output space.
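The one-hot encoding case mentioned above can be checked numerically. The sketch below uses a hypothetical three-class variable and six records (not GerES IV data) and compares the rank of the design matrix with its number of columns. It assumes the model contains an intercept, in which case keeping all K binary columns makes the matrix rank deficient, whereas dropping one reference class restores full column rank.

```python
import numpy as np

# Hypothetical categorical input with K = 3 classes for six records.
classes = np.array([0, 1, 2, 0, 1, 2])
K = 3
one_hot = np.eye(K)[classes]                      # K binary columns
intercept = np.ones((len(classes), 1))

full = np.hstack([intercept, one_hot])            # intercept + all K dummies
reduced = np.hstack([intercept, one_hot[:, 1:]])  # first class used as reference

# Rank below the number of columns signals exact collinearity.
print(np.linalg.matrix_rank(full), full.shape[1])        # 3 4
print(np.linalg.matrix_rank(reduced), reduced.shape[1])  # 3 3
```

In the collinear case the regression coefficients are not uniquely determined, which is why such a check is a natural first step before attributing convergence problems to other causes.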

However, the selected input variables can be assumed to be insufficient to fully reproduce the variations of the internal exposures to Cd and Pb measured in GerES IV participants. Indeed, including individual-level data on the consumption or use of dietary and non-dietary consumer products, which can be major sources of exposure to Cd and Pb, could improve the analysis (Heinemeyer and Bösing 2018; Järup 2003; Hahn et al. 2022). Because of the close cooperation between the KiGGS baseline study and GerES IV, health, socio-demographic, anthropometric, and environmental data from both studies could be combined at the individual level and jointly analyzed to carry out a more comprehensive regression analysis as a next step (see for example Hahn et al. 2022). Moreover, the relationship between the factors potentially influencing human exposure (e.g., socio-demographics, smoking activity, and other exposure-relevant information) and the estimate of the internal exposure is not necessarily linear. There is growing advocacy for quantitative approaches such as physiologically based toxicokinetic models, which make it possible to better relate the determinants of external exposure to internal exposure estimates by simulating the kinetics of substances in the human body and integrating physiological and substance-specific parameters (Louro et al. 2019; Sarigiannis et al. 2019).

Another limitation of this study arises from discarding a large number of individual records (approximately 660 records) corresponding to Cd concentrations below the LOQ and to missing values. The treatment of missing data and of concentrations below the LOD or LOQ, especially in HBM datasets, is of high interest in exposure science. Approaches going beyond the removal of records with missing values and the replacement of concentrations below the LOQ by a fixed value (e.g., LOQ/2) were recently suggested and discussed under the HBM4EU initiative, including for instance single and multiple imputation techniques (Vrijheid et al. 2019). In this specific analysis, we tested the replacement of concentrations below the LOQ by a fixed value, which transformed the marginal distribution of Cd concentrations into a bimodal distribution. This raised the complexity of the model and resulted in convergence issues. Multimodal distributions or mixtures of Gaussian distributions could be considered to improve the analysis. Other alternatives, such as single or multiple imputation methods, could also be evaluated, as they can help to avoid this bimodality.
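The bimodality caused by fixed-value substitution can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions: the LOQ value, the lognormal parameters, and the sample size are hypothetical and are not taken from GerES IV. It shows that all censored records collapse onto the single value LOQ/2, leaving an empty gap between LOQ/2 and the LOQ.

```python
import numpy as np

rng = np.random.default_rng(1)
loq = 0.12  # hypothetical limit of quantification
# Synthetic concentrations whose median sits at the LOQ (assumption).
conc = rng.lognormal(mean=np.log(0.12), sigma=0.6, size=5000)

below = conc < loq
substituted = np.where(below, loq / 2.0, conc)  # fixed-value substitution

# Share of records forming a point mass at LOQ/2 ...
spike_share = np.mean(substituted == loq / 2.0)
# ... and number of records left strictly between LOQ/2 and LOQ.
gap_count = int(np.sum((substituted > loq / 2.0) & (substituted < loq)))

print(f"share of records at LOQ/2: {spike_share:.2f}")
print(f"records between LOQ/2 and LOQ: {gap_count}")
```

A single normal (or lognormal) marginal cannot represent both the point mass and the continuous part above the LOQ, which is consistent with the convergence issues described above.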

Perspectives

Every year, the chemical industry and, more broadly, anthropogenic activities release significant amounts of chemicals into the environment or introduce them as mixtures in dietary and non-dietary consumer products, in many cases constituting an emerging risk to the general public (Egeghy et al. 2012; Huang et al. 2017; Mitchell et al. 2013). The developed approach is in line with the efforts made in recent years in the fields of chemical safety and exposure science to enhance the analysis of cumulative exposure to multiple substances occurring in the environment and to deepen the understanding of the determinants and health impacts of multiple substance exposure. Furthermore, this modeling framework and, more generally, the rich class of copula functions could provide exposure and risk assessors with an attractive approach to enhance the assessment of cumulative exposures on the basis of human biomonitoring data. Cumulative exposure to multiple substances can occur through behavioral co-exposure with distinct pathways or through co-exposure to mixtures. The approach is particularly relevant in the context of high correlations, as illustrated by the application to the simulated data and the comparison with state-of-the-art machine learning approaches.

Supplementary Information

The supplemental materials are contained in a Word document (supplemental materials S1) and an Excel document (supplemental materials S2), which can be accessed online on the journal website. The supplemental materials consist of further information on the variables discarded from the preliminary analysis, the descriptions of the posterior sampling of the correlation matrix R, of the generation of the simulated data, and of the machine learning algorithms compared with the implemented Bayesian copula-based regression.

Data availability

Enquiries about data availability should be directed to the authors.

References

Almerud P, Zamaratskaia G, Lindroos AK, Bjermo H, Andersson EM, Lundh T, Ankarberg EH, Lignell S (2021) Cadmium, total mercury, and lead in blood and associations with diet, sociodemographic factors, and smoking in Swedish adolescents. Environ Res 197:110991
Angerer J, Ewers U, Wilhelm M (2007) Human biomonitoring: state of the art. International Journal Hygiene and Environmental Health 210:201–228
Atique F, Attoh-Okine N (2018) Copula parameter estimation using Bayesian inference for pipe data analysis. Can J Civ Eng 45:61–70. https://doi.org/10.1139/cjce-2017-0084
Barnard J, McCulloch R, Meng X (2000) Modeling covariance matrices in terms of standard deviations and correlations with application to shrinkage. Stat Sin 10:1281–1311
Becker K, Müssig-Zufika M, Conrad A, Lüdecke A, Schulz C, Seiwert M, Kolossa-Gehring M (2008) German Environmental Survey for Children 2003/06 – GerES IV – Human Biomonitoring: Levels of selected substances in blood and urine of children in Germany. WaBoLu-Hefte 01/2008. Edition Umweltbundesamt. https://www.umweltbundesamt.de/publikationen/german-environmental-survey-for-children-200306
Bernhard D, Rossmann A, Wick G (2005) Metals in cigarette smoke. IUBMB Life 57(12):805–809. https://doi.org/10.1080/15216540500459667
Beronius A, Zilliacus J, Hanberg A, Luijten M, van der Voet H, van Klaveren J (2020) Methodology for health risk assessment of combined exposures to multiple chemicals. Food Chem Toxicol 143:111520
Bierkens J, Smolders R, Van Holderbeke M, Cornelis C (2011) Predicting blood lead levels from current and past environmental data in Europe. Sci Total Environ 409(23):5101–5110
Burm E, Song I, Ha M, Kim Y-M, Lee K, Kim H-C, Lim S, Kim S-Y et al (2016) Representative levels of blood lead, mercury, and urinary cadmium in youth: Korean environmental health survey in children and adolescents (KorEHS-C), 2012–2014. Int J Hyg Environ Health 219(4–5):412–418. https://doi.org/10.1016/j.ijheh.2016.04.004
Carrico C, Gennings C, Wheeler DC, Factor-Litvak P (2015) Characterization of weighted quantile sum regression for highly correlated data in a risk analysis setting. J Agric Biol Environ Stat 20(1):100–120. https://doi.org/10.1007/s13253-014-0180-3
Chib S, Winkelmann R (2001) Markov chain monte carlo analysis of correlated data. J Bus Econ Stat 19(4):428–435
Conrad A, Schulz C, Seiwert M, Becker K, Ullrich D, Kolossa-Gehring M (2010) German environmental survey IV: children´s exposure to environmental tobacco smoke. Toxicol Lett 192:79–83
Dormann CF, Elith J, Bacher S, Buchmann C, Carl G, Carré G et al (2013) Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography 36(1):27–46
EFSA (European Food Safety Authority) (2009) Scientific Opinion of the panel on contaminants in the Food Chain on a request from the European commission on cadmium in food. EFSA J 980:1–139
EFSA (European Food Safety Authority) (2012) Lead dietary exposure in the European population. EFSA J 10(7):2831. https://doi.org/10.2903/j.efsa.2012.2831
Egeghy PP, Judson R, Gangwal S, Mosher S, Smith D, Vail J, Cohen-Hubal EA (2012) The exposure data landscape for manufactured chemicals. Sci Total Environ 414:159–166
Ganzleben C, Antignac J-P, Barouki R, Castano A, Fiddicke U, Klánová J, Lebret E, Olea N, Sarigiannis D, Schoeters GR, Sepai O, Tolonen H, Kolossa-Gehring M (2017) Human biomonitoring as a tool to support chemicals regulation in the European Union. Int J Hyg Environ Health 220(2):94–97. https://doi.org/10.1016/j.ijheh.2017.01.007
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2013) Bayesian data analysis, 3rd edn. Chapman and Hall/CRC, Boca Raton, p 675
Godfrey KR, DiStefano JJ III (1985) Identifiability of model parameter. IFAC Proceedings Volumes 18(5):89–114. https://doi.org/10.1016/S1474-6670(17)60544-5
Hahn D, Vogel N, Höra C, Kämpfe A, Schmied-Tobies M, Göen T, Greiner A, Aigner A, Kolossa-Gehring M (2022) The role of dietary factors on blood lead concentration in children and adolescents—results from the nationally representative German environmental survey 2014–2017 (GerES V). Environ Pollut 299:118699. https://doi.org/10.1016/j.envpol.2021.118699
Heinemeyer G, Bösing U (2020) Eintragspfade von Blei in den menschlichen Organismus [in German]. Abschlussbericht. Umwelt und Gesundheit 02–2020. https://www.umweltbundesamt.de/publikationen/eintragspfade-von-blei-in-den-menschlichen
Huang L, Ernstoff A, Fantke P, Csiszar SA, Jolliet O (2017) A review of models for near-field exposure pathways of chemicals in consumer products. Sci Total Environ 574:1182–1208
IARC (International Agency for Research on Cancer), 2006. Working group on the evaluation of carcinogenic risks to humans inorganic and organic lead compounds IARC monographs on the evaluation of carcinogenic risks to humans Lyon.
IPCS (International Programme on Chemical Safety), 2009. Assessment of combined exposures to multiple chemicals: Report of a WHO/IPCS International Workshop. IPCS harmonization project document No. 7. ISBN 978 92 4 156383 3.
Jan AT, Azam M, Siddiqui K, Ali A, Choi I, Haq QMR (2015) Heavy metals and human health mechanistic insight into toxicity and counter defense system of antioxidants. Int J Mol Sci 16(12):29592–29630
Järup L (2003) Hazards of heavy metal contamination. Br Med Bull 68:167–182. https://doi.org/10.1093/bmb/ldg032
Joe H (2014) Dependence modeling with copulas. CRC Press, Boca Raton
Keil AP, Buckley JP, O’Brien KM, Ferguson KK, Zhao S, White AJ (2020) A quantile-based g-computation approach to addressing the effects of exposure mixtures. Environ Health Perspect. https://doi.org/10.1289/EHP5838
King KE, Darrah TH, Money E, Meentemeyer R, Maguire RL, Nye MD, Michener L, Murtha AP, Jirtle R, Murphy SK, Mendez MA, Robarge W, Vengosh A, Hoyo C (2015) Geographic clustering of elevated blood heavy metal levels in pregnant women. BMC Public Health 15:1035. https://doi.org/10.1186/s12889-015-2379-9
Kurth BM, Kamtsiuris P, Hölling H, Schlaud M, Dölle R, Ellert U, Kahl H, Knopf H, Lange M, Mensink GBM, Neuhauser H, Rosario AS, Scheidt-Nave C, Schenk L, Schalck R, Stolzenberg H, Thamm M, Thierfelder W, Wolf U (2008) The challenge of comprehensively mapping children´s health in a nation-wide health survey: design of the German KiGGS-Study. BMC Public Health 8:196
Kutner M, Nachtsheim C, Neter J (2004) Applied linear regression models, 4th edn. New York
Lecca P (2020) Model identifiability. In: Lecca P (ed) Identifiability and regression analysis of biological systems models. SpringerBriefs in Statistics. Springer, Cham
Lee M, Rahbar MH, Samms-Vaughan M, Bressler J, Bach MA, Hessabi M, Grove ML, Shakespeare-Pellington S, Desai CC, Reece JA, Loveland KA, Boerwinkle E (2019) A generalized weighted quantile sum approach for analyzing correlated data in the presence of interactions. Biom J 61:934–954
Lermen D, Weber T, Göen T, Bartel-Steinbach M, Gwinner F, Mueller SC, Conrad A, Rüther M et al (2021) Long-term time trend of lead exposure in young German adults—evaluation of more than 35 years of data of the German environmental specimen bank. Int J Hyg Environ Health 231:113665. https://doi.org/10.1016/j.ijheh.2020.113665
Liu X, Daniels MJ (2006) A new algorithm for simulating a correlation matrix based on parameter expansion and reparameterization. J Comput Graph Stat 15:897–914
Louro H, Heinälä M, Bessems J, Buekers J, Vermeire T, Woutersen M, van Engelen J, Borges T, Rousselle C, Ougier E, Alvitoa P, Martins C, Assunção R, Silva MJ, Pronk A, Schaddelee-Scholten B, Gonzalez MDC, de Alba M, Castaño A, Viegas S, Humar-Juric T, Kononenko L, Lampen A, Vinggaard AM, Schoeters G, Kolossa-Gehring M, Santonen T (2019) Human biomonitoring in health risk assessment in Europe: Current practices and recommendations for the future. Int J Hyg Environ Health 222:727–737
Manner H (2007) Estimation and model selection of copulas with an application to exchange rates. Research Memorandum 056, Maastricht University, Maastricht Research School of Economics of Technology and Organization (METEOR). RM/07/056. https://doi.org/10.26481/umamet.2007056
Mitchell J, Pabon N, Collier ZA, Egeghy PP, Cohen-Hubal E, Linkov I, Vallero DA (2013) A Decision analytic approach to exposure-based chemical prioritization. PLoS ONE 8:E70911
MSC-E (Meteorological Synthesizing Centre - East) (2020) Country-scale assessment of heavy metal pollution: a case-study for Germany. Joint report of MSC-E and UBA. Technical Report 1–2020. https://www.msceast.org/reports/1_2020.pdf
Nakayama SF, Iwai-Shimada M, Oguri T, Isobe T, Takeuchi A, Kobayashi Y, Michikawa T, Yamazaki S, Nitta H, Kawamoto T, The Japan Environment and Children´s Study Group (2019) Blood mercury, lead, cadmium, manganese and selenium levels in pregnant women and their determinants: the Japan environment and children´s study (JECS). J Expo Sci Environ Epidemiol 29:633–647. https://doi.org/10.1038/s41370-019-0139-0
Nelsen RB (2006) An introduction to copulas, 2nd edn. Springer Series in Statistics, New York, p 272
O’Flaherty EJ (1991) Physiologically based models for bone-seeking elements. part III. human skeletal and bone growth. Toxicol Appl Pharmacol 111:332–341
Ott WR (1990) A physical explanation of the lognormality of pollutant concentrations. J Air Waste Manag Assoc 40(10):1378–1383. https://doi.org/10.1080/10473289.1990.10466789
Park ES, Oh R, Ahn JY, Oh MS (2021) Bayesian analysis of multivariate crash counts using copulas. Accid Anal Prev 149:105431
Pitt M, Chan D, Kohn R (2006) Efficient bayesian inference for gaussian copula regression models. Biometrika 93:537–554
Price PS, Chaisson CF (2005) A conceptual framework for modeling aggregate and cumulative exposures to chemicals. J Expo Anal Environ Epidemiol 15:473–481
Ramos E, Frontera WR, Llopart A, Feliciano D (1998) Muscle strength and hormonal levels in adolescents: gender related differences. Int J Sports Med 19(8):526–531. https://doi.org/10.1055/s-2007-971955
Robert CP, Casella G (2004) Diagnosing Convergence. In: Robert CP, Casella G (eds) Monte Carlo Statistical Methods. Springer Texts in Statistics. Springer, New York
Rotter S, Beronius A, Boobis AR, Hanberg A, van Klaveren J, Luijten M, Machera K, Nikolopoulou D, van der Voet H, Zilliacus J, Solecki R (2018) Overview on legislation and scientific approaches for risk assessment of combined exposure to multiple chemicals: the potential EuroMix contribution. Crit Rev Toxicol 48(9):796–814
Sahu SK, Dey DK, Branco MD (2003) A new class of multivariate skew distributions with applications to Bayesian regression models. Canadian Journal of Statistics 31:129–150
Sarigiannis D, Karakitsios S, Dominguez-Romero E, Papadaki K, Brochot C, Kumar V, Schumacher M, Sy M, Mieleke H, Greiner M, Mengelers M, Scheringer M (2019) Physiology-based toxicokinetic modelling in the frame of the European Human Biomonitoring Initiative. Environ Res 172:216–230
Schedler S, Kiss R, Muehlbauer T (2019) Age and sex differences in human balance performance from 6–18 years of age: a systematic review and meta-analysis. PLoS ONE 14(4):e0214434. https://doi.org/10.1371/journal.pone.0214434
Schulz C, Conrad A, Becker K, Kolossa-Gehring M, Seiwert M, Seifert B (2007) Twenty years of the german environmental survey (GerES): human biomonitoring—temporal and spatial (West Germany/East Germany) differences in population exposure. Int J Hyg Environ Health 210(3–4):271–297. https://doi.org/10.1016/j.ijheh.2007.01.034
Schulz C, Seiwert M, Babisch W, Becker K, Conrad A, Szewzyk R, Kolossa-Gehring M (2012) Overview of the study design, participation and field work of the German environmental survey on children 2003–2006 (GerES IV). Int J Hyg Environ Health 215:435–448
Sklar A (1959) Fonctions de répartition à n dimensions et leurs marges. Publ Inst Statist Univ Paris 8:229–231
Smith MS (2013) Bayesian approaches to copula modelling. In: Damien P, Dellaportas P, Polson NG, Stephens DA (eds) Bayesian Theory and Applications. Oxford University Press, Oxford, pp 336–360
Smith MS, Gan Q, Kohn RJ (2012) Modelling dependence using Skew T copulas: Bayesian inference and applications. J Appl Economet 27:500–522
Song PXK, Li M, Yuan Y (2009) Joint regression analysis of correlated data using Gaussian Copulas. Biometrics 65:60–68
Spyromitros XE, Tsoumakas G, Groves W, Vlahavas I (2016) Multi-target regression via input space expansion: treating targets as inputs. Mach Learn 104(1):55–98
Tanner EM, Bornehag CG, Gennings C (2019) Repeated holdout validation for weighted quantile sum regression. MethodX 6:2855–2860
Tchounwou P, Yedjou C, Patlolla AK, Sutton D (2012) Heavy metals toxicity and the environment. EXS 101:133–164. https://doi.org/10.1007/978-3-7643-8340-4_6
Tebby C, Caudeville J, Fernandez Y, Brochot C (2022) Mapping blood lead levels in French children due to environmental contamination using a modeling approach. Sci Total Environ. https://doi.org/10.1016/j.scitotenv.2021.152149
Timothy N, Williams ET (2019) Environmental pollution by heavy metal: an overview. Int J Environ Chem 3(2):72–82. https://doi.org/10.11648/j.ijec.20190302.14
Tsoumakas G, Spyromitros XE, Vrekou A, Vlahavas I (2014) Multi target regression via random linear target combinations. In: Calders T, Esposito F, Hüllermeier E, Meo R (eds) Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2014, Nancy, France, September 15–19, 2014. Proceedings, Part III. Springer, Berlin, pp 225–240
Vogel N, Murawski A, Schmied-Tobies MIH, Rucic E, Doyle U, Kämpfe A, Höra C, Hildebrand J, Schäfer M, Drexler H, Göen T, Kolossa-Gehring M (2021) Lead, cadmium, mercury, and chromium in urine and blood of children and adolescents in Germany—human biomonitoring results of the German environmental survey 2014–2017 (GerES V). Int J Hyg Environ Health. https://doi.org/10.1016/j.ijheh.2021.113822
Vrijheid M, Montazeri P, Rambaud L, Vogel N, Vlaanderen J, Remy S, Govarts E, Schoeters G (2019) Statistical Analysis Plan. Deliverable Report D10.5. https://www.hbm4eu.eu/work-packages/deliverable-10-5-statistical-analysis-plan/
WHO, 2009. Exposure to lead: a major public health concern. WHO/CED/PHE/EPE/19.4.7. ISBN 978–92–4–003763–2.

Acknowledgements

The authors are grateful to the Robert Koch Institute for carrying out the GerES IV fieldwork, for providing blood samples for analysis in GerES IV, and for the excellent cooperation between the KiGGS baseline study and GerES IV. Thanks are also due to various colleagues at the German Environment Agency for sample and data management and the analysis of Pb and Cd in blood, as well as to colleagues from the German Federal Institute for Risk Assessment for their helpful support throughout the submission process. We also wish to thank all participants who gave their time and provided samples to the KiGGS baseline study and GerES IV. GerES IV funding by the Federal Ministry of Education and Research (BMBF) and the Federal Ministry for the Environment, Nature Conservation and Nuclear Safety (BMU) is gratefully acknowledged.

Funding

Open Access funding enabled and organized by Projekt DEAL. This work was carried out as part of the first author's postdoctoral project entitled “Total Exposure”, funded by the German Federal Institute for Risk Assessment (BfR) and affiliated with the European initiative HBM4EU. The BfR is financed by the German Federal Ministry of Food and Agriculture. Funders: Bundesinstitut für Risikobewertung; Horizon 2020 Framework Programme (733032).

Author information

Authors and Affiliations

Exposure Department, German Federal Institute for Risk Assessment (BfR), Max-Dohrn Str. 8-10, 10589, Berlin, Germany
Moustapha Sy, Christian Jung, Oliver Lindtner & Matthias Greiner
Department of Environmental Hygiene, German Environment Agency (UBA), Corrensplatz 1, 14195, Berlin, Germany
André Conrad

Contributions

MS was responsible for the methodology, data management and analysis, the implementation of the methods, the analysis and interpretation of the results, review, and editing. AC contributed to the compilation of the data as well as to the interpretation of the results, review, and editing. CJ contributed to the interpretation of the results, review, and editing. OL co-supervised the study and contributed to the interpretation of the results, review, and editing. MG co-supervised the study and contributed to the interpretation of the results, review, and editing.

Corresponding author

Correspondence to Moustapha Sy.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sy, M., Conrad, A., Jung, C. et al. Analysis of Human Co-exposure to Lead and Cadmium Using Human Biomonitoring (HBM) Data in a Bayesian Copula-Based Regression Framework. Expo Health 16, 503–516 (2024). https://doi.org/10.1007/s12403-023-00573-w

Received: 01 March 2022
Revised: 12 May 2023
Accepted: 22 May 2023
Published: 07 June 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s12403-023-00573-w
