Abstract
Radon is a noble gas that occurs naturally as a decay product of uranium. Aside from smoking, radon is considered to be one of the major causes of lung cancer. Indoor environments, where radon can accumulate and potentially reach high concentrations, are of a particular concern. A mixed effects additive model along with a data-driven cross validation model selection method is applied to model the mean indoor radon concentration of dwellings in Austria. For this model a prediction approach is introduced, which enables the mapping of indoor radon potential to identify radon areas in Austria. The data used for modeling was collected in monitoring campaigns for private dwellings in Austria from 2013 to 2019. The proposed method allows policy makers to identify regions with high indoor radon concentrations and enables them to meet regulatory requirements or prioritize radon protection measures. The currently published Austrian radon map and the delineation of radon areas in Austria is based on this proposed method.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Radon, 222Rn, is a radioactive noble gas which occurs naturally as a byproduct of uranium decay. Lacking color and smell, radon can only be detected by measurement devices and is the main contributor to natural radiation. Due to its potential to cause lung cancer and possibly other diseases (Darby et al. 2005; Rericha et al. 2006; Smith et al. 2007; Bølviken et al. 2003) monitoring campaigns and surveys have been promoted all over the world to assess the risk and identify areas prone to high radon concentrations. Identifying such areas, as well as understanding potential risk factors is essential in dealing with prevention and mitigation of radon exposure.
Typically areas with higher radon potential are called radon prone areas, radon priority areas, radon areas, or radon protection areas. In this paper, those areas are named radon prone. Some countries define a radon-prone area as an area with a high probability of finding high indoor radon concentrations (IRC). Another method is to define a radon-prone area that has an average IRC exceeding a certain radon concentration levels. Austria followed the approach to define radon areas based on average radon concentrations for administrative areas (municipalities). The average radon concentrations can be compared to reference levels, which is in line with 2013/59/Euratom (2014). The majority of methods to identify such regions are based on adapted cluster detection methods (Sarra et al. 2016). More recent work focused on prediction with a quantile regression and utilizing splines (Borgoni et al. 2019), which is more in line with this work.
Besides geogenic factors such as uranium content and permeability of the soil, building characteristics and living habits also influence IRC. Studies on the effects of building factors and geology on measured IRC have been conducted before (Buchli and Burkart 1989; Gates and Gundersen 1992; Gunby et al. 1993; Price et al. 1996; Apte et al. 1999; Kemski et al. 2009; Borgoni 2011; Cinelli et al. 2011, and others). To account for these known effects, this paper aims to model average radon concentrations as a function of building factors as well as underlying geogenic information.
This paper roughly extends the methodology that has been introduced in Borgoni et al. (2014) by adopting a linear additive mixed effects regression approach with the main goal of prediction. Building factors, geology, and other explanatory variables, that were collected in the extensive Austrian ÖNRAP 2 survey (Gruber et al. 2021b), play an integral part along with a flexible spatial spline in predicting radon levels. Multilevel models that incorporate random effects have been used in the past for modeling radon concentrations, see for example Apte et al. (1999), Price et al. (1996), and Gelman (2006) to name a few, and account for the hierarchical structure of the data. The methodology presented in this paper has been tested in a European exercise on a data set for a region in Spain and compared to other modeling approaches by the same authors Gruber et al. (2021a).
The previous Austrian radon map was based on the data and methodology of the first Austrian radon programme (ÖNRAP 1) (Friedmann, 2005). Compared to the method of the previous Austrian radon map, the proposed additive mixed model enables prediction of the mean radon concentration of a dwelling at any coordinates while specifying building characteristics and other variables. The currently published Austrian radon map is based on the predictions of the proposed model.
The next sections are organized as follows. Section 2 introduces the data. Section 3 contains the methodology for modeling, model selection, and prediction. The results of Sect. 3 are outlined in Sect. 4. Finally, some concluding remarks and summary are provided in Sect. 6.
2 Data description
The second national Austrian radon survey (ÖNRAP 2, Gruber et al. 2021b) is used as the data basis for IRC. The characteristics of the data and the survey design from data collection to data wrangling along with descriptive statistics have already been described in Gruber et al. (2021b). In the following paragraphs only a short summary of the data is presented.
The ÖNRAP 2 data set includes about 50000 measurements of IRC of around 27000 dwellings. In each dwelling two measurements in two different rooms were intended. The measurements were carried out with passive track etch (CR39) detectors. The radon concentrations are the means over the measurement periods in Becquerel per cubic meter (\(Bq/m^{3}\)). The measurement period was about six months, where half of the measurements were carried out in the winter and the other half in the summer. This accounts for seasonal variations of indoor radon concentrations. The sampling distribution of the IRC measurements as well as log IRC measurements are shown in Fig. 1. The left histogram shows the highly skewed IRC measurements and visual inspection of the right histogram roughly confirms an approximate log-normal distribution, which was also found by other authors such as Nero et al. (1986, 1990).
In addition to the indoor radon concentration (IRC), building characteristics of the measured dwellings and rooms were collected in a questionnaire. The data has been geo-referenced and the coordinates were extracted using the addresses of the dwellings. The geo-referenced data was used to link the position of the measurements to the geological map of Austria (Geologische Bundesanstalt (GBA), 2020).
Table 1 summarizes the data set, where for each parameter the variable type, typical values, and the source of the parameter are described. In the model selection in subsequent sections, variables are selected from this table. Some of these variables are considered as particularly important for the variability of indoor radon and will be included by default in the modeling process as so-called ’default starters’. Other variables are called ’contenders’ and their importance is tested. The purpose of the variables for modeling is listed in the column ’model’ in Table 1, where ’dep’ stands for dependent variable and ’default’ and ’cont’ for default starters and contenders respectively. The following gives an overview of the relevance of the variables regarding IRC.
The coordinates of the dwellings are a proxy for all natural and artificial spatial dependent features. In context of radon this might include geogenic factors (uranium content, soil permeability), weather and climate conditions and building types. The main source of indoor radon is the soil gas. Geogenic factors determine the available amount of radon in the soil. The construction of a building determines how much radon can move from the soil inside the building and the ventilation rate of fresh air into the building. Weather and climate conditions also correlate with ventilation rates of fresh air into the building and heating habits.
Concerning geology, different types of bedrock show different uranium concentrations and permeability and, therefore, might describe variations of the indoor radon concentrations. The geological units in low resolution are used as described in Table 1. This ensures a high amount of samples within each geological unit. As geology is a spatial feature, the coordinates already cover a part of the information.
Typically, rooms that are in direct contact with the soil show higher radon concentrations than rooms that are not. This is also true regarding the floor level, where lower radon concentrations are usually found in rooms on higher floors. As for the year of construction of a dwelling, older buildings are usually more prone to higher radon concentrations than younger buildings, which might be due to improvements in construction technique and material fatigue. An unoccupied basement may serve as a buffer and typically reduces the radon concentrations in higher occupied floors.
Thermal retro fitting is also worth observing due to the fact that dwellings with air tight building envelopes may show higher indoor radon concentrations due to a lack of ventilation of fresh air. Finally, dwellings mainly built with stone walls typically show higher radon concentrations than dwellings mainly built with concrete walls. This may also correlate with the year of construction. Typically IRC is higher in winter months. Thus the proportion of time the dwelling is measured in winter months is also observed and included as ’proportion measurement time’ in Table 1.
Additional data used for prediction purposes in Sect. 3.3 are a national 250 by 250 m wide grid layer (Statistik Austria 2020b), the settlement area of Austria (Statistik Austria 2020a) and the geo-referenced municipalities of Austria.
The data went through a significant cleaning and preparation process before modeling. The majority of this work dealt with corrections of answers in the questionnaire. It was not an exception to find unclear answers that had to be corrected and simplified where possible. Several questions had the answer choice of ’unknown’ and it was quite common to see combinations of known and unknown answers for different rooms in the same dwelling. Plausibility checks and rules were implemented to deal with these cases. A simple example of a plausibility check for the answer of room earth bound is checking the floor level and room type. Corrective actions can be set if these answers do not agree, for example if the the room is on the second floor, but it was indicated in the questionnaire that the room was earth bound. Similar checks can be implemented for the other explanatory variables obtained from the questionaire. Other data cleaning procedures were converting binary variables with original answers of ’yes’ and ’no’ to indicator variables or checking the plausibility of numeric variables. For the final data only complete cases regarding the variables were used which leads to 47010 observations.
3 Methodology
In this section three major issues related to the modeling methodology with generalized additive mixed models are presented: (1) Definition of the generalized additive mixed model in Sect. 3.1, (2) method to identify influential explanatory variables for a final model in Sect. 3.2, and (3) predicting radon-prone areas and assessing uncertainty of predictions in Sects. 3.3 and 3.4 respectively. The latter issue may serve as a useful tool for categorizing municipalities in radon classes by using the aggregated prediction from (3) and comparing its uncertainty with predefined class limits. Issues (1) and (2) allow the identification of a set of influential explanatory variables that may affect mean IRC.
3.1 A generalized additive mixed model for mean log IRC
Generalized additive models extend the class of generalized linear models to include penalized splines or smoothing functions, which are added in an additive nature in the model and depend on explanatory variables (Wood 2017). This flexibility that can be obtained from additive modeling allows to consider spatial dependencies, which come from the specific locations of the monitored dwellings as well as the geological typology.
Mixed models contain fixed as well as random effects, and in combination with a spline term these models are used to analyze complex and hierarchical data structures. For each dwelling that was part of the monitoring campaigns, measurements from two different rooms were taken. Measurements that come from the same dwelling may be dependent for different reasons such as building characteristics, the number of inhabitants, or the tendency of inhabitants to open windows to let in fresh air. To account for this dependency within a particular dwelling, a random effect for the dwelling is included in the additive model.
The dependent variable indoor radon concentration (IRC) is transformed with the natural logarithm, similarly as in Borgoni et al. (2014). The additive model in subject level notation for the dwelling j is as follows:
where log(.) is the element-wise logarithmic function, \(\pmb {X}_{j}\) is a matrix of dimension \(n_{j} \times p\) of p explanatory variables for \(n_{j}\) measurements in dwelling j, such as building factors, geology, number of inhabitants, and duration of measurement, \(\pmb {\beta }\) is the \(p \times 1\) vector of fixed effects, \(\pmb {Z}_{j}\) is a matrix of dimension \(n_{j} \times q\) of indicator variables for dwellings, \(\pmb {u}_{j}\) is a \(q \times 1\) vector of random effects representing dwellings, \(\pmb {X}_{sp, j}\) is a \(n_{j} \times s\) matrix containing the terms required for the spline construction based on the coordinates of the dwellings, \(\pmb {\beta }_{sp}\) is a \(s \times 1\) vector of spline coefficients, \(\pmb {I}_{j}\) is an \(n_{j} \times n_{j}\) identity matrix, and \(\pmb {\epsilon }_{j}\) is a \(n_{j} \times 1\) vector of errors. The number of knots of the spline is represented by s, which can be pictured as points that connect piece-wise polynomial functions to generate a spline. A typical value for \(n_{j}\) is two, or two rooms per dwelling, but other scenarios with just one room (\(n_{j} = 1\) for dwelling j) are also possible. As there is only one random effect in the applied model, q takes the value one.
The purpose of the spline term is to account for the spatial dependency of the measurements and serves as an additional intercept term next to the intercept of fixed effects, \(\beta _{0}\) of \(\pmb {\beta }\), for a base level of indoor radon concentration. As noted in Borgoni et al. (2019), radon dynamics show patterns that are typically not linear. The inclusion of a flexible spline allows nonlinear modeling of mean IRC as a function of the coordinates of the measurement locations and can thus account for spatial local effects. More specifically, the spline is of the class thin plate regression splines. This class of splines is discussed in detail for generalized additive mixed models in Wood (2003), as well as in Pratesi et al. (2009) and Ruppert et al. (2003). What will be mentioned here is that the main reason for choosing this spline was its ability to represent smooths of more than one explanatory variable and that it is somewhat of an ideal smoother when it comes to smoothing functions of several variables (Wood 2003).
The random effects term \(\pmb {u}_{j}\) represents the effect of dwelling j. It is assumed that the effect of the dwelling is normally distributed with expectation \({\textbf {0}}\) and variance \(\sigma ^{2}_{d}\). The remaining variation of the log indoor radon concentration that cannot be explained by the fixed effects, the spline, or the random effects for the dwelling, is represented by the error term \(\pmb {\epsilon }_{j}\) that is normally distributed with expectation \({\textbf {0}}\) and variance \(\sigma ^{2}\pmb {I}_{j}\) as in Eq. 2. The covariance matrix of \(log({\textbf {IRC}}_{j})\) can be derived from the definition of the model in Eq. 1 and takes the following form in matrix notation
where \(\pmb {J}_{j}\) a square matrix of ones with dimension \(n_{j} \times n_{j}\). The diagonal elements \(\sigma _{d}^{2} + \sigma ^{2}\) represent the variance of log IRC. Within dwelling j, the rooms have a positive covariance \(\sigma ^{2}_{d}\) at the off-diagonal entries in Eq. 3, which is assuming that rooms in dwellings have similar log IRC characteristics.
Usually the penalization of the spline requires the estimation of smoothing parameters commonly accomplished by cross-validation methods. However, the mixed model allows a representation of the spline as a random effect and the smoothing parameter can thus be estimated with mixed model methodology (Wood 2017). In particular, after a suitable Bayesian reformulation of the smoothing term in terms of random effects, the model in Eq. 1 can be fitted by maximizing the restricted maximum likelihood (REML) in a way that is well known for linear mixed models. The REML algorithm to fit the model in 1 is implemented in the R software (R Core Team 2019) in the package gamm4 written by Wood and Scheipl (2017).
3.2 Model selection
The model described in Sect. 3.1 should only contain variables, that contribute to the prediction accuracy of mean log IRC and have a significant effect. The classical model selection methods such as step-wise forward, backward, and best subset selection based on information criteria or statistical significance of estimated coefficients are not chosen due to their problems, which range from biased \(R^{2}\) values and p-values that are too small to residual confounding (Harrell 2015).
To offset some of the problems, the model selection chosen is based on cross validation and prediction error, see for example Hastie et al. (2001) for a comprehensive treatment, thereby deviating from the classical procedures to a more data-driven and predictive approach. The step-wise forward foundation in combination with a 50 times repeated stratified 5-fold cross validation is used to identify influential explanatory variables. The conventional k-folds cross validation splits the data set into k separate subsets, where \(k-1\) subsets act as the training set, which are used for model fitting, and the fifth set is used as test data. The procedure is repeated k times with test and training data chosen in different order, and overall test and training errors can be obtained.
The difference of the cross validation method used here, compared to conventional k-fold cross validation methods, is that the validation is stratified according to dwelling unit, district, and geological type within a district. Stratification results in a better coverage of the whole area in each step of the validation and also ensures that observations from a particular dwelling fall in the same fold. In each forward step, the cross validation is repeated 50 times, which in general guarantees more numerical stability and precision (Kim 2009; Molinaro 2005). The mean absolute error is used as the error measure for the validation. This procedure ensures that along with the default starters, outlined in in column ’Model’ in Table 1, only influential variables are included in the model.
As a first step, before validation of the explanatory variables, an optimal estimate of the basis dimension s of the thin plate spline is found by the same cross validation method, except that it was repeated 10 times. The search for an optimal s is run with a null model, a model that contains only the default explanatory variables, due to computing resources constraints. Values from 10 to 400 are chosen as a suitable range for s to be tested. An optimal s is chosen as the minimal number so that no significant improvement in the cross validation test error can be seen.
The validation of the explanatory variables is then started with the same null model that has been used in the validation of the spline dimension above, but now using the cross validated s for spline construction. In each step a variable from the pool of contenders, see again column ’Model’ in Table 1, is selected and tested in the 50 times repeated stratified 5-fold cross validation. Such a variable contributes to the improvement of the model, if the 75th percentile of the 50 cross validation test errors of the model containing the new variable is less than the 25th quantile of the 50 cross validation test errors of the currently best model. Out of all variables from the contender pool that improve the model according to the above criteria, the one with the smallest mean test error is added to the model.
3.3 Prediction
Using the Austrian definition of a radon-prone area, it is possible to directly model the means conditional on the building factors, geology, and coordinates, as in the model in Sect. 3.1. The final model obtained by the cross validation based model selection can be used for prediction. In particular, it is only required to plug in values for the building characteristics and other explanatory variables in the design matrix \(\pmb {X}_{j}\) along with the coordinates of the dwellings in the spline matrix \(\pmb {X}_{sp, j}\). The predictions on the log scale then are
where the index j is dropped indicating that the predictions are stacked row-wise for a certain number of dwellings in a particular region, such as a municipality. Transforming back to the original scale, \(Bq/m^3\), and considering the properties of the log-normal distribution, i.e. \(E(Y) = exp(\mu + \sigma ^{2}/2)\) for \(log(Y)\sim N(\mu , \sigma ^{2})\) (Shao 2003), the predicted values for m grid cells and thus m dwellings in a municipality are
The random effect term for the dwelling in Eq. 1 contributes to the prediction only as a variance component. This is reasonable because there is no interest in any individual dwelling. Transforming back to the original scale in this way has also been mentioned in Borgoni et al. (2014). Thus, a dwelling with particular characteristics can be predicted at any location. Finally, the average prediction of a municipality is
where \(\pmb {1}_{m}\) is a column vector of length m, and t indicates matrix transposition. It has to be noted that two other types of transformations back to the original scale are possible and have been used for the development of an Austrian indoor radon map. Coming back to the properties of the log-normal distribution, it is also well known that \(exp(\pmb {\hat{\tau }})\) represents a median prediction, for example see again Shao (2003). A geometric mean prediction can be obtained by first averaging the predictions on the log scale in Eq. 4 and transforming back as a last step.
3.4 Uncertainty of predictions
To assess the uncertainty of the predicted average radon concentration, approximate asymptotic confidence intervals can be calculated for each municipality. A multivariate weak convergence theorem for continuously differentiable functions of estimators which are asymptotically normal and the Cramer-Wold Device, see for example theorem 1.9 (iii) and 1.12 of chapter 1 in Shao (2003), can be used to derive variances of the municipality predictor in Eq. 6. Applicability of the theorem is further justified by the fact that the fixed effect estimators \(\pmb {\hat{\beta }}\) and \(\pmb {\hat{\beta }}_{sp}\) of the additive mixed model in Eq. 1 are asymptotically normally distributed according to Wood (2017). The variance parameters in the exponential function in Eq. 5 are treated as known. Applying these results, it is a matter of calculus and matrix algebra to calculate an \(m\times (p+s)\) matrix \(\pmb {D}\) of first derivatives with respect to \(\pmb {\beta }\) and \(\pmb {\beta }_{sp}\) of \(\pmb {\hat{y}}\) in Eq. 5, and along with the \((p+s)\times (p+s)\) covariance matrix \(\pmb {V_{\beta }}\) of the fixed effects estimators, the asymptotic distribution of the municipality predictor is
which can be used to calculate confidence intervals. As for the other types of predictions briefly mentioned in Sect. 3.3, the result in Eq. 7 can still be used by leaving out the variance components in the exponent for the median predictor, and confidence intervals for the geometric mean prediction can be obtained by transforming the endpoints of the confidence intervals on the logarithmic scale. The confidence intervals can be a useful tool to classify municipalities into radon risk regions and assess the uncertainty of the municipality predictions.
4 Results
In this section the methodology is applied to the indoor radon data described in Sect. 2. The structure of the subsections roughly follows Sect. 3. Sections 4.1 and 4.2 contain the results of the model selection using the stratified step-wise forward k-fold cross validation and the final model with all influential explanatory variables along with an interpretation and discussion of the estimates. Predictions of grid cell midpoints and municipalities are shown in Sect. 4.3.
4.1 Selecting a best model
The validation of the basis dimension s of the spline is shown in Fig. 2, which marks an optimal s of 300 with a dashed vertical line. The difference in cross validation test error by increasing s is found to be negligible for values above 300. Figure,3 shows the very last step of the model selection procedure. Variables to the left of the dashed vertical line were selected for the final model. Here the variable type foundation was found to be a significant improvement on top of window condition in the model. The stopping criterion, described above, is marked by the gray horizontal bar. Inclusion of further contender explanatory variables does not further improve the model, and the selection is halted after six steps. The results of intermediary steps are available upon request.
Last step of the cross validation (CV) procedure: cross validation errors in terms of mean absolute error (MAE) are shown for each explanatory variable tested in the model (red: test data, blue: training data, to the left of dashed vertical line: variables that enter the final model, gray horizonal bar: stopping criterion)
4.2 The final model and interpretation of effects
After completion of the cross validation, the influential explanatory variables are identified. Observations which are complete regarding the influential variables but have been removed from the cross validation data set because of missing entries, are added to the final data set. This increases the data set for a final model fit from 47010 to 47344 observations. The added observations were checked for systematic distributional deviations before inclusion, but none were found. Table 2 contains the estimated fixed effects coefficients for \(\pmb {\beta }\) from the final model. The estimated coefficients for \(\pmb {\beta }_{sp}\), which are required for the spline construction, are not shown because a lack of direct relevance. It is important to note that the building factors were obtained by a questionnaire as mentioned in Sect. 2, and it has to be assumed that not all answers are known with certainty by participants.
Care must be taken when interpreting and analyzing the coefficients in Table 2. The usual interpretation of regression coefficients can only be made on the log scale of IRC. Interpreting coefficients on the original \(Bq/m^{3}\) scale requires a transformation with the exponential function, i.e. \(exp(\beta _{k})\), of the coefficients. This allows an interpretation as multiplicative effects. In what follows, interpretation focuses on the logarithmic indoor radon scale. The reference levels of the categorical variables, which are contained in the intercept term, are shown in the respective rows in parentheses in Table 2.
Some of the estimated coefficients take values as expected and have already been reported by other authors (e.g. Gruber et al. 2021a; Borgoni et al. 2019). The ground is the main source of indoor radon and, therefore, the distance to the ground is a proxy for IRC. In the model this fact is covered by the three explanatory variables room earth bound, floor, and basement. Rooms with direct contact to the underground (earth bound) show 0.3155 higher log IRC predictions. This is the largest effect among all explanatory variables that relate to building characteristics.
The estimated effects of year of construction, main material of the walls, type of building, thermal retro fitting, type of construction, airtight windows, and geology in Table 2 largely follow expected patterns that are already well known in the radon community and confirm findings of Demoury et al. (2013), Kropat et al. (2014), Sundal et al. (2004), Collignan et al. (2016), Jiranek and Kacmarikova (2014), Pampuri et al. (2018), Yang et al. (2019), Hahn et al. (2015), Sundal et al. (2004), Tondeur et al. (2014), Gruber et al. (2021b) and Bossew et al. (2008).
Contrary to the intuition that a foundation blocks radon migration into a building, foundation partly, which is contained in the intercept term, has the weakest effect. The difference of foundation fully, which has an estimated coefficient of 0.042, and foundation partly is small, but this could indicate again that participants had problems filling out the questionnaire regarding foundation as a building characteristic.
The effect of the number of persons older than 14 years is also interesting and may relate directly to ventilation and its proxies. Its weak negative effect is statistically significant and predicted mean log IRC decreases by 0.039 per one person increase. This is reasonable because the number of persons living in a dwelling generally correlates with the frequency of opening doors and windows. The estimated coefficient of proportion measurement time, indicates that the longer the measurement duration in winter months, the higher the predicted mean IRC. This variable has the largest effect on predicted mean log IRC and the expected increase is statistically significant.
The geological units used in the final model can be divided into a higher and lower group of log IRC. Bohemian Massif (reference level), Quaternary, and Tertiary Basins belong to the group with higher IRC levels. Only the Bohemian Massif can be linked directly to high radon concentrations due to its composition of granitoid rocks which typically contain high uranium concentrations. The other estimated effects of geological units in the group with higher IRC levels are not easy to interpret. This is particularly true for Tertiary Basin, which usually shows low uranium concentration and lower permeability due to sealing layers of clay that are typical for this unit. It is worth noting that compared to the other geological units, the number of measurements within Tertiary Basin, Quaternary and Austroalpine units is high and may therefore also feature many different building characteristics. This may indicate that the resolution of some geological units is too low and may not add a lot of information to the model. The spatial spline incorporates the position of the measurements and thus also includes information pertaining to geological units.
It is worth emphasizing again, that the effects as seen in Table 2, were not primarily chosen based on their statistical significance, but because of their contribution to the prediction error in the cross validation procedure in the previous section. This circumvented model building based on statistical significance, or p-values, and affords a more data-driven approach. The result is a final model that can be used for predicting average radon concentrations based on coordinates of a dwelling and building factors as well as investigating the estimated effects. The inclusion of explanatory variables allows an easy interpretation and serves as a tool for prediction based on simple building and geology factors. Complex interactions among the various explanatory variables as well as the spline were not investigated due to time and computation resource constraints.
The estimates of the variance component of the random effect, \(\sigma _{d}^{2}\), and the error variance, \(\sigma ^{2}\), take the values 0.386 and 0.118, respectively. The correlation between two rooms in a dwelling is thus 0.770, which can be easily derived from the variance structure in Eq. 3. This positive correlation shows that higher log IRC values in one room tend to be accompanied by higher values in the other room. These estimated variances are not investigated further and only play a role as variance components in the REML estimators \(\pmb {\hat{\beta }}\) and \(\pmb {\hat{\beta }}_{sp}\) as well as transformation of predictions in subsequent Sect. 3.3.
The estimated spatial spline can be visualized for a fixed combination of building and geology factors as a three-dimensional surface, where the coordinates and predictions are shown on the respective axes. A contour of the spline is visible in the next subsection in Fig. 6 in terms of predicted grid cell midpoints. The fixed effects in Table 2 thus cause vertical shifts of the spline surface but have no influence on the spline curvature. This is a simple approach with strong assumptions, but serves its original purpose, where it was intended to subtract the effects of the building factors from the model for more precise spatial prediction on basis of simple predefined dwelling characteristics.
Model diagnostics checking was performed mainly on basis of standardized residuals, as suggested by Verbeke and Molenberghs (2000), Brown and Prescott (1999) for mixed models, and inspecting their sample distribution. A histogram with a density plot along with a quantile to quantile plot can be found in Fig. 4. It can be verified that the residuals reflect the postulated normal distribution reasonably well for log IRC.
4.3 Predicting radon areas
As for the exact specification of the design matrix \(\pmb {X}\) in Eq. 4 in Sect. 3.3, a reference dwelling was selected that is a good representation of an average Austrian dwelling found in all federal states. A most common dwelling can be found by looking at the frequency of building factor levels, which was used as a basis for the reference dwelling. Table 3 lists the characteristics of the reference dwelling along with the most frequent dwelling obtained from the questionnaire in Austria. Major differences among the dwellings are direct contact to the ground, the building year, and the type of cellar. The reference dwelling represents more modern houses in Austria that were build later than 2000 and do not have a cellar. Since the geology of Austria is known for each coordinate, the geological factor is taken as is at the particular location of prediction.
The grid cells mentioned in Sect. 2 were taken as the basis for prediction. In particular, the mean IRC is predicted for the mid points of the grid cells. As a rule, only cells that have at least 25% of its area in a municipality are used, which are shown as gold cells in Fig. 5. Furthermore, cells have to cover an area that is considered to be habitable. Highly alpine regions, where no habitation is possible or no human infrastructure is available are thereby excluded. A map of the federal state of Upper Austria with all grid cells satisfying these requirements and with transformed predictions as in Eq. 5 can be seen in Fig. 6.
Figure 7 shows the municipalities in Austria ordered from lowest to highest prediction. The horizontal lines at 100 and 300 \(Bq/m^3\) indicate potential limits of radon risk categories, which can be compared to the upper and lower limits of the confidence intervals and categorize a municipality as a certain radon area. It can be seen that some municipalities have broader confidence intervals than others. The different widths of the confidence intervals mainly come from the variability of estimated geology effects, the spatial spline, and the amount of grid cells that cover the municipality. The radon expert as well as policy makers can thus make use of the point estimator along with its variability.
A map of Austria in terms of predicted municipalities, using the estimator in Eq. 6, is shown in Fig. 8. The patterns that are visible in Fig. 6 are still prominent on the municipality level. Regions of higher radon concentrations can be clearly identified. The primary reason for the municipality estimator and thus the map in Fig. 8 is for administrative reasons, as the practical implementation of the relevant radon regulation needs to be done on the municipality level.
Municipalities that are covered by few grid cells, i.e. coverage requirements are not met, are thus only predicted by those few grid cells. Even though a prediction over the whole area by interpolation is possible with the additive mixed model, this way was chosen for the current map. The map of Austria has been published in 2019 and is available for public use (Austrian Agency for Food and Health Safety (AGES) 2021).
2095 predicted municipalities in Austria by averaging the transformed predicted grid cell midpoints in Eq. 6
5 Discussion
Reviewing the estimated coefficients of the final model, it is evident that most of the building factors are statistically significant while decreasing the prediction error in the cross validation. The building factor room earth bound is highly significant in this model. This confirms that rooms which are in direct contact with the ground have a higher indoor radon concentration. Other effects such as the year of construction, floor level, type of building, air tight windows, or the main material for walls largely behave as predicted.
The effects of the particlular levels of foundation or thermal retro fitting may reflect that these variables were often unknown by the participants. This could be a potential drawback of collecting information about building factors by questionnaire and may lead to unwanted uncertainty, due to the fact that the average person is not an expert on building characteristics. It has to be noted that the final model is a simplification and does not contain complex interactions that may take place in reality. This lack of interactions requires careful interpretation of the estimated coefficients.
Certainly indoor radon gas accumulation is influenced and dependent on more factors than those that were collected in the questionnaire. Adding more explanatory variables than collected in the questionnaire may further improve the prediction error. Other authors on indoor radon, such as Kemski et al. (2009) and Rowe et al. (2002), mention the potential of soil radon measurements, radiometric data, and weather-related variables as possible explanatory variables for IRC. The proposed methodology allows a straight-forward inclusion of further explanatory variables, as was shown in the Metro-Radon comparison study by Gruber et al. (2021a).
The reference dwelling plays a significant role in the prediction approach introduced in this paper. A standard situation for a dwelling, or reference dwelling as it is called in this paper, has been used before by Friedmann (2005) for the previous Austrian radon map. The purpose of the reference dwelling here is identical and this predefined dwelling is intended to represent a common class of residential dwellings in Austria. The settings of the reference dwelling in Table 3 are directly comparable to the settings in Friedmann (2005). This is also in accordance with the requirements of the Austrian radon map, where the focus is on residential dwellings.
The most important feature of the generalized additive mixed model is that the dependence on the actual measurements in order to generate a map is bypassed. This implies that radon risk, based on the reference dwelling, can be predicted at points where measurements from the campaign are not available. Compared to the previous method, this is a significant advantage and the main reason for using this approach for the second Austrian radon map.
6 Conclusions
In this paper an additive mixed model is introduced to predict a reference dwelling on grid cell midpoints on basis of influential building factors and geological and spatial information in order to generate a radon map of Austria. The additive model that incorporates highly non-linear spatial effects can be estimated within the restricted maximum likelihood framework and is applied to the data of the measurement campaigns carried out by the Austrian Agency for Health and Food Safety (AGES) within ÖNRAP 2 (Gruber et al. 2021b).
The additive model thus allows an investigation of the effects of building factors and geology next to estimating spatial effects of the measurement coordinates. The fact that at most two measurements were taken from dwellings is incorporated as a random effect for dwellings. All building characteristics from the questionnaire, as well as more information such as measurement duration, were tested in the models in a specific order of contender and default starter variables and their inclusion assessed by a repeated stratified 5-fold cross validation. With this data-driven approach, a high prediction quality can be ensured.
The findings about the effects of the building factors in the model not only provide useful insights into the characteristics of buildings in Austria, but prove to be a fundamental component of prediction. With these effects a representative standard dwelling can be defined. Along with the estimated spline in the generalized additive mixed model, which is a function of the coordinates, a standard dwelling can be predicted at grid cell midpoints that cover a municipality.
By averaging these predicted reference dwellings for all grid cell midpoints in a municipality, an overall average prediction along with a measure of uncertainty can be obtained and compared to known indoor radon reference levels. Local authorities can use these results to implement preventive indoor radon risk measures as well as to assess the current risk. The mapping method outlined in this paper has successfully been implemented in the official Austrian indoor radon map and was the basis for the delineation of radon areas in Austria. The results are published for public use by AGES.
References
2013/59/Euratom (2014) Council directive 2013/59/euratom of 5 december 2013 laying down basic safety standards for protection against dangers arising from exposure to ionising radiation, and repealing directives 89/618/euratom, 90/641/euratom, 96/29/euratom, 97/43/euratom. Off. J. Eur. Union, 1-73
Apte M, Price P, Nero A, Revzan K (1999) Predicting new hampshire indoor radon concentrations from geologic information and other covariates. Environ Geol 37:181–194
Austrian Agency for Food and Health Safety (AGES) (2021) Austrian Interactive Radon Map. https://geogis.ages.at/GEOGIS RADON.html. Accessed 12 December 2022
Bølviken B, Celius R, Nilsen R, Strand T (2003) Radon: a possible risk factor in multiple sclerosis. Neuroepidemiology 22:87–94
Borgoni R (2011) Quantile regression appriach to evaluate factors influencing residential indoor radon concentration. Environ Model Assess 16(3):239–250
Borgoni R, Francesco D, de Bartolo D, Tzavidis N (2014) Hierarchical modeling of indoor radon concentration: how much do geology and building factors matter? J Environ Radioact 138:227–237. https://doi.org/10.1016/j.jenvrad.2014.08.022
Borgoni R, Carcagni A, Salvati N, Schmid T (2019) Analysing radon accumulation in the home by flexible m-quantile mixed effect regression. Stoch Environ Res Risk Assess 33:375–394. https://doi.org/10.1007/s00477-018-01643-1
Bossew P, Dubois G, Tollefsen T (2008) Investigations on indoor radon in Austria, part 2: geological classes as categorical external drift for spatial modelling of the radon potential. J Environ Radioact 99:81–97
Brown H, Prescott R (1999) Applied mixed models in medicine. John Wiley and Sons, New York
Buchli R, Burkart W (1989) Influence of subsoil geology and construction technique on indoor air 222rn levels in 80 houses of the central Swiss Alps. Health Phys 56:423–429
Cinelli G, Tondeur F, Dehandschutter B (2011) Development of indoor radon risk map of the Walloon region of Belgium, integrating geological information. Environ Earth Sci 62:809–819
Collignan B, Le Ponner E, Mandin C (2016) Relationships between indoor radon concentrations, thermal retrofit and dwelling characteristics. J Environ Radioact 165:124–130
Darby S, Hill D, Auvinen A, Barros-Dios J, Baysson J, Bochicchio F, Doll R (2005) Radon in homes and risk of lung cancer: collaborative analysis of individual data from 13 European case-control studies. Br Med J 330(6485):223–226
Demoury C, Ielsch G, Hemon D, Laurent O, Laurier D, Clavel J, Guillevic J (2013) A statistical evaluation of the influence of housing characteristics and geogenic radon potential on indoor radon concentrations in France. J Environ Radioact 126:216–225
Friedmann H (2005) Final results of the Austrian radon project. Health Phys 89(4):339–348
Gates A, Gundersen L (1992) Geologic controls on radon. Geological Society of America, Washington, p 271
Gelman A (2006) Multilevel (hierarchical) modeling: what it can and can’t do. Technometrics 48:432–435
Geologische Bundesanstalt (GBA) (2020) Geological map of Austria, 1:500 000, Data Set KM500 Austria. https://www.geologie.ac.at/en/research-development/mapping/geology. Accessed 11 April 2023
Gruber V, Baumann S, Alber O, Laubichler C, Bossew P, Peterman E, Quindos Poncela L (2021a) Comparison of radon mapping methods for the delineation of radon priority areas—an exercise. J Eur Radon Assoc. https://doi.org/10.35815/radon.v2.5755
Gruber V, Baumann S, Wurm G, Ringer W, Alber O (2021b) The new Austrian indoor radon survey (ÖNRAP2, 2013–2019): design, implementation, results. J Environ Radioact 233:106618
Gunby J, Darby S, Miles J, Green B, Cox D (1993) Factors affecting indoor radon concentrations in the United Kingdom. Health Phys 64:2–12
Hahn E, Gokuna Y, Andrews W Jr, Overfield B, Robertson H, Wiggins A, Rayens M (2015) Radon potential, geologic formations, and lung cancer risk. Prev Med Rep 2:342–346
Harrell FE (2015) Regression modeling strategies. Springer- Verlag, New York
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer, Berlin. https://doi.org/10.1007/978-0-387-21606-5_7
Jiranek M, Kacmarikova V (2014) Dealing with the increased radon concentration in thermally retrofitted buildings. Radiat Prot Dosim 160:43–47
Kemski J, Klingel R, Siehl A, Valdivia-Manchego M (2009) From radon hazard to risk prediction-based on geological maps, soil, gas and indoor measurements in Germany. Environ Geol 56:1269–1279
Kim J (2009) Estimating classification error rate: repeated cross-validation, repeated hold-out and bootstrap. Comput Stat Data Anal 53:3735–3745
Kropat G, Bochud F, Jaboyedoff M, Laedermann J, Murith C, Palacios M, Baechler S (2014) Major influencing factors of indoor radon concentrations in Switzerland. J Environ Radioact 129:7–22
Molinaro A (2005) Prediction error estimation: a comparison of resampling methods. Bioinformatics 21(15):3301–3307
Nero A, Schwehr M, Nazaroff W, Revzan K (1986) Distribution of airborne radon-222 concentrations in U.S. homes. Science 234:992–997
Nero A, Gadgil A, Nazaroff W, Revzan K (1990) Indoor radon and decay products: concentrations, causes, and control strategies. Lawrence Berkeley National Laboratory, Berkeley
Pampuri L, Caputo P, Valsangiacomo C (2018) Effects of buildings’ refurbischment on indoor air quality. Results of a wide survey in radon concentrations before and after engergy retrofit interventions. Sustain Cities Soc 42:100–106
Pratesi M, Ranalli M, Salvati N (2009) Nonparametric m-quantile regression using penalized splines. J Nonparametr Stat 21:287–304
Price P, Nero A, Gelman A (1996) Bayesian prediction of mean indoor radon concentrations for Minnestoa counties. Health Phys 71:922–936
R Core Team (2019) R: a language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from https://www.R-project.org/
Rericha V, Kulich M, Rericha R, Shore D, Sandler D (2006) Incidence of leukemia, lymphoma, and multiple myeloma in Czech uranium miners: as case-cohort study. Environ Health Perspect 114:818–822
Rowe J, Kelly M, Price L (2002) Weather system scale variation in radon-222 concentration of indoor air. Sci Total Environ 284:157–166
Ruppert D, Wand M, Carroll R (2003) Semiparametric regression. Cambridge University Press, Cambridge
Sarra A, Fontanella L, Ippolit L, Valentini P, Palermi S (2016) Quantile regression and Bayesian cluster detection to identify radon prone areas. J Environ Radiact 164:354–364
Shao J (2003) Mathematical statistics. Springer-Verlag, New York
Smith B, Zhang L, Field R (2007) Iowa radon leukemia study: a hierarchical population risk model. Stat Med 10:4619–4642
Statistik Austria (2020a) Dauersiedlungsraum. https://data.statistik.gv.at/web/meta.jsp?dataset=OGDEXT DSR 1/. Accessed 11 April 2023
Statistik Austria (2020b). ETRS-LAEA-Raster-2000M. https:// www.statistik.at/services/tools/services/regionales/regionalstatistisches-datenangebot/. Accessed 11 April 2023
Sundal A, Henriksen H, Soldal O, Strand T (2004) The influence of geological factors on indoor radon concentrations in Norway. Sci Total Environ 328:41–45
Tondeur F, Cinelli G, Dehandschutter B (2014) Homogenity of geological units with respect to the radon risk in the walloon region of Belgium. J Environ Radioact 136:140–151
Verbeke G, Molenberghs G (2000) Linear mixed models for longitudinal data. Springer-Verlag, Berlin
Wood S (2003) Thin plate regression splines. J R Stat Soc Ser B 65(1):95–114. https://doi.org/10.1111/1467-9868.00374
Wood S (2017) Generalized additive models. Chapman and Hall/CRC, New York
Wood S, Scheipl F (2017) gamm4: generalized additive mixed models using ’mgcv’ and ’lme4’ [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=gamm4/ (R package version 0.2-5)
Yang S, Goyette Pernot J, Hager Jorin C, Niculita-Hirzel H, Perret V, Licina D (2019) Radon investigation in 650 energy efficient dwellings in Western Switzerland: impact of energy renovation and building characteristics. Atmosphere 10(12):777
Acknowledgements
This work was commissioned and financed by the Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation, and Technology of the Republic of Austria.
Funding
The authors have not disclosed any funding.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Competing interests
The authors have not disclosed any competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Alber, O., Laubichler, C., Baumann, S. et al. Modeling and predicting mean indoor radon concentrations in Austria by generalized additive mixed models. Stoch Environ Res Risk Assess 37, 3435–3449 (2023). https://doi.org/10.1007/s00477-023-02457-6
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00477-023-02457-6