ProZES: the methodology and software tool for assessment of assigned share of radiation in probability of cancer occurrence

ProZES is a software tool for estimating the probability that a given cancer was caused by preceding exposure to ionising radiation. ProZES calculates this probability, the assigned share, for solid cancers and hematopoietic malignant diseases in cases of exposure to low-LET radiation, and for lung cancer in cases of exposure to radon. User-specified inputs include birth year, sex, type of diagnosed cancer, age at diagnosis, radiation exposure history and characteristics, and smoking behaviour for lung cancer. Cancer risk models are an essential part of ProZES. Linking disease and exposure to radiation involves several methodological aspects, and assessment of uncertainties received particular attention. ProZES systematically uses the principle of multi-model inference. Models of radiation risk were either newly developed or critically re-evaluated for ProZES, including dedicated models for frequent types of cancer and, for less common diseases, models for groups of functionally similar cancer sites. The low-LET models originate mostly from the study of atomic bomb survivors in Hiroshima and Nagasaki. Risks predicted by these models are adjusted to be applicable to the population of Germany and to different time periods. Adjustment factors for low dose rates and for a reduced risk during the minimum latency time between exposure and cancer are also applied. The development of the methodology and software was initiated and supported by the German Federal Ministry for the Environment, Nature Conservation and Nuclear Safety (BMU), taking up advice from the German Commission on Radiological Protection (SSK, Strahlenschutzkommission). Together, the methodology and software provide the scientific basis to support decision-making on compensation claims regarding malignancies following occupational exposure to radiation in Germany.

Electronic supplementary material: The online version of this article (10.1007/s00411-020-00866-7) contains supplementary material, which is available to authorized users.

The baseline in the LSS cohort depends on attained age a and age at exposure e (related to birth year). Stomach cancer rates also depend on the city of residence; the city-specific rates are weighted according to the distribution of person-years (Jacob 2013): where $PY_H$ and $PY_N$ are the person-years accumulated in the Hiroshima and Nagasaki sub-cohorts, respectively. For males, $PY_H$ = 731077 and $PY_N$ = 306213. For females, $PY_H$ = 1232155 and $PY_N$ = 488748.
The baseline parameters are shown in Table S1.3.
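The person-year weighting can be illustrated with a minimal sketch. It assumes that the city-specific baseline rates are combined as a person-year-weighted arithmetic mean; the function names and the form of the average are assumptions made for illustration, and only the person-year counts are taken from the text.

```python
# Person-years accumulated in the LSS sub-cohorts, as quoted in the text.
PERSON_YEARS = {
    "male": {"hiroshima": 731077, "nagasaki": 306213},
    "female": {"hiroshima": 1232155, "nagasaki": 488748},
}


def city_weights(sex):
    """Return the share of person-years contributed by each city."""
    person_years = PERSON_YEARS[sex]
    total = sum(person_years.values())
    return {city: value / total for city, value in person_years.items()}


def weighted_baseline(sex, rate_hiroshima, rate_nagasaki):
    """Assumed form: person-year-weighted mean of the two city-specific rates."""
    weights = city_weights(sex)
    return (weights["hiroshima"] * rate_hiroshima
            + weights["nagasaki"] * rate_nagasaki)


if __name__ == "__main__":
    for sex in ("male", "female"):
        print(sex, city_weights(sex))
```

With these counts, Hiroshima carries roughly 70% of the weight for both sexes, so the averaged baseline lies closer to the Hiroshima rate.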

Model description
For colon cancer, the attained-age dependence of the ERR in the LSS differs considerably between males and females. However, no biological mechanism is known that supports such a difference. Furthermore, the baseline rates of males and females show similar age dependencies. To take both aspects into account, multi-model inference (MMI) was applied to two sets of models: one with sex-specific age dependencies and one with a common age dependence. Each set of models was then assigned an equal weight of 50% (see Fig. S2.1).
For each group, two selection criteria were applied: the likelihood ratio test and MMI based on AIC values. Of a total of 46 models fitted to the LSS data set for colon cancer, each criterion led to a set of six models, five of which were the same in both approaches (see Fig. S2.1). The two models that differed between the criteria were both included in the final set, so MMI was realized among a set of seven models. MMI was implemented in the following way: for a given sex, the final model is constructed from five models, two from the sex-specific models (Group 1) and three from the group of models with common age dependence (Group 2). The models of Groups 1 and 2 are given a total weight of 0.5 each.
The set of selected models, together with their characteristics, is summarized in Table S2.1 and described explicitly in the following sub-sections.
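The two-level weighting described above can be sketched as follows. The sketch assumes that, within each group, models are weighted by standard Akaike weights and that these are then rescaled so that each group totals 0.5. The AIC values are hypothetical placeholders, not the fitted values from Table S2.1, and the function names are illustrative only.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values into normalized Akaike weights."""
    best = min(aic_values)
    raw = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(raw)
    return [value / total for value in raw]


def grouped_weights(groups, group_weight=0.5):
    """Scale within-group Akaike weights so each group totals group_weight."""
    return {
        name: [group_weight * w for w in akaike_weights(aics)]
        for name, aics in groups.items()
    }


# Hypothetical AIC values for one sex: two sex-specific models (Group 1)
# and three models with common age dependence (Group 2).
EXAMPLE_AIC = {
    "group1_sex_specific": [1000.0, 1001.2],
    "group2_common_age": [1000.5, 1000.9, 1002.0],
}

if __name__ == "__main__":
    for name, weights in grouped_weights(EXAMPLE_AIC).items():
        print(name, [round(w, 3) for w in weights])
```

Fixing the group totals in advance keeps the 50/50 balance between the two hypotheses about age dependence, regardless of how many models each group contains.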

Baseline
For all selected models (see Table S2.1), the functional form of the fitted baseline is the same for a given sex. The baselines for males and females differ slightly; for males it is: Parameters of the baseline functions are given in Table S2.2. The factor accounting for differences in baselines between the two cities was found to be insignificant and is set equal to zero.

Excess risks
For all fitted models, dose in the equations is expressed in Gy. Correspondingly, linear dose-response coefficients have units of either Gy$^{-1}$ for ERR-type models or (PY Gy)$^{-1}$ for EAR-type models.

Model description
For radiation risk of lung cancer, the model of Furukawa et al. (2010) was selected for use in ProZES. It is based on the LSS cohort and takes into account information on smoking. An important part of the model is the interaction of radiation and smoking, as described below.
Denoting the effect of radiation exposure as $\rho(D)$, the effect of smoking as $\varphi(S)$, and the total and baseline lung cancer incidence rates as $\lambda$ and $\lambda_0$, respectively, the combined effect of both factors can be described either with an additive model (AM):

$$\lambda = \lambda_0 \left(1 + \varphi(S) + \rho(D)\right)$$

or with a multiplicative model (MM):

$$\lambda = \lambda_0 \left(1 + \varphi(S)\right)\left(1 + \rho(D)\right)$$

Consequently, $\lambda_0$ is the baseline incidence rate of lung cancer among never-smokers. If the two factors are not independent and there is an interaction between radiation exposure and smoking, then a generalized additive model (GAM):

$$\lambda = \lambda_0 \left(1 + \varphi(S) + \rho'(D, S)\right)$$

and a generalized multiplicative model (GMM):

$$\lambda = \lambda_0 \left(1 + \varphi(S)\right)\left(1 + \rho(D)\,\omega(S)\right) = \lambda_0 \left(1 + \varphi(S)\right)\left(1 + \rho'(D, S)\right)$$

can be applied.

Here, c is the city index (c = 1 for Hiroshima, c = 2 for Nagasaki), and the index NIC ('not-in-the-city' status) equals zero for 23.46% of the cohort members and one otherwise. In the total cohort, 28.8% of members were residents of Nagasaki; the others resided in Hiroshima.
The modified radiation effect $\rho'(D, S)$ is modelled as follows: where cpd is the smoking intensity, i.e. the number of cigarettes smoked per day. Correspondingly, the simple radiation-only effect is represented as $\rho(D) = \rho'(D, 0)$.
For current and past smokers (also called ever-smokers), modelling of the smoking effect depends on the availability of information on smoking habits. When such information is available, the function for the smoking effect is:

$$\varphi(S) = \frac{\Pi}{50} \exp\left[\theta_{1,s} + \theta_2\,\frac{e - 30}{10} + \theta_3 \ln\frac{t_s}{50} + \theta_4 \ln^2\frac{t_s}{50} + \dots\right]$$

where $\Pi$ is the smoking 'dose' (pack-years), $t_s$ is the smoking duration (years), and $t_q$ is the number of years since quitting smoking for ex-smokers. For persons with unknown smoking status, the average smoking effect is modelled as a constant factor: where $\varphi_0$ is a constant dependent on sex and birth cohort (expressed in the case of the LSS cohort via age at exposure e).

Radiation-related risk of lung cancer
The relative risk of radiation-induced lung cancer can be expressed as follows:

$$RR(D, S) = \frac{\lambda(D, S)}{\lambda_0 \left(1 + \varphi(S)\right)}$$

i.e. the equations for the relative risk are different for additive and multiplicative models: The excess relative risk due to radiation is correspondingly: Applying multi-model inference, the risk for generalized models with non-zero interaction between radiation and smoking can be written as: and for simple models with independently acting factors for radiation and smoking: where $w_A$ and $w_M$ are the AIC weights of the additive and multiplicative models, respectively.
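The steps named in this paragraph can be worked through under stated assumptions. The derivation below is a sketch rather than the typeset equations of the original supplement. It assumes the symbols $\rho(D)$ for the radiation effect, $\varphi(S)$ for the smoking effect and $\rho'(D, S)$ for the smoking-modified radiation effect, the standard additive and multiplicative forms for the incidence rate, a relative risk taken with respect to an unexposed person with the same smoking history, and a model-averaged risk formed as the AIC-weighted sum of the additive and multiplicative terms.

```latex
\begin{align}
RR_{A}(D,S) &= \frac{\lambda_0\,(1 + \varphi(S) + \rho(D))}{\lambda_0\,(1 + \varphi(S))}
             = 1 + \frac{\rho(D)}{1 + \varphi(S)}, \\
RR_{M}(D,S) &= \frac{\lambda_0\,(1 + \varphi(S))(1 + \rho(D))}{\lambda_0\,(1 + \varphi(S))}
             = 1 + \rho(D), \\
ERR_{A}(D,S) &= RR_{A} - 1 = \frac{\rho(D)}{1 + \varphi(S)}, \qquad
ERR_{M}(D,S) = RR_{M} - 1 = \rho(D), \\
ERR_{\mathrm{gen}}(D,S) &= w_A\,\frac{\rho'(D,S)}{1 + \varphi(S)} + w_M\,\rho'(D,S), \\
ERR_{\mathrm{simple}}(D,S) &= w_A\,\frac{\rho(D)}{1 + \varphi(S)} + w_M\,\rho(D),
\qquad w_A + w_M = 1 .
\end{align}
```

Under these assumptions the additive term is diluted by the smoking effect in the denominator, whereas the multiplicative term is independent of $\varphi(S)$.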

Model of radiation related ERR of lung cancer used in ProZES
Based on AIC alone, the simple models are inferior to the generalized ones and would not be used for the final MMI. The generalized models can better express the complex interaction between radiation and smoking. However, for large smoking intensities the functional form of the generalized models predicts a vanishing ERR. Although this might reflect the large influence of heavy smoking and the strongly increased baseline risk, it is questionable whether such a strong decrease in relative radiation risk is plausible. Therefore, for use in compensation claims, the following model is suggested for ProZES: As a result, for large smoking intensities the ERR levels off at a constant value instead of going to zero. This ensures that heavy smokers may also be compensated after occupational exposure.
For never-smokers only generalized models are used because generalized models are preferable based on AIC, and without smoking term they are indistinguishable from the simple ones. Implementation of the model of lung cancer depends on availability of information on personal smoking habits. If such information is absent, then ProZES accounts for German-specific behavioral patterns regarding smoking (Schulze and Lampert, 2006). If personal smoking status is unknown, then random sampling is applied to decide whether the person should be regarded as never-smoker (35% of males and 53% of females), or ever-smoker (65% of males and 47% of females), for which average smoking habits are assumed based on personal age and birth cohort (Tables S3.2 and S3.3).
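The random assignment of smoking status for persons with unknown smoking habits can be sketched as a Bernoulli draw. The never-smoker shares are those quoted in the text; the function names and the use of Python's `random` module are illustrative assumptions and do not describe the actual ProZES implementation.

```python
import random

# Share of never-smokers by sex, as quoted in the text.
NEVER_SMOKER_SHARE = {"male": 0.35, "female": 0.53}


def sample_smoking_status(sex, rng):
    """Draw 'never-smoker' or 'ever-smoker' for a person of unknown status."""
    if rng.random() < NEVER_SMOKER_SHARE[sex]:
        return "never-smoker"
    return "ever-smoker"


def never_smoker_fraction(sex, n_samples, seed=0):
    """Fraction of never-smokers in a Monte Carlo sample of given size."""
    rng = random.Random(seed)
    draws = [sample_smoking_status(sex, rng) for _ in range(n_samples)]
    return draws.count("never-smoker") / n_samples


if __name__ == "__main__":
    print(never_smoker_fraction("male", 100_000))
    print(never_smoker_fraction("female", 100_000))
```

In a Monte Carlo run, each draw classified as ever-smoker would then receive the average smoking habits for the person's age and birth cohort, while never-smokers are evaluated without a smoking term.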

Transfer of lung cancer risk from the LSS to the German population
The lung cancer model takes into account radiation and smoking effects; thus, the baseline incidence rate is defined for non-exposed never-smokers. Unfortunately, these data are not readily available for the German population. Therefore, the transfer factor is modelled stochastically with the assumption that the ratio of baselines is log-uniformly distributed in range from 1/3 to 3, as outlined in the methodology section.
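Sampling from a log-uniform distribution on the interval from 1/3 to 3 amounts to drawing the logarithm of the ratio uniformly. The following sketch shows this; the function name and the seeded generator are assumptions made for illustration.

```python
import math
import random


def sample_baseline_ratio(rng, low=1.0 / 3.0, high=3.0):
    """Draw a baseline ratio that is log-uniformly distributed on [low, high]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_many(n_samples, seed=0):
    """Generate a reproducible sample of baseline ratios."""
    rng = random.Random(seed)
    return [sample_baseline_ratio(rng) for _ in range(n_samples)]


if __name__ == "__main__":
    ratios = sample_many(20_000)
    below_one = sum(r < 1.0 for r in ratios) / len(ratios)
    print(min(ratios), max(ratios), below_one)
```

Because the interval is symmetric on the logarithmic scale, ratios below and above 1 are equally likely, so the choice expresses no preference for a higher or lower baseline in either population.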

Model description
The pooled study of Preston et al. (2002) includes not only the LSS, but also several other studies of radiation-induced breast cancer from Western populations. Therefore, it was decided to use the pooled study for ProZES. Radiation risk is given by the following EAR model (EAR per $10^4$ PY):

$$EAR(D, e, a) = \beta D \exp\left(\frac{\theta}{10}\,(e - 25) + \gamma_1 \ln\frac{a}{50} + \gamma_2 \max\left(0, \ln\frac{a}{50}\right)\right) \qquad \text{(S4.1)}$$

where the parameters of Eq. (S4.1) and their covariances are shown in Table S4.1. Diagonal elements of the covariance matrix represent variances of the parameters. The model (S4.1) describes the excess absolute risk. Correspondingly, the excess relative risk in the target population is estimated by dividing the EAR from Eq. (S4.1) by the baseline incidence rate observed in Germany in the year the cancer was diagnosed. The age-dependent baseline rate in the pooled cohort is not available, so the transfer of risk from the pooled cohort to the German population cannot be modelled taking into account the (unknown) ratio of the baseline rates. The LSS is a major contributor to the pooled cohort (64% of person-years), so baseline rates in the LSS were compared to those in Germany to estimate the range of baseline ratios. For ProZES, the transfer factor is then modelled stochastically with the assumption that the ratio of baselines is log-uniformly distributed in the range from 1/3 to 3, as outlined in the methodology section.
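The conversion from excess absolute risk to excess relative risk is a simple division once both rates are expressed per person-year. The sketch below assumes that the German baseline incidence is supplied per $10^5$ person-years, a common registry convention that is not stated in the text; the function name and the example values are hypothetical.

```python
def err_from_ear(ear_per_1e4_py, baseline_per_1e5_py):
    """Convert an EAR (per 10^4 PY) into an ERR using a baseline rate.

    The unit of the baseline (per 10^5 PY) is an assumption of this sketch.
    """
    if baseline_per_1e5_py <= 0:
        raise ValueError("baseline incidence rate must be positive")
    ear_per_py = ear_per_1e4_py / 1e4
    baseline_per_py = baseline_per_1e5_py / 1e5
    return ear_per_py / baseline_per_py


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the unit handling.
    print(err_from_ear(ear_per_1e4_py=10.0, baseline_per_1e5_py=100.0))
```

Keeping the unit conversion explicit guards against a factor-of-ten error when the EAR and the registry baseline are tabulated on different scales.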

Model description
The thyroid cancer model is based on an analysis of the LSS data with explicit modelling of the screening effect of medical surveillance for members of the so-called Adult Health Study (AHS) (Jacob 2014). The screening effect was found to be statistically significant.
The baseline incidence rate $\lambda_0(s, a, e, c, \mathrm{AHS}, \mathrm{NIC})$ depends on the explanatory variables sex s, attained age a, age at exposure e, city (Hiroshima: c = 1; Nagasaki: c = 2), participation in the AHS screening program (no: AHS = 0; yes: AHS = 1), and presence in the city at the time of bombing (distance from hypocenter < 10 km: NIC = 0; otherwise: NIC = 1). The baseline incidence rate factorizes and includes an adjustment factor accounting for the screening effect for the AHS members: where for non-zero factor AHS: and a factor accounting for residential status (city and 'not-in-the-city' status, NIC): $f(c, \mathrm{NIC}) = \exp\left(\beta_c\,(c - 1) + \beta_{\mathrm{NIC}}\,\mathrm{NIC}\right)$.
In the further calculations, city and NIC status have been averaged out with weights defined by the number of cancer cases observed in each of the sub-groups of the LSS cohort. For the dose response, an ERR model of the following form was chosen:

$$ERR(D, a, e) = \alpha_D\, D \exp\left(\alpha_s\, s + \alpha_a \ln\frac{a}{60} + \alpha_e\,\frac{e - 20}{10}\right)$$

where the $\alpha$'s are the model parameters, D is the thyroid dose, and the parameter s equals +1 for females and −1 for males.

group DIG) for members of the LSS cohort
The ERR-type model dominates in the generated distribution of the assigned share Z and contributes 71.3% of the total generated sample. The parametric form of the baseline rate is as follows: and the risk function is: The EAR-type model contributes 28.7% of the generated sample of Z. For this model, the parametric baseline rate is defined in the following form: The EAR is defined with a linear dose response, and the only modifier depends on attained age:

1: Statistical properties of the models fitted to characterize risk of cancer for remaining organs (group REM) for members of the LSS cohort
The group of "remaining" solid cancers, the REM group, combines all diagnoses in the LSS cohort for which the number of observed cancer cases was not sufficient for statistically significant inference of radiation risk.

1: Statistical properties of the models fitted to characterize risk of cancers of male genital organs (group GNM) for members of the LSS cohort
Modelling of the radiation risk of cancers of male genital organs (dominated by prostate cancer: 387 of the 403 cases considered) resulted in three models (see Table S10.1).

1: Statistical properties of the models fitted to characterize risk of cancers of urinary tract organs (group URI) for members of the LSS cohort
The group of urinary cancers (URI) combines cases diagnosed with cancer of the kidney (178 cases), renal pelvis and ureter (92 cases), urinary bladder (511 cases), and other parts of the urinary system (26 cases). The ERR-type model has a baseline of the form: and the risk is represented as: The baseline of the EAR-type model has the following form: and the risk is represented as:

1: Statistical properties of the models fitted to characterize risk of cancers of brain and central nervous system (group BCNS) for members of the LSS cohort
The fitted models have the following common parametric form for the function describing the baseline rate: and the radiation risk is specified using a constant linear dose-response EAR-type model: and a linear dose-response ERR-type model with an attained-age effect modifier: For cancers of the BCNS group, the screening factor for the LSS cohort is where sgn is +1 for positive values of the argument and −1 for negative values.

        β1          β4          β5          β9          β19         β33
β1      3.346E-02  -2.465E-02   4.294E-02  -6.459E-03  -2.826E-03  -9.378E-03
β4     -2.465E-02   2.998E-02   1.329E-03  -3.866E-04  -4.701E-03  -5.709E-04

1: Statistical properties of the models fitted to characterize risk of non-melanoma skin cancers (group SKIN) for members of the LSS cohort
The fitted risk models, one of ERR type and one of EAR type, share the same form of parametric baseline: Radiation risk for skin cancer demonstrates an essentially non-linear dose response, and the model fitting resulted in a risk function with a dose response to the power of 1.55 for the ERR-type model and to the power of 1.60 for the EAR-type model (see equation below and Table S13.2). Besides the dose response, the risk function for the ERR-type model has only an age-at-exposure modifier: while the risk function for the EAR-type model also depends on attained age:

The leukaemia group HEM1 includes the following diagnoses: acute lymphoblastic leukaemia (ALL), prolymphocytic leukaemia of B-cell type, lymphoid leukaemia/unspecified. The risk models of the group HEM1 are represented by two EAR-type models:
• EAR-LNT: model of EAR type with linear non-threshold (LNT) dose response (AIC weight = 90.18%);
• EAR-QDR: model of EAR type with pure quadratic (QDR) dose response (AIC weight = 9.82%).
The models share the same form of the parametric baseline: NIC is an indicator variable for not-in-city at the time of bombing in Hiroshima (Hi) and Nagasaki (Na) (NIC = 1 for being not-in-city, NIC = 0 for being in city). The risk functions have different dose responses but the same effect modifiers. The model with the linear dose response has the form: and the model with the quadratic dose response: The variable f indicates females (f = 1 for females, f = 0 for males). These two models form a twin pair. The model parameters and their standard deviations are shown in Table S14.1.

The leukaemia group HEM2 includes the following diagnoses: Hodgkin lymphoma, non-Hodgkin lymphoma, chronic lymphoblastic leukaemia (CLL), lymphoma of peripheral and cutaneous T-cell, malignant immunoproliferative disease, hairy cell leukaemia.
The group HEM2 contains a relatively large number of cancer cases (449, including 103 'not-in-city' cases). However, fitting male and female cases separately resulted in no radiation risk for females and in a reasonably well-defined radiation risk function for males. Fitting both genders together also resulted in a significant non-zero risk. Since such gender differences are not biologically plausible and are also problematic for compensation claims, the decision was made to use the same set of risk models for both genders. An MMI weight of 50% is given to a set of models derived jointly for males and females, and another 50% to models derived from the male dataset only. Within the two groups, further AIC-based weighting was performed, resulting in one ERR and one EAR model for each group. Thus, radiation risk is characterized by a set of four linear (LNT) models:
• EAR-LNT: model of EAR type (AIC weight = 21.00%);
• ERR-LNT: model of ERR type (AIC weight = 29.00%);
• EAR-LNT-male: model of EAR type (AIC weight = 35.87%);
• ERR-LNT-male: model of ERR type (AIC weight = 14.13%).
These four models are applied to both sexes.
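The weights listed for HEM2 can be checked against the stated 50/50 split, and used to draw models in a Monte Carlo simulation, with a short sketch. Drawing one model per realisation in proportion to its weight is an assumption about how a model-averaged sample could be generated; the function names are illustrative, and only the weights come from the text.

```python
import random

# Final MMI weights of the four HEM2 models, as listed in the text.
HEM2_WEIGHTS = {
    "EAR-LNT": 0.2100,
    "ERR-LNT": 0.2900,
    "EAR-LNT-male": 0.3587,
    "ERR-LNT-male": 0.1413,
}


def group_totals(weights):
    """Total weight of the jointly fitted models and of the male-only models."""
    joint = weights["EAR-LNT"] + weights["ERR-LNT"]
    male_only = weights["EAR-LNT-male"] + weights["ERR-LNT-male"]
    return joint, male_only


def sample_models(weights, n_samples, seed=0):
    """Draw model names with probability proportional to their MMI weights."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=n_samples)


if __name__ == "__main__":
    print(group_totals(HEM2_WEIGHTS))
    draws = sample_models(HEM2_WEIGHTS, 100_000)
    print({name: draws.count(name) / len(draws) for name in HEM2_WEIGHTS})
```

Each drawn model would contribute its own realisation of the assigned share, so the pooled sample reflects both parameter uncertainty and uncertainty about the choice of model.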
All selected models share the same form of the sex-specific parametric baseline: The variable f indicates females (f = 1 for females, f = 0 for males), m correspondingly indicates males (m = 1 for males, m = 0 for females), and b is the birth year. NIC is an indicator variable for not-in-city at the time of bombing in Hiroshima (Hi) and Nagasaki (Na) (NIC = 1 for being not-in-city, NIC = 0 for being in city). The risk functions for all models are linear in dose without effect modifiers: Since all models are linear in dose, the HEM2 group has no twin models.

The leukaemia group HEM3 includes the following diagnoses: acute myeloid leukaemia (AML), subacute myeloid leukaemia, myeloid sarcoma, acute promyelocytic leukaemia, acute myelomonocytic leukaemia, monocytic leukaemia, other leukaemia of specified cell type, leukaemia of unspecified cell type, other or non-specified.
The fitting resulted in four models of EAR and ERR types with quadratic (QDR) and threshold linear spline (TLS) dose dependencies. The models are:
• ERR-TLS: model of ERR type with threshold linear spline dose response (AIC weight = 6.33%);
• ERR-QDR: model of ERR type with quadratic dose response (AIC weight = 30.35%);
• EAR-TLS: model of EAR type with threshold linear spline dose response (AIC weight = 8.44%);
• EAR-QDR: model of EAR type with quadratic dose response (AIC weight = 54.87%).
Both models of ERR type have the same structure of the parametric baseline rate: The variable f indicates females (f = 1 for females, f = 0 for males), m correspondingly indicates males (m = 1 for males, m = 0 for females), and b is the birth year. NIC is an indicator variable for not-in-city at the time of bombing in Hiroshima (Hi) and Nagasaki (Na) (NIC = 1 for being not-in-city, NIC = 0 for being in city). Their risk functions appear as follows with a linear spline dose response: and with a pure quadratic response: The other two models of EAR type also share the parametric form of the baseline rate: and the risk functions of EAR type appear as follows for a linear spline dose response: and for a pure quadratic dose response: The four selected models form two pairs of twins: one pair is formed by the ERR-TLS and ERR-QDR models, and the other by the EAR-TLS and EAR-QDR models.

The leukaemia group HEM4 includes only chronic myeloid leukaemia (CML).
The fitting resulted in a group of six models, four of ERR type and two of EAR type. These models are:
• ERR-t-QE: model of ERR type with quadratic-exponential dose response and time-since-exposure effect modifier (AIC weight = 7.08%);
• ERR-t-LNT: model of ERR type with linear dose response and time-since-exposure effect modifier (AIC weight = 21.00%);
• ERR-e-QE: model of ERR type with quadratic-exponential dose response and age-at-exposure effect modifier (AIC weight = 1.72%);
• ERR-e-LNT: model of ERR type with linear dose response and age-at-exposure effect modifier (AIC weight = 7.00%);
• EAR-t-LNT: model of EAR type with linear dose response and time-since-exposure effect modifier (AIC weight = 55.12%);
• EAR-e-LNT: model of EAR type with linear dose response and age-at-exposure effect modifier (AIC weight = 8.08%).
The four models of ERR-type share the same form of the parametric baseline rate: The variable f indicates females (f=1 for females, f=0 for males), and correspondingly for males (m=1 for males, m=0 for females). NIC is an indicator variable for not-in-city at the time of bombing in Hiroshima (Hi) and Nagasaki (Na) (NIC=1 for being not-in-city, NIC=0 for being in city). For the ERR-type models, the risk functions appear as follows: if time since exposure is selected as an effect modifier for risk, and if age at exposure is used to modify radiation risk. The variable n indicates the city (n=1 for Nagasaki, n=0 for Hiroshima).

S18. Lung cancer after exposure to radon in mines
Currently, the implemented model is based on a study of the German Wismut miner cohort (Kreuzer et al. 2015). The Wismut cohort is the world's largest epidemiological cohort of miners exposed to radon. Furthermore, the cohort is relevant for compensation claims after radon exposure in mines in Germany.
The model selected for ProZES was developed for a sub-cohort of Wismut workers hired in 1960 or later, when dosimetric control and safety in workplaces were significantly improved compared to the preceding period, thus resulting in generally lower and better quantified estimates of exposures. Exposure is given in terms of working level month, WLM (ICRP 2010).
The parametric baseline has the following form:

$$\lambda_0 = \exp\left(\beta_1 + \beta_2\,(y - 1973) + \beta_3 \ln\frac{a}{70} + \beta_4 \max{}^{2}\!\left(0, \ln\frac{a}{\beta_5}\right)\right)$$

where y is the calendar year. Radiation risk is described by a simple ERR model with a linear dose response without effect modifiers, where W is the total exposure in WLM:

$$ERR = \beta_6 \cdot W$$
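A minimal sketch shows how a linear ERR model of this kind translates into an assigned share. It uses the standard relation Z = ERR / (1 + ERR) between excess relative risk and assigned share. The risk coefficient below is a hypothetical placeholder and not the fitted value from Kreuzer et al. (2015), and the function names are illustrative.

```python
def radon_err(exposure_wlm, err_per_wlm):
    """Linear ERR model without effect modifiers: ERR = coefficient * exposure."""
    if exposure_wlm < 0:
        raise ValueError("exposure must be non-negative")
    return err_per_wlm * exposure_wlm


def assigned_share(err):
    """Assigned share Z for a given excess relative risk."""
    return err / (1.0 + err)


# Hypothetical placeholder coefficient (per WLM); NOT the published estimate.
HYPOTHETICAL_ERR_PER_WLM = 0.01

if __name__ == "__main__":
    for exposure in (10.0, 50.0, 100.0):
        err = radon_err(exposure, HYPOTHETICAL_ERR_PER_WLM)
        print(exposure, err, assigned_share(err))
```

Because Z saturates as the ERR grows, doubling the exposure less than doubles the assigned share; an ERR of 1 corresponds to Z = 50%.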

S19. Lung cancer after indoor exposure to radon
The model for lung cancer after indoor exposure to radon and its progeny was defined using the results of the study of Darby et al. (2005). It is based on a pooled analysis of data from 13 European case-control studies of lung cancer after residential radon exposure.
Radon exposure is quantified by the indoor air activity concentration (Bq m$^{-3}$) times the duration of exposure. According to Darby et al. (2006, Table B15), the average percentage of time spent indoors at home among cases was approximately 60%. Correspondingly, the excess relative risk value reported in this study (0.16 per 100 Bq m$^{-3}$) is attributed to 30 years of residential exposure to radon in air at a concentration of 100 Bq m$^{-3}$ with an average indoor occupancy of 60%, which corresponds to a cumulative exposure time of 157788 hours.
Thus, the required input includes both the average radon activity concentration in air q (Bq m$^{-3}$) and the duration of indoor exposure T (hours). Concerning the duration of exposure, it is important to note that the radon indoor model differs from all other ProZES models. In the other models, the exposure duration is only used to estimate the dose rate and to correct for a potential low-dose-rate effect, which might in particular widen the error bounds of the assigned share. In contrast, for the indoor model the total exposure is directly proportional to the exposure duration, and risk increases linearly with duration. For example, the average annual working time in Germany in the period 2000-2016 was approximately 1400 hours (OECD.Stat, https://stats.oecd.org/Index.aspx?DataSetCode=ANHRS).
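The scaling described above can be written as a short calculation. The sketch assumes that the ERR scales linearly with both concentration and cumulative exposure time relative to the reference exposure of 100 Bq m$^{-3}$ over 157788 hours; the function and constant names are illustrative, and the workplace scenario in the example is hypothetical.

```python
HOURS_PER_YEAR = 365.25 * 24          # 8766 hours
REFERENCE_YEARS = 30
REFERENCE_OCCUPANCY = 0.60
# Cumulative reference exposure time quoted in the text (157788 hours).
REFERENCE_HOURS = REFERENCE_YEARS * HOURS_PER_YEAR * REFERENCE_OCCUPANCY
ERR_PER_100_BQ_M3 = 0.16              # value attributed to Darby et al. (2005)


def indoor_radon_err(concentration_bq_m3, exposure_hours):
    """ERR assumed linear in concentration q and in exposure duration T."""
    if concentration_bq_m3 < 0 or exposure_hours < 0:
        raise ValueError("concentration and duration must be non-negative")
    return (ERR_PER_100_BQ_M3
            * (concentration_bq_m3 / 100.0)
            * (exposure_hours / REFERENCE_HOURS))


if __name__ == "__main__":
    # Hypothetical workplace: 200 Bq/m^3 for 10 years at 1400 hours per year.
    print(indoor_radon_err(200.0, 10 * 1400.0))
```

With an annual working time of about 1400 hours, ten years at a workplace contribute less than a tenth of the reference exposure time, which is why the duration T has to be supplied explicitly rather than inferred from calendar years.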