Background

The incidence of Type 1 diabetes differs widely between populations, from 0.7/100,000/year in Peru [1] to 45/100,000/year in Finland in 1996. The incidence is increasing in many populations, including Finland [2, 3], England [4], Norway [5], Israel [6], Austria [7], and several other countries [8]. In Finland, the incidence has more than tripled since 1953, when it was 12/100,000/year [9], with an average increase of 2.4 percent per year according to a log-linear model of disease incidence.

The reasons for the increasing incidence of Type 1 diabetes are not known, largely because the etiology of the disease is still poorly understood. Type 1 diabetes develops in individuals who are genetically susceptible, and exposure to some as yet unknown triggering environmental factor(s) may be required. The genetic background is complex: the HLA region makes a major contribution, but several other genes may also be involved, each having a minor effect on disease susceptibility [10–12]. However, the roles of these genes have been difficult to assess because of their small effects and because of the small size of the samples studied thus far [13–15]. Twin studies have revealed that 70–75 per cent of the risk of Type 1 diabetes is related to genetic effects and 25–30 per cent to environmental factors [16, 17]. The estimated proportion of the genetic risk attributable to HLA varies [18–20].

Candidates for environmental components include, for instance, viral infections [21], early introduction of cow's milk in infancy [22], short duration of breast feeding [23], and nitrites and nitrosamines in the diet [24]. However, convincing evidence that any major environmental factor initiates the disease process has so far not been presented.

A significant but modest shift towards excess sharing (50.43%) in single and multipoint linkage analysis of randomly collected families was found by Zollner et al. [25]. This finding supports the existence of several loci with skewed transmission in human chromosomes. Some other reports suggesting non-Mendelian inheritance of alleles at some HLA loci have been published [26–28], but the results are conflicting [29–32]. An important observation of transmission distortion for INS-IGF2 VNTR [33] has been reported earlier. This locus is probably the second most important Type 1 diabetes susceptibility locus. In our data, some evidence of increased transmission of a special Finnish high-risk HLA haplotype (A2, Cw1, B56, DR4, DQ8) was found [34]: using single ascertainment correction, this haplotype was seen to have been transmitted to the offspring at a rate higher than 50%. After adjusting for ascertainment, however, no statistically significant transmission distortion was found at the A, B and DR loci [34]. These results do not exclude the possibility that other HLA or non-HLA susceptibility genes could be transmitted in a non-Mendelian fashion. If true, transmission distortion would slowly influence the frequencies of diabetic alleles in the population and thus affect the incidence of Type 1 diabetes.

In this paper we present a simple genetic model in order to evaluate the magnitude of the allele frequency change over time and, by assuming reasonable penetrance probabilities, to evaluate its effect on the time trend of the incidence of Type 1 diabetes. We fit this model by the method of maximum likelihood, using data on newly diagnosed Finnish Type 1 diabetes cases under the age of 15 registered between 1965 and 1996.

Results

Material

Data on the new cases of Type 1 diabetes in Finland were obtained from two nationwide sources: new cases between 1965 and 1986 were obtained from the Central Drug Registry of the Social Insurance Institution, and between 1987 and 1996 from the prospective childhood Type 1 diabetes registry. In Finland, all children with Type 1 diabetes are treated in hospital at the time of diagnosis and therefore case ascertainment is virtually 100% complete. Details of the data collection are described elsewhere [36, 37].

Theoretical considerations

Considering three values of the transmission distortion parameter τ, we illustrate how the allele and genotype frequencies develop over time when the one-locus genetic model described in Methods is assumed (Figures 1, 2). In order to evaluate the effects of allele frequency and penetrance on incidence, Figures 3, 4 and 5 were drawn. Obviously, when the susceptibility allele is dominant with high penetrance and high initial allele frequency, the incidence is high. The relative change in incidence is most prominent when there is a large difference in the relative genotype-specific penetrances, even if the change in allele frequency is small.

Figure 1

Predicted allele frequencies in the annual population of children 0–15 years of age plotted against calendar year, using the population genetic model for the incidence change. Transmission probability of the susceptibility allele 'A' from a heterozygous parent, τ, is 0.52 (---), 0.55 (-- --) and 0.6 (-- ··· --). The allele frequency of 'A' starts from q0 = 0.15 in both figures. The transmission distortion effect is assumed to have been acting since the introduction of insulin in the 1930s.

Figure 2

Predicted genotype frequencies in the annual population of children 0–15 years of age plotted against calendar year, using the population genetic model for the incidence change. Transmission probability of the susceptibility allele 'A' from a heterozygous parent, τ, is 0.52 (---), 0.55 (-- --) and 0.6 (-- ··· --). The allele frequency of 'A' starts from q0 = 0.15 in both figures. The transmission distortion effect is assumed to have been acting since the introduction of insulin in the 1930s.

Figure 3

Predicted curves for disease incidence (/100,000/year) of a population experiencing transmission distortion as a function of allele frequency, where the penetrances of the homozygotes are $\lambda_{AA}$ = 160 and $\lambda_{aa}$ = 10. Four models of gene expression were explored ('A' dominant to 'a' (---), alleles codominant (-- --), allele effects multiplicative (-- ··· --), and 'A' recessive to 'a' (·········)). The penetrance for the heterozygous genotype is always between (or equal to) those of the homozygotes.

Figure 4

Predicted curves for disease incidence (/100,000/year) of a population experiencing transmission distortion as a function of allele frequency, where the penetrances of the homozygotes are $\lambda_{AA}$ = 40 and $\lambda_{aa}$ = 10 (low differences in penetrances). Four models of gene expression were explored ('A' dominant to 'a' (---), alleles codominant (-- --), allele effects multiplicative (-- ··· --), and 'A' recessive to 'a' (·········)). The penetrance for the heterozygous genotype is always between (or equal to) those of the homozygotes.

Figure 5

Predicted curves for disease incidence (/100,000/year) of a population experiencing transmission distortion as a function of allele frequency, where the penetrances of the homozygotes are $\lambda_{AA}$ = 500 and $\lambda_{aa}$ = 0 (high differences in penetrances). Four models of gene expression were explored ('A' dominant to 'a' (---), alleles codominant (-- --), allele effects multiplicative (-- ··· --), and 'A' recessive to 'a' (·········)). The penetrance for the heterozygous genotype is always between (or equal to) those of the homozygotes.
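The dependence of incidence on allele frequency shown in Figures 3, 4 and 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the figures: genotype frequencies are taken to be in Hardy-Weinberg proportions, and the heterozygote penetrance is assumed to equal that of AA (dominant), that of aa (recessive), the arithmetic mean (codominant) or the geometric mean (multiplicative) of the homozygote penetrances. The function names are hypothetical.

```python
import math

def genotype_freqs(q):
    """Hardy-Weinberg frequencies of (AA, Aa, aa) for frequency q of allele 'A'."""
    return q * q, 2.0 * q * (1.0 - q), (1.0 - q) ** 2

def heterozygote_penetrance(lam_AA, lam_aa, model):
    """Assumed heterozygote penetrance under each gene-expression model."""
    if model == "dominant":
        return lam_AA
    if model == "recessive":
        return lam_aa
    if model == "codominant":          # assumption: arithmetic mean
        return 0.5 * (lam_AA + lam_aa)
    if model == "multiplicative":      # assumption: geometric mean
        return math.sqrt(lam_AA * lam_aa)
    raise ValueError(f"unknown model: {model}")

def incidence(q, lam_AA, lam_aa, model):
    """Population incidence (/100,000/year) as the genotype-weighted penetrance."""
    g_AA, g_Aa, g_aa = genotype_freqs(q)
    lam_Aa = heterozygote_penetrance(lam_AA, lam_aa, model)
    return lam_AA * g_AA + lam_Aa * g_Aa + lam_aa * g_aa

# Penetrances of Figure 3 at an illustrative allele frequency of 0.15
for model in ("dominant", "codominant", "multiplicative", "recessive"):
    print(model, round(incidence(0.15, 160.0, 10.0, model), 3))
```

Under these assumptions the four curves are ordered dominant, codominant, multiplicative, recessive from highest to lowest at any allele frequency strictly between 0 and 1, and they coincide at 0 and 1.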

Analysis of Type 1 diabetes in Finland 1965–1996

The genetic model described in Methods was fitted to the data consisting of all new Type 1 diabetes cases in Finland during the period from 1965 to 1996. In doing so we postulated that there can be transmission distortion at one susceptibility region for Type 1 diabetes. A potential example is HLA-DR4, which is the best-known genetic marker and has been suggested to be inherited in a non-Mendelian fashion [26, 27]. Thus, one can consider the DR4 allele as "allele A", and all other DR alleles lumped together as "allele a". We further assume that the incidence increases with age. The allele frequency of DR4 in the present Finnish population has been estimated to be 0.18 [42]; accordingly, we chose 0.2 as the starting value (in the 1930s) of the allele frequency.

Two models were fitted: one with the transmission probability fixed at 0.5 (M1) and another in which the transmission probability was estimated from the incidence data (M2). The estimated parameter values and the value of the deviance (-2 × the log-likelihood) are given in Table 2. When the two models were compared using the likelihood ratio test (χ² = 231.62, 1 df, p < 0.001), model M2 fitted better. The point estimate of the transmission probability τ was 0.998, and the genotype frequencies at the starting point (year 1930) were (0.02, 0.22, 0.76). The observed incidence of Type 1 diabetes and the incidence fitted under models M1 and M2 are plotted in Figure 6.
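As a quick check of the reported test, the p-value for a likelihood ratio statistic with one degree of freedom can be computed with the Python standard library alone, using the identity that the upper tail of a chi-square distribution with 1 df equals erfc(sqrt(x/2)). The function name below is hypothetical; only the statistic 231.62 comes from the text.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

lr_statistic = 231.62  # deviance difference between M1 and M2, as reported above
p_value = chi2_sf_1df(lr_statistic)
print(f"p = {p_value:.3g}")
```

The resulting p-value is far below the 0.001 threshold quoted in the text.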

Table 2 Parameter estimates of genotype and age group specific penetrances per 100,000, allele frequency, and transmission distortion under the two models: (M1) transmission probability fixed to 0.5 and (M2) transmission distortion estimated.

Figure 6

The observed incidence of Type 1 diabetes in Finland from 1965 to 1996 and the expected incidence under two models: (M1) no transmission distortion (τ fixed at 0.5) and (M2) allowing transmission distortion (τ set to the estimated value of 0.998).

Discussion

An attempt to explain the observed increase in the incidence of Type 1 diabetes in Finland solely by transmission distortion of the diabetic allele A from a heterozygous parent (A, a) to an offspring led to an estimated transmission probability of 0.998. Such an extreme form of transmission distortion seems biologically and empirically [35] very unlikely, given current knowledge of Type 1 diabetes genes and their effects. Therefore, it is evident that biologically reasonable transmission distortion alone, with penetrances such as those defined, for example, for DR4 carrier and non-carrier genotypes, can explain only a small part of the rapid increase in the incidence of Type 1 diabetes observed in Finland. One could naturally try to answer the question of whether the DR4 allele frequencies have increased over time by obtaining a large random sample from the general population and then estimating allele frequencies in different age groups. Presently no such data are available. Even if they were, the data would be biased if survival depends on HLA type, because the oldest age groups would then have been selected on the basis of their HLA types. Moreover, the observed increase could be explained by realistic non-Mendelian transmission rates only if the relative penetrance differences between susceptibility genotypes were much greater than those known for DR4 today. Therefore, these results emphasize the role of other, probably environmental, factors modifying the disease incidence. Environmental factors could either modify the penetrance of susceptibility gene(s) or, as triggering factors, contribute directly to the incidence. Factors which have changed rapidly during the last few decades should be important in this respect, but none with a well-established association with Type 1 diabetes is known. It has been hypothesized that changes in penetrance might be linked to patterns of childhood immunization, but this has yet to be confirmed [43].

However, it follows from the principles of population genetics that, when there is no selection and inheritance is Mendelian, the allele frequencies, in a large population, will be stable: this is easily shown by using equation (1) and letting τ = 0.5. It is difficult to imagine that there would be actual positive selection associated with diabetes prone genotypes. At best, one would think that insulin treatment for diabetes made them selectively equal to the non-diabetic ones. It should be noted that over longer time periods, very slow changes in allele frequencies are possible as a consequence of changes in the mutation-selection equilibrium. However, this effect is negligible in the course of only a few generations.
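The stability claim can be verified in a few lines. The derivation below is a sketch that assumes only the one-locus model of the Methods (random mating, probability τ of transmitting A from a heterozygous parent). The symbols $g_1, g_2, g_3$ for the parental frequencies of genotypes (A, A), (A, a), (a, a), $q$ for the frequency of allele A, $a$ for the frequency of A among transmitted gametes, and primes for the offspring generation are notation introduced here for illustration.

```latex
\begin{align*}
a    &= g_1 + \tau g_2
     && \text{(frequency of A among transmitted gametes)} \\
g_1' &= a^2, \quad g_2' = 2a(1-a), \quad g_3' = (1-a)^2
     && \text{(random union of gametes)} \\
q'   &= g_1' + \tfrac{1}{2} g_2' = a
      = q + \left(\tau - \tfrac{1}{2}\right) g_2
     && \text{(since } q = g_1 + \tfrac{1}{2} g_2 \text{)}
\end{align*}
```

With τ = 0.5 the last line gives $q' = q$, so the allele frequency does not change from one generation to the next. With Hardy-Weinberg proportions in the parents, $g_2 = 2q(1-q)$, and the change per generation is $(2\tau - 1)\,q(1-q)$.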

Irrespective of the effect on incidence, if there is segregation distortion and it has been acting over generations, it could have contributed to maintaining, or even accumulating diabetic alleles in the population. For example, the DR4 allele is quite abundant in the Finnish population, despite the fact that it confers increased susceptibility to diabetes and some other autoimmune diseases. One could speculate that the effect of the eliminating selection, caused by premature death of many of the susceptible individuals, could have been balanced, or even exceeded, by the effect of segregation distortion. This balancing effect of transmission distortion might just represent, for example, an advantage in the prenatal period.

The model presented here is a single codominant major gene model which allows for the possibility of non-Mendelian transmission. Since Type 1 diabetes is likely to be an oligogenic disease, we acknowledge that this is only a simple approximation made in order to compare the rate of genetic changes to the real increase in incidence observed in Finland. Unfortunately, given the present state of our knowledge about the etiology of Type 1 diabetes, it is difficult to evaluate the significance of alternative hypotheses, such as changes in some environmental "triggering" factors, that could explain the observed trend. However, in a more complex genetic model the effect of non-Mendelian transmission of a single gene would presumably be diluted by other susceptibility genes, and therefore an attempt to explain the observed increase in incidence purely in terms of transmission distortion would be even harder.

Methods

In the construction of the genetic model we make the following assumptions:

1. A single diabetes-associated susceptibility factor (allele(s) or haplotype(s)) is assumed to show transmission distortion. The allele(s)/haplotype(s) showing the transmission distortion and conferring increased susceptibility to Type 1 diabetes is denoted by 'A'; other alleles/haplotypes are simply collapsed to 'a'.

2. τ denotes the probability of inheriting A from a heterozygous (A, a) parent. If inheritance is Mendelian, τ = 0.5. In model M2 segregation distortion is assumed to have been acting at approximately the same rate through (some) generations; the time period of interest ranges from the 1930s to the present. This is because insulin replacement therapy, which stopped Type 1 diabetes from being a fatal disease, was introduced in the early 1930s.

3. The evolutionary forces of mutation, drift and migration are omitted from the model because of the short time period of interest (approx. 70 years), large population size (3–5 million), and very low immigration rate.

4. The mating is random with respect to the susceptibility factor of interest.

5. Excluding the possible transmission distortion effect, the formation of zygotes is random, and there is no homozygosity deficiency (which, on the contrary, is known to exist in HLA in certain populations, as shown in [39, 40]).

6. For simplicity, penetrances of susceptibility alleles are assumed to be constant through calendar time. Thus, all individuals with a certain genotype and in some specified age class have the same probability of acquiring Type 1 diabetes throughout the calendar time period considered. The probability of Type 1 diabetes varies between age classes, and thus we adjust for age effects in the search for transmission effects of susceptibility alleles. We note that joint estimation of age-specific penetrances of a latent susceptibility gene and environmental effects, given that we use only the age- and year-specific numbers of new Type 1 diabetes cases and the population at risk, is beyond the scope of this study.

In the following, k is used to denote the genotype (k = 1, 2, 3 for genotypes (A, A), (A, a) and (a, a), respectively), and b the birth cohort. The genetic model was constructed as follows. The genotype frequencies are assumed to be in Hardy-Weinberg equilibrium in the first generation. Let $g_k^{(t)}$ denote the frequency of genotype k in generation t, and let $q^{(t)}$ and $1 - q^{(t)}$ denote the allele frequencies of 'A' and 'a' in generation t. All mating types, their frequencies (based on the assumption of random mating), and the genotype frequencies in the offspring are given in Table 1.

Table 1 Mating types, mating frequencies and offspring genotype probabilities after one generation of random mating with transmission probability τ.

In the standard fashion, from Table 1, the expected new genotype frequencies in generation t + 1 are

$$
\begin{aligned}
g_1^{(t+1)} &= \left(g_1^{(t)} + \tau g_2^{(t)}\right)^2, \\
g_2^{(t+1)} &= 2\left(g_1^{(t)} + \tau g_2^{(t)}\right)\left(g_3^{(t)} + (1-\tau)\, g_2^{(t)}\right), \\
g_3^{(t+1)} &= \left(g_3^{(t)} + (1-\tau)\, g_2^{(t)}\right)^2. \qquad (1)
\end{aligned}
$$

These are the expected genotype frequencies in the offspring of parents with corresponding genotype frequencies $g_1^{(t)}$, $g_2^{(t)}$ and $g_3^{(t)}$. As the incidence of Type 1 diabetes in our data depends on the genetic susceptibility in children aged 14 years or under, the genetic change should be calculated individually for every birth cohort. However, because we are mainly interested in finding out the approximate magnitude of the effect, not its exact value, we simply use the genotype frequencies given by the non-overlapping generation model. They are now treated as genotype frequencies of the distinct annual birth cohorts, with the interval between two consecutive generations chosen to be 25 years. In order to obtain the genotype frequencies for annual birth cohorts between these generations, a linear approximation of equation (1) was used, giving

$$
g_{bk} = g_k^{(t)} + \frac{b - b_t}{25}\left(g_k^{(t+1)} - g_k^{(t)}\right), \qquad b_t \le b < b_t + 25, \qquad (2)
$$

where $g_{bk}$ is the genotype frequency in the birth cohort born in year b and $b_t$ is the birth year assigned to generation t. The rate of change of the allele and genotype frequencies depends on the deviation of τ from the Mendelian expectation, 0.5. The incidence is now a function of the birth cohort genotype frequencies and the penetrance parameters.
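The construction described above (one generation of random mating with transmission probability τ, followed by linear interpolation between generations 25 years apart) can be sketched in Python as follows. This is an illustrative reading of the model rather than the authors' code: it assumes that the first generation is placed at 1930 and is in Hardy-Weinberg proportions, and all function and parameter names are hypothetical.

```python
def next_generation(g, tau):
    """Offspring genotype frequencies (AA, Aa, aa) after one generation of random mating."""
    g_AA, g_Aa, _ = g
    a = g_AA + tau * g_Aa  # frequency of 'A' among transmitted gametes
    return a * a, 2.0 * a * (1.0 - a), (1.0 - a) ** 2

def allele_freq(g):
    """Frequency of allele 'A' implied by genotype frequencies (AA, Aa, aa)."""
    return g[0] + 0.5 * g[1]

def cohort_genotype_freqs(birth_year, q0, tau, start_year=1930, gen_len=25):
    """Genotype frequencies of an annual birth cohort, interpolated linearly between generations."""
    if birth_year < start_year:
        raise ValueError("birth_year must not precede start_year")
    g = (q0 * q0, 2.0 * q0 * (1.0 - q0), (1.0 - q0) ** 2)  # Hardy-Weinberg start
    n_gen, offset = divmod(birth_year - start_year, gen_len)
    for _ in range(n_gen):
        g = next_generation(g, tau)
    g_next = next_generation(g, tau)
    w = offset / gen_len  # position of the cohort between two generations
    return tuple((1.0 - w) * x + w * y for x, y in zip(g, g_next))

# Allele frequency in the 1980 birth cohort for the three values of tau in Figure 1
for tau in (0.52, 0.55, 0.6):
    q_1980 = allele_freq(cohort_genotype_freqs(1980, q0=0.15, tau=tau))
    print(tau, round(q_1980, 4))
```

With τ = 0.5 the function returns the starting allele frequency for every birth year, which matches the Mendelian expectation of no change.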

The following notation was used:

i = calendar year

j = age

b = i - j = year of birth (of a birth cohort)

$N_b$ = size of the birth cohort obtained from the national population registry (constant)

$d_{ij}$ = number of new cases of Type 1 diabetes in year i among those aged j years

$N_{ijk}$ = number of genotype k carriers in year i in age class j

$g_{bk}$ = $g_{(i-j)k}$ = the frequency of genotype k in the cohort born in year b

$\lambda_{ijk}$ = penetrance for genotype k in year i at age j

q0 = frequency of allele A in the year in which insulin treatment was introduced (1930)

The observed data consist of {$d_{ij}$, $N_{ij}$; i = 65,...,96, j = 0,...,14}. However, here we assume for simplicity a constant population size and therefore use only the numbers of incident cases in the analysis. In order to reduce the number of parameters to be estimated, we suppose that $\lambda_{ijk}$ does not depend on i, i.e. $\lambda_{ijk} = \lambda_{jk}$, and further that $\lambda_{jk}$ is constant in each of the age groups 0–4.99, 5–9.99 and 10–14.99 years. We index these three age groups by j = 1, 2, 3 and similarly the three genotypes ((A, A), (A, a), (a, a)) by k = 1, 2, 3. Since Type 1 diabetes is a rare disease and we assume that the numbers of new cases in each (i, j, k) cell are mutually independent, it is natural to have $d_{ij} \sim \mathrm{Poisson}(\mu_{ij})$, where $\mu_{ij} = \sum_{k=1}^{3} \lambda_{jk} N_{ijk}$. We express the likelihood of the data P($d_{ij}$; i = 65,...,96, j = 1,2,3 | θ) in the logarithmic form: $\log L(\theta) = \sum_{i}\sum_{j} \left[ d_{ij} \log \mu_{ij} - \mu_{ij} - \log(d_{ij}!) \right]$, where θ = ($\lambda_{jk}$ (j, k = 1,2,3), τ, q0). The following natural constraints on the penetrance parameters $\lambda_{jk}$ are then imposed: for every j = 1, 2, 3 we assume that $\lambda_{j1} \ge \lambda_{j2} \ge \lambda_{j3}$, and similarly for every k = 1, 2, 3 that $\lambda_{1k} \le \lambda_{2k} \le \lambda_{3k}$. The log-likelihood function was maximized using the NLPNRA routine of the SAS/IML software, which performs maximization of restricted non-linear functions by the Newton-Raphson method [41].
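The Poisson log-likelihood and the order constraints on the penetrances can be written out compactly. The sketch below is illustrative only and is not the SAS/IML code used in the analysis: it assumes penetrances expressed per 100,000 per year, so expected counts are divided by 100,000, and the function names and the numbers in the usage lines are hypothetical.

```python
import math

def expected_cases(lam_j, n_ij):
    """Expected count in one (year, age group) cell: sum over genotypes of penetrance times carriers."""
    return sum(lam * n for lam, n in zip(lam_j, n_ij)) / 100_000.0

def poisson_loglik(d, mu):
    """Poisson log-likelihood summed over cells: d*log(mu) - mu - log(d!)."""
    return sum(dij * math.log(m) - m - math.lgamma(dij + 1) for dij, m in zip(d, mu))

def penetrances_ordered(lam):
    """Check the constraints on lam[j][k]: non-increasing in genotype k, non-decreasing in age group j."""
    by_genotype = all(row[0] >= row[1] >= row[2] for row in lam)
    by_age = all(lam[0][k] <= lam[1][k] <= lam[2][k] for k in range(3))
    return by_genotype and by_age

# Hypothetical cell: 2,000 AA, 20,000 Aa and 78,000 aa carriers in one age group
mu = expected_cases((160.0, 40.0, 10.0), (2_000, 20_000, 78_000))
print(mu, poisson_loglik([17], [mu]))
```

A general-purpose optimizer would maximize `poisson_loglik` over the penetrances, τ and q0 subject to `penetrances_ordered`; the text reports that Newton-Raphson maximization in SAS/IML was used for this step.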