Over the past few years, understanding how genetic variation in individuals and in populations contributes to the biological pathways involved in determining human traits and mechanisms of disease has become a reachable goal for genetic research. Following on from the achievements in molecular studies of monogenic disorders, recent studies have used strategies of hypothesis-free fine mapping of genes and loci to identify underlying factors in common complex diseases with major impacts on public health. These diseases, which include cancers, coronary heart disease, schizophrenia, autism and multiple sclerosis, arise from complex interactions between environmental factors and variation in several different genes. Until recently, detection of the genes underlying these diseases met with only limited success, but the past two years have witnessed the identification of more than 100 well established loci. These successes mainly involved the collection of very large study cohorts for any individual trait and international collaborations on an unprecedented scale [1].

The detection of genes underlying common complex diseases might not always need large global population samples. Samples of individuals from genetically isolated populations, or 'population isolates', have already proved immensely useful in the identification of rare recessive disease genes. Such genes are only detectable in isolated populations with a limited number of founders, where rare disease alleles are enriched, thus resulting in homozygote individuals affected by the disease. Impressive accomplishments in disease-locus mapping and gene identification using genome-wide scans of only a handful of affected individuals in such populations have been reported, typically based on linkage analyses and homozygosity scanning [2, 3]. It is becoming increasingly apparent that studies locating genes underlying complex phenotypes also benefit from the study of samples from homogeneous populations with a limited number of founders - 'founder populations' (Table 1).

Table 1 Recent genetic studies of complex diseases and traits in special populations

Success stories from population isolates

One of the most impressive examples of the resourceful use of known genealogy, large extended families and vast amounts of medical data in genetic studies is provided by the company deCODE genetics in Iceland, where more than 50% of the adult population have volunteered their medical and genetic information to be used in genetic research [4, 5]. Although the Icelandic population does not represent a population isolate as conventionally defined, genetic drift over generations has reduced the amount of variation within it relative to the rest of Europe [6]. This, among other benefits of a geographically isolated population, has enabled the identification by means of linkage, and more recently by genome-wide association (GWA) studies, of an impressive number of variants contributing to the development of common/complex disease [5]. Among these are gene loci for myocardial infarction and stroke (ALOX5AP and chromosomal region 9p21) [7, 8], type 2 diabetes (TCF7L2 and CDKAL1) [9, 10], atrial fibrillation (4q25) [11] and prostate cancer (2p15 and Xp11.22) [12]. In addition to disease genes, the Icelandic population has revealed genes contributing to a number of complex traits, such as adult stature (several loci, including ZBTB38) [13] as well as skin and hair pigmentation (SLC24A4, KITLG, TYR, OCA2, MC1R and 6p25.3) [14, 15]. The continuing work by deCODE genetics on 50 common diseases is sure to result in a slew of additional gene findings and help to characterize the allelic spectrum of disease-predisposing variants. The wisely designed strategy of fully harvesting the unique population and the combined power of linkage and association has been the basis of the success of genetic research in Iceland.

Another population isolate with proven value in gene mapping is the population of Finland, where genes for 35 monogenic diseases that are more frequent than in other populations have been identified [16]. Features of the Finnish population have also been an advantage in studies of schizophrenia spectrum disorders: a balanced translocation (1;11)(q42.1;q14.3) segregating with schizophrenia was first described in a large Scottish family [17] and evidence for association of the gene DISC1 with the disorder was subsequently obtained in Finnish families with diagnosed schizophrenia [18, 19]. Large pedigrees from the Finnish population were also used successfully in a study of familial combined hyperlipidemia that identified the gene for upstream stimulatory factor 1 (USF1) as a risk factor for this complex disease [20]. This association was subsequently replicated in other populations, and evidence of the functional significance of the gene variants and their association with cardiovascular disease and dyslipidemia at the population level has also been obtained [21-23]. Another excellent example from Finland is a gene conferring susceptibility to asthma (NPSR1), discovered in the Kainuu and North Karelia subpopulations of Eastern Finland, which represent late-settlement regions [24].

The communal lifestyle and genetic isolation of the Hutterites, who live in the northern United States and western Canada, have especially aided studies of asthma and related traits [25]. Recently, the chitinase 3-like 1 gene was identified as an asthma-susceptibility gene in Hutterites, and the finding was subsequently replicated in two population cohorts of European descent [25]. Studies of type 2 diabetes and obesity have drawn on Pima Indians [26], as well as other genetic isolates, such as those of Finland and Sardinia [27, 28]. Genes contributing to neuropsychiatric disorders are sought, and previous gene discoveries are confirmed, in studies of special populations, such as people from Antioquia in Colombia and the Central Valley of Costa Rica [29], Basques from Spain [30], the Micronesian populations of the islands of Kosrae [31] and Palau [32], Bulgarian Gypsies [33], and sub-isolates from Sweden [34] and Israel [35]. Other special populations utilized in recent genetic studies of complex diseases include French Canadians [36], Ashkenazi Jews [37], Mennonites [38], Newfoundlanders [39], sub-isolates from the Netherlands [40] and the Amish [41].

The important observation from all these studies is that the genetic variants identified within isolates and/or exceptional families, in which a common disease seemingly segregates in a near-Mendelian fashion, are not restricted to those settings: they are being replicated in large-scale population samples and are uncovering new pathways behind these disease processes.

Reduced haplotype complexity

The increasing information in public databases on single nucleotide polymorphisms (SNPs) and their haplotype-tagging properties [42-44] as well as advances in genome-wide data collection using advanced technology platforms [45] have facilitated the recent deluge of studies utilizing the genome-wide SNP-association strategy to identify loci influencing disease phenotypes. This GWA approach is essentially 'hypothesis free'. It circumvents the necessity of understanding disease pathogenesis, which has previously guided studies of candidate genes selected for their biological relevance. In a GWA study, a dense set of SNPs totaling up to 1 million across the genome is genotyped using a standard platform and tested for association with a disease or quantitative trait. Successful gene identification by GWA studies, which operate very much under the common-disease, common-variant hypothesis, requires that the susceptibility variant itself, or a variant highly correlated with it, is among the markers typed.

As a result of the International HapMap Project [44], the linkage disequilibrium (LD) patterns of most genomic regions are known and SNP genotyping platforms have been designed to detect a restricted number of haplotype-tagging variants with the hypothesis that they should capture most of the common variation within genomic regions [46, 47]. Ultimately, the LD structure of each study population determines the number of genotyped SNPs needed for complete coverage in a GWA study.
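As an illustration of what 'capturing' a variant through LD means, the pairwise correlation r² between two biallelic SNPs can be computed from phased haplotypes. The following is a minimal sketch only: the function name `ld_r2` and the 0/1 allele coding are assumptions made for this example, and it is not the procedure used by HapMap or by any genotyping platform.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, one per phased chromosome,
    with alleles coded 0/1 at SNP A and SNP B (illustrative coding).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # 1-1 haplotype frequency
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Two SNPs that always travel together: a tag SNP fully captures the other.
perfect = [(1, 1)] * 5 + [(0, 0)] * 5
# Four equally frequent haplotypes: the SNPs carry no information about each other.
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_r2(perfect))
print(ld_r2(independent))
```

An r² close to 1 means that genotyping one SNP is nearly as informative as genotyping the other, which is why fewer tagging SNPs are needed where LD extends over longer distances.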

Several studies have been undertaken to characterize differences in the magnitude and distribution of LD in global populations [48-51]. Even though the density of SNPs required for 100% coverage of the genome in whole-genome genotyping efforts in various global populations remains unknown, on the basis of the size of LD blocks in 'young isolates', populations that were founded or isolated relatively recently (less than 2,000 years ago), it has been concluded that GWA studies in populations such as that of Finland, the Dutch isolate referred to above, Costa Rica, Antioquia, Sardinia or the Ashkenazim require some 30% fewer markers than in more outbred populations, and that the current GWA panels provide excellent genome-wide coverage with a very small number of gaps (Figure 1) [48]. In an isolated population, potentially fewer haplotypes segregate, and the haplotype-tagging SNPs should also be able to detect those haplotypes that carry rarer alleles. In a more outbred population with considerably higher numbers of haplotypes for a given locus, the causative allele is more likely to be located on several haplotypic backgrounds, thereby diluting its signal to an extent that precludes its identification by genetic means. The value of population isolates and their genomic LD patterns may thus be even greater when lower-frequency (less than 5%) variants are considered [52].

Figure 1

Considerable differences in LD map length across populations. The length of the LD map in LD units (as defined in [88]) in 12 different population samples is depicted in order of decreasing map length. AZO, Azores; CAU, outbred European-derived sample; SAF, Afrikaner; NFL, Newfoundland; SAR, province of Nuoro in Sardinia; ASH, Ashkenazi; ERF, a village in southwestern Netherlands; FIP, Finland nationwide; ANT, Antioquia; CR, Central Valley of Costa Rica; FIC, early-settlement Finland; FIK, Finnish sub-isolate of Kuusamo. Adapted from [48].

A drawback of GWA studies carried out in genetic isolates is that the strong LD that initially helped identify the disease locus may in the end hamper efforts to distinguish the biologically relevant variants from functionally insignificant polymorphisms in complete LD with them. Comparing the GWA data across isolates from different populations should help pin down the potential causative variants for functional studies.

Restricted allelic and locus heterogeneity

Extensive allelic and locus heterogeneity, a key feature of common complex diseases, can obscure the association signal within disease-associated genomic regions. This problem is reduced in population isolates. When combined with geographic isolation that prevents the influx of new alleles, genetic drift acts to randomly raise some alleles to fixation and send others to extinction, thus reducing heterogeneity. A representative example of such drift, and of the founder effect, is the enrichment of various recessive diseases in founder populations, such as Ashkenazi Jews [53] and Finns [54], and an exceptionally low prevalence of other diseases in Finns, such as cystic fibrosis or phenylketonuria, which are common in other European populations. In founder populations, these recessive diseases are often characterized by a presence of one founder mutation, whereas numerous mutations in the same genes are identified in the global population [16]. Although allelic heterogeneity is expected to exist behind common diseases even in isolated populations, it is a reasonable expectation that the number of predisposing alleles will be more restricted than in more heterogeneous populations.
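The effect of drift described above can be illustrated with a simple Wright-Fisher simulation, in which each generation's allele copies are sampled at random from the previous generation. This is a minimal sketch under textbook assumptions (constant population size, random mating, no selection, mutation or migration); the function name `wright_fisher` and the parameter values are hypothetical and are not drawn from any of the cited studies.

```python
import random


def wright_fisher(n_diploid, p0, generations, seed=0):
    """Simulate the frequency of one allele under pure genetic drift.

    n_diploid:   number of diploid individuals (2 * n_diploid allele copies)
    p0:          starting allele frequency
    generations: number of generations to simulate
    Returns the list of allele frequencies, starting with p0.
    """
    rng = random.Random(seed)
    copies = 2 * n_diploid
    p = p0
    freqs = [p]
    for _ in range(generations):
        # Binomial sampling: each copy in the next generation carries the
        # allele with probability equal to its current frequency.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        freqs.append(p)
    return freqs


# A small, closed population: with no influx of new alleles, the allele
# drifts until it is either lost (0.0) or fixed (1.0).
small = wright_fisher(n_diploid=10, p0=0.5, generations=2000, seed=1)
print("final frequency in small population:", small[-1])
```

Rerunning the sketch with a much larger `n_diploid` shows the frequency wandering far more slowly, which mirrors why heterogeneity is retained in large outbred populations.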

Furthermore, isolated populations may facilitate studies of the possible joint actions of associated gene loci as well as studies of the population effect of these associated markers, even before the actual causative variant has been identified. This may be possible because, in isolated populations with a high degree of LD, the tagging of a specific allele is more reliable than in heterogeneous populations, in which broader diversity of associated alleles can obscure such analyses.

In contrast to the 'gene-breaking' mutations underlying most monogenic diseases, variants that affect susceptibility to complex diseases are suggested to be ones that leave gene structure untouched and instead affect the dynamics of gene expression. Such variation can be situated in enhancer elements in the vicinity of the phenotype-causing genes or in the promoters of these genes where various transcription factors bind (cis-acting variants). SNPs elsewhere in the genome (trans-acting variants) may affect the phenotype via the function of the protein or RNA that the trans-acting gene encodes. These cis- and trans-acting variants account for much of the variation in gene expression between individuals. A good example of the identification of a trait-associated variant in a strong cis-regulatory element, using LD and samples from a population isolate, was the finding of the DNA variant behind lactose tolerance/intolerance: the variant was initially found among Finns and later confirmed to represent the common Caucasian mutation. This led to the identification of a regulatory DNA region with enrichment of mutations underlying the trait in numerous global populations [55].

Identification of rare variants

Susceptibility to common complex diseases probably involves the contribution of both common variants and rare mutations [56] and the relative significance of each in particular traits and disease phenotypes will have to be determined by large-scale resequencing studies of associated loci in large study samples. Whereas several common variants are likely to explain a substantial fraction of the heritable variation in complex traits, rare variants probably contribute significantly by having greater effects on the phenotype, as proposed for extreme lipid levels [57, 58] (Figure 2). Furthermore, although rare variants are by definition rare by themselves, in a particular population there could exist a myriad of these variants and in combination they might explain a considerable proportion of the variance in a trait of interest [58]. Consequently, in addition to the interrogation of common polymorphisms, the rare variants implicated in many Mendelian diseases along with structural variation in the genome are now studied with increasing interest [59]. Identification of rare high-impact alleles may be of critical importance for our detailed understanding of the biology behind common diseases or traits.

Figure 2

Contribution of rare and common variants to the distribution of a quantitative phenotype. Although common genetic variants explain the majority of the phenotypic variance in the population, the contribution of rare variants with strong effects may be observed at the extreme ends of the phenotypic distribution.

A whole-genome strategy based on common haplotype-tagging SNPs is unlikely to be very successful in detecting rare variants that increase disease susceptibility [60]. The statistical power to detect susceptibility alleles is positively correlated with the frequency and the penetrance of the allele. Even though detection of rare alleles with high penetrance is essentially as feasible as the detection of common alleles with more modest penetrance, it is unclear how well these rare variants are captured with the GWA arrays designed to tag common SNPs. Thus, while genome-wide association studies are likely to continue to identify the 'low-hanging fruit', study of linkage and association in exceptional families as well as in population isolates may be necessary to identify and define those risk alleles (the majority) that, although significant, are lost in the sea of peaks that fail to reach genome-wide significance in GWA studies as a result of their rarity or population-specific effect [60]. The founder effect, genetic bottlenecks and genetic drift have worked to increase the frequency of certain rare alleles in the population isolate, thus improving the power to detect those in genome-wide studies.
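The dependence of power on allele frequency and effect size can be made concrete with the standard additive-model expression for the trait variance attributable to a single variant, 2p(1 - p)β², where p is the allele frequency and β the per-allele effect. The sketch below assumes Hardy-Weinberg proportions and a purely additive effect; the function name `variance_explained` and the frequencies and effect size are hypothetical values chosen for illustration, not estimates from any study.

```python
def variance_explained(p, beta):
    """Trait variance attributable to one biallelic variant.

    Assumes an additive effect `beta` per allele copy and
    Hardy-Weinberg proportions at allele frequency `p`.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return 2.0 * p * (1.0 - p) * beta ** 2


# Hypothetical variant with the same per-allele effect in both settings.
beta = 1.0
outbred = variance_explained(0.005, beta)   # rare in an outbred population
isolate = variance_explained(0.05, beta)    # enriched by a founder effect

print("outbred:", outbred)
print("isolate:", isolate)
print("fold increase:", isolate / outbred)
```

Because the variance contributed by a variant is what an association test ultimately has to detect, enrichment of a rare allele in an isolate translates into a smaller sample size needed for the same power, all else being equal.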

Notably, owing to founder effect and genetic drift, each genetic isolate typically has a unique profile of rare disease alleles [61]. Some rare variants that are readily detected in one population isolate may go unnoticed in others, necessitating the use of multiple isolates to get a picture of the full spectrum of variants with effects on phenotype [62]. Importantly, if the impact of the rare variants on the disease phenotype is really high, measuring them in a clinical setting might turn out to be of critical importance for 'family-specific' or personalized medicine, revealing individuals with the highest genetic risk. The existence of such population- or family-specific alleles is entirely possible - even expected - and personalized medicine just might become more personal than we ever dreamed of.

Population isolates help to minimize the environmental component of disease

In contrast to monogenic diseases, where the genetic composition of an individual often solely determines the disease phenotype, environmental factors are critical risk factors for complex diseases. The incidence and prevalence of many common diseases may vary between founder populations [63], and establishing whether this variation in disease incidence is the result of genetic background or of environmental factors characteristic of the population can be challenging because of complex interactions between genetic risk factors and environmental exposures [63-65]. Natural selection induced by the environment can, for instance, modify allele frequencies and may lead to distinctive disease susceptibilities in different populations [66, 67]. Furthermore, inbreeding in founder populations can increase the incidence of some common diseases, for instance via increased homozygosity of rare variants with large recessive effects [68]. In addition to increasing the incidence of the disease in a given population, environmental factors may have an effect on the severity of the disease phenotype.

Data from model animals suggest that the impact of gene-environment interaction on the phenotype may be considerable [69]. Therefore, accurate determination of phenotype, minimally perturbed by differences in environment, is of great importance for GWA studies - arguably even more so than in linkage studies using family data. Although there is variation in environmental exposures between individuals even in the most homogeneous populations, in population isolates the cultural, environmental and phenotypic homogeneity can facilitate disease-gene identification by reducing variance caused by environmental background. More uniform patterns of, for example, nutrition or exposure to pathogens or homogeneous diagnostic standards, more easily obtained for small populations, provide the best human approximation to controlled experiments in uniform conditions in inbred strains of experimental animals.

The importance of knowing the study population

Population isolates with diverse ethnic backgrounds and different degrees of inbreeding have been described from around the world. Each has its unique characteristics, and may have its own advantages and disadvantages in research into complex diseases (Table 2). Such facts should be considered in study design. Several factors, such as the demographic history of the population, age distribution, number of founders, growth pattern, and degree of genetic and cultural isolation since foundation, determine the features of the genetic landscape of a population isolate [70].

Table 2 Use of isolated versus outbred populations

Relatively young and small founding populations that have experienced population bottleneck events in their history followed by recent expansion in population size should be ideal for initial locus identification using GWA scans. This is because the population history has created a setting in which the genomes are characterized by a high degree of LD and low genetic diversity [48]. Distinguishing the biologically relevant variants at the associated loci would require older isolates with shorter LD intervals. In small, very ancient isolates with limited population growth, such as the Saami of northern Scandinavia, LD is the result of genetic drift, not a founder effect. These old isolates may be very useful for identifying common disease alleles by drift mapping [71]. Population isolates may also contain sub-isolates, which display different LD intervals of disease alleles as well as different mutation frequencies [72]: these sub-isolates may thus be ideal for complex disease gene mapping even when the founder population itself lacks any obvious advantage.

Population isolates have thus earned their place as an indispensable resource for medical genetics through their use in identifying numerous Mendelian disease genes. Their utility is increasingly valued also in complex disease gene mapping. Genetic, environmental and phenotypic homogeneity, good genealogical records, high participation rates in genetic studies, extended LD in the genome, as well as reduced allelic and locus heterogeneity are highly beneficial features for such studies.

Not all genetic isolates are alike: each population has its own advantages and disadvantages for studies of complex diseases, and thus knowing the genetic makeup of the study population is crucial. The choice and design of statistical methods also deserve particular care in studies utilizing population isolates [73] and the study strategy should also differ depending on the allelic architecture of the disease. The global wealth of population isolates with well established history and carefully phenotyped study samples is paving the way to a more comprehensive understanding of complex disease genetics. The scientific community may yet see the resource of population isolates harnessed not only in medical genetics but also in public-health genomics.