Introduction

Genome-wide association studies are becoming an increasingly effective tool for identifying genetic factors contributing to complex traits (Amos 2007). This approach, however, is only applicable to genetic model species with available genome sequences and genome-wide polymorphism data. The majority of species, including several ecological model species, lack these data, and it is unlikely that this situation will change substantially in the near future. Therefore, the candidate gene approach is an appropriate choice when searching for functional or adaptively relevant polymorphisms in genetic non-model organisms (Fitzpatrick et al. 2005; Tabor et al. 2002). In this approach, functional genes identified from studies in genetic model organisms, such as Caenorhabditis elegans, Drosophila melanogaster, Mus musculus or Homo sapiens, serve as ‘candidate genes’ for similar phenotypic traits in other organisms. The structure of many genes and their cellular functions are highly conserved between evolutionarily divergent animal taxa, which makes this approach promising (Fitzpatrick et al. 2005). For example, information about the genetic basis of complex traits obtained from Drosophila has been used as a model for human traits and diseases (Mackay and Anholt 2006).

Once a gene is identified it is even more intriguing to find functional genetic polymorphisms in the non-model species that vary with the trait of interest. Various research strategies to detect functionally important genetic variation in natural populations have been proposed (Vasemagi and Primmer 2005), and several species-specific polymorphisms in candidate genes have been detected (e.g. Abzhanov et al. 2006; Fidler et al. 2007). The latter study replicated previously reported associations between a dopamine receptor D4 variant and human personality in a wild bird species. Interestingly, the avian functional polymorphism—although different in type—was located in the same genomic region (exon) as the mammalian polymorphism (Fidler et al. 2007).

In this paper we describe two different strategies to find naturally occurring polymorphisms in candidate genes, which are likely to have functional consequences for circadian behavioural and physiological rhythms in birds. The circadian clock is perhaps the aspect of animal behaviour most fully characterised at the molecular level (Bell-Pedersen et al. 2005). This knowledge can be utilized in a candidate gene approach to analyse the influence of natural genetic variation on circadian behaviour. Our species of interest is the blue tit Cyanistes caeruleus, which is a common European passerine bird. Free-living populations of blue tits are studied by many research groups across Europe, mostly in the context of population or behavioural ecology. Johnsen et al. (2007) for example reported a polymorphic tandem repeat in the CLOCK gene, one of the core genes involved in generating endogenous rhythms. The authors reported a correlation between the CLOCK repeat length and latitude of different blue tit populations.

Strategy I: search for tandem repeats in exonic regions

Rationale and approach

The first strategy was to look for simple tandem repeat elements in exons of the candidate genes, either in protein-coding regions or in untranslated regions (UTRs). It has been shown that the mutation rate of microsatellites is higher than that of non-repeat sequences (Jeffreys et al. 1988). These mutations are a consequence of different mechanisms, such as unequal crossing-over, gene conversion or replication slippage, that lead to a change in sequence length (reviewed in Nikitina and Nazarenko 2004). Several studies have shown that the number of repeats at mini-/microsatellite loci can influence different aspects of gene function. In particular, increasing length of trinucleotide repeats is associated with various inherited neurodegenerative disorders in humans (e.g. fragile X syndrome, Verkerk et al. 1991; Huntington disease, Mirkin 2007; spinocerebellar ataxias, Orr and Zoghbi 2007; and Friedreich ataxia, Campuzano et al. 1996). Many of these disorders involve repeats that produce polyglutamine tracts in the amino acid sequence, i.e. they directly affect protein structure. Gene expression can also be altered by other mechanisms such as degradation of mRNA, a decrease in protein production, an increase in DNA methylation resulting in the absence of gene expression, repressed transcription through increased nucleosome stability, or gene silencing (Choong et al. 1996; Imagawa et al. 1995; Pieretti et al. 1991; Wang et al. 1994). The relevance of tandem repeats in gene regulation was emphasized by a study showing that trinucleotide repeats in Saccharomyces cerevisiae are more frequent in open reading frames (ORFs) of genes that encode proteins involved in the regulation of transcription than in any other type of ORF (Young et al. 2000). Furthermore, dinucleotide repeats have the ability to form left-handed Z-DNA, which plays a role in the regulation of transcription through altered DNA structure and protein binding affinity (Comings 1998).

In order to find repeating elements with a high probability of showing functional polymorphisms in clock genes, we first queried the National Center for Biotechnology Information (NCBI) database on the web (http://www.ncbi.nlm.nih.gov/) with the following key words: “biological rhythm* or biological timing or circadian rhythm* or central clock and Eukaryota”. Secondly, we queried the University of California Santa Cruz (UCSC) Genome Browser (http://www.genome.ucsc.edu/cgi-bin/hgGateway), searching for “Simple tandem repeats within RefSeq exons of chicken”. The NCBI query resulted in a list of 206 genes that have been reported to have a function related to the endogenous clock. The UCSC query resulted in a list of 438 annotated genes of the chicken (Gallus gallus) that are known to contain simple tandem repeats in exons. The intersection between the two sets of genes contained the following five genes: CLOCK (circadian locomotor output cycles kaput protein), NPAS2 (neuronal PAS domain protein 2), ADCYAP1 (=PACAP, adenylate cyclase activating polypeptide 1), CREB1 (cAMP responsive element binding protein 1, containing two distinct repeats) and CSNK1A1 (casein kinase 1 alpha 1) (see Table 1). CSNK1A1 contains a repeat sequence with a period length of about 1,700 base-pairs. It is therefore unlikely to show a polymorphism and was not considered further.
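The intersection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the two input lists are toy placeholders rather than the actual 206-gene and 438-gene query results, and the exclusion of CSNK1A1 is passed in by hand to mirror the decision described in the text.

```python
def normalise(symbol: str) -> str:
    """Strip whitespace and upper-case a gene symbol so that lists compare cleanly."""
    return symbol.strip().upper()


def shared_candidates(clock_genes, repeat_genes, excluded=()):
    """Return the sorted gene symbols present in both lists, minus manual exclusions."""
    clock = {normalise(g) for g in clock_genes}
    repeats = {normalise(g) for g in repeat_genes}
    drop = {normalise(g) for g in excluded}
    return sorted((clock & repeats) - drop)


# Toy placeholder lists (NOT the real query outputs): the five shared genes
# named in the text plus a few unrelated symbols on either side.
clock_list = ["CLOCK", "NPAS2", "ADCYAP1", "CREB1", "CSNK1A1", "PER2", "CRY1"]
repeat_list = ["clock", "npas2 ", "ADCYAP1", "CREB1", "CSNK1A1", "ACTB", "GAPDH"]

# CSNK1A1 is dropped by hand because of its long (~1,700 bp) repeat period.
candidates = shared_candidates(clock_list, repeat_list, excluded=["CSNK1A1"])
print(candidates)
```

Normalising the symbols before comparison guards against trivial mismatches in case or whitespace between exports from different databases; synonyms such as ADCYAP1 and PACAP would still need to be reconciled manually.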

Table 1 Genes involved in the endogenous clock that contain a repeat element in RefSeq exons of chicken

We then used BLAST to search the chicken mRNA sequence of the exon comprising the repeat against the NCBI databases “nucleotide collection” and “Non-human, non-mouse ESTs”, limiting the search to “Aves”, and against the zebra finch (Taeniopygia guttata) WGS database of the NCBI trace archives (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi). The chicken sequence and an aligned sequence of a second bird species (in these cases sequences of zebra finch or wild turkey Meleagris gallopavo) were used to generate primer oligonucleotides flanking the targeted exon. To design forward and reverse primers for PCR amplification we used the program PrimaClade (http://www.umsl.edu/services/kellogg/primaclade.html), which is suited to designing primers from multiple-species alignments. Primers were between 18 and 26 base-pairs in length and had one to three degenerate positions if necessary (Supplementary Table). After amplifying the targeted sequence of the blue tit genome in a thermocycler (Supplementary Table), we ran the PCR products of each gene from 16 presumably unrelated adult individuals on a 10% polyacrylamide gel. If bands on the gel showed any inter-individual difference due to length variation of the amplified products, we confirmed the presence of a polymorphism by running the fragments on an ABI 3130 sequencer. For this purpose we used fluorescently labelled forward primers in the PCR reactions.

Results

We amplified each of the requested exons of the four candidate genes in the blue tit genome (Table 1). Four of the five simple tandem repeats of interest were also found in the blue tit (Table 2). Furthermore, these repeats were polymorphic: in a sample of 148 presumably unrelated blue tits sampled in 2007 in our study population (Westerholz, 48°08′ N 10°53′ E, Southern Germany) the number of alleles varied between 5 and 7 (Table 2). Sequencing one carrier of each occurring allele confirmed that the detected variation in length of the alleles is caused by different copy numbers of the repeating elements. Genotype proportions were in Hardy–Weinberg equilibrium (Table 2, all P > 0.1). We also tested whether the developed primers for the tandem repeats could be used in another songbird, the blackcap (Sylvia atricapilla). In a sample of 70 individuals from Southern France (43°31′ N 4°43′ E), we could directly genotype all four microsatellites. All markers were in Hardy–Weinberg equilibrium except for NPAS2 which turned out to be monomorphic in this sample. Heterozygosity values ranged between 0.30 and 0.69.
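As an illustration of the summary statistics reported above, the following minimal Python sketch computes observed and expected heterozygosity and a chi-square goodness-of-fit statistic for Hardy–Weinberg proportions at a multi-allelic locus. The text does not state which test was used, so the chi-square approach is an assumption, as are the function name and the toy genotypes (repeat lengths 10 and 11 are invented for the example).

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_summary(genotypes):
    """Summarise one locus; genotypes is a list of (allele, allele) tuples."""
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    observed = Counter(tuple(sorted(g)) for g in genotypes)

    # Sum (O - E)^2 / E over every possible genotype class, seen or unseen.
    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        expected = n * (freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b])
        chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected

    k = len(freqs)
    return {
        "chi2": chi2,
        "df": k * (k - 1) // 2,  # genotype classes minus independent parameters
        "h_obs": sum(1 for a, b in genotypes if a != b) / n,
        "h_exp": 1.0 - sum(p * p for p in freqs.values()),
    }


# Toy data: 100 individuals in exact Hardy-Weinberg proportions (p = q = 0.5).
toy = [(10, 10)] * 25 + [(10, 11)] * 50 + [(11, 11)] * 25
summary = hwe_summary(toy)
print(summary)
```

A P value could be obtained from the chi-square distribution with the returned degrees of freedom, for example via `scipy.stats.chi2.sf(chi2, df)`. With five to seven alleles and around 150 individuals many genotype classes have small expected counts, so an exact test may be the more appropriate choice in practice.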

Table 2 Detected repeats in clock genes of blue tits

Strategy II: search based on reported genotype–phenotype associations

Rationale and approach

The second strategy to find functionally important polymorphisms in genes involved in generating endogenous rhythms was based on reported genotype–phenotype associations in model organisms. First, we searched Web of Science® (http://apps.isiknowledge.com) for articles reporting an association between polymorphisms in clock genes and a relevant phenotype. We used key words such as: “polymorphism AND sleep”, “circadian rhythm AND polymorphism”, “clock gene AND sleep”, “circadian rhythms AND gene”, “polymorphism AND clock” or in general “polymorphism AND bird AND association”.

In total, 24 studies tried to link a naturally occurring allelic variation in a clock gene with a phenotype that is influenced by the endogenous circadian clock (Table 3). Phenotypes investigated in humans were either extreme diurnal preferences classified as morningness or eveningness (Horne and Ostberg 1976), or a variety of sleeping patterns, including various sleep disorders (Table 3). In non-human animals, studies reported the period length of different behavioural rhythms. For further work, we selected those studies that reported a significant association between genotype and phenotype.

Table 3 Reported studies in different species that aimed to link a circadian phenotype with a certain polymorphism in one of the clock genes

Second, we attempted to localize the position of the investigated polymorphism in the homologous bird gene. In the case of single nucleotide polymorphisms (SNPs) located in protein-coding regions, we proceeded as follows. First, we identified the position of the amino acid encoded by the polymorphic codon in the organism as reported in the original paper. We then aligned the amino acid sequence of the entire protein from the reported organism with the chicken genome in the UCSC browser. If we found a good alignment between the protein region of interest and a sequence in the chicken genome, we aligned the complete protein of chicken from the NCBI database entry to the chicken genome in the UCSC browser. In the resulting protein-DNA alignment the intron-exon structure of the chicken gene is shown, and we could determine the exon comprising the potentially polymorphic codon.
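The last step, determining which exon contains the codon of interest, was done visually in the UCSC browser. The underlying coordinate arithmetic can be sketched as follows, assuming that the coding length contributed by each exon is known. The function name and the exon lengths in the example are hypothetical and do not correspond to any gene in Table 3.

```python
def exons_for_residue(residue, exon_cds_lengths):
    """Return the 1-based exon numbers that contain the codon of a protein residue.

    residue          -- 1-based amino-acid position in the protein
    exon_cds_lengths -- coding bases contributed by each exon, in transcript order
    """
    if residue < 1:
        raise ValueError("residue positions are 1-based")
    first = 3 * (residue - 1) + 1  # first coding base of the codon
    last = 3 * residue             # last coding base of the codon
    if last > sum(exon_cds_lengths):
        raise ValueError("residue lies beyond the end of the coding sequence")

    hits = []
    start = 1
    for number, length in enumerate(exon_cds_lengths, start=1):
        end = start + length - 1
        if start <= last and end >= first:  # exon overlaps the codon
            hits.append(number)
        start = end + 1
    return hits


# Hypothetical gene with three coding exons of 100, 50 and 200 bases.
toy_exons = [100, 50, 200]
print(exons_for_residue(10, toy_exons))  # codon lies fully within one exon
print(exons_for_residue(34, toy_exons))  # codon split across an exon boundary
```

A list is returned rather than a single number because a codon can be split by an intron, in which case primers would need to cover both flanking exons.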

For SNPs located in UTRs of a gene and for the tandem repeat in exon 18 of PERIOD 3 (see Table 3), the respective mRNA sequence of the model organism obtained from the NCBI database was directly aligned to the chicken genome. This was done using NCBI BLAST (Altschul et al. 1997; Benson et al. 2008; http://www.ncbi.nlm.nih.gov/blast/Blast.cgi) and UCSC BLAT (Kent 2002; http://www.genome.ucsc.edu/cgi-bin/hgBlat?command=start&org=Chicken&db=galGal3&hgsid=104455251) homology searches. The best alignment was then tested for consensus in the region of the studied polymorphism.

To amplify the sequences of interest in the blue tit we designed PCR primers as described above for the tandem repeat sequences. The goal was to amplify the entire exon containing the homologous position of the polymorphism of interest. Degenerate primers were developed from the alignment between chicken and zebra finch, or, if no zebra finch sequence was available, between chicken and human, Japanese quail (Coturnix japonica) or starling (Sturnus vulgaris) sequences (Supplementary Table). After PCR amplification of genomic DNA from blue tits (Supplementary Table), the PCR products of 10–14 presumably unrelated adult individuals were directly sequenced. Electropherograms were inspected manually, and sequences were then aligned using the program BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html) and screened for sequence variation in either heterozygous or homozygous form.

The ABI Prism® SNaPshot™ Multiplex Kit (Pati et al. 2004) was then used to genotype 149 presumably unrelated adults sampled in our population in 2007 for each detected exonic SNP.

Results

In total, we studied polymorphisms at 18 different sites (17 SNPs and one variable number tandem repeat, VNTR) in 10 genes for which a significant association with a behavioural trait was reported (Table 3). Fourteen of these published polymorphisms were discovered in human genes, the remaining four in Drosophila, mouse (Mus musculus) and Syrian hamster (Mesocricetus auratus). For seven of the SNPs we could identify the exact position in the homologous exon of four different chicken genes. At six sites the amino acid and the nucleotides in the mRNA coding for this amino acid were identical between chicken and the studied organism. At one SNP site (exon 4 of AANAT; Table 3) the encoded amino acid was not identical between chicken and the reported species, but we still found high similarity at the surrounding amino acids. We obtained a specific PCR product from the blue tit for each of these seven target regions. For the marker CKIδ no polymorphisms were detected in the sample of 10 blue tit sequences.

Overall, we discovered seven exonic SNPs in the amplified fragments of blue tit DNA in the genes AANAT (2 SNPs), PERIOD 2 (2 SNPs) and CKIε (3 SNPs). SNP sites were located between 1 and 79 base-pairs away from the position of the reported SNP in the model organism. All detected SNPs in blue tits were silent (synonymous).

For six of these seven SNPs we genotyped 149 individuals and detected two alleles at each locus (Table 4). Genotyping of one SNP in CKIε has failed so far. Genotype frequencies for the six SNPs were not significantly different from those expected under Hardy–Weinberg equilibrium (Table 4, all P > 0.3). We found significant linkage disequilibrium between the two SNPs in the gene PERIOD2 and between the two SNPs in the gene CKIε (D(PERIOD2) = 0.995, p (PERIOD2) = 0.035; D(CKIε) = 0.998, p (CKIε) < 0.0001).
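For readers unfamiliar with linkage disequilibrium measures, the following Python sketch computes the raw coefficient D, the normalised D′ and r² for two biallelic SNPs. It assumes phased haplotype counts, which is a simplification: with unphased genotype data such as ours, haplotype frequencies would first have to be estimated (for example by an EM algorithm). The function name and the counts are hypothetical. Values close to 1, as reported above, would be consistent with the normalised measure D′, but this is an interpretation rather than something stated in the text.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, r2) from counts of the four two-locus haplotypes."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at the first SNP
    p_B = (n_AB + n_aB) / n  # frequency of allele B at the second SNP

    d = p_AB - p_A * p_B
    # D' scales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical counts: one haplotype class is absent, so D' = 1 although r2 < 1.
d, d_prime, r2 = ld_stats(40, 0, 30, 30)
print(round(d, 3), round(d_prime, 3), round(r2, 3))
```

The example illustrates why D′ and r² should be reported together: D′ reaches 1 whenever at least one of the four haplotypes is missing, even if the two SNPs are far from perfectly correlated.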

Table 4 Single nucleotide polymorphisms (SNPs) in three clock genes of blue tits

Discussion

We report high success rates for two strategies to detect potentially functional polymorphisms in candidate genes of interest. The first approach was to search for tandem repeats in exonic regions of genes. All four target regions that contained tandem repeats in chicken genes could be amplified in the homologous blue tit genes. Backström et al. (2008) designed primers at conserved sites between chicken and zebra finch, and found a cross-species primer amplification success of 83% for blue tits (N = 122 markers tested). Difficulties related to cross-species primer design turned out to be of minor importance in both strategies of our study. Only three out of eleven pairs of primers designed from chicken and another bird reference species did not amplify the homologous fragment in blue tits, and this problem could be solved by designing new sets of primers. Furthermore, four of the five studied tandem repeats could be found in the blue tit and they were polymorphic. Thus, gene-associated tandem repeats seem to be highly conserved within the class Aves.

In contrast, it is estimated that only 13–25% of anonymous microsatellites developed for specific passerines co-amplify in other passerine species (Dawson et al. 2006; Hansson et al. 2005; Primmer et al. 1996). However, this relatively low cross-species amplification success may be due to variable primer binding sites, and not to the absence of the tandem repeat itself. In general, tandem repeats have a high potential to be functional and polymorphic (Contente et al. 2002; Iglesias et al. 2004; Nikitina and Nazarenko 2004). Therefore, this group of markers—if available—could be a first choice for testing genotype–phenotype associations.

Our second strategy was to focus on gene regions (particularly exons), for which positive associations with phenotypic traits had already been reported. Our analysis showed a high success rate in finding a polymorphism in the target region. In summary, we were able to detect 1–2 polymorphisms in four blue tit regions that showed high homology to the target regions of human, Drosophila and hamster. Although the detected SNPs in blue tit genes are silent, they might have functional consequences by mechanisms affecting mRNA structure (Duan et al. 2003; Kimchi-Sarfaty et al. 2007; Shen et al. 1999) or pre-mRNA splicing (Cartegni et al. 2002). In total we sequenced about 2,180 base-pairs of genomic blue tit DNA, of which 1,630 base-pairs were exonic and 550 base-pairs intronic. Thus SNPs in exons occurred on average every 233 base-pairs in the single blue tit population we studied. This is more frequent than the estimate reported for collared flycatchers Ficedula albicollis (on average one SNP in every 550 base-pairs of coding sequence; Backström et al. 2008), but it has to be considered that our calculation is based on a rather short DNA sequence of about 2.2 kb.
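The density estimate above follows directly from the numbers given in the Results (seven exonic SNPs in about 1,630 exonic base-pairs). A short Python check, with variable names chosen here only for illustration:

```python
exonic_snps = 7            # exonic SNPs detected in blue tits (Results, Strategy II)
exonic_bp = 1630           # exonic base-pairs sequenced
flycatcher_bp_per_snp = 550  # collared flycatcher estimate (Backström et al. 2008)

bp_per_snp = exonic_bp / exonic_snps            # mean spacing between SNPs
fold_difference = flycatcher_bp_per_snp / bp_per_snp

print(f"one SNP per {bp_per_snp:.0f} bp; "
      f"{fold_difference:.1f}-fold denser than the flycatcher estimate")
```

Because the estimate rests on only seven SNPs, its sampling error is large, which is the reason for the caution expressed above.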

Both search strategies revealed only a small number of polymorphisms that are likely to be functional. This can now be tested by association with behavioural traits of interest. Thus, the advantage of this approach, in contrast to large-scale genotyping, is that it reduces the multiple testing problem and the costs. A prerequisite for these approaches is that the genes underlying phenotypic traits are already known. Many other strategies, for example QTL mapping (Flint and Mott 2001), the candidate gene approach (Tabor et al. 2002) and targeted mutation (knock-out and knock-in technologies, Austin et al. 2004; Rago et al. 2007), have recently been developed to find the genetic basis of phenotypic traits. The approach described here allows us to investigate in a follow-up study whether particular genetic polymorphisms are associated with variability in phenotypes, under the assumption that the function of a gene is conserved between different animal taxa. The candidate-gene strategy is appropriate for the majority of organisms for which no databases of genetic polymorphisms are available. A clear disadvantage of this approach in contrast to whole-genome studies is that relevant polymorphisms might be missed.