Legumes are second only to grasses in agricultural importance [1]. They are a mainstay of sustainable agricultural systems because of their ability to reduce atmospheric nitrogen (N2) to ammonia via a symbiosis with bacteria called rhizobia. This provides legumes and subsequent crops with a free and renewable source of nitrogen in lieu of expensive, environmentally-unfriendly fertilizers. Development and differentiation of root nodules, the organ that accommodates nitrogen-fixing rhizobia in legumes, is orchestrated by transcription factors [29]. Transcription factors are DNA-binding proteins that regulate the transcription of most, if not all genes [10]. As a result, TFs play central roles in all aspects of plant biology, including development and differentiation of organs and adaptive responses to changes in the environment [11]. Transcription factors as a whole are an important target of plant research because they are a key to understanding the regulation of important plant processes as well as potential tools to optimize these processes for agriculture.

The importance of TFs in plant biology is reflected by the fact that approximately 5% of all plant genes encode such proteins [10]. Thus, even species with relatively small genomes, such as Arabidopsis thaliana contain thousands of TF genes [10]. This presents a real challenge for systematic approaches to decipher the function of TF genes in plants. Classical, 'forward' genetics has uncovered the roles of perhaps a hundred TF genes in Arabidopsis [12] and far fewer in other species [11]. Reverse-genetic approaches, using T-DNA insertion mutants for instance [13], provide a means to decipher in a systematic and relatively rapid manner the function of TF genes/proteins, although gene-redundancy often stymies this enterprise [12]. Another stumbling-block is that phenotypes associated with non-redundant TFs may be subtle in nature.

Transcript profiling can help to uncover the functions of TF genes/proteins by revealing where and when in a plant TF genes are expressed. This information can help direct our attention to particular organs, developmental stages, or conditions under which aberrant phenotypes might become apparent in a TF mutant of interest.

Medicago truncatula is a model legume species that is currently the focus of an international genome sequencing effort [14]. Several generations of cDNA [15] and oligonucleotide arrays [16] have been developed for transcriptome analysis of Medicago truncatula, including most-recently an Affymetrix GeneChip that contains 51,000 probe-sets representing a large proportion of all the genes in this species [17]. While these tools now provide a means to measure the transcriptional output of a large proportion of genes in Medicago, inherent limitations in the sensitivity of hybridization-based technologies [18] mean that transcripts of a substantial number of genes cannot be detected even when probes for these transcripts are present on the array/chip. Furthermore, expansion of arrays to encompass novel genes uncovered by genome sequencing is not a trivial task. An alternative to arrays that is 2–3 orders of magnitude more sensitive and more flexible in terms of expansion to encompass novel genes is quantitative reverse transcription-polymerase chain reaction (qRT-PCR). Platforms for qRT-PCR analysis of thousands of Arabidopsis and rice TF genes have been developed by us and others [19, 20], and utilized to identify TF genes involved in Arabidopsis responses to nutrient stress and pathogen attack [2124]. Here we describe a bioinformatics pipeline to identify putative TF genes in Medicago truncatula and to design gene-specific oligonucleotide primers for qRT-PCR analysis of all predicted TF transcripts. Over 1000 TF primer pairs were tested and used to identify sets of organ-enhanced TF genes that may play important roles in organ development or differentiation in this species.

Results and Discussion

Identification of putative transcription factors

TF protein families are generally defined by the type(s) of DNA-binding domain they contain and putative TF genes are often identified on the basis of DNA sequences that encode known DNA-binding domains [10, 11, 25, 26]. We utilized this approach to identify putative TFs of Medicago amongst the set of proteins predicted from genomic sequence by the International Medicago Genome Annotation Group (IMGAG). Proteins of IMGAG release 1, which contained over 40,000 predicted proteins, were screened for the presence of known or presumed DNA-binding domains (Table 1), using INTERPRO [27]. Medicago proteins containing putative DNA binding domains and other domains associated with TFs were then used as query sequences in WU-BLASTX [28] which included searches of both the non-redundant DNA database of NCBI [29] and the well-curated protein database, UniProt [30] to check annotations of related proteins in support of tentative Medicago TF assignments. This process resulted in a list of 1045 putative TF genes (see Additional file 1). We utilized genomic sequences rather than the large collection of partial cDNA sequences present in Expressed Sequence Tag (EST) databases for Medicago as the starting point for TF gene discovery because protein sequences derived from genomic sequence are more complete and the set of IMGAG proteins essentially contains no redundancy. Although identification of the 'complete' set of Medicago TFs from IMGAG-annotated proteins will only be possible upon completion of genome sequencing, we expect little or no redundancy in the protein set targeted by our primer set. This approach avoids wasting money on redundant primer sets and re-organization of primers when redundancy is detected, both of which would have been inevitable if we chose to use ESTs in addition to genomic sequences to identify Medicago TFs.

Table 1 Classification of putative transcription factors of Medicago into families and sub-families

PCR primer design

To ensure maximum specificity and efficiency during PCR amplification of TF cDNA under a standard set of reaction conditions, a stringent set of criteria was used for primer design. This included predicted melting temperatures (Tm) of 58°C to 61°C, limited self-complementarity and poly-X, and PCR amplicon lengths of 100–150 base pairs (bp). Secondary hits were minimized by aligning primer candidates to all known Medicago sequences via WU-BLAST [28] and eliminating primer pairs with multiple potential hits.

PCR primer testing: gene-specificity and amplification efficiency

PCR primers were tested on Medicago cDNA free of genomic DNA contamination as follows. First, total RNA was extracted from various organs using Trizol reagent (Invitrogen GmbH, Karlsruhe, Germany), which yielded high quality RNA as judged by gel electrophoresis and by Agilent 2100 BioAnalyser using RNA 6000 Nano Chips (Agilient Technologies, Waldbronn, Germany). Typical RNA yields ranged from 0.5–1.0 μg RNA/mg fresh mass for nodules and leaves, respectively. Isolated RNA was treated with DNAse I (Ambion, product number 1907) to remove all contaminating genomic DNA, and this was always confirmed by PCR using primers to non-coding regions of the Ubiquitin gene (TC102473; AC137828-19.4). After inactivation of DNAse I, RNA was reverse transcribed using SuperScript III reverse transcriptase (Invitrogen GmbH, Karlsruhe, Germany) and oligo-dT12–18 to prime the reaction.

Specificity of PCR primers was assessed in three ways: by melting curve analysis of PCR reaction products; by separating the products of all reactions via electrophoresis in 3% agarose gels; and by sequencing a sub-set of PCR reaction products (Figure 1). 94.5% (998/1045) of primer pairs gave unique PCR products of the expected size. Only 3.3% (34/1045) of primer pairs yielded no product and 2.2% (23/1045) gave non-specific products. (see Additional file 1). Sequencing was performed on 178 randomly-chosen PCR products amplified from a 1:1 mixture of leaf and root cDNA. In the vast majority of cases (92.7% or 165/178), the sequence of the PCR product was identical to that of the intended target gene. In 5.1% of cases, the amplicon sequence matched multiple related genes, including the target gene, while in only 2.2% of cases the amplicon sequence did not match the target gene sequence.

Figure 1
figure 1

Specificity of transcription factor PCR primers. Specificity was confirmed by dissociation curves with a single peak (A) while double peaks (B) indicated off-target ampification. The derivative of fluorescence intensity is shown on the y-axis. Separation of PCR products on 3% (v/w) agarose gels following electrophoresis (C) confirmed the presence of unique amplicons of the expected size for most reactions. Few reactions yielded no products (indicated by arrow).

Ideally, PCR results in an exact doubling of the amount of dsDNA after each temperature cycle. In practice, however, this is generally not the case because the reactions are less than 100% efficient. Primer sequences can affect PCR efficiency, so we determined the efficiency of each TF primer pair from amplification plots, using LinRegPCR software [31]. First, the correlation coefficient derived from linear regression analysis of each amplification plot (e.g. see Figure 2) was used to assess the 'quality' of each reaction, and all reactions with an R2 < 0.990 were excluded from further analysis (10.6% of reactions). Next, average PCR efficiencies (E) were computed for each individual primer pair across all analyzed samples. 53.4% (558 TF genes) displayed PCR efficiencies greater than 0.80, while 39.7% (415 TF genes) had efficiencies between 0.51–0.80. Only 2.6% (27 TF genes) had mean E values below 0.4; these all yielded R2 < 0.99 in LinRegPCR analysis and mostly represented reactions that lacked detectable fragment amplification (CT > 40) or that generated unspecific PCR products (Figure 2; see Additional file 1). A similar range of PCR efficiencies were determined for Arabidopsis and rice TF primers previously [19, 20].

Figure 2
figure 2

Amplification efficiency of transcription factor-specific primer pairs. Typical real-time RT-PCR amplification plots of 384 TF genes (left) and distribution of PCR efficiencies for all 1045 TF primer pairs (right).

Selection of reference genes

Reference genes with stable expression/transcript levels throughout development and in the face of environmental challenge are crucial for the normalization of expression data of other genes. Potentially useful reference genes were chosen based on published data for Medicago (e.g. Msc27 [32]) and Arabidopsis thaliana [33]. The closest Medicago homologues of Arabidopsis genes were identified by BlastN [29]. Gene-specific primer pairs for the genes encoding elongation factor 1α (EST317575), glyceralaldehyde-3-phosphate dehydrogenase (MtC00030_GC; CT573421_3.4), β-tubulin (TC106341), Pentatricopeptide repeat protein (TC96273), actin2 (TC107326; AC137836_27.5), Ubiquitin (TC102473; AC137828-19.4), Helicase (CB892427), and the genes PDF2 (TC107161), UPL7 (TC111218), PTB (TC111751), UBC (AW686873), bHLH (CX538576), and UBC9 (TC106312) were designed using the criteria described above (Table 2). The specificity of PCR primers was tested using 18 first-strand cDNAs from six different organs of Medicago (three biological replicates each). All primer pairs produced a single PCR product of the expected size, as shown by gel electrophoresis and unique dissociation curves generated by the PCR machine after 40 cycles (Figure 3). To determine which reference genes were best suited for transcript normalisation, we used the software geNORM [34], which uses pair-wise comparison and geometric averaging across a matrix of biological samples to determine gene expression stability (M; [35]). The genes PDF2, PPRep, Ubiquitin, and PTB h ad the lowest M (greatest transcript stability) and, therefore, were judged to be the best reference genes for this diverse set of developmental samples (Figure 4; Table 2).

Table 2 Medicago reference genes and primers for qRT-PCR
Figure 3
figure 3

Specificity, efficiency, and reproducibility of PCR primers designed to amplify reference gene transcripts. Specificity of primers was confirmed by the presence of unique amplicons of the expected size following electrophoresis on 3% (v/w) agarose gels (A) and by dissociation curves with a single peak (B to D). Typical real-time RT-PCR amplification plots of three reference gene transcripts (E to G).

Figure 4
figure 4

Ranking of 8 reference genes in M. truncatula. Transcript levels of all 8 genes were measured by qRT-PCR, using 18 independent cDNA preparations from six different organs with three replicate measurements of each cDNA preparation. A low value for the average expression stability M, as calculated by geNORM software, indicates more stable expression throughout the various organs.

Identification of organ-enhanced TFs of Medicago

To get an overview of TF gene expression in Medicago truncatula and to identify TFs induced in specific organs, we used the real-time RT-PCR platform described above. Transcript profiling was performed on six different organs of Medicago (leaves, stems, flowers, pods, roots, and nodules) with three independent biological replicates for each (see Additional file 2). The fraction of genes for which transcripts were detected within 40 cycles ranged from 77.2% in leaves to 90.8% in pods. Transcripts from nearly all putative TF genes (96.8% or 1011/1045) were detected in at least one organ. Genes were called detected if they were expressed in at least two biological replicates with a CT < 40. Approximately half of all TF genes exhibited differential expression during plant development, based on significant differences (p ≤ 0.05) in transcript levels between organs. Few TF genes (1.19% or 12/1011) were expressed exclusively in vegetative organs (leaves, stems, roots or nodules), and even fewer (0.5% or 5/1011) were expressed only in reproductive organs (flowers or pods). A relatively small number of TF genes exhibited greater than ten-fold ratios in expression level in one organ compared to any other organ (Table 3). For comparison, we have included gene expression ratios derived from Affymetrix array data from the same RNA samples. While there is reasonable qualitative agreement between gene expression ratios obtained using the two methods, the lack of quantitative agreement is likely due to the limited sensitivity and low signal to noise ratio near the detection limit of Affymetrix arrays [19]. The genes listed in Table 3 may control development and/or differentiation in Medicago and are interesting targets for future research.

Table 3 Organ-enhanced TF genes


We have established a flexible platform for high-throughput qRT-PCR analysis of Medicago TF gene expression that is based on gene-specific primers arrayed in 384-well plates and SYBR® Green detection of gene-specific PCR amplicons. Currently, the platform has primer pairs for 1045 TF genes and we have plans to extend this to all predicted Medicago TF genes as genome sequencing progresses. At this stage, the resource has been utilized by several groups in Europe, Australia, and the USA, and we expect it will become the 'gold-standard' for TF transcript profiling in Medicago truncatula.


Plant material and growth conditions

Medicago truncatula cv. Jemalong, line A17 wild type plants were vernalized for 3 days in the dark at 4°C on sterile, wet filter paper. Germinated seedlings were transferred to pots containing Turface (BWI Texarcana, Texarcana, TX). Plants were grown in growth chambers under a 16 h day and 8 h night regime, at 200 μE light intensity, 24°C and 40% relative humidity.

Vegetative organs (leaves, stems, roots, and nodules) were harvested 28 days after planting. Leaf material did not include petioles and stems did not include buds. Roots consisted of the entire root system with laterals. Several plants grown at the same time were pooled for each of the three biological replicates. Biological replicates were planted on separate days. Nodules were harvested from plants inoculated with Sinorhizobium meliloti strain 1021 one and seven days after sowing. Reproductive organs were harvested from plants that were vernalized for two weeks to decrease the time to flowering. Flowers were harvested on the day of opening. Pods were harvested from 1 to 21 days after the appearance of the floral bud to cover a wide range of developmental stages. Harvested plant material was frozen in liquid nitrogen before storage at -80°C.

RNA isolation and cDNA synthesis

Total RNA was extracted using Trizol reagent [36], following the manufacturer's instructions (Invitrogen GmbH, Karlsruhe, Germany). RNA was quantified using a Nanodrop Spectrophotometer ND-100 (NanoDrop Technologies, Wilington, DE). Sixty μg of total RNA were digested with RNase free DNase1 (Ambion Inc., Houston, TX), according to manufacturer's protocol. RNA integrity was checked using an Agilent 2100 BioAnalyser and RNA 6000 Nano Chips (Agilient Technologies, Waldbronn, Germany), and by electrophoresis on a 3% (v/w) agarose gel before and after DNase I treatment. The absence of contaminating genomic DNA after DNase I treatment was verified by PCR analysis, using primer pairs designed to amplify a 107 bp genomic fragment of the control gene, Ubiquitin (TC102473intronF, 5'-GTCCTCTAAGGTTTAATGAACCGG-3'; TC102473intronR, 5'-GAAAGACACAGCCAAGTTGCAC-3').

First-strand complementary DNA was synthesized by priming with oligo-dT12–18 (Qiagen, Hilden, Germany), using SuperScript III reverse transcriptase (Invitrogen GmbH, Karlsruhe, Germany) following the instructions of the provider. To assess cDNA synthesis efficiency, qPCR was used to amplify segments in the 5' and 3' regions of Ubiquitin cDNA approx. 1600 and 400 bp from the 3'-end, respectively (primers: TC102473_5'F, 5'-TTGGAGACGGATTCCATTGCT-3'; TC102473_5'R, 5'-GCCAATTCCTTCCCTTCGAA-3; TC102473_3'F, 5'-GGCCCTAGAACATTTCCTGTGG-3'; and TC102473_3'R, 5'-TTGGCAACCAAAATGTTCCC-3'). If ΔCt (Ct3'-Ct5') < 2, then cDNA synthesis efficiency was judged to be satisfactory, and the cDNA was considered suitable for qRT-PCR analysis.

PCR primer design

The primer design pipeline was implemented in object-oriented PERL modules supported by a MySQL relational database. Primers iterated through three phases before approval: design, specificity, and selection.

The design phase interrogated TF genes with a sliding window 250 bp across that stepped 50 bp along the entire target sequence, generating primer candidates at each window. Experimental conditions, as outlined in the Results section above, were enforced by the following MIT Primer3 parameters: PRIMER_MIN_TM 58, PRIMER_OPT_TM 60, PRIMER_MAX_TM 61, PRIMER_SELF_ANY 6, PRIMER_SELF_END 2, PRIMER_MAX_POLY_X 3, and PRIMER_PRODUCT_SIZE_RANGE '100–150' [37]. The specificity phase aligned primer candidates via WU-Blast to a database of all known Medicago sequences. The selection phase sorted primer candidates by the number of possible secondary hits, self-complementarity, and poly-X characteristics. Secondary hits were defined as specificity alignments that contained at least one of the terminal ends of the primer and achieved 80% or greater identity over the length of the primer. The sequences of each primer pair are given in Supplementary Material (see Additional file 1).

Real-time PCR conditions and analysis

PCR reactions were carried out in an ABI PRISM® 7900 HT Sequence Detection System (Applied Biosystems, Foster City, CA, USA). SYBR® Green was used to quantify dsDNA synthesis. Reactions (5 μl total volume) were performed in an optical 384-well plate containing 2.5 μl 2 × SYBR® Green Power Master Mix reagent (Applied Biosystems, Warringen, UK), 5 ng cDNA and 200 nM of each gene-specific primer. Primer pairs were aliquoted using a pipetting robot (Evolution P3 liquid handling system, Perkin Elmer, MA, USA) to minimize pipetting errors. cDNA was aliquoted as a master mix of cDNA and 2 × SYBR® Green reagent, using an electronic Eppendorf multipipette. Reaction plates were sealed with a transparent adhesive cover before proceeding (Applied Biosystems, Foster City, CA, USA). All templates were amplified using the following standard PCR protocol: 50°C for 2 min; 95°C for 10 min; 40 cycles of 95°C for 15 sec and 60°C for 1 min, and SYBR® Green fluorescence was measured continuously. Melting curves were generated after 40 cycles by heating the sample up to 95°C for 15 sec followed by cooling down to 60°C for 15 s and heating the samples to 95°C for 15 sec.

Data analysis was performed with the SDS 2.2.1 software (Applied Biosystems). To determine the threshold cycle value (CT) for each PCR reaction, the threshold (ΔRn) was set within the logarithmic amplification phase. All amplification plots were analyzed with an ΔRn of 0.2. PCR efficiency (E) was estimated using LinReg software with data obtained from the exponential phase of each individual amplification plot and the equation (1+E) = 10slope [31]. To compare data from different PCR runs and different cDNA samples, CT values were normalized against the geometric mean of four reference genes (Ubquitin, PPRep, PDF2, and PTB), whose transcript levels were most stable across the biological samples analyzed. The average of the geometric mean of these four genes for all 18 samples was CT 21.23 ± SD1.15. For normalization, the mean reference gene CT value was substracted from the CT value of the TF gene of interest, yielding a ΔCT value. The expression ratios for the identification of organ-enhanced genes were obtained using the following formula on all 30 organ combinations: ( 1 + E ) Δ Δ C T , where ΔΔCT was calculated by ΔCTA minus ΔCTb, A and B are averages of three biological replicates of the two organs being compared, and E is the PCR efficiency. Dissociation curves were analysed using SDS 2.2.1 software (Applied Biosystems). RT-PCR products were resolved on 3% (w/v) agarose gels (LE Agarose, Biozym, Oldendorf, Germany) run at 4 V cm-1 in TAE Tris-Acetate-EDTA buffer, along with a 200-bp DNA-standard ladder (Promega GmbH). A subset of 178 RT-PCR products was sequenced at the JC Venter Institute (Rockville, MD, USA).