Introduction

The Drosophila wing has been used as a model for investigating the molecular mechanisms of organ morphogenesis and, ultimately, the evolution of animal size and shape. Several genes have been identified in Drosophila melanogaster, many of which participate in signaling pathways that regulate processes of tissue patterning, growth, and differentiation during wing development (Lecuit and Le Goff 2007). Expression differences in one such gene (Ultrabithorax) might be sufficient to promote the alternate development of wing or haltere in Drosophila (Weatherbee et al. 1998). And results from recent studies strongly suggest that the qualitative morphological difference between winged and unwinged polyphenic adults in pea aphids and ant species might be controlled through differential expression of genes orthologous to those identified in D. melanogaster (see Brisson et al. 2010). But in the case of quantitative variations in Drosophila wing, which are expected to be controlled by several genes with smaller phenotypic effects (see Matta and Bitner-Mathé 2010), transcriptional differences might be relatively small and difficult to detect.

Accurate estimation of transcript abundance requires a sensitive technique, such as reverse transcription followed by real-time quantitative PCR (qPCR). Nevertheless, gene expression analysis through qPCR has some intrinsic problems. Stochastic errors might be introduced throughout the experiment: from extraction and processing of RNA samples to preparation and efficiency of reaction mixtures (Huggett et al. 2005; Rebrikov and Trofimov 2006). Biological differences, like overall variation in transcriptional activity, might also affect the quantification of gene-specific differences (Vandesompele et al. 2002; Andersen et al. 2004). Such experimental variations can be minimized through normalization strategies during data analysis. A popular choice is the use of stably expressed genes as internal reference controls. The reasoning is that reference (REF) genes are subjected to the same experimental variation as the genes of interest (GOI) but are unaffected by the experimental conditions that are being compared. However, traditional REF genes have been used for internal normalization without any validation in the specific experimental conditions, thus assuming a universal stability for these genes (Guenin et al. 2009).

Systematic selection of stable REF genes is a common procedure in gene expression studies in plants (see Cruz et al. 2009; Artico et al. 2010). Apart from a few recent works (Supplementary material), REF gene selection has been overlooked or not systematically assessed in insect species, even in Drosophila studies. In this study, the major goal was to identify bona fide REF genes for gene expression analyses of quantitative and qualitative morphological differences in D. melanogaster wings. Such genes, or their respective orthologs, might also be included in validation tests as good candidates for finding stable REF genes in different morphological comparisons, using Drosophila or other insect species. In addition, our results show that biased results in qPCR experiments may be avoided in practice through systematic REF gene validation in every experimental condition.

Materials and methods

Fly strains and two-class comparison

Phenotypic variation in adult wings and gene expression levels in imaginal wing disks were analyzed in a two-class comparison using four D. melanogaster strains: nub 2 strain is homozygous for a hypomorphic regulatory mutation of nubbin (nub) gene and was obtained from Bloomington Stock Center (FlyBase ID: FBst0000358); 1C, 1L, and 1R strains were provided by Bitner-Mathé and coworkers. A quantitative comparison was performed between 1R and 1L, which were previously established through a bidirectional artificial selection for increasing the quantitative divergence in the outline shape of the wings, as briefly described in Supplementary material—the complete selection program and results since first generation will be published elsewhere by Bitner-Mathé and coworkers. In turn, a qualitative comparison was made between a wild-type (1C) and a vestigial-winged strain (nub 2).

Biological replicates

Two biological replicates of each strain were established by setting two sets of 15 fly pairs in separate culture bottles for mass crossing and oviposition at 25°C. To improve larval staging, cornmeal sucrose medium was supplemented with 0.05% bromophenol blue (see Supplementary material). Fly pairs of 1C, 1L, and 1R strains were taken from the 67th non-overlapping generation. Offspring siblings from each strain and biological replicate were used in both phenotypic (adults) and gene expression (larvae) analyses.

Phenotypic scoring

Wing length (W L) and width (W W) are orthogonal linear measurements that were estimated in vivo using a stereoscopic microscope with ocular scale; each unit corresponds to 0.026 mm, approximately. The outline shape of the wing was estimated through width-to-length ratio, so that wings with higher W W/W L values are rounder than those with lower W W/W L values. The relationship between W W/W L and the outline shape of the wing is straightforward: wings with higher W W/W L values are rounder because they have higher W W and/or lower W L values. A disproportionate increase in W W relative to W L or a disproportionate decrease in W L relative to W W can also lead to the higher W W/W L values found in rounder wings. Note that W W/W L is an in vivo surrogate of the wing shape index proposed by Klaczko and Bitner-Mathé (1990), which was derived from the fitting of an ellipse to the wing contour. These authors also proposed that the geometric mean of the two ellipse radii might be used as an index of wing size; for details, see also Matta and Bitner-Mathé (2004). To obtain an in vivo surrogate for this index, W L/2 and W W/2 were taken as proxies of the major and minor ellipse radii, respectively. Hence, wing size (W SI) was estimated as the geometric mean of W L/2 and W W/2. Statistical analysis was performed with SYSTAT© v10.0 (SPSS Inc.).
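The two indices described above can be sketched in a few lines. This is only an illustration of the arithmetic: the function name, the conversion to millimetres before computing the size index, and the example measurements are assumptions, not values or procedures taken from the study.

```python
import math

# Approximate size of one ocular-scale unit, as stated in the text.
MM_PER_UNIT = 0.026


def wing_indices(length_units, width_units):
    """Return (shape, size_mm) from wing length and width in ocular-scale units.

    shape is the width-to-length ratio (higher values mean rounder wings);
    size_mm is the geometric mean of the half-length and half-width,
    used as proxies for the major and minor ellipse radii.
    """
    w_l = length_units * MM_PER_UNIT
    w_w = width_units * MM_PER_UNIT
    shape = w_w / w_l
    size_mm = math.sqrt((w_l / 2) * (w_w / 2))
    return shape, size_mm


# Hypothetical measurements, for illustration only.
shape, size_mm = wing_indices(length_units=80, width_units=40)
print(f"shape = {shape:.3f}, size = {size_mm:.3f} mm")
```

Because the shape index is a ratio, it does not depend on the unit conversion; only the size index does.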

Dissection and sampling of imaginal disks

From each strain and biological replicate, 100 intact disks from exactly 25 male and 25 female larvae were sampled and pooled together, as follows. Each late-third instar wandering larva was sexed and dissected in a drop of cold PBS on a glass slide. The pair of wing disks was immediately collected using fine-tip forceps and cleaned in another drop of PBS, before short-time storage in 1 mL of RNAlater® (Qiagen) at 4°C. Once completed (~2 dissection days per sample), all samples were stored at −80°C (~30 days) until RNA isolation.

RNA isolation and cDNA synthesis

RNAlater® was thoroughly removed and samples were snap-frozen in liquid nitrogen for manual grinding and homogenization. Total RNA was isolated with RNeasy® mini kit (Qiagen), following manufacturer instructions. During purification, on-column DNA digestion was performed with RNAse-free DNAse I (Invitrogen), according to RNeasy® protocol. First-strand cDNA was synthesized from 2 μg of total RNA with 50 μM of Oligo(dT20) and 10 mM of each dNTP, plus 200 units of SuperScript™ III RT enzyme (Invitrogen), 5× First-Strand buffer (Invitrogen), and 20 mM of dithiothreitol, in 20 μL final volume and incubation step of 60 min at 50°C. RNA quality was checked (Supplementary material).

qPCR primers

Detailed information on all genes, primers, target specificity, and expected amplification products is presented in Supplementary material. Primers were designed with Primer3Plus© (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi) using the following criteria: primer length between 19–22 bp, amplicon size around 120 bp (103–136 bp), and temperature of melting (Tm) around 60°C (59.5–61.9°C); except for Act5C primers (Tm around 65°C), which were designed by Qin et al. (2005). Primers were screened for hairpin or dimer formation and were manufactured by IDT© using standard desalting purification.

Real-time qPCR

Technical triplicates of each reaction mix were prepared in 40 μL final volume containing: 1× PCR buffer (Invitrogen), 3 mM of MgCl2, 0.1× of SYBR® Green I (Molecular Probes), 25 μM of each dNTP, 0.2 μM of each primer, 0.5 units of Platinum® Taq DNA polymerase (Invitrogen), and 20 μL of water-diluted cDNA (1:50), which corresponds to 40 ng of total RNA. The commercial solution of SYBR® Green I 10,000× in DMSO was 1:100 diluted in DMSO and then 1:100 diluted in water (1× immediate working solution). qPCR was performed on Chromo4™ real-time PCR detector system (BioRad) using 96-well full-skirt clear optical plates (Axygen) sealed with MicroAmp™ optical adhesive film (Applied Biosystems). Amplification conditions were: hot start of 5 min at 94°C followed by 40 cycles of 15 s at 94°C, 10 s at 60°C, 15 s at 72°C, 35 s at 60°C, and a plate reading step for quantification of raw fluorescence, via OpticonMonitor™ v3.1.32 software (BioRad). Dissociation curves were estimated from 30°C to 100°C, with a plate reading step at 1°C increments. Raw fluorescence data was exported to comma separated files (.csv) without background or baseline subtraction. Quality control was performed (Supplementary material).

qPCR basic data analysis

Transcript abundance was evaluated through the fractional quantification cycle (C q) at which a characteristic point in the reaction curve (the maximum of the second derivative) was achieved—given a good mathematical modeling, C q should be more precise in estimating transcript abundance than threshold cycles (Zhao and Fernald 2005; Rebrikov and Trofimov 2006). Here, C q values were estimated using Miner (Zhao and Fernald 2005) v2.2 (http://www.miner.ewindup.info/). To check the repeatability of technical triplicates and biological replicates, outlier detection and ANOVA of C q values were performed (Supplementary material). Efficiency (E) of amplification is also an important parameter in qPCR (Rebrikov and Trofimov 2006) and Miner can perform a direct and robust estimation through mathematical modeling of raw fluorescence data (Zhao and Fernald 2005). Average efficiencies of each primer pair are presented in Supplementary material. Since constant efficiency is essential for a reliable comparison across the given samples, standard errors of average efficiencies are also presented.

Selection of candidate reference genes

An ideal validation set should include candidate REF genes with no prior expectation of expression differences between the experimental groups (fly strains in our case) and should not include knowingly co-regulated genes. The six putative REF genes analyzed here were: Act5C, eIF-2α, Gapdh1, RpL32 (also known as rp49), Tbp, and βTub56D (detailed information in Supplementary material). Apart from being commonly used in D. melanogaster qPCR studies (references in Supplementary material), no evidence that the expression of these traditional REF genes is directly associated with quantitative variation in the wing was found to date. So we expect them to be roughly stable across wing disks from fly strains with divergent wing morphology. And, by inspecting the FlyBase report of each gene, no prior knowledge of co-regulation among these genes was found. Even when prior knowledge is not available, REF genes might still be carefully selected through the identification of orthologous genes in related species (see Cruz et al. 2009; Artico et al. 2010). The number of genes to be included in the validation set is a trade-off between precision and practical implications (see Supplementary material).

Stability ranking of candidate REF genes

REF gene validation represents a circular problem: stability between experimental groups has to be assessed without the use of any other reference. So here we compared two different approaches for the identification of stable REF genes, by using the statistical algorithms NormFinder© (Andersen et al. 2004) and geNorm (Vandesompele et al. 2002). Transcriptional stability was independently evaluated in each comparison (quantitative or qualitative). For each REF gene, C q values were converted into linear relative quantities Q = E^{ΔCq}, where E is the average amplification efficiency, and ΔCq is the average C q of technical triplicates in the sample with highest concentration (lowest average C q) minus the average C q of technical triplicates in the sample in question. Q values by group (fly strain) and subgroup (biological replicate) were analyzed with NormFinder v0.953 (http://www.mdl.dk/publicationsnormfinder.htm). To derive a stability index, NormFinder uses a model-based approach that takes into account both inter- and intragroup variation: the basic assumption is that a stable REF gene should have minimal variation across experimental groups and subgroups (details in Andersen et al. 2004). Candidate REF genes are ranked according to this index and the pair that minimizes inter- and intragroup variation is appointed as the most stable pair of REF genes. Q values by group were also analyzed with geNorm v3.4 (http://medgen.ugent.be/~jvdesomp/genorm/). To derive a stability index, geNorm uses a pair-wise approach: the basic assumption is that stable REF genes should have similar intergroup variation, thus leading to small pair-wise difference (details in Vandesompele et al. 2002). Briefly, stability is estimated by the average pair-wise variation of the given REF gene with all other genes in the validation set. REF genes are then ranked through successive step-wise exclusion of the least stable gene, followed by re-estimation of pair-wise stability values, until the most stable pair is identified and cannot be further separated.
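As a rough illustration of the two computations described above, the sketch below converts average C q values into relative quantities and then computes a geNorm-style stability value for each gene (the mean, over all other genes, of the standard deviation of the log2 expression ratios across samples). It is not the geNorm or NormFinder software: the function names, the gene labels, the C q values, and the efficiency of 2.0 (expressed as an amplification factor per cycle) are hypothetical, and only a single round is computed, without the step-wise exclusion.

```python
import math
from statistics import mean, stdev


def relative_quantities(cq_means, efficiency):
    """Convert average Cq values into Q = E ** (lowest Cq - sample Cq).

    The sample with the highest concentration (lowest Cq) gets Q = 1.
    """
    cq_min = min(cq_means)
    return [efficiency ** (cq_min - cq) for cq in cq_means]


def pairwise_stability(quantities):
    """Return a stability value per gene (lower means more stable).

    quantities maps each gene to its list of Q values, one per sample.
    """
    stability = {}
    for gene_j in quantities:
        variations = []
        for gene_k in quantities:
            if gene_k == gene_j:
                continue
            log_ratios = [
                math.log2(a / b)
                for a, b in zip(quantities[gene_j], quantities[gene_k])
            ]
            variations.append(stdev(log_ratios))
        stability[gene_j] = mean(variations)
    return stability


# Hypothetical average Cq values for three genes in four samples.
cq = {
    "geneA": [20.0, 21.0, 20.5, 21.5],
    "geneB": [22.0, 23.0, 22.5, 23.5],
    "geneC": [25.0, 24.0, 26.0, 25.0],
}
q = {gene: relative_quantities(values, efficiency=2.0) for gene, values in cq.items()}
m = pairwise_stability(q)
for gene, value in sorted(m.items(), key=lambda item: item[1]):
    print(gene, round(value, 3))
```

In this toy data set geneA and geneB vary in parallel across samples, so they obtain the lowest values; this also shows why co-regulated genes can mislead a pair-wise approach.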

Genes of interest

Three GOI were analyzed: nubbin (nub), rotund (rn), and Epidermal growth factor receptor (Egfr). They are involved in regulation of growth and patterning during imaginal wing development and were chosen on the basis of mutant phenotypes or putative association with quantitative variation in D. melanogaster wings. nub encodes a transcription factor of the POU homeodomain family, which regulates development through different processes like cell migration and proliferation. NUB is first observed in the presumptive wing region of early-second instar disks and accumulates in a concentric domain within the wing pouch, being required for correct specification of the proximal–distal axis (Ng et al. 1995; Kolzer et al. 2003). rn encodes a zinc-finger transcription factor also involved in proximal–distal patterning of wing disks; in rn mutants the wing blade is shortened and somewhat rounded. RN is expressed beginning in the second larval instar and its concentric domain is contained within the NUB domain (Kolzer et al. 2003). The EGFR signaling pathway regulates different developmental processes, like cellular differentiation and proliferation; in wing disks, EGFR is involved in specification of the third and fourth longitudinal veins during the third larval instar (Bier 2000). Findings of Dworkin et al. (2005) support a significant association between a quantitative variation in wing shape and a single nucleotide polymorphism in a noncoding site of Egfr.

REF gene validation and relative expression of GOI

In a given experimental condition, bona fide REF genes are expected to produce similar normalization results for the same GOI. For this reason, we performed a comparative validation test by normalizing the expression of each GOI with each candidate REF gene, as well as with each stable pair identified by NormFinder and geNorm. In every case, the relative expression of GOI was estimated through a normalized expression ratio (NER), as follows. Using the efficiency-calibrated model (Pfaffl et al. 2002), NER = (E_GOI)^{ΔCq_GOI} / (E_REF)^{ΔCq_REF}, where E is the average amplification efficiency and ΔCq is the average C q in control group minus the average C q in sample group. When more than one REF gene was used as reference, normalization factors were obtained by the geometric average of the (E_REF)^{ΔCq_REF} terms. The use of such an efficiency-corrected model is highly recommended: in models with no correction, even small efficiency differences can lead to a great bias in the C q values (Rebrikov and Trofimov 2006), which will affect the NER estimates. Experimental significance of NER was estimated through 1,000 randomizations of C q data in each experimental comparison, using REST© v2009 (Pfaffl et al. 2002), available at http://rest.gene-quantification.info/. In the cases where outlier reactions were excluded, NER was also estimated with REST© vMCS and results did not change.
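The normalized expression ratio described above can be written as a short function. This is a minimal sketch of the efficiency-corrected ratio with a geometric-mean normalization factor, not the REST software; the function name and all numeric inputs are hypothetical, and efficiencies are assumed to be expressed as amplification factors per cycle (2.0 meaning perfect doubling).

```python
import math


def log2_ner(e_goi, dcq_goi, refs):
    """Return log2 of the normalized expression ratio.

    e_goi   : amplification efficiency of the gene of interest
    dcq_goi : average Cq in control group minus average Cq in sample group
    refs    : list of (efficiency, delta_cq) tuples, one per reference gene
    """
    goi_term = e_goi ** dcq_goi
    ref_terms = [e_ref ** dcq_ref for e_ref, dcq_ref in refs]
    # Geometric average of the reference terms (normalization factor).
    norm_factor = math.prod(ref_terms) ** (1 / len(ref_terms))
    return math.log2(goi_term / norm_factor)


# Hypothetical case: the GOI appears one cycle later in the sample group,
# while two reference genes shift by one cycle in opposite directions.
print(log2_ner(2.0, -1.0, [(2.0, 1.0), (2.0, -1.0)]))
```

In the example the two reference genes shift in opposite directions, so their geometric average cancels to 1 and the GOI ratio is left unchanged; a single reference gene with the same shift would have biased the ratio by one log2 unit.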

Multiple tests

Whenever multiple tests were performed, statistical significance was adjusted using standard Bonferroni correction (α/n): n is the number of tests and uncorrected α was set to 0.05.
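The correction amounts to a single division, sketched here with hypothetical helper names. The value of nine tests in the example is an assumption, chosen only because 0.05/9 rounds to the α = 0.0056 reported in the legend of Fig. 3.

```python
def bonferroni_alpha(alpha, n_tests):
    """Return the per-test significance threshold alpha / n."""
    return alpha / n_tests


def is_significant(p_value, alpha=0.05, n_tests=1):
    """Compare a P value against the Bonferroni-adjusted threshold."""
    return p_value < bonferroni_alpha(alpha, n_tests)


# Assumed number of tests (see lead-in); the P value is hypothetical.
threshold = bonferroni_alpha(0.05, 9)
print(round(threshold, 4), is_significant(0.01, n_tests=9))
```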

Results and discussion

Morphological variation

In the quantitative comparison, wings from 1R ‘round’ strain are significantly rounder (higher width-to-length ratio) than wings from 1L ‘long’ strain; descriptive statistics and ANOVA for all wing traits are presented in Table 1. Significant differences in wing length (W L) and width (W W) were also found: 1R wings have lower average length but higher average width than 1L wings. Considering that wing length is largely related to the proximal–distal axis and wing width to the anterior–posterior axis, the quantitative wing shape variation between 1R and 1L strains is probably achieved through developmental changes along these axes. Regarding the wing size (W SI), no significant difference was observed between the strains. In turn, significant sex differences were found for all wing traits, the female wings being on average longer and larger than male wings (data not shown). This sexual dimorphism was already observed in species of the D. melanogaster subgroup, even in interspecific hybrids (see Matta and Bitner-Mathé 2004, 2010). Owing to such differences, and to significant sex–strain interactions, the sexes were balanced during the sampling of imaginal wing disks. Nested effects of biological replicates were also tested but did not affect any wing trait (Table 1). Since adult individuals analyzed in this morphological comparison are siblings of late-third instar larvae from which the wing disks were obtained, replicate effects in the cDNA samples should also be negligible. In fact, when C q values of qPCR reactions were analyzed (Supplementary material), no nested effects of biological replicates were observed for any gene in any comparison (quantitative or qualitative).

Table 1 Descriptive statistics and ANOVA of quantitative variation in D. melanogaster wings

Because of severe morphological defects in nub 2 wings, the qualitative comparison was performed through visual inspection (Fig. 1). Wings of nub 2 homozygous mutants are abnormally folded and strongly reduced along the length and width, relative to wild-type individuals (1C strain). In addition, development of wing margin, hinge region, and most veins is greatly compromised (see Ng et al. 1995).

Fig. 1 Male wings from fly strains of the qualitative comparison. a Wild-type 1C strain. Traced lines show the approximate boundary of anterior–posterior (AP) axis (see Ng et al. 1995) and the direction of proximal–distal (PD) axis. b Homozygous nub 2 mutant strain

Stability of candidate REF genes

In the quantitative comparison, RpL32, Tbp, and eIF-2α were top-ranked as the most stable REF genes by NormFinder and geNorm (Table 2). Besides, the ranking order of the remaining genes did not vary between algorithms. If the validation set is ideal, NormFinder and geNorm should, at least in theory, produce identical or only slightly different ranking results. This seems to be the case here. Figure 2 presents inter- and intragroup variation, as estimated by NormFinder. In the quantitative comparison, top-ranked candidate REF genes (RpL32, Tbp, and eIF-2α) show small intergroup variation, which indicates they should be stably expressed across wing disks from morphologically divergent strains, such as 1R and 1L. However, only RpL32 and Tbp presented small intragroup variation (across biological replicates). And so, these two REF genes are expected to produce the most reliable normalization results.

Table 2 Stability of reference (REF) genes, as ranked by NormFinder and geNorm

Fig. 2 Intergroup variation of candidate REF genes, as estimated by NormFinder. Variation in REF gene expression was estimated for 1L strain relative to 1R in the quantitative comparison (open circles) and for nub 2 strain relative to 1C in the qualitative comparison (solid squares). Error bars correspond to the respective intragroup variation. Stable genes should present minimal intergroup variation (close to zero) and small error bars; values are in log2

In the qualitative comparison, Act5C, RpL32, and Tbp were top-ranked by NormFinder (Table 2). Regarding geNorm results, RpL32 and Tbp were also top-ranked, but the ranking order of the remaining genes was rather different from that of NormFinder—note that Act5C showed the largest ranking variation (first to fourth positions). In practice, different ranking results are not uncommon (Andersen et al. 2004; Cruz et al. 2009; Bustin et al. 2005). Because of its pair-wise approach, geNorm may not perform properly if co-regulated genes are inadvertently included in the validation set (Andersen et al. 2004; Bustin et al. 2005). In our case, apart from careful selection of candidate REF genes (Materials and methods), the similar ranking results in the quantitative comparison do not indicate the presence of co-regulated genes. Alternatively, ranking results may differ if candidate REF genes are not roughly stable across experimental groups, so that the assumption of an ideal validation set is not fully met. In fact, NormFinder results suggest that no single gene has small inter- and intragroup variation in the qualitative comparison (Fig. 2). This result seems to be corroborated by a non-normalized exploratory analysis (Supplementary material) and might be explained by the severe developmental problems of nub 2 wings, which could affect the expression of many genes during imaginal disk development.

To reduce possible assumption errors in the choice of a single REF gene, a normalization factor can be estimated with two or more genes. The reasoning is that intergroup variation in the geometric average of different genes should be smaller than in a single gene (Vandesompele et al. 2002). Table 2 presents the most stable pairs of REF genes, as estimated by each algorithm (Materials and methods). Regardless of the experimental comparison, all stable pairs contained RpL32, Tbp, or eIF-2α genes. We also note that stable pairs did not change when 1C strain was included in the quantitative comparison (data not shown). However, as discussed by Andersen et al. (2004), in order to minimize the average intergroup variation in an ideal normalization factor, the selected genes should have small intragroup variation and opposite intergroup variation, such as eIF-2α and Tbp genes in the qualitative comparison (Fig. 2). At least in theory, the stable pairs indicated by NormFinder might produce more reliable normalization results than other pairs or single REF genes.

The optimal number of REF genes necessary for normalization is a trade-off between minimizing intergroup variation and practical implications, since more genes will have to be tested. geNorm uses a step-wise approach to estimate the change in pair-wise variation (V) when the subsequent most stable gene is included in the normalization factor (V_n/V_{n+1}). This value is generally compared to an empirical cut-off value of 0.150, below which the inclusion of another REF gene should have no significant contribution in reducing intergroup variation in the normalization factor (Vandesompele et al. 2002). Here, the change in pair-wise variation when including a third REF gene was far below the cut-off in both quantitative (0.026) and qualitative (0.045) comparisons. Hence, only two REF genes should be sufficient for an accurate normalization in each case.
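The step-wise check can be sketched as follows, assuming the definition given by Vandesompele et al. (2002): the pair-wise variation is the standard deviation, across samples, of the log2 ratio between the normalization factor built from the n most stable genes and the one built from n + 1 genes. The function names, gene labels, and relative quantities below are hypothetical; this is not the geNorm implementation.

```python
import math
from statistics import stdev

CUT_OFF = 0.150  # empirical cut-off quoted in the text


def normalization_factors(quantities, genes):
    """Geometric mean of the relative quantities of `genes`, per sample."""
    n_samples = len(quantities[genes[0]])
    return [
        math.prod(quantities[g][i] for g in genes) ** (1 / len(genes))
        for i in range(n_samples)
    ]


def pairwise_variation(quantities, ranked_genes, n):
    """Variation between factors built from the top n and top n + 1 genes."""
    nf_n = normalization_factors(quantities, ranked_genes[:n])
    nf_next = normalization_factors(quantities, ranked_genes[: n + 1])
    return stdev(math.log2(a / b) for a, b in zip(nf_n, nf_next))


# Hypothetical relative quantities, genes already ranked by stability.
q = {
    "gene1": [1.00, 0.95, 1.05, 1.00],
    "gene2": [1.00, 1.05, 0.95, 1.00],
    "gene3": [1.00, 1.10, 0.90, 1.05],
}
v = pairwise_variation(q, ["gene1", "gene2", "gene3"], n=2)
print(f"V = {v:.3f}; third gene needed: {v >= CUT_OFF}")
```

A value below the cut-off suggests that adding the next gene changes the normalization factor very little, so the smaller set can be kept.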

Relative expression of GOI and REF gene validation

Relative expression of each GOI (nub, rn, and Egfr) was estimated through normalized expression ratio using the stable pairs of REF genes as internal references (Fig. 3). Such normalization factors should produce more reliable results than single REF genes. In the qualitative comparison, nub was expected to be down-regulated in wing disks from nub 2 hypomorphic mutants relative to wild-type 1C individuals. In fact, negative NER values were observed for nub, regardless of the internal reference. Because of the severe morphological alterations in nub 2 wings, it was also expected that lower levels of NUB could affect the regulation of synchronic or downstream expressed genes, such as rn and Egfr. No matter the internal reference, significant down-regulation was observed for rn and Egfr, suggesting that the transcription of these genes is somehow regulated by the amount of NUB. Fold-change for each GOI is presented in Supplementary material. In the quantitative comparison, we had no prior expectations regarding the relative expression of GOI. We observed that, when using each stable pair as reference, NER values did not differ from zero for any GOI (Fig. 3). So we assume that rn, nub, and Egfr genes are not differentially expressed in third instar wing disks of 1L and 1R strains. And, in our experimental conditions, the use of only two REF genes seems sufficient for accurate normalization of GOI, as anticipated by geNorm.

Fig. 3 Normalized expression ratio (NER) of each GOI (nub, rn, and Egfr). Normalization was performed using the stable pairs indicated by NormFinder (pair 1) and geNorm (pair 2). Results from normalization using the least stable REF genes in qualitative (eIF-2α and Gapdh1) and quantitative (Act5C and βTub56D) comparisons are also presented. NER values were obtained for nub 2 strain relative to 1C (qualitative comparison) and for 1L strain relative to 1R (quantitative comparison); values are in log2; 95% confidence intervals are shown (error bars). Differentially expressed genes should have log2 NER ≠ 0. Significance of NER values was estimated through 1,000 randomizations of the original C q data. To reduce the detection of false positives, Bonferroni correction was also applied: α = 0.0056; *P < α

To validate the REF genes and test the common practice of using a traditional REF gene without proper evaluation in the given experimental condition, the expression of each GOI was also normalized with each candidate REF gene. The reasoning is that bona fide REF genes and stable pairs should produce similar normalization results. In the quantitative comparison, results obtained with stable pairs (Fig. 3) were also observed when the four most stable REF genes were individually used as reference (Supplementary material), so that no GOI presented significant differential expression. However, when normalized with the least stable gene (βTub56D; Fig. 3), all three GOI presented negative NER values. If we had chosen this traditional REF gene without prior investigation, we would have concluded that nub, rn, and Egfr were significantly repressed in 1L sample relative to 1R. Conversely, rn presented a positive NER value when normalized with the second least stable gene (Act5C; Fig. 3), which would have led us to the opposite conclusion that rn was significantly induced in 1L sample. To reliably estimate an expression difference as low as 0.5 log2 units, the intergroup variation of the REF gene should, in theory, be lower than half this target difference (Andersen et al. 2004; Huggett et al. 2005). In the quantitative comparison, the four most stable REF genes have met this criterion, but neither Act5C nor βTub56D presented intergroup variation smaller than 0.25 log2 units (Fig. 2). Hence, the conflicting expression differences estimated with these two traditional REF genes should in fact represent a systematic bias introduced by their intergroup variation. Even in the qualitative comparison, results did change when rn gene was normalized with the least stable REF gene. By using Gapdh1 as reference, the NER of rn was significantly different from that estimated with each stable pair (Fig. 3; note that 95% confidence intervals do not overlap). As for the remaining REF genes, normalization results did not differ from those using the most stable pairs (Fig. 3 and Supplementary material).

Overall, we have identified and validated stable pairs of REF genes that should lead to accurate estimation of expression differences in genes involved in the developmental patterning and growth of the D. melanogaster wing, which is an important experimental model for studies on organ morphogenesis and morphological evolution in animals (Lecuit and Le Goff 2007). More specifically, our results indicate that a normalization factor using RpL32 and Tbp should produce reliable results regardless of the type of morphological variation in the wing, given that this pair was validated in the two-class comparison. For those interested in selecting and validating REF genes for other morphological comparisons in Drosophila or other insect species, we recommend including RpL32 and Tbp (or their respective orthologs) in the validation set. Our findings also stress the importance of performing REF gene validation in every experimental condition, whenever possible. Even in a single experimental comparison, false-positive results in opposite directions were observed and could have led to different interpretations and misleading conclusions. To improve confidence in qPCR results and benefit the most from the high sensitivity of the qPCR technique (see also Huggett et al. 2005; Andersen et al. 2004; Guenin et al. 2009), REF gene validation should be regarded as a mandatory step in qPCR experiments.