Background

RNA interference (RNAi) is a naturally occurring phenomenon by which RNA duplexes known as short interfering RNAs (siRNAs) reduce gene expression through enzymatic cleavage of a target mRNA mediated by the RNA-induced silencing complex (RISC). The ability of synthetic siRNA to inhibit targeted genes with high specificity makes it an extremely powerful tool for functional genomics that has drawn considerable interest recently [1, 2]. RNAi is commonly achieved by introducing chemically synthesized 19–22-mer siRNAs into cells by transfection. However, many cells and cell lines are either refractory to or adversely affected by transfection, and the transient nature of this methodology renders it unsuitable for the generation of long-term cell lines of the desired phenotype. Two alternatives to synthetic siRNA are DNA vector-mediated RNAi production [3–5] and, most recently, viral-mediated siRNA synthesis [6–10]. For the latter technologies, sense and antisense strands can be expressed from different promoters [11]. Alternatively, short hairpin (sh) RNAs, expressed from a single promoter, are processed into siRNAs by Dicer or a homologous double-strand RNase [12].

One caveat of siRNA design is that not all 19–22 base RNA duplexes cleave their target effectively, and much effort has gone towards identifying a set of rules for selecting an effective siRNA target site within a gene. Recent findings [13, 14] offered the first clue towards the development of guidelines for selecting an siRNA target site. These studies showed that the RISC complex is asymmetric and favors the strand of the siRNA duplex with the least thermodynamically stable 5' terminus. Subsequently, Reynolds et al. designed an algorithm based on statistical data showing patterns of efficacy for siRNA oligonucleotides containing specific residues at defined positions within the 19-mer [15]. A limitation of their study is that only a small number of genes were tested. Several additional algorithms for designing effective siRNAs have been published since those initial reports, with surprisingly disparate results, making the determination of which residues are generally favorable for siRNA efficacy a point of controversy [16–20]. Additionally, whether any of the algorithms developed for synthetic siRNA oligonucleotides apply to the design of shRNA expressed stably from a vector has not been well explored.

In the present report, we construct and analyze a set of 27 shRNAs for 11 different human genes. To our knowledge this is the largest individual set of data published for shRNA 19-mers. We describe a method for simultaneously preparing wild type and control mutant shRNA vectors that is time and cost efficient, and show that sequencing of shRNA plasmids can be quite problematic due to the intrinsic secondary structure of the hairpin. We examine several different strategies for overcoming this problem including the use of modified BigDye chemistries and the addition of agents known to relax DNA structure. The knockdown efficacy for each of the 27 shRNAs was evaluated against six published algorithms for siRNA oligonucleotide design by linear regression and ROC curve analyses. We describe a modification of three of the algorithms that provides fair-to-good prediction of shRNA efficacy, and confirm the significance of the modified algorithms using a pooled set of shRNAs from previous publications. These findings should be of general applicability in the design and construction of shRNA vectors.

Results and discussion

Design and preparation of shRNA plasmids

To address the question of how shRNA sequence correlates with knockdown efficacy, 27 shRNA vectors targeting 11 different genes were designed and constructed (Table 1). Target sequences were selected in the coding region of each gene and were designed to broadly conform to the seminal studies of sequence features for siRNA oligomer efficacy [13–15]. Accordingly, the sequences contain few nucleotide runs and have a G/C content of about 50%. The shRNAs were designed to target sites that are devoid of single nucleotide polymorphisms and that are present in all splice variants amplified by our real-time PCR primer sets.

Table 1 ShRNA vectors prepared for this study

Since siRNAs can have off-target effects, it is important for functional assays to make a specific mutant with one or more base mismatches within the target recognition site as a control [21]. To conserve time and cost, we have developed a method of making wild-type and mutant shRNA vectors simultaneously (detailed in Methods and Figure 1). Gene knockdown results for four wild-type/mutant shRNA pairs are shown in Figure 2. These results demonstrate the utility of this method in providing a point mutant shRNA vector that can serve as a loss-of-function control for gene knockdown by wild-type shRNAs. Though detailed protocols have been published for construction of shRNA vectors [22], this is the first protocol for producing wild-type and mutant vectors simultaneously, and it should facilitate the implementation of a highly controlled system for shRNA.

Figure 1

Design for producing wild-type and mutant shRNA vectors simultaneously. A forward strand of the wild-type hairpin (blue) is synthesized together with a reverse strand containing a one bp mutation within both the sense and antisense copy of the target sequence (shown in red). The double stranded hybrid is ligated into the retroviral vector 5' of an H1 promoter and transformed into competent bacteria. Since replication is semi-conservative, the daughter bacteria will be of two different populations that carry either a double-stranded wild-type or a double-stranded mutant vector and can be isolated by preparing and sequencing individual colonies.

Figure 2

Gene expression analysis for wild-type and mutant shRNA vectors prepared simultaneously using wild-type/mutant double stranded hybrids. (A) Sequences of the target sites for four wild-type and mutant shRNA vectors that were prepared simultaneously as detailed in Figure 1. (B) Real-time PCR analysis of shRNA knockdown, and loss of knockdown by mutant shRNA vectors from (A). Values are standardized to 100% in non-transduced THP1 cells. The expression in THP1 cells transduced with an empty vector (EV) is shown as an additional control. Values represent average + SEM for at least three assays performed in duplicate.

Strategy for accurate sequencing through hairpin structures

Verifying the sequence of an shRNA hairpin is essential since a mismatch of even one nucleotide within the target sequence can ablate knockdown (Figure 2 and [5, 23]). An issue that is frequently encountered in the preparation of shRNA vectors is that many are difficult to sequence due to the intrinsic secondary structure of the hairpin. One strategy recently proposed to overcome this issue involves engineering a restriction site within the loop/stem region of the hairpin to physically separate the inverted repeats by digestion, and then piecing together the sequence using sense and antisense primers [24]. However, the ability to sequence shRNA constructs without modifying the stem/loop sequence would be a clear advantage. To address this possibility, we evaluated modified sequencing reactions for improvement in the read-through of the hairpin secondary structure in three shRNA hairpins. Modifications included adding agents known to relax DNA structure (DMSO, Betaine, PCRx Enhancer and ThermoFidelase I), and adding increasing amounts of dGTP BigDye terminator (dGTP) chemistry to the standard BigDye v1.1 (BD) chemistry, which contains dITP rather than dGTP.

Sequencing results for each of the three DNA constructs are summarized in Table 2. Read-through of the hairpin structure was measured as the ratio of the peak height about 300 bases after the hairpin structure to the peak height about 50 bases before the hairpin structure. A ratio of 1 indicates no loss in signal and 0 indicates complete loss of read-through. In the absence of any additive to BD chemistry, the hairpin caused a reduction in peak height ratio to 0.4 for our less tightly structured hairpin, pHSPG-shmutTLR4, and a complete loss of read-through for the other two plasmids. This can be visualized as an abrupt stop in the sequence peak profile for pHSPG-shTLR4 (Figure 3A).
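The read-through measure described above can be illustrated with a minimal Python sketch. The function names and the idea of averaging several peak heights within each window are assumptions made for illustration; the text specifies only that peak height about 300 bases after the hairpin is compared with peak height about 50 bases before it.

```python
def peak_height_ratio(heights_before, heights_after):
    """Ratio of mean peak height after the hairpin to mean peak height before it.

    heights_before: peak heights sampled ~50 bases before the hairpin.
    heights_after:  peak heights sampled ~300 bases after the hairpin.
    Averaging over a small window is an assumption of this sketch.
    """
    if not heights_before or not heights_after:
        raise ValueError("each window needs at least one peak height")
    mean_before = sum(heights_before) / len(heights_before)
    mean_after = sum(heights_after) / len(heights_after)
    if mean_before <= 0:
        raise ValueError("signal before the hairpin must be positive")
    return mean_after / mean_before


def mean_ratio(replicates):
    """Average the ratio over replicate reactions given as (before, after) pairs."""
    ratios = [peak_height_ratio(before, after) for before, after in replicates]
    return sum(ratios) / len(ratios)
```

A ratio near 1 from this calculation corresponds to no loss of signal across the hairpin, and a ratio near 0 to a complete stop.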

Table 2 Evaluation of sequencing results of three DNA hairpin constructs. Average ratio of peak height after to before the hairpin region was determined as a measure of how well the sequence read through the hairpin structure. The greater the peak height ratio, the greater the ability to sequence through the hairpin. A value of 1 indicates no loss in peak height, and a value of zero indicates a complete stop in sequence after the hairpin region. All values are the averages of at least triplicate sequencing reactions.
Figure 3

DNA sequencing of pHSPG-shTLR4 using modified reaction conditions. DNA sequencing peaks are shown in a full scale view where base positions are indicated by the row of numbers in each panel and the Y axis is the signal intensity. Sequencing reaction conditions shown are BigDye v1.1 (BD) chemistry (A), 0.83 M Betaine + 1 × PCRx Enhancer in BD chemistry (B), 10:1 BD:dGTP chemistries (C), 0.83 M Betaine + 1 × PCRx Enhancer in 10:1 BD:dGTP chemistries (D), and 1 × ThermoFidelase I in 10:1 BD:dGTP chemistries (E). The drop in signal (step in peak height) at the hairpin is highlighted by an arrow in the 10:1 BD:dGTP chemistries panel.

Among the DNA relaxing agents, 5% DMSO, 0.83 M Betaine and 1 × PCRx Enhancer each improved the sequence read significantly for some constructs. However, the addition of 0.83 M Betaine plus 1 × PCRx Enhancer to BD chemistry was found to sequence most consistently, with peak height ratios of 0.5–0.9 (Table 2 and Figure 3B). The use of 10:1 BD:dGTP chemistries alone also improved read-through somewhat, with peak height ratios of 0.5–0.6 (Table 2 and Figure 3C). The sub-optimal peak height ratio for 10:1 BD:dGTP can be attributed to a visible step in the sequence peak profile after the secondary structure region where the signal is reduced (Figure 3C, arrow). Increasing the dGTP chemistry content to 5:1 and 3:1 BD:dGTP or using straight dGTP chemistry increased the peak height ratio and reduced the step somewhat (0.6 to 0.8 ratio). However, the mixed incorporation of dITP and dGTP resulted in worse peak broadening as the amount of dGTP used increased [see Additional file 1], and dGTP-only chemistry caused severe sequence compressions (data not shown). The best overall results were observed by combining Betaine plus PCRx Enhancer with 10:1 BD:dGTP mixed chemistries. This combination reduced the step with less peak broadening and increased peak height ratios to 0.9–1.0 (Table 2 and Figure 3D). ThermoFidelase I, a DNA destabilizing enzyme that is frequently used to improve sequencing of genomic DNA [25, 26], did not improve sequencing of any of the three hairpins in straight BD chemistry (data not shown), and actually reduced the peak height ratio significantly in 10:1 BD:dGTP chemistries for all three shRNA constructs, causing the reappearance of a stop at the hairpin structure (Table 2 and Figure 3E).

In summary, the combination of 10:1 BD:dGTP chemistries, 0.83 M Betaine, and 1 × PCRx Enhancer provided optimal sequencing, and mixed BD:dGTP chemistries, Betaine, PCRx Enhancer, and DMSO each had some positive effects on their own. ThermoFidelase I, however, probably should be avoided for shRNA vectors with difficult intrinsic secondary structure.

Correlation between shRNA knockdown efficiency and published algorithms for siRNA design

To determine whether the efficacy of knockdown by shRNA vectors correlates with published rules for the design of effective siRNA oligonucleotides, shRNAs were evaluated for their ability to knock down gene expression. The shRNAs were transduced stably into either THP1 or Jurkat human cell lines as detailed in Table 3, Columns 1 and 2. The average knockdown was determined from RNA collected on three or more different days and is listed for each shRNA (Column 3). Knockdown was shown to be reproducible for cell lines that were independently transduced and sorted, suggesting that knockdown is a function of the shRNA target sequence rather than features of the viral transduction [see Additional file 2]. More than one third of the shRNA vectors constructed were unable to suppress gene expression (<10% knockdown in Column 3), despite comparable growth rates and long-term expression of the GFP marker at high levels in these cell lines. Furthermore, large variations in knockdown efficacy among shRNAs made against the same genes (i.e., CLR16.2, CLR19.3 and TLR4) argue against any simple biological reasons for differences in efficacy for these genes. Many of the ineffective shRNAs have negative 5' ΔΔG values and high Reynolds scores, each of which has been hypothesized to correlate with siRNA knockdown efficacy (Table 3, Columns 4 and 5) [13–15]. Conversely, among the shRNAs that were able to confer gene knockdown, several had either positive 5' ΔΔG values or low Reynolds scores. These findings indicate that the 5' ΔΔG and Reynolds scoring algorithms for siRNA may not provide positive correlative criteria for shRNA design.

Table 3 Comparison of knockdown efficacy and siRNA design algorithm. Average knockdown was measured by real-time PCR of triplicate samples. All averages are accurate within 10% SEM. Asterisks indicate high Takasaki et al. algorithm scores that have poor corresponding knockdown efficacy.

To determine whether other published algorithms for siRNA oligonucleotide design can be applied to shRNA vectors, each of the shRNA target sites was evaluated by four additional algorithms, and scores were plotted against the percent knockdown for each shRNA (Table 3, Columns 6–9 and Fig. 4). For each algorithm plot, a best-fit line was drawn and the R² value calculated as an indication of whether the variance in knockdown efficacy can be explained by the algorithm scoring. The results confirm a poor association between shRNA efficacy and either 5' ΔΔG (free energy differential) considerations [13] or the Reynolds et al. algorithm [15], and also demonstrate a poor association with the Hsieh et al. algorithm [19], with each in fact showing a weak reverse correlation with the data. The algorithms of Amarzguioui et al. [20], Ui-Tei et al. [18], and Takasaki et al. [17] correlate directly with shRNA efficacy. However, none of the algorithm scores explains a significant percentage of the variance in knockdown efficacy. Among the algorithms tested, the Takasaki et al. scoring system shows the highest association, with an R² value of 0.0251.
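For readers who wish to repeat this kind of comparison on their own data, the best-fit line and R² value can be computed as in the following Python sketch. It is an ordinary least-squares fit written for illustration under the assumption of one score and one percent-knockdown value per shRNA; the function name is hypothetical and this is not the software used in the study.

```python
def linear_fit_r2(scores, knockdown):
    """Least-squares line through (score, % knockdown) points and its R^2.

    Returns (slope, intercept, r_squared). A negative slope corresponds to
    the "reverse correlation" described in the text.
    """
    n = len(scores)
    if n != len(knockdown) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(scores) / n
    mean_y = sum(knockdown) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in knockdown)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, knockdown))
    if sxx == 0 or syy == 0:
        raise ValueError("scores and knockdown values must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared
```

An R² close to 0 means that the algorithm score explains almost none of the variance in knockdown, regardless of the sign of the slope.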

Figure 4

Correlation between shRNA knockdown efficacy and scoring for six published algorithms for siRNA. Algorithm scores for each shRNA target site from Table 3 are plotted against observed knockdown efficiency for the Hsieh et al. (A), 5' ΔΔG (free energy differential) (B), Reynolds et al. (C), Amarzguioui et al. (D), Ui-Tei et al. (E) and Takasaki et al. (F) algorithms. The 5' ΔΔG score is plotted on a reverse horizontal axis since knockdown efficacy is predicted to correlate with negative 5' ΔΔG value. A trend line is shown along with the R² value for each plot. Knockdown of less than 10% is plotted as zero.

Because these results suggest that a linear relationship does not strongly apply to shRNA knockdown for any of the six algorithms, we evaluated each of the algorithms by ROC curve analysis to determine whether any algorithm is superior to the others at identifying effective shRNAs. The ROC curve is a plot of sensitivity (the true positive fraction, TPF) versus 1 minus the specificity (the false positive fraction, FPF) that is generated by varying the decision threshold between the minimum and maximum algorithm score. The diagonal of the ROC plot represents the ROC curve for an algorithm that is no better at discrimination than random selection. Algorithms that are poor discriminators have ROC curves that track along the diagonal and have an area under the ROC curve (AUC) that is not significantly different from the AUC of the diagonal (0.5). Algorithms that are good discriminators have ROC curves with strong convex deviation from the diagonal and AUCs that approach 1 and are significantly different from the AUC of the diagonal.
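The construction of the ROC curve described above can be sketched in Python as follows. This is a minimal illustration, not the analysis software used here: the function names are hypothetical, an shRNA is treated as selected when its score is greater than or equal to the decision threshold (an assumption), and the significance test against an AUC of 0.5 is not included.

```python
def roc_curve(scores, effective):
    """Return ROC points (FPF, TPF) as the threshold sweeps from max to min score.

    scores:    algorithm score for each shRNA.
    effective: True if the shRNA met the efficacy criterion, else False.
    """
    n_pos = sum(1 for e in effective if e)
    n_neg = len(effective) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one effective and one ineffective shRNA")
    points = [(0.0, 0.0)]  # threshold above the maximum score selects nothing
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, e in zip(scores, effective) if s >= threshold and e)
        fp = sum(1 for s, e in zip(scores, effective) if s >= threshold and not e)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

With this construction, a score that ranks every effective shRNA above every ineffective one gives an AUC of 1, and a score that cannot separate the two groups gives an AUC of 0.5, matching the diagonal.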

The Hsieh et al. algorithm had a concave ROC curve (Fig. 5A), indicating unacceptable sensitivity and specificity in discriminating effective from ineffective shRNAs. The ROC curves for all other algorithms tracked near the diagonal of the ROC plot and had AUCs that were not significantly different from the AUC of the diagonal (Figs. 5B–F). Thus, none of the algorithms showed a statistically significant ability to discriminate between effective and ineffective shRNAs.

Figure 5

ROC curve analysis of siRNA scoring algorithms. The true positive fraction was plotted against the false positive fraction as the decision threshold varied from minimum to maximum scores (see Materials and Methods for details) for the Hsieh et al. (A), 5' ΔΔG (free energy differential) (B), Reynolds et al. (C), Amarzguioui et al. (D), Ui-Tei et al. (E) and Takasaki et al. (F) algorithms using an efficacy threshold of 50% knockdown. ROC curves for modified Amarzguioui et al. (G), Ui-Tei et al. (H) and Takasaki et al. (I) algorithms are also shown. A set of 38 published shRNAs (Table 5) was analyzed using the modified Amarzguioui et al. (J), Ui-Tei et al. (K) and Takasaki et al. (L) algorithms to confirm the utility of the modified algorithms. The area under the curve (AUC) and the probability (p) that the AUC is significantly different from 0.5, the area under the diagonal, are indicated for each ROC curve.

The Takasaki et al. algorithm (Fig. 5F) showed the most promise as a discriminator of effective from ineffective shRNAs. However, this algorithm suffered from a relatively high false positive fraction for decision thresholds near the maximum score as indicated by the weak, erratic deviation from the diagonal near the origin of the ROC curve (Fig. 5F). This indicated that the algorithm assigned a high score to a number of ineffective shRNAs. Inspection of the data revealed that two of the three high-scoring ineffective shRNAs targeted genes whose expression was successfully knocked-down by other shRNAs (Table 3, asterisks). Thus it is unlikely that the inefficacy of the shRNAs is a consequence of selective pressure against the stable suppression of gene expression. It is more likely that the Takasaki et al. algorithm does not account for a critical feature of effective shRNAs.

Application of an algorithm modification based on the stability of the 6 central bases of each shRNA

Inspection of the physical properties of the high scoring ineffective shRNAs revealed that the average stability of the duplex formed by the 6 central bases of the shRNAs (bases 6–11 of the sense strand hybridized to bases 9–14 of the antisense strand) was greater than the average stability of high scoring effective shRNAs (ΔG = -13.1 ± 0.1 versus -11.1 ± 1 kcal/mol, respectively). Based on this observation, the Takasaki et al. algorithm was modified such that shRNAs with a central duplex ΔG equal to or less than -12.9 kcal/mol were assigned a minimum score (Table 4). This modification assigned minimum scores to five shRNAs, four of which were ineffective, thus increasing the specificity of the algorithm without a significant loss in sensitivity. The minimum score assigned to one effective shRNA (71% knockdown) indicates that other properties in addition to central duplex stability influence efficacy. Nevertheless, the addition of this modification eliminated the weak erratic deviation of the ROC curve from the diagonal for high decision thresholds and increased the AUC to 0.79 (Fig. 5I). Similar modification of the Amarzguioui et al. and Ui-Tei et al. algorithms also raised the AUCs of their ROC curves (Figs. 5G and 5H). With this modification, the AUCs of the ROC curves for all three modified algorithms were significantly different from the AUC of the diagonal (Figs. 5G–I), indicating statistically significant predictive capability. Differences between AUCs of the ROC curves for the modified algorithms were not significant, so on statistical grounds all three of the modified algorithms were of equal utility. The 5' ΔΔG, Reynolds et al., and Hsieh et al. algorithms were not improved to a statistically significant predictive capability by applying the central duplex ΔG modification (data not shown).
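The modification itself is a simple rule and can be written as a short Python sketch. The -12.9 kcal/mol cutoff and the minimum scores are those given in the text and in the Table 4 legend; the function and dictionary names are hypothetical, and the central duplex ΔG is taken as an input because the method used to calculate it is not reproduced here.

```python
CENTRAL_DG_CUTOFF = -12.9  # kcal/mol; at or below this value the score is minimized

# Minimum scores for each algorithm, as listed in the Table 4 legend.
MINIMUM_SCORE = {
    "amarzguioui": -4,
    "ui-tei": -2,
    "takasaki": -13.26,
}


def modified_score(algorithm, original_score, central_dg):
    """Apply the central duplex modification to one shRNA.

    algorithm:      key into MINIMUM_SCORE (illustrative names).
    original_score: score from the unmodified algorithm.
    central_dg:     precomputed central duplex delta-G in kcal/mol.
    """
    if algorithm not in MINIMUM_SCORE:
        raise KeyError(f"no minimum score defined for {algorithm!r}")
    if central_dg <= CENTRAL_DG_CUTOFF:
        return MINIMUM_SCORE[algorithm]
    return original_score
```

An shRNA whose central duplex is less stable than the cutoff keeps its original score, so the rule can only lower scores and never raise them.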

Table 4 Modification of algorithm scores based upon shRNA central duplex ΔG. The percent knockdown data represents the average knockdown as shown in Table 3. shRNAs with a central ΔG equal to or less than -12.9 kcal/mol are underlined. These were assigned minimum scores according to the algorithm modification. Minimum scores are: Amarzguioui et al. algorithm, -4; Ui-Tei et al. algorithm, -2; Takasaki et al. algorithm, -13.26. The three shRNAs that scored high in the original Takasaki et al. algorithm but have poor knockdown efficacy are marked with asterisks. The modification minimized scoring for these shRNAs, thus increasing specificity of the algorithm.

To address the possibility that the improvement achieved by the modification of the Amarzguioui et al., Ui-Tei et al., and Takasaki et al. algorithms is a consequence of overfitting our set of shRNAs, an independent set of 38 shRNAs pooled from previous publications ([18, 27–33]; Table 5) was subjected to analysis. While none of the ROC curves for the three unmodified algorithms had an AUC significantly different from that of the diagonal (Amarzguioui et al., p = 0.174; Ui-Tei et al., p = 0.09; Takasaki et al., p = 0.26), all of the modified algorithms yielded ROC curves with AUCs significantly different from the AUC of the diagonal (p = 0.0001–0.009; Figs. 5J–L). On statistical grounds, all three of the modified algorithms were of equal utility, as the AUCs of the ROC curves for the modified algorithms were all significantly different from the AUC of the diagonal but not significantly different from each other. This analysis of an independent set of shRNAs suggests that the modification of the algorithms is of general validity.

Table 5 Previously published shRNA sequences analyzed in this study

Because minimizing the false positive rate is the primary concern in shRNA design, we recommend using the modified Ui-Tei et al. algorithm, which had the lowest false positive fraction at decision thresholds near the maximum score, as indicated by the strong deviation from the diagonal near the origin of the ROC curve (Figs. 5H and 5K). Using a decision threshold of 3 limits selection of shRNAs to a region of the ROC curve where the sensitivity was acceptable (0.28–0.33), while the specificity was very good (1.0). By setting this decision threshold, the false positive fraction was minimized, while 28% and 33% of the effective shRNAs were identified from our shRNAs and the published set of shRNAs, respectively. Should the sensitivity need to be increased, we recommend using a decision threshold of 2. This threshold had a sensitivity of 0.54–0.55 and a specificity of 0.88–0.9. If the decision threshold was further relaxed to 0, the sensitivity increased to 0.86–0.9, but the specificity fell to 0.55–0.54. We recommend using the highest of these decision thresholds possible.
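The trade-off between sensitivity and specificity at a chosen decision threshold can be checked on any scored set of shRNAs with a sketch such as the one below. It assumes, for illustration, that an shRNA is selected when its score is greater than or equal to the threshold; the function name is hypothetical.

```python
def sensitivity_specificity(scores, effective, threshold):
    """Sensitivity and specificity when shRNAs scoring >= threshold are selected.

    scores:    modified algorithm score for each shRNA.
    effective: True if the shRNA met the efficacy criterion, else False.
    """
    tp = fn = tn = fp = 0
    for score, is_effective in zip(scores, effective):
        selected = score >= threshold
        if is_effective and selected:
            tp += 1
        elif is_effective:
            fn += 1
        elif selected:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one effective and one ineffective shRNA")
    return tp / (tp + fn), tn / (tn + fp)
```

Raising the threshold in this calculation selects fewer shRNAs, which typically lowers sensitivity while raising specificity, the pattern reported above for thresholds of 0, 2 and 3.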

Though statistically small, this study has the advantage, to our knowledge, of being the largest published set of 19-mer based shRNAs to date. In addition, unlike other shRNA studies that are necessarily skewed toward effective shRNAs, our study includes both functional and non-functional shRNAs. We have shown that modified Ui-Tei et al., Amarzguioui et al. and Takasaki et al. algorithms are fair to good predictive tools that distinguish effective from ineffective shRNAs. However, significant shortcomings still exist in the modified algorithms. A direct assessment of the algorithm modifications using shRNAs designed according to each original and modified algorithm would lend support to these findings. These algorithms are meant to reduce the number of false positive shRNAs selected, not to eliminate them, and thus such an assessment would require a large number of shRNAs to obtain a statistically significant difference in false positive rate. The availability of larger shRNA data sets should support the development of algorithms with improved sensitivity and specificity. Additionally, several software applications for siRNA oligonucleotide design that were not considered in this study may be of use in the design of shRNAs [16, 34–36]. Criteria for designing functional siRNA oligonucleotides remain controversial, as evidenced by the large number of studies still being devised for siRNA design, and since we did not test these sequences as siRNAs it cannot be established whether the modification of these algorithms also applies in the context of siRNA oligonucleotides. shRNA has an added layer of complexity over siRNA oligonucleotides since the hairpin needs to be processed within the cell before entering the RISC complex. Moreover, selective pressure against the stable expression of shRNAs that are deleterious to cell growth would be expected to place an additional constraint on the stable expression of certain shRNAs.
Despite these complexities, our findings begin to bring insight into the ability to apply siRNA algorithms for design of functional shRNAs.

Conclusion

We have provided several important strategies that should facilitate the generation of effective shRNA vectors for gene knockdown in mammalian cells. The ability to produce wild-type and mutant shRNA vectors simultaneously using mixed oligonucleotide pairs provides an efficient method to generate a specific control vector with little added time or cost. This strategy should be particularly useful in generating specific controls in high throughput applications. Difficulty in sequencing through the high intrinsic secondary structure of some hairpin vectors has also presented a major constraint in the construction of shRNA vectors, and the knowledge that sequencing issues can be resolved by modifying BigDye chemistries and adding Betaine and other DNA relaxing agents should be valuable regardless of the method of shRNA design and construction. Using data from 27 shRNAs that we constructed, we analyzed the ability of published algorithms for siRNA oligonucleotide target selection to predict knockdown efficacy. Our results show that shRNA efficacy cannot strictly be explained by any of the six algorithms tested. We provide a modification, however, that greatly improves the predictability of the Ui-Tei et al., Amarzguioui et al. and Takasaki et al. algorithms. Results were confirmed using data from 38 previously published shRNAs. These findings should be of significant applicability in the design and preparation of functional shRNAs.

Methods

Cell lines and cell culture

THP1 monocytic cell and Jurkat T cell lines were cultured in RPMI, 10% FCS. Cultures were maintained between 2 and 8 × 10⁵ cells/ml and standardized to equivalent densities before assessing knockdown efficiencies.

Plasmid design and construction

Retroviral vectors for shRNA expression have a pHSPG backbone [37] with an inserted H1 RNA promoter driving shRNA expression. The pHSPG vector also has a green fluorescent protein (GFP) gene driven by a phosphoglycerate kinase promoter as a marker. The H1 promoter and shRNA expression cassette were inserted into the pHSPG vector by one of two methods. In the first method, a double stranded oligomer is synthesized with Bgl II and Xho I half sites on the ends. This is prepared as either a matched pair or a wild-type/mutant hybrid (Fig. 1). To prepare wild-type and mutant shRNA vectors simultaneously, a forward strand oligomer is synthesized that contains the wild-type hairpin. In parallel, a mutant reverse strand with a one bp mismatch within the target sequence is also synthesized. Despite the mismatches between the forward wild-type and reverse mutant strands, annealing can still occur efficiently under optimized conditions. The double stranded oligonucleotide is annealed by combining 1000 pmol of each oligomer strand in 50 μl of annealing buffer (100 mM potassium acetate, 30 mM HEPES-KOH, pH 7.4, 2 mM Mg-acetate). The mixture is boiled for five minutes and then cooled slowly to 4°C. The annealed double stranded oligomer is ligated into Bgl II and Xho I half sites 3' of the H1 promoter that is inserted into the 3' long terminal repeat (LTR) of pHSPG generating a self-inactivating LTR. The double stranded hybrid is ligated into the vector 5' of a pol III promoter and is transformed into competent bacteria. Since replication is semi-conservative, the daughter bacteria will be of two different populations that carry either a double-stranded wild-type or a double-stranded mutant vector. Bacteria carrying either wild-type or mutant vectors can then be isolated from individual colonies and sequenced.
Oligos used for this method had the sequence: GATCCCC-N19-TTCAAGAGA-rN19-TTTTTGGAAA; and TCGATTTCCAAAAA-N19-TCTCTTGAA-rN19-GGG (where N19 is the sense of the target sequence and rN19 is the antisense). We have routinely used DH5α to prepare wild-type and mutant shRNA vectors with approximately equal yields of each type of vector; however, a repair-deficient E. coli mutant could theoretically improve the efficiency of simultaneous construction.
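The oligo templates given above can be filled in programmatically. The following Python sketch is illustrative only: the function names are hypothetical, and the flanking and loop sequences are copied from the templates in the text. It builds the forward strand from the wild-type target and the reverse strand from the mutant target to form the wild-type/mutant hybrid.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def _check_target(target):
    target = target.upper()
    if len(target) != 19 or set(target) - set("ACGT"):
        raise ValueError("target must be 19 bases of A, C, G or T")
    return target


def forward_oligo(target):
    """GATCCCC-N19-TTCAAGAGA-rN19-TTTTTGGAAA, with rN19 the antisense of N19."""
    n19 = _check_target(target)
    return "GATCCCC" + n19 + "TTCAAGAGA" + reverse_complement(n19) + "TTTTTGGAAA"


def reverse_oligo(target):
    """TCGATTTCCAAAAA-N19-TCTCTTGAA-rN19-GGG, with rN19 the antisense of N19."""
    n19 = _check_target(target)
    return "TCGATTTCCAAAAA" + n19 + "TCTCTTGAA" + reverse_complement(n19) + "GGG"


def wild_type_mutant_hybrid(wt_target, mut_target):
    """Forward strand carries the wild-type hairpin, reverse strand the mutant."""
    return forward_oligo(wt_target), reverse_oligo(mut_target)
```

For example, calling wild_type_mutant_hybrid with the TLR4 target AGGTGATTGTTGTGGTGTC and its mutant AGGTGATTCTTGTGGTGTC (both listed in the sequencing section) gives a pair of 64-base strands that mismatch at two positions when annealed, one in each copy of the target sequence.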

A second design involves PCR using a primer complementary to the 5' end of the H1 promoter together with an shRNA-specific long primer whose 3' end is complementary to the 3' end of the H1 promoter. PCR is performed using Pfx polymerase with PCRx Enhancer (this combination has proved essential for reducing the number of mutations introduced within the amplified region). Oligos used for this method were: GCGGCCGCGATATCGAACGCTGACGTCATCAACCC (universal oligo); and TGCTCTAGAAAAA-N19-TCTCTTGAA-rN19-GGGAAAGAGTGGTCTCATACAGAACTTATAAGATTCC, where N19 is the sense of the target sequence and rN19 is the antisense. Sequences complementary to the H1 promoter are underlined. PCR fragments were digested with EcoRV and XbaI and ligated into the 3' LTR of pHSPG. All constructs were verified by sequencing.

Sequencing of shRNA vectors

DNA sequencing was done at the UNC-CH Genome Analysis Facility. Sequencing reactions were 12.5 μL total volume containing 1 × BigDye Terminator v1.1 Cycle Sequencing Ready Reaction Mix (Applied Biosystems), 0.26 μg of DNA and 3.75 pmol of primer. LTRa primer (sequence CGCGAACAGAAGCGAGAA) that binds the HSPG vector approximately 120 bp downstream from the inserted hairpin was used in all sequencing reactions. The shRNA vectors used to assess sequencing efficacy were constructed as stem loop hairpins as described above and contain the following target sequences: pHSPG-shTLR4, AGGTGATTGTTGTGGTGTC; pHSPG-shmutTLR4, AGGTGATTCTTGTGGTGTC; pHSPG-shmCNN3, AGGAATGAGCGTGTATGGG; and pHSPG-shTLR2, GTATGAACTGGACTTCTCC. Modified sequencing reactions substituted part or all of the BigDye v1.1 chemistry with ABI Prism dGTP BigDye Terminator Ready Reaction Mix (Applied Biosystems). Ratios of 20:1, 10:1, 5:1 and 3:1 BD:dGTP chemistries and straight dGTP chemistry were used. Additives evaluated in sequencing reactions were: 0.83 M Betaine (Sigma part # B-0300), 5% DMSO (Sigma part # D-2650), 1 × PCRx Enhancer (in Invitrogen kit part # 11495-017), 1 × ThermoFidelase I (1 μL ThermoFidelase per 20 μL sequencing reaction; Fidelity Systems) and 10 × primer concentration. The thermal cycler protocol used for cycle sequencing was: 95°C for 3 minutes (or 5 minutes when using ThermoFidelase I) followed by 25 cycles of 98°C for 40 seconds (1st cycle) or 10 seconds (subsequent cycles), 50°C for 5 seconds and 60°C for 4 minutes. Sequencing reactions were purified using Centri-Sep 96 well spin plates (Princeton Separations), and the purified reaction products were run on a 3730 DNA Analyzer (Applied Biosystems) with a 50 cm array using the LongRead protocol. As a measure of read-through efficacy, peak height ratios were determined about 300 bases after and 50 bases before the hairpin.

Virus preparation, transduction and cell sorting

To prepare virus, pHSPG-shRNA plasmids were co-transfected into 293T cells with gag/pol and VSVg vectors by the calcium phosphate method. Viral supernatants were collected 24 and 48 hours following transfection and used to transduce THP1 or Jurkat cells by spinoculation. THP1 cells were transduced with virus on two consecutive days to increase transduction levels. Following approximately one week of culture, stably transduced cells were isolated by sorting for GFP. FACS analysis studies suggest that GFP expression is 95% stable for at least two months following sorting (not shown).

RNA expression analyses

Total RNA was isolated with an RNeasy isolation kit (Qiagen) using the recommended protocol. To increase specificity, cDNA was reverse transcribed using oligo dT primer and Superscript III RT (GibcoBRL). Real-time PCR experiments were performed using an ABI Prism 7700 instrument (Applied Biosystems) with an annealing temperature of 57°C. For 18s, CLR19.6/NALP11, CLR19.3/NALP12, MYD88, TLR2, TLR4, and TRAF6, real-time PCR was performed using ABsolute QPCR Mix (ABgene) and either TET or FAM labeled probes. The following are the sequences of the oligonucleotides used, listed as [forward; reverse; probe]: 18s-[CGGCTACCACATCCAAGG; GCTGCTGGCACCAGACTT; Tet-CAAATTACCCACTCCCGACCCG-Tamra]; CLR19.6/NALP11-[TCAATGATGCGTAAGGAAAGA; ACTTTCCCATTGCAGCATGA; Fam-CTTTGCATGCCTCCTGATTGCGGT-Tamra]; CLR19.3/NALP12-[AGAGGACCTGGTGAGGGATAC; CTTCCAGAAGGCATGTTGAC; Fam-CCCGTCCTCACTTGGGAACCA-Tamra]; MYD88-[CTCTGTAGGCCGACTGC; CTGCTGCTGCTTCAAGATA; Fam-TGGCAATCCTCCTCAATGCTGGGTC-Tamra]; TLR2-[GGTCATCATCAGCCTCTCCA; GAGCTGCCCTTGCAGATAC; Fam-CCTCCAATCAGGCTTCTCTGTCTTGTGACC-Tamra]; TLR4-[AGAGCCTAAGCCACCTCT; CTAGAGATGCTAGATTTGTCTCCA; Fam-AGCCACCAGCTTCTGTAAACTTGATAGTCCAGA-Tamra]; TRAF6-[CCATGCGGCCATAGGTT; TTTCCAGCAGTATTTCATTGTCA; Fam-TGGACATTTGTGACCTGCATCCCTTATTGAT-Tamra]. For ASC/PYCARD, CLR16.2, MAL/TIRAP, TRAM/TICAM2, and TRIF/TICAM1, real-time PCR was performed using ABsolute SYBR green mix (ABgene) and the following primers, listed as [forward; reverse]: ASC/PYCARD-[AACCCAAGCAAGATGCGGAAG; TTAGGGCCTGGAGGAGCAAG]; CLR16.2-[TCAACACAGCCCTCACTGCTCTCTATCTC; AGCCACCCCAATGGCATTTCCTCTTAAGTC]; MAL/TIRAP-[GGACTCATCTCCTGCCTAAC; CATGGTGAGGCCTGCAATCT]; TRAM/TICAM2-[GGCACAGTGTGGATACAAGT; ACATCTCTTCCACGCTCTGA]; TRIF/TICAM1-[CAGGAGCCTGAGGAGATGAG; GGGTAGTTGGTGCTGGTTTC]. Primers were designed to span exon/intron junctions where possible. All RNA expression analyses were done at least in triplicate for RNA isolated on different days, and knockdowns were verified with at least one control hairpin. Values represent the average observed knockdown for RNA from different days of cell culture and were standardized to 18s rRNA expression.

Implementation of algorithms

The free energy (ΔG) of RNA duplex formation for the 5 bases at the 5' end of the sense and antisense strands was determined using the thermodynamic parameters and expanded nearest-neighbor model of Xia et al. [38]. The 5' ΔΔG (differential free energy) was calculated by subtracting the ΔG of the antisense strand from that of the sense strand. Determination of scores for the Reynolds et al., Amarzguioui et al., and Takasaki et al. algorithms was as described [15, 17, 20]. The Hsieh et al. score represents the interpretation of the Hsieh et al. design criteria as published by Saetrom and Snove [16, 19]. For the Ui-Tei algorithm, sequences with a C or G on the 5' end scored 1 point, whereas those with an A or T scored -1 point. Sequences with an A or T on the 3' end scored 1 point, whereas those with a C or G scored -1 point. Sequences with 5 or more A or T bases in the seven 3' bases scored 2 points, whereas those with 4 A or T bases scored 1 point. Sequences can be classified by score as follows: 4 – class Ia, 3 – class Ib, 2, 1 or 0 – class II and -1 or -2 – class III. All knockdowns of <10% are graphed as 0.
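The Ui-Tei scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study; it assumes the input is the 19-base sense target sequence written 5' to 3', and the function names are hypothetical.

```python
def ui_tei_score(sense: str) -> int:
    """Score a sense-strand target using the point rules stated in the text."""
    seq = sense.upper().replace("U", "T")  # accept RNA or DNA letters
    if len(seq) < 7 or set(seq) - set("ACGT"):
        raise ValueError("expected a sequence of A, C, G, T/U with at least 7 bases")

    score = 0
    # 5' end: C or G scores +1, A or T scores -1.
    score += 1 if seq[0] in "CG" else -1
    # 3' end: A or T scores +1, C or G scores -1.
    score += 1 if seq[-1] in "AT" else -1
    # A/T content of the seven 3' bases: 5 or more scores +2, exactly 4 scores +1.
    at_count = sum(base in "AT" for base in seq[-7:])
    if at_count >= 5:
        score += 2
    elif at_count == 4:
        score += 1
    return score


def ui_tei_class(score: int) -> str:
    """Map a score to the class labels given in the text."""
    if score == 4:
        return "Ia"
    if score == 3:
        return "Ib"
    if 0 <= score <= 2:
        return "II"
    return "III"  # scores of -1 or -2


if __name__ == "__main__":
    s = ui_tei_score("GCCGCGCGCGCGAATTATA")  # synthetic example sequence
    print(s, ui_tei_class(s))
```

Under these rules the possible scores run from -2 to 4, which matches the class boundaries listed above.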

Modifications of the Amarzguioui et al., Ui-Tei et al., and Takasaki et al. algorithms were applied as follows. The free energy of RNA duplex formation for the 6 central bases of each shRNA (bases 6–11 of the sense strand hybridized to bases 9–14 of the antisense strand) was calculated. shRNAs with central duplex ΔGs equal to or less than -12.9 kcal/mol were assigned a minimum score (-4 for the Amarzguioui et al. algorithm, -2 for the Ui-Tei et al. algorithm and -13.26 for the Takasaki et al. algorithm). The scores for shRNAs with central duplex ΔGs greater than -12.9 kcal/mol were left unchanged. The cutoff value of -12.9 kcal/mol was selected empirically based upon the range of central duplex ΔGs for all shRNAs (see Table 4).
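The central-duplex modification amounts to a simple override rule, sketched below in Python. The sketch assumes the central duplex ΔG has already been computed elsewhere (for example, with the nearest-neighbor parameters cited above) and is passed in as a number; the function names and dictionary keys are hypothetical.

```python
CENTRAL_DG_CUTOFF = -12.9  # kcal/mol, the empirical cutoff stated in the text

# Minimum scores assigned when the central duplex is too stable (from the text).
MINIMUM_SCORE = {
    "Amarzguioui": -4.0,
    "Ui-Tei": -2.0,
    "Takasaki": -13.26,
}


def central_window(sense: str) -> str:
    """Return bases 6-11 (1-based) of a 19-base sense strand."""
    if len(sense) != 19:
        raise ValueError("expected a 19-base sense strand")
    return sense[5:11]


def modified_score(algorithm: str, score: float, central_dg: float) -> float:
    """Apply the central-duplex rule to a previously computed algorithm score.

    central_dg is the free energy (kcal/mol) of the 6-base central duplex,
    assumed to be calculated separately.
    """
    if central_dg <= CENTRAL_DG_CUTOFF:
        return MINIMUM_SCORE[algorithm]
    return score


if __name__ == "__main__":
    print(central_window("AGGTGATTGTTGTGGTGTC"))
    print(modified_score("Ui-Tei", 3, -13.5))    # at or below cutoff: minimum score
    print(modified_score("Takasaki", 1.5, -10.0))  # above cutoff: unchanged
```

In a 19-base duplex, sense position i pairs with antisense position 20 - i, so sense bases 6–11 pair with antisense bases 14–9, consistent with the window described above.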

ROC curve analysis

ROC curves were constructed as described [39]. ROC analysis requires that each shRNA be classified as either effective or ineffective. For our analyses, an shRNA was classified as effective if it reduced mRNA expression by 50% or more. A ROC curve was generated for each algorithm as follows. The decision threshold was set to one unit below the lowest shRNA score. By definition, shRNAs with scores greater than or equal to the decision threshold were predicted to be effective, while those with scores less than the decision threshold were predicted to be ineffective. Each shRNA was then classified as a true positive (effective predicted to be effective), a false negative (effective predicted to be ineffective), a true negative (ineffective predicted to be ineffective) or a false positive (ineffective predicted to be effective). The true positive fraction (TPF) for the decision threshold was calculated as the number of true positives divided by the sum of the true positives and false negatives. The false positive fraction (FPF) was calculated as the number of false positives divided by the sum of the false positives and true negatives. The decision threshold was then increased by one unit and the TPF and FPF were calculated again. This process was repeated until the decision threshold was one unit greater than the highest shRNA score. ROC curves were constructed by plotting the TPF versus the FPF for all decision thresholds. The area under the ROC curve was estimated by integration using the trapezoid rule.
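The threshold sweep and trapezoid integration described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code used in the study: knockdown is expressed as percent reduction in mRNA, the step size defaults to one score unit, and the function names and example data are hypothetical.

```python
def roc_points(scores, knockdowns, step=1.0, effective_cutoff=50.0):
    """Return (FPF, TPF) pairs for thresholds from min(score)-step to max(score)+step."""
    effective = [k >= effective_cutoff for k in knockdowns]
    n_pos = sum(effective)
    n_neg = len(effective) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one effective and one ineffective shRNA")

    lowest = min(scores) - step
    highest = max(scores) + step
    points = []
    i = 0
    while True:
        threshold = lowest + i * step
        # Scores at or above the threshold are predicted to be effective.
        tp = sum(1 for s, e in zip(scores, effective) if e and s >= threshold)
        fp = sum(1 for s, e in zip(scores, effective) if not e and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
        if threshold >= highest - 1e-9:
            break
        i += 1
    return points


def auc_trapezoid(points):
    """Estimate the area under the ROC curve with the trapezoid rule."""
    pts = sorted(points)  # order by increasing FPF, then TPF
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    example_scores = [1, 2, 3, 4]          # made-up algorithm scores
    example_knockdowns = [10, 60, 40, 80]  # made-up percent knockdowns
    print(auc_trapezoid(roc_points(example_scores, example_knockdowns)))
```

Because TP + FN is the number of effective shRNAs and FP + TN is the number of ineffective shRNAs, both denominators are constant across thresholds and are computed once.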