Suppression subtractive hybridization method for the identification of a new strain of murine hepatitis virus from xenografted SCID mice

During attempts to clone retroviral determinants associated with a mouse model of Langerhans cell histiocytosis (LCH), suppression subtractive hybridization (SSH) was used to identify unique viruses in the liver of severe combined immunodeficiency (SCID) mice transplanted with LCH tissues. A partial genomic sequence of a murine coronavirus was identified, and the whole genome (31428 bp) of the coronavirus was subsequently sequenced using PCR cloning techniques. Nucleotide sequence comparisons revealed that the genome sequence of the new virus was 91-93 % identical to those of known murine hepatitis viruses (MHVs). The predicted open reading frame from the nucleotide sequence encoded all known proteins of MHVs. Analysis at the protein level showed that the virus was closely related to the highly virulent MHV-JHM strain. The virus strain was named MHV-MI. No type D retroviruses were found. Degenerate PCR targeting of type D retrovirus and 5′-RACE targeting of other types of retroviruses confirmed the absence of any retroviral association with the LCH xenografted SCID mice.


Introduction
Subtraction hybridization enables researchers to compare two populations of mRNA and to obtain clones of genes that are only expressed in one population. The basic principle behind the method is simple: both mRNA populations are converted to cDNA and then hybridized to each other; the hybridized sequences are then removed and the remaining unhybridized cDNAs represent genes that are only expressed in one population. Several different methods of subtraction hybridization have been successfully applied [1][2][3][4] but each had limitations in identifying rare transcripts. Suppression subtractive hybridization (SSH) [5] overcomes this by using an additional suppression PCR step that provides 10-100 fold enrichment of differentially expressed mRNAs, irrespective of their relative abundance. SSH has been successfully used to compare differences in gene expression between two transcriptomes [6][7][8][9][10] but, to our knowledge, the use of SSH for identification of an unknown virus has not been previously reported.
Langerhans cell histiocytosis (LCH) is a rare human disease of unknown etiology. The disease is characterized by the accumulation of clonally derived Langerhans cells [11,12] and inflammatory cells including T cells, macrophages, eosinophils, neutrophils, giant cells, and plasma cells [13,14]. The clinical presentation of LCH can involve bone, skin, liver, spleen, lymph nodes, and/or bone marrow [15]. To overcome issues relating to the availability of LCH tissue, the establishment of a mouse model of LCH was attempted by xenografting human LCH tissue into severe combined immunodeficiency (SCID) mice [16]. Following engraftment, several mice developed pre-T cell lymphomas, from which a cell line, called ThyE1M6, was developed. Electron microscopy of this cell line appeared to show a budding retrovirus with an appearance similar to that of type-D-like retroviral particles [16]. When ThyE1M6 cells were injected into SCID mice, a lethal syndrome developed within 2 weeks in which mice developed generalized granulomas involving inflammatory cells in multiple organs. A virus was suspected, but antiserum detection tests for known viruses were not possible, as SCID mice do not have B cells and therefore cannot make antibodies. Histopathological studies revealed that the liver was the most-affected organ. Attempts to isolate virus from liver were unsuccessful, and molecular techniques were used in this study to identify the putative virus.
A type-D retrovirus was initially suspected, based on EM morphology of the ThyE1M6 cell line and the biochemical readout of reverse transcriptase activity [16]. Subsequent to the original description of type D retroviral particles by Ristevski et al., other studies confirmed the existence of endogenous type D retroviral sequences in the mouse genome [17]. In this study, total RNA was extracted from infected and uninfected mouse liver, mRNA populations in total RNA were converted to cDNAs, and the cDNAs were subtracted from each other using the SSH method. The genomic sequence of a new strain of murine coronavirus was successfully identified and then sequenced. No type D retrovirus or any other type of retrovirus was found, even after using targeted degenerate PCR or 5′-RACE techniques. The new strain of coronavirus was named MHV-MI. To our knowledge, this is the first report of the isolation of an unknown viral sequence using the suppression subtractive hybridization method.

Xenografting of SCID mice
A thymus biopsy sample from a 13-year-old female patient diagnosed with LCH was teased into a single-cell suspension and continuously cultured in the presence of 25 ng/ml (each) of TNF and GM-CSF (Boehringer Mannheim Biochemica, Mannheim, Germany) for 35 days. Live cells were purified on a Ficoll gradient and injected subcutaneously into each of three SCID mice along with TNF and GM-CSF. Cytokine injections were repeated daily for 5 days. Three additional SCID mice were injected with cytokines only as controls. Liver tissues were harvested from the mice after 7 weeks for histopathologic examination and then stored at -80°C for future molecular biological studies. This work was approved by the Royal Children's Hospital (RCH) Ethics in Human Research Committee and the RCH Animal Experimentation Ethics Committee under the project title 'The Biology of Langerhans Cell Histiocytosis'.

RNA extraction
Total RNA was extracted from approximately 50 µg of frozen liver tissues or 1 × 10⁶ ThyE1M6 cells using a miRNeasy Kit (QIAGEN, Clifton Hill, Australia), following the manufacturer's instruction manual. A minor modification was made in the final stage of the protocol, where RNA was eluted with warm DEPC-treated water (50°C) and collected in an Eppendorf tube containing 1 µl of RNasin (Promega, Madison, USA) and 1 µl of DTT (10 mM).

cDNA synthesis
A BD SMART RACE cDNA Amplification kit (BD Biosciences-Clontech, Palo Alto, USA) was used for the synthesis of first-strand cDNA from total RNA extracted from mouse liver tissues or ThyE1M6 cells. The protocol in the user manual was slightly modified to adjust the reagent volume. Two micrograms of total RNA (2-4 µl), 0.5 µl of 5′-RACE CDS primer (12 µM), 0.5 µl of SMART II Oligonucleotide (12 µM) and deionized water to 5 µl final volume were added together in a sterile PCR reaction tube (0.1 ml). Contents were mixed and briefly spun. The tube was incubated at 72°C in a hot-lid thermal cycler for 2 min and then cooled on ice for 2 min. A mixture of 5× First-Strand Buffer (2 µl), 10 mM dNTP mix (1 µl), 20 mM DTT (1 µl), and Powerscript Reverse Transcriptase (1 µl) was added to the tube and mixed gently by tapping. The mixture was briefly spun down and incubated at 42°C for 4 h (enough time to transcribe full-length type D retroviral RNA). After the reaction was finished, the transcriptase activity was stopped by incubating the tube at 70°C for 7 min. The transcribed cDNA was then either stored at -20°C for future use or used immediately.

Preparation of forward subtracted cDNA by SSH
The preparation of forward subtracted cDNAs was done according to the BD PCR-Select cDNA Subtraction Kit (BD Biosciences-Clontech, Palo Alto, USA) protocol. cDNAs were synthesized from total RNA of both infected and uninfected mouse liver using the protocol described above. cDNA from infected mouse liver was called 'tester cDNA', and that from uninfected mouse liver was called 'driver cDNA'.
Both cDNAs were then purified using Chroma Spin-400 column chromatography (QIAGEN, Clifton Hill, Australia) and digested to completion with Rsa I restriction enzyme (kit component). Complete digestion was confirmed by gel electrophoresis. Digested cDNAs were then purified using the NucleoSpin II protocol (BD Biosciences-Clontech, Palo Alto, USA).
Digested and purified tester cDNA was subdivided into two equal portions and ligated with two different adaptors (supplied in the cDNA subtraction kit). Ligation efficiency was confirmed by PCR using a housekeeping gene (G3PDH)-specific forward primer and the reverse adaptor primer (kit components). Following ligation, two hybridization steps were performed as per the protocol. Normalized and subtracted single-stranded target cDNA molecules anneal with each other, forming double-stranded hybrids with two different adaptor sequences at their 5′ ends. The adaptor ends were then filled in with DNA polymerase, and the subtracted molecules were specifically amplified by nested PCR using adaptor-specific primer pairs.
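The logic of forward subtraction can be pictured, very loosely, as a set difference between tester and driver populations. The following Python sketch is a conceptual analogy only and is not part of the laboratory protocol; the function name `forward_subtract` and the transcript labels are hypothetical.

```python
def forward_subtract(tester, driver):
    """Return transcripts present in the tester but absent from the driver.

    Conceptual analogy only: in SSH the shared sequences are removed by
    hybridization, not by exact string comparison.
    """
    driver_set = set(driver)
    seen = set()
    unique = []
    for transcript in tester:
        # Keep tester order, drop anything shared with the driver,
        # and report each remaining species once.
        if transcript not in driver_set and transcript not in seen:
            seen.add(transcript)
            unique.append(transcript)
    return unique


# Hypothetical labels standing in for cDNA species.
tester_cdna = ["host_gene_a", "host_gene_b", "viral_gene_x", "viral_gene_x"]
driver_cdna = ["host_gene_a", "host_gene_b", "host_gene_c"]
subtracted = forward_subtract(tester_cdna, driver_cdna)
print(subtracted)
```

In this toy example only the species unique to the tester survives, which mirrors why infected liver was used as tester and uninfected liver as driver.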

Preparation of subtracted cDNA libraries
Subtracted cDNAs (the nested PCR products described in the previous section) were treated with Taq DNA polymerase for 10 min at 72°C in a 20-µl reaction mix containing 17 µl of nested PCR products, 2 µl of PCR buffer (10×), 0.5 µl of MgCl₂, 0.25 µl of dNTP mix, and 0.25 µl of Taq DNA polymerase (Invitrogen, California, USA). Two µl of treated cDNAs was then ligated with 25 ng of pGEM-T plasmid vector (Promega, Madison, USA) using T4 DNA ligase in an overnight reaction at 16°C. Ligase activity was then stopped by heating the reaction mix at 72°C for 10 min. One µl of inactivated ligation mix was then used to transform 50 µl of maximum-efficiency E. coli (DH5α) competent cells (Invitrogen, Carlsbad, USA) using a MicroPulser (BIO-RAD, Hercules, USA). A sufficient volume of transformed bacteria was plated onto LB agar plates containing 50 µg of ampicillin (Aspen, Deakin, Australia) per ml and an appropriate amount of Xgal (Promega, Madison, USA) and IPTG (Invitrogen, Carlsbad, USA), followed by overnight incubation at 37°C. Multiple plates were grown from each ligation mix. The pGEM-T plasmid contains a LacZ reporter at the multiple cloning site and allows blue/white screening. White colonies represented recombinant cells containing a subtracted cDNA insert, while blue colonies represented background cells containing plasmids only. Fully grown colonies (blue and white) from each plate were collected with a scrubber in 5 ml of LB + 25 % glycerol and pooled in a 50-ml tube. Preparation of the subtracted cDNA library was then complete. The library was divided into several vials, flash frozen with liquid nitrogen, and stored at -80°C for future use.

Titering and sequencing forward subtracted cDNA clones
One vial of frozen forward subtracted library was quickly defrosted on a warm (37°C) metal block rack, mixed by gentle vortexing, and then diluted 10⁻³-fold and 10⁻⁶-fold with LB broth. Ten µl of each diluted library was mixed with 100 µl of LB broth and inoculated onto LB plates containing 50 µg of ampicillin per ml and the appropriate amount of Xgal (Promega, Madison, USA) and IPTG (Invitrogen, Carlsbad, USA). Plates were incubated overnight at 37°C. A satisfactorily grown plate was selected, and both white and blue colonies were counted. The titer of the library was calculated from the white colony counts, and the efficiency of the library was calculated as the ratio of white colony counts to total colony counts.
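The titer and efficiency calculations described above can be sketched as follows. This is a minimal illustration, assuming the titer is expressed per ml of undiluted library and that 10 µl (0.01 ml) of diluted library is plated; the function names and the colony counts are hypothetical and are not the counts obtained in this study.

```python
def library_titer(white_colonies, dilution, plated_volume_ml):
    """Recombinant colony-forming units per ml of undiluted library."""
    if dilution <= 0 or plated_volume_ml <= 0:
        raise ValueError("dilution and plated volume must be positive")
    return white_colonies / (dilution * plated_volume_ml)


def library_efficiency(white_colonies, blue_colonies):
    """Fraction of colonies that carry an insert (white / total)."""
    total = white_colonies + blue_colonies
    if total == 0:
        raise ValueError("no colonies counted")
    return white_colonies / total


# Hypothetical counts from a plate of the 10^-6 dilution.
titer = library_titer(white_colonies=50, dilution=1e-6, plated_volume_ml=0.01)
efficiency = library_efficiency(white_colonies=50, blue_colonies=20)
print(f"titer = {titer:.2e} cfu/ml, efficiency = {efficiency:.1%}")
```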
A total of 148 white colonies were randomly selected from a titering plate and grown in 2 ml of LB-ampicillin medium overnight at 37°C with vigorous shaking. Recombinant plasmids were prepared from all colonies using the QIAprep® procedure (QIAGEN, Clifton Hill, Australia). Sequencing reactions were carried out using PRISM BigDye Terminator Mix (Applied Biosystems, Foster City, USA) and vector-specific primers. A 3730s Genetic Analyzer (Applied Biosystems, Foster City, USA) was used for reading sequences. The sequences were then identified based on homology searches using the web-based nucleotide sequence analysis program MegaBLAST [18] from the National Center for Biotechnology Information (NCBI).

Primer design and sequencing of the MHV-MI virus genome
GeneFisher2 software (Bielefeld University Bioinformatics Server - BiBiServ) was used to design 34 pairs of overlapping primers, each covering approximately 1000 bp of the MHV-A59 genomic sequence (accession no. AY700211). Primers were designed to enable a single thermal profile to be used for each PCR. Twenty-three pairs (Table 1, yellow shaded) produced the expected PCR products with a thermal profile of 5 min pre-PCR treatment at 96°C; 35 cycles of 95°C for 15 s, 55°C for 30 s and 72°C for 1 min, followed by a 5-min extra elongation at [...], and the required volume of water. PCR products were cloned and sequenced using a previously established protocol [19]. A 3730s Genetic Analyzer was used instead of a 310 Genetic Analyzer in the new method.
In the second phase, 11 pairs of nested primers located within the sequences of previously cloned PCR products and four pairs of combined primers (nested and MHV-A59 genome-specific) were designed (Table 1, gray shaded). Varying thermal profiles were used depending on the melting temperatures of the primer pairs. All 16 PCR products were cloned and sequenced again.
In the third phase, four pairs of nested primers (Table 1, dark gray shaded) were designed to further sequence the less reliable middle portion of longer sequences (longer than 1300 bp). PCR products were amplified using varying thermal profiles depending on the melting temperatures of the primer pairs. Finally, a total of 44 primer pairs were used to amplify the full-length viral genome (Table 1).
All PCR products or RACE products were cloned and sequenced using the same method [19]. Compilation and further analysis of those sequences were performed using pDRAW32 DNA Analysis Software (http://www.acaclone.com).
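The idea of covering a genome with overlapping amplicons of roughly 1000 bp can be sketched as below. This is only an illustration of the tiling arithmetic, not the GeneFisher2 design procedure; the function name `tile_amplicons` is hypothetical, amplicons are assumed to be evenly spaced, and the genome length in the example is the 31,428 bp reported for MHV-MI, used here purely as a stand-in.

```python
def tile_amplicons(genome_len, n_amplicons, amplicon_len):
    """Return 1-based (start, end) windows that tile the genome with overlap."""
    if n_amplicons < 2:
        raise ValueError("need at least two amplicons to tile with overlap")
    if n_amplicons * amplicon_len < genome_len:
        raise ValueError("amplicons are too few or too short to cover the genome")
    # Space the starts evenly so the last window ends at the genome end.
    step = (genome_len - amplicon_len) / (n_amplicons - 1)
    windows = []
    for i in range(n_amplicons):
        start = round(i * step) + 1
        windows.append((start, start + amplicon_len - 1))
    return windows


windows = tile_amplicons(genome_len=31428, n_amplicons=34, amplicon_len=1000)
# Overlap (in bp) between each pair of neighbouring amplicons.
overlaps = [windows[i][1] - windows[i + 1][0] + 1 for i in range(len(windows) - 1)]
print(windows[0], windows[-1], min(overlaps), max(overlaps))
```

With these assumed values, neighbouring amplicons share several tens of base pairs, which is what allows primer-region sequences to be checked against the adjacent product.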

Designing degenerate primers and optimization of PCR conditions
Nucleotide sequences of almost all known type D retroviruses, namely SRV-1, SRV-2, MPMV, Tsukuba monkey virus, squirrel retrovirus, TvERV, MusD1 and MusD2 [17][20][21][22][23][24][25][26], were aligned using the web-based CLUSTAL W (1.83) program (EMBL-EBI). Two pairs of degenerate primers each from the 'gag', 'pol' and 'pro' regions of the [...]
Optimization of PCR conditions for all degenerate primer pairs was done in two phases using a mouse genomic DNA template containing the type D retroviral sequences MusD1 and MusD2 [17]. In the first phase, only one thermal profile (95°C for 7 min, 40 cycles of 95°C for 30 s, 40°C for 10 s, 60°C for 1 min, and an additional 5-min extension at 60°C) was used for all six primer pairs. A standard PCR mix containing 2 mM MgCl₂ was used to amplify the target PCR products in this phase. In the second phase, only the primer pairs that were successful in the first phase (gag-PP2-FWD/REV and pro-PP2-FWD/REV) were optimized, using various MgCl₂ concentrations and annealing temperatures.
Table footnotes: Negative sequence position means the position before the start of the reported sequence. The accession number for the MHV-3 genome sequence is FJ647224.1, and that for the MHV-A59 genome sequence is AY700211.

Experimental degenerate PCR
Experimental degenerate PCR reactions were carried out with the cDNAs obtained from infected and uninfected mouse liver. Only successfully optimized primer pairs were used to amplify the target PCR products under the optimized thermal conditions.
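A degenerate primer is a pool of related oligonucleotides written with IUPAC ambiguity codes, and its degeneracy (the number of distinct sequences in the pool) affects how permissive the PCR conditions need to be. The sketch below shows how such a pool can be counted and enumerated; the example primer is hypothetical and is not one of the primers used in this study.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


example_primer = "GARTAYN"  # hypothetical, for illustration only
variants = expand(example_primer)
print(degeneracy(example_primer), variants[:3])
```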

Internal control RT-PCR
A predesigned primer pair for the mouse β-actin gene with PrimerBank ID 6671509a1 (http://pga.mgh.harvard.edu/primerbank/) was used to amplify an RT-PCR product from the cDNA of both infected and uninfected mouse liver. The reaction mix (20 µl) contained 1 µl of cDNA, 2.5 mM MgCl₂, 0.5 µl of each primer (10 µM), 0.5 µl of dNTP mix (10 mM), 2 units of AmpliTaq Gold DNA polymerase (Applied Biosystems, Foster City, USA) and the required volume of water to make the final volume 20 µl. The thermal profile used was 95°C for 7 min, 40 cycles of 95°C for 30 s, 55°C for 10 s, and 60°C for 30 s, followed by an additional 5-min extension at 60°C.
[...] (Table 2), were synthesized by Sigma-Aldrich (Australia). 5′-RACE reactions were carried out with all of these primers against the cDNA obtained from the infected mouse (E4M31) liver according to the BD SMART™ RACE cDNA Amplification Kit protocol (BD Biosciences-Clontech, Palo Alto, CA, USA). A positive-control 5′-RACE was done with the cDNA obtained from the ThyE1M6 cell line.

Identification of MHV sequences using the SSH technique
Subtraction efficiency tests for control and experimental samples were performed after completion of the forward subtraction process, and the results were satisfactory (data not shown). Successfully subtracted cDNAs were then cloned into E. coli using the pGEM-T vector and made into a forward subtracted cDNA library following the procedure described above. The titer and efficiency of the library were calculated to be 4.3 × 10⁹ cfu/ml and 71.5 %, respectively. Sequence analysis of 148 successful clones revealed that 12 % of the total subtracted genes belonged to murine coronavirus (MHV) (Table 3). No other viral genes were identified in the subtracted population (148 clones). This indicated that murine hepatitis virus was a likely cause of the infection in transplant-recipient SCID mouse liver.

Sequencing the full-length murine coronavirus (MHV) genome
A total of 43 PCR products of varying sizes (312 bp-1757 bp) and a 3′-RACE product (1365 bp) were sequenced on both strands to derive the full-length genome sequence of the murine hepatitis virus. At least three clones from each PCR product or 3′-RACE product were sequenced. The DNA analysis software pDRAW32 and ClustalW2 (EMBL-EBI) were used to compile and align sequence data. Sequences in the primer regions were corrected using overlapping sequence alignment. Since the sequence of the 5′-end forward primer (5′-GTACCCTTTCTACTCTCA-3′) could not be verified by any other means, an 18-bp sequence from the 5′ end of the full-length genome was excluded, and the remaining sequence was reported to the DNA Data Bank of Japan (DDBJ) with the accession number AB551247. The reported full-length genome of MHV-MI is therefore 31,428 bp long.
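Sequencing at least three clones per product makes it possible to reconcile occasional cloning or sequencing errors. The text does not state how discrepancies were resolved; the sketch below assumes a simple per-position majority vote over already aligned, equal-length reads, and both the function name and the example reads are hypothetical.

```python
from collections import Counter


def majority_consensus(reads):
    """Per-column majority base from aligned, equal-length reads.

    Positions without a strict majority are reported as 'N'.
    """
    if not reads:
        raise ValueError("no reads supplied")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count * 2 > len(reads) else "N")
    return "".join(consensus)


# Hypothetical reads from three clones of the same product.
clone_reads = ["ACGTTA", "ACGTTA", "ACGATA"]
consensus = majority_consensus(clone_reads)
print(consensus)
```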

Sequence analysis
Initially 12 ORFs were predicted in three reading frames from the full-length viral genome (31,428 bp) using pDRAW32 DNA analysis software (Aca Clone software).
Only five of these matched known proteins of MHVs: hemagglutinin-esterase (HE), spike glycoprotein (S), nonstructural protein 4 (ns4), membrane protein (M), and nucleocapsid protein (NC). The remaining seven ORFs either partially matched or did not match any known MHV proteins. At that stage, information on the ORFs of many MHVs with published [27][28][29][30][31][32][33] or unpublished sequences (accession nos. FJ647224, FJ647219, NC_006852.1) was consulted, and seven more proteins, namely, replicase polyprotein 1a, replicase polyprotein 1ab, non-structural protein 2a (ns2a), nonstructural protein 5 (ns5), envelope protein (E), and internal protein (I) were predicted. The ORF of replicase polyprotein 1ab contained a ribosomal slippage site at nucleotide position 13,524. ORFs of replicase polyprotein 1ab, replicase polyprotein 1a, ns4, ns5, E, M, NC and I protein contained overlapping sequences of varying length (4-13384 bp). The I protein was predicted within the coding sequence of the NC protein, but in a different reading frame; hence, there was 100 % overlap (410 bp) of the I protein by the NC protein. BLAST analysis against the protein database revealed that most of the predicted proteins are more similar to the corresponding proteins of MHV-3 (90-99 %) than to those of any other MHV strain (Table 4). Comparative analysis of ORFs from different MHVs revealed that MHV-MI contains the same number of proteins as JHM virus or its variants RJHM/A, MHV-JHM.IA, SA59/RA59, SJHM/RA59, and strain A59 (Table 4), and hence, MHV-MI is more closely related to JHM virus or its variants. The reported full-length genome sequence of MHV-MI in the DDBJ database (accession no. AB551247) contained all of these 12 predicted ORFs and their nucleotide positions.
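Predicting ORFs in three reading frames, and the way one ORF can lie inside another in a different frame (as with the I and NC proteins), can be illustrated with a naive scan. The algorithm used by pDRAW32 is not described here, so the following is only a minimal sketch that reports ATG-to-stop stretches on the forward strand; the function name and the short test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (frame, start, end) for ATG..stop ORFs on the forward strand.

    Coordinates are 1-based and inclusive; the stop codon is included in
    both the coordinates and the codon count.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start + 1, i + 3))
                start = None
    return orfs


# Hypothetical sequence with one ORF in frame 0 and one in frame 1.
demo_seq = "ATGAAATAGCATGCCCTGA"
demo_orfs = find_orfs(demo_seq)
print(demo_orfs)
```

A real genome scan would also need to handle the ribosomal slippage site in replicase polyprotein 1ab, which a frame-by-frame scan such as this one cannot join into a single product.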

Degenerate PCR targeting type D retrovirus
Primers for degenerate PCR targeting type D retrovirus were designed from the known sequences of type D retroviruses as described above. After the first PCR optimization step, only two primer pairs (gag-PP2-FWD/REV and pro-PP2-FWD/REV) successfully amplified the target PCR products of 420 bp and 240 bp, respectively, from the mouse genomic DNA. PCR products were verified by sequencing to be derived from MusD1 or MusD2 retroviral sequences of mouse genomic DNA. After the second phase of optimization, 2 mM MgCl₂ was found to be optimal for both primer pairs, but an annealing temperature of 55°C was found to be optimal for the gag-PP2-FWD/REV primer pair, and 45°C for the pro-PP2-FWD/REV primer pair. Under the optimal conditions, both primer pairs produced sharp PCR bands (Fig. 1). In the experimental degenerate PCR, no PCR products were amplified, even after 70 cycles. The RT-PCR internal control with mouse β-actin primers produced the expected products (154 bp) from both infected and uninfected mouse liver cDNAs. These results clearly indicated that no type D retrovirus was present in the infected or uninfected mouse liver.

5′-RACE targeting a whole range of retroviruses
Eleven tRNA primers were designed to cover the whole range of retroviruses (Table 2), but none of the primers produced the expected 5′-RACE products from cDNA from infected mouse liver (Fig. 2). tRNA primers for Lys, Thr and Gln produced some products (lanes 1, 6 and 7), but similar products were also observed in the 5′-RACE [...]. These products were regarded as not being generated from any retroviral genome. A positive control cDNA from the ThyE1M6 cell line, which expresses endogenous MoMLV (data not shown), yielded a product with the tRNA-Pro primer (Fig. 2). The product was also confirmed by sequencing. An RT-PCR internal control with mouse β-actin gene primers, following the procedure described above, produced the expected product from infected mouse liver cDNA (Fig. 2). These results indicated that no known retrovirus was present in the infected mouse liver.

Discussion
The PCR-Select™ cDNA subtraction method is based on a unique approach of selective amplification of differentially expressed sequences [5,34] and requires a series of quality-assessment steps before the final subtraction stage. We started with three pairs of infected and uninfected mouse liver RNA samples, and one pair of positive control human placental RNA samples (HaeIII treated and untreated). After quality testing at each step of cDNA synthesis, Rsa I digestion, and adaptor ligation, only one pair of 'tester cDNA' and 'driver cDNA' was selected for final subtraction analysis. The 'tester cDNA' was derived from the RNA of infected mouse (E4M31) liver, and the 'driver cDNA' was derived from the RNA of uninfected mouse (E5M32) liver. Since identification of a viral gene signature from the infected mouse liver was the goal, only the forward subtraction method was used, and the library was constructed from the forward-subtracted cDNAs only. Other gene sequences resulting from forward subtraction analysis (Table 3) were not analyzed for the same reason.
MHVs belong to the group II coronaviruses, which are divided into two groups according to patterns of tropism [35]. Enterotropic strains, such as MHV-D, MHV-Y, MHV-RI, LIVIM, and DVIM, generally produce infections confined to the GI tract [36], whereas polytropic strains, such as MHV-1, MHV-2, MHV-3, MHV-4 (or JHM), MHV-A59 and MHV-S, initiate infection mainly in the respiratory tract, which can disseminate to the liver, spleen, lymph nodes, and brain [35][36][37]. Polytropic strains are usually more virulent than enterotropic strains [38]. Although BLAST analysis of protein sequences showed a higher percentage of homology with MHV-3 proteins, comparative analysis of ORFs (Table 4) revealed that MHV-MI was more closely related to polytropic MHV-JHM virus or its variants. MHV-MI also contained the HE protein, which forms smaller spikes on virions [27] and enhances virulence when paired with spike protein (S) [39]. The HE protein is absent in the MHV-3 strain, and MHV-MI could therefore be predicted to be a polytropic virulent virus and a very close relative of MHV-JHM.
The non-specific SSH approach is useful for identifying unknown putative viruses. Since it did not identify the suspected type D retrovirus or any other retroviruses, targeted degenerate PCR and 5′-RACE techniques were used to confirm the negative SSH results. Both experiments confirmed the absence of any retroviral involvement in the diseased mouse livers. It is unclear whether the inflammatory LCH-like phenotype observed in the SCID mice is related to the MHV infection, and further studies are required to examine this possibility.