Stable RNA markers for identification of blood and saliva stains revealed from whole genome expression analysis of time-wise degraded samples
Human body fluids such as blood and saliva represent the most common source of biological material found at a crime scene. Reliable tissue identification in forensic science can reveal significant insights into crime scene reconstruction and can thus contribute toward solving crimes. Limitations of existing presumptive tests for body fluid identification in forensics, which are usually based on chemoluminescence or protein analysis, are expected to be overcome by RNA-based methods, provided that stable RNA markers with tissue-specific expression patterns are available. To generate sets of stable RNA markers for reliable identification of blood and saliva stains we (1) performed whole-genome gene expression analyses on a series of time-wise degraded blood and saliva stain samples using the Affymetrix U133 plus2 GeneChip, (2) consulted expression databases to obtain additional information on tissue specificity, and (3) confirmed expression patterns of the most promising candidate genes by quantitative real-time polymerase chain reaction including additional forensically relevant tissues such as semen and vaginal secretion. Overall, we identified nine stable mRNA markers for blood and five stable mRNA markers for saliva detection showing tissue-specific expression signals in stains aged up to 180 days, and expectedly older. Although all of the markers were able to differentiate blood/saliva from semen samples, none of them could differentiate blood/saliva from vaginal secretion because of the complex nature of vaginal secretion and the biological similarity of buccal and vaginal mucosa. We propose the use of these 14 stable mRNA markers for identification of blood and saliva stains in future forensic practice.
Keywords: Body fluid identification, Gene expression, Blood, Saliva, Biological traces, RNA markers
Human body fluids such as blood and saliva are the most common sources of biological trace material found at a crime scene. Reliable tissue identification in forensic casework is important as it provides crucial insights into crime scene reconstruction and can thus contribute towards solving crimes. Blood stains are routinely tested in forensic practice using various methods including the tetrabase (4,4-bis(dimethylamino)diphenylmethane) test, the Kastle–Meyer phenolphthalein test, the tetramethylbenzidine test, the orthotolidine test, or the luminol (3-aminophthalhydrazide) chemoluminescence test, with the latter especially appropriate for detecting blood stains after cleaning attempts [2, 5]. All these tests are presumptive (indicative but not identifying) and take advantage of the peroxidase-like activity of the heme unit of the hemoglobin molecule in human blood. Therefore, false-positive results can be caused by the presence of strong oxidants, such as chlorine-containing detergents, or by true peroxidases (e.g., from plants).
Saliva stains are usually detected in forensic practice via an enzymatic amylase test using Phadebas or with a recently developed enzyme-linked immunosorbent assay-based method. However, because of amylase degradation, the time window for the successful performance of such tests can be limited. Furthermore, no amylase assay can distinguish between salivary amylase and amylases from other tissues (pancreatic, urinary, etc.); therefore, the tests for saliva identification are only presumptive (similar to existing blood identification tests).
On the other hand, methods for identification and quantification of mRNA are already well established, although mostly outside the forensic field. Among many other applications, these methods make massive multiplex gene expression profiling possible for the discovery of tissue-specific mRNA markers. The major concern about using mRNA markers for forensic applications is their assumed high susceptibility to degradation. However, recent studies using a few selected genes demonstrated that it is possible to isolate total RNA of sufficient quality and quantity from biological stains that are several months or even years old [10, 11, 12]. It has also been suggested, although with limited evidence so far, that different types of mRNA follow different rates of degradation. It is assumed that the degradation process of mRNA is influenced by many external and internal factors, including structural peculiarities like the presence of AU-rich elements (ARE motifs), protein binding properties, and cellular localization [14, 15]. However, detailed knowledge of the molecular reasons for differences in RNA degradation between different types of RNAs, as well as between mRNAs of different genes, is currently lacking, and further investigations are sorely needed.
Although a small number of mRNA markers has been tested for tissue identification in forensic science [16, 17, 18, 19], no systematic study has yet been performed. In addition, the identification of candidate markers in previous studies was based on a mixed literature and database search, apparently without strict criteria of selection, considering only a limited number of genes and tissues, and not taking into account RNA degradation levels. Furthermore, expressed sequence tag databases that were used previously, like the Cancer Genome Anatomy Project, are expected to provide heavily biased information on candidate genes because of the nonrandom representation of clone libraries.
To find stable mRNA markers for body fluid identification in forensic practice, we performed a systematic and comprehensive whole-genome gene expression analysis on time-wise degraded blood and saliva stains using the Affymetrix U133 plus2 GeneChip. This expression array contains >54,000 mRNA probe sets, which encompass most, if not all, known and predicted human genes. Tissue-specific expression patterns of the most promising candidate genes from the array analyses were further confirmed using the GNF SymAtlas expression database, which covers about 100 human tissues, and finally verified by quantitative real-time polymerase chain reaction (PCR) in blood and saliva as well as in other body fluids relevant for forensic casework, i.e., semen and vaginal secretion.
Materials and methods
Aliquots of 5 ml of whole blood and saliva were collected from each of five healthy volunteers (four men and one woman) of western European genetic origin under informed consent before their inclusion in the study. Native blood was collected without anticoagulation treatment to avoid disturbing effects of anticoagulation reagents on gene expression. In each sample, 75 cotton swabs were immersed. Special care was taken to shorten the time between collection and swab absorption to avoid blood coagulation. After complete absorption of the fluids, swabs were left until dry on a bench top at room temperature. When dry, the swabs were stored in dust-free nonhumid conditions (but subjected to normal daylight) for different time intervals. Swabs were visually inspected and sorted to ensure similar liquid content between individual swabs. After 0, 1, 3, 7, 14, 21, 57, and 180 days, swabs were stored at −80°C until RNA isolation. For the time interval 0 days, samples were frozen immediately after drying. Semen and vaginal secretion samples were collected from one male and one female individual, absorbed with cotton swabs, and dried overnight before RNA isolation.
RNA was isolated using the Qiagen RNeasy kit (Qiagen Benelux B.V.) according to the manufacturer’s instructions with minor modifications. These included cutting up the cotton swab into 1 × 1-mm pieces and soaking them in RLT buffer for 1 h at 4°C before the extraction. Trial experiments extending this incubation time up to 24 h did not reveal any improvement with respect to RNA quantity and quality (data not shown).
Microarray hybridization and gene expression data analysis
Before hybridization to Affymetrix U133 plus2.0 GeneChip arrays (Affymetrix, Santa Clara, CA), RNA isolated from blood and saliva stains was amplified using the Ambion MEGAscript T7 two-cycle amplification kit (Applied Biosystems, The Netherlands). Amplification, labeling, hybridization, washing, and scanning were performed by the microarray core facility of the Erasmus MC Center for Biomics according to Affymetrix specifications. Background subtraction and probe signal summarization were performed with the robust multiarray analysis (RMA) algorithm using the R Bioconductor software; the resulting log2 signal values were back-transformed to linear scale. Presence/absence calls for individual probe sets were calculated with the mas5calls function of the Bioconductor affy package. Because the constant global mean assumption does not hold for arrays hybridized to differentially degraded RNA samples, signal intensities were normalized between samples using the nonhuman control genes present on Affymetrix arrays (spiked-in probes). Normalization factors for each array were inferred from the average signal intensities of the bioB, bioC, bioD, and Cre control probe sets. Analysis of differential gene expression was performed using the significance analysis of microarrays (SAM) algorithm implemented in the TM4 software. In the saliva dataset, we selected only genes with signal intensities above 50 (which is below the usually applied background threshold in expression array experiments) that had a signal intensity below 50 in the blood dataset. Blood-targeted genes were selected in a similar manner but with different criteria: the lower intensity limit in blood was set to 1,000 to reasonably restrict the number of candidates.
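The spike-in normalization and the threshold-based candidate selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the dictionary data layout, and the control probe set identifiers are hypothetical, and scaling each array so that its mean control signal matches the mean across all arrays is only one plausible way to derive "normalization factors" from the bioB, bioC, bioD, and Cre signals. The upper limit of 50 in the other tissue is taken from the saliva criterion; applying the same limit to the blood selection is an assumption.

```python
from statistics import mean

# Hypothetical probe set IDs standing in for the bioB, bioC, bioD, and Cre
# spike-in controls; the real Affymetrix IDs are not given in the text.
CONTROL_PROBES = ("bioB", "bioC", "bioD", "cre")


def spike_in_factors(arrays):
    """Return one scaling factor per array from its mean control signal.

    `arrays` maps array name -> {probe_id: linear-scale signal}.
    Assumption: each array is scaled so that its mean control signal
    equals the mean control signal across all arrays.
    """
    control_means = {
        name: mean(signals[p] for p in CONTROL_PROBES)
        for name, signals in arrays.items()
    }
    target = mean(control_means.values())
    return {name: target / m for name, m in control_means.items()}


def normalize(arrays):
    """Multiply every probe signal on each array by that array's factor."""
    factors = spike_in_factors(arrays)
    return {
        name: {p: v * factors[name] for p, v in signals.items()}
        for name, signals in arrays.items()
    }


def select_candidates(target, other, min_target, max_other=50.0):
    """Probes above `min_target` in the target tissue and below `max_other`
    in the other tissue; probes missing from `other` count as signal 0."""
    return sorted(
        p for p, v in target.items()
        if v > min_target and other.get(p, 0.0) < max_other
    )
```

With tissue-averaged signals in hand, the saliva selection would correspond to `select_candidates(saliva, blood, min_target=50)` and the blood selection to `select_candidates(blood, saliva, min_target=1000)`.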
First strand cDNAs were synthesized with the SuperScript® III RTS First-Strand cDNA Synthesis Kit (Invitrogen BV, The Netherlands) using total RNA as a template. The primers were designed with Primer3 software so that forward and reverse primers were complementary to different exons of the respective genes and located as close as possible to the 3′-end of the corresponding RefSeq cDNA (Electronic Supplementary Material Table S1). Real-time PCR reactions with the SuperScript® III Platinum® SYBR® Green One-Step qPCR Kit (Invitrogen BV) were performed on an ABI 7300 PCR machine (Applied Biosystems, The Netherlands) using the following parameters: initial denaturation at 94°C for 10 min, followed by 45 cycles of denaturation at 94°C for 15 s, and a final annealing/elongation at 60°C for 30 s. Melting profiling and agarose gel electrophoresis were used to confirm the specificity of the primers and the absence of DNA contamination. Quantification of the amplified cDNA yield in comparative blood and saliva PCRs was done by the standard curve method. PCR experiments with semen and vaginal secretion were quantified using the delta Ct (dCt) method. In both cases, the GAPDH gene was used as an endogenous control to normalize the amplification signal between the samples from different tissues and individuals. Time points were compared to each other without normalization: assuming the temporal degradation of all RNA molecules, no internal control gene could be used, and the only proper way to normalize RT-PCR signals was to use the same amount of template in each reaction. We found that this requirement holds true for our experiments because the GAPDH expression variability between different samples from the same tissue was relatively low (CV <25%, data not shown), which is probably because approximately the same amount of blood or saliva was absorbed by the cotton swabs during material collection.
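As an illustration of the dCt quantification and the CV check mentioned above, the following sketch shows the conventional calculations. The text does not give the formulas used, so the common efficiency ** (-dCt) convention, the assumption of equal amplification efficiency for target and GAPDH amplicons, and all function names are assumptions made for this example. Whether the reported CV was computed on Ct values or on derived quantities is likewise not stated.

```python
from statistics import mean, stdev


def delta_ct(ct_target, ct_reference):
    """dCt: target Ct minus the Ct of the endogenous control (GAPDH here)."""
    return ct_target - ct_reference


def relative_expression(ct_target, ct_reference, efficiency=2.0):
    """Target abundance relative to the control, efficiency ** (-dCt).

    Assumption: both amplicons amplify with the same efficiency;
    2.0 corresponds to perfect doubling in every cycle.
    """
    return efficiency ** (-delta_ct(ct_target, ct_reference))


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    return 100.0 * stdev(values) / mean(values)
```

For example, a marker that crosses the threshold three cycles later than GAPDH would be estimated at one eighth of the GAPDH level under these assumptions.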
Results and discussion
Microarray expression data
As expected, hybridization signals demonstrated high variability between individuals; however, the most striking differences were observed between the different tissues. Signal intensities in blood samples were on average about five times higher than in saliva (174.2 ± 1.9 in blood samples vs 26.9 ± 0.7 in saliva; Wilcoxon rank sum test p < 0.001). In addition, at time point zero, the number of probe sets called as present according to the Affymetrix algorithm was, on average, more than three times higher in blood than in saliva (30.2% ± 0.9 vs 9.3% ± 0.6; t test p < 0.001). The SAM test with stringent parameters (false discovery rate set to 0%) showed that, in both blood and saliva experiments, no genes demonstrated significant expression differences in a time range of 0–57 days of stain storage. Only a few genes (37 and 10 significantly differential genes for saliva and blood, respectively) appeared to be differentially expressed at 180 days in comparison to other time points. This suggests that in dried blood and saliva, mRNA molecules remain relatively stable for a long period. Recent studies of Heinrich et al. also revealed poor correlation between RNA degradation and postmortem time intervals.
Selection of tissue-specific markers
The initial selection of tissue-specific genes was performed using the normalized signal intensities of microarray hybridizations averaged across the five biological replicates at the zero experimental time point. About 500 apparent saliva-specific and 1,000 apparent blood-specific candidate genes were selected. Further refinement of tissue-specific gene sets was achieved by probing the selected candidates against the GNF SymAtlas tissue database after excluding all cell lines from the database, retaining only tissues and organs for the analysis. Genes were selected only if they were highly and exclusively expressed in the target tissue(s) based on the GNF SymAtlas database. For blood, the target tissue in the database was defined as whole blood, while for saliva, the target tissues were salivary gland, tongue, trachea, and tonsils. The selection criteria were as follows: high expression (signal intensity >1,000) in target tissue and low expression (signal intensity <200) in nontarget tissues. Using these criteria and combining data from expression array experiments as well as GNF SymAtlas database verification, we identified six saliva-targeted genes and 15 blood-targeted genes that were highly expressed only in target tissues (or respective organs) but not, or nearly not, in the nontarget tissues (Electronic Supplementary Material Figure S1a, S1b, S1c).
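The database filter described above (signal intensity >1,000 in target tissue, <200 in nontarget tissues) can be expressed as a simple predicate. This is a sketch only: the function name and data layout are hypothetical, and requiring high expression in at least one target tissue, rather than in all of them, is an assumption, because the text does not say how the four saliva target tissues were combined.

```python
def is_tissue_specific(expression, target_tissues, high=1000.0, low=200.0):
    """True if a gene passes the high-in-target / low-elsewhere filter.

    `expression` maps tissue name -> signal intensity for one gene.
    Assumption: the gene must exceed `high` in at least one target tissue
    and stay below `low` in every nontarget tissue.
    """
    targets = set(target_tissues)
    in_target = [v for t, v in expression.items() if t in targets]
    off_target = [v for t, v in expression.items() if t not in targets]
    return (
        any(v > high for v in in_target)
        and all(v < low for v in off_target)
    )
```

Under this reading, a gene with made-up intensities of 2,500 in whole blood and at most 120 elsewhere would pass for blood, whereas a single nontarget tissue at 350 would exclude it.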
RT-PCR confirmation of tissue-specific markers
Expression of the candidate markers in other body fluids
For the 14 blood-targeted genes, we observed no detectable amplification in semen for nine genes (CASP1, AMICA1, C1QR1, ALOX5AP, AQP9, C5R1, NCF2, MNDA, ARHGAP26), in keeping with the assumption of high blood specificity of the respective mRNA markers. These genes encode proteins with important functions in different types of blood cells. They are known to be highly or even specifically expressed in peripheral leukocytes (AQP9, NCF2, CASP1, C5R1, C1QR1, ALOX5AP [30, 31, 32, 33, 34, 35]) and myelocytes or hematopoietic cells (MNDA, ARHGAP26, AMICA1 [36, 37, 38]). However, five genes demonstrated only slightly differential or even comparable expression in blood and semen (CD36, CCR1, PF4, BIN2, and ALOX5), which was not expected given the information provided by the GNF database, and were therefore excluded from the final list of blood-specific markers. Thus, our microarray-based genome-wide approach to find tissue-specific mRNA markers identified genes that are functionally relevant for the target tissues.
Furthermore, and not surprisingly, RT-PCR of all saliva- and blood-targeted markers in samples from vaginal secretion revealed gene expression at a level comparable to that in blood and saliva samples (Fig. 2b). The natural occurrence of blood cells in vaginal secretion most likely explains the expression of our blood-targeted markers in vaginal secretion, whereas the high biochemical and histological similarity of oral and vaginal epithelia makes the similarity of gene expression patterns between both tissues plausible. It should be pointed out that mRNA markers previously claimed to be useful for the identification of vaginal secretion such as HBD-1 and MUC4 [18, 19] are known to be abundant also in oral epithelial cells and the salivary transcriptome [41, 42, 43]. Furthermore, Nussbaumer et al. ruled out the potential to differentiate saliva and vaginal secretion using solely MUC4. Our results, together with previous findings, suggest that establishing mRNA markers expressed exclusively in vaginal secretion could be a challenging if not impossible task.
Comparison with previously suggested mRNA markers
Interestingly, the tissue-specific genes identified here do not overlap with the ones previously suggested for blood and saliva stain identification [18, 19]. This could be explained by the experimental setup and the systematic (rather than ad hoc) approach of this study, namely, the degraded biological material analysed and the Affymetrix microarray platform applied. In contrast to previous studies, we restricted our marker ascertainment to those genes that retained structural mRNA integrity during the stain dry-out process as well as the subsequent long-term storage of 180 days. This allows future detection of these markers in forensic stains of unknown age, at least up to an age of 6 months, but expectedly longer. Furthermore, our saliva-specific candidate genes were derived from mouth and pharynx epithelial cells, unlike the previously suggested STATH and HTN3 genes that are expressed in the salivary gland. Secreted mRNAs that are abundant in fresh saliva are more prone to fast degradation by extracellular RNases; they are therefore not expected to be present in dried stains, which explains why they were not detected by the relatively insensitive microarray hybridization method used in this study. The SPTB and PBGD genes, previously proposed as blood-specific markers, do not demonstrate any overexpression in whole blood relative to other tissues according to the GNF SymAtlas database (data not shown).
In summary, whole-genome expression analysis in time-wise degraded samples from blood and saliva stains in combination with RT-PCR verification of various forensically relevant body fluids has resulted in the identification of stable tissue-specific mRNA markers from five genes for saliva (SPRR3, SPRR1A, KRT4, KRT6A, and KRT13) and nine genes for whole blood (CASP1, AMICA1, C1QR1, ALOX5AP, AQP9, C5R1, NCF2, MNDA, and ARHGAP26). For the first time, mRNA markers were ascertained considering almost the entire human transcriptome and based on experimental data of genome-wide gene expression as well as considering the degradation stability of mRNAs. We could demonstrate that the candidate genes identified here provide informative mRNA markers for blood and saliva identification for stains up to 180 days of age. We propose their application in forensic casework (with the potential practical limitation of coamplification in vaginal secretion) for stains up to at least 6 months of age. However, we expect that the proposed mRNA markers will successfully identify older blood and saliva stains (respective experiments are currently in progress). Finally, we would like to remark that tissue identification in forensics should be performed in a reciprocal way, so that a tissue is identified because of the presence of markers specific for the relevant tissue together with the absence of markers specific for all other tissues in question. Clearly, more research should be dedicated towards finding the most suitable markers for tissue identification in forensics.
We are grateful to all volunteers who provided tissue samples for this study. We thank Bianca de Graaf as well as Miriam Goedbloed and Silke Brauer for assistance in sample collections and preparations, and Nienke van Doorn for help in RNA extraction. Additional colleagues at the Erasmus MC Center for Biomics are acknowledged for microarray hybridization experiments. This study was supported by The Netherlands Forensic Institute, the Erasmus University Medical Center, and by additional funds from the Fonds Schiedam Vlaardingen to support forensic molecular biology at Erasmus MC. This study received additional support from the Translational Medicine Program of Affymetrix and Erasmus MC.
12. Anderson S, Howard B, Hobbs GR, Bishop CP (2004) A method for determining the age of a bloodstain. Forensic Sci Int 3:22–24