Background

Improvements in microarray fabrication and scanning technologies have enabled the production of high-density arrays that can facilitate the detection of all viral isolates, even those belonging to large, diverse families. Methods have been developed to identify conserved nucleic acid regions that are common to larger groupings (lineages, serogroups, genera, or families), thereby minimizing the number of probes required and increasing the chance of detecting novel viruses. The more recent strategies for achieving this include (i) using pairwise sequence comparisons to identify conserved sequences for probes [13] as well as (ii) identifying specific regions from a multiple sequence alignment [46]. Additionally, our group developed a method for designing probes within conserved protein regions [7]. Finally, oligonucleotide tiling arrays used in viral resequencing have been applied to both virus detection [811] and viral transcript profiling [12].

Although best practices have been established for fluorescent signal analysis in expression profiling [13], standardization has not yet been attained for pathogen detection arrays. While low-density microarrays can be analyzed by visual inspection [14], high-density arrays require computational solutions. Correlation of fluorescence to a predicted hybridization signal has been used to infer the presence of a virus [15], while a t-test based method has also been validated with the same datasets [16]. Other methods include semi-supervised classification using the K-nearest neighbor technique [17], an empirical determination of signal cutoffs for tiling arrays [18], and a likelihood metric informed by taxonomic hierarchies [19]. Resequencing arrays are able to leverage redundancy and deep-coverage, thereby helping the scientist to infer what the viral nucleotide sequence is [10], but whether highly divergent strains of viruses can be correctly annotated by this method is unclear.

Pathogen microarray data is noisy; viral signatures can be masked by cross-hybridizing host transcripts, biased template amplification, imperfectly matching probes or as the result of different probe sensitivities. Additionally, only a small proportion of probes are likely to hybridize if virus is present in a sample, leading to distributions with high variance. Nonparametric tests transform data into rank order or categories; this has the effect of rescaling variance and simplifying the distribution to one more easily modeled. If data variance is high or the distribution is heavily skewed, nonparametric tests are more conservative than equivalent parametric tests. What is more, nonparametric tests are valid with small sample sizes (less than 20); this is an advantage in cases where the presence of a virus is being investigated by a small number of probes. Nonparametric methods such as the Mann Whitney U test have been applied in detecting differential gene expression of by microarrays [20].

We compared the ability of three specific nonparametric statistical tests to predict the presence of viral agents in a hybridization experiment The three tests were (1) the Mann-Whitney U, a test of central tendency; (2) the Spearman rank correlation coefficient, a measure of the relation between two variables; and (3) the chi-square, a test of event probability based on the binomial distribution. Along with negative controls, nine viral isolates from different families were hybridized; the isolates had genome sizes ranging from 7 to 156 kilobases. Type I errors that can result from multiple testing were controlled by using the method created by J.D. Storey [21]. Positive predictive value and false positive rates were used to assess the ability of the three different statistical methods to identify viruses.

Methods

Reference array design and hybridization

Microarrays were fabricated wherein probes were synthesized in situ (at Agilent Technologies Inc., Santa Clara, CA) in two orientations: plus (coding sense) and minus (non-coding antisense, the reverse complement); all probes were covalently anchored at the 3' end [7]. For testing purposes, a microarray was fabricated wherein 8,553 probes were deposited in both orientations and in duplicate (GreeneChip Pilot: 38,546 total probes, barcode 25161471, Gene Expression Omnibus (GEO) Platform GPL10319, http://www.ncbi.nlm.nih.gov/geo). Another microarray was fabricated with a larger number of virus specific probes deposited in both orientations but singly (GreeneChip 1.52RC: 31,771 total probes, barcode 25177711, GEO Platform GPL10320). Negative control probes were generated from 1,500 randomly-chosen shuffled viral probe sequences. These probes used the same proportion of nucleotides as the viral probes, but were not homologous to any known sequences. All probes were randomly positioned on the array.

To approximate clinical sample contexts yet allow for examination of a wide range of viral targets, hybridization experiments were conducted in which known concentrations of WNV viral extract were spiked into reactions containing either 10 or 200 nanograms (ng) of human lung tissue RNA (data available in GEO Series GSE21317; see summary of data availability in additional file 1).

We employed viruses that varied in genome type and length to assess the sensitivity and specificity of three methods for statistically analyzing viral microarray data. The data pool included three single-stranded positive sense RNA viruses: West Nile virus (WNV, strain New York 1999, AF202541, ca 11 kb), SARS coronavirus (HCoV-SARS, strain Tor2, AY274119, ca 30 kb), and human echovirus 18 (EV18, strain Metcalf, ATCC VR-48, AF317694, ca 7.4 kb); two segmented single-stranded negative sense RNA viruses: Lassa virus (LASV, strain Josiah, ca 3.4 kb and 7.3 kb) and influenza A virus H1N1 (FLUA H1N1; A/Texas/36/91, CY009316-CY009323, 8 segments of ca 2.3, 2.3, 2.2, 1.7, 1.5, 1.4,1 and 0.9 kb); two non-segmented single-stranded negative sense RNA viruses: Zaire ebolavirus (ZEBOV, NC_002549, ca 19 kb) and vesicular stomatitis virus (VSV, Indiana strain, NC_001560, ca 11.1 kb); and two double-stranded DNA viruses: human adenovirus 4 (HAdV-4, ATCC VR-1572, NC_003266, ca 36 kb) and human herpesvirus 1 (HSV-1, viral culture, NC_001806, ca 152 kb) (see Table 1; data available in GEO Series GSE21318 and Additional File 2).

Table 1 Viral isolates

After both random amplification and dendrimer labeling [22, 23] were completed, the arrays were visualized with an Agilent slide scanner. SPSS version 16 http://www.spss.com was used for statistical analysis and data plotting.

Computation of probe-target characteristics

A database of probe-virus target homologies was created for use in our analysis algorithms. The EMBL viral nucleotide sequence database [24] was filtered for short HIV-1 sequences and combined with the NCBI viral reference-sequence database [25]. A non-redundant database of 74,044 sequences was generated with CD-Hit [26], using a similarity cutoff of 98% to define sequences as identical. All probe sequences were compared to the non-redundant set of viral sequences; the number and position of any mismatches was stored for each probe-virus target pair. Change in Gibbs free energy (ΔG) at 65°C (hybridization temperature) was calculated as a measure of probe-target binding strength [7], a table of values is available in GEO Series GSE21319. The ΔG of probe-reverse complement hybrids was also calculated; probes with higher GC content had a greater ΔG. A Lowess fit of probe-reverse complement ΔG and the fluorescent signal was computed for all hybridizations and used to correct for sequence composition [27].

Closely related viruses may have indistinguishable hybridization profiles (e.g., the intensity of signal from probes will have the same values). We identified a set of viral sequence records likely to be distinguishable by the array. Using the ΔG values, we computed an in silico hybridization profile and used it to compute distances [28]. A cut-off of -32 kJ was used to determine whether a probe was likely to hybridize. In silico hybridization profiles that differed by less than 6 probes were consolidated.

Cross validation with a public dataset

To assess the applicability of our analytic method to another pathogen microarray platform, we used the E-Predict training set [15] (http://www.ncbi.nlm.nih.gov/geo/ accessions GSM40806 to GSM40861 inclusive). The dataset was comprised of 56 ViroChip microarray hybridizations of viral samples previously characterized by direct immunofluorescence. The viruses represented were human papillomavirus 18 (15 samples derived from HeLa cells), influenza A virus (8 samples derived from nasal lavage), hepatitis B (1 sample derived from serum), respiratory syncytial virus (10 samples derived from nasal lavage) and human rhinovirus (22 samples derived from viral culture in cell lines). One clinical isolate of influenza A virus also contained a respiratory syncytial virus (GSM40845). The array was comprised of 12,505 probes; the 265 probes excluded from the E-Predict analysis were also excluded from this study.

Statistical tests and implementation

Three well-known statistical tests were used to evaluate the presence or absence of viruses in our hybridization experiments (see Additional File 3, Figure S1). A set of negative control probes was used as a reference. Probe pairs for both the plus (coding sense) and minus (non-coding antisense) orientations were present on the array. We assessed the relative utility of probe pairings by performing statistical tests using the plus strand alone and the minus strand alone, and by pooling the pairs together. Permutation tests were performed when needed to determine significance. Programs for statistical computation were written either in Perl or with modules from the R Project for statistical computing http://www.r-project.org.

Mann-Whitney U Test

T-tests and ANOVA tests are the most common form of statistical tests in microarray analysis literature [29], and have been successfully applied to pathogen arrays [16]. The Mann-Whitney U test (also called the "Wilcoxon rank-sum test") is the nonparametric analogue of the Student's t-test. By using ranked data, it assesses whether two groups of samples are drawn from the same distribution.

In our experimental application, the signals of viral probes that targeted a single strain were compared to an equal number of negative control probes. If a virus was targeted by less than 10 probes, negative control probes were supplemented to achieve an n of 20. The null hypothesis (H0) was that virus specific probes and negative control probes had the same central tendency (e.g., they were drawn from the same population). In each instance, the significance of the U statistic was computed with the commonly-used normal approximation for large samples [30].

Spearman correlation

Correlation is the strength of the linear relationship between two variables. The most common method of assessing correlation between two continuous variables is Pearson's product-moment correlation coefficient. A normal data distribution is not a requirement of the test although equality of variance (or "homoscedasticity") is assumed [31]. The variance assumption has been shown to be a confounder in microarray analysis [32]; thus we decided to use a more conservative correlation measure that makes use of ranked data: namely, Spearman's rank correlation coefficient.

In this application, the change in the Gibbs free energy values of the predicted probe-virus target hybrids was used as the independent variable, while the dependent variable was the observed probe signal (see [7] for a graphical example). ΔG was used to predict fluorescence and to model surface hybridization kinetics [33, 34]. An equal number of negative control probes were pooled with viral probes to ensure that a wide range of signals would be examined. This was necessary to ensure that a correlation could be computed, because a uniformly high signal would be regarded as uncorrelated to the ΔG value. For negative control probes, the independent variable was the ΔG of the predicted human ribosomal RNA duplexes. The null hypothesis was that a fluorescent signal would not be predicted by ΔG (e.g., the signal might, rather, be randomly distributed with respect to ΔG). One sided p-values for correlations were computed using a standard Student's t-distribution approximation [30].

Binomial Test

The binomial test is a subset of the X2(chi-square) method; it tests the probability of observing a series of events according to an expected probability rate. An example of this kind of test would be an assessment of the fairness of gaming dice [30].

In this application, we "binned" the probe signals into percentiles and then categorized them as positive or negative based on whether their computed values were located above or below a certain threshold. Probes for a single virus were tested against the X2distribution in such a way that the expected probability was equal to the probability rate of the negative control probes. The null hypothesis was that above the threshold, there was no difference between the proportion of virus specific probes and negative control probes. The threshold choice for the binomial test clearly has a strong influence on the results; our own exploratory testing of various percentile thresholds (70th, 80th, 90th, 95th and 99th) showed that the 90th percentile generated the best performance for our needs. All binomial tests were conducted using the 90th percentile in categorizing probes as "positive" or "negative." We concluded that the threshold value for the test, whatever it might be, should be considered a parameter for optimization.

Multiple test correction via the False Discovery Rate (FDR)

An issue that often arises in testing a series of null hypotheses is the increase in the probability of the occurrence of a type I error (e.g., the rejection of H0 when it is true). Conservative familywise correction methods such as Bonferroni's have been applied to microarray data. An alternative method, first described by Benjamini and Hochberg [35] and then revised by Storey [36], involves correcting the proportion of falsely rejected hypotheses. Here we used Storey's method and defined its parameters, within our study, as the proportion of the viruses identified through array hybridization that prove to be false leads (i.e., the q-value) [21]. In a multiple hypothesis test scenario, requiring a q value of ≤ 5% for there to be "significance" would indicate that up to 5% of the viruses identified as present in a sample may, in fact, have been truly absent. In contrast, requiring a p value of ≤ 5% for there to be "significance" indicates that up to 5% of all truly absent viruses may be identified as present. Q-values were calculated from test p-values by using the q-value module from the R Project contributed by Storey.

Assessment of test performance

A table of correct predictions for each hybridized virus was created by using the NCBI Taxonomy database (available as Additional File 4). A prediction was considered a "true positive" if either the specific virus introduced in the experiment or any other virus in the same genus was, thereby, predicted. Q-values with a threshold of 10% were used; and the top 250 predictions for each method were evaluated. Species prediction could also have been assessed, but would have been less indicative of algorithm performance because the concept of what constitutes a viral species is not comparable between the various taxonomic families (e.g., by serology or by molecular phylogeny). What is more, highly similar strains may be impossible to distinguish by examining their microarray signature, yet still be defined as different species; in contrast, viruses from different genera are more easily distinguishable.

False positive rate (FPR) and positive predictive value (PPV) were used to assess the performance of the various test models. PPV is a commonly used standard for diagnostic tests; it is the probability that a positive test result will reflect the presence of a virus in the sample. PPV is the ratio of the number of true positives to the sum of the number of true positives and false positives; false negatives are not used in the PPV method. In our particular application of the method, it was the ratio of the number of statistically significant (q ≤ 0.10) and correct predictions to the sum of all the statistically significant predictions generated; predictions below the significance threshold were ignored. PPV is a useful measure of algorithm performance because it is not sensitive to heterogeneity or misclassification in viral taxonomy; it is not necessary to predict that all members of a genus are present in order to achieve a favorable score.

Results

Deviation from normality in fluorescence distribution

To assess the effects of variability that derive from the concentration of viral nucleic acid and host nucleic acid in clinical samples, we characterized the effect of a varying input of human RNA on the fluorescence distribution of the log-transformed probe signals. While an analysis by Giles and Kipling of Affymetrix arrays based on 25 nt probes indicated that the fluorescence distributions of their log-transformed probe signals were indeed close to normal (Gaussian) [37], our experience with 60 nt probe arrays showed strong deviations from a normal distribution. Increases in RNA input have complex effects on signal distribution: the degree of skew is increased and the upper tail is lengthened (see Figure 1). We tested the normality of these distributions and found strong deviations for all three experiments (using the Kolmogorov-Smirnov test, where p < 0.05). Sample distributions of fluorescent signals deriving from virus-infected tissue-culture cell hybridizations are available in Additional file 5, Figure S2. The observed signal distributions included Gaussian, highly kurtotic and bimodal; host RNA concentration can only partially explain these differences. Other explanations include: a large proportion homologous probes on the array increasing the upper tail, differences in amplification or labeling efficiency from sample to sample, and differences in hybridization affinity due to guanine/cytosine proportion in viral transcripts. While more complex methods of transformation have been described elsewhere, these have focused on improving the sensitivity of transcript profiling measurements and, we surmised, were unlikely to improve sensitivities in our application. Parametric analysis of microarray data is particularly sensitive to transformations; violations of normality will lead to a loss of power, and may result in invalid p values [38].

Figure 1
figure 1

The effect of increasing human nucleic acid on the hybridization signal. Three quantities, 200, 10 and 0 ng of human lung RNA (panels a, b, c), were hybridized to a pathogen microarray comprised of ~38,000 probes. The fluorescent signal from the pathogen-specific probes was log-transformed and used to generate histograms. An expected normal (Gaussian) distribution is depicted by the line in the above figure. Tests for normality indicated significant deviation, with the following Kolmogorov-Smirnov Z values: (a) 41.5, (b) 14.6, (c) 12.1 (all p < 0.01).

Reverse complement controls reduce noise

Replication of hybridization experiments can improve specificity [39]; however, the expense incurred by processing multiple arrays or low quantities of nucleic acid in one's clinical sample may preclude the use of this approach. Alternative strategies that might be adopted include the application of either replicate probes and/or reverse complementary probes to the same array. A study using Agilent 60 nt in-situ synthesized arrays found that replicate probes were highly concordant [40], and useful both in improving normalization and in identifying outlier signals [41]. Reverse complements have been used on spotted oligonucleotide arrays for pathogens [1], although no quantitative assessments of their advantages have been reported up to this point. In such studies as are mentioned above, in situ synthesized oligonucleotides are immobilized at the 3' end. Probe-target helix stability near the surface is likely to be reduced due to steric hindrance. Thus, probes that are mismatched to their targets will be strongly affected by the specific mismatch position (e.g., mismatches near the 5' end will reduce the hybridization signal more than those near the 3' end [42]).

To identify a printing strategy that would reduce false positive signals, we conducted pilot studies using infected cell extracts of WNV containing 104 and 106 copies of viral template RNA as input for the reverse transcriptase reaction. Amplification products were hybridized on an array containing both replicate and reverse complement probes, while human lung RNA was used as a control. In all experiments, replicate probe pairs were better correlated than reverse complement pairs for both WNV cognate probes and non-flavivirus probes. When WNV was present, reverse complement probes had a higher correlation than when no WNV was present; this differential response was stronger in probes that had the highest degree of signal (i.e., those in the top 10th percentile) (see Table 2 and Additional File 6, Figure S3).

Table 2 Correlation between replicate and reverse complemented probe pairs

Correcting for sequence composition reduces the false positive rate

Seven negative control hybridizations and nine viral isolate hybridizations were used to compute false positive rates (FPR) (see Table 1). The top 250 predictions for each statistical test were evaluated. A q-value threshold of 10% was applied; this corresponds to a condition in which 1 in 10 predictions may prove to be false. The binomial test performed well, achieving a 2% false positive rate, whereas the Mann-Whitney U and Spearman correlation tests had high average FPRs (48% and 58% respectively; see Table 3). We noticed that a strong relationship existed between the GC content and the fluorescent signal in our platform (see Additional file 7, Figure S4); it is worth noting that high GC content has previously been implicated in off-target hybridization [43, 44]. We applied a probe-level correction and found that a substantial decrease in the FPR occurred in the case of both the Mann-Whitney U and Spearman correlation methods as a result (see Table 3; data available in additional file 8 and additional file 9).

Table 3 False positive rates for methods of pathogen identification

Even after GC correction, a strong prediction for a chicken endogenous retrovirus was identified from the influenza virus hybridization by all three methods. The virus was cultured in eggs that are likely to express the viral transcripts, indicating that a microarray can indeed identify co-infections.

A binomial test correctly predicts the viral genus in a majority of cases

Statistical methods were compared for GC-corrected signals by using PPV (Positive Predictive Value), which is the probability that a diagnostic test will correctly report the causative agent of an illness. In clinical applications, the identification of a viral genus is sufficient to direct other molecular identification methods (e.g., consensus PCR) as well as to prescribe initial clinical measures. Thus, we measured the success of each statistical method by whether it correctly predicted the presence of a hybridized viral isolate or any other virus within the same genus. The precise number of times that the most statistically significant prediction proved to be correct was also tabulated (see Table 4 for an enumeration of both measures). Statistical analysis was carried out with coding sense probes, antisense probes, or both probe types pooled together, as independent measures of the quantity of viral nucleic acid. A strategy of averaging probe values resulted in a performance similar to that of those instances when coding sense probes alone were used.

Table 4 Positive predictive value for methods of pathogen identification

The binomial test obtained the highest PPV of 91% when both sense probes and antisense probes were taken into account, while the Mann-Whitney U test also performed favorably. In contrast, the Spearman correlation coefficient test had a substantially lower PPV of 33%. A relaxation of the multiple-testing correction requirements resulted in an improvement in the Spearman PPV for antisense probes to 76% but such a relaxation of correction requirements is undesirable due to the potential rise in false positives that might result (see Additional file 10, Table S1). In two cases (those of HSV-1 and HAdV-4), the use of both sense and antisense probes reduced the overall PPV; however the input virus was nonetheless correctly predicted. In another case (that of the LASV, binomial test), the pooling of the two probe methods resulted in detection where the virus would otherwise have been missed.

Validation of binomial and Mann-Whitney U test performance on the ViroChip platform

A publically available dataset from Urisman et al. (ViroChip E-Predict training set; [15]) was downloaded and analyzed using the Mann-Whitney U and binomial tests. The ViroChip dataset was comprised of 56 hybridizations of samples derived from tissue culture, nasal lavage or serum. The samples contained papillomavirus, influenza A virus, hepatitis B virus, respiratory syncytial virus or human rhinovirus. The ViroChip platform differs from ours in the following ways: oligonucleotide probe length (70 nt vs. 60nt), number of probes (~12,000 vs. ~30,000), fabrication method (robot spotted vs. in situ synthesis), incorporation of fluorescence (amino-allyl dUTP vs. secondary hybridization) and presence of GC-matched negative control probes. However, the ViroChip platform does include the use of reverse complement probes and employs a random-PCR based nucleic acid amplification strategy that is similar to ours.

Two other methods (E-Predict [15] and DetectiV [16]) reported a high degree of predictive accuracy when they were used to examine the same dataset. Similarly, we report having achieved a high degree of accuracy when using two nonparametric testing methods (binomial test and Mann-Whitney U) together with the same parameters that were used for our in-situ synthesized probe platform. We report on our performance in this instance by relying on a simple metric: the rank of the first correct prediction (see Table 5). The first prediction of the Mann-Whitney U test was correct in 47% of the cases; a correct prediction was present among the first ten predictions in 91% of the cases. The average correct prediction rank for the Mann-Whitney U test was 3.7. The largest number of incorrect predictions were for human herpesvirus 7 (HHV7), human herpesvirus 6 (HHV6A/B) and human herpesvirus 5 (HHV5), regardless of the sample source. The first prediction of the binomial test was correct in 75% of the cases; a correct prediction was present among the first ten predictions in 98% of the cases. The average correct prediction rank for the binomial test was 1.6. As in the Mann-Whitney U results, the majority of incorrect predictions were for HHV7. Watson et al. reported a similar false-positive result using a t-test method [16]. HHV7 has a seroprevelance of 85% in humans and can be readily detected in blood, saliva and cervical tissue [45]. The degree of prevalence suggests that the predictions represent a true co-infection; however, validation of this would require further molecular tests. Both the binomial and Mann-Whitney U methods generated statistically significant predictions for an influenza virus and RSV co-infected sample.

Table 5 Performance of Mann-Whitney U and Binomial tests with ViroChip pathogen microarray platform

The binominal test outperformed the Mann-Whitney test on the ViroChip data, largely because Mann-Whitney's ranking on HHV7 was lower. The binomial test's performance without optimization was comparable to that of E-Predict (95% correct for first predictions, 100% correct for the first ten) and to that of DetectiV (98% correct for first predictions, 100% correct for the first ten), however, a binomial test does not require probe-level weighting or assumption of normality.

Discussion

Although expression profiling and viral microarrays may employ similar methods for printing probes and for preparing and hybridizing nucleic acids, they differ markedly in their analytical strategy. A fundamental assumption in expression profiling is that the majority of the various gene transcript mRNAs are present in the sample (albeit at different concentrations). Differential expression is identified by computing the ratio of probe signal for a gene in two conditions (e.g., tumor vs. normal tissue). In a viral microarray experiment, absence of signal from all probes is a plausible result. This may reflect any of the following: (1) that no agent is present; (2) that an agent is present but there are no probes for it because it is either truly new or sufficiently different to confound hybridization; (3) that a known agent is present and that probes are appropriate but that levels are insufficient to enable detection.

The results presented here indicate that the quantity of host nucleic acid used in hybridization has substantive effects on a probe's fluorescent signal distribution that manifest as deviations from normality. We speculate that these deviations represent mass effects when concentrations are high enough to drive up the proportion of partially hybridized strands. Probe sequence composition was identified as a major confounder that resulted in high FPR. Probe signal correction by GC content improved predictions. We tested the effect of various probe control strategies on the FPR. A hybridization requirement for both sense and antisense probes was more helpful in reducing the FPR than the use of hybridization to replicate probes. An additional important negative control was the inclusion of shuffled viral probes that enabled examination of the effects of array-wide GC content.

A key strength of this study was its effective control of the familywise error rate, achieved via its use of Storey's FDR technique [36]. While this method successfully lowered the FPR, there was a mild violation in the independence assumption as some probes target multiple viruses. Q values remain useful when p values are correlated within blocks [21], a condition we expect to be true within our probe-sets. Methods that explicitly address dependence have recently been under investigation [46] and should be incorporated in any future versions of our method.

Nonparametric statistical methods are more conservative than equivalent parametric ones; consequently, they are less likely to require that one reject the null hypothesis when it is false (as in the case of type II errors or false negatives). The variability of microarray signal distributions in our study suggested that parametric predictions may be inaccurate; for this reason, we tested a number of nonparametric approaches. Of the three methods that we assessed, the binomial test was the most successful, achieving a 91% PPV. A further development of our study might include empirically determining a more optimal threshold by using spike-in controls or by adding more categories to reflect different levels of confidence (e.g., positive, marginal and negative). The Mann-Whitney U test was also successful in that it predicted the presence of nearly all agents. This result conforms with a similar study using t-tests [16]. Yet the Mann-Whitney U assessment was sensitive to sequence composition; its PPV on uncorrected signal data was only 48%.

The Spearman correlation performed poorly, correctly predicting only three disease agents. A Pearson correlation method (E-Predict) has been successfully applied to pathogen arrays, according to Urisman et al. [15]. The E-predict method performs an in silico hybridization between fully sequenced viral genomes and a viral array. After a clinical sample is hybridized, the predicted fluorescence for each virus is compared to the observed signal. The strength of correlation is used to identify which virus, if any, is responsible for the signal pattern. In our study and in Urisman's, the Spearman correlation performed poorly. We re-explored this result by relaxing the requirement for a multiple testing correction and found that it improved the degree of predictive accuracy, but the Spearman assessment still did not perform as well as the other tests (see Additional file 10, Table S1). Accordingly, we concluded that the Spearman correlation as implemented was not as discriminative as other nonparametric tests.

Conclusion

We report the successful application of two nonparametric tests that require few assumptions that have a high degree of predictive accuracy and have extremely low false positive rates. In a direct assessment of the binomial and Mann-Whitney U tests on a related pathogen microarray platform, the binomial test performed comparably to other reported methods (e.g., E-Predict [15] and DetectiV [16]). In contrast to these methods, the binomial test achieved a high degree of predictive accuracy on ranked data without probe-specific weighting parameters, iterative analysis or assumptions of normality.