Background

Children with fever are among the commonest presentations to healthcare professionals [1, 2]. However, available diagnostic tests are poor at rapidly and accurately discriminating the aetiology of fever, limiting the potential to give the right treatment to the right patient at the right time [3]. Many infectious and inflammatory disorders, including tuberculosis (TB) and Kawasaki disease (KD), present with signs and symptoms that overlap with other conditions, and diagnosis is typically delayed until initial management strategies for more common bacterial and viral infections have failed [4, 5]. Several blood transcript-based diagnostic signatures have been published [6,7,8], but translation into clinically usable tests lags behind.

Most transcriptomic diagnostic signatures published to date are binary: either one-vs-all (OVA; e.g. Kawasaki disease versus other diseases) [9] or one-vs-one (e.g. bacterial versus viral) [10]. These show high diagnostic accuracy in many settings and are well suited to rule-in or rule-out diagnostic dilemmas. However, they are less useful for clinical presentations with diagnostic uncertainty and multiple potential differential diagnoses (Fig. 1A). Applying multiple transcriptomic signatures either in parallel or in series may simulate the differential diagnostic process more closely, but their pairwise independence can lead to failure through multiple mechanisms (Fig. 1B, C). A single multiclass signature covering the same set of diagnoses might overcome many of these limitations (Fig. 1D), accounting for the dependence of one diagnosis on the other differentials under consideration whilst utilising the interdependence between transcripts. Indeed, Habgood-Coote et al. recently demonstrated a 161-transcript signature that can differentiate 18 acute paediatric diseases in parallel [11].

Fig. 1

Approaches to diagnostic testing. A Simple schematic approach to clinical diagnostics, with reliance on traditional microbiological testing. B Application of binary diagnostic signatures in parallel, demonstrating problems relating to overlapping or contradictory results. C Application of binary signatures in series, demonstrating carry-forward error in classification. D Application of a multiclass signature, avoiding these limitations. Created with BioRender.com

One obstacle in test development is validation of diagnostic signatures that were discovered using methodology unsuitable for clinical translation, such as microarrays or RNA-sequencing. NanoString technology has been used for quantification of transcripts in many sample types, enabling its use in transcriptomic signature discovery and validation [12,13,14,15], as well as in cancer prognostics and companion diagnostics [16,17,18]. We used NanoString to demonstrate the feasibility of validating multiple whole-blood binary-classification transcriptomic signatures in parallel, including a novel, previously unpublished 3-transcript signature that differentiates active tuberculosis from other febrile illnesses. Efficiently parallelising validation studies in this way could substantially reduce both time and financial costs.

We also developed two multiclass diagnostic signatures to explore proof-of-concept computational methodologies that could be applied to multiclass prediction problems in clinical diagnostics. The first model combines multiple OVA signatures in parallel, whereas the second considers all diagnostic categories simultaneously.

Methods

Study design and population

We performed a validation study using a subset of prospectively recruited patients from five distinct paediatric (age < 19 years) cohorts. Patients with comorbidities known to significantly affect gene expression (bone marrow transplant, immunodeficiency, or immunosuppressive treatment) were excluded. We included patients with definite bacterial (DB) or definite viral (DV) infection; Kawasaki disease (KD); or tuberculosis disease (TB). Healthy control samples were also included to improve normalisation protocols and provide context for “normal” transcript levels. All participants were independent from the derivation cohorts of the signatures we evaluated. Clinical data and samples were identified only by study number.

Disease groups were assigned using pre-agreed definitions for each primary study after review of all available clinical and laboratory data. Patients were classified as having a DB infection if a pathogenic bacterium was isolated from a normally sterile site, matching the clinical syndrome at presentation, with or without concurrent viral pathogens detected. A diagnosis of DV infection was made if a pathogenic virus was identified alongside a matching clinical syndrome, without coexisting features of bacterial infection, and with low inflammatory markers (C-reactive protein (CRP) < 60 mg/L and absolute neutrophil count < 12,000/μL) [19]. Patients were diagnosed with complete or incomplete KD based on the 2017 American Heart Association (AHA) criteria [20]. Assignment to the tuberculosis disease group required a clinical history suggestive of TB and corroborative laboratory testing (culture for M. tuberculosis, interferon-gamma release assay, or positive tuberculin skin test). TB patients with concurrent HIV infection were excluded. In all groups, samples were collected as soon as possible after presentation, and wherever possible before initiation of relevant treatment. Additional details, including full inclusion and exclusion criteria, are described in Additional file 1 and the original papers [10, 21,22,23].

Ethical approvals

Written informed consent was obtained from parents or guardians using locally approved research ethics committee permissions (Ethical Committee of Clinical Investigation of Galicia (GENDRES CEIC ref 2010/015); UK National Research Ethics Service (UK Kawasaki Genetics 13/LO/0026; EUCLIDS 11/LO/1982; NIKS 11/11/11)). Patients in Cape Town (ILULU) were recruited under ethics approvals from the local recruiting centre (HREC REF 130/2007) [22].

Selected signatures

Five whole-blood-based RNA signatures that differentiate febrile diseases were selected for validation using NanoString:

  • A 13-transcript signature to distinguish KD from other febrile illnesses [9], previously validated via RT-PCR [24]. (Wright13)

  • A 2-transcript signature to distinguish bacterial from viral infections in children [10], previously validated via RT-PCR [25, 26]. (Herberg2)

  • A 2-transcript signature adapted from the Herberg2 signature, with FAM89A replaced by the highly correlated but more abundantly expressed transcript EMR1-ADGRE1 [27]. (Pennisi2)

  • A single-transcript signature, BATF2, to distinguish adults with TB disease from healthy controls [28], externally validated in RNA-sequencing and microarray datasets [29–31]. (BATF2)

  • An unpublished 3-transcript signature to distinguish TB disease from other diseases. (TB3)

The target diseases for the signatures were selected to represent a diverse range of important causes of fever in children. The two bacterial-viral signatures (Herberg2 and Pennisi2) were selected for further cross-platform validation, and to assess the effect on performance of replacing FAM89A with EMR1-ADGRE1. We selected the 13-transcript signature for KD because KD is an important cause of fever in children that is often misdiagnosed as an infectious disease, allowing us to demonstrate our approach on an important inflammatory disorder. The TB3 signature was included to provide the first cross-platform validation of this novel signature. We compared TB3 to a primarily adult-derived signature for TB disease (BATF2) to investigate the performance of this signature in children, and to characterise its performance when applied to the new task of differentiating TB disease from other causes of fever.

Derivation of the TB3 signature is described in more detail in Additional file 1 and in the Results. Briefly, the signature was generated by randomly splitting the discovery cohort described by Anderson et al. [22] into training (80%) and test (20%) sets, and running Forward Selection-Partial Least Squares (FS-PLS) on the training set [10, 32].
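
For readers unfamiliar with greedy feature selection, the sketch below illustrates the general idea in R under stated assumptions; it is a simplified stand-in that scores candidates by logistic-regression AIC, not the published FS-PLS algorithm, which uses a partial least squares criterion at each step. The objects `expr` (samples × transcripts matrix of log expression) and `is_tb` (0/1 outcome) are hypothetical placeholders.

```r
## Illustrative greedy forward selection (a simplified stand-in; the published
## FS-PLS algorithm uses a partial least squares criterion at each step).
## `expr` and `is_tb` are hypothetical placeholders.

set.seed(42)
train_idx <- sample(seq_len(nrow(expr)), size = floor(0.8 * nrow(expr)))  # 80/20 split

forward_select <- function(x, y, max_k = 3) {
  selected <- character(0)
  for (k in seq_len(max_k)) {
    candidates <- setdiff(colnames(x), selected)
    # Score each candidate by the AIC of a logistic model containing it
    # plus the transcripts selected so far
    aics <- vapply(candidates, function(tx) {
      dat <- data.frame(y = y, x[, c(selected, tx), drop = FALSE])
      AIC(glm(y ~ ., data = dat, family = binomial))
    }, numeric(1))
    selected <- c(selected, names(which.min(aics)))
  }
  selected
}

tb3_like <- forward_select(expr[train_idx, ], is_tb[train_idx], max_k = 3)
```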

Transcript selection

A total of 69 transcripts were selected for the NanoString panel (Table S1). All 20 transcripts from the five validation signatures were included. We selected an additional 40 transcripts that have previously been found to accurately discriminate between one or more of the above comparator conditions in RNA-sequencing or microarray data (Table S1). The selected transcripts are predominantly associated with protein-coding genes, such as the type 1 interferon-stimulated gene IFI44L (Interferon-Induced Protein 44-Like), which has previously been implicated in the response to viral infections [10]. Smaller numbers of transcripts are associated with long non-coding RNAs (lncRNAs) or microRNAs. Examples include the lncRNA KLF7-IT1 (Kruppel-Like Factor 7 Intronic Transcript 1) and MIR3128 (microRNA 3128), which have no previously recorded disease associations [33], but were both differentially expressed between bacterial and viral infections in the work of Habgood-Coote et al. [11].

We also included three housekeeping transcripts recommended by NanoString, and six more, identified from our microarray and RNA-sequencing data, that had the smallest coefficient of variation (standard deviation/mean) across multiple separate cohorts for different expression abundance ranges.
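
As a minimal sketch of this selection criterion (assuming a hypothetical `counts` matrix of transcripts × samples from a single cohort; the study additionally required stability across multiple cohorts), the most stable candidates within each abundance range might be identified as follows:

```r
## Rank candidate housekeeping transcripts by coefficient of variation (CV)
## within crude abundance strata. `counts` is a hypothetical transcripts x
## samples matrix of normalised expression values from one cohort.

mean_expr <- rowMeans(counts)
cv        <- apply(counts, 1, sd) / mean_expr

# Divide transcripts into low/medium/high abundance tertiles
stratum <- cut(rank(mean_expr), breaks = 3,
               labels = c("low", "medium", "high"))

# Report the most stable (lowest-CV) transcripts within each abundance range
by(data.frame(cv = cv, id = rownames(counts)), stratum,
   function(d) head(d[order(d$cv), "id"], 2))
```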

Sample and data processing

Total RNA was extracted from whole blood collected in PAXgene tubes using PAXgene Blood miRNA kits (PreAnalytiX). Transcript expression was quantified from 100 ng of RNA using the NanoString nCounter® MAX system and a custom-designed codeset of the selected transcripts. Raw counts were normalised and log-transformed (Additional file 1).
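
The study's exact normalisation protocol is given in Additional file 1; purely as an illustration, a typical housekeeping-based nCounter normalisation might look like the sketch below, where `raw` (transcripts × samples matrix of raw counts) and `hk_genes` (housekeeping transcript names) are hypothetical placeholders.

```r
## Minimal sketch of a typical housekeeping-based NanoString normalisation;
## the study's exact protocol is described in Additional file 1.
## `raw` and `hk_genes` are hypothetical placeholders.

geo_mean <- function(x) exp(mean(log(x)))

# Per-sample scale factor from the geometric mean of housekeeping counts,
# rescaled so that the average sample is left unchanged
hk_geo  <- apply(raw[hk_genes, ], 2, geo_mean)
scale_f <- mean(hk_geo) / hk_geo

norm_counts <- sweep(raw, 2, scale_f, `*`)  # apply scale factors column-wise
log_expr    <- log2(norm_counts + 1)        # log-transform with a pseudocount
```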

Statistical analyses

All statistical analyses were undertaken in R, version 4.1.1 [34].

Descriptive statistics and signature evaluation

The diagnostic accuracy of each signature was calculated as the area under the receiver operating characteristic curve (AUC) with 95% confidence intervals (CI), using the DeLong method in the R package pROC [35, 36]. pROC was also used for plotting receiver operating characteristic (ROC) curves with 95% confidence intervals of the sensitivities at fixed specificities. The optimal threshold was chosen to maximise Youden’s J statistic [37], and then used to calculate additional test statistics with 95% CI, including sensitivity and specificity, using the R package epiR [38]. A disease risk score (DRS) was calculated for each signature by summation of up-regulated transcripts and subtraction of down-regulated transcripts on a logarithmic scale, as previously described by Kaforou et al. [39]. A logistic regression was refitted on log-scale normalised counts for each signature to retrain coefficients and derive prediction probabilities.
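
A minimal worked sketch of this pipeline is shown below, assuming hypothetical placeholders `log_expr` (samples × transcripts matrix of log-normalised counts), `up`/`down` (a signature's up- and down-regulated transcripts), and `is_case` (0/1 class labels); the downstream test statistics from epiR are omitted for brevity.

```r
library(pROC)

# Disease risk score: sum of up-regulated minus sum of down-regulated
# transcripts on the log scale (Kaforou et al.)
drs <- rowSums(log_expr[, up, drop = FALSE]) -
       rowSums(log_expr[, down, drop = FALSE])

roc_obj <- roc(response = is_case, predictor = drs)
ci.auc(roc_obj, method = "delong")              # AUC with DeLong 95% CI
ci.se(roc_obj, specificities = seq(0, 1, 0.1))  # sensitivity CIs at fixed specificities

# Threshold maximising Youden's J, with the corresponding sensitivity/specificity
coords(roc_obj, x = "best", best.method = "youden",
       ret = c("threshold", "sensitivity", "specificity"))

# Refit a logistic regression on the signature transcripts to retrain
# coefficients and derive predicted probabilities
fit  <- glm(is_case ~ ., family = binomial,
            data = data.frame(is_case, log_expr[, c(up, down)]))
prob <- predict(fit, type = "response")
```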

Multiclass prediction models

We developed two distinct models to predict one of four diseases (DB, DV, KD and TB). The Mixed One-vs-All (MOVA) model optimises four binary OVA models in parallel, one for each disease. The Multiclass model performs multinomial logistic regression across all four categories simultaneously. Full descriptions are available in Additional file 1. Both methods use relaxed, regularised binomial/multinomial logistic regression models (elastic net) [40], implemented using the R package glmnet [41], to account for the large number of predictors relative to samples and the inherent multicollinearity in our data. Healthy controls were removed, and samples were weighted by group size to account for class imbalance. The original 60 transcripts were restricted to a subset of 36 that were significantly different in one or more one-vs-all comparisons by Mann–Whitney U test, with correction for multiple hypothesis testing. This criterion was applied to remove transcripts that perform poorly cross-platform (typically those with low expression), refining the feature space to more informative transcripts. In-sample error rates and confusion matrices are reported for both models.
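
A minimal sketch of the Multiclass arm is shown below, under stated assumptions: `log_expr` (samples × transcripts) and `diagnosis` (a four-level factor) are hypothetical placeholders, and the elastic-net mixing parameter (alpha = 0.5) and Benjamini-Hochberg correction are illustrative assumptions rather than the study's documented settings (full details are in Additional file 1).

```r
library(glmnet)

# 1. Filter transcripts: keep those significant in any one-vs-all
#    Mann-Whitney U test after correction for multiple testing
pmat <- sapply(levels(diagnosis), function(cls)
  apply(log_expr, 2, function(tx)
    wilcox.test(tx[diagnosis == cls], tx[diagnosis != cls])$p.value))
padj <- matrix(p.adjust(pmat, method = "BH"),
               nrow = nrow(pmat), dimnames = dimnames(pmat))
x <- log_expr[, rowSums(padj < 0.1) > 0]

# 2. Weight samples inversely to class size to address class imbalance
w <- as.numeric(1 / table(diagnosis)[diagnosis])
w <- w * length(w) / sum(w)

# 3. Relaxed elastic-net multinomial logistic regression, with lambda
#    chosen by cross-validated misclassification error
cvfit <- cv.glmnet(x, diagnosis, family = "multinomial",
                   alpha = 0.5, weights = w, relax = TRUE,
                   type.measure = "class")

pred_class <- predict(cvfit, newx = x, s = "lambda.min", type = "class")
pred_prob  <- predict(cvfit, newx = x, s = "lambda.min", type = "response")
table(observed = diagnosis, predicted = pred_class)  # in-sample confusion matrix
```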

Results

Participants

Samples from 91 children were included in this study: 16 healthy controls, 23 with DB, 20 with DV, 14 with KD, and 18 with TB (Fig. 2). One KD patient who received intravenous immunoglobulin (IVIG) before blood sampling was removed, and two further KD patients were removed after transcript expression quantification, following blinded review of clinical data, as they did not meet AHA criteria for complete or incomplete KD. We excluded two samples due to low expression levels after quality control (Additional file 1). Baseline demographic and clinical data are shown in Table 1. DB and DV patients had similar demographics, whereas healthy controls were older than other patients. KD and TB patients were generally older and less likely to be of European ethnicity. Ethnicity data for TB patients recruited in Cape Town were not collected in the index study.

Fig. 2

Study overview. Schematic overview of study recruitment and analysis. Created with BioRender.com

Table 1 Patient characteristics

Pathogens identified in infected patients are presented in Table S2. Admission-to-sample-collection times were short, with a median of < 2 days for all groups where data were available. The median time from fever onset to sampling in KD patients was 6 days (IQR 5–9), similar to the symptom-onset-to-sample-collection time in DB and DV patients. Diagnostic performance of routinely measured CRP and white blood cell, neutrophil, and lymphocyte counts for binary and one-vs-all comparisons is shown in Table S3.

Signature validation

Kawasaki 13-transcript signature

The Wright13 DRS distinguished KD from other diseases with high accuracy (Fig. 3A). ROC curve analysis demonstrated an AUC of 0.897 (95% confidence interval 0.822–0.972, Table 2), with optimal sensitivity of 0.929 (0.661–0.998) and specificity of 0.738 (0.609–0.842) (Fig. 3B). As observed previously, the DRS was more discriminatory earlier in the disease course of KD patients (Figure S1). A refitted logistic regression model using all 13 transcripts diagnosed KD with 100% accuracy (Table 2, Fig. 3B).

Fig. 3

Performance of existing signatures. Plots of disease risk scores by category (left) and ROC curves (right) for five signatures. A and B Wright13 signature, with boxplots of the DRS by category (A) and ROC curves of the DRS and LR-probability (B). C and D Herberg2 signature, with boxplots of the DRS by category (C) and ROC curves of the DRS, LR-probability, and individual transcripts (D). E and F Pennisi2 signature, with boxplots of the DRS by category (E) and ROC curves of the DRS, LR-probability, and individual transcripts (F). G and H TB3 signature, with boxplots of the DRS by category (G) and ROC curves of the DRS, LR-probability, and individual transcripts (H). I and J BATF2, with boxplots of expression by category (I) and ROC curves of BATF2 expression tasked with differentiating active TB from either controls or other disease groups (J). 95% confidence intervals for ROC curves are included for the DRS and LR-probability only, in panels B, D, F and H

Table 2 Diagnostic accuracy of each signature

Bacterial vs Viral 2-transcript signatures

For differentiating DB from DV cases, the Herberg2 DRS had an AUC of 0.825 (0.691–0.959, Table 2), with sensitivity of 0.739 (0.516–0.898) and specificity of 0.950 (0.751–0.999) (Fig. 3C). There was a marginal, non-significant improvement after retraining coefficients using logistic regression (p = 0.392, Fig. 3D). Both models performed significantly worse when tasked with differentiating DB from all other disease groups, with AUCs of 0.699 and 0.723 respectively, but performed well at differentiating DV from all other disease groups, with AUCs of 0.844 and 0.849. This discrepancy may be explained by the high AUC of IFI44L in differentiating DV from other diseases (0.834), compared with the low AUCs (all < 0.7) of IFI44L for DB-vs-other diseases and of FAM89A for both comparisons (Table S4).

As previously observed, FAM89A had low expression in most samples, whereas EMR1-ADGRE1 showed more robust expression levels (Table S4). The Pennisi2 DRS replaces FAM89A in Herberg2 with EMR1-ADGRE1, which improved the overall signature AUC to 0.867 (0.753–0.982) (Fig. 3E, F), although this improvement was not statistically significant (p = 0.417). However, EMR1-ADGRE1 had a lower AUC than FAM89A for differentiating DB and DV cases (0.717 vs 0.761, p = 0.636), suggesting the improved performance of the Pennisi2 signature is due to improved transcript interactions rather than better performance of the individual transcripts. Similar to Herberg2, the Pennisi2 signature was more accurate at differentiating viral infections from other diseases than bacterial infections from other diseases (Table 2).

Tuberculosis signatures

Performance in microarray dataset

The novel TB3 signature includes the transcripts CYB561, GBP6 and KIFC3. It achieved an AUC of 0.928 (0.872–0.985) in the validation cohort, with optimal sensitivity and specificity of 0.886 (0.771–0.971) and 0.859 (0.766–0.938) respectively (Additional file 1: Table S5, Fig. 4).

Fig. 4

TB3 signature performance in microarray training and validation cohorts. Boxplots of the DRS A of the TB3 signature and the correspondent ROC curves B in the training, test, and validation sets

Performance in NanoString dataset

The 3-transcript TB signature differentiated TB from other diseases with an AUC of 0.882 (0.787–0.977, Table 2), with sensitivity of 0.833 (0.586–0.964) and specificity of 0.807 (0.681–0.900). Again, retraining using logistic regression demonstrated only a marginal, non-significant improvement (Table 2, Fig. 3G, H; p = 0.114).

BATF2 alone could accurately differentiate active TB from healthy controls (AUC of 0.910 (0.808–1.000), Table 2), with high specificity, 0.938 (0.698–0.998), and sensitivity of 0.833 (0.586–0.964). However, BATF2 was also overexpressed in patients with other diseases (Fig. 3I, J) and had significantly reduced diagnostic accuracy comparing TB with other disease groups (AUC 0.743 (0.620–0.866), p = 0.043).

Expression patterns of individual transcripts

AUCs and summary statistics for transcript-specific one-vs-all disease comparisons are shown in Additional file 1: Table S3. The highest AUCs were found for transcripts distinguishing DV or TB from other disease groups (Additional file 1: Figure S2).

When ranked by p-value (corrected for multiple testing), 50 transcript-specific one-vs-all comparisons were significant at the alpha = 0.1 level, comprising 36 unique transcripts distributed approximately evenly across comparisons (Additional file 1: Table S4). Two transcripts, KLHL2 and IFI27, each contributed to three separate significant comparisons.

MOVA and multiclass model prediction results

When used for prediction, the MOVA-model selected 25 unique transcripts across the four separate binomial models (Additional file 1: Figure S3), with a maximum of 9 transcripts in any single model. The MOVA-model had an in-sample error rate across all diseases of 13.3%, with the worst performance in predicting DB. The in-sample error rates of the separate models varied from 2.7% for TB vs other diseases to 18.7% for DB vs other diseases (Additional file 1: Table S6). Prediction probabilities for DB and KD patients were similar to each other, whereas TB patients often had high prediction probabilities for DV (Fig. 5A).

Fig. 5

In-sample radar plots for multiclass signatures. Radar plots showing the probabilities of each class predicted by A the MOVA-model, B the Multiclass model. Probabilities separated and coloured by actual disease: Red = DB; Blue = DV; Purple = TB; Yellow = KD

The Multiclass model selected 20 unique transcripts and had 100% in-sample prediction accuracy (Additional file 1: Table S6). Seventeen of these transcripts overlapped with the MOVA-model. In contrast to the MOVA-model, radar plots show that the prediction probabilities are near one for the correct disease for all samples (Fig. 5B). When assessed on healthy controls, both models predicted nearly half of the controls to have TB, with the Multiclass model classifying more of the remaining cases as DB, whilst the MOVA-model classified most of the remaining controls as DV (Additional file 1: Table S6).

Discussion and conclusion

We have shown that validation of multiple transcriptomic signatures can be performed efficiently through a single NanoString assay. The performance of all tested signatures was similar to that in their primary studies, suggesting that both the signatures and the technology are robust to alterations in study design and methodology. This also suggests that overfitting to discovery cohorts was not a major issue for any of the tested signatures. Further evidence for this was provided by the minimal gains in performance when retraining models using logistic regression, with the exception of the larger KD signature, although that improvement may itself represent overfitting. Transcriptomic signatures are rapidly expanding in both scope and number. To understand how best to implement the increasingly complex list of published transcriptomic signatures in clinical practice, new methods are needed that enable efficient evaluation of multiple signatures simultaneously. We have used NanoString to measure transcript abundances covering multiple binary signatures, to facilitate side-by-side comparison of signature performance, and to consider alternative methodologies for assigning disease class.

A limitation of transcriptome-derived diagnostic signatures is the study-specific bias of transcripts selected by a single methodology and patient cohort [42]. This can lead to reduced performance when signatures are applied to external datasets, or to different clinical settings where the case-mix is dissimilar to the discovery cohort [42, 43]. The second phenomenon was evident in our data: for example, the BATF2 transcript performed well in the classification task for which it was designed (distinguishing TB from healthy controls) but performed poorly when differentiating TB from other diseases.

The major mechanisms behind such reduced performance in new cohorts are (1) overfitting of the original classification model, (2) under-representative discovery cohorts that do not reflect the clinical variability in the target population, and (3) failure in translation between technologies. The first mechanism can be addressed through thoughtful machine-learning pipelines in signature development, using appropriate training, test, and validation sets. Mitigation of the second mechanism, however, requires clinical recruitment that is representative of the full range of patient pathologies, rather than a restricted set of target conditions. Deriving binary signatures from these more varied cohorts may be clinically valid in certain contexts, e.g. for patients with specific diseases where confirmatory early diagnosis is needed. Kawasaki disease, for example, commonly presents with a cluster of characteristic clinical features, but diagnosis is frequently delayed due to clinical overlap with other conditions. A signature to diagnose or exclude KD in patients presenting with KD-like features could improve time-to-diagnosis and outcomes [44].

However, this binary approach is inappropriate for undifferentiated febrile illness, where a wider range of pathologies is present. Multiclass models that classify patients into one or more of many possible outcomes may be the most parsimonious solution. They could enable one-step diagnosis of a range of conditions, including those not considered by the physician, improving diagnostic accuracy and reducing time-to-diagnosis.

Our exploratory analysis of multiclass diagnostic methods demonstrates the potential of this approach using a small dataset generated with techniques that are closer to patient translation than the transcriptomic approaches used to generate data for signature discovery. Our findings are supported by large-scale in silico studies based on transcriptomic data [11]. We demonstrate that two contrasting approaches (MOVA and Multiclass) can both yield high in-sample classification accuracy. The MOVA model performed slightly worse than the Multiclass model, which had perfect in-sample accuracy. This may be explained by the restriction imposed on the MOVA model to use only certain transcripts for each comparison. However, this flexibility also exposes the Multiclass model to a greater risk of overfitting. Larger studies are needed to test a broader range of conceptual frameworks and methodologies, and to assess the robustness of these two exploratory models.

A unique aspect of this work is the exploration of these two models, which take different approaches to handling the potential for gene-expression patterns to overlap between multiple disorders. The MOVA-model combines four binary models, each of which is optimised for a single one-vs-all comparison. Each binary model was derived using the same initial list of 36 transcripts, so this approach has the potential to introduce redundancy into the combined final model. However, in the final penalised regression models, only two of the 25 transcripts were selected for more than one binary model. This may be explained by the model setup: each binary model was trained to differentiate a single target condition from all other categories, so transcripts sharing expression patterns between two categories were unlikely to be selected by any single binary model.

In contrast, the Multiclass model directly addresses overlapping expression patterns during model training, by giving each transcript its own coefficient for each of the four target diagnoses. These differences in approach may further explain the performance difference between the models.

In our dataset, transcripts previously shown to differentiate DB from DV infections were much better at differentiating viral infections from other diseases than bacterial infections from other diseases. Most transcripts performed poorly when comparing DB with other diseases: of the top 20 AUCs, only one distinguished DB from other diseases (HP, AUC 0.784). This is consistent with previous transcriptomic studies, which have shown that viral infections are easier to distinguish from other causes of febrile illness than bacterial infections are [45, 46]. A plausible explanation for this phenomenon is the existence of highly conserved host responses to viral infections, such as the interferon-stimulated genes [47], whereas host responses to bacterial infection may be more varied, in part due to the larger and more varied genomes of bacteria.

The novel 3-transcript TB signature demonstrated high sensitivity and specificity in this external cohort, with performance similar to, or exceeding, that of previously published signatures [48]. The included transcripts are a subset of the original 51-transcript signature of Anderson et al. [22], showing in principle that transcript numbers can be reduced whilst maintaining high performance. Previous signatures developed to distinguish TB from other diseases often fail to differentiate viral infections from TB, potentially due to reliance on interferon-stimulated genes [48]. The 3-transcript signature does not include interferon-stimulated genes or transcripts from related pathways, which may explain its high performance despite the inclusion of viral infections in the comparator group. Although the improvement was not statistically significant, retraining the signature on the new platform using logistic regression did demonstrate improved accuracy, exceeding the WHO-defined target product profile for a triage test [49]. Furthermore, the sparseness of the signature may aid translation to a clinically usable assay, whilst simultaneously minimising costs.

Clear limitations of our study include the small sample size and the use of samples from heterogeneous studies. We have attempted to address these through appropriate normalisation processes where possible. Because of its size, the study was limited to a few diseases of interest, but we consider our conclusions valid with respect to the methodological suitability of the NanoString platform for this parallel validation task. Further large-scale studies are needed to explore different conceptual frameworks for the clinical implementation of omic-based signatures, and to determine how best to integrate these novel technologies within existing clinical frameworks. Such studies should also assess the importance of multiclass model setup, for example by altering the regularisation strength of the regression models, and by comparing regression classifiers with non-linear methods such as random forests.

Although most transcripts demonstrated measurable expression levels and good classification performance in the conversion from RNA-sequencing to NanoString, some showed poor detection. Loss of detection has previously been described in cross-platform gene expression studies [24]. The lower resolution of NanoString nCounter® for low-abundance transcripts, relative to the discovery platforms, may have reduced the utility of certain transcripts; in future studies, varying input RNA quantities should be trialled to maximise transcript detection. Our findings highlight the need for cross-platform assessment of candidate diagnostic signatures, and for consideration of the limitations of each methodology at the earliest stage of signature derivation.

Since CRP and neutrophil cutoffs were used for phenotyping viral patients, and CRP measurements were available for only two patients in the TB category, it was not possible to compare the diagnostic performance of the five validated signatures with these commonly used clinical biomarkers without confounding. However, CRP and blood cell counts are known to have limited combined sensitivity and specificity for determining the cause of fever in children, and no well-defined cutoffs exist [6, 50]. The high performance of the five validated models further demonstrates the potential for host-response diagnostic signatures to greatly influence clinical care in paediatrics, by improving diagnostic accuracy and reducing diagnostic delays.

Despite this, most existing signatures have yet to make the leap from bench to bedside. Such translation is particularly challenging for diagnostics in patients presenting acutely with fever, where accurate diagnosis within a few hours is most needed [2]. In this setting, we have shown that NanoString may help bridge the gap between expensive untargeted gene expression quantification methods (e.g. RNA-sequencing, microarrays) and cheaper, rapid technologies (e.g. qRT-PCR), enabling quick, cost-effective parallel evaluation and refinement of a variety of diagnostic signatures.

Conclusions

Our cross-platform study demonstrates in principle the utility of NanoString technology for efficient parallel validation of transcriptomic signatures. Our out-of-sample findings validated five distinct signatures, including a novel sparse TB signature, although discriminatory power was reduced in patients drawn from outside each signature's intended remit. Two exploratory multiclass models showed high accuracy across multiple disparate diagnostic groups, highlighting the potential of this approach.