Combined Amplification and Molecular Classification for Gene Expression Diagnostics
RNA expression profiles contain information about the state of a cell, and specific gene expression changes are often associated with disease. Classification of blood or similar samples based on RNA expression can thus be a powerful method for disease diagnosis. However, basing diagnostic decisions on RNA expression remains impractical for most clinical applications because it requires costly and slow gene expression profiling based on microarrays or next-generation sequencing, followed by often complex in silico analysis. DNA-based molecular classifiers that perform a computation over RNA inputs and summarize a diagnostic result in situ have been developed to address this issue, but they lack the sensitivity required for use with actual biological samples. To address this limitation, we propose a DNA-based classification system that takes advantage of PCR-based amplification for increased sensitivity. In our initial scheme, the importance of a transcript for a diagnostic decision is proportional to the number of molecular probes bound to that transcript. Although the probe concentration is similar to that of the RNA input, subsequent amplification of the probes with PCR can dramatically increase the sensitivity of the assay. However, even slight biases in PCR efficiency can distort the weight information encoded by the original probe set. To address this concern, we developed and mathematically analyzed multiple strategies for mitigating the bias associated with PCR-based amplification. We evaluate these amplified molecular classification strategies through simulation, using two distinct gene expression data sets and associated disease categories as inputs. Through this analysis, we arrive at a novel molecular classifier framework that naturally accommodates PCR bias and also uses fewer molecular probes than the initial, naive implementation requires.
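The sensitivity of probe-encoded weights to PCR bias can be illustrated with a minimal sketch. It assumes an idealized exponential amplification model in which each cycle multiplies the copy number by (1 + efficiency); this model, the function names, the probe counts, the efficiencies, and the cycle number are all hypothetical choices for illustration and are not taken from the paper.

```python
def amplify(copies, efficiency, cycles):
    """Idealized PCR (an assumption): copies grow by (1 + efficiency) per cycle."""
    return copies * (1.0 + efficiency) ** cycles


def effective_weights(probe_counts, efficiencies, cycles):
    """Relative weight of each transcript after amplification.

    probe_counts: number of probes bound per transcript (the intended weight).
    efficiencies: per-transcript PCR efficiency in [0, 1] (hypothetical values).
    """
    amplified = [amplify(n, e, cycles) for n, e in zip(probe_counts, efficiencies)]
    total = sum(amplified)
    return [a / total for a in amplified]


# Two transcripts with intended weights 3:1.
probe_counts = [3, 1]

# Unbiased amplification preserves the intended relative weights.
intended = effective_weights(probe_counts, [1.0, 1.0], cycles=20)

# A 5% lower efficiency for the first transcript's probes shifts the weights.
biased = effective_weights(probe_counts, [0.95, 1.0], cycles=20)

print("intended:", intended)
print("biased:  ", biased)
```

Under these assumptions the first transcript's relative weight drops from 0.75 to roughly 0.64 after 20 cycles, even though the efficiencies differ by only 5%. This is the kind of distortion that a bias-mitigation strategy has to absorb.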
G. G. was supported by Caltech’s Summer Undergraduate Research Fellowship program. R. L. and G. S. were supported by NSF grant CCF-1714497.