Abstract
Gene expression profiles help to capture the functional state of the body and to detect dysfunctional conditions in individuals. In principle, respiratory and other viral infections can be assessed from blood samples. However, it has not yet been determined which gene expression levels are predictive, in particular for the early transition states at disease onset. We therefore analyse the expression levels of infected and non-infected individuals to identify genes (potential biomarkers) that are active during the progression of the disease. We use machine learning (ML) classification algorithms to determine the state of respiratory viral infection in humans from time-dependent gene expression measurements. The study covers four respiratory viruses (H1N1, H3N2, RSV, and HRV), seven distinct clinical studies and 104 healthy participants in total. From the overall set of 12,023 genes, we identified the 10 top-ranked genes that proved most discriminatory for predicting the infection state. Our two models each use one sample per subject. The first uses the sample whose time stamp is nearest to \(t = 48\) hours. The second uses the sample nearest to \(t =\) “Onset Time”, the time of symptom onset, which differs between subjects according to each individual's immune response to the viral infection. We evaluated k-Nearest Neighbour (k-NN), Random Forest, linear Support Vector Machine (SVM), and SVM with a radial basis function (RBF) kernel to classify whether a gene expression sample collected at an early time point t comes from an infected or a non-infected subject. The “Onset Time” appears to play a vital role both in prediction and in the identification of the ten most discriminatory genes.
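The abstract combines three steps. First, for each subject, it picks the sample whose time stamp is nearest a reference time (48 h, or the subject's onset time). Second, it ranks genes by how well they separate infected from non-infected samples. Third, it classifies the result with methods such as k-NN. The Python sketch below strings these steps together on a tiny, made-up dataset. It is not the authors' pipeline. The record layout, the toy values, and the use of a Welch t-statistic for gene ranking are assumptions for illustration, since the abstract does not state the ranking criterion. Only k-NN is shown of the four classifiers.

```python
import math
from collections import Counter

# Minimal sketch, NOT the authors' pipeline. Hypothetical data layout:
# each subject is (samples, label), where samples are
# (hours_since_inoculation, expression_vector) pairs and label is 1/0.


def nearest_sample(samples, t_ref):
    """Return the (time, vector) sample whose time stamp is closest to t_ref."""
    return min(samples, key=lambda s: abs(s[0] - t_ref))


def welch_t(xs, ys):
    """Welch t-statistic for one gene; an assumed stand-in ranking score."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    vx = sum((x - mx) ** 2 for x in xs) / (len(xs) - 1)
    vy = sum((y - my) ** 2 for y in ys) / (len(ys) - 1)
    se = math.sqrt(vx / len(xs) + vy / len(ys))
    if se == 0:
        # Zero spread: perfectly separated (or identical) groups.
        return 0.0 if mx == my else math.copysign(math.inf, mx - my)
    return (mx - my) / se


def rank_genes(X, y, top_k):
    """Indices of the top_k genes by |t| between label-1 and label-0 rows."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, lab in zip(X, y) if lab == 1]
        neg = [row[g] for row, lab in zip(X, y) if lab == 0]
        scores.append(abs(welch_t(pos, neg)))
    # sorted() is stable, so ties keep the lower gene index first.
    return sorted(range(len(scores)), key=lambda g: -scores[g])[:top_k]


def project(row, genes):
    """Keep only the selected gene columns."""
    return [row[g] for g in genes]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(zip((math.dist(r, x) for r in train_X), train_y))[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]


# Fabricated toy data: gene 0 rises after infection, genes 1 and 2 are noise.
subjects = [
    ([(0, [1.0, 5.0, 2.1]), (45, [4.0, 5.1, 2.2]), (60, [5.0, 4.9, 2.1])], 1),
    ([(0, [1.1, 4.9, 2.0]), (50, [4.4, 4.8, 2.0])], 1),
    ([(0, [0.9, 5.1, 2.2]), (47, [3.8, 5.0, 2.3])], 1),
    ([(0, [1.0, 5.0, 2.0]), (46, [1.1, 5.0, 2.1])], 0),
    ([(0, [1.0, 4.8, 2.1]), (49, [0.9, 4.9, 2.2])], 0),
    ([(0, [1.1, 5.1, 2.0]), (52, [1.2, 5.2, 2.0])], 0),
]

# "t = 48 h" style model: one sample per subject, nearest to 48 hours.
X = [nearest_sample(samples, 48)[1] for samples, _ in subjects]
y = [label for _, label in subjects]

genes = rank_genes(X, y, top_k=1)
train = [project(row, genes) for row in X]
print(genes)  # → [0]
print(knn_predict(train, y, project([4.1, 5.0, 2.1], genes)))  # → 1
```

An "Onset Time" style model would pass a subject-specific reference time to `nearest_sample` instead of the fixed 48. Here genes are ranked on the same rows used for training. A real evaluation should repeat the ranking inside each cross-validation fold, because ranking on the full dataset first leaks label information into the test folds.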
Acknowledgements
This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289, co-funded by the European Regional Development Fund.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Verma, G., Jha, A., Rebholz-Schuhmann, D., Madden, M.G. (2019). Using Machine Learning to Distinguish Infected from Non-infected Subjects at an Early Stage Based on Viral Inoculation. In: Auer, S., Vidal, ME. (eds) Data Integration in the Life Sciences. DILS 2018. Lecture Notes in Computer Science, vol 11371. Springer, Cham. https://doi.org/10.1007/978-3-030-06016-9_11
DOI: https://doi.org/10.1007/978-3-030-06016-9_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-06015-2
Online ISBN: 978-3-030-06016-9
eBook Packages: Computer Science, Computer Science (R0)