Using Machine Learning to Distinguish Infected from Non-infected Subjects at an Early Stage Based on Viral Inoculation

  • Ghanshyam Verma
  • Alokkumar Jha
  • Dietrich Rebholz-Schuhmann
  • Michael G. Madden
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11371)


Gene expression profiles help to capture the functional state of the body and to identify dysfunctional conditions in individuals. In principle, respiratory and other viral infections can be detected from blood samples; however, it has not yet been determined which gene expression levels are predictive, in particular for the early transition states at disease onset. We therefore analyse the expression levels of infected and non-infected individuals to determine genes (potential biomarkers) that are active during the progression of the disease. We use machine learning (ML) classification algorithms to determine the state of respiratory viral infections in humans from time-dependent gene expression measurements; the study comprises four respiratory viruses (H1N1, H3N2, RSV, and HRV), seven distinct clinical studies, and 104 healthy test candidates overall. From the overall set of 12,023 genes, we identified the 10 top-ranked genes, which proved to be the most discriminatory with regard to predicting the infection state. Our two models focus on the time stamp nearest to \(t = 48\) hours and the time stamp nearest to \(t =\) “Onset Time”, where “Onset Time” denotes the symptom onset, which occurs at different time points depending on each candidate’s specific immune system response to the viral infection. We evaluated k-Nearest Neighbour (k-NN), Random Forest, linear Support Vector Machine (SVM), and SVM with radial basis function (RBF) kernel classifiers on the task of deciding whether a gene expression sample collected at an early time point t comes from an infected or a non-infected subject. The “Onset Time” appears to play a vital role both in prediction and in the identification of the ten most discriminatory genes.
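The selection of one sample per subject at the time stamp nearest to a target time can be illustrated with a minimal sketch. It is not the authors' implementation: the data layout, the names `nearest_sample` and `build_dataset`, the toy values, and the fallback to 48 hours for subjects without a recorded onset time are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: subject id -> list of (hours since inoculation, expression vector).
Sample = Tuple[float, List[float]]
Samples = Dict[str, List[Sample]]


def nearest_sample(series: List[Sample], target_hours: float) -> Sample:
    """Return the (time, expression) pair whose time stamp is closest to target_hours."""
    return min(series, key=lambda pair: abs(pair[0] - target_hours))


def build_dataset(
    samples: Samples,
    onset_hours: Optional[Dict[str, float]] = None,
    fixed_hours: float = 48.0,
) -> Dict[str, Sample]:
    """Pick one sample per subject.

    With onset_hours=None, every subject uses the fixed target (t = 48 h).
    Otherwise each subject uses their own onset time; subjects without one
    fall back to the fixed target (an assumption of this sketch).
    """
    dataset = {}
    for subject, series in samples.items():
        target = fixed_hours
        if onset_hours is not None:
            target = onset_hours.get(subject, fixed_hours)
        dataset[subject] = nearest_sample(series, target)
    return dataset


# Toy data with invented values: two genes per sample.
toy = {
    "s1": [(0.0, [1.0, 2.0]), (45.5, [1.4, 2.9]), (53.0, [1.9, 3.5])],
    "s2": [(0.0, [0.9, 2.1]), (40.0, [1.0, 2.0]), (60.0, [1.1, 2.2])],
}
at_48h = build_dataset(toy)
at_onset = build_dataset(toy, onset_hours={"s1": 52.0})
print({s: t for s, (t, _) in at_48h.items()})
print({s: t for s, (t, _) in at_onset.items()})
```

The resulting one-sample-per-subject tables would then be passed to a classifier such as k-NN, Random Forest, or an SVM; that step is omitted here.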


Machine learning, Respiratory viral infection, Prediction, Differentially expressed genes



This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289, co-funded by the European Regional Development Fund.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
  2. School of Computer Science, National University of Ireland Galway, Galway, Ireland
  3. ZB MED - Information Center for Life Sciences, University of Cologne, Cologne, Germany
