Using Machine Learning to Distinguish Infected from Non-infected Subjects at an Early Stage Based on Viral Inoculation

  • Conference paper
Data Integration in the Life Sciences (DILS 2018)

Abstract

Gene expression profiles help to capture the functional state of the body and to identify dysfunctional conditions in individuals. In principle, respiratory and other viral infections can be detected from blood samples; however, it has not yet been determined which gene expression levels are predictive, in particular for the early transition states at disease onset. We therefore analyse the expression levels of infected and non-infected individuals to determine genes (potential biomarkers) that are active during the progression of the disease. We use machine learning (ML) classification algorithms to determine the state of respiratory viral infections in humans from time-dependent gene expression measurements; the study comprises four respiratory viruses (H1N1, H3N2, RSV, and HRV), seven distinct clinical studies, and 104 healthy test candidates overall. From the overall set of 12,023 genes, we identified the 10 top-ranked genes that proved most discriminatory with regard to prediction of the infection state. Our two models focus on the time stamp nearest to \(t = 48\) hours and the time stamp nearest to \(t =\) “Onset Time”, which denotes the symptom onset (occurring at different time points) according to the candidate’s specific immune system response to the viral infection. We evaluated algorithms including k-Nearest Neighbour (k-NN), Random Forest, linear Support Vector Machine (SVM), and SVM with radial basis function (RBF) kernel to classify whether a gene expression sample collected at an early time point t comes from an infected or a non-infected subject. The “Onset Time” appears to play a vital role in prediction and in the identification of the ten most discriminatory genes.
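The selection of the time stamp nearest to a target time, which both models rely on, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `nearest_sample_per_subject`, the toy subject identifiers, the sampling times, and the onset times are all hypothetical, and the fallback to the fixed target for subjects without a recorded onset is an assumption made only for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy record: (subject_id, hours_after_inoculation).
Sample = Tuple[str, float]


def nearest_sample_per_subject(
    samples: List[Sample],
    target_hours: float,
    onset_hours: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Return, per subject, the sampling time closest to the target time.

    If onset_hours is given, each subject's own onset time is the target.
    Subjects without an onset entry fall back to target_hours (an
    assumption of this sketch, not something stated in the abstract).
    """
    best: Dict[str, float] = {}
    for subject, t in samples:
        target = target_hours
        if onset_hours is not None:
            target = onset_hours.get(subject, target_hours)
        # Keep the first sample seen, then replace it only if a closer one appears.
        if subject not in best or abs(t - target) < abs(best[subject] - target):
            best[subject] = t
    return best


# Invented sampling times for two subjects.
samples = [
    ("s1", 36.0), ("s1", 45.5), ("s1", 53.0),
    ("s2", 29.0), ("s2", 60.0), ("s2", 46.0),
]

# Model 1: time stamp nearest to t = 48 hours.
at_48h = nearest_sample_per_subject(samples, 48.0)

# Model 2: time stamp nearest to each subject's (invented) onset time.
at_onset = nearest_sample_per_subject(samples, 48.0, {"s1": 38.0, "s2": 62.0})

print(at_48h)
print(at_onset)
```

In this toy data the two criteria select different samples for each subject, which is why the choice of reference time can change which expression profiles reach the classifier.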

Notes

  1. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE73072

Acknowledgements

This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289, co-funded by the European Regional Development Fund.

Author information

Corresponding author

Correspondence to Ghanshyam Verma.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Verma, G., Jha, A., Rebholz-Schuhmann, D., Madden, M.G. (2019). Using Machine Learning to Distinguish Infected from Non-infected Subjects at an Early Stage Based on Viral Inoculation. In: Auer, S., Vidal, ME. (eds) Data Integration in the Life Sciences. DILS 2018. Lecture Notes in Computer Science, vol 11371. Springer, Cham. https://doi.org/10.1007/978-3-030-06016-9_11

  • DOI: https://doi.org/10.1007/978-3-030-06016-9_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-06015-2

  • Online ISBN: 978-3-030-06016-9

  • eBook Packages: Computer Science, Computer Science (R0)
