Typing Plasmids with Distributed Sequence Representation

  • Conference paper
Artificial Neural Networks in Pattern Recognition (ANNPR 2020)

Abstract

Multidrug-resistant bacteria represent an increasing challenge for medicine. In bacteria, most antibiotic resistances are transmitted by plasmids. It is therefore important to study the spread of plasmids in detail so that possible countermeasures can be initiated. The classification of plasmids can provide insights into the epidemiology and transmission of plasmid-mediated antibiotic resistance. The established methods for classifying plasmids are replicon typing and MOB typing, both of which are time-consuming and labor-intensive. To simplify plasmid typing, a new approach was developed that uses word embeddings and support vector machines (SVM). Visualizing the word embeddings with t-distributed stochastic neighbor embedding (t-SNE) shows that the word embeddings find distinct structure in the plasmid sequences. The SVM assigned the plasmids in the testing dataset to the correct MOB type with an average accuracy of 85.9%.
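As a rough illustration of how a sequence can be turned into input for word embeddings, the sketch below splits a DNA string into overlapping k-mers ("words") and averages their vectors into one fixed-length vector per plasmid. The abstract does not state the tokenization or pooling actually used, so the k-mer length, the averaging step, the function names `kmer_words` and `mean_embedding`, and the toy embedding table are all assumptions made for this example. In practice the table would come from a trained embedding model and the resulting vectors would be passed to a classifier such as an SVM.

```python
from typing import Dict, List


def kmer_words(sequence: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers ("words")."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mean_embedding(words: List[str], table: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of all words that appear in the embedding table."""
    vectors = [table[w] for w in words if w in table]
    if not vectors:
        raise ValueError("no word of the sequence is in the embedding table")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Hypothetical 2-dimensional embedding table, for illustration only;
# real vectors would be learned from a corpus of plasmid sequences.
toy_table = {"ATG": [1.0, 0.0], "TGC": [0.0, 1.0], "GCA": [1.0, 1.0]}

words = kmer_words("atgca", k=3)                   # overlapping 3-mers
plasmid_vector = mean_embedding(words, toy_table)  # one vector per sequence
```

Averaging is only one possible way to pool word vectors; it discards word order but gives every sequence a vector of the same length, which is what a standard SVM expects.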

Author information

Correspondence to Joël F. Pothier.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Kaufmann, M., Schüle, M., Smits, T.H.M., Pothier, J.F. (2020). Typing Plasmids with Distributed Sequence Representation. In: Schilling, FP., Stadelmann, T. (eds) Artificial Neural Networks in Pattern Recognition. ANNPR 2020. Lecture Notes in Computer Science, vol 12294. Springer, Cham. https://doi.org/10.1007/978-3-030-58309-5_16

  • DOI: https://doi.org/10.1007/978-3-030-58309-5_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58308-8

  • Online ISBN: 978-3-030-58309-5

  • eBook Packages: Computer Science (R0)
