Abstract
Multidrug-resistant bacteria represent an increasing challenge for medicine. In bacteria, most antibiotic resistances are transmitted by plasmids. It is therefore important to study the spread of plasmids in detail in order to initiate possible countermeasures. Classifying plasmids can provide insights into the epidemiology and transmission of plasmid-mediated antibiotic resistance. The established methods for classifying plasmids are replicon typing and MOB typing, both of which are time-consuming and labor-intensive. To simplify plasmid typing, a new approach was developed that uses word embeddings and support vector machines (SVM). Visualizing the word embeddings with t-distributed stochastic neighbor embedding (t-SNE) shows that the word embeddings capture distinct structure in the plasmid sequences. The SVM assigned the plasmids in the test dataset to the correct MOB type with an average accuracy of 85.9%.
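Word-embedding models operate on sequences of discrete "words", so a nucleotide sequence first has to be cut into tokens. The following is a minimal sketch of one common way to do this, splitting a sequence into overlapping k-mers. The function names, the choice of k = 3, the use of overlapping windows, and the toy sequence are all illustrative assumptions; the abstract does not specify how the sequences were tokenized, and the embedding and SVM training steps are not shown here.

```python
from collections import Counter


def kmer_words(sequence, k=3):
    """Split a nucleotide sequence into overlapping k-mer 'words'.

    Hypothetical helper for illustration; k and the overlapping window
    are assumptions, not parameters taken from the paper.
    """
    seq = sequence.upper()
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(sequence, k=3):
    """Count how often each k-mer occurs in the sequence."""
    return Counter(kmer_words(sequence, k))


if __name__ == "__main__":
    toy_sequence = "ATGCGATG"  # made-up sequence, not real plasmid data
    print(kmer_words(toy_sequence, k=3))
    print(kmer_counts(toy_sequence, k=3).most_common(1))
```

A list of k-mer words like this could then be passed to a word-embedding trainer as one "sentence" per plasmid, with the resulting vectors serving as input features for a classifier.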
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Kaufmann, M., Schüle, M., Smits, T.H.M., Pothier, J.F. (2020). Typing Plasmids with Distributed Sequence Representation. In: Schilling, F.-P., Stadelmann, T. (eds) Artificial Neural Networks in Pattern Recognition. ANNPR 2020. Lecture Notes in Computer Science, vol 12294. Springer, Cham. https://doi.org/10.1007/978-3-030-58309-5_16
DOI: https://doi.org/10.1007/978-3-030-58309-5_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58308-8
Online ISBN: 978-3-030-58309-5
eBook Packages: Computer Science, Computer Science (R0)