Classification of Sequences with Deep Artificial Neural Networks: Representation and Architectural Issues

Deep Learning for Biomedical Data Analysis

Abstract

DNA sequences are the basic data type processed in most biological data analyses. A key component of such analyses is sequence classification, a methodology widely used to analyze sequential data of different kinds. Applying it to DNA sequences, however, requires a proper representation of those sequences, which is still an open research problem. Machine Learning (ML) methodologies have contributed fundamentally to solving this problem and, more recently, Deep Neural Network (DNN) models have also shown strongly encouraging results. In this chapter, we deal with specific classification problems related to two biological scenarios: (A) metagenomics and (B) chromatin organization. The investigations have been carried out using DNA sequences as input data for the classification methodologies. In particular, we study and test the efficacy of (1) different DNA sequence representations and (2) several Deep Learning (DL) architectures that process sequences to solve the related supervised classification problems. Although these architectures were developed for specific classification tasks, we think that they could serve as a starting point for developing other DNN models that process the same kind of input.
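To make the notion of a "DNA sequence representation" concrete, the following minimal Python sketch shows two encodings that are commonly fed to neural networks: a one-hot encoding and a k-mer count vector. The abstract does not say which representations the chapter actually tests, so the choice of these two, the function names `one_hot` and `kmer_counts`, and the handling of ambiguous symbols are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide order for both encodings


def one_hot(seq):
    """Encode each nucleotide as a 4-element indicator vector (order A, C, G, T)."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        if base in index:  # assumption: ambiguous symbols such as N stay all-zero
            row[index[base]] = 1
        encoded.append(row)
    return encoded


def kmer_counts(seq, k):
    """Count overlapping k-mers and return a vector of 4**k counts.

    Entries follow the lexicographic order of all k-mers over ACGT,
    so sequences of any length map to a fixed-length vector.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(ALPHABET, repeat=k)]


if __name__ == "__main__":
    print(one_hot("ACGT"))
    print(kmer_counts("ACGTAC", 2))
```

The one-hot form keeps positional information and grows with the sequence length, whereas the k-mer vector has a fixed length and discards position; which trade-off suits a given architecture is the kind of question the abstract refers to.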

Acknowledgements

Additional support to Giosué Lo Bosco and Domenico Amato has been granted by Project INdAM - GNCS “Computational Intelligence methods for Digital Health”.

Author information

Correspondence to Giosué Lo Bosco.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Amato, D. et al. (2021). Classification of Sequences with Deep Artificial Neural Networks: Representation and Architectural Issues. In: Elloumi, M. (ed.) Deep Learning for Biomedical Data Analysis. Springer, Cham. https://doi.org/10.1007/978-3-030-71676-9_2
