Deep Learning Architectures for DNA Sequence Classification

Lo Bosco, Giosué; Di Gangi, Mattia Antonino

doi:10.1007/978-3-319-52962-2_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10147)

Included in the following conference series:

International Workshop on Fuzzy Logic and Applications

Abstract

DNA sequence classification is a key task in a generic computational framework for biomedical data analysis, and in recent years several machine learning techniques have been adopted to accomplish this task successfully. However, the main difficulty behind the problem remains the feature selection process. Sequences do not have explicit features, and the commonly used representations have the main drawback of high dimensionality. Machine learning methods devoted to supervised classification tasks depend strongly on the feature extraction step, and building a good representation requires recognizing and measuring meaningful details of the items to classify. Recently, neural deep learning architectures, or deep learning models, have been shown to extract useful features from input patterns automatically. In this work we present two different deep learning architectures for DNA sequence classification. They are compared on a public data set of DNA sequences across five different classification tasks.
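
To make the dimensionality remark concrete, here is a minimal Python sketch of one commonly used fixed-length representation, k-mer counting. The abstract does not name a specific representation, so the choice of k-mers, the function name `kmer_counts`, and the four-letter alphabet are illustrative assumptions and not the authors' method. The point is only that the vector length grows as 4^k regardless of sequence length.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed alphabet; ambiguous bases such as N are ignored


def kmer_counts(sequence, k):
    """Return a fixed-length vector of k-mer counts (length 4**k).

    Hypothetical helper for illustration only.
    """
    # Map every possible k-mer to a position in the output vector.
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    counts = [0] * len(index)
    # Slide a window of width k over the sequence and count each k-mer.
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if kmer in index:  # skip windows containing unknown symbols
            counts[index[kmer]] += 1
    return counts


if __name__ == "__main__":
    for k in (2, 5, 8):
        # Vector length depends only on k, not on the sequence length.
        print(k, len(kmer_counts("ACGTACGTAC", k)))
```

Under these assumptions, moving from k = 5 to k = 8 takes the feature vector from 1,024 to 65,536 entries, most of which are zero for a short sequence.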

G. Lo Bosco and M.A. Di Gangi: both authors contributed equally to this paper.

Author information

Authors and Affiliations

Dipartimento di Matematica e Informatica, Università degli Studi di Palermo, Palermo, Italy
Giosué Lo Bosco
Dipartimento di Scienze per l’Innovazione e le Tecnologie Abilitanti, Istituto Euro Mediterraneo di Scienza e Tecnologia, Palermo, Italy
Giosué Lo Bosco
Fondazione Bruno Kessler, Trento, Italy
Mattia Antonino Di Gangi
ICT International Doctoral School, University of Trento, Trento, Italy
Mattia Antonino Di Gangi

Corresponding author

Correspondence to Giosué Lo Bosco.

Editor information

Editors and Affiliations

University of Naples “Parthenope”, Naples, Italy
Alfredo Petrosino
University of Salerno, Fisciano, (Salerno), Italy
Vincenzo Loia
University of Alberta, Edmonton, Alberta, Canada
Witold Pedrycz

Cite this paper

Lo Bosco, G., Di Gangi, M.A. (2017). Deep Learning Architectures for DNA Sequence Classification. In: Petrosino, A., Loia, V., Pedrycz, W. (eds) Fuzzy Logic and Soft Computing Applications. WILF 2016. Lecture Notes in Computer Science, vol 10147. Springer, Cham. https://doi.org/10.1007/978-3-319-52962-2_14

DOI: https://doi.org/10.1007/978-3-319-52962-2_14
Published: 07 February 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52961-5
Online ISBN: 978-3-319-52962-2
eBook Packages: Computer Science, Computer Science (R0)
