A Reinforcement Learning Based Approach to Multiple Sequence Alignment

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 634)


Multiple sequence alignment plays an important role in comparative genomic sequence analysis and is one of the most challenging problems in bioinformatics. The problem consists of arranging the primary sequences of DNA, RNA or protein so as to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships between the sequences. In this paper we tackle multiple sequence alignment from a computational perspective and introduce a novel approach to it based on reinforcement learning. The experimental evaluation is performed on several DNA data sets, two of which contain human DNA sequences. The obtained results show that our technique outperforms other methods existing in the literature, which indicates the potential of our proposal.


Bioinformatics, Multiple Sequence Alignment, Machine Learning, Reinforcement Learning



This work was supported by a grant of the Romanian National Authority for Scientific Research, CNCS-UEFISCDI, project number PN-II-RU-TE-2014-4-0082.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Faculty of Mathematics and Computer Science, Babeş-Bolyai University, Cluj-Napoca, Romania
