CSA-X: Modularized Constrained Multiple Sequence Alignment

  • Conference paper
Algorithms for Computational Biology (AlCoB 2017)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10252)

Abstract

Imposing constraints on multiple sequence alignment (MSA) algorithms can often produce more biologically meaningful alignments. This paper presents CSA-X, a modularized program for constrained multiple sequence alignment (CMSA) that accepts constraints in the form of regular expressions. CSA-X is modular in that it can use arbitrary underlying MSA programs to generate alignments. The accuracy of CSA-X is compared across different underlying MSA algorithms and against RE-MuSiC, another CMSA program that also uses regular expressions as constraints. A technique is also developed for testing the accuracy of CMSA algorithms with regular expression constraints using the BAliBASE 3.0 benchmark database. For verification, ProbCons and T-Coffee are used as the underlying MSA programs in CSA-X, and the accuracy of the alignments is measured in terms of Q score and TC score. Based on the results presented herein, CSA-X significantly outperforms RE-MuSiC. On average, CSA-X used with constraints that were algorithmically created from the least conserved regions of the correct alignments achieves results that are 17.65% higher for Q score and 23.7% higher for TC score than RE-MuSiC. Further, CSA-X with ProbCons (CSA-PC) achieves a higher score in over 97.9% of the cases for Q score, and in over 96.4% of the cases for TC score. The results also show that well-chosen regular expression constraints, created from accurate knowledge of a less conserved region, can improve alignment accuracy. Statistical significance is measured using the Wilcoxon rank-sum test and the Wilcoxon signed-rank test. An open source implementation of CSA-X is also provided.
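
The abstract names Q score and TC score as accuracy measures without defining them. The sketch below illustrates the definitions commonly used in the MSA literature, which are assumed here and may differ from what the paper computes: Q score as the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and TC score as the fraction of reference columns reproduced exactly in the test alignment. The function names are hypothetical, gaps are assumed to be written as `-`, and every reference column is scored, whereas benchmark tools typically restrict scoring to annotated core regions.

```python
from itertools import combinations


def columns(alignment):
    """Return each column as a tuple of residue indices (None marks a gap).

    `alignment` is a list of equal-length strings, one per sequence.
    """
    counters = [0] * len(alignment)  # next residue index for each sequence
    cols = []
    for col in zip(*alignment):
        entry = []
        for s, ch in enumerate(col):
            if ch == "-":
                entry.append(None)
            else:
                entry.append(counters[s])
                counters[s] += 1
        cols.append(tuple(entry))
    return cols


def aligned_pairs(alignment):
    """Set of residue pairs ((seq_i, res_a), (seq_j, res_b)) sharing a column."""
    pairs = set()
    for col in columns(alignment):
        present = [(s, r) for s, r in enumerate(col) if r is not None]
        pairs.update(combinations(present, 2))
    return pairs


def q_score(test, reference):
    """Fraction of reference residue pairs that the test alignment also aligns.

    Assumes the reference aligns at least one residue pair.
    """
    ref_pairs = aligned_pairs(reference)
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


def tc_score(test, reference):
    """Fraction of reference columns that appear identically in the test."""
    test_cols = set(columns(test))
    ref_cols = columns(reference)
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)


if __name__ == "__main__":
    # Toy alignments of the same two sequences (ACGT and AGCT).
    reference = ["ACG-T", "A-GCT"]
    test = ["ACGT", "AGCT"]
    print(q_score(test, reference), tc_score(test, reference))
```

Both functions compare alignments by residue index rather than by character, so two alignments of the same sequences can be scored against each other regardless of how many gap columns each contains.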

Corresponding author

Correspondence to T. M. Rezwanul Islam.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Islam, T.M.R., McQuillan, I. (2017). CSA-X: Modularized Constrained Multiple Sequence Alignment. In: Figueiredo, D., Martín-Vide, C., Pratas, D., Vega-Rodríguez, M. (eds) Algorithms for Computational Biology. AlCoB 2017. Lecture Notes in Computer Science, vol 10252. Springer, Cham. https://doi.org/10.1007/978-3-319-58163-7_10

  • DOI: https://doi.org/10.1007/978-3-319-58163-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58162-0

  • Online ISBN: 978-3-319-58163-7

  • eBook Packages: Computer Science, Computer Science (R0)