Abstract
Multiple sequence alignment (MSA) is a crucial step in comparative genomics, structural and functional studies, and phylogeny estimation. Because MSA is NP-complete, several heuristics have been developed that produce suboptimal alignments. The progressive alignment approach is one of the most convenient and effective ways to align multiple sequences. In this study, to improve the sensitivity of progressive multiple sequence alignment, a new method for pairwise distance measurement based on an intuitionistic fuzzy approach is proposed. Reference sequences from BAliBASE 4.0 (hand-aligned), PREFAB 4.0 (structurally supervised), and OXBench were employed to evaluate performance. To assess alignment quality, the SP, C, and TC scores of the test alignments were compared using the Friedman rank test at a statistically significant level. Unweighted pair group method with arithmetic mean hierarchical clustering (PIFD-UHC) and neighbor-joining-based hierarchical clustering (PIFD-NHC) were applied to the pairwise intuitionistic fuzzy distance measurements to construct a merge tree. The results indicate that the proposed methods improve alignment sensitivity and accuracy. Where the sequences are not equidistant from each other, PIFD-NHC performs more reliably. However, PIFD-UHC was the top performer in aligning all the BAliBASE reference sequence sets. Our approach has a somewhat greater time complexity than the pairwise distance measurement method used in ClustalW, with similar memory usage.
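As an illustration of the merge-tree step mentioned in the abstract, the following is a minimal sketch of UPGMA clustering over a pairwise distance matrix. It does not reproduce the authors' intuitionistic fuzzy distance: the function name `upgma`, the sequence labels, and the toy distance values are assumptions made for illustration only, and any pairwise distance could be substituted.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA merge tree as nested tuples.

    names: list of sequence labels (hypothetical).
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    """
    clusters = {name: 1 for name in names}  # cluster id -> number of leaves
    d = dict(dist)                          # working copy of the distances
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to every other cluster is the size-weighted average,
        # which equals the arithmetic mean over all leaf pairs.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Toy distances (placeholders, not values from the paper).
toy = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
tree = upgma(["A", "B", "C", "D"], toy)
print(tree)
```

In a progressive aligner, a tree of this kind sets the order in which sequences and sub-alignments are merged. A neighbor-joining tree would fill the same role when the sequences are not equidistant from each other.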
Data availability
The datasets generated during the current study are available from the corresponding author on request.
Acknowledgements
The authors would like to give special thanks to the reviewers for their helpful and constructive suggestions and comments, which improved the quality of the paper.
Funding
This work was supported by Jahrom University, Iran, under Grant Code 103/16133.
Contributions
The authors made substantial contributions to conceptualization, methodology, data curation, statistical analysis, and validation. BH wrote the manuscript. NF reviewed it.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Hajieghrari, B., Farrokhi, N. & Kamalizadeh, M. Intuitionistic fuzzy approach improve protein multiple sequence alignment. Netw Model Anal Health Inform Bioinforma 10, 45 (2021). https://doi.org/10.1007/s13721-021-00314-6
DOI: https://doi.org/10.1007/s13721-021-00314-6