Protocol

Multiple Sequence Alignment Methods

Volume 1079 of the series Methods in Molecular Biology pp 59-73

Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

  • Stefano Iantorno (Wellcome Trust Sanger Institute; National Institute of Allergy and Infectious Diseases, National Institutes of Health)
  • Kevin Gori (EMBL-European Bioinformatics Institute)
  • Nick Goldman (EMBL-European Bioinformatics Institute)
  • Manuel Gil (Max F. Perutz Laboratories, Center for Integrative Bioinformatics Vienna, Medical University Vienna, University of Vienna)
  • Christophe Dessimoz (EMBL-European Bioinformatics Institute)

Abstract

Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Alignment accuracy is therefore crucial to a vast range of analyses, often in ways that are difficult to assess within those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies (based on simulation, consistency, protein structure, and phylogeny) and discuss their respective advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should choose a benchmark according to the context of application, with a keen awareness of the assumptions underlying each benchmarking strategy.

Key words

Multiple sequence alignment; Benchmarking; Phylogenetic; Protein structure; Sequence evolution; Consistency; Homology