Deep-learning language models help to improve protein sequence alignment

doi:10.1038/s41592-022-01707-9

Research Briefing
Published: 15 December 2022

Volume 20, pages 40–41, (2023)

We trained DEDAL, an algorithm based on deep-learning language models, to generate pairwise alignments of protein sequences that take into account the sequence-specific context of amino acid substitutions and gaps. DEDAL improved alignment correctness on remote homologs by up to threefold and also improved the discrimination of remote homologs from evolutionarily unrelated sequences.
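
To make the idea of context-dependent scoring concrete, the following is a minimal sketch of classical Smith–Waterman local alignment written so that the substitution score is looked up per position pair rather than per amino acid pair. It is not DEDAL's implementation: the function names `smith_waterman` and `identity_scores`, the single linear gap penalty, and the toy match/mismatch values are all assumptions made for illustration. In a learned model, the `sub` matrix would presumably be predicted from sequence embeddings instead of being filled from a fixed table.

```python
def smith_waterman(a, b, sub, gap):
    """Local alignment of sequences a and b.

    sub[i][j] is the score for aligning a[i] with b[j]; because it is indexed
    by position, it can encode sequence context (illustrative assumption).
    gap is a single linear gap penalty (a simplification).
    """
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,                                  # start a new local alignment
                H[i - 1][j - 1] + sub[i - 1][j - 1],  # align a[i-1] with b[j-1]
                H[i - 1][j] - gap,                    # gap in b
                H[i][j - 1] - gap,                    # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best-scoring cell until the score drops to zero.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub[i - 1][j - 1]:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] - gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


def identity_scores(a, b, match=2.0, mismatch=-1.0):
    """Toy context-free scores; a learned model would replace this matrix."""
    return [[match if x == y else mismatch for y in b] for x in a]


# Usage: align two short toy sequences with context-free scores.
a, b = "ACGT", "ACT"
score, row_a, row_b = smith_waterman(a, b, identity_scores(a, b), gap=1.0)
print(score, row_a, row_b)
```

Because the dynamic program only ever reads `sub[i][j]`, swapping the toy matrix for position-specific scores changes the alignment without changing the algorithm. The exact equality checks in the traceback are safe here only because the toy scores are exactly representable; real-valued predicted scores would call for stored traceback pointers instead.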


**Fig. 1: Example of pairwise alignment of two protein domain sequences.**


Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is a summary of: Llinares-López, F., Berthet, Q., Blondel, M., Teboul, O. & Vert, J.-P. Deep embedding and alignment of protein sequences. Nat. Methods https://doi.org/10.1038/s41592-022-01700-2 (2022).

About this article

Cite this article

Deep-learning language models help to improve protein sequence alignment. Nat Methods 20, 40–41 (2023). https://doi.org/10.1038/s41592-022-01707-9

Published: 15 December 2022
Issue Date: January 2023
DOI: https://doi.org/10.1038/s41592-022-01707-9
Springer Nature America, Inc.

Associated content

Deep embedding and alignment of protein sequences

Article Nature Methods 15 December 2022
