
Deep-learning language models help to improve protein sequence alignment

  • Research Briefing

From Nature Methods

We trained DEDAL, an algorithm based on deep-learning language models, to generate pairwise alignments of protein sequences, taking into account the sequence-specific context of amino acid substitutions and gaps. DEDAL improved alignment correctness on remote homologs by up to threefold, and it also improved the discrimination of remote homologs from evolutionarily unrelated sequences.
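To make the idea of context-dependent scoring concrete, the following is a minimal sketch of a Smith–Waterman local alignment in which the substitution score and gap penalty are supplied per pair of positions rather than read from a fixed table. It is not DEDAL's implementation: the names `local_align` and `toy_scores` are hypothetical, the gap penalty is linear for brevity, and the toy scores (+2 match, -1 mismatch, gap 2) merely stand in for values that a trained language model would predict from sequence context.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def toy_scores(x: str, y: str) -> Tuple[Matrix, Matrix]:
    """Hypothetical stand-in for a learned model: returns per-position
    substitution scores sub[i][j] and gap penalties gap[i][j]."""
    sub = [[2.0 if a == b else -1.0 for b in y] for a in x]
    gap = [[2.0 for _ in y] for _ in x]
    return sub, gap


def local_align(sub: Matrix, gap: Matrix) -> Tuple[float, List[Tuple[int, int]]]:
    """Smith-Waterman local alignment with position-specific parameters.

    sub[i][j] is the score for aligning residue i of x with residue j of y;
    gap[i][j] is a (positive) linear gap penalty applied at that cell.
    Returns the best score and the list of aligned (i, j) index pairs.
    """
    n, m = len(sub), len(sub[0])
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g = gap[i - 1][j - 1]
            H[i][j] = max(
                0.0,                                  # start a new local alignment
                H[i - 1][j - 1] + sub[i - 1][j - 1],  # align x[i-1] with y[j-1]
                H[i - 1][j] - g,                      # gap in y
                H[i][j - 1] - g,                      # gap in x
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best-scoring cell until the score drops to zero.
    pairs: List[Tuple[int, int]] = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        g = gap[i - 1][j - 1]
        if H[i][j] == H[i - 1][j - 1] + sub[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - g:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


sub, gap = toy_scores("ACDE", "CDE")
score, pairs = local_align(sub, gap)
```

Because the dynamic program only reads `sub` and `gap`, any model that outputs these two matrices for a given sequence pair could be plugged in without changing the alignment routine.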

Fig. 1: Example of pairwise alignment of two protein domain sequences.

References

  1. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). This paper describes AlphaFold2, the state-of-the-art method for protein structure prediction from multiple sequence alignments.

  2. Smith, T. F. & Waterman, M. S. Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981). This paper introduces the classical Smith–Waterman algorithm for pairwise sequence alignment.

  3. Altschul, S. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997). This paper presents the most widely used heuristic variant of the Smith–Waterman algorithm.

  4. Devlin, J. et al. BERT: pre-training of deep bidirectional transformers for language understanding. Proc. NAACL-HLT 1, 4171–4186 (2019). This paper proposes BERT, a technique for unsupervised pretraining of language models.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is a summary of: Llinares-López, F., Berthet, Q., Blondel, M., Teboul, O. & Vert, J.-P. Deep embedding and alignment of protein sequences. Nat. Methods https://doi.org/10.1038/s41592-022-01700-2 (2022).

About this article

Cite this article

Deep-learning language models help to improve protein sequence alignment. Nat Methods 20, 40–41 (2023). https://doi.org/10.1038/s41592-022-01707-9

  • DOI: https://doi.org/10.1038/s41592-022-01707-9

  • Springer Nature America, Inc.