XDeMo: a novel deep learning framework for DNA motif mining using transformer models

  • Original Article
  • Network Modeling Analysis in Health Informatics and Bioinformatics

Abstract

Motivation: Recognizing and studying DNA patterns is crucial for improving our understanding of disease, cell function, and gene regulation. Motifs determine which transcription factor proteins may bind to a DNA region, which helps unravel gene expression. Advances in deep learning and high-throughput sequencing have made it possible to revisit motif discovery with greater accuracy and performance. Methodology: In this paper, we propose a novel deep learning framework (XDeMo – Transformer-based Deep Motifs) for DNA motif mining using Transformer models. We also introduce a hybrid encoding scheme, called ‘blended’ encoding, designed specifically for deep learning transformer models trained on DNA sequences. Results: Augmented by blended encoding, the proposed transformer-based framework outperforms many state-of-the-art deep learning models on many baseline performance metrics when trained on the standard datasets. Our models demonstrated robust performance in predicting motifs, with high discriminative power, precision, recall, and F1 score. Conclusion: The model’s ability to capture intricate sequence patterns and long-range dependencies led to the discovery of biologically meaningful motifs, which were verified against known transcription factor binding motif databases. This shows that the framework can be used effectively to find DNA motifs and thereby aid downstream analyses for biomedical and biotechnological applications.
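
The results are reported in terms of precision, recall, and F1 score. As a reminder of how these metrics relate to one another, the following is a minimal Python sketch for binary labels. The function name `precision_recall_f1`, the label convention (1 = sequence contains a binding motif), and the toy labels are illustrative assumptions; none of this is taken from XDeMo or its evaluation code.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical toy labels: 1 = motif present, 0 = motif absent.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, it is high only when precision and recall are both high, which is why it is commonly reported alongside the two individual scores.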

Significance

XDeMo’s practical implications span gene regulation research, genomics tool development, molecular biology, and diagnostic applications. It offers a robust foundation for further advances in genomic analysis, with the potential to accelerate discoveries in gene regulation and the development of novel therapeutic strategies.

Data availability

All ChIP-Seq datasets used in this study were downloaded from the ENCODE (Encyclopedia of DNA Elements) database and are freely available at http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/. The preprocessing steps applied to these datasets are detailed in the Methods section.

Author information

Contributions

Rajashree Chaurasia and Udayan Ghose conceptualized the model architecture and methodology. Rajashree Chaurasia carried out the literature survey; data collection, preprocessing, and analysis; and model construction, training, and evaluation. She also wrote the manuscript. Udayan Ghose supervised and reviewed the manuscript preparation.

Corresponding author

Correspondence to Rajashree Chaurasia.

Ethics declarations

Conflicts of interest

The authors declare no conflicts of interest. No funding was received for conducting this study.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chaurasia, R., Ghose, U. XDeMo: a novel deep learning framework for DNA motif mining using transformer models. Netw Model Anal Health Inform Bioinforma 13, 25 (2024). https://doi.org/10.1007/s13721-024-00463-4

  • DOI: https://doi.org/10.1007/s13721-024-00463-4
