A new efficient quorum planted (ℓ, d) motif search on ChIP-seq dataset using segmentation to filtration and freezing firefly algorithms

  • Optimization
Soft Computing

Abstract

A comprehensive understanding of transcription factor binding sites (TFBSs) is a key problem in contemporary biology and a critical issue in gene regulation. By identifying the pattern of TFBSs in every DNA sequence, motif discovery reveals the basic regulatory relationships and helps in understanding the evolutionary system of every species. In this context, recognizing a high-quality (ℓ, d) motif is a great challenge. This study addresses the problem in two stages, motif discovery and motif finding, using the proposed Segmentation to Filtration (S2F) and Firefly with FREEZE (FFF) algorithms, respectively. The whole set of DNA sequences is divided into two segments. Segment 1 performs motif discovery: the sequences are sliced into base and sub k-mers using an iterative approach, followed by the filtration 1 and filtration 2 techniques, respectively. This approach obtains the top five percent of the best motifs (TOPbk_mer) based on accuracy. In segment 2, the motifs recognized in segment 1 are given as input to the FFF algorithm to identify the TFBS locations. The standard firefly algorithm with two freezing techniques, local and global, is employed to recognize the final motif. The performance of these algorithms is evaluated on simulated datasets and on real datasets: the Escherichia coli cyclic AMP receptor protein (CRP) dataset, the mouse embryonic stem cell (mESC) dataset, and the human ChIP-seq (chromatin immunoprecipitation sequencing) dataset. On all of these datasets the experiments ran within 3 min, with the number of sequences (t) ranging up to 39,601. The results show that the two proposed algorithms, S2F and FFF, can identify high-quality motifs and are faster than the state-of-the-art PMS and QPMS algorithms.
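To make the problem statement concrete, the following is a minimal brute-force sketch of the quorum planted (ℓ, d) motif definition only: an ℓ-mer counts as a motif if it occurs, with at most d mismatches, in at least a fraction q of the t input sequences. It is not the S2F or FFF algorithm from the paper, and the function names (`hamming`, `occurs_within`, `quorum_motifs`) and the quorum parameter `q` are assumptions made for illustration.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def quorum_motifs(seqs, l, d, q):
    """Enumerate all 4**l candidate l-mers (exponential in l, toy sizes only)
    and keep those supported by at least q * len(seqs) sequences."""
    needed = q * len(seqs)
    hits = []
    for letters in product("ACGT", repeat=l):
        candidate = "".join(letters)
        support = sum(occurs_within(candidate, s, d) for s in seqs)
        if support >= needed:
            hits.append((candidate, support))
    return hits


if __name__ == "__main__":
    toy = ["TTACGTAA", "GGACCTCC", "CCCCCCCC"]
    for motif, support in quorum_motifs(toy, l=4, d=1, q=0.5):
        print(motif, support)
```

Exhaustive enumeration like this becomes infeasible as ℓ grows, which is the motivation for filtering candidates and using a metaheuristic search as the abstract describes.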

Data availability

1. The mESC data were downloaded from https://lgsun.grc.nia.nih.gov/CisFinder/, the web version of CisFinder.
2. For the ENCODE TF ChIP-seq data, Homo sapiens (hg19) datasets were used and retrieved with the following steps:
   a. Download the datasets in narrowPeak format from UCSC: http://genome.ucsc.edu/ENCODE/downloads.html.
   b. Convert the narrowPeak format to the FASTA format.
   c. Find the web logo of the TFBSs from the JASPAR database: http://compbio.mit.edu/encode-motif.
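The narrowPeak-to-FASTA conversion step can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes the reference genome is already loaded as an in-memory dictionary from chromosome name to sequence, and the function name `narrowpeak_to_fasta` is hypothetical. It relies only on the first three narrowPeak columns (chromosome, 0-based start, exclusive end).

```python
def narrowpeak_to_fasta(peak_lines, genome):
    """Turn narrowPeak records into FASTA text.

    peak_lines: iterable of narrowPeak lines (whitespace-separated columns).
    genome: dict mapping chromosome name -> sequence string (an assumption;
            a real hg19 genome would normally be read from disk).
    """
    records = []
    for line in peak_lines:
        # Skip blank lines and UCSC header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # narrowPeak coordinates are 0-based and half-open, like Python slices.
        sequence = genome[chrom][start:end]
        records.append(f">{chrom}:{start}-{end}\n{sequence}")
    return "\n".join(records) + "\n" if records else ""


if __name__ == "__main__":
    toy_genome = {"chr1": "AACCGGTTAACC"}
    toy_peaks = ["chr1\t2\t6\tpeak1\t100\t.\t5.0\t-1\t-1\t2"]
    print(narrowpeak_to_fasta(toy_peaks, toy_genome), end="")
```

For whole-genome data, a dedicated tool that reads an indexed genome file is usually a better fit than holding every chromosome in memory.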

Funding

No funding is involved.

Author information

Corresponding author

Correspondence to P. Theepalakshmi.

Ethics declarations

Conflict of interest

The authors have no affiliation with any organization with a direct or indirect financial interest in the subject matter discussed in the manuscript.

Ethical approval

All the authors mentioned in the manuscript have agreed to authorship, read and approved the manuscript, and given consent for submission and subsequent publication of the manuscript. The order of authorship was agreed upon by all named authors prior to submission. Full names, institutional affiliations, highest degrees obtained by the authors, and e-mail addresses are clearly mentioned on the title page. The corresponding author, who takes full ownership of all communication related to the manuscript, is designated, and his/her detailed institutional affiliation is provided. Manuscript submission-related declarations: The manuscript, in part or in full, has not been submitted or published anywhere. The manuscript will not be submitted elsewhere until the editorial process is completed. Statements of ethical approval for studies involving human subjects and/or animals: This article does not involve human subjects and/or animals.

Informed consent

For this type of study, informed consent is not required.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Theepalakshmi, P., Reddy, U.S. A new efficient quorum planted (ℓ, d) motif search on ChIP-seq dataset using segmentation to filtration and freezing firefly algorithms. Soft Comput 28, 3049–3070 (2024). https://doi.org/10.1007/s00500-023-09236-z

  • DOI: https://doi.org/10.1007/s00500-023-09236-z
