A modified Henry gas solubility optimization for solving motif discovery problem

  • Original Article
Neural Computing and Applications

Abstract

The DNA motif discovery (MD) problem is a major challenge in genome biology, and its importance grows as sequencing technologies advance. MD plays a vital role in identifying transcription factor binding sites, which helps in understanding the mechanisms that regulate gene expression. Metaheuristic algorithms are promising techniques for eliciting motifs from DNA genomic sequences, but they often fail to perform robustly because the inherent complexity of gene sequences makes the search environment extremely non-convex for optimization methods. This paper proposes a novel modified Henry gas solubility optimization (MHGSO) algorithm for motif discovery, which elicits a functional motif in DNA genomic sequences. In our approach, a new stage is proposed that captures the main characteristics of motifs in DNA sequences, and MHGSO imitates these characteristics to detect the target motif accurately. The performance of the MHGSO algorithm is validated on both synthetic and real datasets. The results confirm the stability and superiority of the proposed algorithm compared with state-of-the-art algorithms including MEME, DREME, XXmotif, PMbPSO, and MACS. Across several evaluation metrics, MHGSO outperforms the competing techniques in terms of nucleotide-level correlation coefficient, recall, precision, F-score, Cohen’s Kappa, and statistical validation measures.
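
The evaluation measures named in the abstract can all be derived from a nucleotide-level confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' evaluation code: the function name `nucleotide_metrics` and the 0/1 per-position encoding of predicted and annotated motif sites are assumptions made for illustration only.

```python
from math import sqrt


def nucleotide_metrics(predicted, actual):
    """Compute nucleotide-level metrics from two equal-length 0/1 sequences.

    predicted[i] is 1 if position i is covered by a predicted motif site,
    actual[i] is 1 if position i is covered by an annotated site.
    (Hypothetical encoding, chosen for this illustration.)
    """
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")

    # Confusion-matrix counts over nucleotide positions.
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    n = tp + fp + fn + tn

    def safe_div(num, den):
        # Return 0.0 when a metric is undefined (zero denominator).
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f_score = safe_div(2 * precision * recall, precision + recall)

    # Nucleotide-level correlation coefficient (Pearson/Matthews form).
    ncc = safe_div(
        tp * tn - fn * fp,
        sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn)),
    )

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_observed = safe_div(tp + tn, n)
    p_chance = safe_div((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = safe_div(p_observed - p_chance, 1 - p_chance)

    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "ncc": ncc,
        "kappa": kappa,
    }


# Toy example: 10 positions, 3 predicted and 3 annotated, 2 overlapping.
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
actual = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = nucleotide_metrics(predicted, actual)
```

All five values are returned as floats; a metric whose denominator is zero is reported as 0.0 here, which is a convention chosen for the sketch rather than one stated in the article.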

Author information

Corresponding author

Correspondence to Essam H. Houssein.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hashim, F.A., Houssein, E.H., Hussain, K. et al. A modified Henry gas solubility optimization for solving motif discovery problem. Neural Comput & Applic 32, 10759–10771 (2020). https://doi.org/10.1007/s00521-019-04611-0

  • DOI: https://doi.org/10.1007/s00521-019-04611-0
