Pharmadoop: a tool for pharmacophore searching using Hadoop framework

  • Rahul Semwal
  • Imlimaong Aier
  • Utkarsh Raj
  • Pritish Kumar Varadwaj
Original Article


The term pharmacophore describes the essential features shared by one or more molecules that have the same biological activity. Pharmacophores are selected on the basis of common features such as the types of functional groups present, the distances between atoms or groups of atoms, and the angles between such groups or individual atoms. In this paper, we present the design and implementation of Pharmadoop, a pharmacophore searching tool built on the Hadoop framework. Because it runs on Hadoop, Pharmadoop is faster than existing standalone pharmacophore search tools. It uses the MapReduce programming model to compare millions of conformers in a short time span. We demonstrate and compare the utility of Pharmadoop on ten distinct chemical datasets of ligand molecules by running a common substructure search job on standalone and multi-node Hadoop platforms. The results of that job were then used to run pharmacophore searches on standalone and multi-node Hadoop distributions. The performance, speed and accuracy of the tool were evaluated through time-scale analysis and receiver operating characteristic (ROC) curves. The Pharmadoop tool can be accessed at
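To make the map and reduce roles concrete, the following is a minimal, single-process sketch in plain Python. It is not Pharmadoop's code and does not use Hadoop or CDK. The query features, target distances, tolerance and all function names are hypothetical, and matching is reduced to pairwise distance checks between typed feature points (angles are ignored). The map step tests one conformer against the query, and the reduce step marks a molecule as a hit if any of its conformers matched.

```python
from collections import defaultdict
from itertools import permutations
from math import dist

# Hypothetical query: three feature types and target pairwise distances (angstroms).
QUERY_TYPES = ["donor", "acceptor", "aromatic"]
QUERY_DISTANCES = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}
TOLERANCE = 0.5  # assumed allowed deviation per distance


def matches(conformer):
    """Return True if some ordered choice of features satisfies the query.

    A conformer is a list of (feature_type, (x, y, z)) tuples.
    """
    for combo in permutations(conformer, len(QUERY_TYPES)):
        if [feature[0] for feature in combo] != QUERY_TYPES:
            continue
        if all(
            abs(dist(combo[i][1], combo[j][1]) - target) <= TOLERANCE
            for (i, j), target in QUERY_DISTANCES.items()
        ):
            return True
    return False


def map_phase(record):
    """Map: emit (molecule_id, matched) for a single conformer record."""
    mol_id, conformer = record
    yield mol_id, matches(conformer)


def reduce_phase(mol_id, values):
    """Reduce: a molecule is a hit if any of its conformers matched."""
    return mol_id, any(values)


def run(records):
    """Group map output by key, then reduce each group (stands in for the shuffle)."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            grouped[key].append(value)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Toy data: (molecule_id, conformer) records with made-up coordinates.
    toy_records = [
        ("mol_a", [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aromatic", (0, 4, 0))]),
        ("mol_b", [("donor", (0, 0, 0)), ("acceptor", (6, 0, 0)), ("aromatic", (0, 4, 0))]),
    ]
    print(run(toy_records))
```

Because each conformer is tested independently in the map step, the records can be split across nodes without coordination, which is the property a multi-node Hadoop run would exploit.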


Keywords: Pharmacophore, Hadoop, Performance analysis, ROC



Copyright information

© Springer-Verlag GmbH Austria 2017

Authors and Affiliations

  • Rahul Semwal (1)
  • Imlimaong Aier (1)
  • Utkarsh Raj (1)
  • Pritish Kumar Varadwaj (1, 2), corresponding author
  1. Department of Information Technology (Bioinformatics), Indian Institute of Information Technology, Allahabad, India
  2. Department of Applied Sciences, Indian Institute of Information Technology-Allahabad, Allahabad, India
