Epik: a software program for pKa prediction and protonation state generation for drug-like molecules

Journal of Computer-Aided Molecular Design


Epik is a computer program for predicting pKa values for drug-like molecules. Epik can combine this capability with technology for tautomerization to adjust the protonation state of small drug-like molecules, automatically generating one or more of the most probable forms for use in further molecular modeling studies. Many medicinal chemicals can exchange protons with their environment, resulting in various ionization and tautomeric states, collectively known as protonation states. The protonation state of a drug can affect its solubility and membrane permeability. In modeling, the protonation state of a ligand also affects which conformations are predicted for the molecule, as well as predictions of binding modes and ligand affinities based upon protein–ligand interactions. Despite the importance of the protonation state, many databases of candidate molecules used in drug development do not store reliable information on the most probable protonation states. Epik is sufficiently rapid and accurate to process large databases of drug-like molecules to provide this information. Several new technologies are employed. Extensions to the well-established Hammett and Taft approaches are used for pKa prediction, namely mesomer standardization, charge cancellation, and charge spreading, so that the predicted results reflect the nature of the molecule itself rather than just the particular Lewis structure used on input. In addition, a new iterative technology for generating, ranking and culling protonation states is employed.




Author information


Corresponding author

Correspondence to John C. Shelley.

Electronic supplementary material

Below is the link to the electronic supplementary material.


Comparison between experimental pKa values and those predicted by Epik for molecules listed in DrugBank with pKa values and SMILES patterns. Only predicted pKa values between 0.0 and 14 are reported, except where the best-matching pKa value exceeds 14 (nitrofurazone, mitomycin, and ethanol).

Appendix: Prioritizing SMARTS patterns for ABGs

All SMARTS patterns for ABGs are assigned numeric priorities. The pattern with the highest priority that matches a particular ABG is selected, and the parameters (e.g. pKa0 and ρ) associated with that pattern are used in the Hammett and Taft (HT) calculations. These priorities are assigned in one of three ways:

  1. The pattern was manually assigned a negative priority and is thus matched only if a more specific pattern is not found. This was done only for very general patterns (e.g. primary, secondary or tertiary amines) that would match many functionalities, most of which are better described by more specific patterns (e.g. amides and anilines). Roughly 5% of the patterns in the database are assigned negative priorities.

  2. The priority for the pattern was calculated from the SMARTS pattern.

  3. In a couple of cases the priority was calculated as described in item 2, except that a manually assigned shift was added to distinguish very closely related patterns.
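The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not Epik's implementation: the `AbgPattern` class, the `select_pattern` function, the SMARTS strings and every numeric value are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AbgPattern:
    """Hypothetical record for one ABG pattern (not Epik's data model)."""
    smarts: str      # pattern in acidic form, acidic hydrogen first
    priority: float  # negative for very general fallback patterns
    pka0: float      # illustrative value only
    rho: float       # illustrative value only


def select_pattern(matching: Sequence[AbgPattern]) -> Optional[AbgPattern]:
    """Return the highest-priority pattern among those that matched, or None."""
    return max(matching, key=lambda p: p.priority, default=None)


# Made-up example: a general amine pattern with a negative priority and a
# more specific aniline-like pattern with a calculated positive priority.
general_amine = AbgPattern("[#1][NX4+]", priority=-1.0, pka0=10.0, rho=1.0)
aniline_like = AbgPattern("[#1][NX4+]c", priority=3.2, pka0=4.6, rho=2.0)

chosen = select_pattern([general_amine, aniline_like])
print(chosen.smarts)
```

Because the general pattern carries a negative priority, it is chosen only when no more specific pattern matches the same group.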

The procedure for calculating the priority from the SMARTS pattern will be outlined in detail below.

All SMARTS patterns for ABGs are recorded in the acidic form and begin with the acidic hydrogen, followed immediately by the atom to which the acidic hydrogen is bound. We will refer to that atom as the first heavy atom.

The SMARTS pattern is translated into a list of atoms and a list of bonds. The type of bond is noted or inferred from the SMARTS pattern consistent with the SMARTS standard. Each atom is classified SP3-like unless it meets one of two conditions:

  1. Any of the bonds involving this atom are double, triple or aromatic.

  2. It is an O, S or N atom and is bonded to an aromatic atom.
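The two conditions above can be expressed as a small predicate. The sketch below assumes a simple, hypothetical representation of the parsed pattern (a `Bond` tuple, a dictionary of element symbols and a set of aromatic atom indices); none of these names come from Epik.

```python
from typing import Dict, List, NamedTuple, Set


class Bond(NamedTuple):
    a: int
    b: int
    kind: str  # "single", "double", "triple" or "aromatic"


def is_sp3_like(idx: int, elements: Dict[int, str],
                aromatic_atoms: Set[int], bonds: List[Bond]) -> bool:
    """Apply the two exclusion conditions to atom `idx`."""
    own = [bd for bd in bonds if idx in (bd.a, bd.b)]
    # Condition 1: any double, triple or aromatic bond on this atom.
    if any(bd.kind != "single" for bd in own):
        return False
    # Condition 2: an O, S or N atom bonded to an aromatic atom.
    if elements[idx] in {"O", "S", "N"}:
        for bd in own:
            other = bd.b if bd.a == idx else bd.a
            if other in aromatic_atoms:
                return False
    return True


# Phenol-like fragment H-O-c:c, atoms numbered from 1 as in the text.
elements = {1: "H", 2: "O", 3: "C", 4: "C"}
aromatic_atoms = {3, 4}
bonds = [Bond(1, 2, "single"), Bond(2, 3, "single"), Bond(3, 4, "aromatic")]
print(is_sp3_like(2, elements, aromatic_atoms, bonds))
```

In this fragment the oxygen has only single bonds but is attached to an aromatic carbon, so it falls under the second condition.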

The priority, P, of a SMARTS pattern for an ABG is calculated using the equation:

$$ P = 2 a_2 + \sum\limits_{i > 2} p_i a_i $$

where $a_i$ is the weighting for atom i in the SMARTS pattern and $p_i$ is an attenuation factor that depends on the shortest topological path from atom 2, the first heavy atom, to atom i. All atoms in the SMARTS pattern are included in the sum except the acidic hydrogen atom (atom 1). The $a_i$ values were determined by trial and error and are given in Table 3. The $p_i$ values were calculated using the equation:

$$ p_i =\prod\limits_j s_j $$

where $s_j$ is an attenuation factor corresponding to a portion of the shortest path from atom 2 to atom i. Each non-aromatic bond has its own attenuation factor, while each set of consecutive aromatic bonds gets a single factor. Aromatic bonds are treated differently because the influence of atoms in aromatic systems does not decrease monotonically with the number of bonds. The attenuation factors for non-aromatic bonds are given in Table 4, while those for aromatic bonds are given in Table 5.

Table 3 Atom weighting factors used in prioritizing SMARTS matches for ABGs
Table 4 Attenuation factors, $s_j$, for different non-aromatic bond types
Table 5 Attenuation factors, $s_j$, as a function of aromatic path length
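As a rough illustration of the priority calculation described in this appendix, the Python sketch below walks the shortest path from atom 2 to each other atom, multiplies one factor per non-aromatic bond and one factor per run of consecutive aromatic bonds, and sums the attenuated atom weights. The contents of Tables 3 to 5 are not reproduced here, so every number in `NONAROMATIC_FACTOR`, `AROMATIC_RUN_FACTOR` and the example weights is a made-up placeholder, and the function names are hypothetical. The pattern is assumed to be connected.

```python
from collections import deque

# Placeholder values only; the published Tables 4 and 5 are not available here.
NONAROMATIC_FACTOR = {"single": 0.5, "double": 0.6, "triple": 0.6}
AROMATIC_RUN_FACTOR = {1: 0.6, 2: 0.5, 3: 0.55, 4: 0.4}


def shortest_path_bonds(bonds, start, goal):
    """Breadth-first search; returns the bond kinds along one shortest path."""
    adj = {}
    for a, b, kind in bonds:
        adj.setdefault(a, []).append((b, kind))
        adj.setdefault(b, []).append((a, kind))
    prev = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            break
        for nxt, kind in adj.get(cur, []):
            if nxt not in prev:
                prev[nxt] = (cur, kind)
                queue.append(nxt)
    kinds = []
    node = goal
    while prev[node] is not None:
        parent, kind = prev[node]
        kinds.append(kind)
        node = parent
    return kinds[::-1]


def attenuation(kinds):
    """p_i: one factor per non-aromatic bond, one per aromatic run."""
    p = 1.0
    run = 0
    for kind in list(kinds) + [None]:  # sentinel flushes a trailing aromatic run
        if kind == "aromatic":
            run += 1
            continue
        if run:
            p *= AROMATIC_RUN_FACTOR[run]
            run = 0
        if kind is not None:
            p *= NONAROMATIC_FACTOR[kind]
    return p


def priority(weights, bonds):
    """P = 2*a_2 + sum over i > 2 of p_i * a_i; atom 1 (acidic H) is excluded."""
    total = 2.0 * weights[2]
    for i, a_i in weights.items():
        if i > 2:
            total += attenuation(shortest_path_bonds(bonds, 2, i)) * a_i
    return total


# Phenol-like pattern H-O-c:c with placeholder atom weights of 1.0.
bonds = [(1, 2, "single"), (2, 3, "single"), (3, 4, "aromatic")]
weights = {2: 1.0, 3: 1.0, 4: 1.0}
print(priority(weights, bonds))
```

With these placeholder numbers the first heavy atom contributes 2 × 1.0, atom 3 contributes 0.5 × 1.0, and atom 4 contributes (0.5 × 0.6) × 1.0, giving a priority of about 2.8. Keying the aromatic factor by run length rather than multiplying per bond is what allows a non-monotonic dependence on aromatic path length.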


Cite this article

Shelley, J.C., Cholleti, A., Frye, L.L. et al. Epik: a software program for pKa prediction and protonation state generation for drug-like molecules. J Comput Aided Mol Des 21, 681–691 (2007).
