Autocorrelation descriptor improvements for QSAR: 2DA_Sign and 3DA_Sign

Abstract

Quantitative structure–activity relationship (QSAR) modeling is a branch of computer-aided drug discovery that relates chemical structure to biological activity. Two well-established and related QSAR descriptors are two- and three-dimensional autocorrelation (2DA and 3DA). These descriptors encode the relative positions of atoms or atom properties by calculating the separation between atom pairs in terms of number of bonds (2DA) or Euclidean distance (3DA). The sums of all values computed for a given small molecule are collected in a histogram. Atom properties can be incorporated by weighting each pair with a coefficient that is the product of the two atoms' property values. This procedure can lead to information loss when signed atom properties such as partial charge are considered. For example, the product of two positive charges is indistinguishable from the product of two equivalent negative charges. In this paper, we present variations of 2DA and 3DA called 2DA_Sign and 3DA_Sign that avoid this information loss by splitting unique sign pairs into individual histograms. We evaluate these variations with models trained on nine datasets spanning a range of drug target classes. Both 2DA_Sign and 3DA_Sign significantly increase model performance across all datasets when compared with traditional 2DA and 3DA. Lastly, we find that limiting 3DA_Sign to maximum atom pair distances of 6 Å instead of 12 Å further increases model performance, suggesting that conformational flexibility may hinder performance with longer 3DA descriptors. Consistent with this interpretation, limiting the number of bonds in 2DA_Sign from 11 to 5, a truncation that does not depend on conformation, fails to improve performance.
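The information loss described above can be illustrated with a minimal Python sketch. It is not the BCL implementation: the function names, the toy three-atom molecules, the exclusion of self-pairs, the grouping of zero with positive values, and the use of absolute products in the sign-split histograms are all assumptions made for illustration. The three-way split into (+,+), (−,−) and (+,−) histograms is one reading of "unique sign pairs".

```python
from collections import deque


def bond_distances(n_atoms, bonds):
    """All-pairs shortest path lengths in bonds, via BFS from every atom."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[None] * n_atoms for _ in range(n_atoms)]
    for src in range(n_atoms):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] is None:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def autocorr_2d(props, bonds, max_bonds=11):
    """Plain 2DA: bin k sums p_i * p_j over atom pairs k bonds apart.

    Bin 0 stays empty because self-pairs are skipped (an assumption).
    """
    dist = bond_distances(len(props), bonds)
    hist = [0.0] * (max_bonds + 1)
    for i in range(len(props)):
        for j in range(i + 1, len(props)):
            k = dist[i][j]
            if k is not None and k <= max_bonds:
                hist[k] += props[i] * props[j]
    return hist


def autocorr_2d_sign(props, bonds, max_bonds=11):
    """Sign-split 2DA sketch: one histogram per unique sign pair.

    Zero is grouped with positive values here, an arbitrary choice.
    """
    dist = bond_distances(len(props), bonds)
    hists = {key: [0.0] * (max_bonds + 1) for key in ("++", "--", "+-")}
    for i in range(len(props)):
        for j in range(i + 1, len(props)):
            k = dist[i][j]
            if k is None or k > max_bonds:
                continue
            pi, pj = props[i], props[j]
            if pi >= 0 and pj >= 0:
                key = "++"
            elif pi < 0 and pj < 0:
                key = "--"
            else:
                key = "+-"
            hists[key][k] += abs(pi * pj)
    return hists


# Two hypothetical three-atom chains that differ only in the sign of
# the terminal partial charges.
chain = [(0, 1), (1, 2)]
mol_pos = [0.5, 0.0, 0.5]
mol_neg = [-0.5, 0.0, -0.5]

print(autocorr_2d(mol_pos, chain) == autocorr_2d(mol_neg, chain))
print(autocorr_2d_sign(mol_pos, chain) == autocorr_2d_sign(mol_neg, chain))
```

With plain 2DA the two toy molecules produce identical histograms, whereas the sign-split version places the terminal pair in the "++" histogram for one molecule and in the "--" histogram for the other. A 3DA analogue would follow the same pattern with Euclidean distance bins in place of bond counts.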

Abbreviations

2DA: 2D autocorrelation
3DA: 3D autocorrelation
ANN: Artificial neural network
BCL: BioChemical library
CADD: Computer aided drug discovery
GPCR: G-protein coupled receptor
HTS: High-throughput screen
LB-CADD: Ligand-based CADD
logAUC: Area under the logarithmic ROC curve
LOO: Leave-one-out
QSAR: Quantitative structure–activity relationship
RDF: Radial distribution function
ROC: Receiver operating characteristic
VDW: Van der Waals

Acknowledgments

Work in the Meiler laboratory is supported through NIH (R01 GM080403, R01 GM099842, R01 DK097376, R01 HL122010, R01 GM073151, U19 AI117905) and NSF (CHE 1305874).

Author information

Corresponding author

Correspondence to Jens Meiler.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 143 kb)

About this article

Cite this article

Sliwoski, G., Mendenhall, J. & Meiler, J. Autocorrelation descriptor improvements for QSAR: 2DA_Sign and 3DA_Sign. J Comput Aided Mol Des 30, 209–217 (2016). https://doi.org/10.1007/s10822-015-9893-9

Keywords

  • Quantitative structure activity relationship
  • Descriptor
  • 2D autocorrelation
  • 3D autocorrelation
  • Artificial neural network
  • Virtual high-throughput screening