Conformal Predictors for Compound Activity Prediction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9653)

Abstract

This paper applies Conformal Predictors to a chemoinformatics problem: identifying the activities of chemical compounds. It addresses several challenges specific to this domain: a large number of compounds (training examples), a high-dimensional feature space, sparseness, and strong class imbalance. A variant of conformal predictors, the Inductive Mondrian Conformal Predictor, is applied to deal with these challenges. Results are presented for several non-conformity measures (NCMs) extracted from underlying algorithms and for different kernels. A number of performance measures are used to demonstrate the flexibility of Inductive Mondrian Conformal Predictors in dealing with such a complex data set.
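
As an illustration of the idea behind an Inductive Mondrian Conformal Predictor, the following minimal sketch computes label-conditional p-values: the non-conformity score of a test compound under each hypothetical label is compared only with calibration scores of that same class, which is what makes the approach suited to class imbalance. The function names, the toy scores, and the choice of the label as the Mondrian category are assumptions made for illustration; they are not taken from the paper, and the scores would normally come from an underlying algorithm such as an SVM.

```python
from collections import defaultdict


def mondrian_icp_p_values(calib_scores, calib_labels, test_scores):
    """Label-conditional p-values for one test object.

    calib_scores[i] is the non-conformity score of calibration example i,
    calib_labels[i] is its true label, and test_scores maps each
    hypothetical label to the test object's score under that label.
    """
    # Group calibration non-conformity scores by their true label.
    scores_by_label = defaultdict(list)
    for score, label in zip(calib_scores, calib_labels):
        scores_by_label[label].append(score)

    p_values = {}
    for label, alpha in test_scores.items():
        same_class = scores_by_label.get(label, [])
        # Count same-class calibration examples at least as non-conforming.
        count = sum(1 for a in same_class if a >= alpha)
        # The +1 terms account for the test example itself.
        p_values[label] = (count + 1) / (len(same_class) + 1)
    return p_values


def prediction_set(p_values, significance):
    """Labels whose p-value exceeds the chosen significance level."""
    return {label for label, p in p_values.items() if p > significance}


# Toy data (hypothetical): four inactive (0) and two active (1) compounds.
calib_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.9]
calib_labels = [0, 0, 0, 0, 1, 1]
p = mondrian_icp_p_values(calib_scores, calib_labels, {0: 0.35, 1: 0.6})
print(p)
print(prediction_set(p, significance=0.5))
```

Because each class is calibrated separately, a rare "active" class is not swamped by the much larger "inactive" class when the p-values are computed.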

Notes

  1. The signature descriptors and other types of descriptors (e.g. circular descriptors) can be computed with the CDK Java package or any of its adaptations, such as the RCDK package for the R statistical software.

  2. In the case of linear SVM, the quadratic optimization problem at the heart of the SVM can be formulated in the primal and solved with techniques such as Stochastic Gradient Descent or L-BFGS, which lend themselves well to being distributed across an array of computational nodes.

  3. See [8] for a proof that Tanimoto Similarity is a kernel.

  4. According to https://www.sgi.com/company_info/newsroom/press_releases/2015/september/salomon.html.
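
The Tanimoto Similarity mentioned in note 3 can be sketched for sparse binary descriptors as follows. This is a minimal illustration that assumes each compound is represented by the set of indices of its non-zero features; the function names, the toy fingerprints, and the convention of returning 1.0 for two empty sets are assumptions, not details from the paper.

```python
def tanimoto(a, b):
    """Tanimoto similarity |A & B| / |A | B| between two feature-index sets."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        # Assumed convention: two empty fingerprints are treated as identical.
        return 1.0
    return len(a & b) / union


def gram_matrix(fingerprints):
    """Pairwise Tanimoto similarities, usable as a precomputed kernel matrix."""
    return [[tanimoto(x, y) for y in fingerprints] for x in fingerprints]


# Toy fingerprints (hypothetical): indices of the features present.
fingerprints = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
K = gram_matrix(fingerprints)
for row in K:
    print(row)
```

A matrix built this way is symmetric with ones on the diagonal, and it can be passed to SVM implementations that accept a precomputed kernel.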

References

  1. Monev, V.: Introduction to similarity searching in chemistry. MATCH Commun. Math. Comput. Chem. 51, 7–38 (2004)

  2. Bottou, L., Chapelle, O., DeCoste, D., Weston, J.: Large-Scale Kernel Machines (Neural Information Processing). The MIT Press, Cambridge (2007)

  3. Bussonnier, M.: Interactive parallel computing in Python. https://github.com/ipython/ipyparallel

  4. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011). http://www.csie.ntu.edu.tw/~cjlin/libsvm

  5. Chang, E.Y.: PSVM: parallelizing support vector machines on distributed computers. Foundations of Large-Scale Multimedia Information Management and Retrieval, pp. 213–230. Springer, Heidelberg (2011)

  6. Faulon, J.-L., Visco Jr., D.P., Pophale, R.S.: The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. J. Chem. Inf. Comput. Sci. 43(3), 707–720 (2003)

  7. Gammerman, A., Vovk, V.: Hedging predictions in machine learning. Comput. J. 50(2), 151–163 (2007)

  8. Gärtner, T.: Kernels For Structured Data. World Scientific Publishing Co. Inc., River Edge (2009)

  9. Graf, H.P., Cosatto, E., Bottou, L., Durdanovic, I., Vapnik, V.: Parallel Support Vector Machines: The Cascade SVM. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems, pp. 521–528. MIT Press, Cambridge (2005)

  10. Jain, A.N., Nicholls, A.: Recommendations for evaluation of computational methods. J. Comput. Aided Mol. Des. 22(3–4), 133–139 (2008)

  11. Shafer, G., Vovk, V.: A tutorial on conformal prediction. J. Mach. Learn. Res. 9, 371–421 (2008)

  12. Vovk, V., Gammerman, A., Shafer, G.: Algorithmic Learning in a Random World. Springer-Verlag New York, Inc., Secaucus (2005)

  13. Weis, D.C., Visco Jr., D.P., Faulon, J.-L.: Data mining pubchem using a support vector machine with the signature molecular descriptor: classification of factor XIa inhibitors. J. Mol. Graph. Model. 27(4), 466–475 (2008)

  14. Woodsend, K., Gondzio, J.: Hybrid MPI/OpenMP parallel linear support vector machine training. J. Mach. Learn. Res. 10, 1937–1953 (2009)

  15. You, Y., Fu, H., Song, S.L., Randles, A., Kerbyson, D., Marquez, A., Yang, G., Hoisie, A.: Scaling support vector machines on modern HPC platforms. J. Parallel Distrib. Comput. 76(C), 16–31 (2015)

  16. Toccaceli, P., Nouretdinov, I., Luo, Z., Vovk, V., Carlsson, L., Gammerman, A.: Conformal predictors. Technical report for EU Horizon 2020 Programme ExCAPE Project. Royal Holloway, London, December 2015

  17. Carlsson, L., Ahlberg, E., Boström, H., Johansson, U., Linusson, H.: Modifications to p-values of conformal predictors. In: SLDS 2015, pp. 251–259

  18. Nouretdinov, I., Gammerman, A., Qi, Y., Klein-Seetharaman, J.: Determining confidence of predicted interactions between HIV-1 and human proteins using conformal method. In: Pacific Symposium on Biocomputing, p. 311 (2012)

  19. Wang, Y., Suzek, T., Zhang, J., Wang, J., He, S., Cheng, T., Shoemaker, B.A., Gindulyte, A., Bryant, S.H.: PubChem BioAssay: 2014 update. Nucleic Acids Res. 42(1), D1075–D1082 (2014)

  20. McCool, M., Robison, A.D., Reinders, J.: Structured Parallel Programming: Patterns for Efficient Computation. Morgan-Kaufmann, Burlington (2012)

Acknowledgments

This project (ExCAPE) has received funding from the European Union's Horizon 2020 Research and Innovation programme under Grant Agreement no. 671555. We are grateful to the Ministry of Education, Youth and Sports (Czech Republic), which supports the Large Infrastructures for Research, Experimental Development and Innovations project “IT4Innovations National Supercomputing Center LM2015070”, for help in conducting experiments. This work was also supported by EPSRC grant EP/K033344/1 (“Mining the Network Behaviour of Bots”). We are indebted to Lars Carlsson of AstraZeneca for providing the data and for useful discussions. We are also thankful to Zhiyuan Luo and Vladimir Vovk for many valuable comments and discussions.

Author information

Corresponding authors

Correspondence to Paolo Toccaceli, Ilia Nouretdinov or Alexander Gammerman.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Toccaceli, P., Nouretdinov, I., Gammerman, A. (2016). Conformal Predictors for Compound Activity Prediction. In: Gammerman, A., Luo, Z., Vega, J., Vovk, V. (eds) Conformal and Probabilistic Prediction with Applications. COPA 2016. Lecture Notes in Computer Science, vol 9653. Springer, Cham. https://doi.org/10.1007/978-3-319-33395-3_4

  • DOI: https://doi.org/10.1007/978-3-319-33395-3_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-33394-6

  • Online ISBN: 978-3-319-33395-3

  • eBook Packages: Computer Science, Computer Science (R0)
