Conformal Predictors for Compound Activity Prediction

  • Paolo Toccaceli
  • Ilia Nouretdinov
  • Alexander Gammerman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9653)


The paper presents an application of Conformal Predictors to the chemoinformatics problem of identifying the activities of chemical compounds. It addresses several challenges specific to this domain: a large number of compounds (training examples), a high-dimensional feature space, sparseness, and strong class imbalance. A variant of conformal predictors called the Inductive Mondrian Conformal Predictor is applied to deal with these challenges. Results are presented for several non-conformity measures (NCM) extracted from underlying algorithms and for different kernels. A number of performance measures are used to demonstrate the flexibility of Inductive Mondrian Conformal Predictors in dealing with such a complex set of data.
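
As a rough illustration of the idea, the sketch below computes label-conditional p-values in the inductive setting. It is not the implementation used in the paper. It assumes that the Mondrian categories are the class labels, and that non-conformity scores have already been obtained from some underlying algorithm trained on a proper training set separate from the calibration set. The names `mondrian_p_values` and `prediction_set`, and the toy scores, are hypothetical.

```python
def mondrian_p_values(cal_scores, cal_labels, test_scores):
    """Label-conditional inductive conformal p-values (illustrative sketch).

    cal_scores  : non-conformity score of each calibration example,
                  computed with its true label
    cal_labels  : true label of each calibration example
    test_scores : dict mapping each hypothetical label to the
                  non-conformity score of the test object under that label
    """
    p_values = {}
    for label, alpha in test_scores.items():
        # Mondrian step: compare only against calibration examples
        # that belong to the same class as the hypothesised label.
        same_class = [s for s, y in zip(cal_scores, cal_labels) if y == label]
        at_least_as_strange = sum(1 for s in same_class if s >= alpha)
        # The +1 terms count the test object itself.
        p_values[label] = (at_least_as_strange + 1) / (len(same_class) + 1)
    return p_values


def prediction_set(p_values, epsilon):
    """Labels whose p-value exceeds the significance level epsilon."""
    return {label for label, p in p_values.items() if p > epsilon}


# Toy data (assumed, not from the paper): 0 = inactive, 1 = active.
cal_scores = [0.1, 0.2, 0.9, 0.3, 0.8]
cal_labels = [0, 0, 0, 1, 1]
test_scores = {0: 0.5, 1: 0.5}

p = mondrian_p_values(cal_scores, cal_labels, test_scores)
# p[0] = (1 + 1) / (3 + 1) = 0.5, p[1] = (1 + 1) / (2 + 1) = 2/3
region = prediction_set(p, epsilon=0.6)  # only label 1 remains
```

Because each p-value is calibrated against examples of its own class only, the minority class is not swamped by the majority class, which is the motivation for a Mondrian variant under strong class imbalance.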


Conformal prediction; Confidence estimation; Chemoinformatics; Non-conformity measure



This project (ExCAPE) has received funding from the European Union's Horizon 2020 Research and Innovation programme under Grant Agreement no. 671555. We are grateful to the Ministry of Education, Youth and Sports (Czech Republic), which supports the Large Infrastructures for Research, Experimental Development and Innovations project “IT4Innovations National Supercomputing Center LM2015070”, for help in conducting the experiments. This work was also supported by EPSRC grant EP/K033344/1 (“Mining the Network Behaviour of Bots”). We are indebted to Lars Carlsson of Astra Zeneca for providing the data and for useful discussions. We are also thankful to Zhiyuan Luo and Vladimir Vovk for many valuable comments and discussions.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Royal Holloway, University of London, Egham, UK
