Abstract
Kernels for structured data have gained considerable attention as ever more complex data is generated in domains such as biology, chemistry, and engineering. However, although many applications involve spatial aspects, only a few kernel methods so far have been designed to take 3D information into account. We introduce a novel kernel called the 3D Neighborhood Kernel. As a first step, we focus on 3D structures of proteins and ligands, in which the atoms are represented as points in 3D space. By comparing the Euclidean distances between selected sets of atoms, the kernel can select spatial features that are important for determining the functions of proteins or their interactions with other molecules. We evaluate the kernel on a number of benchmark datasets and show that its performance is competitive with state-of-the-art methods. Although we apply this kernel to proteins and ligands, it is applicable to any kind of 3D data in which objects follow a common schema, such as RNA, cars, or standardized equipment.
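To illustrate the general idea of comparing Euclidean distances between selected atoms, the following minimal sketch computes a toy similarity between two structures that share a common schema (the same atoms at the same indices). It is not the definition of the 3D Neighborhood Kernel, which the abstract does not give. The function names, the Gaussian comparison, the `gamma` parameter, the coordinates, and the choice of atom pairs are all assumptions made for illustration.

```python
import math


def distance_features(coords, atom_pairs):
    """Return the Euclidean distance for each selected pair of atom indices."""
    return [math.dist(coords[i], coords[j]) for i, j in atom_pairs]


def toy_distance_kernel(coords_a, coords_b, atom_pairs, gamma=1.0):
    """Sum of Gaussian similarities between corresponding inter-atom distances.

    Illustrative only: this is a generic distance-comparison kernel,
    not the kernel defined in the paper.
    """
    d_a = distance_features(coords_a, atom_pairs)
    d_b = distance_features(coords_b, atom_pairs)
    return sum(math.exp(-gamma * (x - y) ** 2) for x, y in zip(d_a, d_b))


# Hypothetical three-atom structures (coordinates are made up).
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
# Same shape as structure_a, rotated so the third atom lies on the z-axis.
structure_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
# Second atom moved further away, so two of the three distances change.
structure_c = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]

# Hypothetical selection of atom pairs to compare.
pairs = [(0, 1), (0, 2), (1, 2)]

k_aa = toy_distance_kernel(structure_a, structure_a, pairs)
k_ab = toy_distance_kernel(structure_a, structure_b, pairs)
k_ac = toy_distance_kernel(structure_a, structure_c, pairs)
```

Because only inter-atom distances enter the comparison, the rotated copy `structure_b` scores the same as `structure_a` compared with itself, while the deformed `structure_c` scores lower. This invariance to rotation and translation is one reason distance-based features are a natural fit for 3D structures.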
Acknowledgements
The authors would like to thank students Davy De Mits and Sunil Aryal for conducting preliminary experiments, Dr. Kurt De Grave and Dr. Fabrizio Costa for assistance with running NSPDK, and Jérôme Renaux for proofreading. This research was supported by ERC-StG 240186 MiGraNT and IWT-SBO Nemoa.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Schietgat, L., Fannes, T., Ramon, J. (2015). Predicting Protein Function and Protein-Ligand Interaction with the 3D Neighborhood Kernel. In: Japkowicz, N., Matwin, S. (eds) Discovery Science. DS 2015. Lecture Notes in Computer Science, vol 9356. Springer, Cham. https://doi.org/10.1007/978-3-319-24282-8_19
DOI: https://doi.org/10.1007/978-3-319-24282-8_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24281-1
Online ISBN: 978-3-319-24282-8
eBook Packages: Computer Science, Computer Science (R0)