Abstract
Kernel methods are a class of algorithms for pattern analysis with a number of convenient features. This paper proposes an extension of the kernel method to biological screening data that include chemical compounds. The extended kernel aims to combine properties of a molecule's graph structure with its molecular descriptors. Such kernels allow compounds to be compared not only on their graphs but also on important molecular descriptors. Our experimental evaluation on eight different classification problems shows that the proposed kernel, which takes into account both chemical molecule structure and molecular descriptors, yields a statistically significant improvement in classification performance.
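The abstract does not state how the structural and descriptor information are combined. As a minimal sketch only, and not the authors' construction, one common way to build such a kernel is a convex combination of a normalized graph kernel and an RBF kernel on the descriptor vectors; a weighted sum of positive semidefinite kernels is again positive semidefinite. Everything named here is an assumption for illustration: the functions `rbf_kernel`, `normalize_kernel` and `combined_kernel`, the parameters `weight` and `gamma`, the toy atom-count matrix standing in for a real graph kernel, and the descriptor values.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix on the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * d2)

def normalize_kernel(K):
    """Cosine-normalize a kernel matrix so that its diagonal is 1."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def combined_kernel(K_graph, X_desc, weight=0.5, gamma=1.0):
    """Convex combination of a graph kernel and a descriptor kernel.

    K_graph: precomputed Gram matrix from any graph kernel (assumed PSD).
    X_desc:  one row of molecular descriptors per compound.
    weight:  hypothetical mixing parameter in [0, 1].
    """
    K_g = normalize_kernel(K_graph)
    K_d = rbf_kernel(X_desc, gamma)
    return weight * K_g + (1.0 - weight) * K_d

# Toy data (made up): atom counts per compound stand in for a graph kernel,
# using a linear kernel on the counts as the simplest structural similarity.
atom_counts = np.array([[6.0, 6.0, 0.0],
                        [2.0, 6.0, 1.0],
                        [6.0, 5.0, 1.0]])
K_graph = atom_counts @ atom_counts.T

# Two made-up descriptor values per compound.
X_desc = np.array([[0.1, 1.2],
                   [0.4, 0.9],
                   [0.2, 1.1]])

K = combined_kernel(K_graph, X_desc, weight=0.5, gamma=1.0)
print(K)
```

The resulting matrix can be passed to any kernel classifier that accepts a precomputed Gram matrix. In practice the mixing weight and the RBF width would be tuned by cross-validation, and the linear atom-count kernel would be replaced by a proper graph kernel.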
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Kozak, K., Kozak, M., Stapor, K. (2007). Kernels for Chemical Compounds in Biological Screening. In: Beliczynski, B., Dzielinski, A., Iwanowski, M., Ribeiro, B. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2007. Lecture Notes in Computer Science, vol 4432. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71629-7_37
DOI: https://doi.org/10.1007/978-3-540-71629-7_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71590-0
Online ISBN: 978-3-540-71629-7
eBook Packages: Computer Science (R0)