Summary
In this chapter we study the problem of classifying chemical compound datasets. We present a substructure-based classification algorithm that decouples substructure discovery from classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric substructures present in the dataset. The advantage of this approach is that all relevant substructures are available during classification model construction, allowing the classifier to intelligently select the most discriminating ones. Computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Experimental evaluation on eight different classification problems shows that our approach is computationally scalable and on average outperforms existing schemes by 10% to 35%.
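The decoupled pipeline described in the summary can be illustrated with a toy sketch. This is not the chapter's algorithm: real frequent subgraph discovery is replaced here by counting single labeled bonds, and the feature-selection score (absolute difference in per-class support) is an assumption chosen for brevity. All function names, the bond labels, and the example data are hypothetical.

```python
from collections import Counter

def frequent_features(compounds, min_support):
    """Step 1 (stand-in for frequent subgraph discovery): keep every
    feature present in at least a min_support fraction of compounds."""
    counts = Counter(f for c in compounds for f in set(c))
    threshold = min_support * len(compounds)
    return {f for f, n in counts.items() if n >= threshold}

def select_discriminating(compounds, labels, candidates, k):
    """Step 2 (assumed scoring): rank candidates by the absolute
    difference in support between the two classes and keep the top k."""
    pos = [c for c, y in zip(compounds, labels) if y == 1]
    neg = [c for c, y in zip(compounds, labels) if y == 0]

    def support(f, group):
        return sum(f in c for c in group) / len(group) if group else 0.0

    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(
        candidates,
        key=lambda f: (-abs(support(f, pos) - support(f, neg)), f),
    )
    return ranked[:k]

def to_vector(compound, features):
    """Step 3: encode a compound as a binary vector over the selected
    features, ready to be passed to any standard classifier."""
    return [1 if f in compound else 0 for f in features]

# Hypothetical compounds, each reduced to its set of labeled bonds.
compounds = [
    {"C-C", "C=O", "C-N"},
    {"C-C", "C=O"},
    {"C-C", "C-Cl"},
    {"C-C", "C-Cl", "C-N"},
]
labels = [1, 1, 0, 0]  # hypothetical class labels

candidates = frequent_features(compounds, min_support=0.5)
selected = select_discriminating(compounds, labels, candidates, k=2)
vectors = [to_vector(c, selected) for c in compounds]
```

Because discovery runs before selection, every frequent feature is a candidate when the discriminating ones are chosen; the "C-C" bond, which occurs in every compound, is discovered but then discarded as uninformative.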
Keywords
- Feature Selection
- Mining Chemical Compound
- Inductive Logic Programming
- Frequent Subgraph
- Anthrax Toxin
Copyright information
© 2005 Springer-Verlag London Limited
About this chapter
Cite this chapter
Deshpande, M., Kuramochi, M., Karypis, G. (2005). Mining Chemical Compounds. In: Wu, X., Jain, L., Wang, J.T., Zaki, M.J., Toivonen, H.T., Shasha, D. (eds) Data Mining in Bioinformatics. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/1-84628-059-1_9
DOI: https://doi.org/10.1007/1-84628-059-1_9
Publisher Name: Springer, London
Print ISBN: 978-1-85233-671-4
Online ISBN: 978-1-84628-059-7
eBook Packages: Computer Science (R0)
