Optimizing Feature Sets for Structured Data

Rückert, Ulrich; Kramer, Stefan

doi:10.1007/978-3-540-74958-5_72

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4701)

Included in the following conference series: European Conference on Machine Learning

Abstract

Choosing a suitable feature representation for structured data is a non-trivial task due to the vast number of potential candidates. Ideally, one would like to pick a small but informative set of structural features, each providing complementary information about the instances. We frame the search for a suitable feature set as a combinatorial optimization problem. For this purpose, we define a scoring function that favors features that are as dissimilar as possible to all other features. The score is used in a stochastic local search (SLS) procedure to maximize the diversity of a feature set. In experiments on small molecule data, we investigate the effectiveness of a forward selection approach with two different linear classification schemes.
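
The abstract does not spell out the scoring function or the search moves, so the following is only a minimal sketch of the general idea: a stochastic local search that swaps features in and out of a fixed-size set to raise a diversity score. Everything specific here is an assumption made for illustration and not taken from the chapter: the names `agreement`, `diversity`, and `sls_select`, the similarity measure (absolute value of rescaled agreement between binary feature columns), the swap neighborhood, the `noise` parameter, and the toy data.

```python
import random


def agreement(a, b):
    """Fraction of instances on which two binary feature columns agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def diversity(columns, subset):
    """Hypothetical score: 1 minus the mean pairwise similarity (higher = more diverse)."""
    idx = sorted(subset)
    pairs = [(i, j) for n, i in enumerate(idx) for j in idx[n + 1:]]
    if not pairs:
        return 1.0
    # A feature and its complement carry the same information, so similarity
    # is |2 * agreement - 1|: 1 for identical or complementary columns, 0 at 50% agreement.
    sim = sum(abs(2 * agreement(columns[i], columns[j]) - 1) for i, j in pairs)
    return 1.0 - sim / len(pairs)


def sls_select(columns, k, steps=200, noise=0.1, seed=0):
    """Swap-based stochastic local search over feature subsets of size k."""
    rng = random.Random(seed)
    all_idx = list(range(len(columns)))
    current = set(rng.sample(all_idx, k))
    best, best_score = set(current), diversity(columns, current)
    for _ in range(steps):
        candidates = [i for i in all_idx if i not in current]
        if not candidates:
            break
        out = rng.choice(sorted(current))  # feature to drop
        rest = current - {out}
        if rng.random() < noise:
            new = rng.choice(candidates)  # random-walk step to escape local optima
        else:
            # greedy step: add the candidate that yields the most diverse set
            new = max(candidates, key=lambda i: diversity(columns, rest | {i}))
        current = rest | {new}
        score = diversity(columns, current)
        if score > best_score:
            best, best_score = set(current), score
    return sorted(best), best_score


# Toy data (made up): columns 0 and 1 are identical, column 3 is the complement of column 0.
toy_columns = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]
selected, score = sls_select(toy_columns, k=3)
print(selected, score)
```

On the toy data, columns 0, 1, and 3 are mutually redundant under this similarity, so a maximally diverse set of three features keeps only one of them alongside columns 2 and 4. The search tracks the best set seen so far, so occasional random-walk steps cannot lose a good solution once it has been found.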

Author information

Authors and Affiliations

Institut für Informatik/I12, Technische Universität München, Boltzmannstr. 3, D-85748 Garching b. München, Germany
Ulrich Rückert & Stefan Kramer

Editor information

Joost N. Kok, Jacek Koronacki, Ramon Lopez de Mantaras, Stan Matwin, Dunja Mladenič, Andrzej Skowron

About this paper

Cite this paper

Rückert, U., Kramer, S. (2007). Optimizing Feature Sets for Structured Data. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. Lecture Notes in Computer Science, vol 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_72

DOI: https://doi.org/10.1007/978-3-540-74958-5_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74957-8
Online ISBN: 978-3-540-74958-5
eBook Packages: Computer Science, Computer Science (R0)
