Abstract
Many distance-based methods in machine learning are able to identify similar cases or prototypes from which decisions can be made. The explanation given is usually based on expressions such as “because case a is similar to case b”. However, a more general or meaningful pattern, such as “because case a has properties x and y (as b has)”, is usually more difficult to find. Even when such a pattern is found, its connection with the original distance-based method is generally unclear, or even inconsistent. In this paper, we study the connection between the concept of distance (or similarity) and the concept of generalisation. More precisely, we define several conditions which, in our view, a sensible distance-based generalisation must satisfy. From these conditions, we are able to tell whether a generalisation operator for a pattern representation language is consistent with the metric space defined by the underlying distance. We show that there are pattern languages and generalisation operators which satisfy these conditions for typical data types: nominal, numerical, sets and lists. We also show the relationship between the well-known concepts of least general generalisation (lgg) and distances between terms, and the definition of generalisation presented in this paper.
This work has been partially supported by the EU (FEDER) and the Spanish MEC, under grant TIN 2004-7943-C04-02, the Acción Integrada Hispano-Austriaca HU2003-0003, and the Generalitat Valenciana (MEDIM, GV04B/477).
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Estruch, V., Ferri, C., Hernández-Orallo, J., Ramírez-Quintana, M.J. (2005). Distance Based Generalisation. In: Kramer, S., Pfahringer, B. (eds) Inductive Logic Programming. ILP 2005. Lecture Notes in Computer Science, vol 3625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11536314_6
DOI: https://doi.org/10.1007/11536314_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28177-1
Online ISBN: 978-3-540-31851-4
eBook Packages: Computer Science (R0)