Distance Based Generalisation
Many distance-based methods in machine learning identify similar cases or prototypes from which decisions can be made. The explanation they give is usually of the form “because case a is similar to case b”. A more general or meaningful pattern, such as “because case a has properties x and y (as b has)”, is usually more difficult to find. Even when such a pattern is found, its connection with the original distance-based method is generally unclear, or even inconsistent. In this paper, we study the connection between the concept of distance (or similarity) and the concept of generalisation. More precisely, we define several conditions which, in our view, a sensible distance-based generalisation must satisfy. From these conditions, we are able to tell whether a generalisation operator for a pattern representation language is consistent with the metric space defined by the underlying distance. We show that there are pattern languages and generalisation operators which comply with these properties for typical data types: nominal, numerical, sets and lists. We also show the relationship between the well-known concepts of least general generalisation (lgg) and distances between terms, and the definition of generalisation presented in this paper.
Keywords: Distance-based methods, generalisation operators, lgg, metric space
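For readers unfamiliar with the lgg mentioned in the abstract, the following is a minimal sketch of the classical least general generalisation (anti-unification) of two first-order terms. It illustrates only the well-known concept, not the generalisation operators or conditions defined in the paper. The term encoding (strings for constants, tuples for compound terms), the function name `lgg` and the variable names `V0`, `V1`, ... are assumptions made for this example.

```python
from itertools import count


def lgg(s, t, table=None, fresh=None):
    """Anti-unify two first-order terms (illustrative sketch).

    Assumed encoding: a constant is a string, a compound term is a tuple
    (functor, arg1, ..., argN). Generated variables are the strings
    "V0", "V1", ...; the sketch assumes no constant uses those names.
    """
    if table is None:
        table = {}       # maps a pair of differing subterms to its variable
    if fresh is None:
        fresh = count()  # supplies indices for new variables
    if s == t:
        # Identical subterms generalise to themselves.
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        # Same functor and arity: generalise argument by argument.
        args = tuple(lgg(a, b, table, fresh) for a, b in zip(s[1:], t[1:]))
        return (s[0],) + args
    # Differing subterms: reuse the same variable for the same pair.
    if (s, t) not in table:
        table[(s, t)] = f"V{next(fresh)}"
    return table[(s, t)]


# p(a, f(a)) and p(b, f(b)) generalise to p(V0, f(V0)).
print(lgg(("p", "a", ("f", "a")), ("p", "b", ("f", "b"))))
```

Reusing one variable per pair of differing subterms is what makes the result least general: p(a, f(a)) and p(b, f(b)) yield p(V0, f(V0)) rather than the more general p(V0, f(V1)).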