Distance Based Generalisation

  • V. Estruch
  • C. Ferri
  • J. Hernández-Orallo
  • M. J. Ramírez-Quintana
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3625)

Abstract

Many distance-based methods in machine learning are able to identify similar cases or prototypes from which decisions can be made. The explanation given is usually based on expressions such as “because case a is similar to case b”. However, a more general or meaningful pattern, such as “because case a has properties x and y (as b has)” is usually more difficult to find. Even in this case, the connection of this pattern with the original distance-based method is generally unclear, or even inconsistent. In this paper, we study the connection between the concept of distance (or similarity) and the concept of generalisation. More precisely, we define several conditions which, in our view, a sensible distance-based generalisation must have. From that, we are able to tell whether a generalisation operator for a pattern representation language is consistent with the metric space defined by the underlying distance. We show that there are pattern languages and generalisation operators which comply with these properties for typical data types: nominal, numerical, sets and lists. We also show the relationship between the well-known concepts of lgg and distances between terms, and the definition of generalisation presented in this paper.
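As a rough illustration of what "consistent with the metric space" could mean for numerical data, the sketch below generalises two numbers to the closed interval between them and checks that the interval covers exactly the points lying between them under the absolute-difference distance. The abstract does not state the paper's conditions, so the betweenness criterion, the interval pattern language, and all function names here are assumptions made for illustration only.

```python
# Illustrative sketch only: the abstract does not spell out the paper's
# conditions. The "betweenness" check below is an assumed example of a
# property linking a generalisation operator to the underlying distance.

def distance(a: float, b: float) -> float:
    """Absolute-difference distance on real numbers."""
    return abs(a - b)

def generalise(a: float, b: float) -> tuple[float, float]:
    """Hypothetical operator: generalise two numbers to a closed interval."""
    return (min(a, b), max(a, b))

def covers(pattern: tuple[float, float], x: float) -> bool:
    """True if x falls inside the interval pattern."""
    lo, hi = pattern
    return lo <= x <= hi

def is_between(a: float, b: float, x: float, tol: float = 1e-9) -> bool:
    """True if x lies on a shortest path from a to b under `distance`."""
    return abs(distance(a, x) + distance(x, b) - distance(a, b)) <= tol

a, b = 2.0, 5.0
pattern = generalise(a, b)
candidates = [1.0, 2.0, 3.5, 5.0, 6.0]

# For this distance, the interval covers a candidate exactly when the
# candidate lies between a and b.
consistent = all(covers(pattern, x) == is_between(a, b, x) for x in candidates)
```

Here the pattern "x is in [2, 5]" plays the role of the more meaningful explanation mentioned in the abstract, while the check ties it back to the distance that a nearest-neighbour style method would use.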

Keywords

Distance-based methods, generalisation operators, lgg, metric space

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • V. Estruch (1)
  • C. Ferri (1)
  • J. Hernández-Orallo (1)
  • M. J. Ramírez-Quintana (1)
  1. DSIC, Univ. Politècnica de València, València, Spain