Machine Learning

, Volume 87, Issue 1, pp 57–92 | Cite as

Similarity measures over refinement graphs

Article

Abstract

Similarity also plays a crucial role in support vector machines. Similarity assessment plays a key role in lazy learning methods such as k-nearest neighbor or case-based reasoning. In this paper we will show how refinement graphs, that were originally introduced for inductive learning, can be employed to assess and reason about similarity. We will define and analyze two similarity measures, Sλ and Sπ, based on refinement graphs. The anti-unification-based similarity, Sλ, assesses similarity by finding the anti-unification of two instances, which is a description capturing all the information common to these two instances. The property-based similarity, Sπ, is based on a process of disintegrating the instances into a set of properties, and then analyzing these property sets. Moreover these similarity measures are applicable to any representation language for which a refinement graph that satisfies the requirements we identify can be defined. Specifically, we present a refinement graph for feature terms, in which several languages of increasing expressiveness can be defined. The similarity measures are empirically evaluated on relational data sets belonging to languages of different expressiveness.

Keywords

Similarity measures Refinement graphs Case-based reasoning Feature terms Lazy learning 

References

  1. Aamodt, A., & Plaza, E. (1994). Case-based reasoning: foundational issues, methodological variations, and system approaches. Artificial Intelligence Communications, 7(1), 39–59. Google Scholar
  2. Aït-Kaci, H. (2007). Description logic vs. order-sorted feature logic. In Description Logics. Google Scholar
  3. Aït-Kaci, H., & Podelski, A. (1992). Towards a meaning of LIFE. Technical Report 11, Digital Research Laboratory. Google Scholar
  4. Aït-Kaci, H., Podelski, A., & Goldstein, S. C. (1993). Order-sorted feature theory unification. In ILPS ’93: Proc. 1993 international symposium on logic programming (pp. 506–524). Cambridge: MIT Press. Google Scholar
  5. Aït-Kaci, H., & Sasaki, Y. (2001). An axiomatic approach to feature term generalization. In EMCL ’01: Proceedings of the 12th European conference on machine learning (pp. 1–12). London: Springer. Google Scholar
  6. Arcos, J. L. (1997). The NOOS representation language. PhD thesis, Universitat Politècnica de Catalunya. Google Scholar
  7. Armengol, E., & Plaza, E. (2003). Relational case-based reasoning for carcinogenic activity prediction. Artificial Intelligence Review, 20(1–2), 121–141. CrossRefGoogle Scholar
  8. Armengol, E., & Plaza, E. (2005). Lazy learning for predictive toxicology based on a chemical ontology. In W. Dubitzky & F. Azuaje (Eds.), Artificial intelligence methods and tools for systems biology (Vol. 5, pp. 1–18). Berlin: Springer. CrossRefGoogle Scholar
  9. Baader, F., Calvanese, D., McGuinness, D. L., Nardi, D., & Patel-Schneider, P. F. (Eds.) (2003). The description logic handbook: theory, implementation, and applications. Cambridge: Cambridge University Press. MATHGoogle Scholar
  10. Bergmann, R., & Stahl, A. (1998). Similarity measures for object-oriented case representations. In Lecture notes in artificial intelligence: Vol. 1488. Advances in case-based reasoning, EWCBR-98 (pp. 8–13). Berlin: Springer. Google Scholar
  11. Bisson, G. (1992). Learing in FOL with a similarity measure. In Proceedings of AAAI 1992 (pp. 82–87). Google Scholar
  12. Börner, K. (1993). Structural similarity as a guidance in case-based design. In Lecture notes in computer science: Vol. 837. Topics in case-based reasoning: EWCBR’93 (pp. 197–208). CrossRefGoogle Scholar
  13. Carpenter, B. (1992). Cambridge tracts in theoretical computer science: Vol. 32. The logic of typed feature structures. Cambridge: Cambridge University Press. MATHCrossRefGoogle Scholar
  14. Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27. MATHCrossRefGoogle Scholar
  15. d’Amato, C., Staab, S., & Fanizzi, N. (2008). On the influence of description logics ontologies on conceptual similarity. In Lecture notes in computer science: Vol. 5268. EKAW ’08, Proc. 16th int. conf. on knowledge engineering: practice and patterns (pp. 48–63). CrossRefGoogle Scholar
  16. Dietterich, T., Domingos, P., Getoor, L., Muggleton, S., & Tadepalli, P. (2008). Structured machine learning: the next ten years. Machine Learning, 3–23. Google Scholar
  17. Emde, W., & Wettschereck, D. (1996). Relational instance-based learning. In L. Saitta (Ed.), Proceedings 13th international conference on machine learning (pp. 122–130). Los Altos: Kaufmann. Google Scholar
  18. Fanizzi, N., D’Amato, C., & Esposito, F. (2007). Induction of optimal semi-distances for individuals based on feature sets. In Proc. 2007 international workshop on description logics. CEUR-WS. Google Scholar
  19. Fanizzi, N., D’Amato, C., & Esposito, F. (2008). DL-FOIL concept learning in description logics. In Lecture notes in computer science: Vol. 5194. ILP ’08: Proceedings of the 18th int. conf. inductive logic programming (pp. 107–121). Berlin: Springer. Google Scholar
  20. González-Calero, P. A., Díaz-Agudo, B., & Gómez-Albarrán, M. (1999). Applying DLs for retrieval in case-based reasoning. In Proceedings of the 1999 description logics workshop (DL’99). Google Scholar
  21. Helma, C., King, R., Kramer, S., & Srinivasan, A. (2001). The predictive toxicology challenge 2000–2001. Bioinformatics, 17, 107–108. CrossRefGoogle Scholar
  22. Hinton, G. E. (1986). Learning distributed representations of concepts. In Proceedings of 8th annual conf. cognitive science society (pp. 1–12). Google Scholar
  23. Horváth, T., Wrobel, S., & Bohnebeck, U. (2001). Relational instance-based learning with lists and terms. Machine Learning, 43(1/2), 53–80. MATHCrossRefGoogle Scholar
  24. Hutchinson, A. (1997). Metrics on terms and clauses. In Lecture notes in computer science: Vol. 1224. ECML ’97: Proceedings of the 9th European conference on machine learning (pp. 138–145). Berlin: Springer. CrossRefGoogle Scholar
  25. Kashima, H., Tsuda, K., & Inokuchi, A. (2003). Marginalized kernels between labeled graphs. In Proceedings of the twentieth international conference (ICML 2003) (pp. 321–328). Menlo Park: AAAI Press. Google Scholar
  26. Larson, J., & Michalski, R. S. (1977). Inductive inference of VL decision rules. SIGART Bulletin, 63, 38–44. CrossRefGoogle Scholar
  27. Lavrač, N., & Džeroski, S. (1994). Inductive logic programming. Techniques and applications. Chichester: Ellis Horwood. MATHGoogle Scholar
  28. Lehmann, J., & Hitzler, P. (2008). Foundations of refinement operators for description logics. In Lecture notes in computer science: Vol. 4894. Proceedings of the 17th international conference on inductive logic programming (ILP) (pp. 161–174). Berlin: Springer. Google Scholar
  29. Lehmann, J., & Hitzler, P. (2008). A refinement operator based learning algorithm for the ALC description logic. In Lecture notes in computer science: Vol. 4894. Proceedings of the 17th international conference on inductive logic programming (ILP) (pp. 147–160). Berlin: Springer. Google Scholar
  30. Michie, D., Muggleton, S., Page, D., & Srinivasan, A. (1994). To the international computing community: a new East–West challenge. Technical report, Oxford University Computing Laboratory, Oxford, UK. Google Scholar
  31. Mitchell, T. (1982). Generalization as search. Artificial Intelligence, 18(2), 203–226. MathSciNetCrossRefGoogle Scholar
  32. Ontañón, S., & Plaza, E. (2009). On similarity measures based on a refinement lattice. In D. Wilson & L. McGinty (Eds.), Lecture notes in computer science: Vol. 5650. Proc. 8th int. conf. on case-based reasoning (pp. 240–255). Berlin: Springer. Google Scholar
  33. Plaza, E. (1995). Cases as terms: a feature term approach to the structured representation of cases. In M. Veloso & A. Aamodt (Eds.), Lecture notes in artificial intelligence: Vol. 1010. Case-based reasoning, ICCBR-95 (pp. 265–276). Berlin: Springer. Google Scholar
  34. Plaza, E., Armengol, E., & Ontañón, S. (2005). The explanatory power of symbolic similarity in case-based reasoning. Artificial Intelligence Review, 24(2), 145–161. MATHCrossRefGoogle Scholar
  35. Plotkin, G. D. (1970). A note on inductive generalization. In Machine intelligence (Vol. 5, pp. 153–163). Edinburgh: Edinburgh University Press. Google Scholar
  36. Pollard, C. J., & Moshier, M. D. (1990). Unifying partial descriptions of sets. In P. P. Hanson (Ed.), Information, language and cognition (Vol. 1, pp. 167–184). Vancouver: University of British Columbia Press. Google Scholar
  37. Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106. Google Scholar
  38. Raymond, J. W., Blankley, C. J., & Willett, P. (2003). Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures. Journal of Molecular Graphics and Modelling, 21(5), 421–433. CrossRefGoogle Scholar
  39. Raymond, J. W., Gardiner, E. J., & Willett, P. (2002). Rascal: calculation of graph similarity using maximum common edge subgraphs. Computer Journal, 45(6), 631–644. MATHCrossRefGoogle Scholar
  40. Sánchez-Ruiz, A. A., Ontañón, S., González-Calero, P. A., & Plaza, E. (2011). Measuring similarity in description logics using refinement operators. In Lecture notes in computers science: Vol. 6880. Proc. 19th int. conf. on case-based reasoning (pp. 289–303). Berlin: Springer. Google Scholar
  41. Shapiro, E. Y. (1981). Inductive inference of theories from facts. Technical report 192, Dept. of Computer Science, Yale University. Google Scholar
  42. Tversky, A. (1977). Features of similarity. In Psychological review (Vol. 84, pp. 327–352). Google Scholar
  43. van der Laag, P. R. J., & Nienhuys-Cheng, S.-H. (1992). Subsumption and refinement in model inference. Discussion paper 7, Erasmus University Rotterdam, Faculty of Economics. Google Scholar
  44. van der Laag, P. R. J., & Nienhuys-Cheng, S.-H. (1994). Existence and nonexistence of complete refinement operators. In ECML-94: Proceedings of the European conference on machine learning (pp. 307–322). New York: Springer. CrossRefGoogle Scholar
  45. Willett, P., Barnard, J. M., & Downs, G. M. (1998). Chemical similarity searching. Journal of Chemical Information and Computer Sciences, 38(6), 983–996. Google Scholar

Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. 1.IIIA, Artificial Intelligence Research Institute, CSICSpanish Council for Scientific Research, Campus UABBellaterraSpain

Personalised recommendations