Conceptual Clustering of Multi-Relational Data

  • Nuno A. Fonseca
  • Vítor Santos Costa
  • Rui Camacho
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7207)

Abstract

“Traditional” clustering, in a broad sense, aims at organizing objects into groups (clusters) whose members are “similar” to one another and “dissimilar” to objects belonging to other groups. In contrast, in conceptual clustering, cluster formation is driven by the underlying structure of the data together with the description language available to the learner. This yields intelligible descriptions of the clusters and makes them easier to interpret.

We present a novel conceptual clustering system for multi-relational data, based on the popular k-medoids algorithm. Although clustering is generally not straightforward to evaluate, experiments on several applications show promising results. Clusters generated without class information agree very well with the true class labels of the clusters' members. Moreover, it was possible to obtain intelligible and meaningful descriptions of the clusters.
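To make the k-medoids idea concrete, here is a minimal sketch, not the authors' system. It assumes each relational example has already been reduced to a set of features (for instance, the set of candidate clauses it satisfies); the feature names below (`ring`, `nitro`, `chain`, ...) and the helper names `jaccard_distance`, `k_medoids` and `purity` are hypothetical. The Jaccard (Tanimoto) distance is a stand-in for whatever relational distance the system actually uses, the loop is the simple "assign, then re-pick medoids" variant of k-medoids rather than the full PAM swap search, and purity is just one way to measure how well clusters built without class information agree with the true labels.

```python
import random
from collections import Counter
from typing import FrozenSet, List, Sequence, Tuple


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard/Tanimoto distance 1 - |a & b| / |a | b|; 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def _assign(dist: List[List[float]], medoids: List[int]) -> List[int]:
    """Assign every item to the index of its closest medoid (ties go to the lowest index)."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(items: Sequence[FrozenSet[str]], k: int,
              max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids; returns (medoid indices, cluster label per item)."""
    n = len(items)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")
    # Precompute all pairwise distances (O(n^2) time and memory).
    dist = [[jaccard_distance(items[i], items[j]) for j in range(n)] for i in range(n)]
    medoids = random.Random(seed).sample(range(n), k)
    for _ in range(max_iter):
        labels = _assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # an empty cluster keeps its previous medoid
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    # Final assignment so that labels always match the returned medoids.
    return medoids, _assign(dist, medoids)


def purity(labels: Sequence[int], classes: Sequence[str]) -> float:
    """Fraction of items whose class is the majority class of their cluster."""
    per_cluster: dict = {}
    for lab, cls in zip(labels, classes):
        per_cluster.setdefault(lab, Counter())[cls] += 1
    return sum(c.most_common(1)[0][1] for c in per_cluster.values()) / len(labels)


# Toy, made-up data: two groups of examples described by hypothetical features.
items = [
    frozenset({"ring", "nitro", "halogen"}),
    frozenset({"ring", "nitro"}),
    frozenset({"ring", "nitro", "amine"}),
    frozenset({"chain", "alcohol"}),
    frozenset({"chain", "alcohol", "ester"}),
    frozenset({"chain", "ester"}),
]
classes = ["active"] * 3 + ["inactive"] * 3

medoids, labels = k_medoids(items, k=2, seed=0)
print(purity(labels, classes))  # prints 1.0
```

Class labels are used only after clustering, to score agreement, mirroring the evaluation idea described above. Medoids (actual examples) are used instead of means because averaging sets of relational features has no natural meaning, while any distance function suffices to pick a representative member.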

Keywords

Inductive Logic Programming · Conceptual Cluster · Subgroup Discovery · Inductive Logic Programming System · Bottom Clause



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Nuno A. Fonseca (1, 2)
  • Vítor Santos Costa (1, 3)
  • Rui Camacho (4)
  1. CRACS-INESC Porto LA, Universidade do Porto, Porto, Portugal
  2. EMBL Outstation, The European Bioinformatics Institute (EBI), Cambridge, UK
  3. DCC-FCUP, Universidade do Porto, Porto, Portugal
  4. LIAAD-INESC Porto LA & DEI-FEUP, Universidade do Porto, Porto, Portugal
