Instance Based Clustering of Semantic Web Resources

  • Gunnar AAstrand Grimnes
  • Peter Edwards
  • Alun Preece
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5021)


The original Semantic Web vision was explicit about the need for intelligent autonomous agents that would represent users and help them navigate the Semantic Web. We argue that an essential feature for such agents is the capability to analyse data and learn. In this paper we outline the challenges and issues surrounding the application of clustering algorithms to Semantic Web data. We present several ways of extracting instances from a large RDF graph and of computing the distance between them. We evaluate our approaches on three different data-sets: one representing a typical relational database to RDF conversion, one based on data from an ontologically rich Semantic Web enabled application, and one consisting of a crawl of FOAF documents. We apply both supervised and unsupervised evaluation metrics. Our evaluation did not support choosing a single combination of instance extraction method and similarity metric as superior in all cases; as expected, the behaviour depends greatly on the data being clustered. Instead, we attempt to identify characteristics of data that make particular methods more suitable.


Keywords (added by machine, not by the authors): Cluster Solution; Conceptual Graph; Supervise Evaluation; Blank Node; Datatype Property



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Gunnar AAstrand Grimnes (1)
  • Peter Edwards (2)
  • Alun Preece (3)

  1. Knowledge Management Department, DFKI GmbH, Kaiserslautern, Germany
  2. Computing Science Department, University of Aberdeen, UK
  3. Computer Science Department, Cardiff University, UK
