Data Mining and Knowledge Discovery, Volume 15, Issue 3, pp 349–381

Tree-Traversing Ant Algorithm for term clustering based on featureless similarities

  • Wilson Wong
  • Wei Liu
  • Mohammed Bennamoun
Article

Abstract

Many conventional methods for concept formation in ontology learning rely on predefined templates and rules, and on static resources such as WordNet. Such approaches are not scalable, are difficult to port between domains, and cannot handle knowledge fluctuations. Their results are also far from desirable. In this paper, we propose a new ant-based clustering algorithm, Tree-Traversing Ant (TTA), for concept formation as part of an ontology learning system. Using Normalized Google Distance (NGD) and n° of Wikipedia (n°W) as measures of similarity and distance between terms, we aim for an adaptable clustering method that is highly scalable and portable across domains. Evaluations with seven datasets show promising results, with an average lexical overlap of 97% and an ontological improvement of 48%. The evaluations also demonstrate several advantages that are not simultaneously present in standard ant-based and other conventional clustering methods.
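
As a rough illustration of the kind of featureless measure mentioned above, the following sketch computes NGD from search-engine page counts using the standard formula published by Cilibrasi and Vitanyi. The function name `ngd`, the page counts, and the index size `n` are hypothetical placeholders; this is not the authors' implementation, and it does not cover n°W or the TTA algorithm itself.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from page counts (illustrative sketch).

    fx, fy : number of pages containing term x, term y (assumed inputs)
    fxy    : number of pages containing both terms
    n      : assumed total number of indexed pages
    """
    # With no hits or no co-occurrence, treat the terms as maximally distant.
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator


# Hypothetical counts: terms that always co-occur have distance 0.
print(ngd(1000, 1000, 1000, 10**6))
```

Smaller values indicate terms that co-occur more often relative to their individual frequencies; the measure needs only page counts, not hand-built feature vectors.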

Keywords

Ontology learning, Text mining, Term clustering, Concept discovery, Cluster analysis, Featureless similarity measures

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. School of Computer Science and Software Engineering, University of Western Australia, Crawley, Australia
