Knowledge and Information Systems, Volume 30, Issue 3, pp 715–738

Multi-objective frequent termset clustering

  • Katharina Morik
  • Andreas Kaspari
  • Michael Wurst
  • Marcin Skirzynski
Regular Paper


Large media collections evolve rapidly on the World Wide Web. In addition to targeted retrieval as performed by search engines, browsing and explorative navigation are important. Since the collections grow fast and authors most often do not annotate their web pages according to a given ontology, automatic structuring is in demand as a prerequisite for any pleasant human–computer interface. In this paper, we investigate the problem of finding alternative high-quality structures for navigation in a large collection of high-dimensional data. We express desired properties of frequent termset (FTS) clustering in terms of objective functions. In general, these functions conflict with one another. This leads to the formulation of FTS clustering as a multi-objective optimization problem. The optimization problem is solved by a genetic algorithm, and the result is a set of Pareto-optimal solutions. Users may choose their preferred type of structure for navigating a collection, or explore the different views given by the different optimal solutions. Using a social bookmarking data set, we explore the capability of the new approach to produce structures that are well suited for browsing.
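To illustrate what a set of Pareto-optimal solutions means when objectives conflict, the following is a minimal sketch and not the method of the paper. It assumes that each candidate clustering has already been scored on two objectives that are both to be maximized; the objective values and the names `dominates` and `pareto_front` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# One score tuple per candidate clustering; every objective is maximized.
# The two objectives here are hypothetical placeholders.
Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if a is at least as good as b on every objective
    and strictly better on at least one."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(candidates: List[Scores]) -> List[Scores]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up scores for five candidate clusterings (objective 1, objective 2).
candidates = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9), (0.2, 0.1)]
front = pareto_front(candidates)
print(front)
```

In this toy input, three candidates remain: each of them is better than the others on at least one objective, so none can be discarded without a preference between the objectives. A user choosing among alternative navigation structures faces exactly this kind of trade-off.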


Keywords: Collaborative clustering, Distributed data mining, Ensemble clustering, Multi-media collections





Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  • Katharina Morik (1), email author
  • Andreas Kaspari (2)
  • Michael Wurst (3)
  • Marcin Skirzynski (1)

  1. Technical University Dortmund, Computer Science VIII, Dortmund, Germany
  2. Duisburg, Germany
  3. Smarter Cities Technology Center, IBM Dublin, Dublin, Ireland
