Bisociative Knowledge Discovery

Volume 7250 of the series Lecture Notes in Computer Science pp 104-121

Open Access: This content is freely available online to anyone, anywhere, at any time.

Cover Similarity Based Item Set Mining

  • Marc Segond, European Centre for Soft Computing
  • Christian Borgelt, European Centre for Soft Computing

Abstract

In standard frequent item set mining, one tries to find item sets whose support exceeds a user-specified threshold (minimum support) in a database of transactions. We instead strive to find item sets for which the similarity of the covers of the items (that is, the sets of transactions containing the items) exceeds a user-defined threshold. This approach yields a much better assessment of the association strength of the items because it takes additional information about their occurrences into account. Starting from the generalized Jaccard index, we extend our approach to a total of twelve specific similarity measures and a generalized form. In addition, standard frequent item set mining turns out to be a special case of this flexible framework. We present an efficient mining algorithm that is inspired by the well-known Eclat algorithm and its improvements. By reporting experiments on several benchmark data sets, we demonstrate that the runtime penalty incurred by the more complex (but also more informative) item set assessment is bearable and that the approach yields high-quality and more useful item sets.
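
To make the notions of cover, support, and cover similarity concrete, the following is a minimal Python sketch, not the authors' mining algorithm. It assumes the usual reading of the generalized Jaccard index for an item set: the number of transactions containing all of its items, divided by the number of transactions containing at least one of them. The function names (`item_covers`, `support`, `jaccard`) and the toy transaction database are hypothetical and chosen only for illustration.

```python
def item_covers(transactions):
    """Map each item to its cover: the set of ids of transactions containing it."""
    covers = {}
    for tid, items in enumerate(transactions):
        for item in items:
            covers.setdefault(item, set()).add(tid)
    return covers


def support(item_set, covers):
    """Number of transactions that contain every item of a non-empty item set."""
    return len(set.intersection(*(covers[i] for i in item_set)))


def jaccard(item_set, covers):
    """Generalized Jaccard index (assumed form): |intersection| / |union| of covers."""
    cover_list = [covers[i] for i in item_set]
    union = set.union(*cover_list)
    if not union:
        return 0.0
    return len(set.intersection(*cover_list)) / len(union)


# Hypothetical toy database of five transactions over the items a, b, c.
transactions = [
    {"a", "b"},
    {"a", "b", "c"},
    {"a"},
    {"c"},
    {"b", "c"},
]
covers = item_covers(transactions)

for candidate in (["a", "b"], ["a", "b", "c"]):
    print(candidate, support(candidate, covers), jaccard(candidate, covers))
```

In this toy database the item set {a, b} has support 2 but cover similarity 2/4, because two further transactions contain only one of the two items. A support threshold alone would not see those two transactions, which is the additional occurrence information the abstract refers to.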