TopPI: An Efficient Algorithm for Item-Centric Mining

  • Martin Kirchgessner
  • Vincent Leroy
  • Alexandre Termier
  • Sihem Amer-Yahia
  • Marie-Christine Rousset
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9829)

Abstract

We introduce TopPI, a new semantics and algorithm designed to mine long-tailed datasets. For each item, regardless of its frequency, TopPI finds the k most frequent closed itemsets that the item belongs to. For example, in our retail dataset, TopPI finds the itemset “nori seaweed, wasabi, sushi rice, soy sauce”, which occurs in only 133 store receipts out of 290 million. It also finds the itemset “milk, puff pastry”, which appears 152,991 times. Thanks to dynamic threshold adjustment and an adequate pruning strategy, TopPI efficiently traverses the relevant parts of the search space and can be parallelized on multi-core machines. Our experiments on datasets with different characteristics show that TopPI performs well and outperforms state-of-the-art mining algorithms. We also show experimentally, on real datasets, that TopPI allows the analyst to explore and discover valuable itemsets.
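To make the item-centric semantics concrete, the following is a minimal brute-force sketch in Python. It is not the TopPI algorithm: it has no dynamic threshold, no pruning, and no parallelism, and it enumerates candidates exhaustively, so it is only usable on toy data. It relies on the standard definition of a closed itemset (an itemset with no superset of equal support). The function names and the small `receipts` dataset are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def closure(itemset, transactions):
    """Return (closed itemset, support) for a non-empty cover of `itemset`.

    The closure is the intersection of all transactions that contain
    `itemset`; its support is the number of such transactions.
    """
    covering = [t for t in transactions if itemset <= t]
    return frozenset.intersection(*covering), len(covering)


def item_centric_top_k(transactions, k):
    """For each item, return its k most frequent closed itemsets.

    Brute force (illustrative assumption, not the paper's method):
    close every non-empty subset of every transaction.
    """
    transactions = [frozenset(t) for t in transactions]

    # Map each closed itemset to its support.
    closed_support = {}
    for t in transactions:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                closed, support = closure(frozenset(combo), transactions)
                closed_support[closed] = support

    # Attach each closed itemset to every item it contains.
    per_item = defaultdict(list)
    for closed, support in closed_support.items():
        for item in closed:
            per_item[item].append((support, closed))

    # Keep the k most frequent per item; ties broken alphabetically.
    result = {}
    for item, candidates in per_item.items():
        candidates.sort(key=lambda c: (-c[0], sorted(c[1])))
        result[item] = candidates[:k]
    return result


# Hypothetical toy receipts: "milk" is frequent, the sushi items are rare.
receipts = [
    {"milk", "puff pastry"},
    {"milk", "puff pastry", "eggs"},
    {"milk", "eggs"},
    {"milk"},
    {"nori", "wasabi", "sushi rice"},
    {"nori", "wasabi", "sushi rice", "milk"},
]

top = item_centric_top_k(receipts, k=2)
for item in sorted(top):
    for support, itemset in top[item]:
        print(item, support, sorted(itemset))
```

Because results are collected per item, the rare item "nori" still receives its own top-k list even though its best itemset is far less frequent than those of "milk". A single global frequency threshold low enough to reach such items would also return a very large number of itemsets for the frequent ones.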

Keywords

Frequent itemset mining, Top-K, Parallel data mining

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Martin Kirchgessner (1)
  • Vincent Leroy (1)
  • Alexandre Termier (2)
  • Sihem Amer-Yahia (1)
  • Marie-Christine Rousset (1)

  1. Université Grenoble Alpes, LIG, CNRS, Grenoble, France
  2. Université Rennes 1, INRIA/IRISA, Rennes, France
