Adaptive Self-Sufficient Itemset Miner for Transactional Data Streams

  • Feiyang Tang
  • David Tse Jung Huang
  • Yun Sing Koh
  • Philippe Fournier-Viger
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11671)

Abstract

Most studies on pattern mining treat itemsets with a high frequency of occurrence as useful, where usefulness is typically determined by the support of the itemsets. However, recent research has shown that we need to move beyond a pure “support-confidence” framework for pattern mining. There has recently been growing interest in finding statistically significant patterns, and one of the most popular types of such patterns is the self-sufficient itemset. These patterns have a frequency that is significantly different from the frequency of their subsets and supersets. One limitation of existing work on self-sufficient itemsets is that it does not consider concept drift and cannot be applied to a data stream. Learning in an online environment requires efficient and effective mechanisms for handling non-static data and non-stationary data distributions. In this research we concentrate on detecting self-sufficient itemsets from data streams. We present a comprehensive framework for mining self-sufficient itemsets from data streams, together with a drift detector. The framework supports mining self-sufficient itemsets in an online environment and can adapt to changes in the stream. Our experimental evaluations show that the framework can mine self-sufficient itemsets faster in an online environment, and with better precision and recall.

Keywords

  • Data stream mining
  • Batch processing
  • Self-sufficient itemsets
  • Association rule mining
  • Drift detection

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Computer Science, The University of Auckland, Auckland, New Zealand
  2. School of Humanities and Social Sciences, Harbin Institute of Technology (Shenzhen), Shenzhen, China
