Adaptive Self-Sufficient Itemset Miner for Transactional Data Streams
Abstract
Most studies on pattern mining treat itemsets with a high frequency of occurrence as useful, typically measured by their support. However, research has shown that pattern mining needs to move beyond a pure “support-confidence” framework. Recently there has been growing interest in finding statistically significant patterns, and one of the most popular types is the self-sufficient itemset: an itemset whose frequency is significantly different from the frequency of its subsets and supersets. One limitation of existing work on these patterns is that it does not account for concept drift and so cannot be used on a data stream. Learning in an online environment requires efficient and effective mechanisms that handle non-static data and non-stationary data distributions. In this work we concentrate on detecting self-sufficient itemsets in data streams. We present a comprehensive framework for mining self-sufficient itemsets from data streams, together with a drift detector. The framework supports mining self-sufficient itemsets in an online environment and can adapt to changes in the stream. Our experimental evaluations show that the framework mines self-sufficient itemsets faster in an online environment, and with better precision and recall.
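As a rough illustration of what "a frequency significantly different from the frequency of its subsets" can mean, the sketch below tests whether the items of an itemset co-occur more often than their subsets would suggest under independence. It is a simplified check written for this summary, not the framework described in the paper: it uses a one-sided Fisher exact test on every two-way split of the itemset, covers only the subset side of the comparison (not supersets), and does no stream processing or drift detection. The function names, the threshold `alpha`, and the toy transactions are all assumptions made for the example.

```python
from itertools import combinations
from math import comb


def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t)


def fisher_p_value(n, n_x, n_y, n_xy):
    """One-sided Fisher exact test (hypergeometric tail).

    Probability of observing at least n_xy co-occurrences of X and Y
    among n transactions if X (n_x occurrences) and Y (n_y occurrences)
    were independent.
    """
    upper = min(n_x, n_y)
    tail = sum(
        comb(n_x, k) * comb(n - n_x, n_y - k)
        for k in range(n_xy, upper + 1)
    )
    return tail / comb(n, n_y)


def is_productive(transactions, itemset, alpha=0.05):
    """True if every split of `itemset` into two non-empty parts
    co-occurs significantly more often than independence predicts.

    Hypothetical helper for illustration; `alpha` is an assumed
    significance level, with no multiple-testing correction.
    """
    items = sorted(itemset)
    if len(items) < 2:
        raise ValueError("need at least two items to compare with subsets")
    n = len(transactions)
    n_xy = count(transactions, items)
    for r in range(1, len(items)):
        for left in combinations(items, r):
            right = [i for i in items if i not in left]
            p = fisher_p_value(
                n, count(transactions, left), count(transactions, right), n_xy
            )
            if p > alpha:
                return False
    return True


# Toy data (made up): "a" and "b" always appear together, "c" appears alone.
transactions = [frozenset("ab")] * 5 + [frozenset("c")] * 5

ab_is_productive = is_productive(transactions, {"a", "b"})
ac_is_productive = is_productive(transactions, {"a", "c"})
```

In this toy data `{"a", "b"}` passes the check because the two items co-occur in every transaction where either appears, while `{"a", "c"}` fails because the items never co-occur. A streaming miner would have to maintain such counts incrementally and re-evaluate them when a drift detector signals a change, which this sketch does not attempt.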
Keywords
Data stream mining, Batch processing, Self-sufficient itemsets, Association rule mining, Drift detection