Abstract
A discriminative itemset is an itemset that is frequent in the target data stream and has a much higher frequency there than in the rest of the data streams in the dataset. Discriminative itemsets therefore describe the distinguishing features between data streams. Mining them is important in data streams, where transactions arrive continuously at high speed and in large volume. Compared with frequent itemset mining in a single data stream, discriminative itemset mining poses additional challenges because the Apriori subset property is not applicable. We propose an efficient and highly accurate method for mining discriminative itemsets in data streams using the tilted-time window model. The proposed single-pass H-DISSparse algorithm is designed around several well-defined characteristics that aim to improve the approximate frequencies of the itemsets in the tilted-time window model. The data structures are dynamically adjusted in offline time intervals to reflect the discriminative itemset frequencies over different time periods in unsynchronized data streams. Empirical analysis shows that the proposed method is efficient in time and space on fast-growing big data streams.
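The definition above, and the reason the Apriori subset property fails, can be illustrated with a small brute-force sketch. This is not the H-DISSparse algorithm: it is a batch computation over two tiny in-memory transaction lists, and the function names (`itemset_counts`, `discriminative_itemsets`) and the thresholds (`min_support`, `min_ratio`, `max_size`) are hypothetical choices made only for illustration. The relative-frequency ratio used here is one plausible reading of "much higher frequency".

```python
from collections import Counter
from itertools import combinations


def itemset_counts(transactions, max_size=2):
    """Count every itemset of up to max_size items (brute force, illustration only)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def discriminative_itemsets(target, general, min_support=0.5, min_ratio=3.0, max_size=2):
    """Return itemsets that are frequent in `target` and at least min_ratio times
    more frequent (relative to stream size) than in `general`.

    min_support and min_ratio are assumed thresholds, not values from the paper.
    """
    target_counts = itemset_counts(target, max_size)
    general_counts = itemset_counts(general, max_size)
    result = {}
    for itemset, count in target_counts.items():
        target_freq = count / len(target)
        if target_freq < min_support:
            continue  # not frequent in the target stream
        general_freq = general_counts.get(itemset, 0) / len(general)
        ratio = float("inf") if general_freq == 0 else target_freq / general_freq
        if ratio >= min_ratio:
            result[itemset] = ratio
    return result


# Toy data: "a" and "b" are common in both streams, but they co-occur
# mostly in the target stream.
target = [["a", "b"], ["a", "b"], ["a", "b"], ["a", "c"]]
general = [["a"], ["a"], ["a"], ["b"], ["b"], ["b"], ["a", "b"], ["c"]]

result = discriminative_itemsets(target, general)
print(result)
```

In this toy data the pair `("a", "b")` passes the ratio test while neither `("a",)` nor `("b",)` does, so pruning an itemset because its subsets are not discriminative would miss it. This is the sense in which the Apriori subset property does not carry over from frequent to discriminative itemsets.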
Cite this article
Seyfi, M., Nayak, R., Xu, Y. et al. Mining discriminative itemsets in data streams using the tilted-time window model. Knowl Inf Syst 63, 1241–1270 (2021). https://doi.org/10.1007/s10115-021-01550-y