Abstract
Data stream mining is the process of extracting knowledge from massive real-time sequences of data items arriving at a very high rate. It has several practical applications, such as user behavior analysis, software testing, and market research. However, the large volume of data generated makes it challenging to process and analyze the data in near real time. In this paper, we first present the MFI-TransSW+ algorithm, an optimized version of the MFI-TransSW algorithm that efficiently processes clickstreams, that is, data streams whose data items are the pages of a Web site. Then, we outline the implementation of a news article recommender system, called ClickRec, to demonstrate the efficiency and applicability of the proposed algorithm. Finally, we describe experiments, conducted with real-world data, which show that MFI-TransSW+ outperforms the original algorithm, being up to two orders of magnitude faster when processing clickstreams.
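To make the setting concrete, the following is a minimal sketch of frequent itemset counting over a sliding window of clickstream transactions. It is not the MFI-TransSW+ algorithm, whose details are not given in this abstract; it is a naive baseline under stated assumptions: each transaction is the set of pages visited in one session, the window holds the most recent `window_size` transactions, and an itemset is frequent when it appears in at least a `min_support` fraction of them. The class name, method names, and page paths are all hypothetical.

```python
from collections import deque
from itertools import combinations


class SlidingWindowItemsets:
    """Naive frequent-itemset counter over the most recent transactions.

    Illustrative only: a brute-force baseline, not the MFI-TransSW+ algorithm.
    """

    def __init__(self, window_size, min_support):
        self.window_size = window_size  # number of transactions kept in the window
        self.min_support = min_support  # minimum fraction of window transactions
        self.window = deque()           # recent transactions, oldest first

    def add(self, pages):
        """Append one transaction (pages of one session); evict the oldest if full."""
        self.window.append(frozenset(pages))
        if len(self.window) > self.window_size:
            self.window.popleft()

    def frequent_itemsets(self, max_size=2):
        """Return {itemset: count} for itemsets of up to max_size pages."""
        counts = {}
        for transaction in self.window:
            for k in range(1, max_size + 1):
                # Sort so that each itemset has one canonical tuple form.
                for itemset in combinations(sorted(transaction), k):
                    counts[itemset] = counts.get(itemset, 0) + 1
        threshold = self.min_support * len(self.window)
        return {s: c for s, c in counts.items() if c >= threshold}


# Hypothetical sessions from a news site; the first one slides out of the window.
miner = SlidingWindowItemsets(window_size=3, min_support=0.6)
sessions = [
    {"/home", "/sports"},
    {"/home", "/politics"},
    {"/home", "/sports"},
    {"/home", "/sports", "/tech"},
]
for session in sessions:
    miner.add(session)
result = miner.frequent_itemsets()
```

This baseline recounts every itemset in the window on each query, which is exactly the kind of cost that stream-oriented algorithms aim to avoid.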
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
de Amorim, F.A., Nunes, B.P., Lopes, G.R., Casanova, M.A. (2017). MFI-TransSW+: Efficiently Mining Frequent Itemsets in Clickstreams. In: Bridge, D., Stuckenschmidt, H. (eds) E-Commerce and Web Technologies. EC-Web 2016. Lecture Notes in Business Information Processing, vol 278. Springer, Cham. https://doi.org/10.1007/978-3-319-53676-7_7
DOI: https://doi.org/10.1007/978-3-319-53676-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53675-0
Online ISBN: 978-3-319-53676-7
eBook Packages: Computer Science (R0)