MFI-TransSW+: Efficiently Mining Frequent Itemsets in Clickstreams

  • Conference paper

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 278)

Abstract

Data stream mining is the process of extracting knowledge from a massive real-time sequence of data items arriving at a very high rate. It has several practical applications, such as user behavior analysis, software testing and market research. However, the large volume of data generated makes it challenging to process and analyze the data in near real time. In this paper, we first present the MFI-TransSW+ algorithm, an optimized version of the MFI-TransSW algorithm that efficiently processes clickstreams, that is, data streams where the data items are the pages of a Web site. Then, we outline the implementation of a news article recommender system, called ClickRec, to demonstrate the efficiency and applicability of the proposed algorithm. Finally, we describe experiments, conducted with real-world data, which show that MFI-TransSW+ outperforms the original algorithm, being up to two orders of magnitude faster when processing clickstreams.
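
To make the problem setting concrete, the following is a minimal sketch of frequent itemset counting over a sliding window of clickstream sessions. It is not the MFI-TransSW+ algorithm, whose details are not given in this abstract; it is a naive baseline that enumerates small itemsets directly. The class name `SlidingWindowItemsets`, its parameters (`window_size`, `min_support`, `max_len`) and the page names are all hypothetical, chosen only for illustration.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowItemsets:
    """Naive itemset counter over the most recent `window_size` transactions.

    Illustrative baseline only (an assumption, not the paper's method).
    """

    def __init__(self, window_size, min_support, max_len=2):
        self.window = deque()            # transactions currently in the window
        self.window_size = window_size   # maximum number of transactions kept
        self.min_support = min_support   # fraction of the window, in (0, 1]
        self.max_len = max_len           # largest itemset size enumerated
        self.counts = Counter()          # itemset -> occurrences in the window

    def _itemsets(self, transaction):
        # Enumerate every itemset of size 1..max_len as a sorted tuple.
        items = sorted(set(transaction))
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        # Slide the window: drop the oldest transaction once the window is full.
        if len(self.window) == self.window_size:
            oldest = self.window.popleft()
            for itemset in self._itemsets(oldest):
                self.counts[itemset] -= 1
                if self.counts[itemset] == 0:
                    del self.counts[itemset]
        self.window.append(transaction)
        for itemset in self._itemsets(transaction):
            self.counts[itemset] += 1

    def frequent(self):
        # An itemset is frequent if it occurs in at least
        # min_support * (current window length) transactions.
        threshold = self.min_support * len(self.window)
        return {s: c for s, c in self.counts.items() if c >= threshold}


if __name__ == "__main__":
    miner = SlidingWindowItemsets(window_size=3, min_support=0.6)
    sessions = [
        ["home", "sports"],
        ["home", "news"],
        ["home", "sports", "news"],
        ["sports"],
    ]
    for session in sessions:
        miner.add(session)
    print(miner.frequent())
```

Each session here is treated as one transaction whose items are the pages visited. The cost of enumerating itemsets per transaction grows quickly with `max_len`, which is the kind of overhead that window-based stream mining algorithms aim to reduce.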

Author information

Corresponding authors

Correspondence to Franklin A. de Amorim or Bernardo Pereira Nunes.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

de Amorim, F.A., Nunes, B.P., Lopes, G.R., Casanova, M.A. (2017). MFI-TransSW+: Efficiently Mining Frequent Itemsets in Clickstreams. In: Bridge, D., Stuckenschmidt, H. (eds) E-Commerce and Web Technologies. EC-Web 2016. Lecture Notes in Business Information Processing, vol 278. Springer, Cham. https://doi.org/10.1007/978-3-319-53676-7_7

  • DOI: https://doi.org/10.1007/978-3-319-53676-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53675-0

  • Online ISBN: 978-3-319-53676-7

  • eBook Packages: Computer Science, Computer Science (R0)
