Frequent Itemset Extraction over Data Streams Using Chernoff Bound

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 199)

Abstract

Mining data streams poses many new challenges, among them the one-scan nature, the unbounded memory requirement and the high arrival rate of data streams. In this paper we present CBSW+, a revised version of the Chernoff Bound based Sliding-Window approach (CBSW), capable of mining frequent itemsets over high-speed data streams. The new method retains the advantages of the earlier CBSW while resolving its drawbacks and reducing runtime memory consumption. In the proposed method we design a synopsis data structure that keeps track of the boundary between the maximum and minimum window-size predictions for itemsets. Concept drifts in a data stream are reflected by movements of this boundary in the data structure.
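
The abstract does not state which form of the Chernoff bound CBSW+ uses or how its synopsis is laid out. As a minimal sketch under those gaps, the Python below assumes the textbook two-sided multiplicative bound P(|X - ns| >= beta*ns) <= 2*exp(-ns*beta^2/3) for 0 < beta < 1, where X is the number of transactions in a window of size n containing an itemset with true support s. It writes the absolute support error as eps = beta*s and inverts the bound to predict how many transactions a window needs. The names chernoff_epsilon, window_size and SlidingWindowCounter are hypothetical. The plain fixed-size window with exact counts, and the s - eps reporting cut-off, stand in for the paper's boundary-tracking synopsis rather than reproduce it.

```python
import math
from collections import Counter, deque
from itertools import combinations


def chernoff_epsilon(n: int, s: float, delta: float) -> float:
    """Absolute support error eps for a window of n transactions.

    Assumes the two-sided multiplicative Chernoff bound
        P(|X - n*s| >= beta*n*s) <= 2*exp(-n*s*beta**2 / 3),  0 < beta < 1,
    with eps = beta*s. Setting the right-hand side to delta and solving gives
    eps = sqrt(3*s*ln(2/delta) / n); the bound only applies when eps < s.
    """
    if n <= 0 or not 0.0 < s <= 1.0 or not 0.0 < delta < 1.0:
        raise ValueError("need n > 0, 0 < s <= 1 and 0 < delta < 1")
    return math.sqrt(3.0 * s * math.log(2.0 / delta) / n)


def window_size(s: float, eps: float, delta: float) -> int:
    """Smallest n with chernoff_epsilon(n, s, delta) <= eps (hypothetical helper)."""
    if not 0.0 < eps < s or not 0.0 < delta < 1.0:
        raise ValueError("need 0 < eps < s and 0 < delta < 1")
    return math.ceil(3.0 * s * math.log(2.0 / delta) / eps ** 2)


class SlidingWindowCounter:
    """Exact itemset counts over the last `size` transactions.

    Hypothetical stand-in for a stream synopsis: it enumerates every itemset
    of up to `max_len` items per transaction, so it only suits short
    transactions and small max_len.
    """

    def __init__(self, size: int, max_len: int = 2):
        self.size = size
        self.max_len = max_len
        self.window = deque()
        self.counts = Counter()

    def _itemsets(self, txn):
        # Canonical (sorted, de-duplicated) itemsets of length 1..max_len.
        items = sorted(set(txn))
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def add(self, txn):
        # Slide the window: count the new transaction, evict the oldest if full.
        self.window.append(txn)
        self.counts.update(self._itemsets(txn))
        if len(self.window) > self.size:
            old = self.window.popleft()
            for iset in self._itemsets(old):
                self.counts[iset] -= 1
                if self.counts[iset] == 0:
                    del self.counts[iset]

    def frequent(self, s: float, eps: float) -> dict:
        # Report itemsets whose window support is at least s - eps; this
        # false-positive-oriented cut-off is an assumption, not CBSW+'s rule.
        n = len(self.window)
        if n == 0:
            return {}
        return {iset: c / n for iset, c in self.counts.items() if c >= (s - eps) * n}


if __name__ == "__main__":
    print(window_size(s=0.5, eps=0.1, delta=0.05))  # 554
    w = SlidingWindowCounter(size=3)
    for txn in (["a", "b"], ["a", "c"], ["a", "b"], ["b", "c"]):
        w.add(txn)
    print(sorted(w.frequent(0.6, 0.0)))  # [('a',), ('b',), ('c',)]
```

With s = 0.5, eps = 0.1 and delta = 0.05 the sketch predicts a window of 554 transactions. Because the required n grows with s and shrinks with eps squared, itemsets near the support threshold need the longest windows, which is one plausible reading of why a synopsis would track per-itemset window-size predictions.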

Keywords

Chernoff Bound; Data Streams; Mining Frequent Itemsets



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Research Department of Computer Science, NGM College, Coimbatore District, India
