Joint European Conference on Machine Learning and Knowledge Discovery in Databases

ECML PKDD 2013: Machine Learning and Knowledge Discovery in Databases, pp 493–508

Fast and Exact Mining of Probabilistic Data Streams

  • Reza Akbarinia &
  • Florent Masseglia
Part of the Lecture Notes in Computer Science book series (LNAI, volume 8188)

Abstract

Discovering Probabilistic Frequent Itemsets (PFI) is very challenging, since algorithms designed for deterministic data are not applicable to probabilistic data. The problem is even more difficult for probabilistic data streams, where massive, frequent updates must be taken into account while respecting data stream constraints. In this paper, we propose FEMP (Fast and Exact Mining of Probabilistic data streams), the first solution for exact PFI mining in data streams with sliding windows. FEMP allows updating the frequentness probability of an itemset whenever a transaction is added to or removed from the observation window. Using these update operations, we are able to extract PFI in sliding windows with very low response times. Furthermore, our method is exact, meaning that we are able to discover the exact probabilistic frequentness distribution function for any monitored itemset, at any time. We implemented FEMP and conducted an extensive experimental evaluation over synthetic and real-world data sets; the results illustrate its very good performance.
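
To make the idea of updating a frequentness probability under window insertions and deletions concrete, here is a minimal Python sketch. It is not the FEMP algorithm, whose details are in the full chapter text rather than on this page. It assumes the common model in which each transaction contains the monitored itemset independently with a known probability, so the support follows a Poisson binomial distribution. The class name `SlidingSupport` and its methods are hypothetical names chosen for illustration.

```python
from collections import deque


class SlidingSupport:
    """Support distribution of ONE itemset over a sliding window.

    Illustrative only: assumes independent transactions, each containing
    the itemset with a known probability p. Not the paper's algorithm.
    """

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self.window_size = window_size
        self.probs = deque()  # per-transaction probabilities, oldest first
        self.dist = [1.0]     # dist[k] = Pr(support == k); empty window => support 0

    def _add(self, p):
        # Convolve the current distribution with a Bernoulli(p) variable.
        old = self.dist
        new = [0.0] * (len(old) + 1)
        for k, mass in enumerate(old):
            new[k] += mass * (1.0 - p)  # transaction does not contain the itemset
            new[k + 1] += mass * p      # transaction contains the itemset
        self.dist = new

    def _remove(self, p):
        # Invert the convolution above (deconvolution by Bernoulli(p)).
        new = self.dist
        n = len(new) - 1
        old = [0.0] * n
        if p <= 0.5:
            # Forward recursion: divides by (1 - p), which is >= 0.5 here.
            prev = 0.0
            for k in range(n):
                prev = (new[k] - p * prev) / (1.0 - p)
                old[k] = prev
        else:
            # Backward recursion: divides by p, which is > 0.5 here.
            nxt = 0.0
            for k in range(n, 0, -1):
                nxt = (new[k] - (1.0 - p) * nxt) / p
                old[k - 1] = nxt
        self.dist = old

    def push(self, p):
        """Slide the window: evict the oldest transaction if full, then add p."""
        if len(self.probs) == self.window_size:
            self._remove(self.probs.popleft())
        self.probs.append(p)
        self._add(p)

    def frequentness(self, min_sup):
        """Pr(support >= min_sup) for the current window."""
        return sum(self.dist[min_sup:])
```

Each update costs time linear in the window size, instead of recomputing the distribution from scratch. For instance, pushing the probabilities 0.5, 0.8, 0.4 and 0.9 into a window of size 3 leaves the window holding 0.8, 0.4 and 0.9, and `frequentness(2)` then evaluates to approximately 0.824. Choosing the recursion direction by the value of `p` is a design choice of this sketch to limit floating-point error in the removal step.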

Keywords

  • Probabilistic Data Streams
  • Probabilistic Frequent Itemsets
  • Sliding Windows

Author information

Authors and Affiliations

  1. Zenith team (INRIA-UM2), LIRMM, Montpellier, France

    Reza Akbarinia & Florent Masseglia

Editor information

Editors and Affiliations

  1. Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001, Leuven, Belgium

    Hendrik Blockeel

  2. Fraunhofer IAIS, Department of Knowledge Discovery, University of Bonn, Schloss Birlinghoven, 53754, Sankt Augustin, Germany

    Kristian Kersting

  3. LIACS, Universiteit Leiden, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands

    Siegfried Nijssen

  4. Department of Computer Science and Engineering, Czech Technical University, Technicka 2, 16627, Prague 6, Czech Republic

    Filip Železný

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Akbarinia, R., Masseglia, F. (2013). Fast and Exact Mining of Probabilistic Data Streams. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013. Lecture Notes in Computer Science, vol 8188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40988-2_32

  • DOI: https://doi.org/10.1007/978-3-642-40988-2_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40987-5

  • Online ISBN: 978-3-642-40988-2

  • eBook Packages: Computer Science, Computer Science (R0)
