Top-k Pattern Matching Using an Information-Theoretic Criterion over Probabilistic Data Streams

  • Conference paper
Web and Big Data (APWeb-WAIM 2017)

Abstract

With the development of data mining technologies for sensor data streams, more sophisticated methods for complex event processing are in demand. Because event recognition results may contain errors, we need to handle the uncertainty of events. We therefore consider probabilistic event data streams, in which each event carries an occurrence probability, and develop a pattern matching method based on regular expressions. In this paper, we first analyze the semantics of pattern matching over non-probabilistic data streams and then formulate the problem of top-k pattern matching over probabilistic data streams. We introduce an information-theoretic criterion for selecting appropriate matches as the result of pattern matching. We then present an efficient algorithm for detecting top-k matches and evaluate the effectiveness of our approach using real and synthetic datasets.
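To make the setting concrete, the following is a minimal sketch of ranking regular-expression matches over a small probabilistic stream. It is not the algorithm from the paper: it enumerates windows by brute force and ranks matches by raw occurrence probability, whereas the paper uses an information-theoretic criterion that the abstract does not spell out. The stream representation (one symbol-to-probability dictionary per time step), the independence between time steps, and the function name `top_k_matches` are all assumptions made for illustration.

```python
import heapq
import itertools
import re
from math import prod


def top_k_matches(stream, pattern, k):
    """Return the k most probable matches as (probability, start, end, text).

    stream  -- list of dicts, one per time step, mapping a single-character
               event symbol to its occurrence probability (assumed format)
    pattern -- regular expression over those event symbols
    k       -- number of matches to return
    """
    regex = re.compile(pattern)
    candidates = []
    n = len(stream)
    for start in range(n):
        for end in range(start + 1, n + 1):
            window = stream[start:end]
            # Enumerate every possible symbol sequence in this window.
            # This is exponential in the window length; illustration only.
            for symbols in itertools.product(*(step.keys() for step in window)):
                text = "".join(symbols)
                if regex.fullmatch(text):
                    # Assumption: time steps are independent, so the
                    # probability of a sequence is the product of its parts.
                    p = prod(step[s] for step, s in zip(window, symbols))
                    candidates.append((p, start, end, text))
    return heapq.nlargest(k, candidates)


# Hypothetical three-step stream over the symbols a, b, c.
example_stream = [
    {"a": 0.9, "b": 0.1},
    {"b": 0.6, "c": 0.4},
    {"c": 0.8, "a": 0.2},
]
for prob, start, end, text in top_k_matches(example_stream, "ab*c", 2):
    print(f"{text!r} at [{start}, {end}) with probability {prob:.3f}")
```

Ranking by probability alone tends to favor short matches, since every additional time step multiplies in another factor of at most 1. That bias is one reason a different selection criterion can be preferable.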

Acknowledgment

This research was partially supported by the Center of Innovation Program from Japan Science and Technology Agency (JST) and KAKENHI (16H01722, 26540043).

Author information

Corresponding author

Correspondence to Kento Sugiura.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Sugiura, K., Ishikawa, Y. (2017). Top-k Pattern Matching Using an Information-Theoretic Criterion over Probabilistic Data Streams. In: Chen, L., Jensen, C., Shahabi, C., Yang, X., Lian, X. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science, vol 10366. Springer, Cham. https://doi.org/10.1007/978-3-319-63579-8_39

  • DOI: https://doi.org/10.1007/978-3-319-63579-8_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63578-1

  • Online ISBN: 978-3-319-63579-8

  • eBook Packages: Computer Science, Computer Science (R0)
