
ARGUS: Rete + DBMS = Efficient Persistent Profile Matching on Large-Volume Data Streams

  • Chun Jin
  • Jaime Carbonell
  • Phil Hayes
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3488)

Abstract

Efficient processing of complex streaming data presents multiple challenges, especially when combined with intelligent detection of hidden anomalies in real time. We label such systems Stream Anomaly Monitoring Systems (SAMS), and describe the CMU/Dynamix ARGUS system as a new kind of SAMS to detect rare but high-value patterns combining streaming and historical data. Such patterns may correspond to hidden precursors of terrorist activity, or early indicators of the onset of a dangerous disease, such as a SARS outbreak. Our method starts from an extension of the RETE algorithm for matching streaming data against multiple complex persistent queries, and goes beyond it with transitivity inference, conditional materialization of intermediate results, and other techniques to obtain both accuracy and efficiency, as demonstrated by evaluation results in which it outperforms classical techniques such as a modern DBMS.
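To make the Rete-style idea concrete, the following is a minimal sketch, not the ARGUS implementation: each arriving stream tuple is first tested against the query's selection predicates, survivors are stored in memories indexed on the join key, and the new tuple only probes those stored partial results instead of re-running the whole join over historical data. The persistent query (pairs of large transfers where the receiver of the first is the sender of the second within a time window), and the names `Transfer`, `IncrementalChainQuery`, `min_amount`, and `window`, are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    # Hypothetical stream tuple schema, assumed for illustration only.
    tid: int
    sender: str
    receiver: str
    amount: float
    day: int


class IncrementalChainQuery:
    """Hypothetical persistent query evaluated incrementally, Rete-style.

    Reports ordered pairs (r1, r2) where r1.receiver == r2.sender, both
    amounts exceed min_amount, and r2 occurs 0..window days after r1.
    """

    def __init__(self, min_amount, window):
        self.min_amount = min_amount
        self.window = window
        # Materialized memories of tuples that passed the selection test,
        # indexed on the join attribute each side is probed by.
        self.left_by_receiver = defaultdict(list)  # candidates for r1
        self.right_by_sender = defaultdict(list)   # candidates for r2

    def insert(self, t):
        """Process one new stream tuple; return newly completed matches."""
        # Selection test: tuples that fail it are never stored or joined.
        if t.amount <= self.min_amount:
            return []
        matches = []
        # New tuple as r2: probe stored r1 candidates with receiver == t.sender.
        for r1 in self.left_by_receiver[t.sender]:
            if 0 <= t.day - r1.day <= self.window:
                matches.append((r1.tid, t.tid))
        # New tuple as r1: probe stored r2 candidates with sender == t.receiver.
        for r2 in self.right_by_sender[t.receiver]:
            if 0 <= r2.day - t.day <= self.window:
                matches.append((t.tid, r2.tid))
        # Store the tuple afterwards so it never joins with itself.
        self.left_by_receiver[t.receiver].append(t)
        self.right_by_sender[t.sender].append(t)
        return matches


if __name__ == "__main__":
    q = IncrementalChainQuery(min_amount=1000, window=20)
    print(q.insert(Transfer(1, "A", "B", 5000, 1)))   # []
    print(q.insert(Transfer(2, "B", "C", 4000, 10)))  # [(1, 2)]
    print(q.insert(Transfer(3, "B", "D", 500, 12)))   # [] (fails selection)
```

Each matching pair is reported exactly once, when the later-arriving of its two tuples is inserted. For brevity the sketch never expires tuples that fall outside the window; a long-running stream system would need to evict them to keep the memories bounded.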

Keywords

High Data Rate, Incremental Evaluation, Continuous Query, Transitivity Inference, Computation Sharing



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Chun Jin (1)
  • Jaime Carbonell (1)
  • Phil Hayes (2)

  1. Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
  2. Dynamix Technologies, Wexford, USA
