The VLDB Journal, Volume 12, Issue 2, pp 140–156

PSoup: a system for streaming queries over streaming data

Original Paper

Abstract.

Recent work on querying data streams has focused on systems where newly arriving data is processed and continuously streamed to the user in real time. In many emerging applications, however, ad hoc queries and/or intermittent connectivity also require the processing of data that arrives prior to query submission or during a period of disconnection. For such applications, we have developed PSoup, a system that combines the processing of ad hoc and continuous queries by treating data and queries symmetrically, allowing new queries to be applied to old data and new data to be applied to old queries. PSoup also supports intermittent connectivity by separating the computation of query results from the delivery of those results. PSoup builds on adaptive query-processing techniques developed in the Telegraph project at UC Berkeley. In this paper, we describe PSoup and present experiments that demonstrate the effectiveness of our approach.
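The abstract describes two ideas: treating data and queries symmetrically, and decoupling result computation from result delivery. Both can be illustrated with a small in-memory sketch. It assumes single-attribute closed-range predicates and integer timestamps. `Query`, `SymmetricStreamProcessor`, `add_data`, `add_query`, and `fetch` are hypothetical names chosen for illustration, not PSoup's actual interfaces. The sketch also omits the Telegraph adaptive query-processing machinery that PSoup builds on.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical standing query: the closed range predicate lo <= value <= hi."""
    qid: str
    lo: float
    hi: float

    def matches(self, value: float) -> bool:
        return self.lo <= value <= self.hi


@dataclass
class SymmetricStreamProcessor:
    """Illustrative only: stores data and queries side by side so each can probe the other."""
    data: List[Tuple[int, float]] = field(default_factory=list)  # (timestamp, value)
    queries: Dict[str, Query] = field(default_factory=dict)
    # Materialized results: query id -> matching (timestamp, value) pairs.
    results: Dict[str, List[Tuple[int, float]]] = field(default_factory=dict)

    def add_data(self, ts: int, value: float) -> None:
        # New data is applied to old queries: store it, then probe every stored query.
        self.data.append((ts, value))
        for q in self.queries.values():
            if q.matches(value):
                self.results[q.qid].append((ts, value))

    def add_query(self, q: Query) -> None:
        # New queries are applied to old data: store the query, then scan stored data.
        self.queries[q.qid] = q
        self.results[q.qid] = [(ts, v) for ts, v in self.data if q.matches(v)]

    def fetch(self, qid: str, since: int = 0) -> List[Tuple[int, float]]:
        # Delivery is decoupled from computation: a client that reconnects reads
        # whatever has accumulated (optionally only from `since` on) without
        # re-evaluating the query.
        return [(ts, v) for ts, v in self.results[qid] if ts >= since]


if __name__ == "__main__":
    p = SymmetricStreamProcessor()
    p.add_data(1, 5.0)
    p.add_data(2, 50.0)
    p.add_query(Query("q1", 0.0, 10.0))  # matched against the old tuple at ts=1
    p.add_data(3, 7.5)                   # new tuple matched against q1
    p.add_data(4, 99.0)
    print(p.fetch("q1"))           # [(1, 5.0), (3, 7.5)]
    print(p.fetch("q1", since=2))  # [(3, 7.5)]
```

Both insertion paths perform the same operation in opposite directions: store the new item, then probe the other side. That symmetry is what lets a query submitted late still see earlier data. For simplicity, the sketch probes with linear scans. A practical system would index both the stored data and the stored predicates.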

Keywords: Stream query processing, Query-data duality, Disconnected operation

Copyright information

© Springer-Verlag Berlin/Heidelberg 2003

Authors and Affiliations

  1. Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, USA