PSoup: a system for streaming queries over streaming data

  • Original Paper
The VLDB Journal

Abstract.

Recent work on querying data streams has focused on systems where newly arriving data is processed and continuously streamed to the user in real time. In many emerging applications, however, ad hoc queries and/or intermittent connectivity also require the processing of data that arrives prior to query submission or during a period of disconnection. For such applications, we have developed PSoup, a system that combines the processing of ad hoc and continuous queries by treating data and queries symmetrically, allowing new queries to be applied to old data and new data to be applied to old queries. PSoup also supports intermittent connectivity by separating the computation of query results from the delivery of those results. PSoup builds on adaptive query-processing techniques developed in the Telegraph project at UC Berkeley. In this paper, we describe PSoup and present experiments that demonstrate the effectiveness of our approach.
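
The symmetric treatment of data and queries described above can be illustrated with a minimal sketch. This is not PSoup's implementation: the class `SymmetricEngine`, its methods, and the dictionary-based tuples are hypothetical names chosen for illustration, and the sketch assumes simple selection predicates with no windows, joins, or adaptive routing. It shows only the two directions of matching (a new query against stored data, a new tuple against stored queries) and the separation of result computation from result delivery.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A query is modeled here as a plain predicate over a tuple (an assumption).
Predicate = Callable[[dict], bool]


@dataclass
class SymmetricEngine:
    """Toy engine (hypothetical): stores both tuples and queries, matches both ways."""
    data: List[dict] = field(default_factory=list)
    queries: Dict[str, Predicate] = field(default_factory=dict)
    # Materialized matches per query id, kept as indexes into self.data.
    results: Dict[str, List[int]] = field(default_factory=dict)

    def add_query(self, qid: str, pred: Predicate) -> None:
        # New query applied to old data.
        self.queries[qid] = pred
        self.results[qid] = [i for i, t in enumerate(self.data) if pred(t)]

    def add_tuple(self, t: dict) -> None:
        # New data applied to old queries.
        idx = len(self.data)
        self.data.append(t)
        for qid, pred in self.queries.items():
            if pred(t):
                self.results[qid].append(idx)

    def fetch(self, qid: str) -> List[dict]:
        # Delivery is decoupled from computation: a client that was
        # disconnected simply reads the matches materialized so far.
        return [self.data[i] for i in self.results[qid]]


engine = SymmetricEngine()
engine.add_tuple({"sensor": "a", "temp": 18})
engine.add_tuple({"sensor": "b", "temp": 31})      # arrives before the query
engine.add_query("hot", lambda t: t["temp"] > 30)  # matched against old data
engine.add_tuple({"sensor": "c", "temp": 35})      # matched against old query
hot_sensors = [t["sensor"] for t in engine.fetch("hot")]  # ["b", "c"]
```

In this sketch the query `hot` sees both the tuple that arrived before it was registered and the one that arrived afterward, and `fetch` can be called at any later time without recomputing anything.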

Author information

Corresponding author

Correspondence to Sirish Chandrasekaran.

Additional information

Received: 17 September 2002, Revised: 18 February 2003, Published online: 10 July 2003

Edited by R. Ramakrishnan

This work has been supported in part by the National Science Foundation under ITR grants IIS0086057 and SI0122599, and by IBM, Microsoft, Siemens, and the UC MICRO program.

About this article

Cite this article

Chandrasekaran, S., Franklin, M.J. PSoup: a system for streaming queries over streaming data. VLDB 12, 140–156 (2003). https://doi.org/10.1007/s00778-003-0096-y

  • DOI: https://doi.org/10.1007/s00778-003-0096-y
