Architecting a Network Query Engine for Producing Partial Results
The growth of the Internet has made it possible to query data in all corners of the globe. This trend is being abetted by the emergence of standards for data representation, such as XML. In the face of this exciting opportunity, however, existing query engines must be changed before they can query the Internet effectively. One of the challenges is providing partial results of query computation, based on the initial portion of the input, because it may be undesirable to wait for all of the input. This situation arises from (a) limited data transfer bandwidth, (b) temporary unavailability of sites, and (c) intrinsically long-running queries (e.g., continual queries or triggers). A major issue in providing partial results is dealing with non-monotonic operators, such as sort, average, negation, and nest, because these operators need to see all of their input before they can produce the correct output. While previous work on producing partial results has looked at a limited set of non-monotonic operators, emerging hierarchical standards such as XML, which are heavily nested, and sophisticated queries require more general solutions to the problem. In this paper, we define the semantics of partial results and outline mechanisms for ensuring these semantics for queries with arbitrary non-monotonic operators. Re-architecting a query engine to produce partial results requires modifications to the implementations of operators. We explore implementation alternatives and quantitatively compare their effectiveness using the Niagara prototype system.
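To make the difficulty with non-monotonic operators concrete, the following is a minimal sketch of an average operator that can report a result over the input seen so far, and that may later revise it. It is an illustration only and not the Niagara implementation; the names `PartialAverage` and `run_with_partials`, and the policy of emitting a partial result every `every` tuples, are assumptions made for this example.

```python
from typing import Iterable, Iterator, Optional, Tuple


class PartialAverage:
    """Hypothetical average operator that keeps a running sum and count."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def consume(self, value: float) -> None:
        # Fold one input tuple into the running state.
        self.total += value
        self.count += 1

    def partial_result(self) -> Optional[float]:
        # Average over the input seen so far; None if nothing has arrived.
        return self.total / self.count if self.count else None


def run_with_partials(
    values: Iterable[float], every: int
) -> Iterator[Tuple[str, Optional[float]]]:
    """Emit a partial average after every `every` inputs, then the final one."""
    op = PartialAverage()
    for v in values:
        op.consume(v)
        if op.count % every == 0:
            yield ("partial", op.partial_result())
    # Only once the input is exhausted is the result known to be final.
    yield ("final", op.partial_result())


results = list(run_with_partials([10.0, 20.0, 60.0, 30.0], every=2))
```

In this sketch the first partial result (the average of the first two values) differs from the final one, which shows why output from a non-monotonic operator must be treated as provisional until all of its input has been seen.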
Keywords: Monotonic Operator, Hash Table, Partial Result, Input Stream, Query Execution