Page-Flow in Query Engine Grid

  • Qiming Chen
  • Meichun Hsu
  • Ren Wu
Conference paper
Part of the Studies in Computational Intelligence book series (SCI, volume 443)

Abstract

As scaling out applications across multiple servers has become common industry practice, we investigate how distributed Query Engines (QEs) can collaborate to support graph-structured SQL dataflow processes. A SQL dataflow process consists of queries (optionally with UDFs) linked by relational dataflow. We focus on using a Distributed Caching Platform (DCP) for inter-QE data communication. Although DCPs have recently gained popularity, exchanging query results tuple by tuple through a DCP is often inefficient because of the tiny granularity of cache access and the overhead of data conversion and interpretation. This motivated us to explore a new and more efficient mechanism for inter-QE communication that takes advantage of the DCP's binary protocol. We propose the page-flow approach, which extends and externalizes the database buffer pool to the DCP so that the producer QE can put query results into the DCP as data pages (blocks) to be retrieved by the consumer QE. In this way, the relational dataflow logically becomes a binary page-flow; the tuples contained in the transferred pages are already in the format required by the relational operators and thus can be fed into queries directly without any conversion. Further, using pages as mini-batches of tuples amortizes the latency of DCP access over many tuples. We have implemented this mechanism on a cluster of PostgreSQL engines, and our experimental results demonstrate its value.
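The page-flow idea can be illustrated with a small Python sketch. It is not the authors' PostgreSQL implementation: the in-memory `FakeDCP` class merely stands in for a real DCP such as Memcached, and the page layout (a tuple-count header followed by fixed-width `(int4, int8)` tuples), the 8 KB `PAGE_SIZE`, and the `query_id:page_no` key scheme are hypothetical choices made only for illustration. The point it shows is that the producer ships whole binary pages rather than individual tuples, and the consumer reads tuples directly out of each page's layout.

```python
import struct
from typing import Dict, Iterator, List, Tuple

PAGE_SIZE = 8192                 # assumption: 8 KB pages (PostgreSQL's default block size)
HEADER = struct.Struct("<I")     # hypothetical page header: number of tuples on the page
TUPLE = struct.Struct("<iq")     # hypothetical fixed-width tuple: (int4 key, int8 value)
TUPLES_PER_PAGE = (PAGE_SIZE - HEADER.size) // TUPLE.size


class FakeDCP:
    """In-memory stand-in for a distributed cache: opaque bytes stored under string keys."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._store[key] = value

    def get(self, key: str) -> bytes:
        return self._store[key]


def pack_page(rows: List[Tuple[int, int]]) -> bytes:
    """Serialize up to TUPLES_PER_PAGE rows into one fixed-size binary page."""
    assert len(rows) <= TUPLES_PER_PAGE
    buf = bytearray(PAGE_SIZE)  # zero-filled; the unused tail stays zero
    HEADER.pack_into(buf, 0, len(rows))
    for i, row in enumerate(rows):
        TUPLE.pack_into(buf, HEADER.size + i * TUPLE.size, *row)
    return bytes(buf)


def produce(dcp: FakeDCP, query_id: str, rows: List[Tuple[int, int]]) -> int:
    """Producer QE: put query results into the cache one page at a time; return the page count."""
    n_pages = 0
    for start in range(0, len(rows), TUPLES_PER_PAGE):
        page = pack_page(rows[start:start + TUPLES_PER_PAGE])
        dcp.put(f"{query_id}:{n_pages}", page)  # hypothetical key scheme
        n_pages += 1
    dcp.put(f"{query_id}:count", struct.pack("<I", n_pages))
    return n_pages


def consume(dcp: FakeDCP, query_id: str) -> Iterator[Tuple[int, int]]:
    """Consumer QE: fetch whole pages and read tuples at fixed offsets in each page."""
    (n_pages,) = struct.unpack("<I", dcp.get(f"{query_id}:count"))
    for p in range(n_pages):
        page = dcp.get(f"{query_id}:{p}")
        (count,) = HEADER.unpack_from(page, 0)
        for i in range(count):
            yield TUPLE.unpack_from(page, HEADER.size + i * TUPLE.size)


if __name__ == "__main__":
    dcp = FakeDCP()
    rows = [(i, i * i) for i in range(2000)]
    pages = produce(dcp, "q1", rows)
    print(pages, list(consume(dcp, "q1")) == rows)
```

With 682 tuples per page under these assumptions, sending 2,000 tuples takes three page writes plus one page-count write, rather than 2,000 tuple-sized cache accesses. In a real engine the consumer would hand the page to its relational operators in place; here the `struct` unpacking stands in for that step.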

Keywords

database buffer management, distributed caching platform

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. HP Labs, Palo Alto, USA