Query Execution Optimization Based on Incremental Update in Database Distributed Middleware
Big data is often generated incrementally in the real world. Existing incremental query optimization is used mainly in streaming data environments. Because of the constraints of real-time streaming applications, existing incremental execution mechanisms are difficult to apply directly to large business-oriented data in a distributed environment. This paper proposes a query execution optimization method based on incremental update in database distributed middleware. First, the method defines a Reference-Graph from tables and their foreign key relationships, and uses it to derive a data partition strategy that reduces the amount of data transmitted during query operations. Second, it provides an incremental update query execution strategy and an incremental intermediate result preservation mechanism for the distributed environment, covering non-aggregate and aggregate queries respectively. Combining data partitioning with incremental updating reduces query execution cost and significantly improves the performance of complex queries. Finally, experiments on a benchmark dataset verify the effectiveness of the proposed method.
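The two ideas in the abstract, foreign-key-driven partitioning and reuse of preserved intermediate results, can be illustrated with a minimal sketch. This is not the authors' algorithm; it is a hypothetical illustration under stated assumptions: a TPC-H-like `customer` / `orders` / `lineitem` schema, a CRC32-modulo hash as the placement function for root tables, only the first foreign key of each table followed, and append-only (insert-only) deltas. All names (`FOREIGN_KEYS`, `PRIMARY_KEYS`, `assign_partitions`, `IncrementalGroupAggregate`, `refresh_selection`) are invented for this example.

```python
from collections import defaultdict
import zlib

# Hypothetical TPC-H-like schema, for illustration only.
# FOREIGN_KEYS maps a table to {fk_column: referenced_table}.
FOREIGN_KEYS = {
    "customer": {},
    "orders": {"custkey": "customer"},
    "lineitem": {"orderkey": "orders"},
}
PRIMARY_KEYS = {"customer": "custkey", "orders": "orderkey"}


def build_reference_graph(foreign_keys):
    """Reference graph: one edge child -> parent for every foreign key."""
    graph = defaultdict(set)
    for child, fks in foreign_keys.items():
        for parent in fks.values():
            graph[child].add(parent)
    return graph


def topo_order(graph, tables):
    """Order tables so that every referenced (parent) table precedes its children."""
    seen, order = set(), []

    def visit(table):
        if table in seen:
            return
        seen.add(table)
        for parent in graph.get(table, ()):
            visit(parent)
        order.append(table)

    for table in tables:
        visit(table)
    return order


def assign_partitions(rows_by_table, graph, foreign_keys, primary_keys, n_nodes):
    """Hash root tables on their primary key; place each child row on the node
    holding the row it references, so joins along FK edges stay node-local."""
    placement = {}  # (table, primary-key value) -> node id
    nodes = defaultdict(lambda: defaultdict(list))  # node id -> table -> rows
    for table in topo_order(graph, rows_by_table):
        fks = foreign_keys.get(table, {})
        for row in rows_by_table.get(table, []):
            if fks:
                # Simplifying assumption: follow only the table's first foreign key.
                col, parent = next(iter(fks.items()))
                node = placement[(parent, row[col])]
            else:
                node = zlib.crc32(str(row[primary_keys[table]]).encode()) % n_nodes
            if table in primary_keys:
                placement[(table, row[primary_keys[table]])] = node
            nodes[node][table].append(row)
    return nodes


class IncrementalGroupAggregate:
    """Preserves per-group partial state (sum, count) between executions, so an
    aggregate query is refreshed from newly inserted rows only (append-only)."""

    def __init__(self, group_col, value_col):
        self.group_col, self.value_col = group_col, value_col
        self.state = defaultdict(lambda: [0, 0])  # group -> [sum, count]

    def apply_delta(self, inserted_rows):
        for row in inserted_rows:
            partial = self.state[row[self.group_col]]
            partial[0] += row[self.value_col]
            partial[1] += 1

    def merge(self, other):
        """Combine partial states computed independently on different nodes."""
        for group, (s, c) in other.state.items():
            self.state[group][0] += s
            self.state[group][1] += c

    def result(self):
        return {g: {"sum": s, "count": c, "avg": s / c} for g, (s, c) in self.state.items()}


def refresh_selection(cached_rows, inserted_rows, predicate):
    """Non-aggregate query over append-only data: the new result is the cached
    result plus the qualifying rows of the delta."""
    return cached_rows + [row for row in inserted_rows if predicate(row)]


if __name__ == "__main__":
    agg = IncrementalGroupAggregate("orderkey", "qty")
    agg.apply_delta([{"orderkey": 10, "qty": 5}, {"orderkey": 11, "qty": 3}])
    agg.apply_delta([{"orderkey": 10, "qty": 7}])  # only the new row is scanned
    print(agg.result()[10])  # {'sum': 12, 'count': 2, 'avg': 6.0}
```

Keeping `(sum, count)` rather than the average itself is what makes `AVG` both incrementally maintainable and mergeable across nodes. Handling deletes or updates, or aggregates such as `MIN`/`MAX`, would need additional state and is outside what this sketch assumes.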
Keywords: Database middleware; Distributed database; Data partition; Incremental update; Result set reuse
This work was supported by the Fundamental Research Funds for the Central Universities and the DHU Distinguished Young Professor Program (No. B201312).