PosDB: A Distributed Column-Store Engine
In this paper, we present a novel disk-based distributed column-store, describe its architecture, and discuss a number of technical solutions. Our system is essentially a query engine written completely from scratch. It targets shared-nothing environments and supports different forms of parallel query processing.
Query processing in PosDB is organized according to the classic Volcano pull-based model, adapted for the column-store case. Currently, we support late materialization only and therefore employ a join index data structure to represent positional information. In our system, a query plan can consist of both positional and value operators. PosDB has about a dozen core operators, including several variants of selection and join, as well as aggregation. We also have several operators that ensure intra-query parallelism and operators for network interoperability. In its current state, the system is fully capable of processing the Star Schema Benchmark in both local and distributed environments.
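The interplay of positional and value operators under a pull-based model can be illustrated with a minimal sketch. It is not PosDB code: the class names (`PositionScan`, `PositionFilter`, `Materialize`), the block size, and the sample columns are all hypothetical, and the join index is not modeled. The sketch only assumes that positional operators exchange blocks of row positions and that values are read into tuples at the top of the plan, which is the essence of late materialization.

```python
from typing import Callable, Dict, List, Optional

Column = List[int]
Table = Dict[str, Column]


class PositionScan:
    """Positional operator (hypothetical): emits blocks of row positions, never values."""

    def __init__(self, row_count: int, block_size: int = 2) -> None:
        self.row_count = row_count
        self.block_size = block_size
        self.cursor = 0

    def next(self) -> Optional[List[int]]:
        if self.cursor >= self.row_count:
            return None  # end of stream
        end = min(self.cursor + self.block_size, self.row_count)
        block = list(range(self.cursor, end))
        self.cursor = end
        return block


class PositionFilter:
    """Positional operator (hypothetical): reads one column to test a predicate,
    but passes only the qualifying positions upward."""

    def __init__(self, child, column: Column, predicate: Callable[[int], bool]) -> None:
        self.child = child
        self.column = column
        self.predicate = predicate

    def next(self) -> Optional[List[int]]:
        while True:
            block = self.child.next()  # pull from the child on demand
            if block is None:
                return None
            kept = [p for p in block if self.predicate(self.column[p])]
            if kept:  # skip blocks in which nothing qualified
                return kept


class Materialize:
    """Value operator (hypothetical): turns positions into tuples as late as possible."""

    def __init__(self, child, table: Table, names: List[str]) -> None:
        self.child = child
        self.table = table
        self.names = names

    def next(self) -> Optional[List[tuple]]:
        block = self.child.next()
        if block is None:
            return None
        return [tuple(self.table[n][p] for n in self.names) for p in block]


def run(root) -> List[tuple]:
    """Drive the plan by repeatedly pulling blocks from the root operator."""
    rows: List[tuple] = []
    while (block := root.next()) is not None:
        rows.extend(block)
    return rows


# Illustrative data only; the column names are made up for this example.
lineorder: Table = {
    "quantity": [5, 30, 12, 40, 8],
    "revenue": [100, 900, 250, 1200, 90],
}

plan = Materialize(
    PositionFilter(PositionScan(5), lineorder["quantity"], lambda q: q < 25),
    lineorder,
    ["quantity", "revenue"],
)
result = run(plan)
print(result)  # [(5, 100), (12, 250), (8, 90)]
```

In this sketch the `revenue` column is touched only for the three positions that survive the filter, which is the main benefit late materialization is meant to provide.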
Keywords: Column stores, Star Schema Benchmark, Query plan, Shared-nothing environment, Tuple reconstruction