PosDB is the engine of a disk-based column-store DBMS designed for processing OLAP queries in a shared-nothing environment. It is written entirely from scratch and aims to become a platform for studying distributed query processing in column-stores. This paper presents the first comprehensive description of the system. The presentation begins with a history of column-stores in order to clarify the reasons for their success. Next, the creation of a new system is justified, and an overview of its architecture is given. Finally, all of its components are described in detail. Currently, query execution in PosDB is based on the Volcano model with block-oriented processing and late materialization. Various physical operators have been developed for relational operations such as join, aggregation, and selection, along with auxiliary operators that support intra-query parallelism and network communication. Data distribution is achieved using horizontal range partitioning and data replication. The current version of PosDB can execute all queries from the Star Schema Benchmark in both centralized and distributed environments.
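The abstract states that query execution follows the Volcano model with block-oriented processing and late materialization. The following is a minimal Python sketch of how those two ideas combine. It is not PosDB's code. The class names `ScanPositions`, `Filter`, and `Materialize`, the `run` driver, the tiny `BLOCK_SIZE`, and the in-memory `lineorder` columns are all hypothetical stand-ins for disk-resident columns. Each operator exposes a `next()` call that returns a whole block rather than a single tuple, which is the block-oriented variant of Volcano. Intermediate blocks carry row positions instead of tuples, so a column's values are read only when some operator actually needs them.

```python
from typing import Callable, Dict, List, Optional

BLOCK_SIZE = 4  # hypothetical; real engines use blocks of thousands of values

Column = List[int]
Table = Dict[str, Column]


class ScanPositions:
    """Leaf operator: emits blocks of consecutive row positions, no values."""

    def __init__(self, table: Table):
        # Assumes a non-empty table whose columns all have the same length.
        self.n_rows = len(next(iter(table.values())))
        self.cursor = 0

    def next(self) -> Optional[List[int]]:
        if self.cursor >= self.n_rows:
            return None  # end of stream
        end = min(self.cursor + BLOCK_SIZE, self.n_rows)
        block = list(range(self.cursor, end))
        self.cursor = end
        return block


class Filter:
    """Selection: reads only the predicate column, passes on qualifying positions."""

    def __init__(self, child, table: Table, column: str,
                 pred: Callable[[int], bool]):
        self.child = child
        self.values = table[column]
        self.pred = pred

    def next(self) -> Optional[List[int]]:
        while (block := self.child.next()) is not None:
            kept = [p for p in block if self.pred(self.values[p])]
            if kept:  # skip blocks in which no row qualified
                return kept
        return None


class Materialize:
    """Late materialization: builds tuples from the requested columns only."""

    def __init__(self, child, table: Table, columns: List[str]):
        self.child = child
        self.cols = [table[c] for c in columns]

    def next(self) -> Optional[List[tuple]]:
        block = self.child.next()
        if block is None:
            return None
        return [tuple(col[p] for col in self.cols) for p in block]


def run(root) -> List[tuple]:
    """Volcano-style driver: pull blocks from the root until it returns None."""
    rows: List[tuple] = []
    while (block := root.next()) is not None:
        rows.extend(block)
    return rows


# Hypothetical in-memory columns standing in for an SSB-like fact table.
lineorder = {
    "quantity": [5, 30, 12, 41, 7, 25, 3, 50, 18],
    "revenue": [100, 600, 240, 820, 140, 500, 60, 1000, 360],
    "discount": [1, 2, 3, 1, 2, 3, 1, 2, 3],
}
plan = Materialize(
    Filter(ScanPositions(lineorder), lineorder, "quantity", lambda q: q < 25),
    lineorder, ["revenue"])
print(run(plan))  # [(100,), (240,), (140,), (60,), (360,)]
```

In this toy plan the `discount` column is never touched, and `revenue` is read only at the five qualifying positions. That saving is what late materialization aims for. A disk-based engine would fetch column pages rather than index Python lists, and its operator interface will differ from this sketch.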
Original Russian Text © G.A. Chernishev, V.A. Galaktionov, V.D. Grigorev, E.S. Klyuchikov, K.K. Smirnov, 2018, published in Programmirovanie, 2018, Vol. 44, No. 1.
Cite this article
Chernishev, G.A., Galaktionov, V.A., Grigorev, V.D. et al. PosDB: An Architecture Overview. Program Comput Soft 44, 62–74 (2018). https://doi.org/10.1134/S0361768818010024