PosDB: An Architecture Overview

Abstract

PosDB is an engine of a disk-based column-store DBMS designed for processing OLAP queries in a shared nothing environment. It is written completely from scratch and aims to become a platform for studying the distributed query processing in column-stores. This paper presents the first comprehensive description of the system. The presentation begins with the history of column-stores in order to clarify the reasons of their success. Next, the creation of a new system is justified, and an overview of its architecture is given. Finally, all its components are described in detail. Currently, query execution in PosDB is based on the Volcano model with block-oriented processing and late materialization. Various physical operators have been developed for relational operations such as join, aggregation, and selection. Some auxiliary operators were developed to support intraquery parallelism and network communication. Data distribution is achieved using horizontal range partitioning and data replication. The current version of PosDB can execute all queries from the Star Schema Benchmark in both centralized and distributed environments.

This is a preview of subscription content, log in to check access.

References

  1. 1.

    Harizopoulos, S., Abadi, D., and Boncz, P., Column-Oriented Database Systems, VLDB 2009, Tutorial, 2009.

    Google Scholar 

  2. 2.

    Manegold, S., Boncz, P., Nes, N., and Kersten, M., Cache-conscious radix-decluster projections, in Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB’04, Toronto: VLDB Endowment, 2004, vol. 30, pp. 684–695.

    Google Scholar 

  3. 3.

    Abadi, D.J., Madden, S.R., and Hachem, N., Column-stores vs. row-stores: How different are they really? in Proc. of the 2008 ACM SIGMOD Int. Conf. on Management of Data, 2008, pp. 967–980.

    Google Scholar 

  4. 4.

    Abadi, D.J., Myers, D.S., DeWitt, D.J., and Madden, S., Materialization strategies in a column-oriented DBMS, in Proceedings of ICDE, Istanbul, 2007, Chirkova, R., Dogac, A., Özsu, M.T., and Timos K. Sellis, T.K., Eds., pp. 466–475.

  5. 5.

    Boncz, P.A., Zukowski, M., and Nes, N., MonetDB/x100: Hyper-pipelining query execution, in CIDR 2005, Second Biennial Conference on Innovative Data Systems Research, Asilomar, Calif., 2005, Online Proceedings, pp. 225–237. www.cidrdb.org, 2005.

  6. 6.

    Ivanova, I.E. and Sokolinsky, L. B., Parallel processing of very large databases using distributed column indexes, Program. Comput. Software, 2017, vol. 43, no. 3, pp. 131–144.

    MathSciNet  Article  Google Scholar 

  7. 7.

    Idreos, S., Kersten, M.L., and Manegold, S., Database cracking, in CIDR, pp. 68–78. www.cidrdb.org, 2007.

  8. 8.

    Graefe, G. and Kuno, H., Self-selecting, self-tuning, incrementally optimized indexes, in Proceedings of the 13th International Conference on Extending Database Technology, EDBT’10, New York: ACM, 2010, pp. 371–381.

    Google Scholar 

  9. 9.

    Abadi, D., Madden, S., and Ferreira, M., Integrating compression and execution in column-oriented data-base systems, in Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, SIGMOD’ 06, New York: ACM, 2006, pp. 671–682.

    Google Scholar 

  10. 10.

    Holloway, A.L., Raman, V., Swart, G., and DeWitt, D.J., How to barter bits for chronons: Compression and bandwidth trade offs for database scans, in Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, SIGMOD’07, New York: ACM, 2007, pp. 389–400.

    Google Scholar 

  11. 11.

    Ivanova, M., Kersten, M.L., and Nes, N., Self-organizing strategies for a columnstore database, in Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, EDBT’ 08, New York: ACM, 2008, pp. 157–168.

    Google Scholar 

  12. 12.

    Shrinivas, L., Bodagala, S., Varadarajan, R., Cary, A., Bharathan, V., and Bear, C., Materialization strategies in the vertica analytic database: Lessons learned, in 2013 IEEE 29th International Conference on Data Engineering (ICDE), 2013, pp. 1196–1207.

    Google Scholar 

  13. 13.

    Tsirogiannis, D., Harizopoulos, S., Shah, M.A., Wiener, J.L., and Graefe, G., Query processing techniques for solid state drives, in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, SIGMOD’ 09, New York: ACM, 2009, pp. 59–72.

    Google Scholar 

  14. 14.

    Hankins R.A. and Patel, J.M., Data morphing: an adaptive, cache-conscious storage technique, in Proceedings of the 29th international conference on Very large data bases, VLDB’ 2003, VLDB Endowment, 2003, vol. 29, pp. 417–428.

    Article  Google Scholar 

  15. 15.

    Stonebraker, M., Abadi, D.J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E., O’Neil, P., Rasin, A., Tran, N., and Zdonik, S., Cstore: A column-oriented DBMS, in Proceedings of the 31st International Conference on Very Large Data Bases, VLDB’ 05, VLDB Endowment, 2005, pp. 553–564.

    Google Scholar 

  16. 16.

    Idreos, S., Groffen, F., Nes, N., Manegold, S., Mullender, K.S., and Kersten, M.L., MonetDB: Two decades of research in column-oriented database architectures, IEEE Data Eng. Bull., 2012, vol. 35, no. 1, pp. 40–45.

    Google Scholar 

  17. 17.

    Chernishev, G., Towards Self-management in a distributed column-store system, Cham: Springer, 2015, pp. 97–107.

    Google Scholar 

  18. 18.

    Chernishev, G. The Design of an Adaptive Column-Store System, J. Big Data, 2017, vol. 4, no. 1, 2017.

    Article  Google Scholar 

  19. 19.

    Graefe, G., Query evaluation techniques for large databases, ACM Comput. Surv., 1993, vol. 25, no. 2, pp. 73–169.

    Article  Google Scholar 

  20. 20.

    O’Neil, P.E., O’Neil, E.J., and Chen, X., The star schema benchmark (SSB). http://www.cs.umb.edu/~poneil/StarSchemaB.PDF, 2009. Accessed September 10, 2017.

  21. 21.

    Chernishev, G., Galaktionov, V., Grigorev, V., Klyuchikov, E., and Smirnov, K. A study of PosDB Performance in a Distributed Environment, in Proceedings of the 2017 Software Engineering and Information Management, SEIM’ 17, 2017.

    Google Scholar 

  22. 22.

    Karasalo, I. and Svensson, P., The design of cantor: A new system for data analysis, in Proceedings of the 3rd international workshop on Statistical and scientific database management, Berkeley, 1986, pp. 224–244.

    Google Scholar 

  23. 23.

    Copeland, G.P. and Khoshafian, S.N., A decomposition storage model, SIGMOD Rec., 1985, vol. 14, no. 4, pp. 268–279.

    Article  Google Scholar 

  24. 24.

    Khoshafian, S., Copeland, G.P., Jagodis, T., Boral, H., and Valduriez, P., A query processing strategy for the decomposed storage model, in Proceedings of the Third International Conference on Data Engineering, Washington, 1987, pp. 636–643.

    Google Scholar 

  25. 25.

    Shao, M., Schindler, J., Schlosser, S.W., Ailamaki, A., and Ganger. G.R., Clotho: Decoupling memory page layout from storage organization, in Proceedings of the Thirtieth international conference on Very large data bases, VLDB’ 04, VLDB Endowment, 2004, vol. 30, pp. 696–707.

    Google Scholar 

  26. 26.

    Ailamaki, A., DeWitt, D.J., Hill, M.D., and Skounakis, M., Weaving relations for cache performance, in Proceedings of the 27th International Conference on Very Large Data Bases, VLDB’ 01, San Francisco, 2001, pp. 169–180.

    Google Scholar 

  27. 27.

    Abadi, D., Boncz, P., and Harizopoulos, S., The Design and Implementation of Modern Column-Oriented Database Systems, Hanover, Mass.: Now, 2013.

    Google Scholar 

  28. 28.

    Chernyshev, G., Physical Design Approaches for Column-Stores, Tr.St. Petersburg Inst. Infor. Avtom. Ross. Akad. Nauk SPIIRAN, 2013, vol. 7, pp. 204–222.

    Google Scholar 

  29. 29.

    Abadi, D., Boncz, P., and Harizopoulos, S., Columnoriented database systems, VLDB Endowment, 2009, vol. 2, no. 2, pp. 1664–1665.

    Article  Google Scholar 

  30. 30.

    OLAP, in editors, Encyclopedia of Database Systems, Liu, Ling and Özsu, M.T., Eds., Springer, 2009, pp. 1947–1947. doi 10.1007/978-0-387-39940-9_3191

  31. 31.

    Bellatreche, L. and Benkrid, S., A joint design approach of partitioning and allocation in parallel data warehouses, in Data Warehousing and Knowledge Discovery, Pedersen, T., Mohania, M., and Tjoa, A., Eds., Lecture Notes in Computer Science, vol. 5691, pp. 99–110, Berlin: Springer, 2009. doi 10.1007/978-3-642-03730-6_9

  32. 32.

    Zhang, Y., Xiao, Y., Wang, Z., Ji, X., Huang, Y., and Wang, S., ScaMMDB: Facing Challenge of Mass Data Processing with MMDB, Berlin: Springer, 2009, pp. 1–12.

    Google Scholar 

  33. 33.

    Liu, Y., Cao, F., Mortazavi, M., Chen, M., Yan, N., Ku, C., Adnaik, A., Morgan, S., Shi, G., Wang, Y., and Fang, F., DCODE: A Distributed Column-Oriented Database Engine for Big Data Analytics, Cham: Springer, 2015, pp. 289–299

    Google Scholar 

  34. 34.

    Arulraj, J., Pavlo, A., and Menon, P., Bridging the archipelago between row-stores and column-stores for hybrid workloads, in Proceedings of the 2016 International Conference on Management of Data, SIGMOD’16, 2016, pp. 583–598.

    Google Scholar 

  35. 35.

    Google. Supersonic library. https://code.google.com/archive/p/supersonic/, 2017. Accessed February 12, 2017.

  36. 36.

    DeWitt, D. and Gray, J., Parallel database systems: The future of high performance database systems, Commun. ACM, 1992, vol. 35, no. 6, pp. 85–98.

    Article  Google Scholar 

  37. 37.

    Kossmann, D., The state of the art in distributed query processing, ACM Comput. Surv., 2000, vol. 32, no. 4, pp. 422–469.

    Article  Google Scholar 

  38. 38.

    Tran, N., Lamb, A., Shrinivas, L., Bodagala, S., and Dave, J., The Vertica query optimizer: The case for specialized query optimizers, in IEEE 30th International Conference on Data Engineering, 2014, pp. 1108–1119.

    Google Scholar 

  39. 39.

    Graefe, G., Volcano—an extensible and parallel query evaluation system, IEEE Trans. Knowl. Data Eng., 1994, no. 1, pp. 120–135.

    Article  Google Scholar 

  40. 40.

    Neumann, T., Efficiently compiling efficient query plans for modern hardware, VLDB Endowment, 2011, Vol. 4, no. p, pp. 539–550.

    Article  Google Scholar 

  41. 41.

    Padmanabhan, S., Malkemus, T., Agarwal, R.C., and Jhingran, A., Block oriented processing of relational database operations in modern computer architectures, in Proceedings of the 17th International Conference on Data Engineering, Washington, 2001, pp. 567–574.

    Google Scholar 

  42. 42.

    Zukowski, M., Nes, N. and Boncz, P., Dsm vs. nsm: Cpu performance tradeoffs in block-oriented query processing, in Proceedings of the 4th International Workshop on Data Management on New Hardware, DaMoN’ 08, New York, 2008, pp. 47–54.

    Google Scholar 

  43. 43.

    Jacobs, A., The pathologies of big data, Commun. ACM, 2009, vol. 52, no. 8, pp. 36–44.

    Article  Google Scholar 

  44. 44.

    Li Zhe and Ross, K.A., Fast joins using join indices, VLDB J., 1999, vol. 8, no. pp. 1–24.

    Article  Google Scholar 

  45. 45.

    Neumann, T., Efficient generation and execution of DAG-structured query graphs, Doctoral Dissertation, 2005.

    Google Scholar 

Download references

Author information

Affiliations

Authors

Corresponding author

Correspondence to G. A. Chernishev.

Additional information

Original Russian Text © G.A. Chernishev, V.A. Galaktionov, V.D. Grigorev, E.S. Klyuchikov, K.K. Smirnov, 2018, published in Programmirovanie, 2018, Vol. 44, No. 1.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Chernishev, G.A., Galaktionov, V.A., Grigorev, V.D. et al. PosDB: An Architecture Overview. Program Comput Soft 44, 62–74 (2018). https://doi.org/10.1134/S0361768818010024

Download citation