Programming and Computer Software, Volume 44, Issue 1, pp 62–74

PosDB: An Architecture Overview

  • G. A. Chernishev
  • V. A. Galaktionov
  • V. D. Grigorev
  • E. S. Klyuchikov
  • K. K. Smirnov

Abstract

PosDB is the engine of a disk-based column-store DBMS designed for processing OLAP queries in a shared-nothing environment. It is written entirely from scratch and aims to become a platform for studying distributed query processing in column-stores. This paper presents the first comprehensive description of the system. The presentation begins with the history of column-stores in order to clarify the reasons for their success. Next, the creation of a new system is justified, and an overview of its architecture is given. Finally, all of its components are described in detail. Currently, query execution in PosDB is based on the Volcano model with block-oriented processing and late materialization. Various physical operators have been developed for relational operations such as join, aggregation, and selection. Some auxiliary operators have been developed to support intraquery parallelism and network communication. Data distribution is achieved using horizontal range partitioning and data replication. The current version of PosDB can execute all queries from the Star Schema Benchmark in both centralized and distributed environments.
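
To make the execution model named in the abstract more concrete, the following is a minimal sketch of pull-based (Volcano-style) operators that exchange blocks of row positions and fetch attribute values only at the top of the plan (late materialization). It is not PosDB code: the class names, the `blocks()` method, the block size, and the sample data are all assumptions made for illustration, and Python generators stand in for the classic open/next/close interface.

```python
BLOCK_SIZE = 4  # hypothetical block size, kept tiny for illustration


class Column:
    """In-memory stand-in for a column that would normally live on disk."""
    def __init__(self, values):
        self.values = list(values)


class PositionScan:
    """Leaf operator: emits blocks of row positions rather than values."""
    def __init__(self, n_rows, block_size=BLOCK_SIZE):
        self.n_rows = n_rows
        self.block_size = block_size

    def blocks(self):
        for start in range(0, self.n_rows, self.block_size):
            end = min(start + self.block_size, self.n_rows)
            yield list(range(start, end))


class PositionFilter:
    """Selection: reads only the predicate column, forwards surviving positions."""
    def __init__(self, child, column, predicate):
        self.child = child
        self.column = column
        self.predicate = predicate

    def blocks(self):
        for block in self.child.blocks():  # pull one block from the child
            kept = [p for p in block if self.predicate(self.column.values[p])]
            if kept:
                yield kept


class Materialize:
    """Root operator: converts positions into tuples as late as possible."""
    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def blocks(self):
        for block in self.child.blocks():
            yield [tuple(c.values[p] for c in self.columns) for p in block]


def run_query():
    # Roughly: SELECT revenue FROM t WHERE year = 1997 (made-up data)
    year = Column([1997, 1998, 1997, 1999, 1997, 1998])
    revenue = Column([10, 20, 30, 40, 50, 60])
    scan = PositionScan(n_rows=6)
    selected = PositionFilter(scan, year, lambda y: y == 1997)
    plan = Materialize(selected, [revenue])
    return [row for block in plan.blocks() for row in block]
```

In this sketch the `revenue` column is touched only for the positions that pass the filter, which is the point of late materialization; passing whole blocks between operators amortizes the per-call overhead of the pull interface.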

Copyright information

© Pleiades Publishing, Ltd. 2018

Authors and Affiliations

  • G. A. Chernishev (1, 2)
  • V. A. Galaktionov (1)
  • V. D. Grigorev (1)
  • E. S. Klyuchikov (1)
  • K. K. Smirnov (1)

  1. St. Petersburg State University, St. Petersburg, Russia
  2. JetBrains Research, St. Petersburg, Russia
