A Survey on Parallel Database Systems from a Storage Perspective: Rows Versus Columns

  • Carlos Ordonez
  • Ladjel Bellatreche
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 903)


Big data requirements have revolutionized database technology, bringing many innovative and revamped DBMSs to process transactional (OLTP) or demanding query workloads (cubes, exploration, pre-processing). Parallel and main-memory processing have become important features to exploit new hardware and cope with data volume. With this landscape in mind, we present a survey comparing modern row and columnar DBMSs, contrasting their ability to write data (storage mechanisms, transaction processing, batch loading, enforcing ACID) and their ability to read data (query processing, physical operators, sequential vs. parallel). We provide a unifying view of alternative storage mechanisms, database algorithms and query optimizations used across diverse DBMSs. We contrast the architecture and processing of a parallel DBMS with an HPC system. We cover the full spectrum of subsystems, from storage to query processing. We consider parallel processing and the impact of much larger RAM, which brings back main-memory databases. We then discuss important parallel aspects including speedup, sequential bottlenecks, data redistribution, high-speed networks, main-memory processing with larger RAM and fault tolerance at query processing time. We outline an agenda for future research.



The first author thanks Michael Stonebraker for his guidance in understanding query processing based on columnar storage, arrays of unlimited size to support mathematical analytics, and lock-free transaction processing in main memory.


Authors and Affiliations

  1. University of Houston, Houston, USA
  2. LIAS/ISAE-ENSMA, Poitiers, France

