DBMS Data Loading: An Analysis on Modern Hardware
- Cite this paper as:
- Dziedzic A., Karpathiotakis M., Alagiannis I., Appuswamy R., Ailamaki A. (2017) DBMS Data Loading: An Analysis on Modern Hardware. In: Blanas S., Bordawekar R., Lahiri T., Levandoski J., Pavlo A. (eds) Data Management on New Hardware. IMDM 2016, ADMS 2016. Lecture Notes in Computer Science, vol 10195. Springer, Cham
Data loading has traditionally been considered a “one-time deal”: an offline process outside the critical path of query execution. DBMS architectures are aligned with this assumption. Nevertheless, the rate at which data is produced and gathered nowadays has nullified the “one-off” assumption and has turned data loading into a major bottleneck of the data analysis pipeline.
This paper analyzes the behavior of modern DBMS to quantify their ability to fully exploit multicore processors and modern storage hardware during data loading. We examine multiple state-of-the-art DBMS, a variety of hardware configurations, and a combination of synthetic and real-world datasets to identify bottlenecks in the data loading process and to provide guidelines for accelerating it. Our findings show that modern DBMS are unable to saturate the available hardware resources, and we therefore identify opportunities to accelerate data loading.