Organizing and Accessing Data in SanssouciDB
Providing enterprise users with the information they require, when they require it, is not just a question of using the latest technology to store information efficiently. Enterprise application developers and users need ways of accessing and storing their information that are suited to the tasks they wish to carry out. This includes keeping the most relevant data close to the CPU for fast access, while data that is no longer required for the day-to-day running of the business is moved to slower, cheaper storage; allowing new columns to be added to tables if customers need to customize an application; and allowing developers to choose the most efficient storage strategy for their particular task. The work of application developers can also be made easier if they are able to read and write data in a way that fits the business process they are modeling. Finally, users and developers need confidence that their database will be available when they need it, and that they will not lose data if the power goes out or a hardware component fails. In this chapter we describe how data is accessed and organized in SanssouciDB to meet these requirements.
Enterprise applications place specific demands on a DBMS beyond processing large amounts of data quickly. Chief among them is the ability to process both small, write-intensive transactions and complex, long-running queries and transactions in a single system. As we have seen, column storage is well suited to the latter, but it does not perform well on the single-row inserts that characterize write-intensive transactions. For this reason, inserts can be directed to a write-optimized differential store and merged, when appropriate, with main storage. Scheduling these different workloads efficiently is integral to the performance of the system.
We do not want a long-running query to block the entire system, but we also need to make sure that large numbers of small transactions do not swamp it. Another requirement that enterprise applications impose on a DBMS is that historical data must be kept for reporting and legal reasons, and it should be possible to store and access this data without severely impacting the performance of the system. The ability of enterprise applications to recover from failures without loss of data and with as little downtime as possible is also very important. In addition, certain database operations, namely aggregation and join, make up the majority of operations carried out by the database in enterprise applications, and special algorithms are required to take full advantage of the parallelization opportunities presented by modern multi-core processors.
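The idea of directing inserts to a write-optimized differential store and merging them into main storage later can be illustrated with a small sketch. This is not SanssouciDB's implementation: the class `Column`, its method names, and the assumption that main storage is dictionary-encoded with a sorted dictionary are all hypothetical, chosen only to show why single-row inserts are cheap in the differential store and why the merge is a separate, heavier step.

```python
from bisect import bisect_left


class Column:
    """Hypothetical column with a read-optimized main part and a write-optimized delta."""

    def __init__(self):
        self.main_dict = []  # assumed: sorted list of distinct values (the dictionary)
        self.main_ids = []   # one dictionary position per row in main storage
        self.delta = []      # uncompressed values appended by inserts

    def insert(self, value):
        # A write only appends to the delta, so the sorted main dictionary
        # never has to be reorganized for a single-row insert.
        self.delta.append(value)

    def merge(self):
        # Decode main storage, append the delta, rebuild the sorted dictionary,
        # and re-encode every row. The delta is empty afterwards.
        values = [self.main_dict[i] for i in self.main_ids] + self.delta
        self.main_dict = sorted(set(values))
        self.main_ids = [bisect_left(self.main_dict, v) for v in values]
        self.delta = []

    def count(self, value):
        # A read has to consult both parts to see every row.
        pos = bisect_left(self.main_dict, value)
        in_main = 0
        if pos < len(self.main_dict) and self.main_dict[pos] == value:
            in_main = self.main_ids.count(pos)
        return in_main + self.delta.count(value)
```

In this sketch `insert` costs a single append, while `merge` touches every row of the column, which is one reason the merge would be run only "when appropriate" rather than on every write.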