Abstract
Traditional disk-resident OLTP systems were mainly designed for computers with relatively small memory. Driven by the advance of hardware, OLTP systems need to be redesigned for larger memory and multi-core environments. Compared to disk-resident systems, in-memory systems have significant performance advantages, from the perspectives of both transaction throughput and query latency. Their performance is no longer limited by disk I/Os. Instead, the efficiency and scalability over multi-core CPUs become more important. In this paper, we survey and summarize a wide spectrum of design and implementation considerations that may affect the efficiency or scalability of an in-memory OLTP system. These considerations are concerned with most of the main components of databases, including concurrency control, logging, indexing and transaction compilation. For each of the components, we provide some in-depth analysis based on recent research works. This survey also aims to provide some guidance for designing or implementing high-performance in-memory OLTP systems.
Similar content being viewed by others
Notes
Which is not visible at the transactional level.
For in-place updates, the state of data is likely to be inconsistent in the caches of multiple processors.
The project is ended in 2016.
References
Mohan C, Haderle D, Lindsay B, Pirahesh H, Schwarz P (1992) Aries: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. TODS 17(1):94–162
Alibaba single day record. https://techcrunch.com/2017/11/11/alibaba-smashes-its-singles-day-record/. Accessed 2018
Färber F, Cha SK, Primsch J, Bornhövd C, Sigg S, Lehner W (2012) SAP HANA database: data management for modern business applications. SIGMOD Rec 40(4):45–51
Lee J, Kwon YS, Färber F, Muehle M, Lee C, Bensberg C, Lee JY, Lee AH, Lehner W (2013) SAP HANA distributed in-memory database system: transaction, session, and metadata management. In: ICDE. IEEE, pp 1165–1173
Stonebraker M, Weisberg A (2013) The VoltDB main memory DBMS. IEEE Data Eng Bull 36(2):21–27
Kemper A, Neumann T (2011) Hyper: a hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In: ICDE. IEEE, pp 195–206
Faerber F, Kemper A, Larson P-Å, Levandoski J, Neumann T, Pavlo A et al (2017) Main memory database systems. Found Trends Databases 8(1–2):1–130
Zhang H, Chen G, Ooi BC, Tan K-L, Zhang M (2015) In-memory big data management and processing: a survey. IEEE Trans Knowl Data Eng 27(7):1920–1948
Kung H-T, Robinson JT (1981) On optimistic methods for concurrency control. TODS 6(2):213–226
Tu S, Zheng W, Kohler E, Liskov B, Madden S (2013) Speedy transactions in multicore in-memory databases. In: SOSP. ACM, pp 18–32
Diaconu C, Freedman C, Ismert E, Larson P-A, Mittal P, Stonecipher R, Verma N, Zwilling M (2013) Hekaton: SQL server’s memory-optimized OLTP engine. In: SIGMOD. ACM, pp 1243–1254
Larson P-Å, Blanas S, Diaconu C, Freedman C, Patel JM, Zwilling M (2011) High-performance concurrency control mechanisms for main-memory databases. Proc VLDB Endow 5(4):298–309
Yu X, Pavlo A, Sanchez D, Devadas S (2016) TicToc: time traveling optimistic concurrency control. In: SIGMOD, vol 8, pp 209–220
Wu Y, Chan C-Y, Tan K-L (2016) Transaction healing: scaling optimistic concurrency control on multicores. In: SIGMOD. ACM, pp 1689–1704
Neumann T, Mühlbauer T, Kemper A (2015) Fast serializable multi-version concurrency control for main-memory database systems. In: SIGMOD. ACM, pp 677–689
Loesing S, Pilman M, Etter T, Kossmann D (2015) On the design and scalability of distributed shared-data databases. In: SIGMOD. ACM, pp 663–676
Jordan J, Banerjee J, Batman R (1981) Precision locks. In: SIGMOD. ACM, pp 143–147
Zheng W, Tu S, Kohler E, Liskov B (2014) Fast databases with fast durability and recovery through multicore parallelism. In: OSDI, pp 465–477
Eswaran KP, Gray JN, Lorie RA, Traiger IL (1976) The notions of consistency and predicate locks in a database system. Commun ACM 19(11):624–633
Johnson R, Pandis I, Ailamaki A (2009) Improving OLTP scalability using speculative lock inheritance. Proc VLDB Endow 2(1):479–489
Jung H, Han H, Fekete A, Heiser G, Yeom HY (2014) A scalable lock manager for multicores. TODS 39(4):29
Harizopoulos S, Abadi DJ, Madden S, Stonebraker M (2008) OLTP through the looking glass, and what we found there. In: SIGMOD. ACM, pp 981–992
Ren K, Thomson A, Abadi DJ (2012) Lightweight locking for main memory database systems. In: VLDB. vol 6, pp 145–156
Pandis I, Johnson R, Hardavellas N, Ailamaki A (2010) Data-oriented transaction execution. Proc VLDB Endow 3(1–2):928–939
Xie C, Su C, Littley C, Alvisi L, Kapritsos M, Wang Y (2015) High-performance acid via modular concurrency control. In: SOSP. ACM, pp 279–294
Narula N, Cutler C, Kohler E, Morris R (2014) Phase reconciliation for contended in-memory transactions. In: OSDI, pp 511–524
Shasha D, Llirbat F, Simon E, Valduriez P (1995) Transaction chopping: algorithms and performance studies. TODS 20(3):325–363
Yu X, Bezerra G, Pavlo A, Devadas S, Stonebraker M (2014) Staring into the abyss: an evaluation of concurrency control with one thousand cores. Proc VLDB Endow 8(3):209–220
Agrawal R, Carey MJ, Livny M (1987) Concurrency control performance modeling: alternatives and implications. TODS 12(4):609–654
Thomasian A (1993) Two-phase locking performance and its thrashing behavior. TODS 18(4):579–625
Stonebraker M, Madden S, Abadi DJ, Harizopoulos S, Hachem N, Helland P (2007) The end of an architectural era: (it’s time for a complete rewrite). In: VLDB. VLDB Endowment, pp 1150–1160
Kallman R, Kimura H, Natkins J, Pavlo A, Rasin A, Zdonik S, Jones EP, Madden S, Stonebraker M, Zhang Y et al (2008) H-store: a high-performance, distributed main memory transaction processing system. VLDB 1(2):1496–1499
Faleiro JM, Thomson A, Abadi DJ (2014) Lazy evaluation of transactions in database systems. In: SIGMOD. ACM, pp 15–26
Thomson A, Abadi DJ (2010) The case for determinism in database systems. Proc VLDB Endow 3(1–2):70–80
Thomson A, Diamond T, Weng S-C, Ren K, Shao P, Abadi DJ (2012) CalvIn: fast distributed transactions for partitioned database systems. In: SIGMOD, pp 1–12
Jones EP, Abadi DJ, Madden S (2010) Low overhead concurrency control for partitioned main memory databases. In: SIGMOD. ACM, pp 603–614
Pavlo A, Jones EP, Zdonik S (2011) On predictive modeling for optimizing transaction execution in parallel OLTP systems. Proc VLDB Endow 5(2):85–96
Wu Y, Arulraj J, Lin J, Xian R, Pavlo A (2017) An empirical evaluation of in-memory multi-version concurrency control. Proc VLDB Endow 10(7):781–792
Weikum G, Vossen G (2001) Transactional information systems: theory, algorithms, and the practice of concurrency control and recovery. Elsevier, Amsterdam
Diaconu C, Freedman C, Ismert E, Larson P-Å, Mittal P et al (2013) Hekaton: SQL server’s memory-optimized OLTP engine. In: SIGMOD, pp 1243–1254
Berenson H, Bernstein P, Gray J, Melton J, O’Neil E, O’Neil P (1995) A critique of ANSI SQL isolation levels. SIGMOD Rec 24:1–10
Fekete A, Liarokapis D, O’Neil E, O’Neil P, Shasha D (2005) Making snapshot isolation serializable. TODS 30(2):492–528
Jorwekar S, Fekete A, Ramamritham K, Sudarshan S (2007) Automating the detection of snapshot isolation anomalies. In: VLDB, pp 1263–1274
Cahill MJ, Röhm U, Fekete AD (2009) Serializable isolation for snapshot databases. DMoNH 34(4):20
Revilak S, O’Neil P, O’Neil E (2011) Precisely serializable snapshot isolation (PSSI). In: ICDE. IEEE, pp 482–493
Ports DR, Grittner K (2012) Serializable snapshot isolation in PostgreSQL. Proc VLDB Endow 5(12):1850–1861
Wang T, Johnson R, Fekete A, Pandis I (2015) The serial safety net: efficient concurrency control on modern hardware. In: DMoNH. ACM, p 8
Adya A, Liskov BH (1999) Weak consistency: a generalized theory and optimistic implementations for distributed transactions. Doctoral dissertation, Massachusetts Institute of Technology
Kim K, Wang T, Johnson R, Pandis I (2016) ERMIA: fast memory-optimized database system for heterogeneous workloads. In: SIGMOD. ACM, pp 1675–1687
Jung H, Han H, Fekete A, Röhm U, Yeom HY (2013) Performance of serializable snapshot isolation on multicore servers. In: DASFAA (2), Lecture Notes in Computer Science. Springer, pp 416–430
Han H, Park S, Jung H, Fekete A, Rohm U, Yeom HY (2014) Scalable serializable snapshot isolation for multicore systems. In: ICDE
Malviya N, Weisberg A, Madden S, Stonebraker M (2014) Rethinking main memory OLTP recovery. In: ICDE. IEEE, pp 604–615
Yao C, Agrawal D, Chen G, Ooi BC, Wu S (2016) Adaptive logging: optimizing logging and recovery costs in distributed in-memory databases. In: SIGMOD. ACM, pp 1119–1134
Johnson R, Pandis I, Stoica R, Athanassoulis M, Ailamaki A (2010) Aether: a scalable approach to logging. Proc VLDB Endow 3(1–2):681–692
Wang T, Johnson R (2014) Scalable logging through emerging non-volatile memory. Proc VLDB Endow 7(10):865–876
Helland P, Sammer H, Lyon J, Carr R, Garrett P, Reuter A (1989) Group commit timers and high volume transaction systems. In: High performance transaction systems. Springer, pp 301–329
Jung H, Han H, Kang S (2017) Scalable database logging for multicores. PVLDB 11(2):135–148
Fang R, Hsiao H-I, He B, Mohan C, Wang Y (2011) High performance database logging using storage class memory. In: ICDE. IEEE, pp 1221–1231
Huang J, Schwan K, Qureshi MK (2014) NVRAM-aware logging in transaction systems. Proc VLDB Endow 8(4):389–400
NVM (2018) https://en.wikipedia.org/wiki/Non-volatile_memory. Accessed 2018
Wu Y, Guo W, Chan C, Tan K (2017) Fast failure recovery for main-memory DBMSs on multicores. In: SIGMOD, pp 267–281
Arulraj J, Perron M, Pavlo A (2016) Write-behind logging. Proc VLDB Endow 10(4):337–348
Comer D (1979) The ubiquitous b-tree. ACM Comput Surv 11(2):121–137
Rao J, Ross KA (2000) Making b+-trees cache conscious in main memory. SIGMOD Rec 29:475–486
Rao J, Ross KA (1999) Cache conscious indexing for decision-support in main memory. In: VLDB, pp 78–89
Schlegel B, Gemulla R, Lehner W (2009) K-ary search on modern processors. In: Proceedings of the fifth international workshop on data management on new hardware, DaMoN ’09. ACM, New York, pp 52–60
Kim C, Chhugani J, Satish N, Sedlar E, Nguyen AD, Kaldewey T, Lee VW, Brandt SA, Dubey P (2010) Fast: fast architecture sensitive tree search on modern CPUs and GPUs. In: SIGMOD, ACM, pp 339–350
Mao Y, Kohler E, Morris RT (2012) Cache craftiness for fast multicore key-value storage. In: EuroSys. ACM, pp 183–196
Kraska T, Beutel A, Chi EH, Dean J, Polyzotis N (2018) The case for learned index structures. In: SIGMOD, pp 489–504
Leis V, Kemper A, Neumann T (2013) The adaptive radix tree: artful indexing for main-memory databases. In: ICDE. IEEE, pp 38–49
Böhm M, Schlegel B, Volk PB, Fischer U, Habich D, Lehner W (2011) Efficient in-memory indexing with generalized prefix trees. In: BTW, Germany, pp 227–246
Zhang H, Andersen DG, Pavlo A, Kaminsky M, Ma L, Shen R (2016) Reducing the storage overhead of main-memory OLTP databases with hybrid indexes. In: SIGMOD. ACM, pp 1567–1581
Wang Z, Pavlo A, Lim H, Leis V, Zhang H, Kaminsky M, Andersen DG (2018) Building a Bw-tree takes more than just buzz words. In: SIGMOD conference. ACM, pp 473–488
Binna R, Zangerle E, Pichl M, Specht G, Leis V (2018) HOT: a height optimized Trie index for main-memory database systems. In: SIGMOD, pp 521–534
Zhang H, Lim H, Leis V, Andersen DG, Kaminsky M, Keeton K, Pavlo A (2018) Surf: practical range query filtering with fast succinct tries. In: SIGMOD, pp 323–336
Jacobson G (1989) Space-efficient static trees and graphs. In: 30th annual symposium on foundations of computer science, Research Triangle Park, NC, USA, 30 Oct–1 Nov 1989, pp 549–554
Stoica R, Ailamaki A (2013) Enabling efficient OS paging for main-memory OLTP databases. In: DaMoN, p 7
Funke F, Kemper A, Neumann T (2012) Compacting transactional data in hybrid OLTP&OLAP databases. Proc VLDB Endow 5(11):1424–1435
Eldawy A, Levandoski J, Larson P-Å (2014) Trekking through siberia: managing cold data in a memory-optimized database. Proc VLDB Endow 7(11):931–942
Levandoski JJ, Larson P-Å, Stoica R (2013) Identifying hot and cold data in main-memory databases. In: ICDE. IEEE, pp 26–37
Cache replacement. https://en.wikipedia.org/wiki/cache-replacement-policies. Accessed 2018
DeBrabant J, Pavlo A, Tu S, Stonebraker M, Zdonik S (2013) Anti-caching: a new approach to database management system architecture. Proc VLDB Endow 6(14):1942–1953
Lang H, Mühlbauer T, Funke F, Boncz PA, Neumann T, Kemper A (2016) Data blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In: SIGMOD. ACM, pp 311–326
Levandoski JJ, Lomet DB, Sengupta S (2013) The Bw-tree: a b-tree for new hardware platforms. In: ICDE. IEEE, pp 302–313
Graefe G (2010) A survey of b-tree locking techniques. TODS 35(3):16
Lehman PL et al (1981) Efficient locking for concurrent operations on b-trees. TODS 6(4):650–670
Cha SK, Hwang S, Kim K, Kwon K (2001) Cache-conscious concurrency control of main-memory indexes on shared-memory multiprocessor systems. In: VLDB, vol 1, pp 181–190
Sewall J, Chhugani J, Kim C, Satish N, Dubey P (2011) PALM: parallel architecture-friendly latch-free modifications to b+ trees on many-core processors. Proc VLDB Endow 4(11):795–806
Neumann T (2011) Efficiently compiling efficient query plans for modern hardware. Proc VLDB Endow 4(9):539–550
Graefe G (1994) Volcano: an extensible and parallel query evaluation system. IEEE Trans Knowl Data Eng 6(1):120–135
Yan C, Cheung A (2016) Leveraging lock contention to improve OLTP application performance. Proc VLDB Endow 9(5):444–455
Wang Z, Mu S, Cui Y, Yi H, Chen H, Li J (2016) Scaling multicore databases via constrained parallel execution. In: SIGMOD. ACM, pp 1643–1658
Wang Z, Mu S, Cui Y, Yi H, Chen H, Li J (2016) Scaling multicore databases via constrained parallel execution. In: SIGMOD, ACM. New York, NY, pp 1643–1658
Mu S, Cui Y, Zhang Y, Lloyd W, Li J (2014) Extracting more concurrency from distributed transactions. In: OSDI, pp 479–494
Curino C, Jones E, Zhang Y, Madden S (2010) Schism: a workload-driven approach to database replication and partitioning. Proc VLDB Endow 3(1–2):48–57
Pavlo A, Curino C, Zdonik S (2012) Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems. In: SIGMOD. ACM, pp 61–72
Taft R, Mansour E, Serafini M, Duggan J, Elmore AJ, Aboulnaga A, Pavlo A, Stonebraker M (2014) E-store: fine-grained elastic partitioning for distributed transaction processing systems. Proc VLDB Endow 8(3):245–256
Two-phase commit protocol. https://en.wikipedia.org/wiki/two-phase-commit-protocol. Accessed 2018
Lin Q, Chang P, Chen G, Ooi BC, Tan K-L, Wang Z (2016) Towards a non-2pc transaction management in distributed database systems. In: SIGMOD
Ren K, Thomson A, Abadi DJ (2014) An evaluation of the advantages and disadvantages of deterministic database systems. Proc VLDB Endow 7(10):821–832
Ailamaki A, DeWitt DJ, Hill MD, Wood DA (1999) DBMSs on a modern processor: where does time go? In: VLDB, pp 266–277
Sirin U, Tözün P, Porobic D, Ailamaki A (2016) Micro-architectural analysis of in-memory OLTP. In: SIGMOD, Vol. 215922
Tözün P, Gold B, Ailamaki A (2013) OLTP in wonderland: where do cache misses come from in major OLTP components?. In: DaMoN. ACM, p 8
Miller JE, Kasture H, Kurian G, Gruenwald C, Beckmann N, Celio C, Eastep J, Agarwal A (2010) Graphite: a distributed parallel simulator for multicores. In: HPCA-16. IEEE, pp 1–12
Salomie T-I, Subasu IE, Giceva J, Alonso G (2011) Database engines on multicores, why parallelize when you can distribute? In: EuroSys. ACM, pp 17–30
Appuswamy R, Anadiotis A, Porobic D, Iman M, Ailamaki A (2017) Analyzing the impact of system architecture on the scalability of OLTP engines for high-contention workloads. Proc VLDB Endow 11(2):121–134
Mühlbauer T, Rödiger W, Reiser A, Kemper A, Neumann T (2013) ScyPer: elastic OLAP throughput on transactional data. In: DanaC. ACM, pp 11–15
Lahiri T, Neimat M-A, Folkman S (2013) Oracle TimesTen: an in-memory database for enterprise applications. IEEE Data Eng Bull 36(2):6–13
Lindström J, Raatikka V, Ruuth J, Soini P, Vakkila K (2013) IBM solidDB: in-memory database optimized for extreme speed and availability. IEEE Data Eng Bull 36(2):14–20
MemSQL Inc., MemSQL. http://www.memsql.com. Accessed 2018
Freedman C, Ismert E, Larson P-Å (2014) Compilation in the Microsoft SQL Server Hekaton engine. IEEE Data Eng Bull 37(1):22–30
Lattner C, Adve V (2004) LLVM: a compilation framework for lifelong program analysis & transformation. In: CGO. IEEE, pp 75–86
Fredkin E (1960) Trie memory. CACM 3(9):490–499
Wolski A, Raatikka V (2006) Performance measurement and tuning of hot-standby databases. In: International service availability symposium. Springer, pp 149–161
Chan C-Y, Ioannidis YE (1998) Bitmap index design and evaluation. SIGMOD Rec 27:355–366
Bernstein PA, Hadzilacos V, Goodman N (1987) Concurrency control and recovery in database systems. Addison-Wesley, Boston
Altibase. Altibase administrator’s manual release
Pavlo A, Angulo G, Arulraj J, Lin H, Lin J, Ma L, Menon P, Mowry TC, Perron M, Quah I, Santurkar S, Tomasic A, Toor S, Aken DV, Wang Z, Wu Y, Xian R, Zhang T (2017) Self-driving database management systems. In: CIDR. www.cidrdb.org
Peloton. https://github.com/cmu-db/peloton. Accessed 2018
Sikka V, Färber F, Lehner W, Cha SK, Peh T, Bornhövd C (2012) Efficient transaction processing in SAP HANA database: the end of a column store myth. In: SIGMOD. ACM, pp 731–742
Wang T, Kimura H (2016) Mostly-optimistic concurrency control for highly contended dynamic workloads on a thousand cores. Proc VLDB Endw 10(2):49–60
Pavlo A (2017) What are we doing with our lives? Nobody cares about our concurrency control research. In: SIGMOD, p 3
Zhu T, Wang D, Hu H, Qian W, Wang X, Zhou A (2018) Interactive transaction processing for in-memory database system. In: DASFAA, part II, pp 228–246
Gray J et al (1996) The dangers of replication and a solution. SIGMOD Rec 25(2):173–182
Decandia G et al (2007) Dynamo: Amazon’s highly available key-value store. In: SOSP
Cassandra website. http://cassandra.apache.org/. Accessed 2018
Bailis P, Davidson A, Fekete A et al (2013) Highly available transactions: virtues and limitations. Proc VLDB Endow 7(3):181–192
Lamport L (1998) The part-time parliament. TOCS 16(2):133–169
Lamport L (2001) Paxos made simple. ACM SIGACT News 32(4):18–25
Baker J, Bond C, Corbett JC et al (2011) Megastore: providing scalable, highly available storage for interactive services. In: CIDR, pp 223–234
Corbett JC, Jeffrey D et al (2013) Spanner: Googles globally distributed database. TOCS 31(3):8
Rao J, Shekita EJ, Tata S (2011) Using paxos to build a scalable, consistent, and highly available datastore. In: VLDB, pp 243–254
Acknowledgements
This work is supported by National Science Foundation of China under Grant Numbers 61702189, 61672232, 61772202 and Youth Science and Technology - Yang Fan Program of Shanghai under Grant Number 17YF1427800.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Hu, H., Zhou, X., Zhu, T. et al. In-memory transaction processing: efficiency and scalability considerations. Knowl Inf Syst 61, 1209–1240 (2019). https://doi.org/10.1007/s10115-019-01340-7
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-019-01340-7