Abstract
Scaling the performance of shared-everything transaction processing systems to highly parallel multicore hardware remains a challenge for database system designers. Recent proposals alleviate locking and logging bottlenecks in the system, leaving page latching as the next potential problem. To tackle the page latching problem, we propose physiological partitioning (PLP). PLP applies logical-only partitioning, maintaining the desired properties of shared-everything designs, and introduces a multi-rooted B+Tree index structure (MRBTree) that enables the partitioning of accesses at the physical page level. Logical partitioning and MRBTrees together ensure that all accesses to a given index page come from a single thread and, hence, can be entirely latch free; an extended design makes heap page accesses thread private as well. Moreover, MRBTrees offer an infrastructure for easy repartitioning and allow us to build a lightweight dynamic load balancing mechanism (DLB) on top of PLP. Profiling a PLP prototype running on different multicore machines shows that it acquires 85 % and 68 % fewer contentious critical sections than an optimized conventional design and one based on logical-only partitioning, respectively. PLP also improves performance by up to almost 50 % over the existing systems, while DLB detects and handles load imbalances rapidly and robustly.
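The routing idea behind a multi-rooted index can be illustrated with a small sketch. It is not the MRBTree implementation: the class name `MultiRootedIndex`, its methods, and the use of plain dictionaries in place of B+Tree sub-trees are all assumptions made for illustration. The sketch only shows how a table of key-range boundaries sends every access for a given range to one sub-tree, which a PLP-style system would assign to a single worker thread, and how repartitioning reduces to splitting one sub-tree.

```python
import bisect


class MultiRootedIndex:
    """Toy stand-in for a multi-rooted index: one sub-tree per key range.

    Hypothetical illustration only; each sub-tree is a dict, whereas a
    real multi-rooted B+Tree would keep a B+Tree root per key range.
    """

    def __init__(self, boundaries):
        # boundaries[i] is the smallest key of partition i + 1, so
        # len(boundaries) + 1 partitions cover the whole key space.
        self.boundaries = sorted(boundaries)
        self.subtrees = [dict() for _ in range(len(self.boundaries) + 1)]

    def partition_of(self, key):
        # Routing step: the partition number identifies the sub-tree and,
        # in a thread-per-partition design, the only thread allowed to touch it.
        return bisect.bisect_right(self.boundaries, key)

    def insert(self, key, value):
        self.subtrees[self.partition_of(key)][key] = value

    def lookup(self, key):
        return self.subtrees[self.partition_of(key)].get(key)

    def split(self, partition, new_boundary):
        # Repartitioning: split one sub-tree at new_boundary, which must lie
        # inside that partition's key range. Other sub-trees are untouched.
        old = self.subtrees[partition]
        low = {k: v for k, v in old.items() if k < new_boundary}
        high = {k: v for k, v in old.items() if k >= new_boundary}
        self.boundaries.insert(partition, new_boundary)
        self.subtrees[partition:partition + 1] = [low, high]


index = MultiRootedIndex([100, 200])   # three key ranges
index.insert(5, "a")
index.insert(150, "b")
index.insert(250, "c")
index.split(1, 150)                    # rebalance the middle range
```

Because only the split partition's boundary entry and sub-tree change, a load balancer built this way can move a hot key range without reorganizing the rest of the index.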
Additional information
Ippokratis Pandis and Ryan Johnson: work done while author affiliated with CMU and EPFL.
Cite this article
Tözün, P., Pandis, I., Johnson, R. et al. Scalable and dynamically balanced shared-everything OLTP with physiological partitioning. The VLDB Journal 22, 151–175 (2013). https://doi.org/10.1007/s00778-012-0278-6
DOI: https://doi.org/10.1007/s00778-012-0278-6