Scalable and dynamically balanced shared-everything OLTP with physiological partitioning

Tözün, Pınar; Pandis, Ippokratis; Johnson, Ryan; Ailamaki, Anastasia

doi:10.1007/s00778-012-0278-6


Regular Paper
Published: 26 June 2012

Volume 22, pages 151–175, (2013)
The VLDB Journal


Abstract

Scaling the performance of shared-everything transaction processing systems to highly parallel multicore hardware remains a challenge for database system designers. Recent proposals alleviate locking and logging bottlenecks in the system, leaving page latching as the next potential problem. To tackle the page latching problem, we propose physiological partitioning (PLP). PLP applies logical-only partitioning, maintaining the desired properties of shared-everything designs, and introduces a multi-rooted B+Tree index structure (MRBTree) that enables the partitioning of accesses at the physical page level. Logical partitioning and MRBTrees together ensure that all accesses to a given index page come from a single thread and, hence, can be entirely latch free; an extended design makes heap page accesses thread private as well. Moreover, MRBTrees offer an infrastructure for easy repartitioning and allow us to build a lightweight dynamic load balancing mechanism (DLB) on top of PLP. Profiling a PLP prototype running on different multicore machines shows that it acquires 85 and 68 % fewer contentious critical sections than an optimized conventional design and one based on logical-only partitioning, respectively. PLP also improves performance by up to almost 50 % over the existing systems, while DLB enhances the system with rapid and robust behavior in both detecting and handling load imbalances.
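The central idea above, an index whose key range is divided among several sub-tree roots so that each root (and every page beneath it) is touched by exactly one thread, can be illustrated with a minimal sketch. It assumes a range partitioning on sortable keys; the names `SubTree` and `MultiRootedIndex`, the integer worker IDs, and the use of a plain `dict` as a stand-in for each B+Tree are all hypothetical and not taken from the paper. The ownership check models the single-accessor invariant that makes latching unnecessary; it does not model heap pages, logging, or the DLB policy.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class SubTree:
    """Hypothetical stand-in for one B+Tree root; a dict replaces the real pages."""
    owner: int                                   # the only worker allowed to access it
    entries: dict = field(default_factory=dict)


class MultiRootedIndex:
    """Sketch of a multi-rooted index: one sub-tree per key range, one owner each."""

    def __init__(self, boundaries, owners):
        # boundaries[i] is the lowest key covered by sub-tree i.
        if len(boundaries) != len(owners) or boundaries != sorted(boundaries):
            raise ValueError("need one owner per boundary and sorted boundaries")
        self.boundaries = list(boundaries)
        self.roots = [SubTree(owner=o) for o in owners]

    def route(self, key):
        """Return the position of the sub-tree whose key range contains key."""
        i = bisect.bisect_right(self.boundaries, key) - 1
        if i < 0:
            raise KeyError(f"key {key!r} is below the lowest boundary")
        return i

    def owner_of(self, key):
        return self.roots[self.route(key)].owner

    def insert(self, key, value, worker):
        root = self.roots[self.route(key)]
        # Only the owning worker may touch this sub-tree, so no latch is taken;
        # the check below enforces that invariant in place of a latch.
        if worker != root.owner:
            raise PermissionError(f"worker {worker} does not own key {key!r}")
        root.entries[key] = value

    def lookup(self, key):
        return self.roots[self.route(key)].entries.get(key)

    def split(self, key, new_owner):
        """Repartition: keys >= key in the affected range move to a new root."""
        i = self.route(key)
        if self.boundaries[i] == key:
            raise ValueError(f"{key!r} is already a partition boundary")
        old = self.roots[i]
        moved = {k: v for k, v in old.entries.items() if k >= key}
        for k in moved:
            del old.entries[k]
        self.boundaries.insert(i + 1, key)
        self.roots.insert(i + 1, SubTree(owner=new_owner, entries=moved))
```

For example, `MultiRootedIndex([0, 100], owners=[0, 1])` gives worker 0 the keys below 100 and worker 1 the rest; calling `split(40, new_owner=2)` then hands keys 40 to 99 to worker 2. A balancer in the spirit of DLB would decide when and where to call `split`; that policy is not modeled here, and the sketch copies entries for brevity without claiming anything about how the paper's MRBTree carries out a split.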


Author information

Authors and Affiliations

School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, VD, Switzerland
Pınar Tözün & Anastasia Ailamaki
IBM Almaden Research Center, San Jose, CA, USA
Ippokratis Pandis
Department of Computer Science, University of Toronto, Toronto, ON, Canada
Ryan Johnson

Corresponding author

Correspondence to Pınar Tözün.

Additional information

Ippokratis Pandis and Ryan Johnson: work done while the authors were affiliated with CMU and EPFL.

About this article

Cite this article

Tözün, P., Pandis, I., Johnson, R. et al. Scalable and dynamically balanced shared-everything OLTP with physiological partitioning. The VLDB Journal 22, 151–175 (2013). https://doi.org/10.1007/s00778-012-0278-6

Received: 07 September 2011
Revised: 21 January 2012
Accepted: 02 May 2012
Published: 26 June 2012
Issue Date: April 2013
DOI: https://doi.org/10.1007/s00778-012-0278-6
