Scalable and dynamically balanced shared-everything OLTP with physiological partitioning

  • Regular Paper
The VLDB Journal

Abstract

Scaling the performance of shared-everything transaction processing systems to highly parallel multicore hardware remains a challenge for database system designers. Recent proposals alleviate locking and logging bottlenecks in the system, leaving page latching as the next potential problem. To tackle the page latching problem, we propose physiological partitioning (PLP). PLP applies logical-only partitioning, maintaining the desired properties of shared-everything designs, and introduces a multi-rooted B+Tree index structure (MRBTree) that enables the partitioning of accesses at the physical page level. Logical partitioning and MRBTrees together ensure that all accesses to a given index page come from a single thread and, hence, can be entirely latch-free; an extended design makes heap page accesses thread-private as well. Moreover, MRBTrees offer an infrastructure for easy repartitioning and allow us to build a lightweight dynamic load balancing mechanism (DLB) on top of PLP. Profiling a PLP prototype running on different multicore machines shows that it acquires 85% and 68% fewer contentious critical sections than an optimized conventional design and one based on logical-only partitioning, respectively. PLP also improves performance by up to almost 50% over the existing systems, while DLB enhances the system with rapid and robust behavior in both detecting and handling load imbalances.
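The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Partition`, `MultiRootedIndex`, and `partition_of` are hypothetical, a Python dict stands in for each B+Tree sub-tree, and requests reach the owning thread through a message queue. It only shows how a partition table over key ranges can direct every access for a given sub-tree to a single thread, so that the sub-tree itself is never touched concurrently.

```python
import bisect
import queue
import threading


class Partition:
    """One key range whose sub-tree is touched by exactly one worker thread."""

    def __init__(self):
        self.subtree = {}            # assumption: a dict stands in for a B+Tree sub-tree
        self.inbox = queue.Queue()   # requests routed to the owning thread
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        # Only this thread reads or writes self.subtree, so the sub-tree
        # needs no latch. (The queue handles cross-thread hand-off.)
        while True:
            op, key, value, reply = self.inbox.get()
            if op == "put":
                self.subtree[key] = value
                reply.put(None)
            else:  # "get"
                reply.put(self.subtree.get(key))


class MultiRootedIndex:
    """Hypothetical multi-rooted index: a partition table plus one root per range."""

    def __init__(self, boundaries):
        # boundaries: sorted start keys of partitions 1..n; partition 0
        # holds every key below boundaries[0].
        self.boundaries = list(boundaries)
        self.partitions = [Partition() for _ in range(len(self.boundaries) + 1)]

    def partition_of(self, key):
        # Look up which key range (and therefore which thread) owns the key.
        return bisect.bisect_right(self.boundaries, key)

    def _call(self, op, key, value=None):
        reply = queue.Queue(maxsize=1)
        self.partitions[self.partition_of(key)].inbox.put((op, key, value, reply))
        return reply.get()

    def put(self, key, value):
        self._call("put", key, value)

    def get(self, key):
        return self._call("get", key)
```

With boundaries `[100, 200]`, keys below 100 go to partition 0, keys from 100 to 199 to partition 1, and the rest to partition 2. Repartitioning in such a scheme would amount to changing the boundary list and moving entries between sub-trees, which this sketch does not attempt.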

Author information

Corresponding author

Correspondence to Pınar Tözün.

Additional information

Ippokratis Pandis and Ryan Johnson: work done while author affiliated with CMU and EPFL.

About this article

Cite this article

Tözün, P., Pandis, I., Johnson, R. et al. Scalable and dynamically balanced shared-everything OLTP with physiological partitioning. The VLDB Journal 22, 151–175 (2013). https://doi.org/10.1007/s00778-012-0278-6
