The VLDB Journal, Volume 22, Issue 2, pp 151–175

Scalable and dynamically balanced shared-everything OLTP with physiological partitioning

  • Pınar Tözün
  • Ippokratis Pandis
  • Ryan Johnson
  • Anastasia Ailamaki
Regular Paper


Scaling the performance of shared-everything transaction processing systems to highly parallel multicore hardware remains a challenge for database system designers. Recent proposals alleviate the locking and logging bottlenecks in the system, leaving page latching as the next potential problem. To tackle the page latching problem, we propose physiological partitioning (PLP). PLP applies logical-only partitioning, maintaining the desired properties of shared-everything designs, and introduces a multi-rooted B+Tree index structure (MRBTree) that enables the partitioning of accesses at the physical page level. Together, logical partitioning and MRBTrees ensure that all accesses to a given index page come from a single thread and, hence, can be entirely latch-free; an extended design makes heap page accesses thread-private as well. Moreover, MRBTrees offer an infrastructure for easy repartitioning, which allows us to build a lightweight dynamic load balancing mechanism (DLB) on top of PLP. Profiling a PLP prototype running on different multicore machines shows that it acquires 85% and 68% fewer contentious critical sections than an optimized conventional design and a design based on logical-only partitioning, respectively. PLP also improves performance by up to almost 50% over the existing systems, while DLB lets the system detect and handle load imbalances rapidly and robustly.
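To make the MRBTree idea concrete, here is a minimal single-threaded Python sketch, assuming a simplified model rather than the authors' implementation. Each contiguous key range gets its own subtree root and exactly one owning thread; the ownership check stands in for the guarantee that makes latch-free page access possible. The class and method names (`MRBTree`, `route`, `split`, `imbalance`) are hypothetical, a plain `dict` stands in for each B+Tree subtree, and the `imbalance` ratio is an illustrative heuristic, not the DLB algorithm from the paper.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Partition:
    owner: int                                   # the one thread allowed to access this subtree
    entries: dict = field(default_factory=dict)  # hypothetical stand-in for a B+Tree subtree
    accesses: int = 0                            # per-partition access counter for load monitoring


class MRBTree:
    """Toy multi-rooted index: one subtree root per contiguous key range."""

    def __init__(self, owner):
        self.boundaries = []                   # sorted lower bounds of partitions 1..n-1
        self.partitions = [Partition(owner)]   # partition 0 initially covers every key

    def route(self, key):
        # Index of the partition whose key range contains `key`.
        return bisect.bisect_right(self.boundaries, key)

    def _enter(self, key, thread_id):
        # Single-owner rule: a thread may only touch subtrees it owns,
        # which is what makes latch-free page access safe.
        part = self.partitions[self.route(key)]
        if part.owner != thread_id:
            raise PermissionError(f"thread {thread_id} does not own key {key!r}")
        part.accesses += 1
        return part

    def insert(self, key, value, thread_id):
        self._enter(key, thread_id).entries[key] = value

    def lookup(self, key, thread_id):
        return self._enter(key, thread_id).entries.get(key)

    def split(self, idx, boundary, new_owner):
        # Repartition: keys >= boundary in partition idx move under a new root
        # owned by new_owner; all other partitions are left untouched.
        lo = self.boundaries[idx - 1] if idx > 0 else None
        hi = self.boundaries[idx] if idx < len(self.boundaries) else None
        if (lo is not None and boundary <= lo) or (hi is not None and boundary >= hi):
            raise ValueError("boundary lies outside the partition's key range")
        old = self.partitions[idx]
        moved = {k: v for k, v in old.entries.items() if k >= boundary}
        for k in moved:
            del old.entries[k]
        self.boundaries.insert(idx, boundary)
        self.partitions.insert(idx + 1, Partition(new_owner, moved))

    def imbalance(self):
        # Busiest partition's access count divided by the mean (1.0 = balanced).
        counts = [p.accesses for p in self.partitions]
        mean = sum(counts) / len(counts)
        return max(counts) / mean if mean else 1.0


tree = MRBTree(owner=0)
for k in range(100):
    tree.insert(k, f"row{k}", thread_id=0)
tree.split(0, 50, new_owner=1)           # hand keys 50..99 to thread 1
print(tree.route(49), tree.route(50))    # 0 1
print(tree.lookup(75, thread_id=1))      # row75
```

In this sketch, repartitioning touches only the partition being split and the boundary array, which mirrors why a multi-rooted structure keeps rebalancing cheap. A real system would also have to quiesce the affected owner threads and move subtree pages rather than copying entries.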


Keywords: Physiological partitioning · PLP · Multi-rooted B+Trees · MRBTree · Dynamic load balancing · Re-partitioning


Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Pınar Tözün (1)
  • Ippokratis Pandis (2)
  • Ryan Johnson (3)
  • Anastasia Ailamaki (1)

  1. School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  2. IBM Almaden Research Center, San Jose, USA
  3. Department of Computer Science, University of Toronto, Toronto, Canada