The VLDB Journal

, Volume 23, Issue 1, pp 1–23 | Cite as

Eliminating unscalable communication in transaction processing

  • Ryan JohnsonEmail author
  • Ippokratis Pandis
  • Anastasia Ailamaki
Regular Paper


Multicore hardware demands software parallelism. Transaction processing workloads typically exhibit high concurrency, and, thus, provide ample opportunities for parallel execution. Unfortunately, because of the characteristics of the application, transaction processing systems must moderate and coordinate communication between independent agents; since it is notoriously difficult to implement high performing transaction processing systems that incur no communication whatsoever. As a result, transaction processing systems cannot always convert abundant, even embarrassing, request-level parallelism into execution parallelism due to communication bottlenecks. Transaction processing system designers must therefore find ways to achieve scalability while still allowing communication to occur. To this end, we identify three forms of communication in the system—unbounded, fixed, and cooperative—and argue that only the first type poses a fundamental threat to scalability. The other two types tend not impose obstacles to scalability, though they may reduce single-thread performance. We argue that proper analysis of communication patterns in any software system is a powerful tool for improving the system’s scalability. Then, we present and evaluate under a common framework techniques that attack significant sources of unbounded communication during transaction processing and sketch a solution for those that remain. The solutions we present affect fundamental services of any transaction processing engine, such as locking, logging, physical page accesses, and buffer pool frame accesses. They either reduce such communication through caching, downgrade it to a less-threatening type, or eliminate it completely through system design. We find that the later technique, revisiting the transaction processing architecture, is the most effective. The final design cuts unbounded communication by roughly an order of magnitude compared with the baseline, while exhibiting better scalability on multicore machines.


Scalable OLTP Communication patterns Shore-MT SLI Aether DORA PLP Overlay Bufferpools 



The authors are deeply grateful for all the members of Carnegie Mellon’s StagedDB/CMP team and EPFL’s DIAS laboratory who made this work possible through their research efforts, helpful feedback, and encouragement. We are especially indebted to Pinar Tözün who helped us with an additional set of experiments. We would also like to thank the many anonymous reviewers whose thoughtful and constructive remarks helped improve both this paper and the research papers summarized here. This research was supported by an IBM PhD fellowship; grants and equipment from Intel and Sun; a Sloan research fellowships; an IBM faculty partnership award; NSF grants CCR-0205544, CCR-0509356, IIS-0133686, and IIS-0713409; an ESF EurYI award; and Swiss National Foundation funds.


  1. 1.
    Achyutuni, K.J., Omiecinski, E., Navathe, S.B.: Two techniques for on-line index modification in shared nothing parallel databases. In: SIGMOD, pp. 125–136 (1996)Google Scholar
  2. 2.
    Ailamaki, A., DeWitt, D.J., Hill, M.D.: Walking four machines by the shore. In: CAECW (2001)Google Scholar
  3. 3.
    Ailamaki, A., DeWitt, D.J., Hill, M.D., Wood, D.A.: DBMSs on a modern processor: where does time go? In: VLDB, pp. 266–277 (1999)Google Scholar
  4. 4.
    Albutiu, M.C., Kemper, A., Neumann, T.: Massively parallel sort-merge joins in main memory multi-core database systems. PVLDB 5(10), 1064–1075 (2012)Google Scholar
  5. 5.
    Amdahl, G.M.: Validity of the single processor approach to achieving large scale computing capabilities. In: AFIPS, pp. 483–485 (1967)Google Scholar
  6. 6.
    Aspnes, J., Herlihy, M., Shavit, N.: Counting networks. J. ACM 41(5), 1020–1048 (1994)Google Scholar
  7. 7.
    Attiya, H., Bar-Noy, A., Dolev, D.: Sharing memory robustly in message-passing systems. J. ACM 42(1), 124–142 (1995)Google Scholar
  8. 8.
    Barroso, L.A., Gharachorloo, K., Bugnion, E.: Memory system characterization of commercial workloads. In: ISCA, pp. 3–14 (1998)Google Scholar
  9. 9.
    Baumann, A., et al.: The multikernel: a new OS architecture for scalable multicore systems. In: SOSP, pp. 29–44 (2009)Google Scholar
  10. 10.
    Bayer, R., McCreight, E.M.: Organization and maintenance of large ordered indices. In: SIGFIDET, pp. 107–141 (1970)Google Scholar
  11. 11.
    Bernstein, P.A., Goodman, N.: Multiversion concurrency control—theory and algorithms. ACM TODS 8(4), 465–483 (1983)Google Scholar
  12. 12.
    Bienia, C., Kumar, S., Singh, J.P., Li, K.: The PARSEC benchmark suite: characterization and architectural implications. In: PACT, pp. 72–81 (2008)Google Scholar
  13. 13.
    Carey, M.J., et al.: Shoring up persistent applications. In: SIGMOD, pp. 383–394 (1994)Google Scholar
  14. 14.
    Chang, F., et al.: Bigtable: A distributed storage system for structured data. In: OSDI, p. 15 (2006)Google Scholar
  15. 15.
    Chen, S., Ailamaki, A., Gibbons, P.B., Mowry, T.C.: Improving hash join performance through prefetching. ACM TODS 32(3), 116–127 (2007)Google Scholar
  16. 16.
    Cieslewicz, J., Ross, K.A.: Adaptive aggregation on chip multiprocessors. In: VLDB, pp. 339–350 (2007)Google Scholar
  17. 17.
    Clark, K.L., McCabe, F.G.: Go! a multi-paradigm programming language for implementing multi-threaded agents. Ann. Math. Artif. Intell. 41, 171–206 (2004)Google Scholar
  18. 18.
    Curino, C., Jones, E., Zhang, Y., Madden, S.: Schism: a workload-driven approach to database replication and partitioning. PVLDB 3, 48–57 (2010)Google Scholar
  19. 19.
    Daniels, D.S., Spector, A.Z., Thompson, D.S.: Distributed logging for transaction processing. SIGMOD Rec. 16, 82–96 (1987)Google Scholar
  20. 20.
    Davis, J.D., Laudon, J., Olukotun, K.: Maximizing CMP throughput with mediocre cores. In: PACT, pp. 51–62(2005)Google Scholar
  21. 21.
    Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: OSDI, p. 10 (2004)Google Scholar
  22. 22.
    DeCandia, G., et al.: Dynamo: Amazon’s highly available key-value store. SIGOPS OSR 41(6), 205–220 (2007)Google Scholar
  23. 23.
    Dewitt, D.J., et al.: The Gamma database machine project. IEEE TKDE 2(1), 44–62 (1990)Google Scholar
  24. 24.
    Dragojevic, A., Guerraoui, R., Kapalka, M.: Dividing transactional memories by zero. In: TRANSACT (2008)Google Scholar
  25. 25.
    Graefe, G.: Hierarchical locking in B-tree indexes. In: BTW, pp. 18–42 (2007)Google Scholar
  26. 26.
    Gray, J., Helland, P., O’Neil, P., Shasha, D.: The dangers of replication and a solution. In: SIGMOD, pp. 173–182 (1996)Google Scholar
  27. 27.
    Gray, J., Reuter, A.: Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco (1992)Google Scholar
  28. 28.
    Hardavellas, N., et al.: Database servers on chip multiprocessors: limitations and opportunities. In: CIDR, pp. 79–87 (2007)Google Scholar
  29. 29.
    Harizopoulos, S., Abadi, D.J., Madden, S., Stonebraker, M.: OLTP through the looking glass, and what we found there. In: SIGMOD, pp. 981–992 (2008)Google Scholar
  30. 30.
    Harizopoulos, S., Shkapenyuk, V., Ailamaki, A.: QPipe: a simultaneously pipelined relational query engine. In: SIGMOD, pp. 383–394 (2005)Google Scholar
  31. 31.
    Helland, P.: Life beyond distributed transactions: an apostate’s opinion. In: CIDR, pp. 132–141 (2007)Google Scholar
  32. 32.
    Helland, P., et al.: Group commit timers and high volume transaction systems. In: HPTS, pp. 301–329 (1989)Google Scholar
  33. 33.
    Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Lang. Syst. 13(1), 124–149 (1991)Google Scholar
  34. 34.
    Herlihy, M., Moss, J.E.B.: Transactional memory: architectural support for lock-free data structures. SIGARCH Comput. Archit. News 21(2), 289–300 (1993)Google Scholar
  35. 35.
    Hill, M.D., Marty, M.R.: Amdahl’s law in the multicore era. Computer 41, 33–38 (2008)Google Scholar
  36. 36.
    Hunt, G.C., Larus, J.R.: Singularity: rethinking the software stack. SIGOPS OSR 41(2), 37–49 (2007)Google Scholar
  37. 37.
    Jaluta, I., Sippu, S., Soisalon-Soininen, E.: B-tree concurrency control and recovery in page-server database systems. ACM TODS 31, 82–132 (2006)Google Scholar
  38. 38.
    Johnson, R., Pandis, I., Hardavellas, N., Ailamaki, A., Falsafi, B.: Shore-MT: a scalable storage manager for the multicore era. In: EDBT, pp. 24–35 (2009)Google Scholar
  39. 39.
    Johnson, R., Pandis, I., Stoica, R., Athanassoulis, M., Ailamaki, A.: Aether: a scalable approach to logging. PVLDB 3, 681–692 (2010)Google Scholar
  40. 40.
    Jones, E., Abadi, D.J., Madden, S.: Low overhead concurrency control for partitioned main memory databases. In: SIGMOD, pp. 603–614 (2010)Google Scholar
  41. 41.
    Jorwekar, S., Fekete, A., Ramamritham, K., Sudarshan, S.: Automating the detection of snapshot isolation anomalies. In: VLDB, pp. 1263–1274 (2007)Google Scholar
  42. 42.
    Kemper, A., Neumann, T.: HyPer—a hybrid OLTP &OLAP main memory database system based on virtual memory snapshots. In: ICDE, pp. 195–206 (2011)Google Scholar
  43. 43.
    Kimura, H., Graefe, G., Kuno, H.: Efficient locking techniques for databases on modern hardware. In: ADMS (2012)Google Scholar
  44. 44.
    Kung, H.T., Robinson, J.T.: On optimistic methods for concurrency control. ACM TODS 6(2), 213–226 (1981)Google Scholar
  45. 45.
    Larson, P.A., et al.: High-performance concurrency control mechanisms for main-memory databases. PVLDB 5(4), 298–309 (2011)Google Scholar
  46. 46.
    Lauer, H.C., Needham, R.M.: On the duality of operating system structures. SIGOPS OSR 13, 3–19 (1979)Google Scholar
  47. 47.
    Lee, J., Kim, K., Cha, S.K.: Differential logging: a commutative and associative logging scheme for highly parallel main memory database. In: ICDE, pp. 173–184 (2001)Google Scholar
  48. 48.
    Lee, M.L., Kitsuregawa, M., Ooi, B.C., Tan, K.L., Mondal, A.: Towards self-tuning data placement in parallel database systems. In: SIGMOD, pp. 225–236 (2000)Google Scholar
  49. 49.
    Lomet, D.: Recovery for shared disk systems using multiple redo logs. Technical report CRL-90-4 (1990)Google Scholar
  50. 50.
    Lomet, D., Anderson, R., Rengarajan, T.K., Spiro, P.: How the Rdb/VMS data sharing system became fast. Technical report CRL-92-4 (1992)Google Scholar
  51. 51.
    Magnusson, P.S., Landin, A., Hagersten, E.: Queue locks on cache coherent multiprocessors. In: ISPP, pp. 165–171 (1994)Google Scholar
  52. 52.
    Mellor-Crummey, J.M., Scott, M.L.: Scalable reader-writer synchronization for shared-memory multiprocessors. SIGPLAN Not. 26(7), 106–113 (1991)Google Scholar
  53. 53.
    Mohan, C.: ARIES/KVL: a key-value locking method for concurrency control of multiaction transactions operating on B-tree indexes. In: VLDB, pp. 392–405 (1990)Google Scholar
  54. 54.
    Mohan, C., Levine, F.: ARIES/IM: an efficient and high concurrency index management method using write-ahead logging. In: SIGMOD, pp. 371–380 (1992)Google Scholar
  55. 55.
    Moir, M., Nussbaum, D., Shalev, O., Shavit, N.: Using elimination to implement scalable and lock-free FIFO queues. In: SPAA, pp. 253–262 (2005)Google Scholar
  56. 56.
    Pandis, I., Johnson, R., Hardavellas, N., Ailamaki, A.: Data-oriented transaction execution. PVLDB 3(1), 928–939 (2010)Google Scholar
  57. 57.
    Pandis, I., Tözün, P., Johnson, R., Ailamaki, A.: PLP: page latch-free shared-everything OLTP. PVLDB 4(10), 610–621 (2011)Google Scholar
  58. 58.
    Porobic, D., Pandis, I., Branco, M., Tözün, P., Ailamaki, A.: OLTP on hardware islands. PVLDB 5(11), 1447–1458 (2012)Google Scholar
  59. 59.
    Ranganathan, P., Gharachorloo, K., Adve, S.V., Barroso, L.A.: Performance of database workloads on shared-memory systems with out-of-order processors. In: ASPLOS, pp. 307–318 (1998)Google Scholar
  60. 60.
    Rao, J., Ross, K.A.: Cache conscious indexing for decision-support in main memory. In: VLDB, pp. 78–89 (1999)Google Scholar
  61. 61.
    Rao, J., Ross, K.A.: Making B+-trees cache conscious in main memory. In: SIGMOD, pp. 475–486 (2000)Google Scholar
  62. 62.
    Shavit, N., Touitou, D.: Software transactional memory. In: PODC, pp. 204–213 (1995)Google Scholar
  63. 63.
    Smith, A.J.: Sequentiality and prefetching in database systems. ACM TODS 3, 223–247 (1978)Google Scholar
  64. 64.
    Soisalon-Soininen, E., Ylönen, T.: Partial strictness in two-phase locking. In: ICDT, pp. 139–147 (1995)Google Scholar
  65. 65.
    Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: A scalable peer-to-peer lookup service for internet applications. In: SIGCOMM, pp. 149–160 (2001)Google Scholar
  66. 66.
    Stonebraker, M.: The case for shared nothing. IEEE Database, Eng. Bull. 9, 4–9 (1986)Google Scholar
  67. 67.
    Stonebraker, M.: Stonebraker on nosql and enterprises. Commun. ACM 54, 10–11 (2011)Google Scholar
  68. 68.
    Stonebraker, M., et al.: The end of an architectural era: (it’s time for a complete rewrite). In: VLDB, pp. 1150–1160 (2007)Google Scholar
  69. 69.
    Thomson, A., Abadi, D.J.: The case for determinism in database systems. PVLDB 3, 70–80 (2010)Google Scholar
  70. 70.
    Tözün, P., Pandis, I., Johnson, R., Ailamaki, A.: Scalable and dynamically balanced shared-everything OLTP with physiological partitioning. VLDB J 1–25 (2012)Google Scholar
  71. 71.
    Vogels, W.: Eventually consistent. Commun. ACM 52, 40–44 (2009)Google Scholar
  72. 72.
    Welsh, M., Culler, D., Brewer, E.: SEDA: an architecture for well-conditioned, scalable internet services. In: SOSP, pp. 230–243 (2001)Google Scholar
  73. 73.
    Woo, S.C., Ohara, M., Torrie, E., Singh, J.P., Gupta, A.: The SPLASH-2 programs: characterization and methodological considerations. In: ISCA, pp. 24–36 (1995)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ryan Johnson
    • 1
    Email author
  • Ippokratis Pandis
    • 2
  • Anastasia Ailamaki
    • 3
  1. 1.Department of Computer ScienceUniversity of TorontoTorontoCanada
  2. 2.IBM Almaden Research CenterSan JoseUSA
  3. 3.School of Computer and Communication SciencesÉcole Polytechnique Fédérale de LausanneLausanneSwitzerland

Personalised recommendations