The VLDB Journal, Volume 21, Issue 2, pp 239–263

Scalability of write-ahead logging on multicore and multisocket hardware

  • Ryan Johnson (Email author)
  • Ippokratis Pandis
  • Radu Stoica
  • Manos Athanassoulis
  • Anastasia Ailamaki
Special Issue Paper


The shift to multi-core and multi-socket hardware brings new challenges to database systems, as software parallelism now determines performance. Even though database systems traditionally accommodate simultaneous requests, a multitude of synchronization barriers serialize execution. Write-ahead logging is a fundamental, omnipresent component in ARIES-style concurrency and recovery, and one of the most important yet-to-be-addressed potential bottlenecks, especially in OLTP workloads that make frequent small changes to data. In this paper, we identify four logging-related impediments to database system scalability. Each issue challenges a different level of the software architecture: (a) the high volume of small-sized I/O requests may saturate the disk, (b) transactions hold locks while waiting for the log flush, (c) extensive context switching overwhelms the OS scheduler with threads executing log I/Os, and (d) contention appears as transactions serialize accesses to in-memory log data structures. We demonstrate these problems and address them with techniques that, when combined, comprise a holistic, scalable approach to logging. Our solution achieves a 20–69% speedup over a modern database system when running log-intensive workloads, such as the TPC-B and TATP benchmarks, on a single-socket multiprocessor server. Moreover, it achieves log insert throughput of over 2.2 GB/s for small log records on the single-socket server, roughly 20 times higher than the traditional way of accessing the log using a single mutex. Furthermore, we investigate techniques for scaling the performance of logging to multi-socket servers. We present a set of optimizations that partly ameliorate the latency penalty that comes with multi-socket hardware, and then we investigate the feasibility of applying a distributed log buffer design at the socket level.
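
As a rough illustration of impediment (d), the sketch below shows what "accessing the log using a single mutex" could look like: every insert takes the same lock, allocates a log sequence number (LSN), and copies its record into the buffer. This is a minimal sketch under stated assumptions, not the paper's implementation; the names `MutexLogBuffer`, `insert`, and `run_demo` are hypothetical, and the LSN is assumed to be a plain byte offset.

```python
from __future__ import annotations

import threading


class MutexLogBuffer:
    """Hypothetical baseline log buffer: one mutex serializes every insert."""

    def __init__(self, capacity: int) -> None:
        self._buf = bytearray(capacity)
        self._next_lsn = 0  # assumed: LSN is the byte offset of the next free slot
        self._mutex = threading.Lock()

    def insert(self, record: bytes) -> int:
        """Copy the record into the buffer and return its LSN (start offset)."""
        with self._mutex:  # all threads queue here, even for tiny records
            lsn = self._next_lsn
            end = lsn + len(record)
            if end > len(self._buf):
                # A real log manager would flush and reuse space instead.
                raise BufferError("log buffer full")
            self._buf[lsn:end] = record
            self._next_lsn = end
            return lsn

    def contents(self) -> bytes:
        """Return a snapshot of everything inserted so far."""
        with self._mutex:
            return bytes(self._buf[: self._next_lsn])


def run_demo(num_threads: int = 4, records_per_thread: int = 100) -> tuple[list[int], bytes]:
    """Insert fixed-size records from several threads; return sorted LSNs and the log."""
    record_size = 8
    log = MutexLogBuffer(capacity=num_threads * records_per_thread * record_size)
    lsns: list[int] = []
    lsns_lock = threading.Lock()

    def worker(tid: int) -> None:
        record = bytes([tid]) * record_size  # each thread writes its own byte value
        for _ in range(records_per_thread):
            lsn = log.insert(record)
            with lsns_lock:
                lsns.append(lsn)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(lsns), log.contents()
```

The mutex keeps records from overlapping, but it also means the memory copy of each record happens inside the critical section, so the time every thread spends waiting grows with the number of concurrent inserters.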


Log manager; Early lock release; Flush pipelining; Log buffer contention; Consolidation array; Scaling to multisockets





Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Ryan Johnson (1), Email author
  • Ippokratis Pandis (2)
  • Radu Stoica (3)
  • Manos Athanassoulis (3)
  • Anastasia Ailamaki (3)
  1. Department of Computer Science, University of Toronto, Toronto, Canada
  2. IBM Almaden Research Center, San Jose, USA
  3. School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
