Characterization of the Impact of Hardware Islands on OLTP

Abstract

Modern hardware is abundantly parallel and increasingly heterogeneous. The numerous processing cores have non-uniform access latencies to main memory and processor caches, which causes variability in communication costs. Unfortunately, database systems mostly assume that all processing cores are the same and that microarchitecture differences are not significant enough to appear in critical database execution paths. As we demonstrate in this paper, however, non-uniform core topology does appear in the critical path, and conventional database architectures achieve suboptimal and, even worse, unpredictable performance. We perform a detailed performance analysis of OLTP deployments in servers with multiple cores per CPU (multicore) and multiple CPUs per server (multisocket). We compare different database deployment strategies where we vary the number and size of independent database instances running on a single server, from a single shared-everything instance to fine-grained shared-nothing configurations. We quantify the impact of non-uniform hardware on various deployments by (a) examining how efficiently each deployment uses the available hardware resources and (b) measuring the impact of distributed transactions and skewed requests on different workloads. We show that no strategy is optimal for all cases and that the best choice depends on the combination of hardware topology and workload characteristics. Finally, we argue that transaction processing systems must be aware of the hardware topology in order to achieve predictably high performance.

Notes

  1. Such as VoltDB, MongoDB, MemSQL, NuoDB.

  2. For more details, see http://www.supermicro.com/manuals/motherboard/7500/X8OBN-F.

  3. Explaining, among other reasons, the high compensation for skilled database administrators.

Acknowledgments

We would like to thank Eric Sedlar and Brian Gold for many insightful discussions and the members of the DIAS laboratory for their support throughout this work. This work is partially funded by Oracle Labs and by the Swiss National Science Foundation (Grant No. 200021-146407/1).

Author information

Corresponding author

Correspondence to Danica Porobic.

Additional information

I. Pandis: Work done while author was affiliated with IBM.

M. Branco, P. Tözün: Work done while the authors were affiliated with EPFL.

About this article

Cite this article

Porobic, D., Pandis, I., Branco, M. et al. Characterization of the Impact of Hardware Islands on OLTP. The VLDB Journal 25, 625–650 (2016). https://doi.org/10.1007/s00778-015-0413-2

Keywords

  • Islands
  • Shared-everything
  • Shared-nothing
  • OLTP
  • Multisocket multicores
  • Non-uniform hardware topology