The VLDB Journal, Volume 25, Issue 5, pp 651–672

Exploiting SSDs in operational multiversion databases

  • Mohammad Sadoghi
  • Kenneth A. Ross
  • Mustafa Canim
  • Bishwaranjan Bhattacharjee
Special Issue Paper


Multiversion databases store both current and historical data. Rows are typically annotated with timestamps representing the period when the row is/was valid. We develop novel techniques to reduce index maintenance in multiversion databases, so that indexes can be used effectively for analytical queries over current data without being a heavy burden on transaction throughput. To achieve this end, we re-design persistent index data structures in the storage hierarchy to employ an extra level of indirection. The indirection level is stored on solid-state disks that can support very fast random I/Os, so that traversing the extra level of indirection incurs a relatively small overhead. The extra level of indirection dramatically reduces the number of magnetic disk I/Os that are needed for index updates and localizes maintenance to indexes on updated attributes. Additionally, we batch insertions within the indirection layer in order to reduce physical disk I/Os for indexing new records. In this work, we further exploit SSDs by introducing novel DeltaBlock techniques for storing the recent changes to data on SSDs. Using our DeltaBlock, we propose an efficient method to periodically flush the recently changed data from SSDs to HDDs such that, on the one hand, we keep track of every change (or delta) for every record, and, on the other hand, we avoid redundantly storing the unchanged portion of updated records. By reducing the index maintenance overhead on transactions, we enable operational data stores to create more indexes to support queries. We have developed a prototype of our indirection proposal by extending the widely used generalized search tree open-source project, which is also employed in PostgreSQL. Our working implementation demonstrates that we can significantly reduce index maintenance and/or query processing cost by a factor of 3. For the insertion of new records, our novel batching technique can save up to 90% of the insertion time. For updates, our prototype demonstrates that we can significantly reduce the database size by up to 80% even with a modest space allocated for DeltaBlocks on SSDs.
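
The indirection idea summarized above can be illustrated with a minimal in-memory sketch. It is not the authors' prototype: the class `VersionedTable`, the names `lid` (logical record identifier) and `rid` (physical row position), and the `index_writes` counter are all hypothetical, and plain Python dictionaries stand in for the SSD-resident indirection map and the disk-resident indexes. Under those assumptions, indexes store LIDs rather than RIDs, so an update appends a new version, repoints one indirection entry, and touches only the indexes on attributes whose values changed.

```python
class VersionedTable:
    """Toy multiversion table: each update appends a new row version."""

    def __init__(self, indexed_attrs):
        self.indexed_attrs = tuple(indexed_attrs)
        self.rows = []            # stand-in for the HDD heap: RID -> row version
        self.lid_to_rid = {}      # stand-in for the SSD indirection map: LID -> latest RID
        self.indexes = {a: {} for a in self.indexed_attrs}  # attr -> {value -> set of LIDs}
        self.index_writes = 0     # number of index maintenance operations performed
        self._next_lid = 0

    def insert(self, row):
        """Store a new record and add its LID to every index."""
        lid = self._next_lid
        self._next_lid += 1
        self.rows.append(dict(row))
        self.lid_to_rid[lid] = len(self.rows) - 1
        for attr in self.indexed_attrs:
            self.indexes[attr].setdefault(row[attr], set()).add(lid)
            self.index_writes += 1
        return lid

    def update(self, lid, changes):
        """Append a new version; maintain only indexes on changed attributes."""
        old = self.rows[self.lid_to_rid[lid]]
        new = {**old, **changes}
        self.rows.append(new)                      # the old version remains as history
        self.lid_to_rid[lid] = len(self.rows) - 1  # one indirection entry is repointed
        for attr in self.indexed_attrs:
            if old[attr] != new[attr]:
                self.indexes[attr][old[attr]].discard(lid)
                self.indexes[attr].setdefault(new[attr], set()).add(lid)
                self.index_writes += 1

    def lookup(self, attr, value):
        """Return the current versions of records whose attr equals value."""
        lids = sorted(self.indexes[attr].get(value, ()))
        return [self.rows[self.lid_to_rid[lid]] for lid in lids]


table = VersionedTable(indexed_attrs=("name", "city", "dept"))
lid = table.insert({"name": "ann", "city": "NYC", "dept": "db"})
table.update(lid, {"city": "SF"})
print(table.index_writes)          # 3 writes for the insert, 1 for the update
print(table.lookup("name", "ann"))
```

In this sketch the update leaves the `name` and `dept` indexes untouched, yet a lookup through either of them still reaches the newest version via the indirection map. Without indirection, an index holding physical row positions would need a new entry in every index for every new version.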


Multiversion databases, SSD, Flash storage, Index maintenance


Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Mohammad Sadoghi (1)
  • Kenneth A. Ross (1, 2)
  • Mustafa Canim (1)
  • Bishwaranjan Bhattacharjee (1)

  1. IBM T.J. Watson Research Center, Yorktown Heights, USA
  2. Columbia University, New York, USA
