Tide-tree: A self-tuning indexing scheme for hybrid storage system

Abstract

A main-memory index is built on the assumption that RAM is large enough to hold all the data. Because main memory is volatile and has a high unit price, indices that reside on secondary storage such as SSDs and HDDs are widely used. However, I/O between secondary storage and main memory remains the bottleneck for query efficiency. In this paper, we propose a self-tuning indexing scheme called Tide-tree for hybrid RAM/disk storage systems. Tide-tree aims to overcome the obstacles that main-memory and disk-based indices face: like the tide, it adapts itself to the running environment to achieve a win-win in space and performance. In particular, Tide-tree efficiently and adaptively splits the tree structure into layers based on storage sensing, and applies an effective self-tuning algorithm to dynamically load nodes into main memory. We employ memory mapping to solve the persistence problem of main-memory indices and to improve the efficiency of data synchronization and pointer translation. To further enhance the independence of Tide-tree, we use an index head and a level address table to manage the whole index. With the index head, we propose three efficient operations: index rebuild, index load, and range search. We have conducted extensive experiments comparing Tide-tree with several state-of-the-art indices, and the results validate the high efficiency, reusability, and stability of Tide-tree.
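To illustrate why memory mapping helps with persistence and pointer translation, here is a minimal Python sketch. It is not Tide-tree's actual node layout or algorithm, which this abstract does not specify. It assumes a simple binary search tree whose child links are stored as byte offsets into a file instead of raw memory addresses, so the same file can be mapped again after a restart and traversed without rewriting any pointers. The names `NODE`, `NIL`, `build_index`, and `contains` are hypothetical.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size node: key, left-child offset, right-child offset
# (three little-endian int64 values). Offsets are byte positions in the file.
NODE = struct.Struct("<qqq")
NIL = -1  # marks "no child"


def build_index(path, keys):
    """Write a balanced binary search tree over `keys`; return the root offset."""
    keys = sorted(keys)
    buf = bytearray(NODE.size * len(keys))
    next_slot = 0

    def place(lo, hi):
        nonlocal next_slot
        if lo >= hi:
            return NIL
        mid = (lo + hi) // 2
        offset = next_slot * NODE.size  # reserve a slot before recursing
        next_slot += 1
        left = place(lo, mid)
        right = place(mid + 1, hi)
        NODE.pack_into(buf, offset, keys[mid], left, right)
        return offset

    root = place(0, len(keys))
    with open(path, "wb") as f:
        f.write(buf)
    return root


def contains(mm, root, key):
    """Search the mapped file; each offset is resolved relative to the mapping."""
    offset = root
    while offset != NIL:
        node_key, left, right = NODE.unpack_from(mm, offset)
        if key == node_key:
            return True
        offset = left if key < node_key else right
    return False


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "index.bin")
    root = build_index(path, [5, 1, 9, 3, 7])
    # Mapping the file stands in for "loading" the index after a restart.
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        print(contains(mm, root, 7), contains(mm, root, 4))
```

Because every link is an offset, translating it into an in-memory location is just an addition to the base of the mapping, and the operating system decides which pages of the file are actually resident in RAM.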

Notes

  1. http://db.csail.mit.edu/sigmod11contest/

  2. http://linux.die.net/man/2/mmap

  3. http://fallabs.com/tokyocabinet/

  4. http://redis.io/

Acknowledgments

This work is partially supported by the Australian Research Council’s Discovery Projects Scheme (DP170102726), the National Natural Science Foundation of China under Grant Nos. 91646204, 61373015, 61300052, 41301047, and 71322104, the Funding of Jiangsu Innovation Program for Graduate Education under Grant No. SJZZ_0043, and the National Center for International Joint Research on E-Business Information Processing (2013B01035).

Author information

Corresponding author

Correspondence to Zhifeng Bao.

About this article

Cite this article

Wang, S., Qin, X., Bao, Z. et al. Tide-tree: A self-tuning indexing scheme for hybrid storage system. World Wide Web 20, 1017–1045 (2017). https://doi.org/10.1007/s11280-016-0426-9

  • DOI: https://doi.org/10.1007/s11280-016-0426-9

Keywords

  • Self-tuning index
  • Memory map
  • Pointer swizzling
  • Hybrid storage