Function Interpolation for Learned Index Structures

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12008)

Abstract

Range indexes such as B-trees are widely recognised as effective data structures for fast retrieval of records by query key. While such classical indexes offer optimal worst-case guarantees, recent research suggests that average-case performance might be improved by machine learning-based models such as deep neural networks. This paper explores an alternative approach that models the task as function approximation via interpolation between compressed subsets of keys. We explore the Chebyshev and Bernstein polynomial bases and demonstrate substantial benefits over deep neural networks. In particular, our proposed function interpolation models exhibit a memory footprint two orders of magnitude smaller than that of neural network models, and a 30–40% accuracy improvement over neural networks trained for the same amount of time, while keeping query time generally on par with neural network models.
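
To make the idea of indexing by function interpolation concrete, the following is a minimal sketch and not the authors' implementation. It fits a Chebyshev interpolant to the key-to-position function of a sorted array and then corrects the prediction with a bounded local search. The class name `ChebyshevIndex`, the `degree` parameter, the assumption of distinct numeric keys, and the error-window search are all illustrative assumptions; the abstract does not specify them.

```python
import bisect
import math


class ChebyshevIndex:
    """Toy learned index: Chebyshev interpolant of the key -> position map.

    Hypothetical illustration; assumes at least two distinct numeric keys.
    """

    def __init__(self, keys, degree=8):
        self.keys = sorted(set(keys))
        self.lo, self.hi = self.keys[0], self.keys[-1]
        n = degree + 1
        # Angles of the n Chebyshev nodes x_j = cos(theta_j) on [-1, 1].
        thetas = [math.pi * (j + 0.5) / n for j in range(n)]
        # Sample the position function at the nodes (mapped back to key space).
        samples = [self._rank(self._to_key(math.cos(t))) for t in thetas]
        # Interpolation coefficients from the discrete orthogonality relation.
        self.coeffs = [
            (2.0 / n) * sum(f * math.cos(k * t) for f, t in zip(samples, thetas))
            for k in range(n)
        ]
        self.coeffs[0] /= 2.0
        # The worst prediction error over stored keys bounds the search window.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _to_key(self, x):
        return self.lo + (x + 1.0) * 0.5 * (self.hi - self.lo)

    def _to_unit(self, key):
        return 2.0 * (key - self.lo) / (self.hi - self.lo) - 1.0

    def _rank(self, value):
        # Number of stored keys strictly smaller than value.
        return bisect.bisect_left(self.keys, value)

    def _predict(self, key):
        # Clenshaw recurrence for sum_k c_k T_k(x).
        x = self._to_unit(key)
        b1 = b2 = 0.0
        for c in reversed(self.coeffs[1:]):
            b1, b2 = 2.0 * x * b1 - b2 + c, b1
        return x * b1 - b2 + self.coeffs[0]

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        if key < self.lo or key > self.hi:
            return None
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(0, math.floor(guess - self.max_err)), n)
        hi = max(lo, min(n, math.ceil(guess + self.max_err) + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None


index = ChebyshevIndex([i * i for i in range(1000)], degree=8)
print(index.lookup(144), index.lookup(145))  # prints: 12 None
```

In this sketch the model stores only `degree + 1` coefficients plus one error bound, which is the sense in which an interpolation model can be far smaller than a neural network. Lookups for stored keys stay correct because the search window is derived from the measured worst-case error.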

Keywords

Indexing · Databases · Function approximation

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
