Identifying and Caching Hot Triples for Efficient RDF Query Processing

  • Wei Emma Zhang
  • Quan Z. Sheng
  • Kerry Taylor
  • Yongrui Qin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9050)

Abstract

Resource Description Framework (RDF) has been used as a general model for conceptual description and information modelling. As the number and volume of RDF datasets have grown in recent years, many techniques have been developed to accelerate query answering on triple stores, which handle large-scale RDF data. Caching is one popular solution. Non-RDBMS-based triple stores, which leverage the intrinsic graph nature of RDF data, are emerging and have attracted increasing research attention. However, because their fundamental structure differs from that of RDBMS-based triple stores, they cannot leverage RDBMS caching mechanisms. In this paper, we develop a time-aware, frequency-based caching algorithm to address this issue. Our approach retrieves the accessed triples by analyzing and expanding previous queries, and collects the most frequently accessed triples by estimating their access frequencies using Exponential Smoothing, a forecasting method. We evaluate our approach using real-world queries from a publicly available SPARQL endpoint. Our theoretical analysis and empirical results show that the proposed approach achieves higher hit rates than state-of-the-art approaches.
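
As an illustration of the idea of ranking triples by an exponentially smoothed access frequency, the following is a minimal sketch and not the algorithm from the paper. It assumes the standard smoothing recurrence est_t = alpha * x_t + (1 - alpha) * est_{t-1}, where x_t is 1 if the triple was accessed in time slice t and 0 otherwise, and applies the decay lazily at access time. The class name HotTripleTracker, the parameters alpha and capacity, and the integer time slices are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class HotTripleTracker:
    """Illustrative tracker; names and defaults are assumptions, not from the paper."""
    alpha: float = 0.05   # smoothing factor: higher values favour recent accesses
    capacity: int = 2     # number of triples the hypothetical cache can hold
    # triple -> (smoothed estimate, time slice of the last access)
    stats: Dict[Triple, Tuple[float, int]] = field(default_factory=dict)

    def record_access(self, triple: Triple, now: int) -> None:
        """Decay the old estimate over the idle slices, then add alpha for this access."""
        est, last = self.stats.get(triple, (0.0, now))
        decayed = est * (1 - self.alpha) ** (now - last)
        self.stats[triple] = (self.alpha + decayed, now)

    def estimate(self, triple: Triple, now: int) -> float:
        """Return the estimate as of `now`, decayed since the last access."""
        est, last = self.stats.get(triple, (0.0, now))
        return est * (1 - self.alpha) ** (now - last)

    def hot_triples(self, now: int) -> List[Triple]:
        """Return the `capacity` triples with the highest current estimates."""
        ranked = sorted(self.stats, key=lambda t: self.estimate(t, now), reverse=True)
        return ranked[: self.capacity]


if __name__ == "__main__":
    tracker = HotTripleTracker(alpha=0.5, capacity=2)
    a = ("ex:Adelaide", "ex:locatedIn", "ex:Australia")
    b = ("ex:Canberra", "ex:locatedIn", "ex:Australia")
    c = ("ex:CSIRO", "ex:basedIn", "ex:Canberra")
    tracker.record_access(a, now=0)
    tracker.record_access(c, now=0)
    tracker.record_access(a, now=1)
    tracker.record_access(b, now=1)
    print(tracker.hot_triples(now=1))
```

In this sketch a triple accessed in consecutive time slices accumulates a higher estimate than one accessed once, and an estimate that is not refreshed decays geometrically, which is what makes the ranking time-aware. Recording several accesses within the same time slice adds alpha each time, a simplification relative to the strict 0/1 recurrence.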

Keywords

Caching, Query expansion, Exponential smoothing, RDF

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Wei Emma Zhang (1), corresponding author
  • Quan Z. Sheng (1)
  • Kerry Taylor (2)
  • Yongrui Qin (1)
  1. School of Computer Science, The University of Adelaide, Adelaide, Australia
  2. CSIRO, Canberra, Australia
