Learning-Based SPARQL Query Performance Prediction

  • Wei Emma Zhang
  • Quan Z. Sheng
  • Kerry Taylor
  • Yongrui Qin
  • Lina Yao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10041)


Predicting query performance allows queries to be rewritten to reduce execution time or rescheduled to a time when resources are not in contention. As more large RDF datasets appear on the Web, predicting the performance of SPARQL query processing has become a major challenge in managing such datasets efficiently. In this paper, we focus on representing SPARQL queries as feature vectors and using these vectors to train models that predict SPARQL query performance. Evaluations on real-world SPARQL queries demonstrate that the proposed approach predicts SPARQL query performance effectively and outperforms state-of-the-art approaches.


Keywords: SPARQL, Feature modeling, Prediction
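
The general idea summarized in the abstract, turning a SPARQL query into a feature vector and learning a predictor over such vectors, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the feature set (counts of a few SPARQL keywords plus the number of distinct variables), the choice of k-nearest-neighbour regression as the learner, the function names `featurize` and `knn_predict`, and the toy execution times, which are invented placeholders rather than measurements.

```python
import math
import re

# Hypothetical feature set: counts of a few SPARQL keywords.
# The features actually used in the paper are not stated in this section.
KEYWORDS = ("OPTIONAL", "UNION", "FILTER", "DISTINCT", "LIMIT")


def featurize(query):
    """Map a SPARQL query string to a fixed-length numeric feature vector."""
    text = query.upper()
    keyword_counts = [
        float(len(re.findall(rf"\b{kw}\b", text))) for kw in KEYWORDS
    ]
    # Crude structural feature: number of distinct variables (?x or $x).
    variables = set(re.findall(r"[?$]\w+", text))
    return keyword_counts + [float(len(variables))]


def knn_predict(train_X, train_y, x, k=3):
    """Predict a target as the mean over the k nearest training vectors.

    Uses Euclidean distance; k-NN regression is an assumed stand-in
    for whatever learner is trained on the feature vectors.
    """
    if not train_X:
        raise ValueError("training set is empty")
    ranked = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


if __name__ == "__main__":
    # Toy training set; the times (in seconds) are invented placeholders.
    training = [
        ("SELECT ?s WHERE { ?s ?p ?o } LIMIT 10", 0.05),
        ("SELECT ?s WHERE { ?s ?p ?o . OPTIONAL { ?s ?q ?r } }", 0.80),
        ("SELECT DISTINCT ?s WHERE { { ?s ?p ?o } UNION { ?o ?p ?s } }", 1.50),
    ]
    X = [featurize(q) for q, _ in training]
    y = [t for _, t in training]
    new_query = "SELECT ?a WHERE { ?a ?b ?c . OPTIONAL { ?a ?d ?e } } LIMIT 5"
    print(knn_predict(X, y, featurize(new_query), k=2))
```

Keyword counts are cheap to extract but ignore the data being queried; a realistic feature model would likely need richer structural information about the query, which this sketch does not attempt.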



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Wei Emma Zhang (1, Email author)
  • Quan Z. Sheng (1)
  • Kerry Taylor (2)
  • Yongrui Qin (3)
  • Lina Yao (4)

  1. School of Computer Science, The University of Adelaide, Adelaide, Australia
  2. Research School of Computer Science, Australian National University, Canberra, Australia
  3. School of Computing and Engineering, University of Huddersfield, Huddersfield, UK
  4. School of Computer Science and Engineering, UNSW Australia, Sydney, Australia
