Advertisement

Intelligent SPARQL Endpoints: Optimizing Execution Performance by Automatic Query Relaxation and Queue Scheduling

  • Ana I. Torre-BastidaEmail author
  • Esther Villar-Rodriguez
  • Miren Nekane Bilbao
  • Javier Del Ser
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10048)

Abstract

The Web of Data is widely considered as one of the major global repositories populated with countless interconnected and structured data prompting these linked datasets to be continuously and sharply increasing. In this context the so-called SPARQL Protocol and RDF Query Language is commonly used to retrieve and manage stored data by means of SPARQL endpoints, a query processing service especially designed to get access to these databases. Nevertheless, due to the large amount of data tackled by such endpoints and their structural complexity, these services usually suffer from severe performance issues, including inadmissible processing times. This work aims at overcoming this noted inefficiency by designing a distributed parallel system architecture that improves the performance of SPARQL endpoints by incorporating two functionalities: (1) a queuing system to avoid bottlenecks during the execution of SPARQL queries; and (2) an intelligent relaxation of the queries submitted to the endpoint at hand whenever the relaxation itself and the consequently lowered complexity of the query are beneficial for the overall performance of the system. To this end the system relies on a two-fold optimization criterion: the minimization of the query running time, as predicted by a supervised learning model; and the maximization of the quality of the results of the query as quantified by a measure of similarity. These two conflicting optimization criteria are efficiently balanced by two bi-objective heuristic algorithms sequentially executed over groups of SPARQL queries. The approach is validated on a prototype and several experiments that evince the applicability of the proposed scheme.

Keywords

SPARQL Query rewriting Linked Open Data Ontology management Multiobjective optimization 

References

  1. 1.
    Shadbolt, N., Hall, W., Berners-Lee, T.: The semantic web revisited. IEEE Intell. Syst. 21(3), 96–101 (2006)CrossRefGoogle Scholar
  2. 2.
    Schmidt, M., Meier, M., Lausen, G.: Foundations of SPARQL query optimization. In: ACM International Conference on Database Theory, pp. 4–33 (2010)Google Scholar
  3. 3.
    Ganapathi, A., Kuno, H., Dayal, U., Wiener, J.L., Fox, A., Jordan, M., Patterson, D.: Predicting multiple metrics for queries: better decisions enabled by machine learning. In: IEEE International Conference on Data Engineering, pp. 592–603 (2009)Google Scholar
  4. 4.
    Akdere, M., Çetintemel, U., Riondato, M., Upfal, E., Zdonik, S.B.: Learning-based query performance modeling and prediction. In: IEEE International Conference on Data Engineering, pp. 390–401 (2012)Google Scholar
  5. 5.
    Hasan, R., Gandon, F.: A machine learning approach to SPARQL query performance prediction. In: IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies, vol. 1, pp. 266–273 (2014)Google Scholar
  6. 6.
    Görlitz, O., Staab, S.: Splendid: SPARQL endpoint federation exploiting void descriptions. In: COLD, vol. 782 (2011)Google Scholar
  7. 7.
    Schwarte, A., Haase, P., Hose, K., Schenkel, R., Schmidt, M.: FedX: optimization techniques for federated query processing on linked data. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011. LNCS, vol. 7031, pp. 601–616. Springer, Heidelberg (2011). doi: 10.1007/978-3-642-25073-6_38 CrossRefGoogle Scholar
  8. 8.
    Bernstein, A., Kiefer, C., Stocker, M.: OptARQ: a SPARQL optimization approach based on triple pattern selectivity estimation. Technical report ifi-2007.03, University of Zurich (2007)Google Scholar
  9. 9.
    Tsialiamanis, P., Sidirourgos, L., Fundulaki, I., Christophides, V., Boncz, P: Heuristics-based query optimisation for SPARQL. In: ACM International Conference on Extending Database Technology, pp. 324–335 (2012)Google Scholar
  10. 10.
    Gubichev, A., Neumann, T.: Exploiting the query structure for efficient join ordering in SPARQL queries. In: EDBT, pp. 439–450 (2014)Google Scholar
  11. 11.
    Stocker, M., Seaborne, A., Bernstein, A., Kiefer, C., Reynolds, D.: SPARQL basic graph pattern optimization using selectivity estimation. In: ACM International Conference on World Wide Web, pp. 595–604 (2008)Google Scholar
  12. 12.
    Bicer, V., Tran, T., Gossen, A.: Relational kernel machines for learning from graph-structured RDF data. In: Antoniou, G., Grobelnik, M., Simperl, E., Parsia, B., Plexousakis, D., Leenheer, P., Pan, J. (eds.) ESWC 2011. LNCS, vol. 6643, pp. 47–62. Springer, Heidelberg (2011). doi: 10.1007/978-3-642-21034-1_4 CrossRefGoogle Scholar
  13. 13.
    Yamagata, Y., Fukuta, N.: An approach to dynamic query classification and approximation on an inference-enabled SPARQL endpoint. J. Inf. Process. 23(6), 759–766 (2015)Google Scholar
  14. 14.
    Maali, F., Hassan, I.A., Decker, S.: Scheduling for SPARQL endpoints. In: International Semantic Web Conference, pp. 19–28 (2014)Google Scholar
  15. 15.
    Morsey, M., Lehmann, J., Auer, S., Ngonga Ngomo, A.-C.: DBpedia SPARQL benchmark – performance assessment with real queries on real data. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011. LNCS, vol. 7031, pp. 454–469. Springer, Heidelberg (2011). doi: 10.1007/978-3-642-25073-6_29 CrossRefGoogle Scholar
  16. 16.
    Robertson, S.: Understanding inverse document frequency: on theoretical arguments for IDF. J. Documentation 60(5), 503–520 (2004)CrossRefGoogle Scholar
  17. 17.
    Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)MathSciNetCrossRefzbMATHGoogle Scholar
  18. 18.
    Trillo, R., Gracia, J., Espinoza, M., Mena, E.: Discovering the semantics of user keywords. J. Univ. Comput. Sci. 13(12), 1908–1935 (2007)Google Scholar
  19. 19.
    Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Struct. Multi. Optim. 26(6), 369–395 (2004)MathSciNetCrossRefzbMATHGoogle Scholar
  20. 20.
    Zitzler, E., Deb, K., Thiele, L.: Comparison of multiobjective evolutionary algorithms: empirical results. Evol. Comput. 8(2), 173–195 (2000)CrossRefGoogle Scholar
  21. 21.
    Geem, Z.W., Kim, J.H., Loganathan, G.: A new heuristic optimization algorithm: harmony search. Simulation 76(2), 60–68 (2001)CrossRefGoogle Scholar
  22. 22.
    Manjarres, D., Landa-Torres, I., Gil-Lopez, S., Del Ser, J., Bilbao, M.N., Salcedo-Sanz, S., Geem, Z.W.: A survey on applications of the harmony search algorithm. Eng. Appl. Artif. Intell. 26(8), 1818–1831 (2013)CrossRefGoogle Scholar
  23. 23.
    Bean, J.C.: Genetic algorithms and random keys for sequencing and optimization. ORSA J. Comput. 6(2), 154–160 (1994)CrossRefzbMATHGoogle Scholar
  24. 24.
    Pirró, G., Euzenat, J.: A feature and information theoretic framework for semantic similarity and relatedness. In: Patel-Schneider, P.F., Pan, Y., Hitzler, P., Mika, P., Zhang, L., Pan, J.Z., Horrocks, I., Glimm, B. (eds.) ISWC 2010. LNCS, vol. 6496, pp. 615–630. Springer, Heidelberg (2010). doi: 10.1007/978-3-642-17746-0_39 CrossRefGoogle Scholar

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Ana I. Torre-Bastida
    • 1
    Email author
  • Esther Villar-Rodriguez
    • 1
  • Miren Nekane Bilbao
    • 2
  • Javier Del Ser
    • 1
    • 2
    • 3
  1. 1.TECNALIA. OPTIMA UnitDerioSpain
  2. 2.University of the Basque Country UPV/EHUBilbaoSpain
  3. 3.Basque Center for Applied Mathematics (BCAM)BilbaoSpain

Personalised recommendations