Extending SPARQL for Data Analytic Tasks

  • Julian Dolby
  • Achille Fokoue
  • Mariano Rodriguez Muro
  • Kavitha Srinivas
  • Wen Sun
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9982)


SPARQL offers many features for accessing data integrated across different data sources, which is an important step in any data analysis task. We report on the use of SPARQL for two real data analytic use cases from the healthcare and life sciences domains. These use cases exposed certain weaknesses in the current SPARQL specification, specifically when the data being integrated is most conveniently accessed via RESTful services and in formats beyond RDF, such as XML. We therefore extended SPARQL with generalized "service" constructs for accessing services beyond the SPARQL endpoints supported by the standard "service" keyword. For efficiency, our constructs support posting data, which the standard "service" keyword also does not support. We provide an open source implementation of this SPARQL endpoint in an RDF store called Quetzal, and evaluate its use in the two data analytic scenarios over real datasets.
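To make the integration problem concrete, the following is a minimal sketch of what a query engine has to do when a data source returns XML rather than RDF: turn the payload into variable bindings and join them with bindings from a local query. It is not Quetzal's syntax or implementation. The XML payload, the element names, and the helpers `xml_to_bindings` and `join_bindings` are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML payload standing in for a RESTful service response.
SAMPLE_XML = """
<drugs>
  <drug><name>drugA</name><target>targetX</target></drug>
  <drug><name>drugB</name><target>targetY</target></drug>
</drugs>
"""


def xml_to_bindings(xml_text, row_tag, columns):
    """Map each <row_tag> element to a dict of variable name -> text value.

    `columns` maps a variable name to the child element holding its value.
    """
    root = ET.fromstring(xml_text.strip())
    rows = []
    for elem in root.iter(row_tag):
        rows.append({var: elem.findtext(tag) for var, tag in columns.items()})
    return rows


def join_bindings(left, right):
    """Nested-loop join of two binding lists on their shared variables."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[v] == r_row[v] for v in shared):
                joined.append({**l_row, **r_row})
    return joined


# Bindings that a local query might have produced (hypothetical values).
local = [{"drug": "drugA", "dose": "100mg"}]

# Bindings extracted from the XML response.
remote = xml_to_bindings(SAMPLE_XML, "drug", {"drug": "name", "target": "target"})

result = join_bindings(local, remote)
print(result)
```

In a real deployment the XML would come from an HTTP request, and posting the local bindings along with the request would let the service filter its response instead of returning every row.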



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Julian Dolby (1, corresponding author)
  • Achille Fokoue (1)
  • Mariano Rodriguez Muro (1)
  • Kavitha Srinivas (1)
  • Wen Sun (2)
  1. IBM Thomas J. Watson Research Center, Yorktown, USA
  2. IBM Research China, Beijing, China
