The Journal of Supercomputing, Volume 71, Issue 7, pp 2694–2719

Storage schema and ontology-independent SPARQL to HiveQL translation

  • Naila Karim
  • Khalid Latif
  • Zahid Anwar
  • Sharifullah Khan
  • Amir Hayat

Abstract

The growing size of Semantic Web data demands scalable semantic stores. Hadoop-based distributed and parallel processing frameworks such as HBase and Hive are becoming increasingly popular for storing and retrieving voluminous data. Hive in particular supports complex analytical processing, but its query interface does not support data exploration using SPARQL, the standard query language for the Semantic Web. We propose a semantics-preserving SPARQL-to-HiveQL translation scheme that provides a querying interface for Hive, in an attempt to realize a scalable Semantic Web triplestore. The major contributions of our research are a semantics-preserving SPARQL-to-HiveQL query translation algorithm and a storage schema-independent querying mechanism that accommodates different storage schemes without impacting translation time. The results demonstrate that the proposed translation algorithm works efficiently and supports different types of SPARQL queries.
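
To give a feel for what a SPARQL-to-HiveQL translation involves, the following is a minimal sketch and not the algorithm proposed in the article. It assumes a single hypothetical Hive table `triples(subject, predicate, object)`, handles only a basic graph pattern (a conjunction of triple patterns), and stores terms as plain strings such as `rdf:type`. The function name `bgp_to_hiveql`, the table name, and the column names are all illustrative assumptions. Each triple pattern becomes one alias of the table, shared variables become equi-join conditions, and constants become filters.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]
COLUMNS = ("subject", "predicate", "object")  # assumed column names


def quote(term: str) -> str:
    """Render a constant term as a HiveQL string literal (minimal escaping)."""
    return "'" + term.replace("\\", "\\\\").replace("'", "\\'") + "'"


def bgp_to_hiveql(select_vars: Sequence[str],
                  patterns: Sequence[Triple],
                  table: str = "triples") -> str:
    """Translate a basic graph pattern into a self-join over an assumed triple table."""
    if not patterns:
        raise ValueError("at least one triple pattern is required")

    first_seen: Dict[str, str] = {}  # variable -> "alias.column" of first binding
    from_lines: List[str] = []
    filters: List[str] = []

    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        join_conds: List[str] = []
        for column, term in zip(COLUMNS, pattern):
            ref = f"{alias}.{column}"
            if not term.startswith("?"):
                # Constant term: restrict this alias to matching rows.
                filters.append(f"{ref} = {quote(term)}")
            elif term not in first_seen:
                # First occurrence of a variable: remember where it is bound.
                first_seen[term] = ref
            elif first_seen[term].startswith(alias + "."):
                # Variable repeated inside the same pattern: plain filter.
                filters.append(f"{ref} = {first_seen[term]}")
            else:
                # Variable shared with an earlier pattern: equi-join condition.
                join_conds.append(f"{ref} = {first_seen[term]}")

        if i == 0:
            from_lines.append(f"FROM {table} {alias}")
        elif join_conds:
            from_lines.append(
                f"JOIN {table} {alias} ON ({' AND '.join(join_conds)})")
        else:
            # No shared variable with earlier patterns: Cartesian product.
            from_lines.append(f"CROSS JOIN {table} {alias}")

    unbound = [v for v in select_vars if v not in first_seen]
    if unbound:
        raise ValueError(f"variables not bound by any pattern: {unbound}")

    columns = ", ".join(f"{first_seen[v]} AS {v[1:]}" for v in select_vars)
    lines = [f"SELECT {columns}"] + from_lines
    if filters:
        lines.append("WHERE " + " AND ".join(filters))
    return "\n".join(lines)


if __name__ == "__main__":
    # SELECT ?s ?n WHERE { ?s rdf:type ub:Student . ?s ub:name ?n }
    print(bgp_to_hiveql(
        ["?s", "?n"],
        [("?s", "rdf:type", "ub:Student"), ("?s", "ub:name", "?n")],
    ))
```

A sketch like this is tied to one storage layout. The schema-independent mechanism described in the abstract is aimed at avoiding exactly that coupling, so the hard-coded table and column names here should be read as a simplification.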

Keywords

SPARQL, Hadoop, HiveQL, Query translation

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Naila Karim (1)
  • Khalid Latif (1)
  • Zahid Anwar (1)
  • Sharifullah Khan (1)
  • Amir Hayat (2)

  1. School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad, Pakistan
  2. COMSATS Institute of Information Technology, Islamabad, Pakistan
