Can RDB2RDF Tools Feasibily Expose Large Science Archives for Data Integration?

  • Alasdair J. G. Gray
  • Norman Gray
  • Iadh Ounis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5554)

Abstract

Many science archive centres publish very large volumes of image, simulation, and experiment data. In order to integrate and analyse the available data, scientists need to be able to (i) identify and locate all the data relevant to their work; (ii) understand the multiple heterogeneous data models in which the data is published; and (iii) interpret and process the data they retrieve. RDF has been shown to be a generally successful framework within which to perform such data integration work. It can be equally successful in the context of scientific data, if it is demonstrably practical to expose that data as RDF.

In this paper we investigate the capabilities of RDF to enable the integration of scientific data sources. Specifically, we discuss the suitability of SPARQL for expressing scientific queries, and the performance of several triple stores and RDB2RDF tools for executing queries over a moderately sized sample of a large astronomical data set. We found that SPARQL and RDB2RDF tools need further research and improvement before they can efficiently expose existing science archives for data integration.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Alasdair J. G. Gray (1)
  • Norman Gray (2, 3)
  • Iadh Ounis (1)
  1. Computing Science, University of Glasgow, Glasgow, UK
  2. Physics and Astronomy, University of Leicester, Leicester, UK
  3. Physics and Astronomy, University of Glasgow, Glasgow, UK
