The RDF Pipeline Framework: Automating Distributed, Dependency-Driven Data Pipelines

  • Conference paper
Data Integration in the Life Sciences (DILS 2013)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7970)

Abstract

Semantic web technology is well suited for large-scale information integration problems such as those in healthcare involving multiple diverse data sources and sinks, each with its own data format, vocabulary and information requirements. The resulting data production processes often require a number of steps that must be repeated when source data changes – often wastefully if only certain portions of the data changed. This paper explains how distributed healthcare data production processes can be conveniently defined in RDF as executable dependency graphs, using the RDF Pipeline Framework. Nodes in the graph can perform arbitrary processing and are cached automatically, thus avoiding unnecessary data regeneration. The framework is loosely coupled, using native protocols for efficient node-to-node communication when possible, while falling back to RESTful HTTP when necessary. It is data and programming language agnostic, using framework-supplied wrappers to allow pipeline developers to use their favorite languages and tools for node-specific processing.
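The caching behaviour described in the abstract can be illustrated with a minimal in-memory sketch. It is not the framework's implementation: the real framework defines pipelines in RDF and passes data between nodes over native protocols or HTTP, whereas this sketch uses plain Python objects. The names `Node`, `set`, `get`, `runs` and `demo` are hypothetical and chosen only for illustration. The idea shown is that a node regenerates its output only when the version of at least one of its inputs has changed since the last run.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical pipeline node: caches its output and tracks input versions."""
    name: str
    updater: Optional[Callable[..., str]] = None  # None marks a source node
    inputs: List["Node"] = field(default_factory=list)
    version: int = 0                       # bumped when the cached output changes
    cached: Optional[str] = None
    seen: Optional[Tuple[int, ...]] = None  # input versions at the last run
    runs: int = 0                          # how many times the updater executed

    def set(self, value: str) -> None:
        """Change a source node's data; unchanged data leaves the version alone."""
        if value != self.cached:
            self.cached = value
            self.version += 1

    def get(self) -> Optional[str]:
        """Return fresh output, re-running the updater only if an input changed."""
        if self.updater is None:
            return self.cached
        values = [n.get() for n in self.inputs]
        current = tuple(n.version for n in self.inputs)
        if current != self.seen:
            new = self.updater(*values)
            self.runs += 1
            self.seen = current
            if new != self.cached:
                self.cached = new
                self.version += 1
        return self.cached


def demo():
    # Two sources feed two normalisation steps, which feed one merge step.
    patients, labs = Node("patients"), Node("labs")
    patients.set("alice")
    labs.set("glucose=5")
    norm_p = Node("norm_p", lambda s: s.upper(), [patients])
    norm_l = Node("norm_l", lambda s: s.upper(), [labs])
    merged = Node("merged", lambda a, b: a + "|" + b, [norm_p, norm_l])

    first = merged.get()
    labs.set("glucose=6")  # only the lab source changes
    second = merged.get()
    return first, second, (norm_p.runs, norm_l.runs, merged.runs)


if __name__ == "__main__":
    print(demo())
```

In this sketch, changing only the lab source causes the lab branch and the merge step to run again, while the patient branch is served from its cache.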

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Booth, D. (2013). The RDF Pipeline Framework: Automating Distributed, Dependency-Driven Data Pipelines. In: Baker, C.J.O., Butler, G., Jurisica, I. (eds) Data Integration in the Life Sciences. DILS 2013. Lecture Notes in Computer Science, vol 7970. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39437-9_5

  • DOI: https://doi.org/10.1007/978-3-642-39437-9_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39436-2

  • Online ISBN: 978-3-642-39437-9

  • eBook Packages: Computer Science, Computer Science (R0)
