Abstract
Semantic web technology is well suited to large-scale information integration problems such as those in healthcare, which involve multiple diverse data sources and sinks, each with its own data format, vocabulary and information requirements. The resulting data production processes often require a number of steps that must be repeated when source data changes, which is wasteful if only certain portions of the data changed. This paper explains how distributed healthcare data production processes can be conveniently defined in RDF as executable dependency graphs, using the RDF Pipeline Framework. Nodes in the graph can perform arbitrary processing, and their outputs are cached automatically, thus avoiding unnecessary data regeneration. The framework is loosely coupled, using native protocols for efficient node-to-node communication when possible, while falling back to RESTful HTTP when necessary. It is agnostic to data format and programming language, using framework-supplied wrappers to allow pipeline developers to use their favorite languages and tools for node-specific processing.
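The dependency-driven caching idea described in the abstract can be illustrated with a minimal Python sketch. This is not the RDF Pipeline Framework's API: the `Node` class, its `get` and `invalidate` methods, and the sample lab and medication data are all hypothetical names chosen for illustration, and the sketch runs in a single process rather than across distributed nodes.

```python
class Node:
    """Hypothetical pipeline node: an updater function plus a cached output."""

    def __init__(self, name, updater, inputs=()):
        self.name = name
        self.updater = updater      # arbitrary processing for this node
        self.inputs = list(inputs)  # upstream nodes this node depends on
        self.cache = None           # last output produced by the updater
        self.version = 0            # bumped whenever the cached output changes
        self.seen = None            # input versions the cache was built from
        self.runs = 0               # how many times the updater actually ran

    def invalidate(self):
        """Mark the node stale, e.g. because its external source data changed."""
        self.seen = None

    def get(self):
        """Return the output, re-running the updater only if an input changed."""
        values = [node.get() for node in self.inputs]
        current = tuple(node.version for node in self.inputs)
        if self.seen != current:
            result = self.updater(*values)
            self.runs += 1
            self.seen = current
            if self.version == 0 or result != self.cache:
                self.cache = result
                self.version += 1
        return self.cache


def normalize(terms):
    # Stand-in for node-specific processing such as vocabulary mapping.
    return [term.strip().lower() for term in terms]


# Hypothetical source data standing in for two healthcare data sources.
data = {"labs": [" Glucose ", "HbA1c"], "meds": ["aspirin"]}

labs = Node("labs", lambda: list(data["labs"]))
meds = Node("meds", lambda: list(data["meds"]))
clean_labs = Node("clean_labs", normalize, [labs])
clean_meds = Node("clean_meds", normalize, [meds])
report = Node("report", lambda a, b: sorted(a + b), [clean_labs, clean_meds])

first = report.get()

# Only the medication source changes, so only its branch is regenerated.
data["meds"].append("Metformin")
meds.invalidate()
second = report.get()
```

In this sketch, `clean_labs` runs once while `clean_meds` and `report` each run twice, which mirrors the abstract's point that unchanged portions of the data need not be reprocessed. Version counters are one simple way to detect change; the framework itself may use a different mechanism.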
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Booth, D. (2013). The RDF Pipeline Framework: Automating Distributed, Dependency-Driven Data Pipelines. In: Baker, C.J.O., Butler, G., Jurisica, I. (eds) Data Integration in the Life Sciences. DILS 2013. Lecture Notes in Computer Science, vol 7970. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39437-9_5
DOI: https://doi.org/10.1007/978-3-642-39437-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39436-2
Online ISBN: 978-3-642-39437-9
eBook Packages: Computer Science (R0)