Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

RDF Serialization and Archival

  • Javier D. Fernández
  • Miguel A. Martínez-Prieto
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_286-1

Keywords

Internationalized Resource Identifiers (IRIs); Blank Nodes; Triple Pattern Fragments; Graph Name; Binary Serialization

Synonyms

Definition

RDF serialization is the process of writing down RDF graphs into a machine-readable format. RDF formats mainly differ in the concrete syntax to serialize RDF statements (called “triples”) and how to group or nest a set of statements, influencing the amount of storage space and bandwidth required for preserving and exchanging such data. These differences can be rather marginal for small RDF graphs, where the selection of a particular format is mostly driven by user preferences, the set of tools managing the RDF format, and the interoperability with other applications. In contrast, choosing an adequate serialization format can affect the overall performance and present important scalability issues when managing Big Semantic Data collections.

Additional challenges arise in scenarios where triples must be annotated with information about their context, such as provenance, trust, or quality information, to name but a few. The most standard solution in RDF is to consider named graphs, i.e., different RDF graphs are managed under a single RDF dataset. Diverse RDF formats have been proposed to cover this scenario and serialize annotated statements (called “quads”), at the price of additional overhead to represent triples that can be repeated across graphs.

This situation is particularly challenging when different versions of an RDF graph must be preserved, given that graphs can be near-copies of others. This problem is commonly referred to as RDF archival, where specific archival policies have been proposed in recent literature.

This entry provides a historical review of RDF serialization formats to understand their evolution over the years. Basic features are covered for each format, paying special attention to its capabilities for quad serialization. XML-based and text-oriented formats are first introduced to illustrate how RDF was originally used for metadata description. Their limitations led to JSON-based syntaxes, which overcome some processing challenges but do not scale to the highly demanding needs of Big Semantic Data management. This fact motivates the proposal of binary formats, which are able to deal with the storage and exchange needs of large RDF collections. Finally, RDF archival proposals are surveyed, and the entry concludes by presenting open research trends.

Overview

RDF (Resource Description Framework) (Schreiber and Raimond 2014) proposes a logical model for expressing information about physical (e.g., people, buildings, vehicles, or pictures, among others) and abstract (e.g., films, songs, cities, etc.) resources. The information is represented as ternary relations, called RDF triples (or statements), which are organized into hyperlinked clouds, referred to as RDF graphs. The resulting knowledge is typically oriented for machine consumption. In the following, the main RDF concepts are briefly introduced.
RDF Triple.

RDF statements are built on a simple 〈subject, predicate, object〉 structure called RDF triple (aka statement), which sets a particular value (the “object”) for a given feature (“predicate”) of the resource (“subject”) being described. For instance, an informal representation of a triple which sets the birth date of the singer Bruce Springsteen can be 〈Bruce Springsteen, is born on, 1949-09-23〉.

The RDF data model (Cyganiak et al. 2014) establishes some restrictions about the universe of possible values for each component of a triple. Thus, a triple (s, p, o) ∈ (I ∪ B) × I × (I ∪ B ∪ L), where I stands for IRIs, B for blank nodes, and L for literals. These mutually disjoint sets of RDF terms are described as follows:
IRIs.

Internationalized Resource Identifiers (IRIs) are used to identify resources in RDF. For instance, <http://example.org/Springsteen> is used to name the aforementioned resource about Bruce Springsteen. Note that IRIs can also be used to identify predicate and object values of a triple; e.g., <http://example.org/p/birthDate> is a valid IRI to identify the corresponding “birth date” property. IRIs are global identifiers, so they can be reused to provide additional information about a resource or to convey the same meaning, e.g., a birth date feature in a different context.

Blank Nodes.

RDF uses blank nodes (also called anonymous nodes) to declare the existence of a resource without using a particular IRI. Blank nodes can play both subject and object roles, although their use does not imply that the IRI of the corresponding resource is unknown. In turn, the scope of a blank node is limited to the RDF graph where it is used.

Literals.

Final values (numbers, names, dates, etc.) are expressed as RDF literals, always used as objects. Literals are declared by default as strings (a language tag can optionally be associated in this case), but other datatypes can be used. The value “1949-09-23” is an example of an RDF literal.
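
The restriction (s, p, o) ∈ (I ∪ B) × I × (I ∪ B ∪ L) can be illustrated with a minimal Python sketch. The class names IRI, BlankNode, and Literal and the function is_valid_triple are assumptions introduced here for illustration; they do not belong to any particular RDF library.

```python
from dataclasses import dataclass

# Illustrative term types; the names are assumptions, not a library API.
@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class BlankNode:
    label: str

@dataclass(frozen=True)
class Literal:
    lexical: str
    # Plain literals are treated as strings by default.
    datatype: str = "http://www.w3.org/2001/XMLSchema#string"

def is_valid_triple(s, p, o) -> bool:
    """Check that (s, p, o) is in (I ∪ B) × I × (I ∪ B ∪ L)."""
    return (
        isinstance(s, (IRI, BlankNode))
        and isinstance(p, IRI)
        and isinstance(o, (IRI, BlankNode, Literal))
    )

springsteen = IRI("http://example.org/Springsteen")
birth_date = IRI("http://example.org/p/birthDate")

# A literal is allowed as an object, but not as a subject.
print(is_valid_triple(springsteen, birth_date, Literal("1949-09-23")))
print(is_valid_triple(Literal("1949-09-23"), birth_date, springsteen))
```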

Figure 1 illustrates the above RDF concepts. It comprises three triples that describe a new resource about Bruce Springsteen, identified by the aforementioned IRI <http://example.org/Springsteen>. Two of these triples set his birth name and birth date (using IRIs <http://example.org/p/name> and <http://example.org/p/birthDate>, respectively), i.e., “Bruce Frederick Joseph Springsteen” and “1949-09-23.” The third triple connects the new resource with an existing description of Bruce Springsteen in DBpedia (<http://dbpedia.org/page/Bruce_Springsteen>), a conversion of Wikipedia to RDF. Note that the predicate reuses the sameAs property, described in the OWL ontology.

Fig. 1 RDF graph which comprises three triples about Bruce Springsteen

RDF is traditionally modeled as a labeled directed graph (as seen in the previous figure) because it provides an easy-to-understand (visual) explanation of the RDF data. This fact motivates the adoption of this concept as part of the RDF model.

RDF Graph.

An RDF graph G is a set of triples declared under the same scope. Thus, a triple belongs to an RDF graph, and a graph contains a well-determined set of triples.

It is worth noting that an RDF graph is only a “mental model,” and its triples must be serialized for preservation or exchange purposes. Each serialization format has its particular features, but all of them ensure that RDF graphs can be effectively written down.

RDF 1.1 extends the original model to support grouping RDF graphs within a single RDF dataset, enabling triples from different contexts to be managed together.
RDF Dataset.

An RDF dataset D is a collection of RDF graphs, where one of them is considered the “default graph.” The dataset contains zero or more “named graphs.” Each named graph is a pair consisting of an IRI or a blank node (the graph name) and an RDF graph. Graph names are unique within the RDF dataset, and blank nodes can be shared between graphs.

Figure 2 shows an RDF dataset which comprises two RDF graphs (note that a prefix notation is used to compact their IRIs). The first graph (dashed) is identified by the IRI <http://example.org/graph/artists> and includes the set of three triples shown in Fig. 1. The new named graph (solid background) declares three more triples about Bruce Springsteen, one of them shared with the original graph: 〈ex:Springsteen, owl:sameAs, db:Bruce_Springsteen〉.

Fig. 2 RDF dataset which comprises two RDF graphs

The notion of RDF dataset also applies to a logical level. However, serializing triples from a dataset brings an additional requirement, as their context must be preserved.
RDF Quad.

An RDF quad is an extended statement that includes the corresponding triple and the name of the graph that declares it (aka context). More formally, an RDF quad q is a quadruple 〈subject, predicate, object, graph〉, where graph refers to the name of a graph which exists in the dataset. Thus, a quad (s, p, o, g) ∈ (I ∪ B) × I × (I ∪ B ∪ L) × (I ∪ B).
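
The relation between a dataset of named graphs and its quads can be sketched as follows. This is a minimal illustration that models a dataset as a Python dictionary from graph names to sets of triples; the second graph name (http://example.org/graph/other), the exp: prefix, and the function names are hypothetical, and the default graph is left out for brevity.

```python
from collections import defaultdict

def dataset_to_quads(dataset):
    """Flatten {graph_name: set_of_triples} into a sorted list of quads."""
    return sorted(
        (s, p, o, g)
        for g, triples in dataset.items()
        for (s, p, o) in triples
    )

def quads_to_dataset(quads):
    """Group quads back into {graph_name: set_of_triples}."""
    dataset = defaultdict(set)
    for s, p, o, g in quads:
        dataset[g].add((s, p, o))
    return dict(dataset)

ARTISTS = "http://example.org/graph/artists"
OTHER = "http://example.org/graph/other"  # hypothetical graph name

# The sameAs triple is declared in both graphs, as in Fig. 2.
shared = ("ex:Springsteen", "owl:sameAs", "db:Bruce_Springsteen")
dataset = {
    ARTISTS: {shared, ("ex:Springsteen", "exp:birthDate", '"1949-09-23"')},
    OTHER: {shared},
}

quads = dataset_to_quads(dataset)
for quad in quads:
    print(quad)
```

Note that the shared triple yields two quads, one per graph, which is the source of the redundancy discussed in this entry.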

Serialization Formats

The RDF model describes the previous concepts using an abstract syntax, but it does not restrict how they are effectively serialized. Thus, RDF data can be written down in different ways, and several serialization formats are standardized and widely accepted by the Semantic Web community. These formats allow RDF graphs to be effectively serialized, but only some of them are able to cover particular RDF dataset needs.

RDF/XML (Gandon and Schreiber 2014) was released hand in hand with the initial W3C RDF Recommendation. In the early days, RDF/XML was meant to be an ideal first serialization for RDF graphs as it could leverage all XML-based solutions. However, RDF/XML overloads the representation with verbose human-focused information, which can serve the intended exchange purposes, but only on a small scale. Nonetheless, RDF/XML includes some naive compacting features, such as the possibility to (i) implicitly create blank nodes without giving a concrete identifier, (ii) omit nodes and place values as property attributes in XML, (iii) abbreviate IRI references via base IRIs (namespaces) and relative references, and (iv) create collections to define a set of terms related to a subject.

Figure 3 shows an example of an RDF/XML serialization that encodes triples from Fig. 1. In practice, the result is an XML document, which can be parsed, processed, and queried using well-established technologies (DOM, XPath, XSLT, etc.). However, its document orientation is an important weakness when dealing with large amounts of RDF triples. Besides, it does not support named graphs.

Fig. 3 Serializations of triples from Fig. 1

TriX (Carroll and Stickler 2004) proposes another XML syntax for RDF which organizes triples by graphs, allowing multiple graphs to be serialized into the same document. It is a first approximation to an RDF dataset serialization, but the resulting format shows the same drawbacks as RDF/XML.

XML-based formats have lost relevance, and their usage is limited to small RDF graph serializations (e.g., descriptive metadata about a Web page).

N3 (Notation3) (Berners-Lee and Connolly 2011) is a format designed with human readability in mind. Although this made sense in the early days of RDF, managing and processing Big Semantic Data is far beyond any human capability. However, this format breaks with the XML predominance and introduces some interesting constructors which tackle particular RDF features.

N3 proposes the use of namespaces, as in XML. It is an effective compaction mechanism which allows IRIs to be written in relative form with respect to their corresponding namespace. On the other hand, N3 also introduces constructors for encoding triples in the form of adjacency lists: predicate lists allow a subject to be written only once for all triples containing it, while object lists concatenate all object values related to a pair 〈subject, predicate〉, which is written once.

This format proposes some other constructors which go beyond the needs of RDF serialization, making the format relatively complex for such a purpose. N3 does not support quads.

N-Triples (Becket 2014) is an extremely simple line-based syntax, easy to parse and generate. In essence, the subject, predicate, and object terms are separated by white space, and the triple is terminated with a “.” followed by a new line. IRIs are enclosed in “<” and “>”, literals are enclosed in double quotes, and blank nodes start with “_:”. Figure 3 shows an N-Triples serialization that basically lists the corresponding triples. Note that N-Triples writes down each full term as many times as it is used in a triple, resulting in a simple but extremely verbose serialization due to the repetition of long terms. As a result, N-Triples files need much more space than others, which can result in scalability issues for Big Semantic Data management.

On the other hand, N-Triples can be easily extended to support quad serialization. It only needs the graph name to be appended to the triple. N-Quads (Carothers 2014) formalizes this approach, featuring the same characteristics and limitations as N-Triples.
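
The line-based syntax described above, and its extension to quads, can be sketched in a few lines of Python. This is a simplified illustration rather than a complete N-Triples or N-Quads writer: terms are modeled as (kind, value) pairs, the function names are assumptions, and language tags, datatypes, and full character escaping are omitted.

```python
def format_term(term):
    """Format a term given as ("iri", v), ("bnode", v) or ("literal", v)."""
    kind, value = term
    if kind == "iri":
        return f"<{value}>"
    if kind == "bnode":
        return f"_:{value}"
    if kind == "literal":
        # Escape only the most common special characters.
        escaped = (
            value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        return f'"{escaped}"'
    raise ValueError(f"unknown term kind: {kind}")

def to_line(s, p, o, graph=None):
    """Return one N-Triples line, or one N-Quads line if a graph name is given."""
    parts = [format_term(s), format_term(p), format_term(o)]
    if graph is not None:
        parts.append(format_term(graph))
    return " ".join(parts) + " ."

s = ("iri", "http://example.org/Springsteen")
p = ("iri", "http://example.org/p/birthDate")
o = ("literal", "1949-09-23")
g = ("iri", "http://example.org/graph/artists")

triple_line = to_line(s, p, o)
quad_line = to_line(s, p, o, g)
print(triple_line)
print(quad_line)
```

Every line repeats the full subject and predicate IRIs, which makes the verbosity discussed above easy to observe.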

Turtle (Beckett et al. 2014) is a widely used format that exploits the previous experience of N3 and N-Triples. On the one hand, it restricts the expressive power of N3 to only serialize valid RDF graphs. On the other hand, it addresses the drawbacks of N-Triples to consolidate a more practical format.

Figure 3 also shows a Turtle excerpt that illustrates some of its more relevant features. For instance, it shows the use of namespaces. Note that each one is declared by the @prefix constructor, while IRIs in the terms are rewritten in relative form to their corresponding namespaces. The figure also illustrates the predicate list encoding proposed in N3; e.g., http://example.org/Springsteen is written once, but it plays the role of a subject for three different triples. Turtle supports object lists too, and it introduces more constructors and different kinds of syntactic sugar to alleviate RDF verbosity.
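
The effect of namespaces and predicate lists can be sketched with a small Turtle-like writer. This is a hedged illustration, not a complete Turtle serializer: the prefix names (ex, exp, db, owl) and the function names are assumptions, local names are assumed to be valid without escaping, and only IRIs and plain string literals are handled.

```python
PREFIXES = {
    "ex": "http://example.org/",
    "exp": "http://example.org/p/",
    "db": "http://dbpedia.org/page/",
    "owl": "http://www.w3.org/2002/07/owl#",
}

def abbreviate(iri, prefixes):
    """Rewrite an IRI relative to the longest matching namespace, if any."""
    best = None
    for name, ns in prefixes.items():
        if iri.startswith(ns) and (best is None or len(ns) > len(best[1])):
            best = (name, ns)
    if best is None:
        return f"<{iri}>"
    return f"{best[0]}:{iri[len(best[1]):]}"

def format_object(obj, prefixes):
    """Objects are ("iri", value) or ("literal", value) pairs."""
    kind, value = obj
    if kind == "iri":
        return abbreviate(value, prefixes)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_turtle(triples, prefixes):
    lines = [f"@prefix {name}: <{ns}> ." for name, ns in prefixes.items()]
    # Group triples by subject, then by predicate (adjacency lists).
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {}).setdefault(p, []).append(o)
    for s, predicates in by_subject.items():
        parts = []
        for p, objects in predicates.items():
            object_list = ", ".join(format_object(o, prefixes) for o in objects)
            parts.append(f"{abbreviate(p, prefixes)} {object_list}")
        lines.append(f"{abbreviate(s, prefixes)} " + " ;\n    ".join(parts) + " .")
    return "\n".join(lines)

SUBJECT = "http://example.org/Springsteen"
TRIPLES = [
    (SUBJECT, "http://example.org/p/name",
     ("literal", "Bruce Frederick Joseph Springsteen")),
    (SUBJECT, "http://example.org/p/birthDate", ("literal", "1949-09-23")),
    (SUBJECT, "http://www.w3.org/2002/07/owl#sameAs",
     ("iri", "http://dbpedia.org/page/Bruce_Springsteen")),
]

turtle_text = to_turtle(TRIPLES, PREFIXES)
print(turtle_text)
```

In the output, the subject is written once for the three triples, and each IRI is shortened relative to its namespace.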

Although Turtle is a popular format, it does not support quads. As in the previous case, a new format, called TriG (Bizer and Cyganiak 2014), extends Turtle to allow RDF dataset serialization. It basically encloses triples that belong to each named graph in the dataset.

JSON-LD (Sporny et al. 2014) exploits JSON features to serialize RDF. It comes with the advantage of using a well-established scheme that is easy to parse and widely accepted by Web APIs. The main focus, then, is to be easy for humans to read and write and easy for machines to parse and generate automatically. JSON-LD is designed to be usable directly as JSON, with no knowledge of RDF. Note that JSON-LD supports named graphs natively, and it is gaining increasing attention from the community.
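
As an illustration, the triples of Fig. 1 could be written as a JSON-LD named graph along the following lines. The sketch uses only the standard Python json module (no JSON-LD processor), and the short property names chosen in the context (name, birthDate, sameAs) are assumptions made for this example.

```python
import json

document = {
    # The context maps short JSON keys to full IRIs.
    "@context": {
        "name": "http://example.org/p/name",
        "birthDate": "http://example.org/p/birthDate",
        "sameAs": {
            "@id": "http://www.w3.org/2002/07/owl#sameAs",
            "@type": "@id",  # values of sameAs are IRIs, not strings
        },
    },
    # A top-level object with both @id and @graph denotes a named graph.
    "@id": "http://example.org/graph/artists",
    "@graph": [
        {
            "@id": "http://example.org/Springsteen",
            "name": "Bruce Frederick Joseph Springsteen",
            "birthDate": "1949-09-23",
            "sameAs": "http://dbpedia.org/page/Bruce_Springsteen",
        }
    ],
}

serialized = json.dumps(document, indent=2)
print(serialized)

# A consumer with no knowledge of RDF can still read it as plain JSON.
parsed = json.loads(serialized)
print(parsed["@graph"][0]["birthDate"])
```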

Key Research Findings

The above serialization formats have been successfully used for managing small- and medium-sized RDF graphs. However, the steady adoption of RDF, in particular in the context of linked data (Bizer et al. 2009), brings larger graphs including hundreds of millions and even billions of triples. For instance, the latest version of DBpedia (2016-10), an RDF conversion of Wikipedia, consists of roughly 13 billion triples, and LOD Laundromat (Beek et al. 2014), a service crawling RDF datasets, reports that around 4000 datasets contain more than 1 million triples.

In addition, named graphs are increasingly incorporated to consolidate complex RDF datasets. However, formats for quads are less mature and also suffer from a lack of scalability. This problem is particularly challenging when the corresponding RDF dataset is a historical archive of a graph, containing its different states over time.

This section reviews in detail the most innovative binary serialization formats, designed with volume in mind in order to solve the aforementioned scalability issues. Some of them also cover quad management, although managing context information is a challenge by itself, which is also reviewed below. Finally, RDF archival foundations are introduced, summarizing the most recent approaches.

Binary Serialization and Compression

Traditional RDF formats were not designed for a scenario of large-scale and machine-understandable Web of data. Their syntaxes have constructors which organize RDF statements in a human-readable way that adds unnecessary overheads for storing, exchanging, and consuming RDF graphs. Although this scalability issue can be partially solved through universal compression (e.g., gzip or bzip2) over such formats, specific RDF binary serializations and compressors have also been proposed. These tailored solutions mostly focus on taking advantage of particular features of RDF data in order to reduce the verbosity and produce important space savings at large scale. In the following, the three most prominent binary serializations are briefly reviewed: HDT, RDF Binary, and RDF4J. We then list solutions focused on streaming and provide a summary of RDF compression techniques to provide a big picture of the current state of the art (the interested reader can find a chapter specifically devoted to RDF compression).
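
The effect of universal compression over a verbose textual format can be observed with a short sketch based on the standard Python gzip module. The N-Triples lines below are synthetic and generated only for illustration (the artist IRIs are hypothetical), so the resulting sizes say nothing about real datasets.

```python
import gzip

# Synthetic N-Triples lines with highly repetitive IRIs (illustrative only).
lines = [
    f'<http://example.org/artist/{i}> <http://example.org/p/birthDate> "1949-09-23" .'
    for i in range(1000)
]
raw = "\n".join(lines).encode("utf-8")

# Universal compression is applied to the bytes, ignoring the RDF structure.
compressed = gzip.compress(raw)

print("plain bytes:     ", len(raw))
print("compressed bytes:", len(compressed))

# The process is lossless: the original serialization is fully recovered.
restored = gzip.decompress(compressed)
```

The compressor removes the textual repetition, but the data must be decompressed and parsed again before any triple can be accessed, which is one of the motivations for the RDF-specific formats reviewed next.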

The HDT (Fernández et al. 2011, 2013) format proposes a binary syntax for RDF data focused on producing very compact serializations to speed up data exchange, but also efficient data parsing and access. HDT minimizes the repetition of terms (IRIs, blank nodes, and literals) using the so-called HDT Dictionary, which assigns a numerical ID to each different term. Then, the graph structure of the dataset is managed as a graph of (term) IDs, in the HDT Triples component. Both dictionary and triples components are then compacted (e.g., looking for common string prefixes in the terms) and partially indexed. HDT is one of the most widespread RDF binary formats, mainly due to the HDT adoption as a compact data store for LOD Laundromat (Beek et al. 2014), and the data back end of lightweight APIs such as Triple Pattern Fragments (Verborgh et al. 2016).
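
The general idea of separating a dictionary of terms from a structure of ID-triples can be sketched as follows. This is only an illustration of dictionary encoding under simplifying assumptions: it does not reproduce the actual HDT binary layout, its term ordering, or its compaction and indexing of both components, and the function names are hypothetical.

```python
def dictionary_encode(triples):
    """Assign a numerical ID to each distinct term and rewrite triples as ID tuples."""
    term_to_id = {}
    encoded = []
    for triple in triples:
        ids = []
        for term in triple:
            if term not in term_to_id:
                term_to_id[term] = len(term_to_id) + 1  # IDs start at 1
            ids.append(term_to_id[term])
        encoded.append(tuple(ids))
    return term_to_id, encoded

def dictionary_decode(term_to_id, encoded):
    """Recover the original triples from the dictionary and the ID tuples."""
    id_to_term = {i: term for term, i in term_to_id.items()}
    return [tuple(id_to_term[i] for i in ids) for ids in encoded]

triples = [
    ("<http://example.org/Springsteen>", "<http://example.org/p/name>",
     '"Bruce Frederick Joseph Springsteen"'),
    ("<http://example.org/Springsteen>", "<http://example.org/p/birthDate>",
     '"1949-09-23"'),
]

dictionary, id_triples = dictionary_encode(triples)
print(id_triples)
```

The long subject IRI is stored once in the dictionary, while each triple only holds three small integers.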

HDT traditionally focuses on representing single RDF graphs. A recent approach, named HDTQ (Fernández et al. 2018), extends HDT to represent named graphs, preserving its compactness and retrieval features.

The RDF binary format (RDF Binary 2017) is an alternative solution proposed by the well-known Jena semantic framework. It consists of very simple mappings to encode triples in Apache Thrift (Apache Thrift 2017), which provides a scalable cross-language platform. In this case, rather than compactness, RDF binary mostly focuses on avoiding the parsing of textual RDF triples; hence, the overall processing is sped up. RDF binary supports both RDF graphs and RDF datasets (named graphs) encoded as a stream of quads.

The RDF4J (RDF4j 2017) binary format is proposed and used within the Eclipse RDF4J framework. The RDF4J format partially combines both previous strategies. On the one hand, it mostly tackles parsing and processing efficiency, providing a concrete syntax to delimit the extent of each term and triple. On the other hand, it allows for an in-line declaration of a dictionary, where a term is mapped to an ID which can be referred to in another triple. Nonetheless, terms themselves are not compressed (e.g., using prefixes as in HDT); hence, only partial compression is achieved.

Compression is another way of serializing RDF. As explained, combining universal compression and any serialization format is a common practice, but different compressors have been designed from scratch to deal with particular RDF requirements. RDF compressors can be classified into physical and logical compressors. Physical compressors (Fernández et al. 2013; Swacha and Grabowski 2015; Álvarez-García et al. 2014; Brisaboa et al. 2015) exploit symbolic/syntactic redundancy, removing term repetitions and compacting repetitive subgraph structures underlying the dataset. In contrast, logical compressors (Iannone et al. 2005; Meier 2008; Joshi et al. 2013; Venkataraman and Sreenivasa Kumar 2015) focus on semantic-based redundancy, avoiding the representation of triples that can be inferred from others in the RDF graph.

In addition, diverse binary formats and compressors have been proposed for RDF streams, i.e., a continuous flow of RDF data. In this case, the challenge consists of exploiting the trade-offs between the space savings achieved by the format and the latency introduced in the creation and parsing processes. Streaming HDT (Hasemann et al. 2012) adapts HDT to simplify the process by restricting the carried metadata and the maximum length of the dictionary; hence, shorter IDs are used. RDSZ (Fernández et al. 2014b) uses differential encoding to compact the similarities between consecutive triples in the stream. ERI (Fernández et al. 2014a) is an RDF stream compressor that adapts the W3C Efficient XML Interchange (EXI) format (Schneider et al. 2014) for RDF data. Note that EXI encoding can also be directly applied over an RDF/XML or JSON serialization. PatBin (Lhez et al. 2017) and FSSD (Karim et al. 2017) perform dictionary-based compression together with pattern-based encoding.
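
The intuition behind exploiting similarities between consecutive triples in a stream can be shown with a toy differential encoding. This sketch is not the algorithm of RDSZ or of any other cited system: it simply replaces each term that repeats the term in the same position of the previous triple with a None marker, assuming that None never occurs as a real term.

```python
def delta_encode(triples):
    """Replace terms equal to the previous triple's term (same position) with None."""
    previous = (None, None, None)
    encoded = []
    for triple in triples:
        encoded.append(
            tuple(None if cur == old else cur for cur, old in zip(triple, previous))
        )
        previous = triple
    return encoded

def delta_decode(encoded):
    """Rebuild the original triples by filling None markers from the previous triple."""
    previous = (None, None, None)
    decoded = []
    for item in encoded:
        triple = tuple(old if cur is None else cur for cur, old in zip(item, previous))
        decoded.append(triple)
        previous = triple
    return decoded

# Hypothetical sensor-like stream: consecutive triples share subject or predicate.
stream = [
    ("ex:sensor1", "ex:temperature", '"21.5"'),
    ("ex:sensor1", "ex:temperature", '"21.7"'),
    ("ex:sensor1", "ex:humidity", '"40"'),
    ("ex:sensor2", "ex:humidity", '"42"'),
]

encoded_stream = delta_encode(stream)
for item in encoded_stream:
    print(item)
```

Each triple can be encoded and decoded as soon as it arrives, since only the previous triple is needed, which keeps the latency low at the cost of a modest compression ratio.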

Context Information

As stated, graph names are increasingly used to capture additional information such as trust, provenance, temporal information, and other annotations (Carroll et al. 2005; Zimmermann et al. 2012). Although there exist standard RDF syntaxes (such as N-Quads, TriG, or JSON-LD) that represent RDF named graphs, serializing annotated RDF data (quads) efficiently remains an open challenge.

In spite of general approaches, such as AnQL (Zimmermann et al. 2012), most solutions focus on managing provenance information, as this is at the core of the linked data distributed philosophy (Bizer et al. 2009). Besides the aforementioned named graphs and the standard RDF reification (Schreiber and Raimond 2014), i.e., using the RDF vocabulary (rdf:Statement, rdf:subject, rdf:predicate, and rdf:object) to refer to statements, the main proposals are singleton properties (Nguyen et al. 2014) and N-ary relations (Noy et al. 2006). The former introduces unique predicates that are then annotated with the metadata of the triple they belong to. The latter, used in Wikidata, represents a relation between a subject and object with a new resource, which is then connected to the subject, on the one hand, and predicate and object, on the other. Further information can be attached to the new resource in order to annotate the statement.
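
The cost of standard RDF reification can be made concrete with a small sketch that generates the triples needed to annotate one statement. The rdf: vocabulary terms are the standard ones; the statement node label (_:stmt1), the annotation property (http://example.org/p/source), and the function name are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(triple, statement_node, annotations):
    """Describe a triple with the RDF reification vocabulary and attach annotations.

    `annotations` is a list of (predicate, object) pairs about the statement.
    """
    s, p, o = triple
    result = [
        (statement_node, RDF + "type", RDF + "Statement"),
        (statement_node, RDF + "subject", s),
        (statement_node, RDF + "predicate", p),
        (statement_node, RDF + "object", o),
    ]
    result.extend((statement_node, ap, ao) for ap, ao in annotations)
    return result

triple = (
    "http://example.org/Springsteen",
    "http://example.org/p/birthDate",
    '"1949-09-23"',
)
reified = reify(
    triple,
    "_:stmt1",  # hypothetical blank node naming the statement
    [("http://example.org/p/source", "http://dbpedia.org/page/Bruce_Springsteen")],
)
for t in reified:
    print(t)
```

Four triples are required to refer to the statement before a single annotation can be attached, which illustrates why more compact alternatives have been proposed.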

In addition, two recent solutions have been proposed. RDF* (Hartig 2017) extends RDF with a notion of embedded triples (encoded between ‘≪’ and ‘≫’), which can be directly used as subject or object of other triples. NdFluents (Giménez-García et al. 2017) creates unique versions of the subject and the object for each annotated triple, which are then linked to a context resource and to the original subject and object resources.

RDF Archival

RDF archival is a particular instance of the problem of managing context information. In this case, the context is set by the moment when a new version of an RDF graph is released. In general, RDF data are not static but evolve naturally, without centralized monitoring or further notice, following the scale-free nature of the Web. Thus, RDF archiving emerges as a novel challenge aimed at assuring quality and traceability of RDF data over time.

On a high level, the World Wide Web Consortium (W3C) provides basic guidelines on how to perform data versioning on datasets published in the Web (Lóscio et al. 2017). The set of recommendations includes (i) providing a version indicator (e.g., via owl:versionInfo); (ii) serving different versions via the Memento framework (de Sompel et al. 2010), which can provide access to prior states of RDF resources using datetime negotiation in HTTP; and (iii) providing the changes made in each version. Nonetheless, these recommendations are generic and do not restrict how RDF data versions are stored or queried across time. Initial works on RDF archiving policies and systems are starting to address these issues, proposing different solutions to efficiently archive and query different versions of RDF data.

The main efforts on RDF archiving fall into one of the following four storage strategies: independent copies (IC), change-based (CB), timestamp-based (TB), and hybrid-based (HB) approaches.

Independent copies (IC) (Klein et al. 2002; Noy and Musen 2004) is the most naive approach, where each version (aka snapshot) is managed as a different, complete graph. On the one hand, IC faces scalability problems as static information is duplicated across the versions. In addition, some operations, such as computing the difference between versions, require non-negligible processing efforts. On the other hand, version materialization (retrieving a certain version) is as efficient as querying a single snapshot.

Change-based approach (CB) (Volkel et al. 2005; Dong-Hyuk et al. 2012; Zeginis et al. 2011) partially addresses the space issues of IC by storing the differences (deltas) between versions. In contrast, CB requires additional computational costs for retrieving a particular version given that deltas need to be propagated.
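
The change-based policy can be sketched with Python sets: only the first version is stored in full, each later version is kept as a delta of added and deleted triples, and materializing a version requires applying the deltas in order. The function names and the toy versions are assumptions made for illustration, not the design of any of the cited systems.

```python
def compute_delta(old, new):
    """Delta between two versions, as sets of added and deleted triples."""
    return {"added": new - old, "deleted": old - new}

def build_archive(versions):
    """Store the first version in full and every later version as a delta."""
    deltas = [compute_delta(a, b) for a, b in zip(versions, versions[1:])]
    return set(versions[0]), deltas

def materialize(base, deltas, version):
    """Rebuild version number `version` (0-based) by propagating deltas."""
    graph = set(base)
    for delta in deltas[:version]:
        graph -= delta["deleted"]
        graph |= delta["added"]
    return graph

# Three hypothetical versions of a small graph.
v0 = {("ex:Springsteen", "exp:album", "ex:TheRiver")}
v1 = v0 | {("ex:Springsteen", "exp:album", "ex:Nebraska")}
v2 = v1 - {("ex:Springsteen", "exp:album", "ex:TheRiver")}

base, deltas = build_archive([v0, v1, v2])
print(materialize(base, deltas, 2))
```

Retrieving the last version requires applying every delta, which is the additional computational cost mentioned above.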

Timestamp-based approach (TB) (Cerdeira-Pena et al. 2016; Gutierrez et al. 2007; Zimmermann et al. 2012) annotates each triple with its temporal validity, i.e., the version. Compression techniques can be used to minimize the space overheads, e.g., using self-indexes, such as in v-RDFCSA (Cerdeira-Pena et al. 2016), or delta compression in B+Trees (Zaniolo 2016).
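
In its simplest form, the timestamp-based policy can be sketched as an index that maps each triple to the set of versions in which it holds. This illustration uses plain Python dictionaries and hypothetical function names; the cited systems rely on compressed and indexed structures instead.

```python
def build_timestamped(versions):
    """Annotate each triple with the set of version numbers in which it is valid."""
    index = {}
    for number, graph in enumerate(versions):
        for triple in graph:
            index.setdefault(triple, set()).add(number)
    return index

def materialize_tb(index, version):
    """Retrieve a version by filtering triples on their temporal annotation."""
    return {triple for triple, valid_in in index.items() if version in valid_in}

# Three hypothetical versions of a small graph.
river = ("ex:Springsteen", "exp:album", "ex:TheRiver")
nebraska = ("ex:Springsteen", "exp:album", "ex:Nebraska")
versions = [{river}, {river, nebraska}, {nebraska}]

index = build_timestamped(versions)
print(index[river])
print(materialize_tb(index, 2))
```

Each triple is stored once regardless of the number of versions in which it appears, and its history can be read directly from its annotation.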

Hybrid-based approaches (HB) (Stefanidis et al. 2014; Neumann and Weikum 2010; Zaniolo 2016) combine previous policies to explore other space/performance trade-offs. In particular, the hybrid IC/CB approach (Dong-Hyuk et al. 2012; Meinhardt et al. 2015; Stefanidis et al. 2014) follows a CB solution where full version materialization is additionally provided in some intermediate steps; hence, delta propagation is mitigated. In contrast, other practical approaches (Graube et al. 2014; Neumann and Weikum 2010; Vander Sander et al. 2013; Zaniolo 2016) follow a TB/CB approach in which triples are time-annotated only when they are added or deleted. Although this reduces the space needs (as it manages fewer annotations), version materialization requires rebuilding the delta, similarly to CB.

Future Directions for Research

As a result of standardization efforts by the Semantic Web community, there are many diverse standard “plain” RDF serializations available. Despite potential future trends that may result in adaptations for RDF (such as JSON-LD, adapted from JSON), most research efforts focus on efficient representation of annotated triples, in particular to model provenance information (Giménez-García et al. 2017; Hartig 2017).

RDF binary formats and compression have also emerged as active research and development fields over the past years. The main reasons are that (i) current plain RDF formats are dominated by a human-centric view and suffer from scalability problems at large scale and (ii) general compressed solutions still miss some types of redundancy underlying RDF data. In this regard, there is still room for hybrid compressors leveraging syntactic and semantic redundancies. In addition, RDF self-indexing (i.e., compressed and indexed RDF data) is still a main direction for research, in particular in the unexplored field of RDF streaming.

Finally, the community is just starting to face serious scalability issues for RDF archival. In the absence of a scalable archival approach at Web scale, RDF data change and vanish without further notice or any trace of previous versions. Future directions in this regard include further research on scalable archival methods (potentially distributed) as well as efficient mechanisms to resolve structured cross-time queries.

Cross-References

References

  1. Álvarez-García S, Brisaboa N, Fernández JD, Martínez-Prieto MA, Navarro G (2014) Compressed vertical partitioning for efficient RDF management. Knowl Inf Syst 44(2):439–474
  2. Apache Thrift (2017) Apache thrift. https://thrift.apache.org/
  3. Becket D (2014) RDF 1.1 N-Triples: a line-based syntax for an RDF graph. W3C recommendation. https://www.w3.org/TR/n-triples/
  4. Beckett D, Berners-Lee T, Prud’hommeaux E, Carothers G (2014) RDF 1.1 turtle: terse RDF triple language. W3C recommendation. https://www.w3.org/TR/turtle/
  5. Beek W, Rietveld L, Bazoobandi HR, Wielemaker J, Schlobach S (2014) LOD laundromat: a uniform way of publishing other people’s dirty data. In: 13th international semantic web conference (ISWC), pp 213–228
  6. Berners-Lee T, Connolly D (2011) Notation3 (N3): a readable RDF syntax. W3C team submission. https://www.w3.org/TeamSubmission/n3/
  7. Bizer C, Cyganiak R (2014) RDF 1.1 TriG: RDF dataset language. W3C recommendation. https://www.w3.org/TR/trig/
  8. Bizer C, Heath T, Berners-Lee T (2009) Linked data-the story so far. Int J Semant Web Inf Syst 5(3):1–22
  9. Brisaboa N, Cerdeira-Pena A, Fariña A, Navarro G (2015) A compact RDF store using suffix arrays. In: 22nd international symposium on string processing and information retrieval (SPIRE), pp 103–115
  10. Carothers G (2014) RDF 1.1 N-Quads: a line-based syntax for an RDF dataset. W3C recommendation. https://www.w3.org/TR/n-quads/
  11. Carroll J, Stickler P (2004) TriX: RDF triples in XML. Technical report, Digital Media Systems Laboratory, HP Laboratories Bristol
  12. Carroll JJ, Bizer C, Hayes P, Stickler P (2005) Named graphs, provenance and trust. In: Proceedings of the 14th international conference on World Wide Web. ACM, pp 613–622
  13. Cerdeira-Pena A, Fariña A, Fernández JD, Martínez-Prieto MA (2016) Self-indexing RDF archives. In: Proceeding of DCC
  14. Cyganiak R, Wood D, Lanthaler M (2014) RDF 1.1 concepts and abstract syntax. W3C recommendation. https://www.w3.org/TR/rdf11-concepts/
  15. de Sompel HV, Sanderson R, Nelson ML, Balakireva L, Shankar H, Ainsworth S (2010) An HTTP-based versioning mechanism for linked data. In: Proceeding of LDOWGoogle Scholar
  16. Dong-Hyuk I, Sang-Won L, Hyoung-Joo K (2012) A version management framework for RDF triple stores. Int J Softw Eng Know 22(1):85–106Google Scholar
  17. Fernández JD, Martínez-Prieto MA, Gutiérrez C, Polleres A (2011) Binary RDF representation for publication and exchange (HDT). W3C member submission. http://www.w3.org/Submission/HDT/
  18. Fernández JD, Martínez-Prieto MA, Gutiérrez C, Polleres A, Arias M (2013) Binary RDF representation for publication and exchange. J Web Semant 19:22–41Google Scholar
  19. Fernández JD, Llaves A, Corcho O (2014a) Efficient RDF interchange (ERI) format for RDF data streams. In: 13th international semantic web conference (ISWC), pp 244–259Google Scholar
  20. Fernández N, Arias J, Sánchez L, Fuentes-Lorenzo D, Corcho Ó (2014b) RDSZ: an approach for lossless RDF stream compression. In: 11th European conference on the semantic web (ESWC), pp 52–67Google Scholar
  21. Fernández JD, Martínez-Prieto MA, Polleres A, Reindorf J (2018) HDTQ: managing RDF datasets in compressed space. In: European semantic web conferenceGoogle Scholar
  22. Gandon F, Schreiber G (2014) RDF 1.1 XML syntax. W3C recommendation. https://www.w3.org/TR/rdf-syntax-grammar/
  23. Giménez-García JM, Zimmermann A, Maret P (2017) NdFluents: an ontology for annotated statements with inference preservation. In: European semantic web conference. Springer, pp 638–654
  24. Graube M, Hensel S, Urbas L (2014) R43ples: revisions for triples. In: Proceeding of LDQ, vol CEUR-WS 1215, paper 3
  25. Gutierrez C, Hurtado C, Vaisman A (2007) Introducing time into RDF. IEEE Trans Knowl Data Eng 19(2):207–218
  26. Hartig O (2017) Foundations of RDF* and SPARQL* – an alternative approach to statement-level metadata in RDF. In: Proceeding of AMW
  27. Hasemann H, Kroller A, Pagel M (2012) RDF provisioning for the internet of things. In: 3rd international conference on the internet of things (IOT), pp 143–150
  28. Iannone L, Palmisano I, Redavid D (2005) Optimizing RDF storage removing redundancies: an algorithm. In: 18th international conference on industrial and engineering applications of artificial intelligence and expert systems (IEA/AIE), pp 732–742
  29. Joshi A, Hitzler P, Dong G (2013) Logical linked data compression. In: 10th extended semantic Web conference (ESWC), pp 170–184
  30. Karim F, Vidal ME, Auer S (2017) Efficient processing of semantically represented sensor data. In: 13th international conference on Web information systems and technologies (WEBIST), pp 252–259
  31. Klein M, Fensel D, Kiryakov A, Ognyanov D (2002) Ontology versioning and change detection on the Web. In: Proceeding of EKAW, pp 197–212
  32. Lhez J, Ren X, Belabbess B, Curé O (2017) A compressed, inference-enabled encoding scheme for RDF stream processing. In: 14th European conference on the semantic Web (ESWC), pp 79–93
  33. Lóscio BF, Burle C, Calegari N (2017) Data on the web best practices. W3C recommendation 31 Jan 2017
  34. Meier M (2008) Towards rule-based minimization of RDF graphs under constraints. In: 2nd international conference on web reasoning and rule systems (RR), pp 89–103
  35. Meinhardt P, Knuth M, Sack H (2015) Tailr: a platform for preserving history on the web of data. In: Proceeding of SEMANTiCS. ACM, pp 57–64
  36. Neumann T, Weikum G (2010) x-RDF-3X: fast querying, high update rates, and consistency for RDF databases. Proc VLDB Endow 3(1–2):256–263
  37. Nguyen V, Bodenreider O, Sheth A (2014) Don’t like RDF reification? Making statements about statements using singleton property. In: Proceedings of the 23rd international conference on World Wide Web. ACM, pp 759–770
  38. Noy NF, Musen MA (2004) Ontology versioning in an ontology management framework. IEEE Intell Syst 19(4):6–13. https://doi.org/10.1109/MIS.2004.33
  39. Noy N, Rector A, Hayes P, Welty C (2006) Defining n-ary relations on the semantic web. W3C working group note 12(4)
  40. RDF Binary (2017) RDF binary using apache thrift. https://jena.apache.org/documentation/io/rdf-binary.html
  41. RDF4j (2017) Rdf4j binary RDF format. http://docs.rdf4j.org/rdf4j-binary/
  42. Schneider J, Kamiya T, Peintner D, Kyusakov R (2014) Efficient XML interchange (EXI) Format 1.0. W3C recommendation
  43. Schreiber G, Raimond Y (2014) RDF 1.1 primer. W3C working group note. https://www.w3.org/TR/rdf11-primer/
  44. Sporny M, Longley D, Kellogg G, Lanthaler M, Lindström N (2014) JSON-LD 1.0: a JSON-based serialization for linked data. W3C recommendation. https://www.w3.org/TR/json-ld/
  45. Stefanidis K, Chrysakis I, Flouris G (2014) On designing archiving policies for evolving RDF datasets on the Web. In: Proceeding of ER, pp 43–56
  46. Swacha J, Grabowski S (2015) OFR: an efficient representation of RDF datasets. In: 4th symposium on languages, applications and technologies (SLATE), pp 224–235
  47. Vander Sande M, Colpaert P, Verborgh R, Coppens S, Mannens E, Van de Walle R (2013) R&Wbase: git for triples. In: Proceeding of LDOW
  48. Venkataraman G, Sreenivasa Kumar P (2015) Horn-rule based compression technique for RDF data. In: 30th annual ACM symposium on applied computing (SAC), pp 396–401
  49. Verborgh R, Vander Sande M, Hartig O, Van Herwegen J, De Vocht L, De Meester B, Haesendonck G, Colpaert P (2016) Triple pattern fragments: a low-cost knowledge graph interface for the Web. J Web Semant 37–38:184–206
  50. Völkel M, Winkler W, Sure Y, Kruk S, Synak M (2005) Semversion: a versioning system for RDF and ontologies. In: Proceeding of ESWC
  51. Gao S, Gu J, Zaniolo C (2016) RDF-TX: a fast, user-friendly system for querying the history of RDF knowledge bases. In: Proceeding of EDBT
  52. Zeginis D, Tzitzikas Y, Christophides V (2011) On computing deltas of RDF/S knowledge bases. ACM Trans Web (TWEB) 5(3):14
  53. Zimmermann A, Lopes N, Polleres A, Straccia U (2012) A general framework for representing, reasoning and querying with annotated semantic Web data. JWS 12:72–95

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Javier D. Fernández (1)
  • Miguel A. Martínez-Prieto (2)
  1. Complexity Science Hub Vienna, Vienna University of Economics and Business, Vienna, Austria
  2. Department of Computer Science, Universidad de Valladolid, Valladolid, Spain

Section editors and affiliations

  • Philippe Cudré-Mauroux (1)
  • Olaf Hartig (2)
  1. eXascale Infolab, University of Fribourg, Fribourg, Switzerland
  2. Linköping University, Linköping, Sweden