1 Introduction

An increasing number of RDF datasets are available across all domains and, as a result, many non-programmers are expressing a need to explore these datasets. The problem is that accessing semantic data requires proficiency in SPARQL, as well as familiarity with the specific vocabularies or ontologies employed by the dataset. Alternatives to searching directly with SPARQL are mainly visual query approaches, especially graph-based query editors, e.g. QueryVOWL [1] and NITELIGHT [2]. While this type of interface can easily exploit the graph structure of RDF and SPARQL, mainstream users are not particularly comfortable with graph visualizations [3, 4], making this approach questionable for this user group. Moreover, many common querying tasks do not require the expressivity of full graph-based querying.

We propose PepeSearch [5], a portable form-based search interface for querying semantic RDF datasets specifically aimed at helping mainstream users in their search tasks. Forms allow the user to exploit the ontology without manipulation of graph structures. Instead, the end-user employs drop-down menus, free-text entry fields, and sliders to specify classes, properties, strings, and data value ranges of their queries. This frees the user from having to invest a significant amount of time learning technical characteristics of the dataset, e.g., what an OWL class is or what ontologies are used to describe the data.

Form-based interfaces tend to be designed for specific search tasks in a single domain. User experience and design work is therefore linked to a specific context. In contrast, PepeSearch exploits the self-describing nature of RDF and schema-level queries in SPARQL to provide a generic and portable solution that can run on any SPARQL endpoint. We allow the mainstream user to pose queries ranging from simply retrieving the members of a class to queries that join multiple concepts and set restrictions on datatype properties. So far, PepeSearch has been applied in two different contexts: government organizational data and healthcare. We will demonstrate PepeSearch at ESWC 2016: how to set up a PepeSearch instance, how to formulate queries, and how to retrieve results.

2 Overview of PepeSearch

PepeSearch is an open source project under the Apache license developed at the University of Oslo (footnote 1). It consists of the SPARQL analyzer (footnote 2), the PepeSearch component (footnote 3), and a text search engine (see Fig. 1). The provided GitHub repository also includes a screencast (footnote 4) and a live demo (footnote 5).

The analyzer is employed in a bootstrapping stage to gather information about the target data set. Through a series of generic SPARQL queries, the analyzer obtains the classes employed in the dataset, their datatype properties, and the connections to other classes through an object property or through a subclass relation. The result is a data schema in the JSON format.
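The bootstrapping stage could be sketched as follows. This is a minimal illustration, not the analyzer's actual code: the three SPARQL queries, the function `build_schema`, the JSON layout, and the `http://example.org/` URIs are all assumptions made for the example, and subclass relations are left out for brevity. The rows are hard-coded stand-ins for what an endpoint would return.

```python
import json

# Hypothetical schema-level queries; the actual analyzer may phrase them differently.
CLASSES_QUERY = "SELECT DISTINCT ?cls WHERE { ?s a ?cls }"

DATATYPE_PROPS_QUERY = """
SELECT DISTINCT ?cls ?prop (DATATYPE(?o) AS ?dt) WHERE {
  ?s a ?cls ; ?prop ?o .
  FILTER(isLiteral(?o))
}"""

OBJECT_PROPS_QUERY = """
SELECT DISTINCT ?cls ?prop ?target WHERE {
  ?s a ?cls ; ?prop ?o .
  ?o a ?target .
}"""


def build_schema(class_rows, datatype_rows, object_rows):
    """Merge the three result sets into one dict keyed by class URI."""
    schema = {}

    def entry(cls):
        # Create the per-class record on first use.
        return schema.setdefault(
            cls, {"datatypeProperties": [], "objectProperties": []})

    for row in class_rows:
        entry(row["cls"])
    for row in datatype_rows:
        entry(row["cls"])["datatypeProperties"].append(
            {"property": row["prop"], "datatype": row["dt"]})
    for row in object_rows:
        entry(row["cls"])["objectProperties"].append(
            {"property": row["prop"], "target": row["target"]})
    return schema


# Made-up rows standing in for endpoint responses (assumed, not real data).
EX = "http://example.org/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
schema = build_schema(
    [{"cls": EX + "Patient"}, {"cls": EX + "Diagnosis"}],
    [{"cls": EX + "Patient", "prop": EX + "age", "dt": XSD_INTEGER}],
    [{"cls": EX + "Patient", "prop": EX + "hasDiagnosis",
      "target": EX + "Diagnosis"}],
)
schema_json = json.dumps(schema, indent=2)
```

Keeping the queries generic (no dataset-specific vocabulary) is what would let the same bootstrapping step run against any endpoint.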

The obtained data schema can then be used to configure a PepeSearch instance. The query builder component is in charge of preparing a suitable view for querying the dataset. For an arbitrary RDF class, a form block is created, in which datatype properties are mapped to widget elements. In order to support multi-class queries, a collapsible form block is included for each RDF class that is connected with an object property to the selected class – see Fig. 2(a) for an example. The results viewer element is in charge of sending the query to the SPARQL endpoint and presenting the results in a tabular representation – see Fig. 2(b). Browsing is supported through the instance viewer that obtains all the data about a particular individual with links to other connected instances – see Fig. 2(c).
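As a rough sketch of how a form block might be derived from the data schema, the snippet below maps numeric datatype properties to sliders and everything else to free-text fields, and lists connected classes as collapsibles. The schema layout, the function names `widget_for` and `form_block`, the widget labels, and the example URIs are assumptions for illustration only; the actual query builder may use different rules.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Datatypes assumed to be suited to a range slider.
NUMERIC_TYPES = {XSD + t for t in ("integer", "int", "decimal", "double", "float")}


def widget_for(datatype):
    """Pick a widget label for a datatype URI (assumed mapping)."""
    return "slider" if datatype in NUMERIC_TYPES else "text"


def form_block(class_uri, schema):
    """Describe the form block for one class of a (hypothetical) schema dict."""
    entry = schema[class_uri]
    return {
        "class": class_uri,
        "fields": [
            {"property": p["property"], "widget": widget_for(p["datatype"])}
            for p in entry["datatypeProperties"]
        ],
        # One collapsible per distinct class reachable via an object property.
        "collapsibles": sorted({o["target"] for o in entry["objectProperties"]}),
    }


# Made-up schema fragment in the assumed layout.
EX = "http://example.org/"
example_schema = {
    EX + "Patient": {
        "datatypeProperties": [
            {"property": EX + "age", "datatype": XSD + "integer"},
            {"property": EX + "name", "datatype": XSD + "string"},
        ],
        "objectProperties": [
            {"property": EX + "hasDiagnosis", "target": EX + "Diagnosis"},
        ],
    },
}
block = form_block(EX + "Patient", example_schema)
```

Here `block["fields"]` pairs the age property with a slider and the name property with a text field, and `block["collapsibles"]` contains the single connected class.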

The text search engine is an optional component that allows dynamic term suggestions during query specification. This is employed to provide autocomplete capabilities for the text fields of a class, e.g. to suggest names such as “Martin” or “Maria” after typing “mar” in a name textbox.
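The suggestion behavior can be illustrated with a small in-memory prefix index. This is only a stand-in for the text search engine, which is not specified in detail here; the class `PrefixIndex`, its `suggest` method, and the sample names are assumptions.

```python
from bisect import bisect_left


class PrefixIndex:
    """Case-insensitive prefix lookup over a sorted list of terms."""

    def __init__(self, terms):
        # Sort by lowercase key while keeping the original spelling.
        self._entries = sorted((t.lower(), t) for t in set(terms))
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix, limit=10):
        key = prefix.lower()
        # The first entry whose key is >= the prefix starts the candidate run.
        start = bisect_left(self._keys, key)
        matches = []
        for entry_key, term in self._entries[start:]:
            if not entry_key.startswith(key) or len(matches) >= limit:
                break
            matches.append(term)
        return matches


index = PrefixIndex(["Martin", "Maria", "Peter"])
suggestions = index.suggest("mar")  # ["Maria", "Martin"]
```

Because the keys are sorted, all terms sharing a prefix are contiguous, so a binary search followed by a short scan is enough for small vocabularies.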

Fig. 1. Logical architecture of PepeSearch.

3 Hands on with PepeSearch

To illustrate the operation of PepeSearch, we will employ a sample dataset containing health records of fictitious patients. Anonymized patient data has been provided by our hospital project partner in the form of tables from a widely used hospital records application. It describes health care processes, with associated diagnoses and medical personnel in various roles, supported by a body of code lists. This data is mapped into RDF according to an ontology with three main parts: (i) excerpts from the Disease Ontology (footnote 6) to cover the medical conditions that appear in the data, (ii) the Information Artifact Ontology (footnote 7) for documents, and (iii) local extensions for measurements of vital signs and for a part/whole hierarchy of health care processes. Upper classes and relations are provided by the OBO Relations Ontology (footnote 8).

Fig. 2. Snapshots of PepeSearch.

As an example, we show how to obtain a set of patients between 30 and 50 years of age who have suffered from an intestinal disease. The use of semantic technologies for cohort identification has been proposed [6] and is an important application area. We first run the SPARQL analyzer to generate the data schema from the dataset structure, with all the classes, properties and value types. PepeSearch can then be used to fulfill this information need as follows:

  1. PepeSearch presents a list of the top classes available in the dataset.

  2. We select the concept “human being”.

  3. PepeSearch presents a form block for the “human being” class and a list of collapsibles corresponding to classes directly connected to “human being” in the dataset, e.g. “diagnosis” or “health care encounter”.

  4. We set the restrictions required for this search task: in the “human being” class we select “patient” as a more specific type; we use the age slider to set the appropriate range; and we select the “intestinal disease” after expanding the “disposition” collapsible. A snapshot of this query is shown in Fig. 2(a).

  5. We push the “Get results” button at the top right corner of the search interface.

  6. Behind the scenes, PepeSearch generates a SPARQL query from the form that is sent to the SPARQL endpoint.

  7. With the response, PepeSearch prepares a tabular representation of the results (see Fig. 2(b)).

  8. We can navigate through the results by following the links, e.g. Fig. 2(c) shows the information of one of the patients found.
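The query generation in step 6 might look roughly like the sketch below, which turns a form state into a SPARQL query string. The function `build_query`, its parameters, the variable naming scheme, and all `http://example.org/` URIs are hypothetical; the query that PepeSearch actually emits for this form may be structured differently.

```python
def build_query(root_class, range_filters=(), joins=()):
    """Build a SELECT query from a simplified form state.

    range_filters: (property URI, low, high) tuples from slider widgets.
    joins: (object property URI, target class URI) pairs from collapsibles.
    URIs are inserted verbatim, so they are assumed to be trusted input.
    """
    lines = ["SELECT DISTINCT ?s WHERE {", f"  ?s a <{root_class}> ."]
    for i, (prop, low, high) in enumerate(range_filters):
        var = f"?v{i}"
        lines.append(f"  ?s <{prop}> {var} .")
        lines.append(f"  FILTER({var} >= {low} && {var} <= {high})")
    for i, (prop, target_class) in enumerate(joins):
        var = f"?o{i}"
        lines.append(f"  ?s <{prop}> {var} .")
        lines.append(f"  {var} a <{target_class}> .")
    lines.append("}")
    return "\n".join(lines)


# Form state for the walkthrough: patients aged 30 to 50 with an
# intestinal disease (all URIs are made up for the example).
EX = "http://example.org/"
query = build_query(
    EX + "Patient",
    range_filters=[(EX + "age", 30, 50)],
    joins=[(EX + "hasDisposition", EX + "IntestinalDisease")],
)
print(query)
```

Each slider contributes one triple pattern plus a `FILTER`, and each expanded collapsible contributes a join through an object property, which mirrors the way the form blocks are laid out.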

4 Conclusions

PepeSearch is a portable form-based interface for searching semantic data sets devised for mainstream users. In this demonstration we will present the different components of PepeSearch. We will use the SPARQL analyzer to gather the data schema of several triple stores, and we will then use PepeSearch to formulate queries and retrieve results.