1 Introduction

Open data (footnote 1) are data that can be freely used, reused and redistributed by anyone, subject only, at most, to the requirement to attribute and share alike. A huge volume of information is made available through data portals (footnote 2) that support citizens in accessing data. In their original form, open data are not known in advance and vary in format and content. Moreover, different datasets have different provenances and use different modeling approaches, which makes it even harder for consumers to use the content of a data catalogue.

The Semantic Web community faces a similar challenge in linked data consumption [35]. Linked open data has already made a big step by adding both diversity and machine-readable semantics to data on the Web. However, the scale of the Web yields practically unlimited amounts of cross-domain data whose contexts invite various perspectives and interpretations on the human side. This calls for a general method of handling open data that is domain-independent and user-centered at the same time. The research in [41] shows that people want more interactivity than simply downloading and manipulating files. The major research challenge currently faced by the linked open data community is how to present structured data to common users in an intuitive and generic way [8].

Various visualization applications are being developed in the Linked Open Data community. Most of these applications are domain-specific and serve specific purposes [11]. They work with specific scenarios and require fixed, predefined data. Moreover, these applications use different visualization and navigation techniques. They usually require a certain level of technical knowledge and thus demand extra effort from common users [12, 21, 37].

We propose a novel approach to align open data with user experience. The approach rests on the observation that people intuitively perceive things as entities and categorize them according to their similarities and differences. We therefore describe open data as a network of interrelated entities. On top of this entity-centric description of open data, we build a visualization layer.

An entity represents any unique object that can be described by a set of attributes and relations to other objects. Each entity has a reference class that determines its type. An entity type is defined in terms of attributes (such as name, age), relations (such as located-in, friend-of), services (such as computeAge or computeInverseRelation) and categories of meta attributes (such as timespan, validity and provenance). Entity types are organized into a hierarchy of domains, with an ordering on attributes, relations and services. There are relatively few commonsense entity types (such as person, event, location) and many domain-dependent entity types.

The paper is organized as follows. Section 2 gives a brief overview of visualization tools found in the Open Data and Linked Open Data communities and then describes UX in general. Section 3 presents a motivating problem. Section 4 gives the solution methodology and a proof-of-concept prototype. Section 5 describes the user study we conducted on the prototype. Section 6 concludes the paper.

2 State of the Art

Open data are being used for different purposes and in different domains, including business, military intelligence, research and innovation, tourism and others. Since our focus is on effective visualization, navigation and exploration of open data, we first describe existing solutions relevant to our problem domain. We then describe the general UX dimensions that we aim to exploit in our solution.

2.1 Open and Linked Data Visualization Tools

In this section, we give a comparative analysis of the open data visualization systems found in the literature. The comparison is made against a set of criteria adapted from [11]. The criteria are:

  1. Interactive Visualization (IV): Refers to the use of interactive representation through different kinds of widgets (such as images, buttons and maps). Here, human perception is taken into account in understanding the complexity of the data structure and in discovering and analyzing the data [3, 8, 10, 25].

  2. Relations (R): Denotes different kinds of relationships within one or multiple datasets that help to understand the data and discover new data [4, 7, 16, 21].

  3. Details on Demand (DOD): Deals with exposing different levels of detail for the data as needed [10, 25].

  4. Scalability (S): Denotes the ability to manage and link large amounts of heterogeneous, loosely coupled data [4, 10, 22].

  5. Filtering (F): Refers to the ability to suppress irrelevant information and focus only on information relevant to a particular context or user session [4, 10].

  6. History (H): Describes the ability to record the history of interaction, allowing users to review or retrace paths and undo/redo their actions [8, 27].

  7. Faceted Exploration and Navigation (FEAN): Refers to a flexible mechanism that enables setting a particular context for search and exploration, and switching to another context based on the relations the user explores during the session [9, 15, 39].

  8. Domain Independency (DI): Means that applications are not coupled to a specific domain and can exploit a wide range of underlying datasets [6, 8, 20, 25].

  9. Target User Group (TUG): We differentiate three target user groups [11, 35]:

    (a) Common User (CU): End user who does not have any background in ontologies.

    (b) Tech User (TU): End user who understands the underlying technology and ontologies.

    (c) Domain Expert (DU): End user with expertise in the data of a particular domain. A DU might or might not have knowledge of the underlying technology.

Table 1 provides a brief comparison of existing Open Data applications. When visualizing linked open data for common end users who are not familiar with Semantic Web languages, it is important not to present data as URIs or triples but in a more user-friendly way. Although some systems provide high-level interactivity and emphasize different kinds of relations in the visualized datasets, they mostly use RDF to describe the data. From the usability viewpoint, this requires extra effort from the end user to interpret and understand what is being visualized. Applications like Dipper (footnote 3), Disco (footnote 4), Marbles (footnote 5), Piggy Bank (footnote 6), Sig.ma, URI Burner (footnote 7), Zitgist (footnote 8) and IsaViz (footnote 9) employ complex notations for visualization. They mainly use knowledge graphs that quickly become cumbersome as users drill down into the data. Tools like OpenLink (footnote 10), RDF Gravity (footnote 11), RelFinder (footnote 12), SIMILE/Exhibit (footnote 13) and LESS (footnote 14) provide good visualization support, but are highly contextualized and do not support advanced filtering or tracking of the interaction history. In terms of cross-domain support, most of the systems visualize data in a predefined context, without the ability to switch to another context based on the relations the user perceives during the session. This limitation also stems from the automated production of Linked Data, which raises the problem of the accuracy and completeness of the datasets. In particular, incorrect or missing values and incorrect links make it hard to correlate data [11]. All the factors above require a certain level of expertise, whether in ontologies, in the domain of the visualized data, or in a specific UI notation. This is why the majority of systems are better suited to more experienced users. Thus, the LOD community is still struggling to come up with a common visualization tool that captures the diversity and unexpectedness of open data in a user-friendly way.

Table 1. Comparing functionality of different tools

2.2 Common UX Dimensions

An overall positive UX is key to the wide acceptance and increased usage of open data. In scoping UX for open data, we use the existing body of knowledge that expresses UX through a set of measurable attributes [43]. We enumerate these attributes as follows:

  1. Visual and Aesthetic Experience: Shows how aesthetics affect the user's perception of the system. It deals with the pleasure that users gain from their immediate perception of the system [2, 23, 29, 32].

  2. Emotion: Describes the affective side of UX in terms of the feelings and emotions elicited by interaction with the system [2, 14, 29, 33].

  3. Identification: Addresses the human need to express oneself through interaction with the system [30, 34].

  4. Stimulation: Relates to the quality of the system that encourages the user to use it [24, 31, 38].

  5. Meaning and Value: Denotes the quality of the system to reflect or represent values that are important to the user [29, 30].

  6. Social Relatedness/Co-experience: Describes the pleasure that comes from social interactions [18, 30].

3 A Motivating Example

Open data are widely used in various kinds of services aimed at different categories of users (citizens, domain experts, technical users) [11]. Figure 1 illustrates an inherent property of open data: variety in format and content. This feature of open data is critical from the usability viewpoint. With respect to the example, we enumerate the major elements of this variability:

Fig. 1. An excerpt from an open data catalogue that illustrates the feature of open data critical for end-user acceptance: variety in format and content. It shows a data catalogue entry that points to different data structures representing the same data. Record A shows the data catalogue entry for restaurants. Record B points to the tabular representation. Record C refers to the JSON representation. Both records B and C contain information about the same restaurant, Al Volt.

  1. Variability in data formats: Datasets referenced by a data catalogue are stored in various formats such as XML, XLS, CSV, JSON and so on. It is assumed that the end user knows how to deal with multiple formats and use them [13]. This extra effort creates a barrier to using the datasets for most users. In Fig. 1, record B (footnote 15) is stored in XLS format, whereas record C (footnote 16) is written as JSON.

  2. Tabular presentation: On the one hand, it provides a concise overview of information, as data are aggregated and the list view flattens the depth of the data itself [1, 35]. On the other hand, different types of data require suitable visualizations for users to perceive them and extract the needed information. For example, numbers can be expressed as charts and/or graphs, whereas map-like presentations are more convenient for spatial data [28]. Coming up with a uniform, generic visualization is rather difficult, as data from different domains require different ways of visualizing information. In the example above, it would be more convenient to display the location-based data on a map.

  3. Inefficient search and exploration of datasets: In general, open data lack a proper interface that allows for efficient interaction and exploration [1]. Finding a specific tuple in a dataset requires extra effort from the user. The data may overlap, or be incorrect or incomplete (as shown in Fig. 1). An interface that allows for multi-faceted interaction and supports exploratory search is still missing [26].

  4. Complex data correlation process: Discovering related data tuples within the datasets is rather difficult. Looking at record B in Fig. 1, we notice that extracting the different points of interest located near or at the address via Santa Croce is almost impossible from the dataset.

  5. Multiple languages: Datasets are usually localized. For example, all datasets in Fig. 1 are in Italian. This requires a potential user to know the language.

  6. Scope of specific domain: Existing open data visualizations are tailored to specific domains and intended for use in different contexts. They often use different ways of interacting with the data. This requires users to use different tools in different ways and increases cognitive effort when users switch among the tools.

4 Entity-Centric Open Data Visualization

In this section, we propose a generic user interface framework that aims to bridge the gap between the data-driven and the user-centered approaches to describing open data. In general, we describe a method that captures the diversity and unexpectedness of open data in a generic way.

We see two basic steps in this process. First, we describe and encode open data in a domain-independent manner. We do that by identifying fine-grained elements that can be modeled individually and used to compose and encode open data for different domains. Second, on top of this model, we build a UI layer that brings the open data to the level of user experience. In this section, we describe the basic principles and concrete elements that make up our solution.

4.1 Entity-Centric Open Data

In an entity-centric view of open data, instances (objects) are described as entities [19]. Accordingly, the tabular view of open data (Fig. 1) is transformed into a knowledge graph showing the relevant entities and their attributes and relations (Fig. 2).

Fig. 2. Simplified entity-centric knowledge graph that describes open data as linked entities. Each entity is described with its class and its name attribute. The graph also shows different types of relations. Vertical relations are defined at the level of classes, whereas horizontal relations are generated at runtime.

Entity: Denotes a representation of a real-world object that can be found in different contexts in our everyday life. Each entity is described with a set of attributes and relations to other entities.

Attribute: Represents a property of an entity. Each attribute has a name and one or more values. Besides the value of the property it represents, each attribute has associated meta attributes such as provenance (describes the origin), permanence, dependency (whether it is computed) or validity time periods (for an attribute whose value has changed over time). In addition, we distinguish different kinds of attributes: quantitative (numeric), qualitative (expressed as an adjective) or descriptive (given in natural language).

Relation: Defines links between entities. In general, we differentiate between two types of relations: vertical (ontological) and horizontal (epistemic). Vertical relations are used to compose hierarchies, mainly for the purpose of classification. These relations are created at design time, at the level of classes. Examples are is-a and part-of. Horizontal (epistemic) relations link entities according to their specific properties. In this case, the entities can be instances of different, unrelated eTypes. These relations are defined at the level of instances and are computed at runtime from attribute values. Examples include located-in and near-by.

eType: Represents the reference class of an entity. It is a template that defines the constraints for creating the attributes and relations of an entity. Some common examples of eTypes include Location (geospatial entities), Facility (physical entities providing services to people) or MindProduct (the result of a human intellectual effort).

A clear separation of the concepts of attributes, relations and entity classes leads to a flexible mechanism that combines these elements to represent domain knowledge.
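The definitions above can be illustrated with a minimal Python sketch. It is not the data model of the actual system: the class names, the `permits` check, and the attribute names `address` and `cuisine` are assumptions introduced for illustration, and only the is-a link and two meta attributes are modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set


@dataclass
class Attribute:
    """A named property with one or more values plus meta attributes."""
    name: str
    values: List[Any]
    provenance: Optional[str] = None  # meta attribute: origin of the value
    computed: bool = False            # meta attribute: dependency


@dataclass
class EType:
    """Reference class: a template that constrains an entity's attributes."""
    name: str
    parent: Optional["EType"] = None  # vertical is-a relation
    allowed_attributes: Set[str] = field(default_factory=set)

    def is_a(self, other: "EType") -> bool:
        """Walk the is-a chain upward to test class membership."""
        current: Optional[EType] = self
        while current is not None:
            if current is other:
                return True
            current = current.parent
        return False

    def permits(self, attribute_name: str) -> bool:
        """An attribute is allowed if this eType or an ancestor declares it."""
        current: Optional[EType] = self
        while current is not None:
            if attribute_name in current.allowed_attributes:
                return True
            current = current.parent
        return False


@dataclass
class Entity:
    """An instance described by attributes that its eType permits."""
    etype: EType
    attributes: Dict[str, Attribute] = field(default_factory=dict)

    def set_attribute(self, attr: Attribute) -> None:
        if not self.etype.permits(attr.name):
            raise ValueError(
                f"{self.etype.name} does not define attribute {attr.name!r}"
            )
        self.attributes[attr.name] = attr


# Hypothetical eTypes; only Facility is named as an eType in the text.
facility = EType("Facility", allowed_attributes={"name", "address"})
restaurant = EType("Restaurant", parent=facility, allowed_attributes={"cuisine"})

al_volt = Entity(restaurant)
al_volt.set_attribute(
    Attribute("name", ["Al Volt"], provenance="open data catalogue")
)
```

Because the eType acts as a template, an attribute that neither Restaurant nor Facility declares is rejected when the entity is built, which is one simple way to read "defines the constraints for creating attributes".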

Lightweight ontology: Refers to a method of dynamically composing entity classes, attributes and relations to create the background knowledge about a domain of interest (e.g., tourism, transportation). Thus, we have hierarchies of entity classes (eTypes), attributes and relations. These hierarchies are also known as facet ontologies. Facet is a term widely used in the knowledge representation community and denotes an aspect of meaning. Each facet contains a group of homogeneous terms, where each term in the hierarchy denotes a primitive atomic concept (whether it is a class, a relation or an attribute). The lightweight ontology is then created dynamically by linking the elements belonging to the respective facets into a unique hierarchy. This mechanism enables the construction of multi-perspective representation schemes that combine terms from the facets. Concretely, this means that we can view an entity from different perspectives. While vertical relations provide a precompiled classification and description of entity types, horizontal relations are created at runtime. For example, a restaurant can be described as a refreshment facility, as part of the city, as near a hotel, as being on the lake, or as the place where an event is being held, depending on the context provided. The context can be defined by a user request. In this sense, lightweight ontologies provide a flexible and scalable mechanism for representing domain-specific knowledge that combines different views on the same concept. From the technical side, this brings performance benefits, since the ontology may be generated at the beginning of a user session and thereby provide flexible but also efficient navigation and search.

Figure 3 provides a simplified example of using a lightweight ontology to generate a UI navigation control. The ontology classifies entity types from the tourism domain according to their purpose. The level of a menu item matches the depth of the node that represents the entity type.

Fig. 3. An example that illustrates the usage of a lightweight ontology to generate a menu for navigating domain entities.
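The mapping from ontology depth to menu level can be sketched as follows. This is an illustrative sketch rather than the prototype's code: the hierarchy in `IS_A` is assumed (the text names only Facility, Italian Restaurant and Pizzeria, and their exact nesting is a guess), and `build_menu` and `render` are hypothetical helpers.

```python
from collections import defaultdict

# Assumed is-a hierarchy, written as child -> parent.
IS_A = {
    "Restaurant": "Facility",
    "Italian Restaurant": "Restaurant",
    "Pizzeria": "Italian Restaurant",
    "Hotel": "Facility",
}


def build_menu(is_a):
    """Return (level, label) rows in depth-first order.

    The level of each menu item equals the depth of its node
    in the is-a hierarchy.
    """
    children = defaultdict(list)
    for child, parent in is_a.items():
        children[parent].append(child)

    # Roots are nodes that appear as a parent but never as a child.
    roots = sorted(set(is_a.values()) - set(is_a.keys()))

    rows = []

    def visit(node, depth):
        rows.append((depth, node))
        for child in sorted(children[node]):
            visit(child, depth + 1)

    for root in roots:
        visit(root, 0)
    return rows


def render(rows):
    """Indent each label by two spaces per menu level."""
    return "\n".join("  " * depth + label for depth, label in rows)


menu_rows = build_menu(IS_A)
print(render(menu_rows))
```

Since the menu is derived from the hierarchy rather than hard-coded, adding an eType to the ontology adds the corresponding menu item without changes to the UI code.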

4.2 Navigation Modalities

Based on the type of relations among entities, we define two basic kinds of navigation modalities. Combined, they allow flexible navigation among open data entities. They are as follows:

  1. Vertical (Ontological) Navigation: Follows the vertical relations that exist in the hierarchical structures of entity classes. In Fig. 2, we can see part-of relations between location-based entities (province, mountain, city). Fig. 3, on the other hand, shows a lightweight ontology with is-a relations between eTypes, starting from Facility at the highest level and propagating to the more specific types Italian Restaurant and Pizzeria.

  2. Horizontal (Epistemic) Navigation: Follows the relations that exist between entities. In Fig. 2, it might not be easy to find restaurants near a hotel using vertical relations alone. Horizontal relations connect entities that can belong to different eTypes. They are generated at runtime (for example, on a user request). In Fig. 2, Hotel and Restaurant are connected by a near-by relation. In addition, the network of horizontal relations makes it possible to reach different means of transportation.
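The two modalities can be contrasted in a short sketch. Vertical navigation follows stored part-of links, whereas the near-by relation is computed at request time from coordinate attributes. The entity names, coordinates and function names below are made up for illustration, and the use of the haversine great-circle distance is an assumption, not a description of how the prototype computes proximity.

```python
import math

# Hypothetical entities with illustrative coordinates (not from the dataset).
ENTITIES = {
    "Hotel H": {"etype": "Hotel", "lat": 46.0000, "lon": 11.0000},
    "Restaurant R": {"etype": "Restaurant", "lat": 46.0020, "lon": 11.0000},
    "Restaurant S": {"etype": "Restaurant", "lat": 46.0200, "lon": 11.0000},
}

# Vertical part-of relations, stored at design time.
PART_OF = {
    "Hotel H": "City C",
    "Restaurant R": "City C",
    "Restaurant S": "City C",
    "City C": "Province P",
}


def vertical_path(name):
    """Vertical navigation: follow part-of links up to the root."""
    path = []
    while name in PART_OF:
        name = PART_OF[name]
        path.append(name)
    return path


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def near_by(name, radius_m, etype=None):
    """Horizontal navigation: compute near-by entities at runtime."""
    origin = ENTITIES[name]
    result = []
    for other_name, other in ENTITIES.items():
        if other_name == name:
            continue
        if etype is not None and other["etype"] != etype:
            continue
        distance = haversine_m(origin["lat"], origin["lon"], other["lat"], other["lon"])
        if distance <= radius_m:
            result.append(other_name)
    return sorted(result)


print(vertical_path("Hotel H"))
print(near_by("Hotel H", 500, etype="Restaurant"))
```

Note that `near_by` crosses eType boundaries (a Hotel reaches a Restaurant), which is what distinguishes it from walking the class hierarchy.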

4.3 Reference Architecture

We describe our solution by means of a general, high-level architecture (Fig. 4). The architecture consists of three main components:

Fig. 4. System architecture

  1. Domain Component: Describes the source open data from the domain of interest. These data are taken from the open data catalogue.

  2. Semantic Component: Handles the entity-centric representation of open data. This component contains the mechanisms that transform source open data into entities. The process and the underlying platform are described elsewhere [5]. The converted open data are stored in an entity base.

  3. Interaction Component: Provides the UI layer on top of the entity base.
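As a rough illustration of how the three components fit together, the sketch below passes two differently formatted sources through a toy pipeline. It does not reproduce the transformation process of [5]: the field names, sample values, `to_entities`, and the `EntityBase` class with its merge-by-name rule are all assumptions made for the example.

```python
import csv
import io
import json

# Domain component: the same record in two source formats.
# Field names and values are made up for illustration.
CSV_SOURCE = "name,address\nExample Restaurant,Example Street 1\n"
JSON_SOURCE = '[{"nome": "Example Restaurant", "telefono": "000"}]'


def to_entities(rows, etype, field_map):
    """Semantic component: map source fields onto entity attributes."""
    entities = []
    for row in rows:
        entity = {"etype": etype}
        for source_field, value in row.items():
            if source_field in field_map:
                entity[field_map[source_field]] = value
        entities.append(entity)
    return entities


class EntityBase:
    """Toy entity store that merges records sharing eType and name."""

    def __init__(self):
        self._by_key = {}

    def add(self, entity):
        key = (entity["etype"], entity["name"])
        self._by_key.setdefault(key, {}).update(entity)

    def find_by_etype(self, etype):
        """Interaction component entry point: query entities by eType."""
        return [e for e in self._by_key.values() if e["etype"] == etype]


csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
json_rows = json.loads(JSON_SOURCE)

base = EntityBase()
for entity in to_entities(csv_rows, "Restaurant", {"name": "name", "address": "address"}):
    base.add(entity)
for entity in to_entities(json_rows, "Restaurant", {"nome": "name", "telefono": "phone"}):
    base.add(entity)

restaurants = base.find_by_etype("Restaurant")
print(restaurants)
```

The point of the sketch is the separation of concerns: the UI only ever queries the entity base, so differences in source format and field naming are absorbed before the interaction layer sees the data.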

4.4 Experimental Prototype

We have developed an experimental prototype (footnote 17) that uses an entity base containing open data from the province of Trento (footnote 18). The entity base is called Entitypedia. It is a large-scale knowledge base. Currently, it provides 98 percent correctness, with the goal of improving this further. Aside from serving as an entity repository, it provides entity-based services ranging from simple ones (such as standard CRUD operations) to more advanced ones (such as context-dependent operations of search, matching or navigation). The contents of the entity base are described in Table 2.

Table 2. Entities classified according to eType inside an entity base

Figure 5 shows the main components of the UI. The UI calls a Web API endpoint, which is responsible for querying the data from the entity base. The UI components are responsible for visualizing the entities returned as the query result.

Fig. 5. Main architecture

The developed prototype is shown in Fig. 6. It contains four main parts: an ontological menu, a map, an information container, and a relational menu. The ontological menu is placed on the left side and contains the different eTypes. It is created from a lightweight ontology. Currently, it supports Italian and English. Based on the selected menu item (eType), the entities are retrieved and visualized on the map.

Fig. 6. UI of the working prototype

Users can also select multiple menu items. Once an entity is selected on the map, information about that entity is shown on the screen and a relational menu is created on the right side of the screen. The relational menu lists the entities that are either nearby or located in the same place. There are also standard settings, such as the default language, the search radius, and geo-locational search for entities relative to the user’s position. A more detailed demonstration of the prototype can be seen in the accompanying video.

5 Experiment

The goal of the initial evaluation of the system prototype was to obtain insights into how different UX and usability dimensions are addressed and, specifically, how people perceive entities and their categories in an entity-centric world. Over a period of one week, nine students from different faculties participated in the evaluation. The participants ranged from Bachelor's to PhD students and were between 21 and 29 years old. Five were male and four were female. They were given a brief overview of the system and shown how to use the prototype application, called Trentino Entitypedia (TE). The users were then asked to find entities in TE. Each session lasted at most half an hour. All user interactions and comments were transcribed, and video was recorded for future analysis. The users were instructed to think aloud and asked to perform the tasks in their normal way, without feeling any discomfort or stress. After completing the tasks, respondents were asked to fill in a questionnaire. The questionnaire consisted of background information and questions that use a 5-point Likert scale to assess different usability and user experience criteria. In the subsections that follow, we describe the results of the evaluation.

5.1 Task Description

Each user was asked to spend some time on the interface in order to get familiar with it. Then they were asked to read the instructions about the system. Four related tasks were assigned to the users as follows:

  • Task 1 (T1): Find a Bed and Breakfast.

  • Task 2 (T2): Based on T1, find any other Point of Interest (POI) that is in the same location (commune) as the previously selected Bed and Breakfast.

  • Task 3 (T3): Find a POI within a range of 500 m in Cavalese.

  • Task 4 (T4): Find a Sports Club in Cavalese.

Most of the users were able to complete Task 1 within minutes. They were also able to understand the relationships and to navigate through different relations. Task 2 also took little time to complete. The possibility of setting the radius made Task 3 simple to complete, and users were excited to see the results. Task 4 was also performed easily by most of the users.

5.2 Usability Evaluation

On average, users’ assessments were quite high for all measured variables. The overall assessment of the system was positive. Five usability dimensions were selected: Usefulness, Learnability, Memorability, Satisfaction and Visibility of system status. The evaluation study is summarized in Table 3. Detailed evaluation dimensions and statistics can be found elsewhere (footnote 19).

Table 3. Usability evaluation based on a five-point Likert scale

Overall, the TE system was assessed as quite useful. The standard deviation from the mean for each of the usability questions is also small. This suggests that the system was useful for almost all of the users.

Both the ontological menu on the left and the semantic menu on the right were considered user-friendly. The left menu proved to be well organized, as it allows the user to find a specific service in less time. Users also liked the possibility of setting the entity search range. Overall, users found the interactivity and the integrated environment interesting. Some minor bugs were also discovered during the process. The check-box in the left menu exhibited some issues. When an item was checked in the semantic menu (right menu), users expected the previously selected entity to remain shown on the map. Some suggestions about font size and type were also provided. The system was somewhat slow, and users expected results in less time.

For a future version, users also suggested some features: directions as in Google Maps, direct search functionality, and facet-level services at the attribute level. More interactive help was also pointed out as needed. Users also requested icons in both menus. Some users suggested placing the relational menu (right menu) directly inside the information container.

5.3 UX Evaluation

The system was evaluated with respect to five UX dimensions: Aesthetics and Visual Appearance, Emotion, Identification, Stimulation, and Meaning and Value. Table 4 shows the responses related to specific UX dimensions.

Table 4. UX evaluation based on a five-point Likert scale

On average, the users’ assessments were positive for all of the UX variables. Users 7 and 8 gave negative remarks for Q3 and Q7, respectively. This resulted in a higher deviation from the mean for those questions. We explain this with the comments they provided. User 7 noted that the system has an easy traversal mechanism, but that the discovery of information may not be intuitive. Similarly, User 8 mentioned that although it is easy to find entities and relations, the meaning of the Italian words was not intuitive in some cases. For example, Cateratta was used to describe a waterfall, whereas Cascata may be a more appropriate word. This issue stems from the quality of the underlying entity-centric open data.

The idea of seeing entities of different types on the same page really excited the users. Users felt that it is less time-consuming and gives a clear picture of what they are looking for. User 1 mentioned that the interface was intuitive and exciting. Users liked the consistency between the menu color and the icon color, which conveyed clarity and symmetry. For some users, the green icon was not helpful, as it matched the color of the map. The icon for streets also did not make sense to the users. For most of the users, finding new information was an innovative experience. User 6 said that she enjoyed the way in which connected entities can be traversed. User 4 was surprised to see the opening and closing times for a POI. All users asked if they could start using the application immediately. They also suggested that a mobile version of the application would be extremely helpful. User 3 liked the menu; the arrangement of menu items seemed logical to him. User 2 doubted whether the menu item ‘Things to do’ was expressive enough. He also thought that the menu items ski and club should be placed together under the same menu.

Table 5 gives the percentage of users who indicated application features with respect to specific UX dimensions. With respect to identification, the users were asked to choose the features that reflect their habits in using similar systems. Regarding stimulation, the users were asked to choose the specific features that encourage them to use the application.

Table 5. Application features rating with respect to UX dimensions

The table shows that the most interesting application features are relations between entities, view filtering and flexible exploration and navigation through entities. It also shows that some of the features were not noticed by the users. These features require further improvement.

6 Conclusion and Future Work

The paradigm shift toward making data free raises the issue of their effective use by common users. This requirement comes from the fundamental properties of open data: diversity in format and content, and unexpectedness (meaning they are not known in advance). Thus far, open data have been handled with predefined, hard-wired solutions.

We handle open data explicitly through entities, as a domain-independent and user-centric methodology. From the data perspective, entities address fundamental issues of open data. By design, they capture context very well by encapsulating all the relevant properties (attributes) in a component. Once designed, they serve as good data aggregators (such as People, Locations, Events, Facilities). Moreover, different kinds of relations among entities make them usable across domains. From the usability perspective, we observe that people intuitively think in terms of entities as objects (such as friends, events and places), and we aim to exploit this notion of the human mental model.

In this paper, we have proposed an entity-centric solution for visualizing open data. The solution is supported by a high-level reference architecture. Based on this architecture, we have developed a proof-of-concept prototype. Insights gained from the user study demonstrate the feasibility of our solution.

Our next step is to design an entity-centric UI framework that will use generic UI components, called entigets, to improve the development of open data applications for different domains and contexts of use.