1 Introduction

Graphical representations of complex relations between items have been used extensively in recent years. Social graphs, in particular, may result in very large structures that rely on techniques such as zoom, pan and instant search so that users can browse them effectively [1, 2]. The ontology graph is one of several ways of authoring and browsing ontologies, in a range that spans from lists, trees and tables to 3D representations [3]. Because they keep the relations between entities visible and make clusters visually recognisable, graphs are chosen as an optimal means of visualisation for almost all representations, from small to very large.

Recently, graphs have been used as part of advanced web interfaces designed for authoring complex ontology applications such as policy modelling [4]. As the graphs become large, problems arise for users who need to view specific entities or clusters. Depending on their size and complexity, ontology graphs may become too hard to follow, especially during the authoring of the ontology itself. Taking a step back, the problem becomes proportionally larger as the graph grows. In application-specific approaches such as the one mentioned above, node names can be as long as sentences. Adding new nodes and relations becomes cumbersome even when the graphs are medium-sized, as in Fig. 1.

Fig. 1. Policy model ontology graph

This work implements and evaluates a speech-enabled navigation and editing approach to enhance the user experience of authors of complex ontology graphs. The following sections present the design rationale and requirements, the set of speech commands that were implemented, and the evaluation of the speech-based interface as part of a new two-modal solution, compared with the initial traditional web interface.

2 Design Considerations

For our design, we used an existing web interface designed for authoring ontology graphs [4]. The aim of the web authoring interface was to enable non-technically proficient authors from diverse work environments (parliamentary assistants, policy makers, the crowdsourcing private sector, students) to create domains and policy models with the data that drive the collection of documents from news pages and social media (Facebook, Twitter), the sentiment analysis of the collected data sets, and the argument extraction. That information is then fed back to the authoring environment for the fine-tuning and later extension of the models [5].

Figure 1 depicts a typical policy model authored and viewed on the aforementioned web interface.

A policy domain and a policy model are authored through the same generic concept. The author specifies the ontology domains by adding and editing instances of entities, norms and arguments. These can be connected to describe the relations between them, essentially forming a graph. The simplest form, for a very small domain or model, is a tree. The aim of the web interface was to provide a seamless user experience to the end users while still enabling them to create the envisioned ontology models. The high-level requirements were collected from groups of users from crowdsourcing service provision organizations and political bodies. The contextual framework for the interface specifications has been identified and described by a list of policy model domain-specific items. The items include entities, sentiment and opinions, social and demographic information, and sentence-level arguments from a range of traditional web and social media sources, such as blogs, wikis, and social networks, namely Twitter and Facebook.

The described web interface and authoring approach work well, using the freedom that graphs offer in visualizing relations to represent ontological structures such as policy models and domains. Specific graph visualization techniques were deployed to aid the users, such as zooming in/out, fast centering, panning, and highlighting neighbouring nodes on node selection (Fig. 2). Additional non-graph-related issues, such as the long node names, were addressed by displaying only the first 16 characters of each node name.

Fig. 2. Highlighting a node and directly connected nodes
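The two display aids described above, label truncation and neighbour highlighting, can be illustrated with a minimal Python sketch. It is not the interface's actual implementation: the function names and the example data are hypothetical, and the sketch assumes that relations are treated as undirected when deciding which nodes are "directly connected".

```python
from collections import defaultdict

MAX_LABEL_CHARS = 16  # from the text: only the first 16 characters are shown


def display_label(name, limit=MAX_LABEL_CHARS):
    """Return the first `limit` characters of a node name."""
    return name[:limit]


def build_adjacency(edges):
    """Map each node to the set of nodes it shares a relation with.

    Assumption: relations are treated as undirected for highlighting.
    """
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def highlighted_nodes(selected, adjacency):
    """Return the selected node together with its direct neighbours."""
    return {selected} | adjacency.get(selected, set())


# Hypothetical example data, not taken from the paper.
edges = [
    ("energy policy", "renewable subsidies"),
    ("energy policy", "carbon tax"),
    ("carbon tax", "industry objections to the proposed carbon tax"),
]
adjacency = build_adjacency(edges)

print(display_label("industry objections to the proposed carbon tax"))
print(sorted(highlighted_nodes("carbon tax", adjacency)))
```

Selecting "carbon tax" in this example highlights the node itself and its two neighbours, while the sentence-length node is shown with a shortened label.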

However, as the authors progressed and created very large graphs, they reported increasing difficulty in finding the node they wanted to edit and clicking on it. Focus group discussions during the next round of design revealed usability issues that directly relate to accessibility. This was also evident from previous studies that explored usability and accessibility as part of the design-for-all methodology for designing voice user interfaces [6].

3 Speech Interface for Graph Editing and Browsing

To address the usability issues above, the second round of the iterative design included the decision to use state-of-the-art web speech synthesis and recognition [7, 8] to improve the user experience, with the ultimate aim of providing a fully speech-driven interface by the end of the lifecycle.

A set of voice commands was implemented over the functionalities of the web interface in order to allow multimodal input to the system. All possible actions that the policy model/domain ontology authors may perform were matched by the voice interface. Two types of input were designed: commands that initiate content-free interaction with the interface, and commands that include actual content of the model/domain, such as the title text of nodes. Another way to look at the interaction is to categorize the input as (i) browsing/navigation functionalities and (ii) editing/authoring functionalities. Speech recognition accuracy was more challenging for the latter types of speech commands. Table 1 lists all the speech commands together with their descriptions. Where needed, the descriptions refer to the equivalent non-voice interaction so that the reader can compare the two directly.
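The distinction between content-free commands and commands that carry model content can be sketched as a small parser over a recognised transcript. The sketch below is only an illustration under stated assumptions: the command phrases are hypothetical placeholders rather than the commands of Table 1, and the transcript is assumed to arrive as plain text from whichever speech recogniser is in use.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical phrases for illustration only; not the commands of Table 1.
CONTENT_FREE = {"center graph", "clear selection"}
CONTENT_BEARING = ("find node", "select node", "add node", "delete node")


@dataclass
class Command:
    action: str
    content: Optional[str] = None  # e.g. a node title; None if content-free


def parse_transcript(transcript):
    """Turn a recognised utterance into a Command, or None if unrecognised."""
    # Normalise case and collapse repeated whitespace.
    text = " ".join(transcript.lower().split())
    if text in CONTENT_FREE:
        return Command(action=text)
    for prefix in CONTENT_BEARING:
        # A content-bearing command needs some content after the prefix.
        if text.startswith(prefix + " "):
            return Command(action=prefix, content=text[len(prefix) + 1:])
    return None


print(parse_transcript("Find node carbon tax"))
print(parse_transcript("center graph"))
```

Content-free commands only need to match a small closed vocabulary, whereas content-bearing commands include free text such as a node title, which gives the recogniser an open vocabulary to handle.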

Table 1. List of voice commands for graph editing

4 Experiments

Three distinct experiments were set up, based on the initial information derived from the user requirements and the prior evaluation round of the web interface. The purpose was to ensure that the design-for-all approach could integrate with the speech enablement and refine the navigation and editing processes in order to maximize user engagement and experience. Ten participants (age group 25–42) were asked to evaluate the proposed approach. The aim of the first experiment was to evaluate the impact of speech-based interaction on graph navigation. The users were asked to verbally search for specific domain entities and semantic tags in order to filter and sort specific entities and relations of interest. They were also asked to use the traditional non-speech-enabled interface to achieve similar tasks. The aim of the second experiment was to investigate how adding new information and editing existing data could align with the users' mental impression of how a domain should be created. That task, being user- and domain-dependent, was carried out by asking the participants to add new information and to evaluate later whether their selections and choices were optimal, considering both the speech and the non-speech actions that they had at their disposal.

The final experiment was the functional and non-functional usability evaluation, involving both domain experts and casual mobile users. One of the main requirements was to measure the impact of speech-driven authoring in terms of time, clarity and acceptance. Figure 3 depicts the test policy domain that the participants were asked to navigate and edit.

Fig. 3. The test policy domain graph for evaluation

5 Evaluation

The participants compared the traditional non-speech interface with the speech-enabled one (Fig. 4). Almost all opted to use speech for the search-related actions, expecting to locate the node of interest much faster than by navigating the graph. The overall satisfaction feedback was overwhelmingly favorable for the speech modality, especially for the find and select node actions. The reason was that the voice interface enabled the users to search quickly and center the graph on their selection. This was particularly apparent for nodes with long title text. Editing functions such as adding and deleting a node or relation were marginally easier when both modalities were available, since the users were able to use speech whenever they deemed it the easier path to their goal.

Fig. 4. Evaluation results for non-speech versus speech-enabled interaction
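To illustrate how a spoken search might locate a node with a long title, the following Python sketch matches a recognised phrase against node titles. It is an assumption-laden illustration rather than the evaluated implementation: the substring test reflects searching for nodes that contain the spoken text, while the fuzzy fallback based on `difflib.SequenceMatcher`, the function name and the 0.6 threshold are hypothetical additions intended to tolerate small recognition errors.

```python
from difflib import SequenceMatcher


def best_matching_node(spoken, titles, threshold=0.6):
    """Return the title most similar to the spoken text.

    Returns None when no title reaches the (hypothetical) threshold.
    """
    query = spoken.lower().strip()
    best_title, best_score = None, 0.0
    for title in titles:
        candidate = title.lower()
        if query and query in candidate:
            # The title contains the spoken text: treat as a full match.
            score = 1.0
        else:
            # Assumed fallback: character-level similarity in [0, 1].
            score = SequenceMatcher(None, query, candidate).ratio()
        if score > best_score:
            best_title, best_score = title, score
    return best_title if best_score >= threshold else None


# Hypothetical node titles, not taken from the paper.
titles = [
    "carbon tax",
    "renewable subsidies",
    "industry objections to the proposed carbon tax",
]

print(best_matching_node("industry objections", titles))
print(best_matching_node("renewable subsidy", titles))
```

With such a lookup the user only needs to speak a distinctive part of a long title, after which the interface can center the graph on the returned node.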

Lastly, the navigation of the graph itself, as a casual browsing task, revealed the shortcomings caused by the absence of speech commands for this generic functionality. No specific commands existed for zooming in/out or panning the graph. The users therefore reported that they would have preferred an innovative way to browse, hinting at further research into this method.

6 Discussion

Based on the results of the experiments with the speech recognition and synthesis tasks, the design of the user interface has been extended with the speech modality, which has led to less complexity, as reported by the users. The visual modality was also polished to give a clearer and more inviting overview of the ontology domain graphs, and special features were added, such as highlighting the nodes that contain text identified via spoken search. Further work is currently underway on the backend extension of the services needed to fully implement the speech web API for the generic graph view functionalities. Additionally, other functionalities commonly used in graphs, such as dynamic insets [9], may also be implemented in the speech API, essentially allowing the user to preview the insets over the larger graph while editing. The results of this work are expected to enhance the design of the user interface to support and sustain a multimodal approach to ontology graph authoring.