
1 Introduction

One of the most significant cognitive challenges for people who are blind is the development of orientation and mobility skills, which allow the person to become autonomous. Frequently, the absence of vision adds unnecessary complexity to simple tasks that require spatial representation [1]. The lack of information about the environment leads people who are blind to choose a route based on safety concerns rather than on its efficiency: a safer route reduces the risk of tripping or bumping into something, although the distance may be longer [2]. In an unfamiliar environment such as an airport or a hotel, this experience is commonly far more complex and dynamic [1]. In such environments an aid to autonomy, such as a guide dog or a cane, is essential.

However, the limitations of conventional aids when facing obstacles such as escalators and revolving doors make it difficult to guide the user along the best possible route to a given destination [4]. To navigate efficiently, it is necessary to have a mental representation of the environment, so that one can develop orientation skills and mobility techniques. To assemble this mental image, a person needs to gather information about the surroundings. It is also necessary to be able to detect items and places, and to keep track of the relationships between the objects within an environment [5]. The visual channels are responsible for collecting most of the information required for such a mental representation [6, 7]. To gain spatial information and generate a cognitive map of the surroundings, a blind person needs to use non-visual stimuli to perceive the environment. Receiving spatial information through complementary senses contributes to the creation of an adequate mental representation of the environment. There is evidence that audio-based and haptic interfaces can foster learning and cognition in blind children [8, 9].

Since children and young people widely use games as part of their daily routine [10], multimodal serious games can be attractive tools to stimulate cognitive improvement. There have been several experiences with the design and use of video games for stimulating the development of various abilities in people with visual impairment [11, 12]. Video games and virtual environments with this purpose should meticulously combine different sources of perceptual input, such as audio and haptics [13]. Once cognitive skills have been developed or improved, a multimodal game can still help to transfer them to a real environment and, ultimately, to everyday life.

However, it is fundamental to ensure that these games can actually stimulate cognitive development. It is crucial to promote a better understanding and an adequate, relevant and meaningful use of the multimodal elements in a serious game. The purpose of this work is to investigate the role of multimodal elements in the development and evaluation of games and virtual reality environments aimed at enhancing the cognitive skills of blind people. We analyze the state of the art of the approaches and technologies currently in use for developing mental maps, cognitive spatial structures and navigation skills in blind learners through video games. We also identify the solutions adopted for multimodal and usability evaluation in this context. The analysis followed a protocol that defined the study procedures. The results are discussed in this paper and summarized in a table containing each game's name, its capability of enhancing cognitive skills, the type of evaluation performed, and its interface and interaction characterization (available at http://1drv.ms/1zW6vlY).

2 Methodology

The study was carried out following the Systematic Review approach [14, 15], from July to November 2014. A systematic literature review is a secondary study method that goes through existing primary studies, reviews them in depth and describes their methodology and results [14]. There are three main phases in a systematic review: planning, conducting and reporting the review [15]. In this research, we used the StArt tool [16] to support the application of this technique across the three phases of the review.

The first step of this research was the definition of the protocol describing how the study would be conducted. The protocol stated the research objectives, clearly defined the research questions and planned how the selected sources and studies would be used to answer them. Two researchers and two experts performed incremental reviews of the protocol. We revisited the protocol to update it based on new information collected as the study progressed. The research questions are: Q1: What strategies have been used in the design of multimodal games for blind learners in order to enhance cognition? Q2: What strategies have been used to evaluate the usability and quality of multimodal games for blind learners? Q3: What technologies have been used in the development of multimodal games for blind learners, in order to enhance cognition?

We selected eight digital libraries as sources: ACM Digital Library, Engineering Village, IEEE Xplore, Scopus, Science Direct, Springer Link, PubMed, and Web of Science. We refined the search string by reviewing the data needed to answer each of the research questions, as well as the relevance of the results returned by each test of the string. Figure 1 presents the final search string submitted to the eight sources, addressing the research questions Q1, Q2 and Q3. A set of selection criteria filtered the suitable studies according to the goals of the research. It consists of four inclusion criteria and eleven exclusion criteria; the large number of exclusion criteria is due to the variety of knowledge fields that this study covers.

Fig. 1. Search string applied to the eight selected sources

Submitting the search string to the eight selected sources yielded an initial set of 446 papers. Then, using the snowball sampling technique [17], we manually added 52 papers to the initial sample, drawn from the references of the first round of articles and from the DBLP pages of the principal authors. Of the total of 498 studies obtained, 48 papers came from ACM (9.6 %), 136 from IEEE (27.3 %), 28 from Scopus (5.6 %), 181 from ScienceDirect (36.3 %), 50 from Springer (10 %), 4 from Web of Science (0.8 %), 1 from PubMed (0.2 %) and 52 were added manually (10.5 %). It is important to note that, although ScienceDirect returned the most results, few of them were related to the area of interest: this source returned a vast number of articles related to cognition and/or blind people, but from a medical point of view.

To choose the studies most suitable for answering the research questions, we filtered the papers. The first filter (F1) removed duplicated and short papers (fewer than four pages), secondary studies and articles published before 1995. F1 excluded 172 papers (34.5 %), so that 326 studies went on to the second filter. The second filter (F2) applied the specific-purpose exclusion criteria and the inclusion criteria after reading each paper's title and abstract. F2 excluded 216 papers (43.4 %) and included 68 papers (13.7 %), which went on to the third filter (F3), intended to refine the initially accepted set of studies. F3 consisted of examining the full text of the 68 articles and reviewing the assigned inclusion and exclusion criteria. F3 excluded 34 articles by criteria plus four duplicated papers (7.6 %) and included 30 papers (6 %). Most eliminated papers related to cognition, but not to multimodal games for blind people. Of the 30 papers finally selected for data extraction, one was from ACM, two from IEEE, four from Scopus, two from ScienceDirect and two from Web of Science; the remaining 19 were added manually through snowball sampling. The relevant papers span 1999 to 2014, with 80 % published in 2008 or later.
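All filter percentages above are computed relative to the initial pool of 498 studies, not to the set entering each filter. A quick sketch, using only the counts reported in the text, reproduces them:

```python
# Selection funnel of the systematic review; counts as reported in the text.
TOTAL = 446 + 52              # initial search results + snowballing additions

f1_excluded = 172                          # duplicates, short, secondary, pre-1995
f2_excluded, f2_included = 216, 68         # title/abstract screening
f3_excluded, f3_included = 34 + 4, 30      # full-text review (+ 4 late duplicates)

def pct(n: int) -> float:
    """Percentage relative to the initial pool of studies, one decimal place."""
    return round(100.0 * n / TOTAL, 1)

print(pct(f1_excluded))       # 34.5
print(pct(f2_excluded))       # 43.4
print(pct(f2_included))       # 13.7
print(pct(f3_excluded))       # 7.6
print(pct(f3_included))       # 6.0
```

Note that the F2 and F3 percentages would be considerably higher if computed against the 326 and 68 papers actually entering those filters.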

We considered studies describing multimodal video games or navigational virtual environments. We also examined studies that described no application but introduced a model for the design or evaluation of a multimodal game or environment for blind people. The selected papers were: [6, 13, 18–45]. Among these, 25 papers described 21 distinct applications: 17 multimodal games and 4 multimodal navigational virtual environments. Some papers discussed the same application from another point of view, or applied a complementary research or evaluation method. There were four proposals of models for designing multimodal games for blind learners.

3 Trends and Issues on Multimodal Games for Blind Learners

3.1 Design and Development of Multimodal Games

The selected papers showed that there is no widely adopted process for the design of this particular kind of application. Most of the papers use some traditional software engineering process. However, given the specificities of this type of implementation and the limitations of the target audience, several factors must be taken into consideration, such as the context of use and the desired skills. Typical development cycles do not cover these aspects, so each author adapts the development process according to the goals of the game in question. Nevertheless, four papers [27, 31, 32, 34] introduce models for the design and development of games for enhancing the cognition of blind people. Each model relates to a specific context of use, audience and/or desired cognitive skill.

The work of [34] introduces a model for the development of videogame-based applications designed to assist the navigation of blind people. The model of [32] serves as a framework for designing games that help learners who are blind to construct mental maps, aimed at the development of geometric-mathematical abilities and orientation and mobility (O&M) skills. This second process modifies the first one, improving and extending it in terms of the cognitive abilities implied by O&M and geometric thinking. The study [31] introduces a novel technique that uses concept maps for the design of serious video games, implemented in the Ejemovil Editor. The goal is to enable teachers to define the storyline of the video game, incorporating the concepts they want to teach in a structured way; the proposed process guides the teacher in transforming a concept map into a video game model. Finally, [27] presents a complete model for developing virtual learning environments for learners with visual disabilities. The model is cyclic and includes various steps and recommendations, discussing critical issues of conceptualization and implementation; its result is the input used to generate a suitable user-adapted aural output.

3.2 Interface and Interaction Characterization

Although the proposed methodologies are not yet widely used, there are several common elements in the design of the 21 applications, revealing trends in interface characterization and interaction style. All of the applications use at least one aural interface element, and most combine two or more aural elements. The prevailing combination is iconic plus spatialized sound in 3D environments. Iconic sounds are the most common type of auditory feedback, occurring in 16 applications (76 %), followed by spatialized sounds, present in 11 applications (52 %). Recorded spoken audio is more prevalent than speech synthesis, which may elicit more empathy toward the interface: the first occurs in 11 applications (52 %) while the second appears in seven (33 %), and five applications combine the two approaches. Stereo sound is another option, present in five applications (23 %). Only one application uses music/tones to represent different objects.

Twenty applications (95 %) present a graphical interface in addition to the aural elements. The interfaces combine 2D or 3D graphics with images or text. Contrary to what one might imagine, the results do not point to sound-only interfaces. Three of the applications (14 %) allow users to navigate by sound alone (a no-graphics mode) and use a graphical interface only for configuration. This happens because these interfaces aim to include not only blind users but also partially sighted and sighted users, especially teachers. However, only one interface ensures that no relevant information is conveyed by color alone (for color-blind users). Although some papers may simply have omitted this information, it is an essential issue to address in order to ensure that these users can operate the interface correctly. Other basic features that demand more consideration are the adaptation of element sizes and the availability of a high-contrast mode: only 9 % of the applications allow resizing of interface elements, and 23 % offer a high-contrast mode. Both of these functionalities should be standard in such applications, since they are crucial for people with partial blindness.

The most common interaction device is the keyboard, used by 15 applications (71 %), especially those whose feedback is mainly auditory. The second most used is the joystick, present in seven applications (33 %); joystick interaction always has an alternative interaction mode, usually the keyboard, and occurs in 3D environments that commonly use some haptic feedback. Two applications (9 %) allow the use of the mouse together with the keyboard, and one claims the mouse as its primary interaction mode; however, that application's main audience is not totally blind users. Although natural language might be expected to be an easier and more instinctive way to interact, only two games allow the user to give natural-language commands. The reasons are not clear in the papers, but it may be because recognizing and processing natural language accurately is not a trivial task. Besides, blind users who have any experience with technology are used to the keyboard from other applications, which may facilitate the interaction.

3.3 Evaluation of Multimodal Games

None of the resulting papers addressed a model for the usability evaluation of multimodal games, and there is no apparent standardization of the elements to evaluate, nor of the methodology, instruments and measures. Some of the evaluations described in the studies are very formal, while a number of studies seem to ignore evaluation altogether. Among the 18 applications aiming to enhance the cognition of blind people, only 9 (50 %) performed a cognitive impact evaluation. Of the 25 papers that presented applications, 16 (64 %) performed at least one type of usability evaluation. Usability evaluation is thus the most frequent type of quality evaluation, performed more often than cognitive impact evaluation; however, in this context both evaluations are essential.

Sixteen papers presented at least one kind of quality measure, but none related to any formal standard. Of these, 10 were measures specifically related to usability, one was about efficiency (the optimal number of steps divided by the number of steps taken in an interface) and six were specific to the context of the paper. Some of the specific measures are user performance (percentage of achievement, based on the total number of steps to complete a task), learner performance and level of progress.
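The papers do not report uniform formulas for these measures, but the efficiency measure quoted above is explicit. A minimal sketch follows; the function names and the percentage-of-achievement formula are illustrative readings, not taken from any surveyed paper:

```python
def efficiency(optimal_steps: int, steps_taken: int) -> float:
    """Interface efficiency as described above: the optimal number of steps
    divided by the number of steps the user actually took. 1.0 means a
    perfect traversal; the value falls toward 0 as the user wanders."""
    if steps_taken <= 0:
        raise ValueError("steps_taken must be positive")
    return optimal_steps / steps_taken

def achievement(steps_completed: int, total_steps: int) -> float:
    """One plausible reading of 'percentage of achievement, based on the
    total number of steps to complete a task' (hypothetical formula)."""
    return 100.0 * steps_completed / total_steps

# A user who takes 12 steps where 9 would suffice scores 0.75;
# completing 6 of 8 task steps gives 75 % achievement.
print(efficiency(9, 12), achievement(6, 8))
```

Such simple ratios are easy to log automatically during play sessions, which may explain their popularity over standardized usability metrics.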

The instruments used for usability evaluation were mainly specialized questionnaires, especially in the more formal evaluations. The most frequently used are the Software Usability for Blind Children Questionnaire (SUBC) [46] and the End-user and Facilitator Questionnaire for Software Usability (EUQ) [47], each applied in four distinct evaluations. Two evaluations utilized the Software Usability Elements Questionnaire (SUE), which quantifies the degree to which the sounds are recognizable. These three instruments seem to fit very appropriately the context of multimodal games for blind learners. In addition, two evaluations applied the Open Question Usability Questionnaire (OQU). Other, less common specialized instruments, used in only one evaluation each, are the Heuristic Evaluation of the Videogame (HEV), the Heuristic Evaluation Questionnaire (HEQ) and the Initial Usability Evaluation (IUE). It is interesting to point out that none of the papers reference these instruments, except for the SUBC and EUQ; this may be one reason why the other specialized instruments see such little use. Although some evaluations combine questionnaires, there is no identifiable pattern in how they do so.

The third most common instrument was a survey with Likert-scale items, used in three distinct evaluations. In these cases the authors created the instruments themselves and do not claim to base them on any validated instrument or particular formalization. The surveys are mainly based on the context of the application and are applied in person or via email. The other evaluations use simple observation, unspecified usability questionnaires, open questionnaires, prototype interface questionnaires, or give no details at all about the instruments used. We found no information about the efficiency, advantages or limitations of using these instruments to evaluate multimodal interfaces in the context of the cognitive enhancement of blind learners.

3.4 Technologies for the Development of Multimodal Games

Of the 25 papers that presented applications, four (16 %) did not describe any of the technologies used. Among the articles that did describe their technologies, the kind of information available varied: some described in detail the programming environment, libraries and modules used, while others described only the hardware used for interaction with the game. Figure 2 summarizes the technologies utilized in the development of the applications to enable the interaction and interfaces described in Sect. 3.2. We grouped the technologies into Development Environments, Software Development Kits (SDKs) and Toolkits, Programming Languages and Parsers; in addition, there are other specific software packages, joysticks and devices, and technologies related to text-to-speech. These were the technologies identified in the papers; in some cases the applications claimed to provide a functionality but did not describe the technology used.

Fig. 2. Technologies used in the multimodal games or environments

Concerning development environments, Visual Studio .NET was the most used (7 applications); it is an Integrated Development Environment (IDE) for the .NET software framework. Another environment used was the Microsoft XNA Framework + Game Studio (two applications). Microsoft XNA is a set of tools with a managed runtime environment that aims to facilitate video game development and management; it is also based on the .NET Framework. That both environments are based on the .NET framework explains the extensive use of the languages C# and C++, the use of XML for storage and the need for parsers such as DOM and the Stanford Parser. These results suggest that the .NET framework and its related technologies offer better support for developing multimodal video games and environments. One application used Macromedia Director (currently Adobe Director), a multimedia application authoring platform initially designed for creating animation sequences.

The papers showed a considerable variety of Software Development Kits (SDKs) and Toolkits related to spatialized audio. Two applications used the Microsoft DirectX SDK, a set of application programming interfaces (APIs) for handling multimedia tasks, principally game programming and video, on Microsoft platforms. These applications also used Microsoft's DirectSound, which provides capabilities such as adding effects to sound (e.g., reverb, echo, or flange) and positioning sounds in 3D space. Two applications used the Aureal A3D SDK, similar to an improved DirectSound, featuring hardware-accelerated 3D positional audio that brings three-dimensional sound quality to an ordinary pair of speakers. OpenAL appears in one application and allows a developer to produce high-quality audio output, specifically multichannel output of 3D arrangements of sound sources around a listener. In addition, one application utilized Xj3D, a Java toolkit for developing X3D applications; it displays the 3D modeling standard formats VRML97 and X3D.
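None of the surveyed papers details its audio pipeline, but the core idea behind spatialized sound (attenuating and panning a source according to its position relative to the listener) can be sketched without any audio library. The 2D rolloff and pan model below and all names in it are illustrative simplifications, not the actual DirectSound, A3D or OpenAL APIs:

```python
import math

def spatialize(source, listener, facing):
    """Return (left_gain, right_gain) for a sound at `source` (x, y),
    heard by a listener at `listener` facing the unit vector `facing`.

    Combines inverse-distance attenuation with a constant-power stereo
    pan law, a deliberately simplified stand-in for the 3D positional
    audio that the SDKs above provide in hardware or software.
    """
    dx, dy = source[0] - listener[0], source[1] - listener[1]
    dist = math.hypot(dx, dy)
    if dist == 0:                        # source at the listener's head
        return (math.sqrt(0.5), math.sqrt(0.5))
    gain = 1.0 / max(dist, 1.0)          # inverse-distance rolloff, clamped
    fx, fy = facing
    cross = fx * dy - fy * dx            # > 0 when the source is to the left
    pan = -cross / dist                  # -1 = hard left, +1 = hard right
    theta = (pan + 1.0) * math.pi / 4    # constant-power pan law
    return (gain * math.cos(theta), gain * math.sin(theta))

# Listener at the origin facing "north"; a sound 2 m to the right comes
# out almost entirely on the right channel, attenuated by distance:
left, right = spatialize((2.0, 0.0), (0.0, 0.0), (0.0, 1.0))
```

The real SDKs extend this geometry to 3D and add head-related filtering, but the perceptual cue exploited by the surveyed games is the same: direction and distance encoded in inter-channel gain.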

The most used programming languages are C# (5 applications) and C++ (2 applications), owing to the significant use of the .NET framework. The one application that used Adobe Director also used Lingo, an object-oriented programming language embedded in that environment. In addition, one application used the Virtual Reality Modeling Language (VRML), a file format for describing interactive 3D objects and worlds. Although this language is a standard (ISO/IEC 14772-1:1997), it is more common to develop these applications with the support of commercial frameworks.

The functionality implemented with the widest range of technologies is speech synthesis. Two applications used FreeTTS, a speech synthesis system written in Java. The Java Speech API, the Microsoft Agent System Module's text-to-speech function and the Windows Speech API are present in one application each; whether to use a Java-based or a Microsoft API depends on the development environment adopted. The applications use various devices to provide haptic feedback. The Novint Falcon, a USB haptic device, is the most popular (4 applications). There also seems to be an attempt to reduce the cost of specialized haptic devices: joysticks and low-cost devices are present in four video games, namely the OWL joystick, the Wiimote, the SideWinder joystick and the Digital Clock Carpet. The last device is based on an ordinary cane and a simple carpet; it is specific to one application but could be reused. Among the 21 applications, only three (14.3 %) are designed for the mobile paradigm. This seems to be a rather unexplored area, since only these few applications take advantage of the benefits that mobile offers, such as GPS, sonar and a sound compass.

4 Discussion and Conclusion

The purpose of multimodal software is to address the problems of human-computer interaction by adapting the computer to the user's needs [48]. During development one must carefully consider several factors, such as the context of use, the skills to be developed and the severity of the visual impairment. Although the papers analyzed show that audio is a mandatory interface element, other important issues remain neglected, such as the adaptation of element sizes and the use of color-blindness-safe colors. Developers could reduce this type of problem by using proper models for the design of this kind of application, instead of a traditional software engineering process whose development cycle does not cover these aspects.

We verified that half of these video games do not perform any cognitive impact evaluation. In these cases, one cannot be sure that a particular application can actually develop or enhance any cognitive skill in children and youth with visual disabilities. We identified a number of validated instruments for evaluating usability in the context of blind learners and video games; developers and researchers should apply these tools more often to improve the quality of usability evaluation. It is clear that usability is an important aspect of the quality of a game. There are, though, other aspects to consider, such as the satisfaction of blind users, the learnability of the interface, application reliability, and so on. There is an opportunity for academia to develop work in this area, toward creating instruments and evaluating the effectiveness of existing ones, in the context of multimodal video games for enhancing the cognition of blind learners.

There is a huge variety of technological options for implementing these games. The mobile paradigm, in particular, should be explored further in the construction of this type of game: it is possible to take advantage of the resources available in the mobile context to provide contextual information that may help legally blind users in orientation and mobility.

It is crucial to promote a better understanding and adequate, relevant and meaningful use of the multimodal elements in a serious game. In order to help achieving this purpose, this work provides a holistic comprehension of the approaches and technologies currently in use for the development of cognitive skills in learners who are blind, by using multimodal videogames.