
1 Introduction

Human interaction with the world is intrinsically multimodal [1, 2]. There are two views on multimodal interaction: the first focuses on the human side, perception and control [3]; in this context, the word modality refers to the human input and output channels. The second view focuses on the use of two or more input and output modalities on the computer, to build systems that make synergistic, parallel use of these modalities. The input modalities of many computers and devices can be interpreted as corresponding to the human senses: cameras (sight), tactile sensors (touch) [4], microphones (hearing), olfactory sensors (smell), and even taste [5].

Multimodal interaction systems aim to support the identification of naturally occurring forms of language and human behavior through the use of recognition-based technologies [6, 7]. Considering that humans process information faster and better when it is presented in several modalities [8], the use of such systems to increase accessibility for people with disabilities has become promising. That is, the interfaces that promote such interaction must be developed in a flexible way in order to support human-computer communication, thus allowing users with different preferences and skill levels to choose how they will interact [9].

The current technological landscape allows digital access to be extended to previously excluded groups or categories of users. However, even with the resources and possibilities offered by the industry to people with disabilities, they still face barriers in the use of supposedly accessible software. Moreover, as the information technology industry focuses on making its offerings accessible to people with disabilities, it is becoming noticeable that conforming to accessibility standards does not guarantee ease of use for these people. This occurs mainly when accessibility criteria are not applied during development, or when interaction is designed according to the developers’ own way of interacting, that is, when the particular interaction issues of the individuals who are the target audience for the applications are neglected. According to [11], most developers are not aware of the difficulties encountered by users with disabilities.

Contextualizing this multimodal interaction development scenario for people with visual impairment, it becomes evident how important it is to relate the specific interaction characteristics of this group to the system, so that the system meets the accessibility criterion of quality of use, which concerns the barriers that prevent visually impaired people from interacting with it.

This paper is structured in five sections. Section 2 presents some concepts necessary for the proposal. Section 3 presents the scheme for multimodal component recommendation, including the knowledge acquired by designing and evaluating two applications. Section 4 briefly discusses the first steps towards the scheme’s validation, followed by the conclusions (Sect. 5).

2 Background

Multimodal interaction is a trending topic in interaction design, especially for people with disabilities, because of the possibilities for adapting and customizing content for different devices and resources depending on the abilities each user has. The Web is a natural source of information, but many websites are not prepared to offer different components adapted to people’s different abilities; one way to adapt websites, or any application, for them is to design multimodal interfaces. For that reason, the following subsections present some usual concepts about web accessibility and multimodal interfaces for people with disabilities.

2.1 Web Accessibility

The term web accessibility is used to emphasise that people with disabilities can access information by interacting with webpages. More specifically, people with disabilities can perceive, understand, navigate, interact with and contribute to the Web. Web accessibility also benefits other people, such as the elderly, whose capabilities are reduced by the aging process [3].

Considering the principles of web accessibility, the W3C created the WAI (Web Accessibility Initiative) in 1997, formed by working groups dedicated to the elaboration of guidelines called the Web Content Accessibility Guidelines (WCAG), first published in 1999. Within the WCAG there are several sets of guidelines for fostering accessibility, the most recent being WCAG 2.0.

On the developers’ side, the accessibility of web content on mobile devices has become a more relevant subject, mainly because of the increasing number of people using mobile devices to navigate the Web. People are transferring most tasks they used to do on computers to their mobile devices, such as banking transactions, calendars, and receiving and sending e-mails. With the growth in the use of mobile devices for multiple tasks, such devices are becoming more important for people with disabilities, who are using them to enhance their interaction and communication [13]. Regarding accessibility on mobile devices, there are specific guidelines recommended for application development: for Android, the Accessibility Developer Checklist [4], and for iOS, the Accessibility Programming Guide for iOS [5].

As aforementioned, there are guidelines for the development of accessible web content. Even though such guidelines are widely used, they are not enough to support each activity involved in the development process of accessible web content, especially when the development targets a specific group, such as people with visual impairments, because such development requires knowledge of the specific contingencies of that group, and the existing guidelines are generic.

The challenge posed is to foster web accessibility for each specific group of people with disabilities. For the specific group of people with visual impairments, web content must be accessible to screen readers, as will be discussed later with the two example tools developed in the context of the proposed scheme.

2.2 Multimodal Interfaces for People with Disabilities

People with disabilities need different input, and sometimes output, modalities to interact appropriately with websites and computer systems. Multimodal interaction is a concept in Human-Computer Interaction, generally meaning interaction with virtual and physical environments through more natural modes of communication. The interfaces developed within this concept have gradually acquired the capacity to understand, interpret and generate specific data in response to content analysis, differing from classical applications and multimedia systems, which do not take into account the semantics of the data (sound, image, video) they manipulate [6,7,8]. Such interfaces have become promising in software development because they allow the integration of different modalities for user interaction.

Regarding development purposes, multimodal interfaces are directly associated with different research directions [8]:

  • developing more natural, intuitive and efficient interactive interfaces that are, at the same time, less awkward and have a shorter learning curve [9,10,11].

  • increasing the amount of information transmitted in useful time during an interaction, resulting in a decrease of execution time [7, 12, 13].

  • increasing system robustness, aiming at better intelligibility when recognizing information from signals received through different modalities, in order to resolve ambiguities and to prevent or fix communication errors [14,15,16].

  • stimulating the user’s engagement in the activity at hand, promoting their satisfaction [17, 18].

  • promoting the understanding of the user and the anticipation of their intentions [19, 20].

  • allowing greater flexibility of access in different contexts of use, dissociated from the user, which may impose usability constraints on certain modalities, through the possibility of selecting the modal channels best adapted to the user’s needs, proficiency level and/or the nature of the task to be performed [21, 22].

  • increasing accessibility to computers by people with specific disabilities (whether sensory or motor), by providing them with alternative modalities and styles of multimodality [23, 24].

  • providing new, previously unavailable ways of computing [25, 26].

  • providing alternative input channels to prevent cognitive or physical overload during long interactions [21, 27].

  • decreasing the cognitive overload associated with a task and, consequently, the attention level necessary to perform it [28, 29].

  • fostering the adaptation of information systems to each user’s predominant interaction patterns [30, 31].

Although the advantages of multimodal interfaces are evident, the development of multimodal projects is still a challenge [27], due to the lack of tools that appropriately guide the designer in the design, implementation and evaluation of multimodal interfaces. In addition, there is a need to process inputs from heterogeneous user groups and to integrate multiple input/output modes that can operate in parallel or simultaneously, together with handling recognition and synchronization errors, for the generation of efficient multimodal interfaces (as cited in [32]). The approach of this article extends these conceptions, adapting them to the development of each specific application according to its purpose.

3 Proposal

Considering this context, in this work we present a scheme for the recommendation of multimodal components suitable for the development of accessible applications. In this work, a multimodal component is an independent piece of software with the same functionality configured for all available devices and media [33].

The proposed scheme, represented in Fig. 1, has a multilayered knowledge base (the left block) comprising: (i) a “theoretical mapping” – a set of heuristics extracted from the related literature (recommendations from regulatory associations, scientific studies, multimodal fusion techniques, etc.); (ii) a “recommendation history” – structured records of previous recommendations from experimental scenarios; (iii) a multimodal “applications repository” and the corresponding “recommendation criteria” – organized according to the aspects considered (e.g. physical and cognitive ones).

Fig. 1. A scheme for multimodal component recommendation

The resulting body of knowledge, along with a set of requirements elicited from people with disabilities, parents, teachers, therapists and other stakeholders with expertise on the idiosyncrasies of a given situation, is used by an Integration Agent to generate component recommendations (resources, strategies, user profiles, etc.) for tackling the accessibility issues of that situation. In turn, each situation and its specificities becomes a new case that is stored and retrieved in a way quite similar to case-based reasoning.

The Integration Agent, as defined in [35], is responsible for integrating information from the different input devices and/or available media, resulting in a set of recommendations grouped by components. The proposed scheme is a detailed description of the components defined in [33, 34], incorporating artificial intelligence techniques for managing the knowledge bases and the application repository. The proposed scheme is still under testing. The multilayered knowledge block already contains two applications for people with visual impairments, using different modalities on mobile phones and on desktops or notebooks connected to the Internet. From those applications, data about interaction and conformance tests have already been collected.

3.1 Application Repository

The developed applications are the key point for this scheme’s definition. To develop them, we followed recommendation criteria specific to their purpose: the first application, RotaColab, followed Android’s accessibility recommendation criteria, and the second, WABlind, followed the recommendations of WCAG 2.0. RotaColab is a location-based collaborative tool aimed at supporting blind people in navigating unconventional paths using mobile devices. WABlind is a tool that allows conventional webpages to be locally rebuilt and reorganized in such a way that they can be read by screen readers. Both tools have been tested in real-world conditions or in laboratory settings to assess compliance with the usual standards (e.g. WCAG).

RotaColab was developed for the Android platform. This application demonstrates the use of accessibility features applied to navigational orientation, in order to analyze their contribution to communicability with visually impaired people. To develop RotaColab [33], the recommendation criteria adopted were those of Android’s guide for the development of accessible applications, the Accessibility Developer Checklist [4]. These recommendations were related to, and exemplified with, specific interaction characteristics of the application, considering the specificities and limitations of the target audience, the visually impaired. The recommendation criteria were: text field hints, enabling focus-based navigation, no audio-only feedback, temporary or self-hiding controls and notifications, controls that change function, supplemental accessibility audio feedback, and decorative images and graphics.

Another tool developed to integrate the repository is WABlind (Web Accessible for Blind People), a tool for restructuring web content designed for people with visual impairments. Its use is aided by screen readers and voice synthesizers, commonly used by this group of people on computers and mobile devices. The tool relates WCAG 2.0 guidelines to previous results on how best to present web content to blind people, according to research conducted by the W3C [34].

In its initial version, the tool treats 4 of the 15 most problematic interaction items found when browsing web pages, as reported in a survey conducted in 2013 by the W3C on the use of assistive technologies, screen magnifiers and screen readers [34].

The 4 selected items, which define the recommendation criteria, are related to syntactic properties of web pages; that is, at this level, any item related to the semantics of the page was excluded. The surveyed items, associated with the WCAG 2.0 success criteria addressed at this level, are listed below:

  • The presence of inaccessible Flash content - Success Criterion 1.1.1 Non-text Content (Level A).

  • CAPTCHA - the use of a text image to verify that the user is human - Success Criterion 3.3.2 Labels or Instructions (Level A).

  • Images with missing or inappropriate descriptions (alt text) - Success Criterion 1.1.1 Non-text Content (Level A).

  • Unidentified foreign-language text - Success Criterion 3.1.1 Language of Page (Level A).
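Three of the four items above are purely syntactic, so they can be detected mechanically from a page's markup. The sketch below is a minimal, hypothetical checker (our own illustration, not WABlind's actual implementation) built on Python's standard `html.parser`; it flags missing alt text, Flash content and an unidentified page language:

```python
from html.parser import HTMLParser

class SyntacticA11yChecker(HTMLParser):
    """Flags three syntactic WCAG 2.0 Level A issues while parsing a page."""
    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and "lang" not in attrs:
            self.issues.append("3.1.1: page language not identified")
        if tag == "img" and not attrs.get("alt"):
            self.issues.append("1.1.1: <img> with missing or empty alt text")
        if tag in ("object", "embed") and "flash" in attrs.get("type", "").lower():
            self.issues.append("1.1.1: inaccessible Flash content")

# A tiny page exhibiting all three problems.
page = ('<html><body><img src="logo.png">'
        '<embed type="application/x-shockwave-flash" src="ad.swf">'
        '</body></html>')
checker = SyntacticA11yChecker()
checker.feed(page)
for issue in checker.issues:
    print(issue)
```

The CAPTCHA item is omitted because deciding that an image is a CAPTCHA generally requires semantic context, which is exactly what the tool excludes at this level.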

The recommendations listed cover the WCAG 2.0 Level A guidelines. WCAG 2.0 defines a level for each of its testable success criteria, so that the guidelines can be used where requirements and compliance testing are required. Three levels of compliance are defined: A (the lowest), AA and AAA (the highest) [35].

The distinguishing feature of this tool is the set of recommendations applied to the originally inaccessible elements of the given web page, which provides textual alternatives so that screen readers can identify those elements while the page is being browsed.
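As an illustration of this kind of treatment, the sketch below (a simplified stand-in of ours, not WABlind's actual code) re-emits a markup fragment, injecting a placeholder textual alternative, derived from the image file name, into any `<img>` element that lacks one:

```python
from html.parser import HTMLParser

class AltTextInjector(HTMLParser):
    """Re-emits markup, adding a placeholder alt text to <img> tags that
    have none, so screen readers announce something readable instead of
    skipping the image or reading its raw file name."""
    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and not attrs.get("alt"):
            # Derive a readable fallback from the image file name.
            name = attrs.get("src", "image").rsplit("/", 1)[-1].rsplit(".", 1)[0]
            attrs["alt"] = f"image: {name}"
        rendered = "".join(f' {k}="{v}"' for k, v in attrs.items())
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

injector = AltTextInjector()
injector.feed('<p>Logo: <img src="img/acme-logo.png"></p>')
print("".join(injector.out))
# <p>Logo: <img src="img/acme-logo.png" alt="image: acme-logo"></p>
```

A file-name-based alternative is of course far weaker than a human-written description; it only guarantees the screen reader has something to announce, which is the syntactic level the tool operates on.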

Initial Assessment of WABlind

In order to show the tool’s feasibility, compliance-level tests were carried out on the pages restructured by WABlind using the AChecker validator [36], which is recommended by the W3C [37].

To carry out the initial tests verifying how much a webpage improved after applying WABlind, we employed the free services of Alexa, a web application that lists the most accessed websites. We selected the most accessed websites to verify how many of them did not meet the needs of visually impaired users (in terms of W3C rules not being followed). As a result, we gathered information on 50 websites across 5 categories: Education, Health, Entertainment, News and Shopping. We then followed the same procedures after applying WABlind and dealing with the automatically identified accessibility problems. Table 1 shows the descriptive statistics of the results before and after WABlind, considering all websites. Figure 2 shows the distribution of the results for this initial test without outliers (there were cases in which the maximum number of identified problems was 964), in order to facilitate visualization.
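The before/after comparison in Table 1 reduces to computing descriptive statistics over per-site problem counts. A minimal sketch of that computation, using Python's standard `statistics` module and invented, purely illustrative counts (not the paper's measured data):

```python
import statistics as st

# Hypothetical problem counts for five sites in one category, before and
# after restructuring; illustrative numbers only (note the outlier, as
# observed in the study).
before = [120, 85, 964, 40, 210]
after = [60, 70, 130, 35, 90]

def describe(counts):
    """Descriptive statistics of identified accessibility problems."""
    return {
        "mean": st.mean(counts),
        "median": st.median(counts),
        "min": min(counts),
        "max": max(counts),
    }

print("before:", describe(before))
print("after: ", describe(after))
```

The median is the more informative summary here, since a single heavily broken site (the 964-problem outlier mentioned above) dominates the mean.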

Table 1. Descriptive statistics for the results of applying WABlind on the 50 most accessed websites in each of the evaluated categories: Education, Health, Entertainment, News and Shopping.

Fig. 2. Boxplot graphs demonstrating the distribution of identified problems both in the original website categories and in the categories after using WABlind.

Even with the increase in the number of identified problems of the AA and AAA types (considering that, after fixing type A problems, AA and AAA problems can arise), we can see that in some categories (Health, Entertainment and News) the number of problems was reduced. These results suggest that WABlind can increase the quality of the evaluated websites in terms of accessibility. However, we still need to consider the causes of the increase in the number of problems in the Education and Shopping categories. We are investigating to what extent the type A problems considered by WABlind were corrected and which AA and AAA type problems arose.

4 Discussion

As the WABlind tool was developed after RotaColab, some information, as well as user behaviors observed during the RotaColab tests, was taken into account, such as laying out page elements in an extended way and maximizing voice interaction, since difficulties were perceived regarding the provision of information and guidance when reading the RotaColab application page during the tests.

These goals were only partially achieved in RotaColab, because users were not familiar with touchscreen devices. Therefore, WABlind’s development reflected on the layout of elements in a web page, and labeling the arranged elements was considered, both by means of automatic treatments and collaboratively, in order to minimize obstacles encountered during navigation, specifically on mobile devices. This collaborative phase is still being tested.

For the conformity tests, pages with accessibility problems in several categories were collected, according to the WCAG 2.0 guidelines. The idea was to find clues of improvement after using WABlind. The results of this study suggest that the use of the tool promotes accessibility by satisfying more conformance criteria than browsing without WABlind, especially for web page elements that provide no auditory feedback.

5 Conclusions

This paper discussed the challenge of increasing accessibility for people with visual impairments using multimodal resources as a support for interaction. The approach presented here starts with the development of two different multimodal tools for people with visual impairments, contextualized in a scheme that helps developers design multimodal interface projects for existing, inaccessible tools, based on the knowledge accumulated in the scheme.

The main challenge for the development of multimodal interfaces is to build interfaces that meet accessibility criteria, and services that abstract the needs of the user rather than the specific characteristics of the device itself. Investigative studies on the interaction of visually impaired people with mobile devices are a latent demand that requires ever more studies and applied research. In addition to the increasing development of accessible mobile applications, desktop tools are still in use, and a project to make them accessible is necessary.

While standards and guidelines for the development of multimodal interface projects are not yet fully defined, individual initiatives such as the research reported here attempt to address the lack of guidelines for the development of such applications. As a result, even a scheme not yet fully consolidated, such as the one proposed, shows great feasibility from the perspective of developing applications that follow recommendation criteria appropriate to the design of multimodal interfaces for people with visual impairments.