
1 Introduction

1.1 Motivation and Background

The foundation of human interaction involves some form of communication, whether through voice, symbols, numbers, pictures or signs. Whether they happen face-to-face or online, interactions can be challenging for people with disabilities. Disabilities and disorders such as cerebral palsy, aphasia and autism can lead to social isolation, mostly due to the difficulty of communicating. People with disabilities find it hard and tiresome to express their thoughts, while their interlocutors often lose patience waiting for delayed responses. These communication challenges create severe problems for the intellectual development of young children, further delaying or preventing their integration into society. A recent UNICEF study [13] analyzing data from 15 countries shows that almost 50% of children with disabilities do not go to school, and 85% have never received a formal education. Even for the few who do attend school, there is evidence suggesting that traditional teaching methods are often not suitable for children with disabilities, especially those who are both non-verbal and have motor-skill challenges. Thus, overcoming the first barrier and bringing children with disabilities into a formal educational setting is crucial, and it has to be complemented by educational methods and strategies that can facilitate communication and reduce the so-called “reciprocity gap” often caused by delays in responses.

Adaptive applications such as Augmentative and Alternative Communication (AAC) [5] have emerged as crucial enablers of effective communication for people with disabilities [2]. Based on the manner in which they facilitate communication, there are two main categories of AAC devices: (1) devices that synthesize speech through message composition [9], in which the user constructs sentences by selecting subsets of suggested words and then synthesizing them [18]; and (2) devices that offer alternative ways of communication through pictograms [5], in which the user expresses thoughts, desires and requests by selecting specific images. Although both are invaluable because they empower people with disabilities to communicate, the first group tends to be cost prohibitive [11], while also having reduced mobility due to size and shape.

Fig. 1. Scenarios showing the sequence of steps expressing the desire to eat an apple (interactions/selections (touches on the screen) indicated by light yellow background). (Color figure online)

We focus here on the devices in the latter category, namely the systems that provide pictogram-based communication. The main idea of these solutions is to mimic traditional communication boards and expand them into the mobile computing context. Thus, there is always a trade-off between the versatility of communication, limited by the number of available pictograms, and the space available on the screen to display the images in a manner consistent with the specific motor and cognitive skills of individuals. To circumvent this limitation, pictograms are usually organized in categories, and they can be accessed by interactively navigating through hierarchies of items or through “pages” shown on the screen. Nevertheless, this raises another challenge: the “deeper” or further down an item is placed in the hierarchy, the more physical effort is required from the user, due to the increased number of interactions (i.e. multiple pages to explore) necessary to reach the final item. We show in Fig. 1 the scenario where the disabled person wants to eat an apple as a snack. To express this desire, there are several steps to follow, each one corresponding to choosing an item on the screen and then moving to a “follow-up” screen. The selections are marked with a light background contrasting with the remaining choices, which are displayed on blue backgrounds. In Fig. 1, selecting “I want to...” is followed by the selections “Eat...”, then “Fruit...”, and finally “Apple”. The three dots at the end of a selection signal that there are subsequent selections to be made, while their absence means that there are no more available options.
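
To make the interaction count concrete, here is a minimal sketch (ours, not LIVOX code) that models a pictogram board as a nested hierarchy and counts the touches needed to reach a leaf item such as “Apple”:

```python
# Minimal sketch (not LIVOX code): a pictogram hierarchy as nested dicts.
# Reaching a leaf costs one touch per level, so deeper items need more effort.
board = {
    "I want to...": {
        "Eat...": {
            "Fruit...": {"Apple": None, "Banana": None},
            "Breakfast...": {"Cereal": None},
        },
        "Drink...": {"Water": None},
    },
    "I am...": {"Hungry": None, "Tired": None},
}

def touches_to_reach(tree, path):
    """Count screen touches needed to select every item along `path`."""
    touches = 0
    node = tree
    for label in path:
        node = node[label]   # one touch selects this pictogram
        touches += 1
    return touches

print(touches_to_reach(board, ["I want to...", "Eat...", "Fruit...", "Apple"]))  # 4
```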

Although trivial for a person with good motor skills, this sequence of choices can be quite slow and difficult for people with motor-skill challenges; thus it is crucial to present them with the most appropriate choices using the least number of steps or screens to navigate. One of the distinguishing features of LIVOX addresses precisely this challenge, which often acts as a barrier to communication. Using machine learning algorithms to enhance communication, LIVOX reduces the number of interactions necessary to reach a desired item.

Figure 2 gives an appreciation of the capabilities of LIVOX to apply personalized online learning algorithms to infer the most probable items for a child who cannot speak and also has motor-skill challenges, based on a given time and location. For example, subfigure A shows that, having detected the location of the disabled person as “home” and the time as “8am”, LIVOX customizes the choices to “I want to” (for expressing desires, such as “I want to eat”) and “I am” (for expressing needs, such as “I am hungry”) and shows them as items available at the start of the conversation, on the first “page”. Similarly, the remaining subfigures show screens customized by the location and time recommenders; they are discussed later in Sect. 3.

Fig. 2. Example of LIVOX context-aware pictogram recommenders suggesting itemsets based on location and time. The customized screen, as configured by the user here, displays four pictograms per “page”: two recommendations and two standard items (blue boxes). (Color figure online)

1.2 Limitations of the State of the Art

Crucial challenges of AAC continue to be the length of time needed to communicate, improving overall conversational reciprocity, and reducing the physical effort for people with motor challenges [12]. Many existing pictogram-based systems [1, 7] use static pages that people with disabilities browse to find the appropriate images to express their thoughts and needs. A rich experience necessarily involves the use of large collections of images, thus leading to increased response time and effort to find the appropriate items [15]. Sc@ut [16] and IMLIS [17] use more than 2,200 pictograms for interaction, available on different platforms such as tablets and the Nintendo DS, increasing the time needed to browse them. Similarly, Proloquo2go [19] and Tobii Dynavox [8] are commercial AAC systems that aim to reduce the communication gap, where the interlocutor waits for an answer and eventually loses interest due to the slow pace of the conversation. To mitigate this, others [10] offer reduced functionality that allows users to call emergency services by tapping pictograms instead of using calls/text messages.

Although these applications are popular, they do not focus on customizing the pictogram selection, and thus suffer from increased gaps in communication. To our knowledge, our current application is the first to use Machine Learning without an active internet connection to make pictogram recommendations based on spatial and temporal context in mobile devices, thus reducing the effort needed to find the most appropriate images.

More recently, there are new efforts to explore the integration of various devices of the domestic environment through IoT protocols, allowing users to interact with the surrounding smart objects [14].

Several works studying the impact of using AAC on adults’ behavior [3, 4, 6] revealed that: (1) an increased number of people with aphasia, brainstem impairment, dementia, amyotrophic lateral sclerosis (ALS) and traumatic brain injury (TBI) use AAC; (2) the use of AAC is more effective under the supervision of professional staff; (3) the utilization time is usually low (about 25% of the time), especially due to the lack of support.

In summary, there is a need for AAC apps that can address many challenges simultaneously: portability, mobility, ease of use, customization to various needs based on specific disabilities, and reducing the conversational reciprocity gap by providing fast and easy access to the most appropriate images. LIVOX is a step forward towards solving some of these problems.

2 Overview, Key Features and Innovations

LIVOX is an Android application that translates the traditional concept of “communication boards” to a mobile environment. The app is designed to work on tablets, mostly to accommodate users who have motor-skill challenges in addition to speech disabilities. These disabilities often prevent users from touching small areas of small screens, and for some also pose difficulties in distinguishing small shapes and low-contrast images. LIVOX incorporates many distinct features to address the specific needs of diverse types of disabilities. These allow parents and caregivers to easily customize the number of items shown on the screen, use high-contrast images, adjust for repetitive touch behavior, and much more. Most of these features were inspired by observation and empirical evaluation in clinics during the development of the app.

The app is highly flexible and customizable, allowing multiple users to co-exist, each with their own specific interface settings and collections of pictograms. The minimalist look of the user interface allows caregivers to learn how to set up and use the communication board in a short amount of time. Additional information can be added and accessed through a web interface, where the parent or caregiver can see all their devices and the respective users. This makes LIVOX a viable solution both for personal use by one person and for sharing the same app among multiple users in educational and social settings such as schools, hospitals, clinics, and recreation centers. We first briefly list some of the many features of LIVOX; although some might seem trivial on their own, combining them in the same app is not, and the combination contributes substantially to better communication. Then we focus on our unique approach to using artificial intelligence to facilitate communication and increase engagement in social and educational activities.

Number/Size of Items on the Screen. By increasing or decreasing the number of columns and rows, our app increases or decreases the number and size of the items on the user’s screen. This is especially important for people with visual impairments, making images easier to view. For people with motor disabilities, the size of the items is directly connected to the physical effort required to choose them: larger items are more easily accessed, while smaller ones pose greater challenges. Font sizes are adjusted accordingly.
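
As a rough illustration (our sketch; the screen dimensions, margin and grid values are hypothetical, not taken from the app), the size of each pictogram cell follows directly from the configured number of rows and columns:

```python
# Sketch: item size as a function of the configured grid (hypothetical values).
def item_size(screen_w_px, screen_h_px, columns, rows, margin_px=8):
    """Return (width, height) in pixels of one pictogram cell."""
    w = screen_w_px // columns - 2 * margin_px
    h = screen_h_px // rows - 2 * margin_px
    return w, h

# Fewer rows/columns -> larger, easier-to-touch items.
print(item_size(1920, 1200, columns=2, rows=2))  # e.g. a 2x2 board
print(item_size(1920, 1200, columns=6, rows=4))  # denser board, smaller targets
```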

Black/White and High Contrast Images. These customizations are especially important for people with cognitive impairments and vision disabilities.

Show Full-Sized Image After Click. This feature is particularly relevant if the user needs to get visual confirmation that an item was selected, and it is achieved by displaying the item image in full size upon the item being touched. This can assist with the comprehension that an action was completed, and it is particularly helpful for children with cognition-related disabilities.

Click Interval. Some disabilities lead to repetitive movements, causing multiple involuntary touches, which can flood and slow down the system. LIVOX uses a customizable “click interval” that can be adjusted to wait for a specified number of seconds before allowing the next valid click on the tablet screen.
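
A minimal sketch of such a debouncing gate, assuming the behavior described above (the class name and default interval are ours, not the app’s):

```python
import time

# Sketch (assumed behavior, not LIVOX source): ignore touches that arrive
# sooner than `interval_s` seconds after the last accepted touch.
class ClickInterval:
    def __init__(self, interval_s=2.0):
        self.interval_s = interval_s
        self._last_accepted = float("-inf")

    def accept(self, now=None):
        now = time.monotonic() if now is None else now
        if now - self._last_accepted < self.interval_s:
            return False             # involuntary repeat: discard
        self._last_accepted = now
        return True                  # valid click: forward to the app

gate = ClickInterval(interval_s=2.0)
print([gate.accept(t) for t in (0.0, 0.3, 0.9, 2.5)])  # [True, False, False, True]
```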

Screen as Actuator. Some physical disabilities prevent users from touching the screen precisely. In such cases, including users with severe coordination challenges, LIVOX allows automatic scanning of items. As each individual item is sequentially highlighted, the user just needs to wait for the desired item to be highlighted and then touch anywhere on the screen (no need to click on the highlighted item itself). This way the entire screen acts as a switch.
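
The following sketch illustrates the scanning idea under our own assumptions about dwell time and polling (it is not the app’s implementation):

```python
import itertools

# Sketch of switch-style scanning (assumed behavior, not LIVOX source): items
# are highlighted one at a time; a touch anywhere selects the highlighted item.
def scan_items(items, is_screen_touched, polls_per_item=30):
    """Cycle the highlight over `items`; return the item highlighted at touch time."""
    for item in itertools.cycle(items):
        print(f"highlighting: {item}")
        for _ in range(polls_per_item):   # dwell on each item, polling the screen
            if is_screen_touched():       # the whole screen acts as one switch
                return item

# Fake sensor: the user touches during the second highlight (poll 46).
touches = iter([False] * 45 + [True])
print(scan_items(["I want to...", "I am...", "Play"],
                 lambda: next(touches, True)))   # -> "I am..."
```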

Navigation Adjustments. LIVOX enables customized navigation through hidden/virtual navigation buttons and the use of swiping.

Return to Initial Screen. This feature can be enabled to return to the home (initial) screen after a specified time interval. It reduces the physical effort of users who would otherwise have to touch the “back” button repeatedly to return to the home screen. For users with cognitive disabilities, this feature helps with cognitive adaptation: to put it metaphorically, after the desired items are selected, LIVOX returns to the home screen, like closing a book to restart the selection process.
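
A small sketch of this auto-return behavior; the class name and timeout value are our assumptions:

```python
# Sketch (assumed behavior): after `timeout_s` seconds with no touches,
# navigation jumps back to the home screen, like closing the book.
class AutoHome:
    def __init__(self, timeout_s=30.0):
        self.timeout_s = timeout_s
        self.last_touch = 0.0
        self.screen = "home"

    def on_touch(self, t, target_screen):
        self.last_touch = t
        self.screen = target_screen

    def tick(self, t):
        if self.screen != "home" and t - self.last_touch >= self.timeout_s:
            self.screen = "home"      # saves repeated "back" presses

nav = AutoHome(timeout_s=30.0)
nav.on_touch(0.0, "Fruit...")
nav.tick(31.0)
print(nav.screen)  # "home"
```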

Keep Items’ Relative Position on the Screen. For people on the autistic spectrum it is important to keep the relative position of items on the screen, even when an item is not currently displayed. When enabled, this feature leaves a blank space in the place of hidden items instead of replacing them with new items.
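
The difference is easy to see in a sketch (ours; the behavior is inferred from the description above):

```python
# Sketch: when "keep relative position" is on, hidden items leave a blank cell
# instead of letting later items shift into their slot (assumed behavior).
def layout(items, hidden, keep_positions):
    if keep_positions:
        return [i if i not in hidden else None for i in items]  # None = blank cell
    return [i for i in items if i not in hidden]                # items shift left

items = ["Apple", "Banana", "Cereal", "Water"]
print(layout(items, hidden={"Banana"}, keep_positions=True))
# ['Apple', None, 'Cereal', 'Water']  -- Banana's slot stays blank
print(layout(items, hidden={"Banana"}, keep_positions=False))
# ['Apple', 'Cereal', 'Water']        -- positions change
```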

Automated Imprecise Touch Adjustment (IntelliTouch). Users with poor motor coordination find it difficult to touch specific screen areas, even for large-sized pictograms. This algorithm checks how many fingers are touching the screen, whether the hand was dragged across the tablet screen, how long the touch lasted, and other factors to correct an imperfect touch performed by a disabled person.
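
Since the exact IntelliTouch algorithm is not published here, the following sketch only illustrates the kind of rule-based filtering the description suggests; all feature names and thresholds are our assumptions:

```python
# Sketch of an imprecise-touch heuristic in the spirit of IntelliTouch
# (the real algorithm is not published; features and thresholds are assumed).
def resolve_touch(fingers, drag_px, duration_s, nearest_item, confidence):
    """Decide whether a raw touch should count as selecting `nearest_item`."""
    if fingers > 2:           # whole-hand contact: likely involuntary
        return None
    if drag_px > 200:         # long drag across the screen: discard
        return None
    if duration_s < 0.05:     # too brief to be intentional
        return None
    if confidence < 0.3:      # touch centroid too far from any item
        return None
    return nearest_item       # snap the imperfect touch to the item

print(resolve_touch(fingers=1, drag_px=40, duration_s=0.4,
                    nearest_item="Apple", confidence=0.8))  # "Apple"
print(resolve_touch(fingers=4, drag_px=10, duration_s=0.6,
                    nearest_item="Apple", confidence=0.9))  # None
```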

Using Artificial Intelligence to Make Context-Based Recommendations. Our first step towards introducing machine learning was successful in leveraging fast, contextual communication on mobile devices. LIVOX analyzes past usage data (i.e. item used, time of usage, GPS data, X and Y coordinates on screen, touch time) to predict what a person with a disability may want to say in a specific context (as shown in Fig. 2).
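
For illustration, a usage record of the kind listed above could look as follows (the field names and values are our assumptions, not the app’s schema):

```python
# Sketch of the kind of usage record the text describes (field names assumed):
# item used, time of usage, GPS reading, on-screen coordinates, touch duration.
from dataclasses import dataclass

@dataclass
class UsageEvent:
    item: str        # pictogram that was selected
    hour: int        # local time of the selection
    weekday: int     # 0 = Monday ... 6 = Sunday
    lat: float       # GPS latitude at selection time
    lon: float       # GPS longitude
    x: float         # touch X coordinate on screen (px)
    y: float         # touch Y coordinate on screen (px)
    touch_s: float   # how long the touch lasted

event = UsageEvent("Cereal", hour=8, weekday=2,
                   lat=-23.5505, lon=-46.6333, x=240.0, y=812.0, touch_s=0.35)
print(event)
```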

Natural Language Processing. To allow disabled users to answer questions faster, LIVOX incorporates natural language conversation, enabling users to engage in dialogue through its speech recognition. This feature continuously listens to the environment for trigger words, such as the name of the disabled person, and activates sentence-based communication. We showcase in Fig. 3 the use of the NLP sentence classifier, triggered by the activation word.

Fig. 3. LIVOX Natural Language sentence classifier. The activation word (here “Anna”, the name of the user) triggers the trained sentence classifier. Once the sentence is classified as a Yes/No type of question, the Yes/No screen is presented as a choice.
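
Building on Fig. 3, here is a minimal sketch of trigger-word gating followed by a Yes/No sentence check; the real system uses a trained classifier, so the keyword rule below is only a stand-in of our own:

```python
# Minimal sketch (ours, not the shipped model): gate on an activation word,
# then classify the sentence; a real system would use a trained classifier.
YES_NO_STARTS = ("do ", "does ", "did ", "is ", "are ", "was ", "were ",
                 "can ", "could ", "will ", "would ", "have ", "has ")

def handle_utterance(text, activation_word="anna"):
    text = text.lower().strip()
    if activation_word not in text:
        return None                    # not addressed to the user: keep listening
    question = text.split(activation_word, 1)[1].lstrip(", ")
    if question.startswith(YES_NO_STARTS):
        return "show Yes/No screen"    # as in Fig. 3
    return "show full pictogram board"

print(handle_utterance("Anna, do you want to eat?"))  # show Yes/No screen
print(handle_utterance("Anna, what do you want?"))    # show full pictogram board
print(handle_utterance("Let's go outside."))          # None (no trigger word)
```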

3 Usability and Real-Life Case Scenarios

We showcase here four scenarios of using LIVOX to enable a person with disabilities to communicate in four specific contexts: (1) in the morning, at home with a caregiver; (2) at school with a teacher; (3) in a restaurant; and finally (4) at home again, in the evening. For simplicity and natural flow, we use a fictional character named Anna as the person who is non-verbal and has severe motor-skill limitations. Her disability manifests as an inability to speak, as well as severe difficulty in moving her hand to the desired item on the screen. To assist with both challenges, her LIVOX account has been configured to display four items at a time. The four pictograms are the “combined” recommendation of the various recommenders together with the customized settings.

Fig. 4. Case scenarios for LIVOX usage during the day. Recommendations can be enabled or disabled according to the needs and context for a given time and activity: (1) in the morning, at home with a caregiver, the LIVOX recommender suggests Anna’s preferred breakfast food; (2) at school with a teacher, the LIVOX recommendations are disabled in order to have more space to work in a literacy class; (3) in a restaurant, the LIVOX recommender is active and recommends Anna’s preferred food at that place; and finally (4) at home again, in the evening.

We note that the recommendations are hierarchical, consisting of subsequent “levels” of groups of four images. For example, in order to select “Cereal”, Anna would have to first select “I want to...”, then “Eat...”, followed by “Breakfast...” and finally “Cereal”. The deeper in the hierarchy a suggested item is, the greater the gains in terms of time and physical effort to communicate.

The pictogram recommender is trained on the device, with the problem posed as a multiclass classification problem that considers previously clicked items as classes for a given location. The model updates itself at regular time intervals by performing automated machine learning steps, including data preprocessing (stay-point detection from GPS coordinates, time feature transformation, etc.), model training and, finally, storing the trained model on the local Android file system for upcoming predictions. Random Forest is the default classification algorithm, chosen because it is computationally less intensive than alternatives such as neural networks and because it can handle noisy data.
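
The following sketch illustrates such a periodic train-and-predict cycle using scikit-learn’s RandomForestClassifier; the cyclic time encoding, the place identifiers (standing in for stay-point detection output) and the toy usage log are our assumptions, as the exact pipeline is not published:

```python
# Sketch of the periodic retraining step described above. Feature encoding,
# place IDs and the toy log are assumptions, not the LIVOX pipeline.
import math
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def features(hour, weekday, place_id):
    # Cyclic encoding keeps 23:00 close to 0:00; place_id stands in for the
    # output of stay-point detection over raw GPS traces (not shown here).
    angle = 2 * math.pi * hour / 24
    return [math.sin(angle), math.cos(angle), weekday, place_id]

# Toy usage log: (hour, weekday, place) -> clicked pictogram (the class).
log = [(8, 0, 0, "Cereal"), (8, 2, 0, "Cereal"), (12, 3, 1, "Vowels"),
       (18, 4, 2, "Pizza"), (21, 0, 0, "Snack"), (21, 5, 0, "Snack")]
X = np.array([features(h, w, p) for h, w, p, _ in log])
y = np.array([item for *_, item in log])

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Rank items for "8am at home" and surface the top ones as recommendations.
proba = model.predict_proba([features(8, 1, 0)])[0]
top = sorted(zip(model.classes_, proba), key=lambda t: -t[1])[:2]
print(top)   # highest-probability pictograms for this context
```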

1. Communicating Basic Needs and Desires. Figure 4 (1) displays four pictograms presented as choices to Anna in the morning. These images have been selected by the LIVOX context-aware recommender system as the most appropriate for this context. The recommendations are always presented with a different background color. For example, the first image gives Anna the opportunity to express the natural desire to eat breakfast, and this image shows as a top choice given the time of day (8am), the place (home) and Anna’s previous choices (she usually eats cereal at this time every day when she is home). We note here that LIVOX learns from usage habits by keeping track of the frequency with which specific images are chosen, thus ensuring that favored items show up ahead of others that are seldom used.

2. Teaching/Learning Literacy. One of the first steps in teaching literacy is assisting students in recognizing vowels and consonants. Such activities include verbal pronunciation, writing the letter, and giving and soliciting examples of words that begin with that letter. We show in Fig. 4 subfigure (2) a scenario where Anna has to decide what the initial vowel is in a picture displaying an airplane. As an introductory step to this activity, LIVOX shows the letter in written form, both handwritten and typed, pronounces it out loud to generate awareness of the sound, and gives Anna the option to see other images and listen to short songs featuring the letter in diverse contexts. To have more screen space, the teacher opts here to disable the LIVOX recommendations during class.

3. Interacting in Diverse Social Contexts. We showcase here a social interaction in which Anna is at a restaurant. As shown in Fig. 4 subfigure (3), she can now use the home screen to select her favorite food in that specific restaurant. We want to emphasize that the suggestions presented to Anna at this moment are based on the existing context (time: late afternoon, place: restaurant, item: pizza) and are different from the recommendations she would get at home or at another restaurant, for example.

4. Communicating Routine Activities. Now Anna is back home, in the evening, and the recommender suggests a “Snack”, as she usually asks for one before going to bed. This recommendation, shown in Fig. 4 subfigure (4), is based on the time (evening), the place (home) and the usage pattern (the most frequent item chosen at this time and place is the snack).

4 Conclusion

LIVOX is the first pictogram-based augmentative and alternative communication mobile application to incorporate artificial intelligence to adapt to the needs of people with motor-skill and verbal communication challenges. First, our system combines a wide range of algorithms to afford flexibility in creating custom items to be presented to the disabled user. Second, the application can adapt itself to diverse motor and cognitive disabilities through algorithms such as IntelliTouch and Livox Natural Conversation.