1 Introduction

Electronic devices, and with them human-computer interaction, have found their way into everyday life and now take part in many interactions that were formerly based on direct human contact. At the same time, our society is aging rapidly, so a growing user group of elderly people, who are neither “digital natives” nor “digital immigrants”, will have to interact with a growing number of digital devices. Especially with regard to electronic healthcare devices, the question arises how these users interpret the signals a device sends and whether there are differences compared to younger users, who have grown up with computer technology.

The use of mobile ICT systems that allow patients to independently monitor their own health has great potential to compensate for the expected consequences of demographic change by decoupling medical care from the local availability of medical staff. In this context, telemedicine systems and services refer to the use of information and communication technologies within the provision of healthcare.

The interdisciplinary research project “Tech4age”, funded by the German Federal Ministry of Education and Research (BMBF), investigates how elderly users can be supported in using modern ICT for healthcare applications. One work package focuses on multimodal information representation and superposition and on how (or whether) this can compensate for some of the restrictions in perception that can come along with age. The present paper describes the theoretical background and the development of an experimental setup, including the development of an Android App for testing.

2 Theoretical Background

2.1 Common Restrictions in Perception for the Elderly (Unimodal)

For the ergonomic design of human-computer interaction, it is essential to consider the physiological changes in the performance of the sensory, cognitive and motor systems associated with the aging process. The relevant explanation and prediction models for age-differentiated performance have shifted in recent years from deficit-oriented models to models of individual compensation strategies [1, 2]. These so-called “compensation” models take into account that acquired knowledge, adapted practices and success strategies offer an individual range of possibilities [3, 4], which can, for example, be used for human-computer interaction [5]. From this, the idea of “differential aging” was developed, emanating from the heterogeneous development of human skills (in terms of both capabilities and skills) [6]. The average performance loss during aging can then be explained by the accumulation of individual courses that remain constant for a long time and decline significantly only after an individual limit is reached in old age.

Looking at the specific changes that accompany aging, it is obvious that they occur in every sense. Starting with the visual sense, aging brings along deficits in almost every visual function and therefore in the perception of the environment [7]. Reasons for the decline are the reduced elasticity as well as the yellowing of the lens [8]. As one consequence, it is difficult for the elderly to work with objects close to their eyes [9], as can be the case with handheld devices. Regarding hearing, presbycusis is a common consequence of the aging process. It leads to a generally impaired hearing ability, but especially to worsened performance in the high-frequency range between 1000 and 8000 Hz, and in most cases speech comprehension is impaired as well [10, 11]. Furthermore, older adults show reduced sensitivity in differentiating frequencies, especially those produced by the consonants “s”, “sch”, “f” and “z” [8]. Aging also leads to a reduction of tactile capabilities and skin sensibility [12], leading to difficulties in perceiving vibrations above 60 Hz [13].

Between 20 and 60 years of age, reaction time increases by approximately 13–20 %. This process takes place very steadily. Between the 20th and 96th year of life, the response time for simple auditory tasks increases by about 0.6 ms per year, and for disjunctive tasks by about 1.5 ms per year; for more complex tasks these values continue to rise (e.g., by approximately 3 ms per year in a four-way choice test) [14, 15]. The properties of the task thus have a decisive influence on reaction time, and the age-related differences increase in proportion to the number of choices [16]. Modeling the use of a technical system as a task consisting of three four-way elements results in an increase in reaction time of 9 ms per year. A 70-year-old would accordingly react 450 ms more slowly in the operation of a corresponding technical system than at the age of 20.
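As a worked example of this estimate: three four-way elements × 3 ms per element and year = 9 ms per year, which over the 50 years between age 20 and age 70 accumulates to 9 ms/year × 50 years = 450 ms.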

2.2 Multimodal Interaction and Reaction Time Experiments

In nearly all areas of our everyday life we react to feedback or stimuli that engage multiple senses. It is of special significance here that we connect the input of the different senses in order to integrate the available information about an object or event [17].

One of the first and most basic scientific works regarding multimodality is the multiple resource model by Wickens [18]. It is based on the assumption that specific resources exist for certain functions, such as processing level (perceptual-cognitive vs. motoric) or modality (visual vs. acoustic). Consequently, there is more interference between tasks that require the same resources than between tasks that address different resources [19]. Talsma [20] showed that multisensory integration takes place in a range of up to 40 ms after stimulus presentation, meaning that stimuli presented within 40 ms of each other are perceived as one unit.

In the context of modern technology, multimodality plays an important role, especially when it comes to the design of effective interfaces for touchscreen devices [21]. Notifications on modern smartphones are often presented in a multimodal way and appear in the form of blinking, sounds and vibrations. Many studies have already shown various advantages of multimodal feedback:

According to Lee et al. [21], audio-visual feedback, either alone or in combination with tactile feedback, produces shorter reaction times. Furthermore, they showed that audio-visual feedback is rated as more comfortable. In their study, participants had to dial a phone number on a touchscreen, either as a single task or as a dual task combined with a recognition task. The tactile vibration feedback was presented by an 8.4 inch (21.34 cm) touchscreen, and for the auditory feedback a loudspeaker behind the screen was used. The visual stimuli that had to be recognized were shown on a 19 inch (48.26 cm) monitor.

However, there are also more recent contradictory findings in the literature, according to which especially haptic feedback, either alone or in combination with visual feedback, reduces reaction time [22]. Thus “the average of the conditions without haptic feedback had a mean of 0.77 s, while the average of the conditions with haptic feedback had a significantly shorter mean of 0.69 s” [22, p. 84]. However, that study used a search-selection drag-and-drop task, which suggests that different feedback modalities are advantageous in different contexts. In this case, visual and auditory stimuli were presented through a computer and haptic feedback through a vibrating mouse.

In the context of multimodal feedback, technical experience also has to be taken into account. Using almost the same experimental setup, Jacko et al. [23] showed that experienced users benefit from every kind of multimodal feedback, whereas for less experienced users especially audio-haptic feedback is beneficial. These findings may be important for the use of modern technology by older people because, according to Ellis and Allaire [24], age and computer knowledge are negatively correlated.

Multimodal interaction and the elderly.

When looking at older users of modern technological devices, further particularities need to be considered: regarding the processing of multimodal stimuli, older people have a greater vulnerability to visual bias when bimodal stimuli are presented [25]. Examples are the greater distractibility by items on screens and the slowed processing of visual signals [23]. DeLoss, Pierce, and Anderson [26] showed that, compared to younger people, the elderly show a prolonged multisensory integration. For this they used the sound-induced flash illusion paradigm [27], in which participants have to report the number of flashes that are presented at the same time as beeps occur. Sounds and visual signals were presented by a computer. An interesting theory states that this longer and less exact multisensory integration could be a reason for the more frequent occurrence of falls among the elderly [28].

Multimodal interaction in categorization tasks.

In addition to these findings about age-related changes, Rozencwajg and Bertoux [29] found, using an adjusted Wechsler similarities test, that the performance of older adults in categorization tasks declines. According to Lenoble, Bordaberry, Rougier, Boucart and Delord [30], the elderly show prolonged reaction times and more errors in the classification of objects. However, it must be noted that these findings depend on the centeredness and contrast of the presented objects.

Previous research has already shown that, in general, multimodal stimuli have a positive influence on categorization tasks. According to Molholm, Ritter, Javitt and Foxe [31], the presentation of congruent animal sounds combined with pictures of animals led to an improvement of the decision whether the picture portrayed an animal or not. Comparing the application of multisensory material in simple reaction tasks and choice tasks makes clear that it enhances performance in both types of tasks, but that its effect on choice tasks is even greater than on less complex tasks [32].

2.3 Knowledge Gaps and Motivation

The above-mentioned studies show that, although multimodal information representation in general leads to better and faster performance, its detailed effects are highly task specific. Furthermore, in many experiments every feedback modality is presented by a different device, which does not seem to be ecologically valid. That is why we decided to develop an experimental setup where multimodal feedback and interaction are given on one handheld device: a smartphone. That way a high ecological validity is given, as these devices are common nowadays and are also often the devices on which healthcare applications for the elderly run (e.g. medication reminders to ensure adherence).

3 Experimental Plan

3.1 Main Idea

To investigate age effects in multimodal information representation and superposition, we are planning to test a minimum of 30 subjects, with ages ranging from 18 to 80 years, in a categorical decision task. The stimuli will consist of haptic, auditory and visual cues presented alone or in each possible combination (haptic + auditory; haptic + visual; visual + auditory; haptic + auditory + visual) and will be presented in random order. To minimize other influences like learning strategies, we will use only two categories and keep the stimuli as well as their unimodal characteristics very simple, easy to distinguish and easy to assign to one of the categories.

Each unimodal stimulus has only two forms of appearance: constant (category A) or rhythmic (category B). “Constant” means a continuous tone/vibration/image presentation of 500 ms, while “rhythmic” means an on/off pattern of the same tone/vibration/image in which three phases of presenting the stimulus for 100 ms are divided by two pauses of 100 ms each. This way both categorical stimulus types have the same length. When interpreting reaction times, one has to keep in mind that the first 100 ms of both stimuli are identical and deliver no hint for the categorical decision task. As tone we will use a standard 750 Hz sine wave, so it is clearly audible for all age groups. The vibration signal will be the standard vibration of the Google Nexus 5. The visual signal will be a plain white square in the middle of the screen.
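As a minimal sketch of how the two haptic categories could be encoded, assuming the pre-API-26 Android Vibrator API and the VIBRATE permission (identifiers are illustrative, not the App's actual code):

```java
import android.content.Context;
import android.os.Vibrator;

// Sketch: the constant and rhythmic haptic stimuli as vibration patterns.
// Pattern arrays alternate off/on durations in ms, starting with an off phase.
public class HapticStimuli {
    private static final long[] CONSTANT = {0, 500};                     // 500 ms continuous
    private static final long[] RHYTHMIC = {0, 100, 100, 100, 100, 100}; // 3 x 100 ms on, 2 x 100 ms off

    public static void play(Context context, boolean rhythmic) {
        Vibrator vibrator = (Vibrator) context.getSystemService(Context.VIBRATOR_SERVICE);
        vibrator.vibrate(rhythmic ? RHYTHMIC : CONSTANT, -1); // -1: do not repeat the pattern
    }
}
```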

The experimental setup is programmed as a native Android App and presented on a smartphone. As mentioned earlier, we consider it important for ecological validity to present the whole setup on one device, in contrast to many laboratory studies where signals are often presented on different devices (e.g. on a classic monitor for visual and acoustic stimuli plus an extra vibration box and external reaction buttons). However, using a native Android App can lead to more timing irregularities, as discussed in the next chapter.

The experimental setup will be run as a single task, where the participants' whole focus of attention is on the reaction to the stimuli, and under a dual task condition, where participants have to watch a movie in parallel; questions about its content are asked later to ensure that people also divide their attention towards the video. This dual task condition was again chosen because of its ecological validity, representing a typical everyday life situation in which people might watch TV while a signal comes in from an electronic device.

3.2 Procedure

After welcoming the participant, some demographic values like age and sight restrictions are collected, followed by a visual test and audiometry. Then the participants are introduced to the setting: their task is explained and they complete a test run of about ten minutes, so that they get familiar with the stimuli and the task. After that, the single or dual task condition will start (order permuted). Each of these conditions will run for 30 min. The interstimulus interval will be 30 s plus a random time of up to 15 s, which ensures that about 40 stimuli can be presented, so each of the 7 multimodal combinations will be presented about 5 times.

3.3 Statistical Analysis

Statistical analysis will be done by an ANOVA with repeated measurements using SPSS. Independent variables will be single vs. dual task as well as all multimodal stimuli combinations. Dependent variables will be reaction time (reactions longer than 3 s will be counted as misses), misses and wrong categorizations. Age will be used as a covariate in the analysis of variance to determine its influence (the traditional testing of age groups is not in line with the individual aging described above). A post hoc multiple regression analysis will furthermore investigate how much variance is determined by sight and hearing restrictions, based on the tests before the experiment.

4 Technical Development of an Android App as Multimodal Reaction Time and Categorical Choice Experiment

4.1 App Structure

The App is used in landscape mode. It has two buttons on the right side for reacting to category A or B. For left-handers there is the possibility to switch the side of the buttons in a hidden preferences menu. Sound and vibration are initiated by the hardware and the visual stimulus is presented in the middle. Figure 1 shows an early prototype of the App.

Fig. 1. A prototype version of the Android App showing the reaction buttons on the right side. The image of the tiger at the top as well as the data window at the bottom will not be included in the final version and are for testing only. The slider on the left side is intended for the online rating of subjective strain but still has to prove its validity in pretests.

Within the preferences of the current developer version, the experimenter can also set the stimulus dimensions included in the test (haptic, acoustic, visual), the interval between the stimuli (10–30 s), the random time which is added to the interval to make stimulus appearance less predictable (0–15 s) and the duration of the whole experiment; these options are intended to allow easy variation in pretests. Furthermore, the subject number, age of the participant, sex, diopter of both eyes, the fingers used for reaction and a free field for comments can be added before the data is sent via email.
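Sending the collected data via email can be handled by a standard Android share intent; a minimal sketch (our illustration, not the App's actual code):

```java
import android.app.Activity;
import android.content.Intent;

// Sketch: hand the collected log over to an email client via ACTION_SEND.
public class DataExport {
    public static void sendByMail(Activity activity, String subjectId, String logData) {
        Intent intent = new Intent(Intent.ACTION_SEND);
        intent.setType("message/rfc822"); // restrict the chooser to email apps
        intent.putExtra(Intent.EXTRA_SUBJECT, "Experiment data, subject " + subjectId);
        intent.putExtra(Intent.EXTRA_TEXT, logData);
        activity.startActivity(Intent.createChooser(intent, "Send data via email"));
    }
}
```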

The data structure is event-based: the first column holds the time since App start in milliseconds, followed by further columns describing the discrete event. The parameter calculation (e.g. reaction time) is done later in SPSS using successive differences.
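A minimal sketch of such an event-based log (column and event names are illustrative assumptions, not the App's actual format):

```java
import android.os.SystemClock;

// Sketch: event-based logging with milliseconds since App start as first column.
public class EventLog {
    private final long appStart = SystemClock.uptimeMillis();
    private final StringBuilder rows = new StringBuilder("time_ms;event;detail\n");

    public void log(String event, String detail) {
        long t = SystemClock.uptimeMillis() - appStart;
        rows.append(t).append(';').append(event).append(';').append(detail).append('\n');
    }

    // Example rows; a reaction time is the difference between successive
    // stimulus and button rows, computed later in SPSS:
    // 120345;stimulus_on;haptic+visual_rhythmic
    // 121012;button_press;category_B
    public String dump() {
        return rows.toString();
    }
}
```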

4.2 Timing Tests

One potential problem when using an App for reaction time experiments are timing irregularities, especially as computers never keep (really) exact time and mean reaction time differences between two experimental conditions are sometimes less than 50 ms. To determine these irregularities, we first programmed a test app which sends a ping every 100 ms and compares it to the system time in order to reveal timing irregularities. To simulate some interaction, the whole screen changed color every 100 ms and a button was included that triggered an acoustic and a vibration signal when pressed. Table 1 shows the timing irregularities of that test app on two devices based on 5 min of testing (n = 3000 pings).
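A minimal sketch of the ping mechanism (our reconstruction under the stated assumptions, not the original test app's code):

```java
import android.os.Handler;
import android.os.Looper;
import android.os.SystemClock;

// Sketch: log how far each 100 ms ping deviates from its scheduled time.
public class PingTest {
    private static final long PERIOD_MS = 100;
    private final Handler handler = new Handler(Looper.getMainLooper());
    private final EventLog log; // re-using the event-log sketch above
    private long expected;

    public PingTest(EventLog log) {
        this.log = log;
    }

    private final Runnable ping = new Runnable() {
        @Override
        public void run() {
            long now = SystemClock.uptimeMillis();
            log.log("ping", String.valueOf(now - expected)); // > 0 means the ping ran late
            expected += PERIOD_MS;
            handler.postAtTime(ping, expected); // fixed-rate scheduling avoids drift accumulation
        }
    };

    public void start() {
        expected = SystemClock.uptimeMillis() + PERIOD_MS;
        handler.postAtTime(ping, expected);
    }
}
```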

Table 1. Mean variation and deviation of time stamps in milliseconds during test series with Nexus 5 (Android 6.0) and LG G3 (Android 5.0)

As can be seen in Table 1, the timing irregularities for the ping are less than 1 ms if no button is pressed. However, with some kind of interaction, the amount of variation rises to 6–11 ms. Here the newer Android 6.0 performs nearly twice as well as Android 5.0. Furthermore, comparing the mean variation with the mean absolute variation shows that most timing irregularities take the form of a lag, i.e., events fire late rather than early.

However, the interpretation of those timing irregularities can be misleading, as they only describe the time when some action is invoked, not necessarily the time when that action is also completely displayed or played. Therefore, we ran another test using a camera filming at 120 Hz. This test focused on the difference between the time a button was pressed (or released) and the time the system reacted to that interaction (by changing the color of the button): the touch response time. Table 2 shows that there is a lag of about 55 ms for releasing the button and a lag of 72 ms for pressing the button, which is in line with findings on other hardware (Footnote 1). Both are accompanied by timing irregularities, as seen in the standard deviations. So while a linear lag can be subtracted when interpreting reaction times, the irregularities lead to blurred results, which are hard to interpret if the mean difference between two experimental conditions is small.

Table 2. Time lag in milliseconds between system time and video-observed reaction time (touch response time) using Nexus 5 with Android 6.0

In conclusion of these timing tests, we decided that the ecological validity of an experimental setup as a native App is more important than some minor timing irregularities, especially for our use case of healthcare applications for the elderly: the lag, estimated as 72 ms for pressing the reaction button plus 6 ms for internal time stamps, will be the same in all experimental conditions and therefore does not influence the comparison. The irregularities, estimated as 12 ms for internal time stamps (if interaction is given) plus 8 ms for pressing the button, are in their sum of 20 ms far below most effects found in the literature.

4.3 Lessons Learned Coding the App

In the first stage of implementation, the stimulus was invoked by a CountDownTimer (Footnote 2). The CountDownTimer is designed to schedule a countdown until a time in the future, with regular notifications on intervals along the way. Since the stimulus should not be invoked at a regular interval, but within a specific time range (regular interval + random time), the timer needed to be extended. Therefore, the first operation within the onTick() method was generating a random number to determine the time by which the stimulus had to be delayed. This led to an influence on the UI and a bad user experience.
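Schematically, this discarded first attempt can be sketched as follows (a reconstruction for illustration; all identifiers are ours, not the App's actual code):

```java
import android.os.CountDownTimer;

// Sketch of the discarded first attempt: a CountDownTimer ticks at the regular
// interval, and the random extra delay was handled inside onTick().
public class CountDownApproach {
    private static final long EXPERIMENT_DURATION_MS = 30 * 60 * 1000;
    private static final long BASE_INTERVAL_MS = 30_000;
    private static final long MAX_RANDOM_MS = 15_000;

    public void start() {
        new CountDownTimer(EXPERIMENT_DURATION_MS, BASE_INTERVAL_MS) {
            @Override
            public void onTick(long millisUntilFinished) {
                // generating and waiting out the random delay here runs on the
                // UI thread and degraded the user experience
                long randomDelay = (long) (Math.random() * MAX_RANDOM_MS);
                presentStimulusAfter(randomDelay); // app-specific (illustrative)
            }

            @Override
            public void onFinish() { /* end of experiment */ }
        }.start();
    }

    private void presentStimulusAfter(long delayMs) { /* app-specific (illustrative) */ }
}
```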

However, after some research on the scheduling of time-based actions in Java code, the Handler (Footnote 3) and its method postDelayed() seemed to fit perfectly, since it is optimized to schedule runnables to be executed at some point in the future. Furthermore, the UI is not affected, because the Handler schedules the runnable asynchronously via its message queue instead of blocking. After the code is processed, the Handler is called recursively until the experiment is finished. The use of the Handler in this specific case is shown in the code example below.
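A minimal sketch of this recursive scheduling, reconstructed with illustrative identifiers (intervals as in Sect. 3.2):

```java
import android.os.Handler;
import android.os.Looper;
import java.util.Random;

// Sketch: schedule each stimulus after the regular interval plus a random
// extra delay, re-posting the runnable recursively until the experiment ends.
public class StimulusScheduler {
    private static final long BASE_INTERVAL_MS = 30_000; // regular interval
    private static final long MAX_RANDOM_MS = 15_000;    // added random time

    private final Handler handler = new Handler(Looper.getMainLooper());
    private final Random random = new Random();
    private final Runnable presentStimulus; // triggers the actual stimulus (app-specific)
    private long experimentEnd;

    public StimulusScheduler(Runnable presentStimulus) {
        this.presentStimulus = presentStimulus;
    }

    private final Runnable tick = new Runnable() {
        @Override
        public void run() {
            presentStimulus.run();
            scheduleNext(); // recursive re-scheduling
        }
    };

    public void start(long experimentDurationMs) {
        experimentEnd = System.currentTimeMillis() + experimentDurationMs;
        scheduleNext();
    }

    private void scheduleNext() {
        long delay = BASE_INTERVAL_MS + (long) (random.nextDouble() * MAX_RANDOM_MS);
        if (System.currentTimeMillis() + delay < experimentEnd) {
            handler.postDelayed(tick, delay);
        }
    }
}
```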

5 Discussion

While investigating how multimodal information representation can compensate for perception restrictions that can come along with age, we presented an experimental setup as an Android App. Although we experienced and described a timing lag regarding touch response time and, even more critically (although smaller), irregularities in timing, we decided to use such an App because of its ecological validity, which we judge as more important in our use case of healthcare applications for the elderly.

Other researchers might also have a look at frameworks for psychological experiments on Android such as OpenSesame (Footnote 4) or Expyriment (Footnote 5).