1 Introduction

Globally, almost one in five people is estimated to have hearing loss, with 5% needing rehabilitation [1]. The vast majority of people with hearing problems are older adults. Especially in the elderly, impaired hearing has been found to restrict social relations and hobbies and to lead to loneliness [2, 3]. Hearing problems are related to anxiety and depression, particularly in males, and some studies have linked impaired hearing to increased mortality. Of all single risk factors, hearing loss without the use of hearing aids poses the greatest relative risk for dementia; those who experience hearing loss in mid-life are more likely to experience cognitive decline later on. However, the use of hearing aids has been suggested to decrease this risk for dementia [4].

Technical rehabilitation, that is, the use of hearing devices, forms the foundation for compensating for hearing problems. Hearing care entered a new, advanced era in the 1990s, when digital hearing aids and multichannel cochlear implants were implemented in clinical practice. Cochlear implants replace severely malfunctioning or entirely non-functioning sensory cells in the inner ear: they transform sounds into electric signals, process them through band-pass filtering according to the encoding strategy of the sound processor and then stimulate the auditory nerve. Additionally, a variety of assistive listening devices, such as audio induction loops (hearing loops), FM devices and infrared systems, are available to help patients at home, at work and during leisure activities.

1.1 Auditory Training is Needed to Maximally Benefit from Technical Hearing Rehabilitation

Providing patients with hearing devices does not suffice; they also need counselling and mental support when learning the new skills required to use their devices. Additionally, their brain often needs specific training to deal with the new, technically processed and thereby altered auditory information the hearing instruments convey or produce. Hearing aids amplify not only the sounds the user is focusing on but also all background noise. Moreover, hearing aids process sounds in many ways, so the outcome may sound very distorted and annoying to the device user [5]. This is why getting used to hearing aids and listening with them may take weeks or months, even up to one year. A common clinical finding verified by research (e.g., [6, 7]) is that a large share of patients prescribed hearing aids do not actually use them at all or use them only infrequently.

Time and training are needed to obtain the best benefit from hearing aids and the altered sensory input they provide. Auditory training is, indeed, frequently needed to support the acclimatization process, that is, the improvement in auditory performance as the hearing aid user gets used to listening with their new devices [8]. Furthermore, a minority of patients have lost their hearing to the extent that cochlear implantation has been necessary. Auditory training is also often needed to maximize the benefit from cochlear implants, which are expensive devices that require surgery and lifelong patient care. In patients with hearing aids as well as in those with cochlear implants, auditory training is especially needed when progress in auditory skills is slow, the expected level of speech perception is not reached, or the patient struggles with asymmetric hearing.

In many Western countries, the rapid ageing of the population has drastically increased the need for hearing rehabilitation services. Even though technical rehabilitation, that is, the fitting of hearing devices, can be arranged by public health care (albeit often after a long wait), auditory training provided by, for example, speech and language therapists can be offered to only a fraction of those needing it [9]. There is thus a clear need to provide patients with materials and methods that are time and cost effective. Independent auditory training (self-training) is a noteworthy option for serving that need.

Materials Constructed for Auditory Training.

Auditory training as self-training in, for example, the home environment has earlier been based on videocassettes and CD-ROM and DVD materials, such as Angel Sound™ and LACE® (Listening and Communication Enhancement) and, in Finnish, for example, Huulioluvun ja kuulonharjoituksen ohjelma [10]. Validated instruments for computer-based auditory training, such as websites or applications, are still scarce, especially in languages other than English [11]. Therefore, studies investigating the effectiveness of computer-based training in various languages typically develop new training and evaluation programs that are then intended for generalization into clinical use [e.g., 12-14]. Researchers have used various cloud solutions, for example, uploading the audio files onto Google Drive and sharing them with the study participants via e-mail [12], or have placed the training materials onto websites, such as HEARO™, which is specifically constructed for auditory training [13, 15]. Another route for offering the participants access to the training program has been to provide them with a tablet computer with the program installed [16].

Today, a variety of computer-based and mobile auditory training applications are available, though often only for English speakers. They include, but are not limited to, the ones listed by Olson (2015) [17] and Völter et al. (2020) [11]: AB Clix, Hear Coach, The Listening Room®, Read My Quips and LACE® (Listening and Communication Enhancement) and, for children, Angel Sound™, i-Angel Sound and SoundScape. Examples of web-based auditory training platforms offered in languages other than English are the SisTHA portal for Portuguese-speaking adults in Brazil [18] and the already mentioned HEARO™ website (Hearo.co.il) for Hebrew-speaking adults in Israel [13].

Outcomes of Auditory Training.

Some earlier evidence for the effectiveness of computer-based auditory training was provided, for example, in the systematic reviews by Sweetow and Palmer (2005) [8] and Henshaw and Ferguson (2013) [19]. As reported in these reviews, the reviewed studies used analytic training methods (identification of single sounds using so-called bottom-up cognitive processes), synthetic methods (aiming to understand the meaning of the delivered message using so-called top-down cognitive processes) or their combination. Additionally, some studies had included different noise conditions in the training. However, in the former review [8], only six articles met the inclusion criteria, and the number of participants in the studies was rather low. Moreover, the articles included, particularly in the latter review [19], represented very low to moderate study quality, and the reported improvements in auditory skills were considered small. Furthermore, evidence that performance gained through computer-based practice successfully generalized to untrained stimuli was not robust.

Later research has, however, shown with a stronger evidence base that various forms of auditory training are beneficial for individuals with hearing loss, whether the training material includes speech [16, 20] or music [21]. Most recently, many studies have offered even more systematic evidence highlighting the advantages of computer-based training (e.g., [13, 16]). Furthermore, generalizability of skills related to word-level items has been detected: Sato et al. (2020) [22] found that independent auditory training at home using a tablet computer increased the intelligibility of both trained and untrained words in patients who had received hearing aids or cochlear implants one year earlier. In studies reporting positive outcomes, the training dose used was at least eight hours [8], and computer-based auditory training has been found effective after as little as one month of training [13]. In addition, compliance with computer-based training has been found to be high (see, e.g., [16]).

Research has suggested that auditory training in adults can improve not only auditory performance but also working memory, attentive skills and communication [23]. In their review, Stropahl et al. (2020) [24] concluded that the use of hearing aids together with auditory training is a beneficial combination. According to common clinical experience, it is especially important to use background noise in training because, for speech perception to be successful, the auditory signal one tries to listen to needs to be separated from the background noise. Indeed, adding various background noises as an additional challenge during training has been found beneficial [15, 16]. The use of music has also been found to support auditory training performed with speech materials. Moreover, according to a systematic review and meta-analysis [21], music used in auditory training can improve musical perception as well as auditory discrimination, recognition and sound localisation skills.

1.2 Speechreading Training is Needed by Both Children and Adults with Hearing Impairment

Speech is multimodal in nature, and audiovisual information is utilized in speech reception by all people, and by those with impaired hearing even more so. Visual support for speech recognition is important, especially in noisy and echoic listening environments. The need to support auditorily perceived speech with visual information became strikingly evident during the COVID-19 pandemic, when face masks deprived people with impaired hearing of the important visual information they need from the face [25]. In speechreading (lip reading), all available cues from the movements of the lips, tongue and face, together with head, eye and torso actions, are used to determine the intended message of a speaker. Interindividual differences in speechreading ability are large [26]. Just as with auditory training, speechreading can be practised using analytic methods (use of visual patterns in articulation), synthetic methods (inferring by utilizing context) or both [27].

Since the introduction of digital hearing aids and cochlear implants, the use of hearing technology and auditory training have been at the centre of habilitation of children with impaired hearing. The utilisation of visual information about speech seems to have largely been neglected. This situation is confusing given the research base, which suggests that, in both hearing children and those with hearing impairment, visual information is important for learning to speak, in the acquisition of phonological knowledge and even in learning to read [28, 29]. Reading skills need to be supported because they are often compromised, especially in children with more severe hearing impairment [30].

Materials Constructed for Speechreading Training.

Just as with auditory training, speechreading skills take time and effort to learn. Speechreading training has traditionally been accomplished in live situations with a therapist or a group, but such practice is prone to large variations. Over time, video-recorded materials have enabled more systematic training, followed by computer-aided programs and web-based applications, such as Lipreading.org. Currently, some speechreading computer applications are available for research use [31] and some for clinical and public use. Only a few applications can be used for free. To the best of our knowledge, some mobile applications, such as Lip Reading Academy, Seeing and Hearing Speech® and the Android application MirrorMirror [27], have thus far been developed.

Outcomes of Speechreading Training.

There is a paucity of evidence on how children benefit from speechreading training. Most studies have been conducted with adults and have shown modest improvements [32-35]. Overall, research has suggested that, in adults, there is often much room to improve speechreading skills [26]. Speechreading is not an immutable skill: speechreading ability has been shown to improve between 7 and 14 years of age [36]. Some research has suggested that children [36] and adults [37] with hearing impairment may be better speechreaders than individuals with typical hearing, but contrasting evidence also exists [29, 38, 39]. In hearing children, speechreading training has been shown to improve single-word speechreading and, even more importantly, general speech sound processing skills (phonological processing) [40]. However, intensive computerized speechreading training in 5- to 7-year-old children with a hearing impairment (N = 32) led to only a small improvement in speechreading test performance but greater improvement in some other outcome measures, such as answering everyday questions presented with visual speech, vocabulary and audiovisual speech production assessment [41]. Notably, there was a large variation in the amount of training realized. The use of easily accessible mobile applications might increase both the usability and frequency of training.

In the present article, we describe the contents, construction and use of two Finnish free applications—Auditory Track and Optic Track—developed for persons with a hearing impairment. Both applications are suitable not only for clinical purposes (used as training material in speech therapy and in independent training at home), but also for research.

2 Methods

2.1 Contents and Construction of the Auditory Track Application

The training material included in Auditory Track [42] comprises about 3,000 audio file items consisting of one- to seven-syllable words and one- to seven-word sentences. Additionally, some 200 audio files contain the sounds of various music instruments or musical pieces. For training, there are three task types for listening to speech (Table 1) and two for listening to sounds produced with musical instruments and pieces of music (Table 2). The speech discrimination tasks were partly based on Kuulorata käyttöön [43] material, which includes exercises presented in paper form (original training material by Cochlear Corp., translated and adapted into Finnish).

Table 1. Contents of the speech discrimination tasks of the Auditory Track application.
Table 2. Contents of the music tasks of the Auditory Track application.

Within each task type, all materials were categorized into three difficulty levels: easy, moderate and advanced. Additionally, in the memory games, the difficulty can be further adjusted: the user can choose between six, eight and 12 cards (corresponding to three, four and six audio file–word pairs to be matched). At the easy level, the words differ considerably from each other in phonemes and/or syllable number; at the moderate level, the words are rather similar; and at the advanced level, they are very similar to each other. Within each task type and difficulty level, all target items are presented to the user in random order. Examples of some word- and sentence-level task types are illustrated in Fig. 1.

Fig. 1.
figure 1

Examples of the auditory training exercises (same or different word and which of the two or five sentences was heard) of the Auditory Track application.
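The presentation logic described above (random item order within a task type and difficulty level; memory boards of six, eight or 12 cards built from audio file–word pairs) can be sketched as follows. This is an illustrative Python reconstruction only; the application itself is implemented in C#/Unity, and all names here are hypothetical.

```python
import random

def present_in_random_order(items, rng=random):
    """Return all target items of one task type and difficulty
    level in a random order, each item exactly once."""
    order = list(items)
    rng.shuffle(order)
    return order

def deal_memory_board(pairs, n_cards, rng=random):
    """Deal a memory-game board: n_cards // 2 audio-word pairs,
    each pair contributing one 'audio' card and one 'word' card."""
    if n_cards not in (6, 8, 12):
        raise ValueError("board sizes are 6, 8 or 12 cards")
    chosen = rng.sample(list(pairs), n_cards // 2)
    cards = [(w, "audio") for w in chosen] + [(w, "word") for w in chosen]
    rng.shuffle(cards)
    return cards
```

Dealing pairs first and shuffling the combined card list guarantees that every audio card on the board has exactly one matching word card.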

The final task of the speech material comprises a set of 14 narratives of about two minutes each, with five questions presented on the content of each narrative. Four alternative answer choices are always given for each question (Fig. 2). The content of the training materials has been described in Finnish in more detail by Huttunen, Vikman and Pajo (2022) [44].

Fig. 2.
figure 2

After listening to a short narrative, the user of Auditory Track needs to answer five questions about the content of the story.

A female speech and language therapist and a male professional actor spoke all the materials. The speech materials and their instructions were recorded using an AKG CK92 microphone and Reaper 6.14 software in a professional recording studio (the LeaF Research Infrastructure, University of Oulu, Finland). A calibration signal for the recordings was produced with a Brüel & Kjær 4231 Sound Level Calibrator. Recording was done with a 96 kHz sampling rate and 24-bit depth, and the recordings were saved in .wav format. After this, all the materials were manually segmented using Audacity (version 2.1.3), AVS Audio Editor (version 8.0.2.501) and Praat (version 6.0.49/6.1.0) software and saved in mp3 format. During editing, extraneous noises, such as smacks and loud, disturbing inhalations, as well as misarticulated productions were removed. To enhance smooth streaming in the Auditory Track application, the material was downsampled to 48 or 44.1 kHz, and a variable bit rate between 170 and 210 kbps was chosen for the mp3 format. Additionally, the sound level was equalized, and 1 s of silence was added before and after each single word, sentence or narrative.
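The last two post-processing steps, level equalization and silence padding, can be illustrated with a minimal Python sketch operating on raw sample values. This is only an illustration of the operations; the actual work was done in the audio editors named above, and the RMS target used here is an arbitrary assumption.

```python
import math

def equalize_and_pad(samples, rate, target_rms=0.1, pad_s=1.0):
    """Scale a clip to a common RMS level and add 1 s of silence
    before and after it, mirroring the described post-processing."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    gain = target_rms / rms if rms > 0 else 1.0
    scaled = [s * gain for s in samples]
    silence = [0.0] * int(pad_s * rate)
    return silence + scaled + silence
```

Equalizing to a common RMS level across all ~3,000 clips keeps the perceived loudness consistent from item to item, while the surrounding silence prevents clipped onsets when streaming starts.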

Music was produced with a Kawai digital piano. The electronic sound synthesiser of the digital piano was used to produce the sounds of the organ, pan flute, accordion, violin, trumpet and so forth for the tasks in which one needs to distinguish between pitch (low/high), tempo (slow/rapid) or the number of sounds produced with the instrument. For these same tasks and for playing the instrumental music pieces, acoustic guitar, violin and cello were also played, in addition to the digital piano. The pace of the songs (70 to 90 beats per minute) was set with the help of a metronome, and the 30-s musical pieces were played in the mid registers because this was considered suitable for hearing device technology and the typical features of hearing impairments.

One male adult, a trained performing musician, sang the vocal parts of the music material. For some of the songs, a female singer joined in so that the songs were performed as a duet. All 10 musical pieces chosen were royalty free and consisted of well-known folk songs, hymns and children's songs, such as 'A Frog Went A-Courtin'' and 'Old MacDonald Had a Farm'.

Music and sounds were mainly recorded using Audacity software, a Shure SM7B cardioid dynamic microphone and a Focusrite iTrack Scarlett Solo external sound card. A minor part was recorded using either a Zoom H2N audio recorder or the Audacity application with an AKG C 544 L microphone and a Focusrite iTrack Solo external sound card.

2.2 Use of the Auditory Track Application

With its gamification features and carefully planned usability, Auditory Track aims to maximally enhance the independent training of auditory skills; such features are commonly known to support adherence to training. The gamification includes sentence-assembly tasks (Fig. 3), together with bingo and memory games.

Fig. 3.
figure 3

Gamification examples of the Auditory Track application: assembling of three-word sentences and a bingo game.

Compliance in training with Auditory Track is supported by the immediate feedback given (the correctness of the chosen answer) and by a reward system embedded in the software: gold, silver and bronze medals are credited after a certain proportion of points (success in training) has been earned (Fig. 4).

Fig. 4.
figure 4

Feedback system embedded in the Auditory Track application for giving visual feedback (according to the correctness of the answer and granting medals).

In the task type in which short narratives are listened to, the user can choose between no noise, echo (one attenuated repetition with a 0.7 s onset delay) and two noise types: a white noise masker and a speech-shaped noise masker (consisting of mixed sentence-level speech). After selecting the noise type, the user can set the noise level with a slider (see Fig. 5).

Fig. 5.
figure 5

Selection of the speaker (female/male) and listening mode; silence, echo and noise (including noise type and level) in the Auditory Track application.
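How the noise-level slider maps to an actual masker level is not specified in the application's documentation; one common approach is to control the signal-to-noise ratio (SNR), where a lower SNR means a louder masker. The sketch below (illustrative Python under that assumption, not the app's C# implementation) mixes white noise into a speech signal at a requested SNR.

```python
import math
import random

def rms(x):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def add_white_noise(speech, snr_db, rng=None):
    """Mix white noise into a speech signal at the requested
    signal-to-noise ratio (lower SNR = louder masker)."""
    rng = rng or random.Random(0)
    noise = [rng.uniform(-1.0, 1.0) for _ in speech]
    # Scale the noise so that rms(speech) / rms(noise) matches snr_db.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [s + gain * n for s, n in zip(speech, noise)]
```

A speech-shaped masker would be built the same way, except that the noise would first be filtered to match the long-term spectrum of speech (in this application, by mixing sentence-level speech).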

The progress bar shows the time elapsed when using the Auditory Track application. It is placed in the uppermost part of the smart device screen, and the 30-min bar is automatically reset to zero daily. According to the instructions of Auditory Track, training should be done three times a week for 30 min at a time for a minimum of two months. This instruction was based on studies reporting positive outcomes in auditory training when the training dose varied from 10 to 50 h, with no difference in outcomes between training done twice or five times a week [20, 45]. Humes et al. (2014) [45] concluded that, for auditory training to be effective in older adults, practising should take place, at minimum, two or three times a week for 5–15 weeks. Streaming directly to the hearing aid(s) or cochlear implant(s) is recommended [12, 16]. This also enables auditory training focusing on one ear only, which is very helpful when hearing is asymmetrical or when the patient receives a new device.
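The daily-resetting 30-min progress bar can be sketched as follows. This is an illustrative Python reconstruction of the logic only (the application is implemented in C#, and the class and method names here are hypothetical).

```python
from datetime import date

class DailyProgressBar:
    """A 30-minute training progress bar that resets each calendar
    day, as described for Auditory Track (illustrative sketch)."""

    GOAL_S = 30 * 60  # daily goal: 30 minutes, in seconds

    def __init__(self):
        self._day = date.today()
        self._elapsed = 0

    def add(self, seconds, today=None):
        """Record training time; returns the filled fraction of the bar."""
        today = today or date.today()
        if today != self._day:        # a new day: reset automatically
            self._day = today
            self._elapsed = 0
        self._elapsed = min(self._elapsed + seconds, self.GOAL_S)
        return self._elapsed / self.GOAL_S
```

Capping the counter at the goal keeps the bar full for the rest of the day even if the user trains longer than the recommended 30 minutes.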

When coding Auditory Track, an abridged pilot application, Kuulorata Pilot (available in Google Play and the App Store), was used as the starting point. Technical implementation of Auditory Track was performed by a Finnish company, Outloud Ltd., using the Unity game engine and the C# language.

2.3 Contents and Construction of the Optic Track Application

The speech material included in the Optic Track application [46] comprises almost 3,800 silent videos. Most of the items are single words of one to seven syllables, but the material also contains about 400 sentences of one to seven words each. The frequency of the words in the Finnish language was also checked from databases and other sources and considered when selecting the words. Visemes are groups of phonemes (speech sounds) sharing a similar lip shape, place of articulation or other visual cues of the articulation gestures. One of the main principles in constructing the items for the Optic Track application was to provide training material in which single words are separated from each other by discriminating between different viseme categories (e.g., pahvi (cardboard) – kahvi (coffee)). In some cases, the difference between the words to be discerned is based on segment duration (e.g., tuli (fire) – tuuli (wind)) because duration is a feature affecting the meaning of words in Finnish. In both word- and sentence-level task types, the items to be compared (same/different; which one of these two/four/five words or sentences was spoken; see Fig. 6) were allocated to easy, moderate and advanced levels. Categorization of the difficulty level was based on, for example, the viseme categories and the length of the words and sentences.

Fig. 6.
figure 6

Examples of the sentence-level exercises of Optic Track: discrimination between two items (which of the two sentences was spoken) or four items (which of the four sentences of the same length was spoken).
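The duration-based minimal pairs mentioned above can be detected automatically because Finnish orthography marks segment length by doubling letters. The following simplified Python sketch checks only the duration criterion; the viseme-category criterion used when constructing the actual material required phonetic judgement and is not reproduced here.

```python
from itertools import groupby

def collapse_lengths(word):
    """Collapse doubled letters: Finnish marks segment length by
    doubling, so 'tuuli' collapses to 'tuli'."""
    return "".join(ch for ch, _ in groupby(word))

def duration_pair(w1, w2):
    """True if the two words differ only in segment duration
    (e.g. tuli 'fire' vs tuuli 'wind'); an illustrative check."""
    return w1 != w2 and collapse_lengths(w1) == collapse_lengths(w2)
```

Such a check could, for instance, be used to screen a word list for candidate duration contrasts before manual review.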

The tasks mainly cover word, sentence or phrase levels. In addition, a minor part of the speech material also represents connected speech level because, in one task type, the user needs to discriminate whether the three-sentence stories are identical or not.

Six speakers—three females and three males (three speech and language therapists, one teacher, one special education teacher and one layperson)—produced the speech materials at the recording studio of the Digital Pedagogics and Video Services of the University of Oulu. For recording the audiovisual materials, the Sennheiser SE 2 wireless clip-on microphone system and Panasonic AW-UE100 video camera were used, and the files were saved with Open Broadcaster Software into transport stream format (.TS with h264 codec) using a 1920 × 1080 resolution and 50 fps. A green screen was used as the background to allow for later replacement with the desired style.

Editing and cutting of the video material was done with command-line tools (ffmpeg, bash) to allow precise frame-level accuracy and a unified output. The long studio sessions included several takes of the words and sentences, so manually editing all samples to match the desired short-clip characteristics would have been burdensome. Initially, automated subtitling was done using the YuJa video platform's auto-captioning. This resulted in a subtitle file in .SRT format giving time codes within the longer per-session files, despite a few auto-captioning errors. The subtitles were hand-matched to the studio session script, which documented which words and sentences had been recorded. The best samples and cut points were then manually refined at frame-level accuracy (20 ms for 50 fps video). Special care was taken to verify that the clips started and ended with as neutral a face as possible, with eyes open and mouth closed. These edits typically resulted in clips of 1.5–2.0 s for isolated words and 2.5–4.0 s for longer sentences, with clear speaker-specific differences.

After the exact cut points had been defined, the original green screen was replaced with a neutral blue tint gradient background. The video clips were then cropped to a 3:4 aspect ratio containing only the facial area, with a resolution of 672 × 896. The first frame of each video was extended to a 0.4 s still face. At the end of each clip, a symmetrical 0.4 s washout period with only the background visible was added, so the originally cut videos were extended by 0.8 s in total. All the ffmpeg-based editing steps were stored in a bash shell script to allow relatively fast re-encoding from the source files in case the video clip details (start and end period durations, video format) needed adjusting. Finally, the video files were saved without audio in mp4 format using 50 fps and the h264 codec, and the functionality of the videos was verified on a selection of mobile phones.
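The timing arithmetic behind the two paragraphs above (frame-accurate cut points at 50 fps and the symmetrical 0.4 s extensions) can be made explicit in a small Python sketch; the actual pipeline used ffmpeg and bash, so this is only a reconstruction of the calculations involved.

```python
FPS = 50
FRAME_S = 1 / FPS  # one frame = 20 ms at 50 fps

def snap_to_frame(t):
    """Snap a cut point (in seconds) to the nearest frame boundary."""
    return round(t * FPS) / FPS

def final_clip_duration(cut_in, cut_out, pad_s=0.4):
    """Duration after frame-accurate cutting plus the 0.4 s still-face
    lead-in and the 0.4 s washout described above."""
    start, end = snap_to_frame(cut_in), snap_to_frame(cut_out)
    return (end - start) + 2 * pad_s
```

Snapping both cut points to frame boundaries avoids re-encoding artefacts from partial frames and keeps all clips an exact multiple of 20 ms long before the padding is applied.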

For alternative choices in many memory games and bingo tasks (Fig. 7), royalty-free colorful drawings or color images were selected from domestic and international image databases. In addition, about 100 colorful drawings were self-constructed when no suitable illustrations were available. Illustrations were added to the application to increase its appeal and support training motivation.

Fig. 7.
figure 7

Two gamification examples (bingo and memory games) of the Optic Track exercises.

Technical implementation of Optic Track was performed by a Finnish company, Outloud Ltd., using the Unity game engine and the C# language. Coding was partly based on the Auditory Track application. Special attention was paid to the user interface and the graphical appearance, as the main users of the application are children.

The research version of the Optic Track application, also coded by Outloud Ltd. for our research group, collects data on the active time used for training (the application pauses after one minute of inactivity), time stamps regarding pausing, and the progress realised in training, that is, the points and medals earned. This information is stored on the mobile device in a custom binary log file. The log file can be transferred from the device either via a standard file browser and the available file sharing methods (USB cable, wireless transfer) or directly from the Optic Track application via Bluetooth and a host program written in Python running on a computer. The binary log file can then be converted into a simple text CSV file for easier analysis.
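The binary-to-CSV conversion step can be sketched in Python with the standard library. The record layout below (timestamp, event code, points, little-endian) is entirely hypothetical — the actual format of the Optic Track log file is not public — but the sketch shows the general shape of such a converter.

```python
import csv
import io
import struct

# Hypothetical fixed-size record layout: unix timestamp (u32),
# event code (u8), points earned (u16), all little-endian.
RECORD = struct.Struct("<IBH")

def log_to_csv(blob):
    """Convert a binary training log into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["timestamp", "event", "points"])
    for offset in range(0, len(blob), RECORD.size):
        writer.writerow(RECORD.unpack_from(blob, offset))
    return out.getvalue()
```

With fixed-size records, the converter needs no framing logic; it simply walks the file in `RECORD.size` steps and unpacks each record into one CSV row.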

2.4 Use of the Optic Track Application

The user can watch the videos of the Optic Track application as many times as needed and can play all the videos in either normal or slowed playback mode; the slowed pace is two-thirds of the normal playback speed. The user interface of the application was made as clear as possible, with easy-to-understand buttons helping navigation.

The suggested amount of training with Optic Track is 45 min a week, for example, three 15-min training sessions, and practising should continue for a minimum of eight weeks, for a total of six hours. This recommended 'training dose' is based on research reports in which training leading to skill improvement took place one to five times a week over a period of three to 12 weeks, with the total duration of the training sessions ranging from two to eight hours [31, 40, 41].

The time spent using the application is shown on a bar in the uppermost part of the mobile device's screen. After a minute of inactivity, the application is automatically paused, and a notification appears on the screen of the device. The bar is divided into three parts, each representing 15 min. The user is praised three times (again, as text appearing on the screen): once after each 15 min of training and, finally, after 45 min. The user is also prompted to continue practising, if they are interested in doing so, even after this suggested weekly training dose has been fulfilled.

Accessibility was increased by giving feedback on the correctness of the answers not only with colors but also with symbols. Feedback is given immediately after the user selects their response and, additionally, through a reward system with gold, silver and bronze medals included in each of the 13 task types and their difficulty levels (see Fig. 8). The user can also check a summary of the number of medals earned across the whole application. To further improve compliance, the application's instructions advise parents to support their child by taking part in the training, for example, by reading aloud the written alternative choices and the feedback messages provided by the software if the child's reading skills are still emerging or the child has difficulties reading.

Fig. 8.
figure 8

Some features of the feedback system embedded in Optic Track (showing the correctness of answers and rewarding with medals according to the scores obtained).

In the results view of the application, using the replay button (see Fig. 8), the user can watch in slow motion the videos corresponding to the items in which they succeeded, as well as those in which they did not identify the target.

3 Results

3.1 Implementation of the Auditory Track Application in the Medical Rehabilitation Field

After the Auditory Track application was completed, its existence was actively publicized. A poster presenting Auditory Track was created and sent to the 22 hearing stations and hearing centres at tertiary care Finnish hospitals in which hearing aids are fitted and, in some cases, cochlear implant sound processors are programmed. Staff members were asked to hang the poster on the wall for their patients to see, and, when needed, they have helped the patients download the application. Two slides giving information about the application were also created for use on the digital displays in these hospitals' waiting halls. In addition, the internet, social media, webinars, national seminars of hearing care professionals, patient organisation magazines and direct contact with the regional associations of the Finnish Federation for Hard of Hearing have been used as routes for reaching the potential end users of the application.

Among different digital care and rehabilitation services, independent auditory training with the help of the Auditory Track application has recently been recommended to professionals in the National Criteria for Referring People to Medical Rehabilitation 2022 Guide for Healthcare and Social Welfare Professionals and Those Working in Rehabilitation Services [47]. There is also another national route for informing people about Auditory Track: Terveyskylä (Health Village) [48] is a Finnish public 24/7 service for everyone who needs information and support in health issues. Auditory Track is introduced in its materials, among other advice and instructions supporting auditory training (https://www.terveyskyla.fi/kuulotalo/ohjeita/).

About 70% of mobile phones and tablet computers in Finland run the Android operating system, and the rest run iOS. Auditory Track can be used on Android devices running version 5.1 or newer and on iOS devices running version 11.0 or newer. By December 2023, the application had been downloaded more than 2200 times from Google Play onto Android devices and more than 1100 times from the App Store onto iOS devices.

At one hospital, a cochlear implant user realised with the help of Auditory Track that his main problem was understanding speech in speech-shaped background noise. He found the possibility to train speech perception with different background noises particularly valuable. According to a recent MA thesis in Logopaedics [49], after four weeks of independent training with an average use time of eight hours, nine adult users found Auditory Track mostly clear, convenient, and easy to use. The utility of the training application was perceived as good, the speakers were articulate, and the musical parts were clear. The gamification elements were appreciated, and the application helped the users become aware of their own hearing level. Some users hoped that it would be possible to change the background noise type in the middle of a task.

3.2 Implementation of the Optic Track Application in the Medical Rehabilitation Field

A research version of Optic Track is currently being used in the multidisciplinary and multiprofessional research project Gaze on lips?, in which, among other things, the effectiveness of Optic Track in improving the speechreading skills of hard-of-hearing children aged 8 to 11 years is studied. The aim is for the participating children to use the application for 45 min a week over an eight-week period, totalling six hours of training.

After the data for the currently ongoing research project Gaze on lips? have been collected, the forthcoming ‘consumer version’ of Optic Track will be downloadable free of charge from Google Play and the App Store, in the same way as Auditory Track already is. It is intended that information about the Optic Track application will also be included in the Terveyskylä (Health Village) materials.

4 Conclusions

Our aims have been achieved: we now have free applications for both auditory and speechreading training in Finnish that can be used by both deaf and hard-of-hearing children and adults. Additionally, intervention research can be conducted utilising both applications.

Since the 1980s, closed captioning has aided persons who have problems with hearing. Along with the latest huge leaps in technology, built on advanced knowledge of computational linguistics, computer science and electrical engineering, persons with hearing loss can also get help from automatic speech recognition, by which speech can nowadays be transformed into text practically in real time. However, the use of assistive technology does not remove the need for auditory training, because of its benefits for an individual’s speech perception processes.

In addition to technology aiming to improve the speechreading skills of human beings, automatic speechreading technology has also been actively developed for machines and has accomplished many kinds of speechreading-related tasks. According to a summary by Pu and Wang (2023) [50] and a recent systematic review of 23 articles on deep learning by Santos, Cunha and Coelho (2023) [51], progress has been made in computerised speechreading in research projects, but at the word level, for instance, the results still remain below human performance. One exception is Watch, Listen, Attend and Spell, which utilises computer vision and machine learning and is based on training with an immense amount of video material; it has been reported to clearly outperform at least one professional speechreader [52]. A machine learning-based iOS application, SRAVI (https://www.sravi.ai/), has been developed to help, for example, hospital staff understand speech produced by tracheostomy patients who cannot use their voice normally. In the future, portable speechreading technology based on artificial intelligence may also provide real-time help in real-life situations for those with hearing loss. Until then, the aim is to help persons with impaired hearing train their skills as much as possible. This is important because most communication situations take place ‘in the wild’ without any human-technology interaction. Moreover, because speech perception is multimodal in nature, both senses need to be utilised. This is beneficial because an improvement in auditory skills can enhance improvement in speechreading [53], and speechreading, in turn, is widely known to help in recognising audiovisual speech.