1 Visual impairment, visually impaired (VI) people and gaming

Visual impairment can be defined as any functional limitation of human vision that cannot be corrected by means of glasses or contact lenses. This term encompasses all degrees of vision loss, including low vision and blindness, that affect a person's ability to perform the typical tasks of daily life (Bailey and Hall 1990). Most forms of visual impairment are defined in terms of visual acuity, a measure of the ability of the human eye to distinguish, e.g., different shapes or the details of objects at a given distance. Alternatively, visual impairment can be characterized by a reduction of the visual field, which forces frequent eye movements or head turning to cover the area normally monitored within the field of view. Visual acuity is typically expressed as a ratio y/x, where:

  • the numerator (y) represents the maximum distance in meters at which the subject can discern a so-called optotype (i.e., a standardized symbol for testing vision, such as a specially shaped letter, number, or geometric symbol rendered in black on a white background, that is, at the maximum possible contrast);

  • the denominator (x) represents the maximum distance in meters at which a person with normal visual acuity can discern the same optotype.

For example, a visual acuity of 3/6 means that the subject must be within 3 m to read material that a person with functional eyesight could read from as far as 6 m away. Visual impairment comes in a variety of degrees, such as low vision (visual acuity below 6/18), legal blindness (visual acuity below 3/60), and color blindness, i.e., the inability to distinguish specific colors. As reported by Sekhavat et al. (2022), in 2018, the World Health Organization (WHO) estimated 285 million Visually Impaired (VI) people in the world, 39 million of whom were also blind.
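Using the thresholds just quoted, these definitions can be stated compactly in terms of the acuity ratio:

```latex
a = \frac{y}{x}\,, \qquad
\text{low vision: } a < \tfrac{6}{18} \approx 0.33\,, \qquad
\text{legal blindness: } a < \tfrac{3}{60} = 0.05\,,
```

where y and x are, respectively, the subject's and the reference observer's maximum recognition distances for the same optotype.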

People with sensory deficits, such as visual impairment, can be limited in their access to certain social activities. Games (intended as ludic activities, e.g., board games and video games) are one such case, owing to their frequently inaccessible design (Bolesnikov et al. 2022). Often, barriers to access exist for specific games because those games were not originally designed with players with special needs in mind (Thevin et al. 2021). A typical barrier is an inaccessible gaming rulebook (e.g., one containing several images that are difficult for optical character recognition systems to interpret), which prevents players from even learning the rules of the game. Other barriers include the use of overly similar colors, the absence of sounds, the presence of timed actions (whose timings may be too short compared with the time a person with visual impairment needs to acquire the information required to select such actions optimally), and, in the case of board games, the adoption of overly similar shapes for distinct pieces/placeholders, which can make them difficult to distinguish by touch. Such barriers can lead to dependency on others, gaming abandonment, and an unfair game experience (see Bolesnikov et al. 2022). Removing these hampering elements as much as possible is essential, since, according to da Rocha Tomé Filho et al. (2019), playing board games is an important way to promote the integration and socialization of participants with visual impairment. Indeed, an increase in their autonomy in playing board games (e.g., in their ability to play without the help of fully sighted people) is deemed to have positive effects on their social interaction, quality of life, and, in general, personal fulfillment. It is worth remarking that, according to Thevin et al. (2021), a possible drawback of help from a fully sighted person is that it could reveal the potential next moves of a player with visual impairment to her/his opponents. Moreover, according to Bolesnikov et al. (2022), even the use of, e.g., the Braille system to help blind people identify cards has drawbacks, since fully sighted individuals could gain an unfair advantage by memorizing patterns in blind people's cards.

Despite these barriers, people with disabilities typically express the desire and need to play games. For instance, as reported by Bolesnikov et al. (2022), in a survey conducted by the Accessibility Foundation, 92% of the participants with disabilities reported playing an average of 10 h per week. Similarly, based on an online survey, Prarazu et al. (2020) found that VI people, particularly those who are blind, are very fond of gaming, despite current limitations in game accessibility, i.e., in their ability to play a specific game. Surprisingly, some blind participants reported being able to play video games not designed specifically for them, such as the arcade fighting video game “Mortal Kombat”. That game is accessible because distinct sounds are associated with the different fighting moves, and blind persons can use sounds to their advantage much better than fully sighted people (Ricciardi et al. 2020). Indeed, blindness has been associated with altered processing across multiple auditory functions (see Sabourin et al. 2022 for a recent review). A large body of evidence suggests that congenital and late-onset blindness can lead to compensation in specific higher-order auditory functions, producing performance enhancements (Röder et al. 2020). These effects have been observed in the context of spatial (location and motion) processing of auditory stimuli (Battal et al. 2019) and for tasks based on fine (spectro-temporal) auditory analyses, such as mnemonic representations of sounds (Röder and Rösler 2003), verbal memory (Amedi et al. 2003), frequency tuning (Huber et al. 2019), speech comprehension (Dietrich et al. 2013), and auditory temporal resolution (Muchnik et al. 1991). This altered auditory processing points to a crucial role for audition when designing games specifically for VI people. Accordingly, audio feedback is extremely important for such people to locate players and objects. In this regard, the results of the online survey reported by Prarazu et al. (2020) also stressed that auditory feedback helps visually impaired individuals move faster in a game. However, in general, a multimodal approach to the design of accessible games for VI people should be considered. For instance, in the case of residual vision, the design of fully customizable interfaces in which the visual elements of the game can be moved and resized appears crucial for an effective and accessible game experience.

Among games played by people with visual impairment, tabletop games (i.e., non-digital games such as board games, card games, and paper-and-pen games) represent an effective educational tool with several important applications, such as teaching mathematics, developing social skills, and, in the case of people with visual impairment, teaching Braille. This is a relevant issue, since, according to the National Federation of the Blind (2009), more than 80% of people with visual impairment cannot read Braille. Turning to another type of game, the accessibility of educational video games to VI people was recently examined by Neto et al. (2020). As discussed therein, the design of such games must focus not only on the balance between playful and educational aspects but also on reaching the largest possible number of potential players.

2 Contributions and structure of the work

Given the framework outlined above, this work aims to provide an up-to-date review of the literature on game accessibility for people with visual impairment. The work focuses in particular on:

  • discussing benefits, limitations, and possible improvements of currently available game accessibility solutions for VI people;

  • highlighting possible policy implications of improvements in such accessibility.

The present work differs from the previous review paper by Yuan et al. (2011) on game accessibility, which focused more on accessibility solutions for video games (the case of board games is considered in more depth in the current work). Moreover, Yuan et al. (2011) analyzed several kinds of disability, such as visual, hearing, motor, and cognitive impairments, presenting an overview of some of the accessibility strategies developed to deal with each of them. Instead, the present work focuses on visual impairment, considering the most recent literature related to game accessibility for VI people and the associated technologies. Indeed, of the 60 references cited in the work, 49 appeared after the year of publication of Yuan et al. (2011), and 31 appeared in 2020 or later. The present article also differs from the works by Pedrini et al. (2020) and Uzan and Wagstaff (2017), which discuss accessibility solutions for VI people in application domains other than gaming (namely, music production software known as Digital Audio Workstations, or DAWs, and transport systems). Nevertheless, several of the ideas presented in those works (e.g., orientation, localization, and access to information in the case of Uzan and Wagstaff (2017)) can be translated to the context of game accessibility for VI people, since both those works and the present article address accessibility solutions proposed for the same category of end users.

The review is structured as follows. Section 3 describes some general techniques useful for designing accessible games for VI people. Section 4 focuses on specific accessibility-enhancing techniques based on replacing visual stimuli with auditory stimuli, namely sonification and sound-source simulation. The section also details some recently proposed sonification-mapping strategies. Section 5 presents a case study on the application of machine-learning techniques to the accessibility of the online version of a board game, focusing on the case of VI people. Finally, Sect. 6 closes the work with a discussion and some conclusions.

To end this section, Fig. 1 categorizes the main accessibility techniques discussed later in the review, also reporting the sections of the work in which they are presented.

Fig. 1

Main accessibility techniques discussed in the review and their relationships. Game accessibility solutions for visually impaired people based on sensory substitution include haptification and visual-to-auditory conversion (Sect. 3). The latter includes sonification, text-to-speech, and sound-based cues (Sect. 4). Machine learning (Sect. 5) can also be used to provide image classification and next move suggestion (both followed by sensory substitution)

3 Designing accessible games for VI people: general techniques

To promote their effective inclusion, it is important to provide VI people with the conditions to play together with other people, regardless of their visual abilities, and with similar winning chances. This holds both for video games and for games that do not need a screen in their original version but can become more enjoyable for VI people through a suitable digital interface (e.g., a sound-based or a haptic-based interface). In two works by da Rocha Tomé Filho et al. (2019, 2021), a set of guidelines was defined to adapt board games, making them more accessible to people with visual impairments. Among all the possible means that can be exploited, the use of sound-based descriptions was considered therein a very important way of improving players' immersion. According to Bolesnikov et al. (2022), other ways to improve accessibility are the use of Braille rulebooks, accessible websites, or Quick Response (QR) codes for the rulebooks; the adoption of physically distinct game elements; and the use of sound for state communication and orientation purposes (e.g., to indicate how close one is to a specific element of the board, or which pieces one can take in the current turn). Moreover, Bolesnikov et al. (2022) highlighted the importance of allowing players with visual impairment to configure game elements, such as color contrast or font size, by themselves, increasing their autonomy. It is worth mentioning that, although the two are often related, accessibility is not the same as inclusion, i.e., the possibility for people with and without special needs to play a game together. As reported by Thevin et al. (2021), an example of a video game that is both accessible and inclusive is “Kinaptic” (Grabski et al. 2016). This is a virtual 3-D pursuit game in which the player with visual impairment has to catch the fully sighted player, who is digging a tunnel. In this case, a fair game experience is obtained by providing two different interfaces for fully sighted players and players with visual impairment. Indeed, the system is characterized by an asymmetric setup based on touchless Kinect interaction for the fully sighted player and haptic, wind, and 3-D audio feedback for the player with visual impairment. Morelli and Folmer (2011) achieved inclusion in the video game “Kinect Sports” by providing sensory substitution (either haptic or auditory) to detect the game state (for instance, the presence of an approaching obstacle that the player tries to avoid by jumping). For that game, average player performance was evaluated experimentally by letting persons with and without visual impairment play (the latter interacting both with the original version of the game and with its modified version). Remarkably, the average success rate turned out to be practically the same in the various experimental groups. Instead, in the case of “Blind Hero” (Yuan and Folmer 2008), an accessible version of the music rhythm video game “Guitar Hero”, a significant difference in accuracy was observed between fully sighted players playing with vision (i.e., with the original version of the game) and fully sighted players playing without vision (i.e., with its modified version). In more detail, accuracy (defined in this case as the percentage of events in which a player presses the right button combination) turned out to be more than 70% in the first group and less than 35% in the second group.

Thevin et al. (2021) also reported some considerations about which game mechanics are accessible (or can likely be made accessible) and which are not. For instance, card games are usually accessible, since the cognitive load required to choose, distribute, and memorize cards is not too high. Similarly, negotiation games are typically accessible, because most of their elements are shared among the players. Instead, games involving silent collaborative decision-making (e.g., games in which a team must decide whether or not to eliminate another player using only gestures or grimaces) are typically inaccessible (and difficult to make accessible, even with advanced technologies). According to Thevin et al. (2021), a category of games that are complicated to adapt through hand-crafted solutions is represented by those in which the spatial layout of the board is essential for game playing. This holds especially in the following cases: (1) the configuration of the board depends on the actions of the players; (2) visual patterns on the board play an important role in the game (e.g., in “Arboretum”, the score depends on the longest path between two identical trees on a grid); (3) the board is quite large, or it evolves over time. Moreover, among issues that cannot be easily solved by hand-crafted adaptations, Thevin et al. (2021) reported knowing the locations of the players' pawns when deciding the next move. In each of these cases, it was argued therein that sonification/haptification of the game can decrease the high cognitive load needed to play in the presence of visual impairment. Thevin et al. (2021) also developed a specific application of Spatial Augmented Reality (SAR) to deal with these issues. As a case study, they augmented a board game (“Jamaica”) that was previously considered inaccessible, making it playable simultaneously by people with and without visual impairment. Their SAR system can detect the pawns' positions and provide an audio overview of the game. Additionally, people with visual impairment can obtain information about a specific location through audio feedback triggered by touching that location, whereas fully sighted people can get the same information simply by looking at it. Thevin et al. (2021) acknowledge that their SAR system shares some unresolved image-detection issues with other augmented reality systems: calibration and finger tracking were sensitive to light, leading to some delays during play.

According to Prarazu et al. (2020), blind people typically rely on digital applications, usually called screen readers, to interact with a computer, a smartphone, or another device. The screen reader's input is the content of the Graphical User Interface (GUI), whereas its output can be either audio (i.e., via a voice synthesizer based on a Text-To-Speech, or TTS, module) or Braille via a Braille display. Braille displays are hardware devices that translate text from a computer into Braille (Prarazu et al. 2020). Such devices contain an array of up to 80 refreshable cells, where each cell consists of eight dots. The dots rise to form words, depending on which area of the text is in focus. Such displays are very useful but also expensive, so not everyone can afford them. For instance, in 2018, the cost of refreshable Braille displays ranged between $1500 and $10,000 (Rob 2018). Recently, metasurface grids were proposed as possible wearable Braille communicators composed of a small number of macro-pixels, i.e., patches of several pixels that are excited by a single actuator (Bilal et al. 2020).
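As a minimal illustration of a screen reader's audio output path, the following Python sketch speaks a string through a TTS engine. It assumes the third-party pyttsx3 package (an assumption made for illustration; none of the screen readers discussed here is implemented this way).

```python
# Minimal text-to-speech sketch (assumes the pyttsx3 package: pip install pyttsx3).
# It illustrates the audio output path of a screen reader: text in, speech out.
import pyttsx3

def speak(text: str, rate: int = 180) -> None:
    """Read 'text' aloud through the platform's default speech engine."""
    engine = pyttsx3.init()           # picks the platform TTS driver (SAPI5, NSSpeechSynthesizer, eSpeak)
    engine.setProperty("rate", rate)  # words per minute; screen-reader users often prefer fast rates
    engine.say(text)
    engine.runAndWait()               # block until the utterance has been spoken

if __name__ == "__main__":
    speak("Pawn on row two, column three.")
```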

Interaction with a computer via a screen reader is typically performed using keyboard shortcuts, which are considered a better option than the mouse (Archambault et al. 2007). Indeed, the mouse relies on a visually based spatial frame of reference, which should be avoided in the case of blind people. Examples of screen readers are:

  • for Windows: Jaws® (https://www.freedomscientific.com/products/software/jaws) and Narrator (https://support.microsoft.com/en-us/windows/chapter-1-introducing-narrator-7fe8fd72-541f-4536-7658-bfc37ddaf9c6);

  • for macOS: VoiceOver (https://macfortheblind.com/What-is-VoiceOver).

Every screen reader has its own shortcuts for reading the next word or line of text and for moving across different elements of the user interface or of a web page. Special care is needed for elements such as images on web pages or in electronic books and documents. Indeed, to be accessible, images need text alternatives. These should describe the information or functions represented by the images and should be readable by screen readers. For instance, as reported by Bolesnikov et al. (2022), in the case of gaming, current screen readers cannot read symbols on cards directly. If an application (e.g., a game) is not made accessible to the screen reader, then it is typically impossible for a blind person to use it without help from a fully sighted person. This can easily induce a sense of frustration, defeating the main goal of a game, which is typically to amuse players. Nevertheless, according to Beeston et al. (2018), accessibility often comes at the cost of oversimplified game mechanics, in the sense that games developed with disability in mind are often childlike and too easy to be challenging for an adult. Moreover, alternative text is often useless for VI people (e.g., when it simply reproduces the name of the image file, which is not representative of the image content). In addition, alternative descriptions are typically added manually by a person when a web page or, more generally, a user interface is developed. The automatic recognition of graphical components would therefore be helpful to provide additional information to VI people and support accessibility. This holds especially for games characterized by a significant presence of such graphical elements.
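To make the alt-text failure modes above concrete, the following Python sketch flags images whose text alternative is missing or merely repeats the image file name. It assumes the common beautifulsoup4 package, and the heuristic is illustrative only, not a standard accessibility check.

```python
# Flag <img> elements whose alternative text is missing or uninformative
# (e.g., it just repeats the image file name, as described above).
# Assumes the beautifulsoup4 package; the heuristic is illustrative only.
from pathlib import PurePosixPath
from bs4 import BeautifulSoup

def poor_alt_images(html: str) -> list[str]:
    soup = BeautifulSoup(html, "html.parser")
    flagged = []
    for img in soup.find_all("img"):
        src = img.get("src", "")
        alt = (img.get("alt") or "").strip()
        stem = PurePosixPath(src).stem.lower()  # file name without extension
        if not alt or alt.lower().replace(" ", "_") == stem:
            flagged.append(src)
    return flagged

print(poor_alt_images('<img src="card_07.png" alt="card 07"><img src="ace.png" alt="Ace of spades">'))
# -> ['card_07.png']  (the alt text just echoes the file name)
```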

Interacting with a game on a touch-screen also requires its user interface to be designed to be accessible via a screen reader. Nevertheless, the screen reader on a touch-screen device relies on gestures that could interfere with other common gestures used by the operating system or by applications, including games. Leporini and Palmucci (2018) investigated how the common drag-and-drop gesture could be made accessible on a touch-screen. For this purpose, the authors designed a matching-quiz game in which the drag-and-drop gesture could be used to match words and answers. In addition, audio feedback helps players orient themselves within the game. Ahmetovic et al. (2022) designed a game for touch-screen devices that lets blind children practice reading and writing. The “WordMelodies” game was specifically designed for blind children because few mobile applications accessible via screen readers target users of that age. The game uses the audio channel to engage the children and support them while they learn through gestures on the touch-screen.

Johnson and Kane (2020) developed “Game Changer”, a system that combines audio descriptions and tactile landmarks to increase the accessibility of board games to VI people (particularly blind people). The main goal of the system is to monitor the movement of game pieces across the board, relying on the images captured by a camera positioned above it. The player uses a keypad to query the state of the board, receiving information via TTS feedback (e.g., through headphones). Moreover, the system allows players to add tactile overlays to the pieces and to use game tokens for easier recognition. In any case, the main features of the system are available regardless of the presence or absence of tactile overlays and game tokens. The audio feedback is provided only on request, to avoid interfering with players' conversations and because a player's hand often occludes parts of the board during her/his turn. Remarkably, the “Game Changer” system adopts a game description file for each game, which allows its extension to new games by simply changing that file. Although each file is generated manually, the generation process can be quite fast [about 1 h per game, as reported by Johnson and Kane (2020)]. Moreover, the system is designed for playing board games in the traditional way (i.e., on a table, not online), so an extension of its interface would be needed to adapt it to board games played online, based, e.g., on the application of suitable machine-learning techniques (Kalita 2022) for easier identification of the pieces. It is worth mentioning that the “Game Changer” system already uses some machine-learning techniques when comparing the image of the board acquired by the camera with the one reported in the game description file. However, additional machine-learning techniques, such as those that rely on dimensionality reduction (Fantoni et al. 2023; Gnecco and Bacigalupo 2021) and/or on suitable a-priori information about the specific learning task (Bargagli Stoffi et al. 2022; Gnecco et al. 2022), could be applied. These techniques could be particularly useful for games without tokens or when less information is available about the specific board. Moreover, the current version of the system described by Johnson and Kane (2020) does not contain any sophisticated sonification of the board, which could be included as a further improvement. Finally, Johnson and Kane (2020) also report some feedback from blind users who tried their “Game Changer” system. An interesting outcome of these interviews was that, even though Braille cards are available for some games, such games still lack accessibility, since a blind user may have no idea of what is on the whole board, i.e., a high cognitive load would be required to construct a mental map of the board.
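The following Python sketch illustrates the game-description-file idea behind “Game Changer”: per-game data lives in a single file, and a keypad query about a board square is answered with TTS-ready text. The file schema and the helper names below are invented for illustration; Johnson and Kane (2020) do not publish their format here.

```python
# Hypothetical game-description file and board query, sketching the idea that
# per-game data lives in one file and a keypad query about a square is answered
# with TTS-ready text. The schema is invented for illustration only.
import json

GAME_FILE = json.loads("""
{
  "name": "Checkers",
  "pieces": {"r": "red piece", "b": "black piece", ".": "empty square"}
}
""")

def describe_square(board: list[str], row: int, col: int) -> str:
    symbol = board[row][col]
    label = GAME_FILE["pieces"].get(symbol, "unknown object")
    return f"Row {row + 1}, column {col + 1}: {label}."

board = ["r.r", "...", "b.b"]
print(describe_square(board, 0, 0))   # "Row 1, column 1: red piece." -> sent to TTS
```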

Caporusso et al. (2010) proposed an advanced electronic device for increasing the accessibility of online board games to sight-impaired people. Since it was mainly intended for deaf-blind people, the main goal of their system was to provide haptic feedback related both to the position of the pointer on the digitalized board and to the identity of the object located at that position. Among plausible extensions of their system, a sonification of the pieces on the board could help people with only visual impairment construct a mental map of the board itself. As in the case of the “Game Changer” system, other performance improvements could be obtained by applying machine-learning techniques for object recognition. For instance, by providing corrupted versions of an image to the learning machine (simulating, e.g., a visual impairment such as macular degeneration or tunnel vision) and applying machine-learning interpretability techniques specifically designed for image recognition [such as saliency maps, see Simonyan et al. (2014)], one could suggest to people with visual impairment a specific portion of an image to focus their attention on. Finally, at the time of its publication, the device proposed by Caporusso et al. (2010) had been tested only on blindfolded fully sighted people and had received no feedback from people with visual impairment. As a proof of concept, the device was applied to chess, but not during a real game. Notably, one advantage of computer-based chess games over physical ones is that the pieces cannot be inadvertently knocked down (Balata et al. 2015). However, depending on the complexity of the digital solution implemented, the cognitive load needed to remember the positions of the pieces could be very high in the absence of a physical board.
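As an example of the interpretability technique just mentioned, the following PyTorch sketch computes a gradient-based saliency map in the spirit of Simonyan et al. (2014). Here 'model' is a placeholder for any trained image classifier, not a specific published network.

```python
# Gradient-based saliency map in the spirit of Simonyan et al. (2014): the
# magnitude of the class-score gradient w.r.t. each input pixel indicates which
# image regions most influence the prediction. 'model' is a placeholder for any
# trained classifier.
import torch

def saliency_map(model: torch.nn.Module, image: torch.Tensor, target_class: int) -> torch.Tensor:
    """image: (1, C, H, W) tensor; returns an (H, W) saliency map."""
    model.eval()
    image = image.clone().requires_grad_(True)
    score = model(image)[0, target_class]   # scalar score of the class of interest
    score.backward()                        # d(score)/d(pixels)
    # take the maximum gradient magnitude over color channels, as in Simonyan et al.
    return image.grad.abs().amax(dim=1)[0]
```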

As reported by Bolesnikov et al. (2022), the digitalization of board games is a possible way to increase their accessibility, but it may ultimately remove the benefits coming from physical interaction with the other players. Hence, it can be important to preserve a physical dimension in a digitalized game, which can be achieved, e.g., through haptic or sound feedback, or even through a so-called Tangible User Interface (TUI), in which a person interacts with digital information through a truly physical interaction, e.g., by literally grasping data with her/his hands (Sekhavat et al. 2022). For instance, Lozano et al. (2018) designed an interactive tangible game for blind children with the aim of teaching them Braille and geometry concepts while playing. The game relies on voice interaction to communicate with the child, and on a Near-Field Communication (NFC)-tagged object detection system to monitor the child's learning. In this way, the child learns by listening to the explanations of a narrating voice, while being able to touch the objects presented to her/him from time to time. The game presented by Buzzi et al. (2015), on the other hand, is aimed at teaching geometry through a touch-screen application. Gesture-based interaction on a touch-screen, however, is perhaps less suited to understanding geometric figures. To conclude, it is worth mentioning the recent work by Miyakawa et al. (2021), in which the accessibility of a card game to VI people was achieved using physical audible cards, able to communicate via Bluetooth and produce specific sounds from their own sound sets.

4 Designing accessible games for VI people: specific techniques

This section focuses on specific accessibility-enhancing techniques based on replacing visual stimuli with auditory stimuli, namely, sonification and sound-source simulation. Then, it details some sonification-mapping strategies.

4.1 Sonification, audio-only games, and sound-source simulation

Sonification refers to the adoption of (typically non-verbal) audio to convey information about data. A description of the sonification techniques employed in several audio games was reported by Sekhavat et al. (2022). According to them, stimuli provided by games can be either primary or secondary. Primary stimuli are so called because they are necessary to play a specific game. In contrast, secondary stimuli provide supplementary information, not necessary to play that game, but still able to positively affect the gaming experience. It follows that the lack of a primary stimulus makes a game unplayable. For blind people, this occurs when the primary stimulus is visual. In this case, a game can become accessible to blind people only if a different primary stimulus is provided, e.g., when audio or haptic sensory processing is used instead. This leads, respectively, to sonification/haptification of the primary stimulus. Regarding the former, Sekhavat et al. (2022) highlighted the importance of sound feedback in enhancing the orientation and navigation skills of people with visual impairment when playing video games. Additionally, verbal notifications can help players identify their location and inform them about tasks they have to accomplish. Moreover, non-verbal cues can provide such players with information about objects' location, direction, and distance, thus helping them construct spatial cognitive maps. A typical way to encode distance from a specific object is to vary features of the associated sound source, such as loudness and tempo. However, too frequent variations can be annoying to the player, making it preferable to use a quantization approach, in which a finite number of possible volumes or beats per minute is used (see the sketch after this paragraph). As another example, Sekhavat et al. (2022) reported that, in “AudioQuake” (an accessible version, available at https://github.com/matatk/agrip, of the first-person shooter “Quake”), a sound compass is used, in which different tones refer to the possible directions a player is pointing at, whereas a sound radar is used to identify the objects around that player. Sekhavat et al. (2022) also reported that, in audio games, sound sources can include, e.g., loudspeakers, smartphones, wearable devices, and even smart glasses and audio bracelets. They also highlighted the importance of providing tutorial levels in audio games. Indeed, playing tutorial levels allows blind people to learn the interaction and sonification techniques used in a specific game more easily than written instructions would.
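A minimal Python sketch of the quantization approach described above: the distance to an object is binned into a few discrete levels, each associated with a fixed volume and tempo, instead of varying the sound continuously. The number of levels and the per-level values are illustrative assumptions.

```python
# Quantized distance-to-sound mapping: instead of varying loudness/tempo
# continuously (which can be annoying), the distance to an object is binned
# into a few discrete levels. The level count and values are illustrative.
def quantize(distance: float, max_distance: float, levels: int = 4) -> int:
    """Map a distance in [0, max_distance] to a discrete level 0..levels-1 (0 = closest)."""
    ratio = min(max(distance / max_distance, 0.0), 1.0)
    return min(int(ratio * levels), levels - 1)

VOLUMES = [1.0, 0.7, 0.45, 0.25]      # gain per level: louder means closer
TEMPOS  = [240, 160, 110, 70]         # beeps per minute per level: faster means closer

level = quantize(distance=3.0, max_distance=10.0)
print(VOLUMES[level], TEMPOS[level])  # 0.7 160 -> the cue for a "fairly close" object
```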

One of the first popular audio-only games is the arcade game “Touch Me”, developed by Atari in 1974. In that game, the player presses buttons on an electronic device to produce sounds (Prarazu et al. 2020). Similarly, “AuditoryPong” (Heuten et al. 2007) is a modified version of the arcade sports video game “Pong” that can be played either with or without a visual interface. In the latter case, the ball can be located through a continuous sound (which also allows the player to infer its distance and speed), whereas a different sound represents the bounce of the ball. More recently, Berge et al. (2020) proposed an audio-only version of the classical game “Pinball”. The adopted sonification strategies included shifting pitches, varying volumes, and spatialization techniques, such as moving audio sources through a three-dimensional space. Similar approaches could be employed to sonify pieces on a variety of board games. It is worth remarking that the MIT Digital Humanities Lab recently developed (in 2021) a sonification toolkit (https://digitalhumanities.mit.edu/project/sonification-toolkit-for-musicians/) which allows, among other things, the sonification of trajectories, providing a clear perception of their evolution in time and space. Sonification is considered an important communication means in audio games, since it eases spatial recognition through, e.g., 3-D sounds, obtained by modifying sound features like frequency, amplitude, and duration (Prarazu et al. 2020). The location of a sound source can be simulated using so-called Head-Related Transfer Function (HRTF) libraries, whose sound parameters are modified based on the distance between the listener and that sound source. Sound-source simulation is based on the ability of the human brain to combine the different information captured by the two ears: specifically, the interaural time difference (i.e., the difference between the instants at which the sound reaches the right and the left ear, respectively) and the interaural intensity difference (i.e., the difference between the intensities perceived by the two ears). Based on this idea, binaural recording was developed as a sound-recording technique that relies on two microphones spaced like the two ears. This method induces a 3-D stereo sound sensation in the listener. Besides the adoption of binaural 3-D sounds and head tracking, Andrade et al. (2019) also reported echolocation (for instance, the use of self-emitted noises like mouth clicks, as well as ambient sounds originating from a person's cane or shoes) as an effective tool for human navigation, useful also for users' interaction with virtual environments.
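The interaural intensity cue can be imitated crudely in code: the following numpy sketch pans a mono tone between the two ears with azimuth-dependent gains. Constant-power panning is a standard simplification chosen here for illustration; it is far cruder than a true HRTF.

```python
# Minimal simulation of the interaural intensity difference: a mono tone is
# split into left/right channels with azimuth-dependent gains (constant-power
# panning, a standard simplification that is far cruder than a true HRTF).
import numpy as np

def pan_tone(freq_hz: float, azimuth_deg: float, seconds: float = 1.0, sr: int = 44100) -> np.ndarray:
    """azimuth_deg in [-90, 90]: -90 = hard left, 0 = center, +90 = hard right."""
    t = np.linspace(0.0, seconds, int(sr * seconds), endpoint=False)
    mono = 0.3 * np.sin(2 * np.pi * freq_hz * t)
    theta = (azimuth_deg + 90.0) / 180.0 * (np.pi / 2)   # map azimuth to [0, pi/2]
    left, right = np.cos(theta) * mono, np.sin(theta) * mono
    return np.stack([left, right], axis=1)               # (samples, 2) stereo buffer

stereo = pan_tone(440.0, azimuth_deg=45.0)                # a source to the listener's right
```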

Friberg and Gärdenfors (2004) classified sounds for audio game design into avatar sounds, object sounds, character sounds, ornamental sounds, and instructions. Moreover, according to Csapó and Wersényi (2013), non-verbal sounds (sound-based cues) can be divided into two distinct categories: auditory icons (sounds that represent real-world events) and earcons (abstract, message-like sounds). Chavéz-Sánchez et al. (2020) presented a case study on the evaluation of two audio games (“AudioMagos” and “Preguntados”), focusing on the player's game experience and its implications for sound design. Several VI players were involved in the study, and recommendations to guide the audio game design process were provided. In particular, it was found that, in the versions of the two games considered therein, only 56% of the sounds were identified correctly, and that, among those, voices were the most identifiable, followed by auditory icons and earcons. In this respect, evidence exists that blind individuals outperform fully sighted people when processing voices (see, e.g., Föcker et al. 2012, 2015). Concerning possible improvements, participants in the game evaluation by Chavéz-Sánchez et al. (2020) suggested the use of rising frequencies to communicate favorable effects to the player, and falling frequencies or a progressive decrease in volume for unfavorable effects. Moreover, Chavéz-Sánchez et al. (2020) observed that, in one of the two games examined (“Preguntados”), the sounds indicating earning or losing a point were variations on the same pattern, making it very difficult to establish whether the player was progressing or failing. Hence, the importance of using earcons from different families was highlighted therein. As discussed by Andrade et al. (2019), however, a serious issue with earcons is that no consistent set of them is adopted across different games, so players potentially need to learn a different set of earcons for each game. Nevertheless, they also reported that the more a sound resembles sounds already heard in similar situations, the easier it is to learn and remember (although a large number of sounds to be learned for a specific game can place a high cognitive load on a player's memory). Another issue with sounds discussed by Andrade et al. (2019) is that they must be repeated to be salient; however, an excessive amount of audio content may complicate gameplay instead of easing it. Finally, referring to the audio games on mobile phones available at the time, Chavéz-Sánchez et al. (2020) concluded that accessibility remained an issue, because problems such as lack of maintenance and poor integration with screen readers and their navigation paradigm were still unsolved.
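Following the recommendation above, the sketch below generates a rising frequency sweep for favorable events and a falling, fading sweep for unfavorable ones. Durations, frequencies, and amplitudes are illustrative choices, not values from Chavéz-Sánchez et al. (2020).

```python
# Earcon sketch following the recommendation above: a rising frequency sweep for
# favorable events, a falling sweep (with decreasing volume) for unfavorable ones.
import numpy as np

def sweep(f_start: float, f_end: float, seconds: float = 0.4, sr: int = 44100,
          fade_out: bool = False) -> np.ndarray:
    t = np.linspace(0.0, seconds, int(sr * seconds), endpoint=False)
    # linear chirp: instantaneous frequency moves from f_start to f_end
    phase = 2 * np.pi * (f_start * t + (f_end - f_start) * t**2 / (2 * seconds))
    amp = np.linspace(1.0, 0.2, t.size) if fade_out else np.ones(t.size)
    return 0.3 * amp * np.sin(phase)

gain_point = sweep(440.0, 880.0)                   # rising pitch: "you scored"
lose_point = sweep(880.0, 440.0, fade_out=True)    # falling pitch, fading: "you lost a point"
```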

An audio-only game was designed by Sekhavat et al. (2022) based on various sonification techniques. In the game, called “GrandEscape”, a prisoner is trapped in a big dark room with no lights. To escape the room, the prisoner needs to reconstruct the exit door's key by collecting its parts from some friend characters. However, the player also has to avoid contact with enemy characters, who can grab the parts of the key collected so far. Both friend and enemy characters move randomly in the dark room, and the prisoner can locate and identify them only through the different sounds they emit. Interaction in the game is based either on tapping or on tilting the mobile phone. The audio was designed to help players easily distinguish between distinct sounds. Moreover, for a more natural association with actions, sounds are generated immediately after the corresponding actions are completed. Interestingly, Sekhavat et al. (2022) observed that a different approach, often used in sound design, consists of deliberately creating ambiguity between different sounds to make the game more challenging. They also observed that verbal notifications are better than non-verbal notifications for informing players about their current location, whereas non-verbal notifications convey direction and distance from target objects better than verbal ones.

Sekhavat et al. (2022) evaluated their “GrandEscape” game by having it played by both fully sighted and VI people and measuring various features. In particular, the performance metrics used in their study were:

  • the time to complete the game (i.e., the time needed to find all the parts of the key and exit the door);

  • the distance traversed before escaping the room;

  • the number of direction changes;

  • the error count (i.e., the number of times the player loses parts of the key before escaping the room).

Other, more qualitative features, evaluated through suitable questionnaires, were:

  • the sense of presence, evaluated in terms of questions related to involvement, interactivity, and spatial presence;

  • the game experience, which was evaluated in terms of questions related to utility, joy, appeal, and aesthetics.

The independent variables were the sonification technique (based either on loudness modulation or on tempo modulation) and the interaction technique (based either on the tapping mode of the mobile phone or on its tilting mode). For fully sighted participants, a 2 × 2 between-subjects design was adopted to evaluate the various combinations of factors. In this way, by exposing each subject to only one combination, it was possible to neglect any potential learning effect arising from repeating the task under slightly different conditions. The same smartphone was used by all participants, to remove device type from the set of independent variables. Moreover, participants were randomly assigned to one of the four combinations of factors. Interestingly, it turned out that the sense of presence was maximized by the tempo-tapping condition. In particular, the tapping interaction mode was associated with a larger sense of presence than the tilting interaction mode. Sekhavat et al. (2022) explained this as a possible consequence of the fact that a fixed amount of movement was associated with each tap on the screen, whereas the amount of movement associated with tilting depended on how long the smartphone was tilted in a specific direction; hence, a larger cognitive effort was needed for the player to learn that movement.

For participants with visual impairment, a different design was chosen, due to the small number of participants. In this case, the tapping interaction (which was preferred by fully sighted participants in the first experiment) was the only interaction used. Moreover, a within-subjects design was adopted, in which each participant played under both loudness modulation and tempo modulation. The results obtained were in line with those of the first experiment. In conclusion, the main finding of the evaluation of the “GrandEscape” game was that the combination of tempo sonification and tapping provided the best results in terms of presence, game experience, and player performance.

4.2 Sonification-mapping strategies

In the following, different ways to represent the position of an object by means of sound signals are reported. These are relevant in the context of game accessibility for VI people, because they can be applied, e.g., to transmit information related to the positions of the different pieces on a board.

As reported by Gao et al. (2022), spatial sounds (i.e., those generated by HRTFs, which simulate, e.g., the full 3-D space of a game) are typically perceived with good azimuth accuracy (i.e., for rotation about a vertical axis) but not with good elevation accuracy (i.e., for rotation about a horizontal axis). This can potentially have a negative impact on the effectiveness of an auditory guidance system. For this reason, Gao et al. (2022) suggested integrating spatial sounds rendered by generic HRTFs with sonification, the latter providing elevation information. As discussed therein, both techniques have advantages and disadvantages. On the one hand, spatial audio provides a natural representation of a sound source in both space and time, whose localization relies on the natural abilities of the human hearing system; however, such localization is affected by both the quality of the HRTF and the spectral content of the sound. On the other hand, sonification has already demonstrated its usefulness in many applications, but it typically requires the user to undergo a training process, which may be demanding from a cognitive point of view. The approach proposed by Gao et al. (2022) aimed at combining the advantages of both techniques. More specifically, it investigated four sonification-mapping strategies to represent elevation information, summarized in the following:

  • Absolute elevation mapping: the pitch of an audible object is proportional to its elevation;

  • Unsigned relative elevation mapping: the pitch is highest when the elevation matches that of a target; the reduction in pitch is proportional to the absolute difference between the current elevation and that of the target;

  • Signed relative elevation mapping: it is similar to the above, but it uses two different pitch intervals to keep track of the sign of the difference between the current elevation and that of the target. In this way, the user can understand both the sign and the absolute value of the difference between her/his current elevation and the elevation of the target;

  • Binary relative elevation mapping: it is a simplified version of signed relative elevation mapping, in which only the information related to the sign of the difference of the elevations is kept.

A baseline strategy (no elevation mapping) was also considered in the comparison. Moreover, to represent azimuth information, Gao et al. (2022) considered another sonification-mapping strategy, called unsigned relative azimuth mapping. In this case, instead of controlling the pitch, the tempo is varied, since the auditory parameters of the two axes need to be orthogonal in the perception space for spatial information to be communicated well through sonification.
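The following Python sketch expresses the mappings above as pitch- and tempo-control functions. The pitch ranges, the MIDI-like scale, and the linear interpolation are illustrative assumptions, not the exact parameters used by Gao et al. (2022).

```python
# Sketch of the elevation/azimuth sonification mappings described above.
# Pitches are MIDI-like numbers; ranges and scaling are illustrative only.
def absolute_mapping(elev: float, lo: float = -45.0, hi: float = 45.0) -> float:
    """Pitch proportional to the object's elevation (degrees)."""
    return 60 + 24 * (elev - lo) / (hi - lo)            # 60..84 over the elevation range

def unsigned_relative(elev: float, target: float) -> float:
    """Highest pitch when elevation matches the target; drops with |difference|."""
    return 84 - 24 * min(abs(elev - target) / 90.0, 1.0)

def signed_relative(elev: float, target: float) -> float:
    """Two disjoint pitch intervals encode whether the source is above or below the target."""
    diff = elev - target
    mag = 12 * min(abs(diff) / 90.0, 1.0)
    return (72 + mag) if diff > 0 else (60 - mag)       # above target: 72..84; below: 48..60

def binary_relative(elev: float, target: float) -> float:
    """Only the sign of the difference is kept: one of two fixed pitches."""
    return 84.0 if elev > target else 48.0

def unsigned_relative_azimuth_tempo(az: float, target: float) -> float:
    """Azimuth is mapped to tempo, not pitch, to keep the two axes perceptually orthogonal."""
    return 240 - 180 * min(abs(az - target) / 180.0, 1.0)   # beats/min: faster when aligned
```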

The study by Gao et al. (2022) was performed with fully sighted people, but in that particular case, visual information was not helpful for the specific source-localization task. Indeed, several potential sound sources were visible simultaneously, but only one of them was active at a time, while the others acted simply as visual distractors. As discussed by Gao et al. (2022), their experiment could be repeated using sonification-mapping strategies based on sound parameters other than pitch and tempo (e.g., loudness and timbre). According to their results, the best sonification-mapping strategy (in terms of accuracy, completion time, and user experience) turned out to be binary relative elevation mapping, which was also the simplest for participants to learn. Indeed, in this case, the azimuth accuracy (i.e., the percentage of trials in which azimuth was evaluated correctly) was much larger than that achieved with the baseline strategy (nearly 100% for binary relative elevation mapping versus nearly 50% for the baseline strategy). A similar result was obtained for the elevation accuracy.

5 Accessibility to VI people of online versions of board games through machine learning: a case study

In this section, as an example of the application of machine learning to accessibility, we summarize the approach recently proposed by Gnecco et al. (2023) to define an interface aimed at improving the accessibility of online versions of board games to VI people through machine learning. We argue that machine-learning approaches (for tasks such as image classification and next move suggestion) could represent a preliminary step for the successive application of sensory substitution techniques, such as the sonification of image properties or their haptification (see Fig. 1). This might increase the flexibility and generalizability of solutions by providing, e.g., abstract representations of images, and thus improve their accessibility independently of the type of sensory channel used to convey the information. Moreover, as argued by Gnecco et al. (2024), one of the motivations for the growing interest in online versions of board games is the huge increase in people's online interaction via the Internet since the spread of the COVID-19 pandemic (Pandya and Lodha 2021). Indeed, the pandemic severely limited social interaction in the real world, making it quite difficult for VI people to play board games by sitting together at the same table and interacting with the board, e.g., in a tactile way. However, online interaction is most often designed with fully sighted people in mind. This may substantially limit its accessibility to VI people, and particularly to blind people, when it is mainly based on visual information that is not always replaced by textual or audio feedback.

As an illustrative case study, Gnecco et al. (2024) focused on the development of an accessible interface for the online version of a specific game, namely “Quantik”. This is a recent two-player pure-strategy abstract board game, published by Gigamic in 2019. Its relevance as a case study derives from its inclusion in the list of Mensa Recommended Games in 2021. It is worth mentioning that the online version of “Quantik” is available on the Board Game Arena gaming platform, at the following hyperlink: https://en.boardgamearena.com/gamepanel?game=quantik.

The game “Quantik” works as follows. At the beginning of the game, each player has at her/his disposal a set of eight pieces, colored according to the identity of that player (a light color for one player, a dark color for the other). Each piece has one of four distinct shapes (a ball, a cone, a cube, or a cylinder), and each player starts with two identical pieces of each shape. Players take turns inserting one available piece in an empty space of the board, following only one rule: no player can place a shape in a row, column, or quadrant in which her/his opponent has already inserted a piece of the same shape. The first player who places the fourth distinct shape in a row, column, or quadrant wins the game, which cannot end in a draw.
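The single rule and the winning condition above translate directly into code. In the following Python sketch, the 4 × 4 board encoding (each cell holding either None or a (player, shape) pair, with players numbered 0 and 1) is an assumption made for illustration.

```python
# Direct transcription of the single rule of "Quantik" stated above: a placement
# is illegal only if the *opponent* already has a piece of the same shape in the
# target row, column, or 2x2 quadrant. Board encoding (None or (player, shape)
# per cell, players 0/1) is an illustrative assumption.
def quadrant_cells(r: int, c: int):
    r0, c0 = 2 * (r // 2), 2 * (c // 2)
    return [(r0 + i, c0 + j) for i in range(2) for j in range(2)]

def zone_cells(r: int, c: int):
    return [(r, j) for j in range(4)] + [(i, c) for i in range(4)] + quadrant_cells(r, c)

def is_legal(board, player: int, shape: str, r: int, c: int) -> bool:
    if board[r][c] is not None:
        return False
    return not any(board[i][j] == (1 - player, shape) for (i, j) in zone_cells(r, c))

def is_winning(board, r: int, c: int) -> bool:
    """True if the last move completed four distinct shapes in a row, column, or quadrant."""
    for zone in ([(r, j) for j in range(4)], [(i, c) for i in range(4)], quadrant_cells(r, c)):
        shapes = {board[i][j][1] for (i, j) in zone if board[i][j] is not None}
        if len(shapes) == 4:
            return True
    return False
```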

In their study, Gnecco et al. (2024) discussed the following accessibility-relevant features of the game “Quantik”, which simplify the development of an interface for its online version (and motivated the choice of that game for the study itself):

First, the presence of only one simple rule allows players to easily understand how the game works, making it unnecessary to resort to a complex rulebook to learn the game mechanics. As already mentioned in Sect. 1 of this review, a complicated rulebook (e.g., one based on several images) could make even playing the game very hard for people with visual impairment;

Second, although “Quantik” belongs to a category of games which appear to be hard to make accessible by means of hand-crafted adaptation solutions (due to the relevance of the board configuration for game playing, see Sect. 3 of this review), the small size of its board and its low number of distinct pieces make its online version particularly suitable for either a textualization or a sonification of its rows, columns, and quadrants. Indeed, both the single rule of the game and its winning condition rely on information about pieces that are placed in the same row, column, or quadrant.

As discussed by Gnecco et al. (2024), automatic recognition of the pieces on the board (which can be achieved, e.g., through suitable machine-learning techniques) is required as a preliminary step for the successive application of either textualization or sonification. Indeed, only after a piece has been correctly identified can it be associated with a desired predetermined text or sound. In this context, the application of machine-learning techniques can be useful because it requires neither a-priori knowledge (by the gaming device) of the internal state of the game nor possibly advanced computer programming skills to access that state. Among machine-learning architectures, Convolutional Neural Networks (CNNs) look particularly suitable for the online version of a board game, due to their ability to automatically extract features from training images (see Alzubaidi et al. (2021) for a review of CNNs, Chen et al. (2021) for a second review focused on their application to image classification problems, and Cazenave et al. (2020) for another application of CNNs in the context of board games, namely learning optimal players' strategies). Moreover, according to Gnecco et al. (2024), besides the automatic recognition of players' pieces in the online version of “Quantik” (and in the online versions of other similar games, such as “Quarto”), machine learning can also be used in this context to:

  • suggest a user-specific sonification of the pieces, taking into account personal preferences, e.g., about timbre, pitch, and volume. For instance, one could exploit users' similarities to personalize (possibly according to some optimality criterion) the specific choice of the sonification;

  • suggest or advise against specific moves, based, e.g., on reinforcement-learning techniques (Platt 2022), which are often applied to learn optimal strategies for board games (Soemers et al. 2021). In the case of “Quantik”, their application is motivated by the fact that this is a two-person sequential zero-sum finite game with perfect information, which can be solved exactly, in principle, by dynamic programming (Bertsekas 2022), and approximately by reinforcement learning.

The rest of this section summarizes the main aspects of the machine-learning-based accessible interface designed by Gnecco et al. (2024) for the online version of “Quantik”. The interface is intended for use in two consecutive phases:

  • In the first phase, a fully sighted individual collects a subset of images directly from the web page of the game and labels them based on their shape. Then, a suitable machine-learning model (namely, a CNN) is trained/validated on an augmented dataset generated by random horizontal/vertical translations and by adding white Gaussian noise of varying variance to each element of the initial subset of images;

  • In the second phase, the same individual or a different individual (e.g., one having visual impairment) navigates the web page of the game. Then, the trained/validated machine-learning model is tested on the images generated in real time by that user. In this way, machine learning is combined with movement analysis.

In more detail, in the specific implementation presented by Gnecco et al. (2024), the union of the training/validation sets consisted of 500 images per class (corresponding to 50 noisy images for each of the 10 images initially collected and labeled per class). Validation of the trained machine-learning model was performed using the holdout method, with training and validation sets of equal size. Specifically, a CNN was trained/validated to classify images of objects as belonging to one of four classes, each corresponding to one of the four shapes of pieces employed in the game “Quantik”: a ball, a cone, a cube, and a cylinder. Since the learning task was multi-class classification, cross-entropy was selected as the loss function defining the objective of the associated training optimization problem, with the four classes represented by one-hot encoding. Stochastic gradient descent with a momentum term was chosen as the training algorithm. The accuracy on the validation set (defined in this case as the percentage of correct classifications achieved by the trained learning machine on that set) turned out to be 94.5%. Since distinguishing between the two colors of the pieces used by the two players was an easier task, color classification of any test image was performed by first finding the color at its center, then attributing that color to the nearer of the two colors associated with the players' pieces. All the training/validation/test images initially had the same size (315 × 317 pixels) and were reduced to a smaller size (26 × 26 pixels) before being used as inputs to the CNN. This reduction was done to decrease the complexity of the CNN needed to achieve a desired accuracy (allowing, for instance, the use of CNN filters represented by small matrices). Moreover, the image-generation process included the possibility of obtaining such training/validation/test images by zooming in/out around the current position of the cursor/finger on the screen. Each such image was centered on the position of the cursor/finger on the screen at the time of its generation.
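The following Keras sketch mirrors the training setup just described: 26 × 26 inputs, four shape classes, one-hot targets with a cross-entropy loss, SGD with momentum, and augmentation by random translation plus additive white Gaussian noise. Layer sizes, learning rate, shift range, and noise levels are illustrative assumptions; Gnecco et al. (2024) do not fix them in this summary.

```python
# Sketch of the training setup described above. Hyperparameters (filter counts,
# learning rate, shift range, noise variance) are illustrative assumptions.
import numpy as np
import tensorflow as tf

def augment(img: np.ndarray, rng: np.random.Generator,
            max_shift: int = 3, sigma: float = 0.05) -> np.ndarray:
    """Random horizontal/vertical translation plus additive white Gaussian noise."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)  # wrap-around shift (a simplification)
    return np.clip(shifted + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(26, 26, 3)),                # reduced-size input images
    tf.keras.layers.Conv2D(16, 3, activation="relu"),        # small filters suffice at 26 x 26
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="softmax"),          # ball, cone, cube, cylinder
])
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss="categorical_crossentropy",                         # cross-entropy over one-hot labels
    metrics=["accuracy"],
)
# model.fit(x_train, y_train_onehot, validation_data=(x_val, y_val_onehot), epochs=30)
```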

In the interface developed by Gnecco et al. (2024), navigation on the screen can be performed in several ways: using the mouse, moving a finger on a touch-screen, or using Leap Motion (https://www.youtube.com/watch?v=rnlCGw-0R8g), an optical hand-tracking device that captures hand movements, making interaction with digital content quite natural and effortless (this is another example of a TUI; see Sect. 3 of this review). The latter modality of interaction appears more natural than the other two for a blind person, who may not be accustomed to them. The core of this part of the interface relies on event-based programming (implemented in MATLAB by Gnecco et al. (2024)), which detects cursor/finger movement on the screen and activates the machine-learning module based on the current cursor/finger position and the current value of the zoom parameter. For a more precise description of the technical aspects related to the use of event-based programming in the interface, the reader is referred to Gnecco et al. (2024).
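The event-based pattern can be illustrated with a Python analogue (the original interface is implemented in MATLAB): every cursor movement over a canvas triggers a callback that would crop a window around the cursor and hand it to the trained classifier. The 'classify_crop' function below is a hypothetical stub standing in for the CNN.

```python
# Python analogue of the event-based pattern described above: every cursor
# movement over the board area triggers a callback that crops a window around
# the cursor and hands it to the classifier. 'classify_crop' is a stub standing
# in for the trained CNN.
import tkinter as tk

CROP = 26  # side of the square window fed to the classifier, as in the interface above

def classify_crop(x: int, y: int) -> str:
    return f"(stub) class of the {CROP}x{CROP} crop centered at ({x}, {y})"

def on_motion(event: tk.Event) -> None:
    label.config(text=classify_crop(event.x, event.y))  # here one would also trigger sonification

root = tk.Tk()
canvas = tk.Canvas(root, width=320, height=320, bg="white")
canvas.pack()
label = tk.Label(root, text="move the cursor over the board")
label.pack()
canvas.bind("<Motion>", on_motion)   # event-based activation of the ML module
root.mainloop()
```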

It is worth remarking that the interface developed by Gnecco et al. (2024) also provides a sonification of the identified pieces. Indeed, a continuous sound, different for each of the four classes, is generated. The sound changes whenever the classification of the test image changes (e.g., after a cursor/finger movement that modifies the image provided as input to the CNN). Pieces having the same shape but belonging to distinct players are sonified with the same type of sound but a shifted pitch. Specifically, such sounds are played at two successive octaves, with the higher octave used for the opponent's pieces. This is done to increase each player's attention toward those pieces, since their identification is relevant for the application of the single rule of the game.
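The octave-shift scheme is simple to state in code, since one octave up corresponds to doubling the frequency. The base frequencies per shape below are illustrative; the interface's actual sounds are not specified here.

```python
# Octave-shift scheme described above: each shape has a base frequency, and the
# opponent's pieces reuse the same sound one octave higher (frequency doubled).
# Base frequencies are illustrative choices, not the interface's actual sounds.
BASE_FREQ = {"ball": 262.0, "cone": 330.0, "cube": 392.0, "cylinder": 494.0}

def piece_frequency(shape: str, is_opponent: bool) -> float:
    f = BASE_FREQ[shape]
    return 2.0 * f if is_opponent else f  # one octave up = double the frequency

print(piece_frequency("cube", is_opponent=True))  # 784.0 Hz
```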

Following Gnecco et al. (2024), it is also worth comparing the interface proposed therein with the electronic device previously designed by Caporusso et al. (2010) to provide accessibility to an online version of chess, which was already introduced in Sect. 3 of this review. A first difference, of course, is that the two studies focused on different board games. Moreover, as already mentioned, the device proposed by Caporusso et al. (2010) was designed with deaf-blind people in mind; hence, that study did not focus on sonification. However, for persons affected only by visual impairment, sonification of visual information is deemed to represent a more natural modality of interaction. This holds especially true for the game “Quantik”, since its board is much smaller than that of chess (4 × 4 compared with 8 × 8), which makes it easier to construct a mental map of the content of the game board. Further, at the time of publication of Caporusso et al. (2010), touch-screens were much less widespread than they are nowadays. Hence, an alternative software (rather than hardware) accessibility solution now appears preferable, for reasons such as lower cost and easier reconfigurability. Finally, differently from Caporusso et al. (2010), the interface proposed by Gnecco et al. (2024) relies on an extensive application of supervised machine learning.

One important limitation of the research performed by Gnecco et al. (2024) is the lack of auditory feedback about the position of the cursor/finger on the screen, which is expected to be especially useful to blind users. A separate sonification (e.g., pitch/volume modulation driven by the horizontal/vertical coordinates of the cursor/finger) could be exploited to address this issue. Among the other possible improvements of the research presented by Gnecco et al. (2024), it is worth mentioning the additional use of unsupervised learning techniques (namely, image segmentation; see Lei and Nandi (2022)) to activate the multi-class classifier during the test phase not for every possible test image, but only when its segmentation satisfies a suitable constraint (for instance, when the segmented object of interest does not meet the boundary of the image, or overlaps that boundary only slightly); a simple sketch of such a gating criterion is given below. Additionally, more sophisticated CNN models (including, for instance, additional layers, batch normalization, and dropout) could be used to achieve better generalization capability. Finally, the machine-learning analysis could be performed at the video level (e.g., using a sequence of images as input instead of a single image).
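The following is a minimal sketch of the segmentation-based gating just mentioned: given a binary segmentation mask of the object of interest, the classifier would be activated only when the object’s overlap with the image boundary is small enough. The threshold value and the upstream segmentation step producing the mask are assumptions.

```python
# Sketch of the proposed segmentation-based gating (the threshold value and
# the segmentation step producing the mask are assumptions).
import numpy as np

def touches_boundary_too_much(mask, max_overlap_fraction=0.02):
    """Given a 2D binary segmentation mask of the object of interest, return
    True when too large a fraction of the object lies on the image boundary."""
    boundary = np.zeros_like(mask, dtype=bool)
    boundary[0, :] = boundary[-1, :] = True
    boundary[:, 0] = boundary[:, -1] = True
    object_pixels = mask.astype(bool)
    if not object_pixels.any():
        return True  # nothing segmented: skip classification
    overlap = np.logical_and(object_pixels, boundary).sum()
    return overlap / object_pixels.sum() > max_overlap_fraction

# The multi-class classifier would then be activated for a test image only
# when touches_boundary_too_much(mask) is False.
```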

6 Discussion and conclusions

This review has aimed at evaluating the most recent literature on the gaming experience and accessibility design for visually impaired people. The article has reviewed several methods used to make games more accessible to visually impaired people, presenting the main advantages and drawbacks of current accessibility solutions. The main findings include the effectiveness of sensory substitution approaches, such as sonification and haptification, and the potential for an enhanced use of machine-learning-based techniques to describe and translate the visual information contained in images into other sensory inputs. Table 1 summarizes the accessibility-related literature discussed in this review by classifying its references according to the following three criteria: the typology of game prevalently considered in the work (audio game, board/card game, video game), the presence in the work of interviews with visually impaired people (yes, no), and the presence in the work of experimental analyses (yes, no).

Table 1 A classification of some recent literature on game accessibility for visually impaired people

Moreover, Table 2 summarizes the main advantages and disadvantages of the techniques aimed at improving game accessibility for visually impaired people that have been reviewed in this article. The table also compares such techniques in terms of their effectiveness and generalizability to different game genres.

Table 2 Advantages and disadvantages of techniques aimed at improving game accessibility for visually impaired people

We hope that this work will enhance the design of accessible games for visually impaired people and the development of tools to make already existing games more accessible to them. In the following, we discuss in turn the main possible economic/managerial, educational, and health-related implications of such a development:

One possible economic/managerial implication is the expansion of the gaming market for visually impaired people, which, in turn, could lead to the design of new games and the creation of new platforms. Such a development could also create new job opportunities in the gaming industry. Finally, governments and gaming industry bodies could consider incentivizing the development of more accessible games for visually impaired people through special grants, tax benefits, and/or specific funding programs, and introducing policies to encourage the hiring of professionals specializing in accessible game design.

As discussed in this review, making gaming more accessible to visually impaired people is important to increase their opportunities to socialize, learn, and play, ultimately improving their quality of life. Indeed, tools for increased accessibility could provide opportunities for more engaging and interactive educational experiences, making learning more accessible and enjoyable by means of techniques such as text-to-speech, audio cues, and haptic feedback. They could also foster spatial awareness and problem-solving skills, enhancing visually impaired people’s independence and adaptability. Moreover, in the case of multi-player games, better accessibility tools would strengthen visually impaired people’s connections and friendships, reducing possible feelings of isolation. From a policy-maker perspective, educational authorities could consider integrating the tools discussed above into specialized curricula to enhance the learning experience of visually impaired people.

Ultimately, the development of more accessible games could positively impact both the cognitive and physical health of visually impaired people. In terms of cognitive and mental well-being, strategic games and games that promote relaxation can play an important role. Moreover, games with haptic feedback and movement-based controls, as well as virtual reality-based games, can be integrated into physical health and rehabilitation programs to help improve coordination, balance, and motor skills, thus aiding recovery from injuries and enhancing overall physical fitness.

Finally, the review has also identified future research directions, especially in the field of machine learning, which could make the developments reported above easier to achieve from a technical perspective.