1 Introduction

1.1 Background

Globally, more than 2.2 billion people are living with visual impairment; in nearly half of these cases the impairment could have been prevented or has yet to be addressed [1]. An estimated 2.28 million people in the UK have moderate to severe vision impairment; of these, around 171 thousand individuals were registered as totally blind [2]. Visual impairment has a significant negative impact on people’s quality of life (QoL) and imposes a substantial economic and financial burden, with an estimated annual global cost of $411 billion [1]. In children, visual impairment can delay physical, verbal, emotional, social, and cognitive development, with long-term consequences. Adults with vision impairment have lower rates of labour-force participation and productivity, as well as a higher prevalence of anxiety and depressive disorders. Elderly people with visual impairment are also more likely to fall and break bones, to feel lonely, and to enter nursing homes earlier [1,2,3,4].

The main causes of visual impairment [5] are uncorrected refractive errors, cataracts, age-related macular degeneration, glaucoma, and diabetic retinopathy [1, 4]. Poor diabetes awareness [6], long-standing uncontrolled diabetes [3] and higher rates of retinopathy [7] have all been associated with visual impairment in diabetics. People with diabetes mellitus frequently experience preventable vision damage due to diabetic retinopathy [8]. Based on the type of damage they cause, visual impairments can be divided into [9, 10]: (i) visual acuity deficits and (ii) visual field deficits. Visual acuity deficits can result in a variety of defects such as myopia (near-sightedness), hypermetropia (far-sightedness), and astigmatism. Visual field deficits, on the other hand, can be classified as central or peripheral vision impairments, depending on where in the visual field the loss occurs. Visual field defects are more difficult to rehabilitate than visual acuity defects, which can be corrected with a variety of procedures and conventional options such as eyeglasses. This is because the bulk of these deficits originate from brain injuries or eye conditions that cause persistent damage to parts of the visual system [11].

Today’s technological advances can help visually impaired people (VIP) in different ways, such as attending school, finding jobs, and successfully performing daily activities of living. Low-vision rehabilitation aids can be categorised into two groups [10]: (i) visual field deficit aids and (ii) visual impairment (partial or total impairment) aids. According to several studies, about a quarter of those with low vision suffer from peripheral vision loss, with the remainder suffering from central vision loss [12]. As a result, people who have central vision loss receive the most rehabilitation and assistive technologies. Scientists and clinicians have proposed different aids to help with visual field loss problems. These aids, which aim to compensate for the visual field loss and increase the individual’s awareness of their surroundings, are widely used and their benefits are well documented [13]. Despite the clinical distinction between low vision (partial impairment) and blindness (total impairment), these terms have been used interchangeably in the literature. In the context of assistive technology, both terms refer to visual impairment that affects the patient’s ability to perform their daily tasks [14].

The term assistive technology (AT), which describes both hardware and software that enables people with disabilities to utilise technology in a way that improves their quality of life, encompasses a variety of devices, systems, services, and applications [15]. Based on this definition, AT can be broadly categorised into [16]: (i) traditional (e.g., eyeglasses, prisms, white canes, occupational therapy, etc.) and (ii) mobile IT-based (e.g., navigation & wayfinding devices, screen & text readers, object & facial recognition, etc.). As VIP have difficulties in utilising devices that are visually demanding, researchers began looking into other possibilities for AT development. Speech recognition [17], text-to-speech [18], haptic feedback [19], multimodal feedback [20,21,22], and gesture identification [23] are examples of non-visual sensory modalities that have been used to make AT more accessible for VIP.

Based on the examined research, it was difficult to find a unified classification approach in the literature that covers all previously reported studies on vision impairment rehabilitation aids. In some studies [10], these aids were categorised, according to the main functions they perform, into: (i) navigation & wayfinding, (ii) obstacle detection and (iii) scene perception. Other studies classified them according to the type of data-gathering devices or sensing input [24,25,26,27], or according to the purposes for which they were intended [24]. For example, in [25] they were classified into systems with 3D sound, map-based systems, visual imagery systems and non-visual data systems.

1.2 Libraries and search strategy

In this article, the relevant libraries and publishers are searched using combinations of relevant keywords including assistive technology/tools, human–computer interaction (HCI), human interface modalities, smartphone aids/apps, substitutive interventions, visually impaired and their cognate variations. The main digital libraries searched are the ACM Digital Library, IEEE Xplore, ISI Web of Science, ITU Publications, BioMed Central, BMJ Best Practice, British Standards Online, ProQuest, PubMed, SpringerLink, ScienceDirect, and Scopus.

A combination of the following keywords is used to search the literature:

  • “assistive aids” or “assistive technologies” or “assistive devices” or “substitutive assistive interventions”;

  • “blind” and/or “visually impaired”;

  • “visually impaired” and/or “human-computer interaction” or “human interface modalities”;

  • “visually impaired” and/or “smartphone assistive technology” or “mobile assistive technology”.

Initially, over 250 papers are collected; after evaluating the relevance of the retrieved studies to the subjects of interest (by reading their abstracts) and excluding the unsuitable ones, a pool of 180 review and experimental research papers is considered relevant, and these are used and cited in this study. Next, the following inclusion and exclusion criteria are applied:

  1. (i) The review publication date must be within the last five years (2018–2023), and the experimental research publication date must be within the period from 2010 to 2023.

  2. (ii) The research study is disqualified if it does not satisfy one or more of the following requirements:

    • Pertinent to the topic covered in this article.

    • A full-length research publication.

    • Contains laboratory experiments with user-system interaction and system validation.

    • Written in English.

As a result, 52 research projects are included for further analysis in this review, and 18 review papers are explored to identify the topics they covered and their contributions, as shown in Table 1.

Table 1 List of recent review studies, 2018–2023

1.3 Objective and article structure

The VIP rehabilitation field is highly diverse and can be viewed from a variety of angles. Its focus spans a variety of topics, including medical and adaptability interventions, social and psychological factors, and the technological aspects of creating technology-based assistive devices. In this study, unlike prior studies, the assistive aids are broadly categorized into three main interventions: compensatory, restitution, and substitutive. These interventions are outlined briefly as follows:

  1. (i) Compensatory interventions – a group of tools that help VIP compensate for or adapt to their impairments [42], thus making their daily tasks easier to perform. Ong et al. [43, 44] reported that considerable improvements in eye-search as well as reading-writing tasks were achieved as a result of using online tools such as eye-search [45] and read-right [46]. In these interventions, some form of audio-visual stimulation was utilised [47] to increase saccadic movements [48, 49] and improve eye motions into the defective field [50,51,52]. Other studies considered specific interventions that occupational therapists offer to patients in their daily lives. For example, Scheiman [53] described some guidelines for occupational therapists to improve independent movement and training in instrumental daily living skills.

  2. (ii) Restitution interventions – a therapeutic concept suggesting that damaged neurons in the visual cortex can regenerate through light stimulation. For decades, it was thought to have a limited effect on visual rehabilitation [42, 46]. Recent research, however, has revealed that the visual field can be enlarged following brain or optic nerve damage with the right use of certain therapies [47]. This kind of intervention involves a series of treatments in which the defective visual field is repeatedly trained or activated [54]. Vision restoration therapy (VRT) is one of the most frequently described treatments in the literature. Numerous studies [55, 56] found that VRT improved QoL assessments. Other studies [54, 57], however, reported that VRT is ineffectual when compared to placebo or no treatment when visual field outcomes are taken into account [42].

  3. (iii) Substitutive interventions – a strategy that uses technology assistance and sensory substitution devices for the rehabilitation of VIP [58]. The scope of these assistive technologies covers a wide range of applications, including mobility, wayfinding, object and human activity recognition, information access, interaction, education, wearable & handheld devices, and others [26,27,28,29,30,31,32,33, 35,36,37,38,39,40,41].

The main objective of this article is to review recent advancements in substitutive tools and technologies, in addition to the human-machine interface modalities, focusing on research with developed experimental prototypes. Even though real-world perception is generally multimodal, the use of various technologies that produce independent visual, auditory, and tactile feedback encourages the discussion of these modalities individually. Depending on the primary cueing feedback signal provided to the user, the substitutive tools and technologies are categorized as visual, auditory, or haptics-based aids. For multimodal systems that mix multiple feedback signals, the dominant feedback signal is used for classification. The context of use as well as the participation of VIP in the evaluation are also considered while discussing these aids. In light of the findings, several recommendations are also made to help the scientific community address the persisting challenges and restrictions faced by both totally blind and partially sighted people.

The rest of this article is structured as follows. The characteristics of the various human-machine interface modalities are detailed in Sect. 2. Sections 3–5 cover the visual, auditory, and haptics-based aids, respectively; in these aids, the primary cueing feedback is produced by computing devices other than smartphones. In contrast, Sect. 6 describes additional substitutive aids in which a smartphone’s sensors provide the user with the cueing feedback signal(s). In Sect. 7, the challenges and limitations of the tools and technologies addressed in this article are discussed, along with some recommendations for the likely future course of substitutive AT.

2 Human-machine interface modalities

Recent technical developments in numerous domains and an increase in processing power have led to the emergence of new human-computer interaction (HCI) modalities. If these modalities are successfully combined into one interface, it might be possible to alleviate the HCI bottleneck that has evolved with the development of computing and communication [59]. Through an interface modality, a user and a computing device can exchange sensory data [32]. The interface modality can be unimodal (i.e., employing a single sensory channel) or multimodal, relying on multiple channels. The term “multimodal” stands for the simultaneous utilization of different modalities to perform a functionality [60]. Gathering data from different input modalities (m_i) and integrating them into a particular format for further processing is termed a fusion process [61]. In contrast, the process of fission occurs when the resulting command is subsequently carried out via multiple output modalities (m_o) or devices [62]. Combining these input (sensing) and output (action) modalities in a single system is called multimodality. A schematic of a multimodal system is shown in Fig. 1.
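
To make the fusion/fission terminology above more concrete, the following minimal Python sketch shows one way input modalities could be fused into a single command and then rendered on several output channels. All class names, the temporal window, and the highest-confidence-wins merging rule are illustrative assumptions, not a description of any specific system cited here.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModalityInput:
    """A single reading from one input modality (m_i), e.g. speech or gesture."""
    name: str        # e.g. "speech", "gesture"
    payload: dict    # recogniser output, e.g. {"command": "navigate"}
    timestamp: float
    confidence: float

def fuse(inputs: list[ModalityInput], window_s: float = 1.5) -> dict:
    """Fusion: merge inputs falling within one temporal window into one command.
    Here the highest-confidence interpretation wins per field; real systems use
    far richer temporal and contextual integration."""
    if not inputs:
        return {}
    t0 = min(i.timestamp for i in inputs)
    in_window = [i for i in inputs if i.timestamp - t0 <= window_s]
    command: dict = {}
    for i in sorted(in_window, key=lambda x: x.confidence):
        command.update(i.payload)   # later (higher-confidence) values override
    return command

def fission(command: dict, outputs: dict[str, Callable[[str], None]]) -> None:
    """Fission: render the same command on several output modalities (m_o)."""
    message = f"Heading to {command.get('target', 'unknown')}"
    for render in outputs.values():  # e.g. speech synthesis, vibration, display
        render(message)

# Example wiring with stand-in output channels
outputs = {
    "speech": lambda m: print(f"[TTS] {m}"),
    "haptic": lambda m: print(f"[vibrate] pattern for: {m}"),
}
cmd = fuse([
    ModalityInput("speech", {"command": "navigate"}, 0.0, 0.9),
    ModalityInput("gesture", {"target": "door"}, 0.4, 0.7),
])
fission(cmd, outputs)
```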

Fig. 1
figure 1

Schematic of a multimodal sensing/action system

According to numerous studies [32, 63], multimodal systems can offer more flexibility and reliability than unimodal systems. Furthermore, it has become increasingly clear that combining different sensing modalities into a multimodal interface can solve the problems related to the interpretation and processing of each type of sensing modality. In order to provide complementary solutions to a task that may be redundant in function but communicate information to the user more effectively, HCI designers and developers have tried to leverage a variety of modalities [64]. Modalities can be roughly divided into two categories based on how information is perceived: human-computer (input) and computer-human (output) [60]. The system responds to the user using a range of output modalities while the user interacts with the system utilising the available input modalities [65]. A multimodal system has the potential to improve accessibility for users by utilising a variety of interface modalities. Additionally, the advantages of combining multimodal inputs and outputs have led to the adoption of multimodal fusion in a number of applications to support user needs [66]. To enable their interpretation, multimodal systems must be able to recognize a variety of input modalities and combine them in line with temporal and contextual boundaries [67,68,69]. Figure 2 shows an example of a multimodal HCI system, in which the two-level flow of modalities (action and perception) provides an overview of how the user interacts with a multimodal system and the numerous activities that are carried out throughout the HCI process [32]. These modalities work in a complementary manner to create interfaces that are more adaptable and trustworthy and to improve the perception of “reality”.

Fig. 2
figure 2

Schematic of a multimodal HCI system; adapted from [32]

In the context of AT, multimodal systems have frequently been proposed and used to communicate with VIP [65, 70]. Depending on the contextual usage, audio provides rich interaction experiences and aids in the creation of more reliable systems when combined with other modalities, while vibrations and other tactile sensations are used in haptic communication. According to Stanton & Spence [71], the brain continuously prioritizes, filters, and integrates a wide variety of incoming input cues. It then combines these inputs with knowledge and experience from the past to produce a perceptual inference, which is a singular perception of the human body and its surroundings. For example, when presenting feedback on movement, the discrepancy between the visual, proprioceptive, tactile, and audible information may be related to valence and the failure to match expectations (i.e., motor prediction error) [72].

The most widely used modality in contemporary mainstream technology is visual, followed by audio and haptics [32], the latter of which has recently been receiving increasing attention [73]. These main modalities are covered in more detail in the sections that follow. Other modalities that rely on senses such as taste, temperature, and smell are less commonly used in interactive systems [74] and are therefore not covered in this article.

3 Visual-based aids

This section provides a review of research projects that seek to improve or correct the visual perception of the user. Eyeglasses were used in early attempts to extend the field of vision; the idea was to increase the field of view (FoV) by shifting a person’s peripheral area of view inward, thereby improving the overall functional field [12]. Peli et al. [75, 76] proposed glasses with high-power prism segments, offering the user a quick peek at the information lacking in the peripheral area. In the latter study, different multiplexing prism (MxP) glasses for acquired monocular visual field extension were tested, with each user’s performance being measured perimetrically (for a total of four individuals). Although MxP glasses provide a wider field extension than other devices, reduced contrast and monocular visual disorientation were the trade-offs. Additionally, despite the fact that MxP glasses expanded the visual field to a range of about 20°, users adapted poorly [77]. More recently, Jung et al. [78] refined their earlier work in [76] by suggesting a new field expansion aid with MxP glasses to improve pedestrian detection for acquired monocular vision. A clip-on MxP holder that can be adjusted in three dimensions for a specific user’s facial features was developed. To investigate the effect of MxP field expansion on the identification of an approaching person arriving from different initial bearing angles and courses, virtual reality (VR) walking scenarios were developed. The pedestrian detection rates and response times were evaluated with volunteers who had one eye covered and with three visually impaired users. It was reported that the proposed aid provided a field expansion of roughly 25°. Also, the participants with MxP performed better than those without MxP in the pedestrian identification test in their blind field, while their performance in the healthy field was not substantially different.

Other visual-based research studies incorporated smart glasses that have a range of functions to improve how the user receives information and engages with the environment. Digital eyewear models utilized partially transparent digital screens that convey visual data without obstructing the user’s FoV. A few of the several techniques utilized to realise augmented reality (AR) smart glasses include the half-mirror [79, 80], retina scanning [81, 82], geometric waveguide [83, 84], and diffractive waveguide [85, 86]. However, these devices always have a sizable volume and weight, and it is thus difficult to improve user experiences with AR display systems based on half-mirrors and freeform optical prisms. Miniaturization, compactness, and mobility are the current research topics for AR head-mounted display (HMD) systems to meet the expectations of daily use of wearable consumer electronics devices. Diffractive waveguide-based AR-HMD devices have the advantages of being lightweight and compact [86, 87]. Because there is only a piece of glass in front of the user’s eyes, these devices can easily provide positive wearing experiences. A recent study by Wu et al. [88] described a compact grating waveguide AR display system using a curved variable-period grating as the in-coupler. According to the authors, this technology can significantly lower the thickness of optical systems, by 39% when compared to traditional grating waveguide systems with the same collimating-system focal length. Additionally, the system’s diagonal FoV may reach 36.6°, and the average in-coupling efficiency can approach 70%. There is, however, no evidence that the proposed technology was made into a product or evaluated with VIP.

In recent years, a different option called waveguide holographic optics [89,90,91], which digitally overlays text and images in the FoV, was proposed to enhance the user’s experience. Near-eye displays (NEDs) are another technology that provides VR, AR, and mixed reality (MR) [92]. AR-NEDs can superimpose virtual images onto real scenes to provide a combination of virtual and real scenes. Thus, it is particularly important to develop lightweight NEDs with high optical transmittance and high image fidelity. Typical AR-NED solutions include freeform-based prism systems [93, 94], hybrid reflective-refractive systems [80, 95], and optical waveguides [83, 96,97,98,99,100]. In order to make AR glasses portable and wearable, Ni et al. [101] proposed the most recent option, which is based on optical waveguide technology. In that study, a 2D eye-box expansion (2D-EBE) holographic waveguide prototype with an integrated micro-projection optical system was created and its functionality was tested experimentally. The presented results look promising, with a wide diagonal FoV of 45°. However, there was no mention that the developed prototype was evaluated with visually impaired users.

For those with tunnel vision, Elango and Murugesan [102] presented an AR system that utilizes a cellular neural network (CNN) and an HMD to enhance VIP’s awareness of their surroundings. The user’s understanding of the environment was enhanced by using a model comprising a camera and a microcomputer for image processing to superimpose useful information from the environment on the user’s observation. The developed prototype did, however, have some shortcomings: (i) the CNN architecture needs to be optimised by lowering the resource requirements and making it practical for parallel implementation; and (ii) the presented visuals essentially project what the camera sees into the user’s central view, resulting in the generation of superfluous data, which could be distracting and diminish the performance of the user’s healthy eyesight. Also, there was no mention that the system was tested with VIP. Recently, Younis et al. [103] proposed a context-aware outdoor navigation aid for people with peripheral vision impairment. The context-awareness concept, which denotes the system’s ability to learn about its surroundings and adjust its behaviour accordingly, was used to develop a hazard detection and tracking system. The system utilizes smart glasses with a tiny camera attached to capture and process videos in real-time and provide suitable output warnings depending on pre-established rules and extracted object attributes. The glasses can deliver the output warnings without obstructing the user’s normal vision because the display is built into the transparent lenses. Real-time processing begins with identifying the type of head motion, followed by the detection, tracking, and classification of the risks surrounding the subject. The system then generates a warning notification that is coloured (red, orange, or green) according to the risk level (high, medium, or low, respectively). The risk levels, which rely on the speed of the object, are determined based on predetermined danger thresholds. Finally, the generated notification is positioned in front of the central visual field. The initial experiments suggested relatively slow performance due to the low processing capability of the smart glasses utilized in this study, and there was no conclusive evidence that the prototype was tested with VIP.
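
The speed-based risk classification described for [103] can be illustrated with the short Python sketch below. The threshold values, object attributes and function names are hypothetical placeholders; only the red/orange/green mapping to high/medium/low risk follows the description above.

```python
from dataclasses import dataclass

# Hypothetical speed thresholds (m/s); the study only states that risk levels
# depend on the tracked object's speed relative to predefined danger thresholds.
HIGH_RISK_SPEED = 1.5
MEDIUM_RISK_SPEED = 0.5

@dataclass
class TrackedObject:
    label: str         # e.g. "cyclist", "pedestrian"
    speed_mps: float   # estimated approach speed towards the user
    bearing_deg: float

def risk_colour(obj: TrackedObject) -> str:
    """Map an approaching object's speed to a warning colour:
    red = high risk, orange = medium, green = low."""
    if obj.speed_mps >= HIGH_RISK_SPEED:
        return "red"
    if obj.speed_mps >= MEDIUM_RISK_SPEED:
        return "orange"
    return "green"

def place_warning(obj: TrackedObject) -> dict:
    """Build the notification rendered in the user's central visual field."""
    return {"colour": risk_colour(obj), "label": obj.label, "bearing": obj.bearing_deg}

print(place_warning(TrackedObject("cyclist", 2.1, -15.0)))  # -> red (high-risk) warning
```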

4 Auditory-based aids

Auditory cues, such as the sound made when a user moves their body or interacts with their environment, can provide important information to affect how people perceive the items they engage with [71]. Research studies that use auditory cues as the primary feedback to the user in various circumstances, such as navigation, obstacle detection and avoidance, and scene perception, are reviewed in this section, as follows.

To assist totally blind people in their navigation, Yánez [104] proposed an IoT-based solution built on Blind Guide, an artifact that helps blind people navigate both indoor and outdoor scenarios. The developed system is modular, making it adaptable to the needs of the user and compatible with other solutions such as the white cane. The Blind Guide wireless sensor worn on the forehead can identify impediments at head level, complementing the white cane’s ability to detect obstacles below waist level. This feature was deemed especially crucial because some sightless people may feel uncomfortable without their white cane. When an obstacle is discovered, a wireless signal is delivered to a central processing unit (a Raspberry Pi board), and the user is provided with a voice feedback message containing the obstacle’s name and its location in relation to them. It was reported that the developed prototype was tested with a group of sightless volunteers of different ages; the obtained results suggested successful detection of incoming obstacles, and the system was received positively by the participants. The operation of this system, however, is restricted to locations with data network access because the obstacle recognition requires internet connectivity.

To help completely blind persons navigate outdoors, Kammoun et al. [105] created a prototype called NAVIG to enhance traditional mobility aids (e.g., the white cane) by offering guiding and navigational information via binaural 3D audio scenes, taking advantage of the human hearing ability, particularly spatial audition. With the intention of giving the user the knowledge essential to create cognitive maps of the environment, it provides spatial information regarding the trajectory, position, and significant landmarks. There was no evidence that this prototype was evaluated with VIP. Sohl-Dickstein et al. [106] presented a tool to assist VIP in using ultrasonic echolocation for indoor navigation and object perception. It consists of a wearable headgear, stereo microphones with attached artificial pinnae, and an ultrasonic emitter. Ultrasonic pulse echoes were recorded, time-extended to bring their frequencies into the human-audible range, and then played back to the user. It was mentioned that this prototype was tested with blindfolded volunteers. The findings were interpreted to indicate that while some echoic cues delivered by the device are immediately and intuitively apparent to users, perceptual acuity is potentially highly trainable, and thus it could be a helpful aid for VIP.
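
The time-extension idea in [106] rests on a simple signal-processing fact: replaying a recording over a longer duration divides all of its frequencies by the stretch factor. The sketch below assumes an illustrative 40 kHz echo captured at 192 kHz and a stretch factor of 8 (not the paper’s actual parameters).

```python
import numpy as np

def time_stretch_to_audible(echo: np.ndarray, fs_capture: int, factor: int = 8):
    """Make an ultrasonic echo audible by time-extension.

    Replaying the same samples at fs_capture / factor stretches the signal in
    time and divides every frequency by `factor`: a 40 kHz echo slowed 8x
    lands at 5 kHz, well inside the audible range.
    """
    fs_playback = fs_capture // factor
    return echo, fs_playback  # play `echo` back at the lower sample rate

# Example: a synthetic 40 kHz pulse captured at 192 kHz
fs = 192_000
t = np.arange(0, 0.005, 1 / fs)
pulse = np.sin(2 * np.pi * 40_000 * t) * np.hanning(t.size)
audible, fs_out = time_stretch_to_audible(pulse, fs, factor=8)
print(f"play {audible.size} samples at {fs_out} Hz -> perceived pitch ~5 kHz")
```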

A wearable stereovision system that can help VIP avoid obstacles in outdoor settings was proposed by Lin et al. [107]. It consists of eyeglasses with two tiny cameras on one end for stereo imaging, as well as field-programmable gate array (FPGA) integrated circuits and first-in, first-out buffers on the other end to synchronise and integrate the stereo images. The video captured by the cameras was streamed to a mobile device over a 3G network. A notable function of this technology is that a healthy sighted person can use the live video feed to give logistical advice to a visually impaired user. However, the running cost of the mobile connectivity was considered a key limitation of the developed prototype. Also, there was no evidence that this prototype was tested with VIP. Lightweight smart glasses with a front camera were proposed by Lan et al. [108] to assist VIP with the recognition of public street signs in cities. The system was based on a tiny computer offered by Intel (named Edison) as a development platform for wearable devices. When Edison receives the video stream from the camera via the UVC (USB Video Class) module, OpenCV routines are called to process and analyse the images. When a public sign is matched, the system provides the user with voice hints through wireless bone-conduction headphones. The presented results suggested that the system was successfully implemented, but there was no mention that it was tested with VIP.

Mahmud et al. [109] proposed a navigation aid for totally blind people in indoor and outdoor environments. A microcomputer and an ultrasonic device were utilised to identify a variety of obstacles and provide the user with vibration and speech warning feedback. The feedback remains active as long as the user is within 70 cm of the obstruction. The navigation aid was fitted with sonar sensors for sensing obstacles in certain directions, so the user did not need to sweep the cane around to detect barriers as they would with a regular cane. This prototype was not tested with VIP. Pundlik et al. [110, 111] also built a collision-warning system to assist people with peripheral vision impairment in avoiding objects in an indoor navigation environment. It comprises a portable video camera coupled to a microcomputer to predict approaching collisions based on time to collision rather than proximity. In the case of a prospective collision, a simple audio warning message is provided to the user. According to the authors, 25 visually impaired users successfully completed four consecutive loops both with and without the device. This system was regarded as a significant contribution to the application of computer vision in wearable devices for VIP. However, this prototype had some limitations, including the exclusion of floor-level impediments, detection limited to stationary obstacles, and the absence of information on the projected collision’s direction, which can be crucial for safe navigation.
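
The general idea behind time-to-collision (TTC) warnings such as [110, 111] can be sketched from the apparent expansion of an obstacle in the image: for an object approaching at roughly constant speed, TTC ≈ s / (ds/dt), where s is the object’s image size. The threshold and pixel values below are illustrative, and this is not the authors’ exact algorithm.

```python
def time_to_collision(size_prev_px: float, size_curr_px: float, dt_s: float) -> float:
    """Estimate time to collision from the apparent expansion of an obstacle.

    Image size grows as s(t) = k / Z(t) for an approaching object, so
    TTC ~= s / (ds/dt), independent of the object's true size or distance.
    """
    expansion_rate = (size_curr_px - size_prev_px) / dt_s
    if expansion_rate <= 0:
        return float("inf")          # not approaching
    return size_curr_px / expansion_rate

TTC_WARN_S = 2.0  # hypothetical warning threshold
ttc = time_to_collision(size_prev_px=80, size_curr_px=92, dt_s=0.2)
if ttc < TTC_WARN_S:
    print(f"audio warning: collision in ~{ttc:.1f} s")
else:
    print(f"no warning (TTC ~{ttc:.1f} s)")
```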

Fiannaca et al. [112] developed a wearable technology that helps VIP navigate open environments. Google smart glasses and the OpenCV blob detection algorithm were utilised to lead VIP towards doors using the shortest path possible. The system explores the environment and provides audio feedback to guide the user towards the desired landmark. According to the authors, the usability and efficacy of two types of auditory feedback (sonification and text-to-speech) for leading a user across an open space to a doorway were examined satisfactorily with eight totally blind individuals using the built prototype. However, the system was only capable of identifying doors as landmarks and could not avoid hazards in the user’s environment. Tsirmpas et al. [113] also presented an indoor navigation aid for VIP and elderly individuals based on passive radio frequency identification (RFID) tags, which were placed in various locations across the user’s path. In their study, the authors utilized RFID tags in 40 × 40 cm cells, which is considered a short range and thus necessitates the addition of more tags in large environments. Also, there was no mention that the developed prototype was evaluated with VIP or elderly people.

Bai et al. [114] proposed an additional travel aid for completely blind people to use indoors. It came with a depth camera to gather depth information from the environment, an ultrasonic rangefinder to determine obstacle distance, an embedded microcontroller board acting as the main processing module (performing depth image processing, data fusion, AR rendering, guiding sound synthesis, etc.), a set of AR glasses to display the visual information, and an earphone through which the user listens to the guiding sound. Algorithms based on multi-sensor fusion and depth images were developed to address the challenge of avoiding translucent and small obstacles.

Three auditory cues can be provided by the guiding sound synthesis module: a stereo tone [115], recorded instructions, and variable-frequency beeps. However, the wayfinding and route-following features of this prototype prevented it from assisting the user in avoiding dynamic barriers or providing location data. The authors therefore created an improved version of this device [116] a year later to overcome these drawbacks. They addressed user identification, object recognition, navigation, and obstacle avoidance using a mapping technique and a simultaneous localization and mapping (SLAM) algorithm. Depth and fisheye cameras were utilized to create the virtual blind path and to locate the user using the SLAM algorithm. They were paired with a set of optical see-through (OST) glasses, which contained two loudspeakers and earbuds so that the user can hear directions. According to the authors, both prototypes were tested with a group of VIP who were free to move around on their own.

Li et al. [117] described a wearable obstacle stereo feedback system to assist VIP in their indoor navigation based on 3D-space obstacle detection. Depth information was utilised to detect obstacles in the user’s path and provide the user with auditory feedback notifications. The developed prototype was put to the test on a user who was wearing a blindfold and carrying a laptop in a backpack. The results showed that the approach for detecting barriers and representing their positions through auditory perception was effective. In the developed prototype, however, the detection was limited to stationary obstacles and the user’s movement was not considered. Kang et al. [118, 119] and Chae et al. [120] also investigated an obstacle detection method, called deformable grid (DG). Obstacle avoidance employing the shape change of the DG was then proposed to help VIP navigate both indoors and outdoors. When compared to other equivalent methods, which typically only use two consecutive frames to estimate the risk, this method updates the risk continually; it is thus more resistant to motion-tracking mistakes and offers an improved detection rate. A prototype comprising a camera, a WiFi module and a Bluetooth earphone was developed and mounted onto eyeglasses. The acquired videos are transmitted to a laptop, which performs the necessary computations for obstacle identification and avoidance, and then provides the user with auditory feedback on the estimated risk of collision. The produced prototype, which was evaluated with blindfolded volunteers, had some limitations since the motion tracking with the deformable grid sometimes fails when the user approaches non-textured barriers such as a door or a wall.

To help totally blind people with their indoor navigation, Everding et al. [121] proposed a lightweight wearable device. It made use of two cameras operating on an event-based vision stream, which was sent to a computing stick for depth extraction. The detected events were transformed into virtual spatial sounds using event-based algorithms and then provided as auditory feedback to the user’s ear via headphones attached to a USB sound adapter. The operating principle of these cameras differs from that of frame-based cameras: every pixel on the chip runs independently from the others and creates an event each time it detects a change in luminance that exceeds a certain threshold, mimicking the visual processing of animal eyes. It was reported that the developed prototype was evaluated with 11 VIP. However, these tests were limited to static subjects and did not account for moving objects, which limits their applicability in real-world scenarios. Tapu et al. [122] also created a navigation system to assist VIP when navigating crowded urban scenes. The proposed system utilized an object recognition approach that has the advantage of recognising both moving and stationary items [123]. This system employed two convolutional networks to detect and track objects in real-time. After detecting an object, the system classifies it using its type, location, and distance attributes, and then generates a set of acoustic warnings provided to the user through bone-conducting headphones. This prototype, however, was not tested in real-life scenarios with VIP.
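
The per-pixel event generation principle mentioned above can be emulated on ordinary frames, as in the minimal sketch below. A real event sensor does this asynchronously in hardware; the threshold value and the frame-based emulation are illustrative assumptions only.

```python
import numpy as np

def events_from_frames(prev: np.ndarray, curr: np.ndarray, threshold: float = 0.15):
    """Emulate event-camera pixels: a pixel fires only when its log-luminance
    change since the last reference exceeds a threshold."""
    eps = 1e-6
    delta = np.log(curr + eps) - np.log(prev + eps)
    on_events = np.argwhere(delta > threshold)    # brightness increased
    off_events = np.argwhere(delta < -threshold)  # brightness decreased
    return on_events, off_events

prev = np.full((4, 4), 0.5)
curr = prev.copy()
curr[1, 2] = 0.9                                  # a single pixel brightens
on, off = events_from_frames(prev, curr)
print("ON events at:", on.tolist(), "OFF events at:", off.tolist())
```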

Yang et al. [124] described a framework to help VIP with indoor and outdoor pathfinding tasks. It comprises wearable smart glasses integrated with a waist-worn pathfinder composed of an RGB-D sensor (an Intel RealSense RS410 depth camera), an MPU6050 inertial measurement sensor, and a bone-conduction headphone. The bone-conduction headphone, which transfers sound from the processing units to the user, does not prevent the user from hearing environmental sounds. However, neither the detected barriers nor the motion model of the dynamic objects was described in the framework that was offered. Also, it was not validated with real VIP. Aladrén et al. [125] proposed another system to guide VIP in the navigation of indoor settings by means of sound commands. The system uses an RGB-D camera, from which range and colour information are fused to detect obstacle-free paths. It recognizes and categorizes the primary scene structural components, giving the user clear paths to safely travel across unknown scenarios. It was claimed that the created algorithm had successfully segmented floors in real-life scenarios using a public data set, but there was no mention that this system was tested with VIP.

Mekhalfi et al. [126] also described an indoor navigation system for totally blind people, which offers a set of navigation features such as obstacle detection and avoidance, multi-object recognition and path planning. The recognition model includes a portable chest-mounted camera that the user employs to capture an indoor scene. The captured image is then sent to a microcomputer where the proposed multi-object recognition algorithms are implemented. The output of this algorithm was then translated into an audible voice. According to the author, the developed prototype was tested successfully in an indoor setup, and appropriate voice navigation instructions and warning messages were generated and fed back to the user via earphones. However, the proposed multi-object recognition method is limited by the real-time processing constraints of the computing device. Additionally, there was no mention that this system was tested with VIP. Another group of researchers [127] also built a context-aware indoor navigation system (named ISANA) based on the Google Tango AR platform, which makes it feasible for mobile devices to locate themselves with respect to their surrounding environment without the use of GPS, relying solely on their onboard hardware and software resources. It incorporated obstacle detection algorithms and semantic map editors, and was proposed to provide indoor navigation paths for VIP. A speech-audio interface employing a priority-based strategy was used to deliver real-time guidance and alert cues while reducing the cognitive strain on the user. According to the authors, field trials with blindfolded and visually impaired participants suggested that the developed prototype was successful at performing context-aware and secure indoor aided navigation. However, this prototype cannot be utilized without a pre-planned user path.
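
One way a priority-based speech-audio interface of the kind described for ISANA [127] could be organised is sketched below: urgent alerts pre-empt routine guidance so the user is not overloaded. The categories, priority values and class name are illustrative and are not taken from [127].

```python
import heapq

class GuidanceQueue:
    """Priority-based speech/audio cue queue: lower value = spoken sooner."""
    PRIORITY = {"collision_alert": 0, "turn_instruction": 1, "point_of_interest": 2}

    def __init__(self):
        self._heap = []
        self._counter = 0  # preserves insertion order among equal priorities

    def push(self, category: str, text: str) -> None:
        heapq.heappush(self._heap, (self.PRIORITY[category], self._counter, text))
        self._counter += 1

    def next_utterance(self) -> str | None:
        return heapq.heappop(self._heap)[2] if self._heap else None

q = GuidanceQueue()
q.push("point_of_interest", "Cafeteria on your right")
q.push("turn_instruction", "Turn left in three metres")
q.push("collision_alert", "Stop, obstacle ahead")
print(q.next_utterance())  # "Stop, obstacle ahead" is spoken first
```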

In the context of scene perception, Yang et al. [128, 129] proposed a navigational framework based on deep neural networks and depth sensory segmentation to assist VIP in both indoor and outdoor settings. A functional prototype was developed as a wearable device to provide efficient semantic comprehension of the surrounding world. The device comprises a portable microcomputer and a set of smart glasses that integrate a RealSense R200 camera, an RGB-D sensor, and bone-conducting earphones. Obstacles such as stairs, sidewalks, water hazards, pedestrians, and cars were all incorporated into a single device that functions within a real-time navigational support framework. According to the authors, the developed prototype was evaluated successfully with six VIP.

5 Haptics-based aids

Haptic/tactile feedback (or haptics, from the Greek for “I touch”) is the use of complex vibration patterns and waveforms to communicate information to a user. Haptics utilizes a vibrating component, sometimes referred to as an actuator or a linear resonant actuator. A microcontroller typically decides when and how to vibrate, with a specific haptic driver chip controlling the actuator. Although haptics is not yet well established, it is being used more frequently to give users a better sense of “reality”. Haptic feedback based on tactile (touch) and kinaesthetic (force) input can therefore be used as an alternative way to establish a tactile connection with the user. Understanding how various modalities interact to enhance the user experience is urgently needed given that haptic (vibrotactile) feedback is now a standard feature of consumer VR equipment [73]; because of this, it is easy to assume that “adding haptics” will inevitably enhance the user experience [73, 130]. Haptic feedback has also been proposed to assist VIP in navigation scenarios dealing with dynamic settings that may require multiple modalities. The research studies that use haptics as the primary feedback to VIP in various circumstances, such as navigation, obstacle detection and avoidance, and scene perception, are reviewed in this section, as follows.
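
As a rough illustration of how a microcontroller might encode information in a vibration pattern, the following sketch drives an actuator through a stand-in duty-cycle setter. The pwm_set_duty callable, the pattern, and the timing are placeholders, not the interface of any specific haptic driver chip mentioned above.

```python
import time

def vibrate(pwm_set_duty, pattern, period_s: float = 0.05) -> None:
    """Drive a vibration actuator with a sequence of intensity levels.

    `pwm_set_duty` stands in for whatever call the haptic driver exposes
    (e.g. a GPIO PWM duty-cycle setter); `pattern` is a list of intensities
    in [0, 1], each held for `period_s`.
    """
    for level in pattern:
        pwm_set_duty(level)     # 0.0 = off, 1.0 = full-strength vibration
        time.sleep(period_s)
    pwm_set_duty(0.0)           # always stop the motor at the end

# Example: a short "double-tap" burst, a pattern often used to confirm an action
double_tap = [1.0, 0.0, 1.0, 0.0]
vibrate(lambda duty: print(f"duty={duty:.1f}"), double_tap, period_s=0.1)
```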

Prattico et al. [131] developed a navigational device to lead totally blind individuals in indoor situations. The proposed device comprises four vibrating motors that alert the user when an impediment is encountered, together with a pair of infrared sensors, an ultrasonic sensor, and a microcontroller, all packaged in an easy-to-wear belt. The built prototype was put to the test on a walking route that contained a wall and other barriers. The results suggested that as the user approached the obstacle, the vibration intensity was raised. The created prototype, however, had some drawbacks owing to its poor response, noise filtering, and short range of obstacle detection. This prototype was not evaluated with VIP. A similar system was developed by Nada et al. [132] to aid totally blind people with indoor navigation via haptic feedback. This system was made up of a laser stick with an ultrasonic obstacle detector and a micromotor that vibrated to alert the user when an obstacle was detected. Once an obstruction is detected, the ultrasonic sensor provides a signal to the system, activating the haptic feedback via the user’s stick. According to the authors, this prototype was tested with six VIP volunteers, and the system was able to identify the majority of the obstacles that were placed in the user’s way.
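
The distance-dependent vibration behaviour reported for the belt in [131] amounts to a monotone mapping from obstacle distance to motor intensity. A minimal sketch is given below; the linear mapping and the 1.5 m detection range are illustrative assumptions, not the authors’ parameters.

```python
def vibration_intensity(distance_m: float, max_range_m: float = 1.5) -> float:
    """Map obstacle distance to vibration strength: intensity rises as the
    obstacle gets closer, and is zero beyond the detection range."""
    if distance_m >= max_range_m:
        return 0.0                                   # out of range: no vibration
    return 1.0 - distance_m / max_range_m            # 0 (far) .. 1 (touching)

for d in (2.0, 1.0, 0.5, 0.1):
    print(f"{d:.1f} m -> intensity {vibration_intensity(d):.2f}")
```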

Rizvi et al. [133] proposed a wearable glove to help totally blind people navigate indoor spaces. It was made up of a microcontroller, short- and long-range ultrasonic sensors, a buzzer, and vibrational haptic feedback for the user. The microcontroller triggered one of the ultrasonic sensors based on the user’s selection, which then released sound pulses and waited for the echoes reflected from obstacles. The received echoes were then fed to the microcontroller in the form of PWM (Pulse-Width Modulation) pulses, and the distance of the obstacle was determined by measuring the width of these pulses. The vibrating motor, together with a beep from the buzzer, signals the existence of an obstacle if it lies within the predetermined range. Additionally, it used GSM (Global System for Mobile Communication) to provide the user’s carer with location data. However, the developed prototype only covered a small area for obstacle identification and did not support the detection of moving obstacles. Also, there was no mention that the created prototype was evaluated with VIP. Bharambe et al. [134] also proposed a system (known as “substitute eyes”) to help the totally blind navigate outdoors. They employed a microcontroller, a couple of ultrasonic sensors and an Android app to detect nearby obstructions. The haptic feedback was provided to the user’s fingers via three vibrator motors; depending on the estimated distance between the user and the obstacle, different vibration frequencies and intensities are produced. The Android app is used to provide complementary feedback on navigation directions. It was reported that one person wearing a blindfold tested the produced prototype successfully. Based on the photographs provided, the prototype appears to be at an early stage, cumbersome and difficult for VIP to use.
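
The pulse-width-to-distance conversion underlying ultrasonic ranging of the kind used in [133] follows directly from the time of flight: the echo pulse width equals the round-trip travel time, so the one-way distance is (time × speed of sound) / 2. The example value below is illustrative.

```python
SPEED_OF_SOUND_M_S = 343.0  # in air at roughly 20 °C

def distance_from_echo(pulse_width_us: float) -> float:
    """Convert an ultrasonic echo pulse width (microseconds) to obstacle
    distance in metres: one-way distance = round-trip time * speed of sound / 2."""
    round_trip_s = pulse_width_us * 1e-6
    return round_trip_s * SPEED_OF_SOUND_M_S / 2.0

# Example: a 5830 us echo corresponds to roughly 1 m
print(f"{distance_from_echo(5830):.2f} m")
```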

To help VIP with indoor mobility in uncommon circumstances, an obstacle detection and warning system was proposed by Hoang et al. [135]. The suggested system was built on a mobile Kinect (a line of motion-sensing devices produced by Microsoft in 2010) and an electrode matrix. It was composed of two primary parts: first, an obstacle detection unit that uses the mobile Kinect to gather scene data, which is then processed on a laptop computer to identify predefined obstacles like stairs, doors, chairs, and other obstructions. The second unit was used to encode the obstacle information (colour image, depth image, and accelerometer information) and present it to the user as stimuli in contact with his or her body (through the user’s tongue). In this project, the vision information was converted to stimulation of the vibrotactile or electro-tactile matrix using the electrode matrix (a set of electrodes used for detecting electric current or voltage, which can also deliver stimulation patterns). The authors later expanded on this prototype [136] to detect moving items (like humans) as well as new static objects (such as trash, plant pots, and fire extinguishers) and to reduce the miss rate of the obstacle identification method. The tactile-visual substitution method, which makes use of the tongue as a human-machine interface, was reused for the obstacle warning. According to the authors, the developed prototypes were evaluated with 20 young VIP participants who managed to walk independently in a single-floor indoor environment. However, this system requires user pre-training to be able to interpret the system’s feedback correctly.

Katzschmann et al. [137] presented a wearable navigation solution for completely blind individuals in confined and open indoor environments. For local navigation, it enables users to feel physical boundaries in their immediate environment as well as low- and high-hanging impediments. The proposed system was made up of a sensor belt and a haptic strap. The sensor belt is an ensemble of time-of-flight distance sensors worn around the user’s waist; the infrared light pulses it emits enable accurate estimates of the distance between the user and any nearby objects or surfaces. The haptic strap, on the other hand, conveys the measured distances through a network of vibrating motors worn around the user’s upper abdomen, providing the user with haptic sensation. According to the authors, the developed prototype was evaluated with 12 totally blind users who were able to navigate through hallways, avoid obstacles, and recognize staircases. The viewing range of the array of vibrotactile and lidar units, however, still has to be improved so that it can be automatically adjusted according to the user’s speed and whether they are advancing or sidestepping.
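
Loosely following the belt-plus-strap design in [137], a ring of time-of-flight readings can be mapped one-to-one onto vibration motors around the abdomen. The number of sensors, the 2 m range and the linear mapping below are illustrative assumptions rather than the paper’s specification.

```python
def motors_from_belt(distances_m: list[float | None], max_range_m: float = 2.0) -> list[float]:
    """Map a ring of time-of-flight readings to per-direction motor intensities.

    `None` means the sensor saw nothing within range; a nearer surface in a
    given direction produces a stronger buzz on the corresponding motor.
    """
    intensities = []
    for d in distances_m:
        if d is None or d >= max_range_m:
            intensities.append(0.0)
        else:
            intensities.append(1.0 - d / max_range_m)
    return intensities

# Example: five sensors from left to right; a wall is close on the user's right
readings = [None, 1.8, 1.2, 0.6, 0.3]
print([round(i, 2) for i in motors_from_belt(readings)])
# -> [0.0, 0.1, 0.4, 0.7, 0.85]
```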

Mancini et al. [138] proposed a vision system to aid completely blind people in soft jogging and walking in outdoor settings. The user’s position in relation to the desired lines or lanes used as a reference path was adjusted by sending vibration feedback to specially designed gloves worn by the user. The user then responds to vibrations by accelerating or decelerating, or by turning left or right; in this scenario, where the user behaves like a differential wheeled robot, it is simple for a human to change pace or turn left or right. The created system’s various design elements are clearly presented; however, there was no mention that the developed prototype was tested with VIP. Also, the vibration bracelets were not tested to determine the wearer’s sensitivity to vibration.

To help totally blind people navigate a path and avoid hazards, a handheld force feedback device was proposed by Amemiya & Sugiyama [139]. In this study, the haptic direction indicator employs a kinaesthetic perception approach (called the “pseudo-attraction force”) to create a force sensation by taking advantage of the nonlinear relationship between perceived and actual acceleration. This kind of haptic modality helped the users experience the kinaesthetic illusion of being pulled or pushed towards the correct path, avoiding collisions by moving further in the correct direction. According to the authors, the developed prototype was put to the test by 23 VIP subjects who received the developed device positively [140]. Sharma et al. [141] developed a smart stick to assist VIP navigation in unstructured indoor settings. The stick detects both dynamic and static obstacles and provides a fair idea of the distance and location of obstacles through vibration in the hand, as well as auditory feedback to the user. The audio signal was provided to the user via a Bluetooth connection between the stick and the user’s earphone. It was reported that the developed prototype was tested successfully using different vibration frequencies and audio alert tracks. It was unclear whether the developed prototype was tested with VIP or only with blindfolded sighted participants.

Li et al. [142] also proposed an indoor navigation system (named ISANA) to assist totally blind people with independent indoor travel utilizing the Google Tango AR platform. In this study, the authors combined feature-based localization maps from Tango devices with semantic maps to provide semantic localization, navigation, and context-awareness information. A multimodal human-machine interface (haptics as well as audio) was designed for interactions through an electronic SmartCane. The produced prototype reportedly underwent evaluations with blindfolded and totally blind users in a variety of contexts, including both single- and multi-floor scenarios. Feedback from the test subjects indicated some limitations in terms of speech recognition in noisy environments. Additionally, the semantic map annotation feature needs to be made simpler for users who are completely blind to utilize, especially when adding point-of-interest markers. The audio feedback also needs user-dependent frequency customization.

6 Smartphone-based aids

The operating systems of modern smartphones (or mobile phones) include numerous features that make them accessible. The option to increase text size, speech-to-text communication, vibration alerts rather than ringtones, and the facial recognition software on the most recent smartphones are notable features [143]. VIP can use accessibility tools like screen readers, magnifying glasses, and high-contrast screens to interact with these devices [144]. The screen reader provides audio feedback of the interface elements that are in focus, the magnifier enlarges the visible elements on the screen, and the increased contrast changes the colours of the user interface elements. The emergence of new tech-based assistance for VIP has also been made possible by these recent technology developments [145,146,147]. Given that senses other than vision (touch, hearing, smell, and taste) have smaller bandwidths, one major difficulty is how to convey information to the user in a clear and understandable manner [148]. However, despite these usability and accessibility issues [149, 150] that VIP encounter while interacting with smartphones, a variety of applications (apps) have been proposed to aid them in their everyday tasks. In contrast to the studies covered earlier in Sects. 3–5, this section presents additional substitutive aids that use a smartphone’s sensors to deliver the cueing feedback signal to the user. The aids presented in this section can use visual, auditory, haptic, or a mix of these modalities as cueing feedback to the user.

Senarathne et al. [151] proposed a mobile application, named BlindAid, to assist VIP in both indoor and outdoor settings with a variety of tasks, including face recognition, mobility (employing distance measurement to distinguish objects from obstacles), and extracting data from signboards and product labels. These tasks were processed in real-time using mobile devices only. The user received audio messages through headphones or the device’s speakers after the built-in camera and deep learning algorithms completed these tasks. The authors claim that the produced app showed varying degrees of accuracy depending on the ambient lighting. This prototype was not tested with VIP. Patel et al. [152] presented a real-time system to assist completely blind individuals in spotting potholes and other obstacles in their path while walking through an unknown outdoor environment. Two ultrasonic sensors, an Arduino Nano microcontroller, a Bluetooth module, an accelerometer, and a smartphone with a camera and software application constitute the suggested system. One ultrasonic sensor was attached to the bottom of a one-foot-long stick, while the other was affixed to the stick so that it faced the user’s front. These sensors were utilized to locate impediments and gauge their distance from the user, and image processing methods were applied to the images captured by the smartphone camera to identify objects in the user’s immediate environment. The smartphone app received the data from the ultrasonic sensors (through Bluetooth), processed it using an obstacle detection algorithm, and then provided the user with vibration or voice warning feedback. In this project, the user can also capture pictures with the smartphone’s camera to gain a better understanding of their surroundings; the captured photo was then analysed using image processing algorithms to identify objects. Insufficient data was provided regarding the evaluation of this prototype with or without VIP.

Uddin et al. [153] created a smartphone-based system to assist totally blind people in their outdoor navigation. It generated vocal commands and used an ultrasonic sensor to find holes (lying impediments) and obstacles. The shortest path between source and destination is computed once the user speaks the destination location as the initial input. If an obstacle is detected by the ultrasonic sensor, its distance is calculated by a microcontroller and communicated to a smartphone (through Bluetooth), where it is converted into a voice that the user can hear. The developed prototype did, however, have some limitations, including power consumption as well as dependency on the accuracy and coverage of Microsoft Bing Maps and the Global Positioning System, both of which are affected by weather. It was reported that the developed prototype was tested with five volunteers, but it was not clear whether they were VIP, blindfolded or sighted subjects. Another smartphone-based guidance system for navigation and obstacle avoidance for VIP was proposed by Lin et al. [154]. The system was created as a smartphone app utilizing image recognition algorithms. The smartphone was connected to a remote server to execute the obstacle recognition task, and the server communicates the findings back to the smartphone app, which then delivers audible warning messages to the user. Since this application was only intended to serve as a proof of concept, it was not evaluated with VIP.

Croce et al. [155] built an indoor navigation aid (named ARIANNA) for totally blind people. It allows users to navigate various indoor areas of interest by following a pre-planned path that is painted or stuck to the floor. It can be deployed on smartphones or other handheld devices with augmented reality capabilities. This technology utilizes computer vision to detect the navigation path and provide haptic feedback signals in the form of vibration that the user can utilize to correct his or her direction. It was reported that the user can walk normally while using the smartphone to examine his/her immediate surroundings; the location of the hand in relation to the body suggests, through proprioception, the direction to follow. The early experiments pointed out some limitations in the smartphone camera and the optical flow accuracy. A couple of years later, the authors reported another version of ARIANNA [156] to address these limitations by using an extended Kalman filter and weighted moving average filtering, together with topological information available on the path. However, there was no evidence that those applications were tested with VIP.

An e-stick module was proposed by Bharatia et al. [157] to assist totally blind people with their outdoor navigation using an Android app and Google Cloud Vision. The vision API was used to process images taken with a portable camera on the stick for object recognition. For each functionality, specific keywords from the voice command were recognized, and feedback was provided to the user via a smartphone. It was reported that the primary goal of this project was to provide a simple and affordable solution by keeping the stick structurally similar to the traditional stick, i.e., thin, lightweight, and easy to handle, while optimizing its performance and efficiency. However, no clear optimization and performance metrics were presented, and there was no evidence that this prototype was tested with VIP. Another wayfinding application (named GuideBeacon) was proposed by Cheraghi et al. [158] to aid VIP’s mobility in large indoor spaces. It enables smartphone-equipped users to interact with inexpensive Bluetooth beacons placed strategically across a desired indoor location, and it provides the users with directions over the speaker of their smartphones. It was reported that both sighted and visually impaired individuals successfully tested the developed prototype. However, this prototype had insufficient testing across various situations and with some infrastructure deployment factors, such as the reduction of speech distortion and the timeliness of the user’s instructions.
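
Beacon-based wayfinding of the kind described for GuideBeacon [158] generally ties spoken instructions to whichever beacon currently appears nearest. The sketch below is not the GuideBeacon implementation; the beacon identifiers, instructions and the simple strongest-RSSI rule are illustrative assumptions.

```python
# Hypothetical beacon map: beacon ID -> instruction spoken when it is nearest.
BEACON_DIRECTIONS = {
    "beacon-entrance": "You are at the main entrance. The lift is 10 metres ahead.",
    "beacon-lift":     "You are at the lift. Turn right for the cafeteria.",
}

def nearest_beacon(scans: dict[str, int]) -> str | None:
    """Pick the beacon with the strongest received signal strength (RSSI, dBm).
    A higher (less negative) RSSI generally means a closer beacon; real systems
    smooth this over several scans before switching instructions."""
    return max(scans, key=scans.get) if scans else None

def instruction_for(scans: dict[str, int]) -> str:
    beacon = nearest_beacon(scans)
    return BEACON_DIRECTIONS.get(beacon, "No known beacon nearby.")

# Example scan: the lift beacon has the strongest signal
print(instruction_for({"beacon-entrance": -82, "beacon-lift": -61}))
```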

Another mobile app (known as HandyAPPs) was proposed by Chuckun et al. [159] to assist people with different impairments (visual, speech or hearing). Therefore, it had multiple features, including text recognition, face detection and recognition, object recognition, speech-to-text, text-to-speech, and other functionalities. TalkBack, a voice assistant, was offered for VIP to help them navigate its different features. Vibrating buttons, sounds, and a large touch area were provided to make it easier for users to interact with the app. According to the authors, the created app underwent testing with VIP as well as those who had hearing and speech impairments, employing the functionalities offered for each disability, and the participants seemed to like the app. However, this app did have two potential limitations: (i) the functionality provided to the VIP is dependent on the availability and quality of the smartphone camera; and (ii) the object recognition, and face detection and recognition functionalities require an internet connection to process images at a remote server.

Kaushalya et al. [160] offered another smartphone app (named AKSHI) to help completely blind people navigate outdoors, when they are unaware of their surroundings, without the assistance of a sighted person. It delivers early obstacle recognition, gives the user auditory tones to indicate how far away obstacles are, gives spoken directions to a specific spot, recognizes pedestrian crossings, and sends position information and emergency SMS messages to the VIP's guardian. The authors claim that the preliminary investigation showed acceptable functioning and accessibility. The small range of the RFID scanners and tags, as well as the battery life, however, are limitations of this technology. Additionally, there was no indication of VIP testing this prototype. A similar application (known as TransmiGuia) was created by Landazabal et al. [161] to assist totally blind people with public transportation services in the city of Bogotá D.C., using voice commands. The system directs the user using sound emissions that specify the available routes in accordance with the required route, the user's location in the city, the hour, and the day. The users must enter the target route after locating the closest station; this is done using a set of buttons with Braille surfaces, and the guidance continues in this manner until they arrive at their intended destination. However, the system's effectiveness in noisy surroundings and varying weather conditions was not discussed, nor was it mentioned whether the developed prototype was evaluated with VIP.
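
The distance-to-tone feedback that AKSHI is described as providing could, in principle, be implemented as a simple mapping from estimated obstacle distance to beep pitch and repetition rate; the thresholds and frequencies below are illustrative assumptions rather than the authors' scheme.

```python
# Illustrative mapping from obstacle distance to an auditory cue:
# closer obstacles give higher-pitched, more frequent beeps. Values are assumed.

def tone_for_distance(distance_m):
    """Return (frequency_hz, beep_interval_s), or None to stay silent."""
    if distance_m < 0.5:
        return 1200, 0.1   # very close: high pitch, near-continuous beeping
    if distance_m < 1.5:
        return 900, 0.3
    if distance_m < 3.0:
        return 600, 0.7
    return None            # far away: no tone


print(tone_for_distance(1.0))  # -> (900, 0.3)
```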

Sumanasekera et al. [162] proposed a voice-based smartphone application (known as Kawulu) to address the social isolation of VIP in Sri Lanka. The proposed app avoids the problem of sifting through irrelevant content in existing social media networks, which was time-consuming and of little interest to the user, by allowing the user to select and share information in line with their preferences. Although there were certain limitations in terms of the application's usability and the evaluation process, it was mentioned that this prototype was tested with 11 participants who had varying degrees of visual impairment.

Kardyś et al. [163] proposed an Android application that allows VIP to use voice commands to access the phonebook, make calls, and send and receive text messages, as well as other features such as the current time, location, and battery monitoring, without significant visual engagement. A set of pre-defined voice commands was used to trigger these actions. If the user forgets any commands, they can say “help” or “help me”; the user will then hear a list of the possible commands, such as “close”, “switch”, and “turn off”, along with brief usage instructions. This application reportedly worked; however, it was not tested with VIP. Totally blind people have also been assisted in recognising common banknotes by smartphone applications that use machine vision [164,165,166]. The recognized value of the notes was converted from text to speech, which was then provided to the user via the smartphone's speaker. The recognition accuracy of these applications is generally affected by the lighting conditions and the mobile phone's processing power.
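
A command set of this kind reduces to a small dispatch table with a “help” fallback; the sketch below uses hypothetical command names and action stubs to illustrate the pattern, and is not the code from [163].

```python
# Sketch of a pre-defined voice-command dispatcher with a "help" fallback.
# The action stubs are placeholders; a real app would call telephony/SMS APIs.

def make_call():
    return "Calling the selected contact."


def read_time():
    return "It is ten past three."


def battery_level():
    return "Battery is at seventy percent."


COMMANDS = {
    "call": make_call,
    "what time is it": read_time,
    "battery": battery_level,
}


def handle_command(spoken_text):
    text = spoken_text.strip().lower()
    if text in ("help", "help me"):
        return "Available commands: " + ", ".join(COMMANDS)
    action = COMMANDS.get(text)
    return action() if action else "Command not recognised. Say 'help' for options."


print(handle_command("help"))
print(handle_command("battery"))
```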

In the field of mHealth (mobile health), a term for the application of wireless technology and mobile phones in healthcare [167], drug information, medicine identification, and insulin dosage calculation were among the small number of applications addressing VIP's healthcare needs. Madrigal-Cadavid et al. [168] created a mobile drug information application for VIP, so that they could access information on how to use their medications. A user-centred process was adopted to design and develop a functional prototype of this application, highlighting the importance of involving users in the process. However, the developed prototype is restricted to drugs with barcodes, which many manufacturers do not provide. Additionally, there was no information regarding testing of the application with VIP. A similar mobile application was also proposed by Almuzaini et al. [26, 169] to aid VIP in identifying and managing their medications, using an object recognition technique based on feature matching. The medication images were detected and described using a fast, rotation-aware detector and descriptor, and the detected features were matched against the features of the medication box in the scene using a brute-force matcher. It was mentioned that the proposed application was susceptible to lighting variation, and that the algorithm needs more work to cut down on the frequency of false matches. Additionally, there was no evidence that the prototype had been tested with VIP.
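
Reading the fast, rotation-aware detector and descriptor as an ORB-style feature pipeline (an assumption on our part), the matching step described for [26, 169] can be sketched with OpenCV's brute-force matcher and a ratio test to suppress false matches; file names and thresholds are illustrative.

```python
# Sketch of medication-box recognition by feature matching (assumed ORB-style
# features + brute-force matching). File names and thresholds are illustrative.
import cv2


def count_good_matches(reference_path, scene_path, ratio=0.75):
    reference = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)
    scene = cv2.imread(scene_path, cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=1000)
    _, reference_desc = orb.detectAndCompute(reference, None)
    _, scene_desc = orb.detectAndCompute(scene, None)

    # Hamming distance suits binary descriptors; the ratio test discards
    # ambiguous matches and helps reduce false positives.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(reference_desc, scene_desc, k=2)
    good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good)


# Example decision rule (with illustrative file names and threshold):
#   if count_good_matches("metformin_box.jpg", "camera_frame.jpg") > 25:
#       announce("Medication recognised: metformin")
```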

In the context of providing aids beyond visual impairment alone, a few applications have also been reported in the literature for those with partial visual impairments. Radfar et al. [170] proposed a voice-activated mobile app for calculating insulin dosage for VIP with diabetes. The user can interact with the app by saying the name of a meal and how much of it they want to eat. A speech recognition system then compares the spoken name to the entries already stored in the meal database. This work was a proof-of-concept demonstration of a calculator for insulin bolus doses (a quick-acting insulin administered at mealtimes to keep blood sugar levels under control following a meal) using a voice-based interface. However, this app was not tested with visually impaired diabetics. More recently, Muhsin et al. [171] proposed another voice-activated smartphone application for insulin-dependent diabetics with visual impairment. This application can automatically calculate the insulin dose for each meal while taking into consideration any insulin remaining in the body. It recognized digital readings from a range of popular blood glucose (BG) monitors, blood pressure monitors, and weight scales using machine vision, and stored those readings as text in a smartphone database. Then, using voice-driven dialogues, the amount of carbohydrate consumed and the level of physical activity were obtained from the user and recorded in the smartphone's database. Eventually, the user is given a spoken message containing the estimated insulin dose. According to the authors, the created prototype can improve blood sugar control, boost trust in dosage accuracy, and lessen anxiety over hypoglycaemia brought on by a potential insulin overdose. However, there was no mention that this prototype was evaluated with the intended users.
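
For context, the core calculation in such bolus-advisor apps is commonly the carbohydrate dose plus a correction dose, minus insulin still on board; the sketch below uses illustrative ratios and targets and is not the formula published in [170] or [171].

```python
# Minimal sketch of a standard bolus calculation: carbohydrate dose plus a
# correction dose, minus insulin still active in the body ("insulin on board").
# The carb ratio, correction factor, and target are illustrative placeholders.

def bolus_units(carbs_g, bg_mmol_l, insulin_on_board_u,
                carb_ratio_g_per_u=10.0,          # grams of carbohydrate covered by 1 unit
                correction_mmol_l_per_u=2.0,      # BG drop expected per unit of insulin
                target_bg_mmol_l=6.0):
    carb_dose = carbs_g / carb_ratio_g_per_u
    correction_dose = max(0.0, (bg_mmol_l - target_bg_mmol_l) / correction_mmol_l_per_u)
    dose = carb_dose + correction_dose - insulin_on_board_u
    return round(max(0.0, dose), 1)


# Example: 60 g of carbohydrate, BG of 9 mmol/L, 1 unit on board -> 6.5 units.
print(bolus_units(60, 9.0, 1.0))
```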

7 Discussion

A wide range of AT devices and applications were explored in this review, and one of the primary difficulties is that most of these aids focused on the functional aspects of the services rather than on the human aspects of the user experience. The examined research revealed that, despite the development of a range of important technological solutions for substitutive assistive devices and smartphone apps, user acceptance of these solutions is still relatively low [25, 172]. This was mainly because they were mostly conceived, built, and tested in various settings with poor VIP participation; as a result, they were underutilised due to challenges in terms of the human factors of the user experience [172, 173]. Nevertheless, this does not negate the existence of other popular apps that have favourably impacted the VIP's social integration [174]. On the other hand, the lack of a common wearable platform for AT also makes the development of compatible and interoperable tools and technologies a challenging task for the scientific and research community. Another shortcoming is the limited utilisation of emerging technologies such as IoT, 5G, and big data. According to recent studies [175,176,177], these technologies can result in significant improvements in the tools and technologies that support the VIP's daily activities.

The employment of various types of sensors, wireless networks, speech-to-text and text-to-speech, and computer vision algorithms has advanced significantly, particularly in navigation and wayfinding support systems for totally blind people. Many of these systems solved technical issues, but they also had limitations in terms of usability and accessibility, learning and adaptation time for the new system, and other factors [178]. In addition, most of the proposed aids assumed that users were totally blind; as a result, people with other visual impairments, such as colour blindness or normal vision in specific portions of their visual field, have received little attention in the literature. Specifications of assistive devices could also differ based on whether the VIP are adults or children, as well as whether they are partially or totally blind. Additionally, only a few studies addressed difficulties other than visual impairment, such as managing diabetes, one of the leading causes of vision loss, in partially sighted people.

Smartphone platforms enabled the development of a variety of AT for VIP, using built-in sensors, as discussed earlier in Sect. 6. Depending on the system complexity as well as the availability of tools/libraries in the mobile development platform, the developed apps were either (i) native mobile apps that are installed directly onto the smartphone and can work, in most cases, without internet connectivity, (ii) web-based mobile apps in which the smartphone is linked to a remote web server and used to perform secondary tasks (e.g., user terminal, wireless communications gateway), or (iii) hybrid apps that are part native, part web-based. In complex navigation and wayfinding support systems [104], the core functionalities of the mobile apps were performed by a local external processing unit and/or a web server. Unlike web-based and hybrid mobile apps, native apps are constrained by camera optics and sensors, computation speed, and power requirements, along with the challenge of obtaining sufficient information about image processing pipelines for mobile vision problems. As a result, the algorithms used in native mobile apps must be both reliable and efficient. Researchers and developers employed a combination of algorithms, heuristics, refinements, and know-how to achieve these objectives [179].

Based on the research analysed in this review, only 35.2% of the created assistive aids were tested with VIP; the remaining 64.8% were tested with blindfolded or sighted participants, as shown in Fig. 3. Without testing with VIP, it is difficult to assess whether an AT is useful and easy to use for visually impaired people, whether it is comfortable to wear, whether it is bulky or heavy, and whether it responds in a timely manner. In addition, it was found that only a few of the analysed projects (less than 4%) involved VIP in the design and development process. Participation of VIP in these activities also requires clinical or ethical approval. These findings clearly demonstrate that the VIP's needs were not effectively communicated to the systems' designers and developers.

Based on the findings highlighted in this discussion, we believe that the following ideas can improve future design and development of AT for VIP:

Fig. 3 Participation of VIP in the design, development, and testing of AT aids

  1)

    To produce successful and acceptable AT for VIP, the development method must follow a user-centred agile approach. Figure 4 shows a simplified flow diagram of the suggested method, in which the development process follows an iterative and incremental model [180]. In this process, the required functionality is divided into small increments that can be delivered independently. Instead of developing a complete prototype and then asking for feedback, the outcome of each increment is shared with the end-users to obtain their evaluation and feedback. As a result, it is simpler to identify the users' desires in a timely manner and to deliver small parts of the design to the development team for faster execution. Additionally, this process focuses on a deep understanding of the users' needs and their context at all stages of design and development. This necessitates an awareness of the types of interface modalities and interactions that make learning and using the AT physically convenient for the intended users. This goal can be met by gathering data from the intended users and their carers regarding the problems they face in managing their daily tasks and the changes they believe the proposed system will help them achieve. The findings of such investigations could inspire more people with visual impairments to use these devices and provide a solid foundation for the development process.

    After identifying the actual need, the usage context of the application is analysed to determine who the application is intended for, why users are using it, what they require, and how and where they are using it. The context of use provides information about the tasks, setting, and VIP attributes that can be used to create a user persona. Next, the application requirements are specified based on information gathered from the users (via observations and interviews) and related to the environment in which the application will be used. All the subsequent design, development, and evaluation stages in each increment are shaped by these requirements. In the design and development stage, the user interface modalities that we discussed earlier in Sects. 3–5 are created according to the user's needs defined in the previous stages of the development process. In the evaluation stage, the USE Questionnaire approach (Usefulness, Satisfaction, and Ease of Use) [173] can be applied to assess the user's perception. As shown in Fig. 4, this cycle continues until a suitable result is attained before releasing the product for use by the targeted users.

  2)

    The user interface modalities must be embedded in the design and development process to improve the system's accessibility and usability. Existing aids, particularly smartphone-based ones, can be enhanced in terms of usability, accessibility, learnability, and the time needed to adapt to a new system by adopting more efficient user interfaces and human-computer interaction techniques. Interactions with others and body awareness are fundamentally multimodal experiences [71]; thus, the AT interface with VIP can be multimodal to enhance the user experience, with the preferences of the user directing the selection. More research is required to produce a general-purpose vision-to-touch and vision-to-audio translator that is reliable and robust for everyday use. To attain this goal, multidisciplinary research efforts, funding, and a universal wearable platform combining advances in wireless networks, GPS, voice recognition, and other essential technologies are still needed. In addition, more efficient algorithms for more accurate interpretation of visual information and the contents of an image or scene are still needed for vision-based assistive devices.

  3)

    To address the challenges of the white cane, the most widely used tool by VIP to detect obstacles below waist level, there is a need to utilise renewable energy sources such as solar energy and to develop innovative concepts built on low-cost technology, effective algorithms, and low-power devices. Recent studies reported that the white cane can be integrated with other technologies, such as the IoT-based Blind Guide [104], which adds a wireless sensor worn on the forehead to identify obstacles at head level as well, as explained earlier in Sect. 4.

  4)

    It is critical to separate the needs of VIP who have normal vision in some areas of their visual field, or who are colour-blind, from those who are totally blind. This is likely to open up new research opportunities to address challenges other than complete blindness. The difficulties faced by people who have partial vision impairments together with other chronic diseases, such as diabetes, have also received little coverage in the literature [181–184]. Despite recent initiatives in this direction, more effort is still required to fill this AT research vacuum. AT aids are expected to minimise the dependence of VIP on others and to help prevent (or delay) disease progression, its related complications (including vision impairment), and the long-term treatment expenses.

  5)

    Well-designed smartphone-based apps that focus on the needs and expectations of users are a viable path towards building adaptable and acceptable aids for the VIP community. More research is therefore needed to improve present smartphone technologies as well as programming tools/libraries; potential upgrades include computer vision sensors and algorithms, as well as voice and text detection capabilities. Technology experts [67] believe that users have become more willing to try out new modalities as the use of smartphones and other mobile devices expands. Many users started utilising voice assistants such as Siri, Alexa, Cortana, and Google Home as an alternative to computers and other digital devices for communication after these tools were introduced [151, 153]. This demonstrates how specific modalities with varied intensities are advantageous in a variety of contexts [154]. Other modalities, such as computer vision sensors, can be used to capture three-dimensional movements with depth cameras like the Microsoft Kinect [74].

  6)

    Numerous research prototypes have been developed by the academic community, but the VIP community was not given access to these technological advancements. The move from a research prototype to production was significantly hampered by a lack of available resources or of the necessary knowledge. This challenge can be mitigated by including stakeholders, such as industry partners, in the prototyping and design process and by raising public awareness of the quality of prototyping and the benefits it yields on many levels. The standard of AT research, on the other hand, is negatively impacted by the sometimes challenging and drawn-out processes of gaining authorisation to access the VIP group in order to involve them in development or testing. The healthcare sector should be more willing to adopt more effective and efficient procedures to promote collaboration between the research and VIP communities.

  7)

    Existing technological advancements in AT are helpful in preserving the health and comfort of those with visual impairments or chronic diseases. However, to ensure a respectable standard of living and accessible medical care, AT needs to develop further in the healthcare sector. People with visual impairments now have the ability to make themselves visible in ways they never could before, since they have access to smartphones and other tools and technologies (e.g., IoT, big data, and machine intelligence) that are becoming increasingly accessible, affordable, and reliable.

Fig. 4 A simplified diagram of the suggested user-centred agile method

8 Conclusion

Based on the reviewed research, no tool or technology can be considered ideal. In order to assist those with visual impairments, it is therefore crucial to develop more intelligent systems that can address outstanding challenges, including the participation of the intended users in the design, development, and evaluation of these systems. This review analysed state-of-the-art technology aids that have been proposed by the research community to assist people with visual impairments, as well as posing critical questions concerning the direction substitution aids may take in the future. Over the last 10 years, we have seen numerous technological developments in the creation of research prototypes with user-system interaction and system validation for VIP. Most of these aids addressed technical problems and assumed that the users were totally blind, while few aids were proposed for children or for those with healthy eyesight in parts of their visual field. Previous studies demonstrated that developing navigation systems for totally blind people has been the most active research topic as well as a difficult one in which the human aspects of the user experience must be considered. The analysed research studies revealed that neither the researchers nor the AT developers were able to successfully identify the needs of VIP in terms of human factors of the user experience such as usability, learnability, and time to adapt. This is supported by the fact that many technical aids fall short owing to the poor participation of VIP in the development process or in the evaluation of the developed prototypes. As a result of these and other limitations, most of the created research prototypes are still far from becoming systems used every day by the totally blind community.

We believe that future assistive tools and technologies should take advantage of technological advancements and apply them to create globally accessible navigation aids, building on new concepts based on low-cost technology, efficient algorithms, and low power consumption. Moreover, distinguishing the needs of people who have normal vision in some portions of their visual field, or who are colour-blind, from those who are totally blind would open new research opportunities to address challenges beyond the visual impairment itself.

We hope that the findings and recommendations presented in this article will open new discussions among the research community, advance the development of AT aids that are more adaptable for VIP, and encourage further research into challenges these people face beyond their primary visual impairments.