Introduction

There is growing interest in sensor technologies that replicate human-level sensory modalities, such as vision, touch, and smell [1,2,3]. These technologies are used in assistive devices and systems for people with impairments [4,5,6]. For a visually impaired person, written communication often relies on braille, a system that depends on both vision and touch. As the degree of impairment progresses, visual function gradually diminishes; consequently, individuals become increasingly reliant on their sense of touch. According to a 2019 report by the World Health Organization (WHO), there are 238 million visually impaired people, of whom 39 million are blind and therefore rely solely on their sense of touch. Unfortunately, these individuals face difficulties in learning braille because current braille education and rehabilitation technologies remain insufficient both domestically and internationally [7]. Therefore, assistive technologies that utilize both visual and tactile senses are essential for the education and rehabilitation of visually impaired individuals.

A braille cell consists of six small dots, each 1.5 mm in diameter and separated by 1 mm. Visual and tactile perception techniques have been introduced to recognize braille characters. Among visual perception techniques, deep-learning-based vision methods have enabled object detection, segmentation, and classification, driven by advances in graphics processing units (GPUs) for parallel computing and by high-quality datasets [8,9,10,11]. However, because braille characters are small, vision-based braille recognition remains susceptible to external environmental factors such as variations in light intensity, distance, and angle of view. Therefore, before a deep learning model can be applied, a new braille dataset must be collected and a preprocessing technique for braille images must be developed. As an alternative unaffected by these constraints, tactile-based techniques show clear advantages for braille perception. Through tactile perception, braille structures can be detected using microfabricated pressure sensors and sensor arrays [12,13,14,15,16]. To recognize braille patterns through tactile feedback, the sensors must be fabricated on a millimeter scale, form a high-density array, and exhibit sufficiently high sensitivity to detect small braille structures. However, existing tactile sensors and arrays remain insufficient in size and density for braille recognition [17, 18], and the combination of visual and tactile techniques has not yet been demonstrated for educating and rehabilitating visually impaired individuals.

As a preliminary investigation into the fusion of visual and tactile perception, this study devised visual and tactile approaches for recognizing braille. The visual perception technique utilizes a deep learning model to recognize braille from RGB images (Fig. 1a). In contrast, the tactile perception technique utilizes a flexible capacitive pressure sensor array to recognize braille (Fig. 1b). Finally, we discuss the advantages and disadvantages of the visual and tactile perception approaches for realizing a human-level visuotactile fusion technique.

Fig. 1 Vision- and tactile-based braille recognition. a Visual perception based on a deep learning technique and b tactile perception based on a pressure sensor technique

Results and discussion

Visual perception based on deep learning

To enable visual perception, a transfer learning approach based on a deep learning model was adopted. Transfer learning reuses a pretrained deep learning model and retrains only the output layers, improving computational efficiency [19]. Transfer learning can be effective for visual perception because it offers rapid processing and excellent performance even with relatively small datasets. In this study, the transfer learning model Faster region-based convolutional neural network (Faster R-CNN)–feature pyramid network (FPN)–ResNet-50 was utilized for object detection [20]. The Faster R-CNN model is fast and accurate because it combines the region proposal and detection steps, simultaneously localizing and classifying objects. The FPN–ResNet-50 serves as the backbone network that extracts features from images and constitutes a critical component of the object detection model [21]. In the Faster R-CNN–FPN–ResNet-50 pipeline shown in Fig. 2a, images were processed by the ResNet-50 backbone and FPN to derive multi-scale feature maps. The region proposal network (RPN) used these feature maps to identify candidate object locations. Region of interest (RoI) pooling was performed on the proposed regions, and the pooled features passed through a fully connected layer. Finally, predictions were generated by a classifier for object detection and by bounding box (B-box) regression for localizing the objects.
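
To make the transfer-learning setup concrete, the following minimal sketch loads a COCO-pretrained Faster R-CNN–FPN–ResNet-50 from torchvision and swaps in a new detection head for braille classes. The class count, frozen backbone, and optimizer settings are illustrative assumptions, not the paper's reported configuration.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 26 + 1  # assumed label set: 26 braille letters + background

# Load a detection model pretrained on COCO; the backbone weights are reused.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the output head (classifier + B-box regressor) for the braille classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# Freeze the backbone so that only the new head is trained, as in transfer learning.
for p in model.backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.005, momentum=0.9
)
```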

Fig. 2 Model architecture and braille used for creating the custom-made dataset with constraints. a Faster R-CNN–FPN–ResNet-50 model architecture. b Position of the braille for the letter “a.” c Distance of the camera from the braille for all letters

In this study, a new braille dataset was constructed to facilitate accurate braille recognition while considering environmental factors such as position and distance, because prior research on braille and open-source braille datasets is scarce [22, 23]. The new dataset was divided into two categories: online and custom-made. In the online category, 210 digital braille images covering all 26 alphabetic characters from “a” to “z” were collected. In addition, 210 real braille images, including the 26 alphabet characters and 10 special symbols, were captured using an RGB camera under three constraints: the number of braille characters, the position of the braille, and the height of the camera (Fig. 2b, c). Using data augmentation techniques, such as those provided by the Albumentations library [24], these images were expanded to 520 images to construct the custom-made braille dataset. The Albumentations library provides various augmentation algorithms, such as image flipping, random resized cropping, and random gamma adjustment, which randomly alters the brightness tone of braille images.
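
As a concrete illustration, the sketch below builds an Albumentations pipeline with the three transform families mentioned above; the image size, probabilities, and gamma range are assumptions, and bbox_params keeps the braille bounding boxes aligned with the transformed images.

```python
import numpy as np
import albumentations as A

transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),                      # image flip
        A.RandomResizedCrop(size=(512, 512),          # random resize crop
                            scale=(0.8, 1.0), p=0.5),  # (Albumentations >= 1.4 signature)
        A.RandomGamma(gamma_limit=(80, 120), p=0.5),  # random gamma (brightness tone)
    ],
    # Transform the braille bounding boxes together with the image.
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)

image = np.zeros((512, 512, 3), dtype=np.uint8)  # placeholder braille image
bboxes = [(100, 100, 130, 160)]  # placeholder box (x_min, y_min, x_max, y_max)
labels = [1]

augmented = transform(image=image, bboxes=bboxes, labels=labels)
```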

As mentioned above, braille consists of small dots, each 1.5 mm in diameter. When recognizing braille in an RGB image, all objects except the braille dots are considered background. Therefore, preprocessing is necessary to accentuate the braille dots while suppressing background noise. Figure 3a illustrates the preprocessing algorithms used in this study. Preprocessing involves several steps: contrast-limited adaptive histogram equalization (CLAHE), binarization, median filtering, erosion, and dilation. CLAHE divides an image into multiple tiles and equalizes each tile according to its local pixel values, resulting in natural equalization. Subsequently, binarization using an adaptive threshold emphasizes the dots, and dilation and erosion weaken the background. Finally, residual noise is removed using a median filter. Figure 3b compares images before and after preprocessing, showing the enhanced braille dots. Both the online and custom-made datasets, with data augmentation, were preprocessed and used for training via transfer learning. After training, mAP50 and mAP75, which denote the mean average precision (mAP) over all classes at intersection over union (IoU) thresholds of 0.5 and 0.75, respectively, were evaluated for preprocessed and unprocessed images. In terms of overall performance, the preprocessed images yielded an improvement of approximately 30% over the unprocessed images (Fig. 3c).
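
The preprocessing chain described above can be sketched with standard OpenCV operations as follows; the tile grid, threshold parameters, and kernel sizes are illustrative assumptions rather than the study's exact settings.

```python
import cv2
import numpy as np

def preprocess(img_bgr: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)

    # 1) CLAHE: equalize each tile by its local pixel values for natural contrast.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    eq = clahe.apply(gray)

    # 2) Adaptive-threshold binarization to emphasize the braille dots.
    binary = cv2.adaptiveThreshold(
        eq, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2
    )

    # 3) Erosion followed by dilation to weaken background structures.
    kernel = np.ones((3, 3), np.uint8)
    opened = cv2.dilate(cv2.erode(binary, kernel), kernel)

    # 4) Median filter to remove residual salt-and-pepper noise.
    return cv2.medianBlur(opened, 3)
```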

Fig. 3 Preprocessing algorithms, final image, and accuracy for visual perception. a Preprocessing algorithms for improving model performance. b Braille image before and after preprocessing. c Accuracy of the original and preprocessed images in terms of mAP50 and mAP75

The braille detection model utilizes Faster R-CNN with FPN–ResNet-50 as the backbone network. The model was trained on the aforementioned dataset using a transfer learning technique. Training involved applying data augmentation to increase the number of data instances and then preprocessing the augmented data with the algorithms described above. In the final output of the trained model, the class and B-box are predicted by the classifier and regressor, respectively. The mAP50 and mAP75 results were obtained for both the preprocessed and unprocessed braille datasets, and the overall performance showed improved accuracy when the model was trained on preprocessed images.
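
For reference, mAP50 and mAP75 can be computed with the torchmetrics detection metric as sketched below; the paper does not state which evaluation tool produced its numbers, and the prediction/target values here are placeholders.

```python
import torch
from torchmetrics.detection import MeanAveragePrecision

metric = MeanAveragePrecision(iou_thresholds=[0.5, 0.75])

# Placeholder prediction and ground truth for a single image.
preds = [dict(
    boxes=torch.tensor([[10.0, 10.0, 40.0, 70.0]]),
    scores=torch.tensor([0.92]),
    labels=torch.tensor([1]),
)]
targets = [dict(
    boxes=torch.tensor([[12.0, 11.0, 41.0, 69.0]]),
    labels=torch.tensor([1]),
)]

metric.update(preds, targets)
result = metric.compute()
print(result["map_50"], result["map_75"])  # mAP at IoU 0.5 and 0.75
```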

Tactile perception based on capacitive pressure sensor array

Pressure sensors are transducers that convert applied pressure into changes in electrical signals (e.g., resistance and capacitance) [25, 26]. In this study, a capacitive sensing approach with a micropatterned pressure-sensitive layer was utilized to enhance sensitivity and response/recovery time, as shown in Fig. 4a. In capacitive pressure sensors, the sensing performance is primarily determined by the gap between the top and bottom electrodes and by the relative permittivity of the sensing layer. When pressure is applied, the polymeric microdome structure gradually deforms, decreasing the gap distance and increasing the effective permittivity [27, 28]. Owing to this synergistic effect, the capacitance increases in response to the applied pressure.
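
A toy parallel-plate model, C = ε0·εr·A/d, illustrates how the two effects compound; all parameter values below are illustrative assumptions, not measured device parameters.

```python
EPS0 = 8.854e-12  # vacuum permittivity (F/m)

def capacitance(area_m2: float, gap_m: float, eps_r: float) -> float:
    """Parallel-plate capacitance C = eps0 * eps_r * A / d."""
    return EPS0 * eps_r * area_m2 / gap_m

area = (1.5e-3) ** 2  # 1.5 mm x 1.5 mm sensor footprint

# Unpressed: tall domes keep a wide, air-rich gap (illustrative values).
c0 = capacitance(area, gap_m=30e-6, eps_r=2.0)
# Pressed: domes flatten, so the gap shrinks and more polymer fills it.
c1 = capacitance(area, gap_m=20e-6, eps_r=2.6)

print(f"relative change dC/C0 = {(c1 - c0) / c0:.2f}")  # both effects raise C
```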

Fig. 4 Capacitive pressure sensor and sensor array. a Mechanism of the capacitive pressure sensor. b Fabrication process using a printing technique: the bottom electrode is printed on a PI film; the interdigitated electrode is then printed on the opposite side of the film through an alignment process; finally, the top electrode with a dome-structured layer is attached to the capacitive sensor. c Exploded view of the capacitive pressure sensor array. d Top electrodes with a dome diameter of 30 μm and a pitch of 60 μm in the dome-structured array

To recognize braille characters whose dots are 1.5 mm in diameter, the pressure sensors must be fabricated on a millimeter scale and formed into large-area, high-density arrays; here, this was achieved using a printing technique (V-One, Voltera). Each sensor was designed to be 1.5 mm × 1.5 mm with a parallel-plate capacitor structure comprising a plane electrode at the top and interdigitated electrodes at the bottom [29, 30]. Between the electrodes, a microfabricated dome-structured array was used as the pressure-sensitive layer. The fabrication process is shown in Fig. 4b. The bottom electrode was first printed on the back side of a polyimide (PI) substrate using silver nanoparticle paste (AgNP paste; 75% AgNPs and 20% glycol; Voltera). Subsequently, the interdigitated electrodes were printed on the front side of the PI substrate. The overlapping area between the back-side and front-side electrodes formed a static junction capacitor, providing an electrical connection without a direct vertical interconnection. This approach reduced wiring complexity and allowed the array to be expanded efficiently. Subsequently, a top electrode with a dome-structured layer was integrated onto the bottom interdigitated electrodes to form a parallel-plate capacitor, as shown in Fig. 4c and d. As the pressure-sensitive layer, the dome-structured array was fabricated from a deformable polymeric material (styrene–ethylene–butylene–styrene, SEBS) with a dome diameter of 30 μm and a pitch of 60 μm (Fig. 4d) [31, 32].

By attaching the dome-structured film to the top electrode, the complete capacitive pressure sensor, comprising a static junction capacitor in series with a parallel-plate sensing capacitor, was integrated. The circuit model covers both a single sensor and multiplexed sensors forming a 5 × 5 array, as shown in Fig. 5a. Figure 5b shows the measured relative capacitance change as a function of applied pressure and compares the sensing performance of the dome-structured film with that of a non-structured blank film. The sensor with the dome structure exhibited an eightfold improvement in performance over the sensor with the blank film. In addition, it exhibited stable performance even under high-pressure conditions exceeding 100 kPa, as shown in Fig. 5c. The pressure-sensing measurements were conducted under flat conditions using an LCR meter (E4980A, Keysight). Finally, a multiplexed sensor system with a 5 × 5 array was implemented to generate real-time heatmap images for braille character recognition, as shown in Fig. 6. The tested braille word was “wearable,” and each braille character (“w,” “e,” “a,” “r,” “a,” “b,” “l,” and “e”) was gently pressed onto the sensing area. The input braille characters matched the heatmap images well (Fig. 6).
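
The multiplexed readout can be sketched as a row/column scan that normalizes each cell against its baseline capacitance and renders a heatmap; read_capacitance() is a hypothetical stand-in for the LCR-meter/multiplexer interface, which is hardware-specific and not detailed here.

```python
import numpy as np
import matplotlib.pyplot as plt

N = 5  # 5 x 5 sensor array

def read_capacitance(row: int, col: int) -> float:
    """Hypothetical placeholder for the multiplexed LCR-meter measurement."""
    rng = np.random.default_rng(row * N + col)
    return 1.0 + 0.5 * rng.random()

baseline = np.ones((N, N))  # per-cell C0, captured with no braille pressed

# Scan every row/column address and compute the relative change dC/C0.
frame = np.array([[read_capacitance(r, c) for c in range(N)] for r in range(N)])
heat = (frame - baseline) / baseline

plt.imshow(heat, cmap="hot", vmin=0.0)
plt.colorbar(label="ΔC/C0")
plt.title("Braille contact heatmap (illustrative)")
plt.show()
```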

Fig. 5 Equivalent circuit model and pressure-sensing performance. a Equivalent circuit model of a single pressure sensor and the sensor array. b Comparison of pressure sensor performance using the dome-structured film versus a non-structured blank film. c Pressure sensor performance under high-pressure (> 100 kPa) conditions

Fig. 6 Heatmap-based braille recognition using the capacitive pressure sensor array. Multiplexed sensor system configuration (top) and heatmap-based braille recognition for the letters “WEARABLE” (bottom)

Conclusion

This study examined visual and tactile perception techniques for braille character recognition for visually impaired individuals. For visual perception, a new dataset was built using data augmentation techniques, and braille recognition was implemented by applying transfer learning and preprocessing to this dataset. For tactile perception, small pressure sensors were expanded into an array conforming to braille standards, and a heatmap-based braille recognition technique was developed and implemented. The proposed vision- and tactile-based braille recognition techniques exhibited complementary characteristics. Visual perception techniques offer fast and efficient data processing, making them suitable for handling large datasets; however, they are limited by environmental factors such as light, distance, and viewing angle. Conversely, tactile perception techniques overcome these environmental constraints but suffer from slower data processing rates and demanding requirements, including small sensor size, high density, and high sensitivity.

In future work, we will enhance the visual technology with improved braille character recognition accuracy across various scenarios, and the tactile technology with increased sensor density and sensitivity to ensure precise braille character recognition. To this end, we plan to develop an adaptive sensor fusion technology incorporating a four-layer fusion module that addresses the limitations of visual information (environmental dependence) and tactile information (slow frame rate). The fusion module is expected to selectively use visual or tactile information depending on the surrounding environment, thereby enhancing braille character recognition. These results can serve as a foundation for future advancements in assistive technology for the visually impaired using visuotactile fusion approaches.