1 Introduction

In this section we briefly explain Braille and degraded braille.

1.1 Braille and Braille Library

Braille has a long history. In 1825, Louis Braille [3] invented the system in France. In 1879, Megata introduced Braille to Japan, and in 1890 Ishikawa devised Japanese Braille. This was the beginning of Japanese Braille. Since then, Braille has been used as a writing system for blind people.

A Braille cell consists of six points, each either embossed or flat, arranged in two columns and three rows. Excluding the blank cell, a cell can take 63 patterns, each representing one character or a modifier. With modifiers, more than 64 kinds of characters can be represented. In Japanese Braille, each voiceless sound is represented by a plain cell, while other characters such as syllabic nasals, voiced consonants, semi-voiced consonants, numerals and alphabetic letters are represented by a modifier and the subsequent cell(s). As shown in Fig. 1, Japanese Braille places a blank space between clauses, much as English places one between words. Some Japanese Braille characters are constructed with a prefix, as in the last three examples in Fig. 1. Although Japanese can be written vertically or horizontally, Japanese Braille is written horizontally only.

Fig. 1. Japanese Braille
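The cell structure described above can be made concrete with a small sketch. It assumes the standard dot numbering (dots 1 to 3 down the left column, dots 4 to 6 down the right column) and uses the Unicode Braille Patterns block, which starts at U+2800, for display only. The function names are illustrative and are not part of the system described in this paper.

```python
from itertools import combinations

# Bit assigned to each dot, following the Unicode Braille Patterns layout
# (dot 1 = 0x01, ..., dot 6 = 0x20). Dots 1-3 form the left column and
# dots 4-6 the right column.
DOT_BITS = {1: 0x01, 2: 0x02, 3: 0x04, 4: 0x08, 5: 0x10, 6: 0x20}


def cell_code(dots):
    """Pack a collection of embossed dot numbers (1-6) into a 6-bit integer."""
    code = 0
    for d in dots:
        code |= DOT_BITS[d]
    return code


def cell_char(code):
    """Return the Unicode Braille pattern for a 6-bit cell code (display only)."""
    return chr(0x2800 + code)


def all_nonblank_cells():
    """Enumerate the code of every cell with at least one embossed dot."""
    cells = []
    for k in range(1, 7):
        for dots in combinations(range(1, 7), k):
            cells.append(cell_code(dots))
    return cells


if __name__ == "__main__":
    print(len(all_nonblank_cells()))  # 63
    print(cell_char(cell_code({1, 2})))
```

Enumerating the non-empty subsets of six dots gives the 63 patterns mentioned in the text; the blank cell is the 64th.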

The most primitive tool for writing Braille is the slate and stylus. The user presses the tip of the stylus down through a small rectangular hole to emboss each dot. Because the dots are punched from the back of the paper, the user must write each cell as its mirror image and in reverse order. Nowadays, to publish a book in Braille, the text is entered according to Braille grammar with a Braille text editor, revised, and printed in Braille, as in Fig. 2.

Fig. 2. Normal dot and mirrored one

In 1988, IBM opened “Tenyaku Hiroba (Field of Braille Transcription)”, a Braille information network system, and the digitization of Braille became widespread. Braille books produced before the IBM project are stored as they are, without having been converted into electronic data. Even when they are well worth preserving, Braille books are bulky, so Braille libraries in various places are struggling to keep them. In recent years, each library has been forced to decide whether to discard them or to keep them.

In earlier days, when the Braille code was still in the research phase and there was no Windows, no personal computer and no internet, Shimomura et al. [1] studied the coding of Braille.

Braille books are stored in Braille libraries throughout the country. About 100 Braille libraries are registered with the National Association of Institutions of Information Service for Visually Impaired Persons. Braille books are borrowed from the Braille libraries, or from public libraries via the Braille libraries. In addition, many Braille books are provided by the WEB Braille library.

Fig. 3. Page of Braille book

1.2 Degraded Braille

Braille is a tactile reading system: one reads the raised points with the pad of a finger. Figure 3 shows a typical page of a Braille book. Braille books that are read frequently become dirty, holes open in their dots, and dots collapse. Figures 4 and 5 show fresh dots and a part of a page that contains both normal cells and mirror images of cells, respectively. Figures 6, 7, 8 and 9 show degraded Braille cells: a collapsed cell, a cell with an opened hole, a dirty cell and a distorted cell, respectively. Tears, creases and stains in pages are shown in Figs. 10, 11 and 12, respectively.

Fig. 4. Cell

Fig. 5. Normal and mirrored cell

Fig. 6. Collapsed cell

Fig. 7. Hole opened cell

Fig. 8. Dirty cell

Fig. 9. Distorted cell

Fig. 10. Tears

Fig. 11. Creases

Fig. 12. Stains

Shimomura et al. [2] also studied the restoration of degraded Braille from the shadows of the dots by using fuzzy theory. In that study, Braille was extracted by hand, and computer programs determined whether Braille was present. In this project, we use computer programs to extract Braille from scanned page images that contain degraded Braille, recognize it and restore the pages.

2 Restoration System

In this section we explain our restoration system for old books written in Braille. The system processes a book in the following steps.

  (1) Page scanning.

  (2) Determination of whether each scanned dot is normal or mirrored.

  (3) Extraction of cells and classification into 63 categories.

  (4) Error correction by using the scanning redundancy, in which each cell is read twice, once as the normal image and once as the mirror image.

  (5) Error correction by using Braille grammar.

  (6) Interpretation of Braille into Japanese.

  (7) Error correction by using Japanese grammar.

As the steps above show, the system has two main components: a machine learning system for the recognition of Braille and an error correction system.

2.1 Preparation of Braille Images for Machine Learning

To detect degraded cells by machine, we must prepare many images of degraded cells. We have already scanned many old Braille books at a resolution of 200 dpi, extracted each cell by hand as a \(54 \times 36\) pixel image, and obtained about 15000 cell images. These images were classified into 63 normal categories and 63 mirrored ones. Next, we cut each cell image into 6 dot areas with a small application program and obtained about 45000 dot images and 45000 flat (non-dot) ones. The normal dot image, the mirrored dot image and the background image used for the machine learning are shown in Figs. 13, 14 and 15.

Fig. 13. Normal dot image

Fig. 14. Mirrored dot image

Fig. 15. Background image
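The step of cutting a cell image into six dot areas can be sketched as follows. This is only an illustration under stated assumptions: the cell is taken to be 54 pixels high and 36 pixels wide, the six areas are taken to form a plain 3 by 2 grid of 18 by 18 pixel squares, and the image is represented as a list of pixel rows. The program actually used in the project is not described in this paper, and `split_cell` is a hypothetical name.

```python
def split_cell(cell, rows=3, cols=2):
    """Split a cell image (a list of pixel rows) into six dot areas.

    Returns a dict mapping the dot number (1-6) to its sub-image. Dots 1-3
    are the left column from top to bottom, dots 4-6 the right column.
    """
    height, width = len(cell), len(cell[0])
    if height % rows or width % cols:
        raise ValueError("cell size must be divisible by the grid size")
    dh, dw = height // rows, width // cols
    areas = {}
    for c in range(cols):
        for r in range(rows):
            dot = c * rows + r + 1
            areas[dot] = [row[c * dw:(c + 1) * dw]
                          for row in cell[r * dh:(r + 1) * dh]]
    return areas


if __name__ == "__main__":
    # Synthetic 54 x 36 cell: every pixel stores its (row, column) position.
    cell = [[(y, x) for x in range(36)] for y in range(54)]
    areas = split_cell(cell)
    print(len(areas), len(areas[1]), len(areas[1][0]))  # 6 18 18
```

Each of the 15000 cell images yields six such areas, which is consistent with the roughly 90000 dot and non-dot images reported above.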

2.2 Dot Recognition by OpenCV

OpenCV is a well-known computer vision library that provides object detection functions. Detectors for faces, eyes, mouths and so on are provided, and one can also build detectors for arbitrary objects. To build an object detector, one prepares many images of the object to be detected and runs the machine learning. During learning, features are extracted from the many images of the object, and the machine learns those features. The set of features learned by the machine from the images is called a cascade classifier. OpenCV provides several algorithms for building the classifier. In our system, we adopt the traincascade routine with LBP features to search for dots in a scanned page.
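Once a cascade has been trained, applying it to a page could look roughly like the sketch below. It uses OpenCV's Python bindings (`cv2.CascadeClassifier` and `detectMultiScale`). The file names in the usage note, the parameter values and the helper names are assumptions made for illustration; they are not taken from the authors' implementation.

```python
def box_centers(boxes):
    """Convert (x, y, w, h) detection boxes into integer centre points."""
    return [(x + w // 2, y + h // 2) for (x, y, w, h) in boxes]


def detect_dots(image_path, cascade_path):
    """Run a trained cascade over a scanned page and return the dot centres."""
    import cv2  # imported here so that box_centers works without OpenCV

    cascade = cv2.CascadeClassifier(cascade_path)
    if cascade.empty():
        raise IOError("could not load cascade: " + cascade_path)
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise IOError("could not read image: " + image_path)
    # scaleFactor and minNeighbors are illustrative values only.
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.05, minNeighbors=3)
    return box_centers(boxes)
```

A call such as `detect_dots("page.png", "dot_cascade.xml")`, with hypothetical file names, would return a list of `(x, y)` centres that later steps can align into rows and cells.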

As the cells are regularly aligned on the paper, we can correct the skew of the page by using the slope of the dot rows or the edge of the paper. The distribution of the detected dots determines the row lines in the page.
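One simple way to derive row lines and a skew estimate from the detected dot centres is sketched below. It assumes that the skew is small, so that dots of the same row stay closer in the vertical direction than dots of neighbouring rows, and it fits a least-squares line through one row. The tolerance value and the function names are illustrative assumptions, not the method used in the project.

```python
import math


def group_rows(centers, tol):
    """Group dot centres into rows.

    The centres are visited in order of increasing y; a new row starts
    whenever y jumps by more than tol from the previous dot.
    """
    rows = []
    for x, y in sorted(centers, key=lambda p: p[1]):
        if rows and y - rows[-1][-1][1] <= tol:
            rows[-1].append((x, y))
        else:
            rows.append([(x, y)])
    return [sorted(row) for row in rows]


def skew_angle(row):
    """Least-squares slope of one dot row, returned as an angle in degrees."""
    n = len(row)
    mean_x = sum(x for x, _ in row) / n
    mean_y = sum(y for _, y in row) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in row)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in row)
    return math.degrees(math.atan2(sxy, sxx))


if __name__ == "__main__":
    centers = [(0, 0), (10, 1), (20, 2), (0, 50), (10, 51), (20, 52)]
    rows = group_rows(centers, tol=5)
    print(len(rows), skew_angle(rows[0]))
```

Rotating the page by the negative of the estimated angle would then bring the rows back to horizontal.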

Fig. 16. ScanSnap SV600

2.3 Scanning of Old Braille Books

Old Braille books must be handled carefully because of the degradation of their cells, as mentioned above. We therefore use not a flatbed scanner but a non-contact overhead scanner, the Fujitsu ScanSnap SV600, which captures the page from above (see Fig. 16), at a resolution of 600 dpi in grey scale.

If a Braille book is printed on both sides and all pages are scanned, every cell is scanned twice: once as the normal image and once as the mirror image. This gives us redundancy for the interpretation of cells.

The next step is the dot detection by OpenCV mentioned above. The result is given in Figs. 17 and 18.

Fig. 17. Dot detection by OpenCV

Fig. 18. Close up of detected dots

2.4 Normal and Reverse Dot Classification by Deep Neural Network

Deep learning is a machine learning technique based on a deeply structured, hierarchical neural network. It is applied to image recognition, sound recognition and so on. We use this technique in our project.

As Braille books are usually printed on both sides, we must distinguish between normal dots and mirrored ones. Both kinds of cell image are shown again in Figs. 19 and 20. We have already obtained many dot images of both kinds, so we use them as reference data for dot classification by the deep neural network.

This procedure gives us a normal dot distribution and a mirrored dot distribution. By flipping the mirrored distribution left to right, we obtain a second candidate for the normal dot distribution.

Fig. 19. Normal cell

Fig. 20. Mirror cell
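The left-to-right flip can be illustrated at two levels: on raw dot coordinates and on cell codes. The sketch below assumes a 6-bit cell code in which bits 0 to 2 stand for dots 1 to 3 (left column) and bits 3 to 5 for dots 4 to 6 (right column). This encoding and all function names are assumptions made for illustration.

```python
def flip_points(points, page_width):
    """Mirror dot centres about the vertical axis of a page of given pixel width."""
    return [(page_width - 1 - x, y) for x, y in points]


def mirror_cell(code):
    """Swap the left column (bits 0-2) with the right column (bits 3-5)."""
    return ((code & 0b000111) << 3) | ((code & 0b111000) >> 3)


def mirror_line(codes):
    """Mirror a whole line: reverse the cell order and swap columns in each cell."""
    return [mirror_cell(c) for c in reversed(codes)]


if __name__ == "__main__":
    print(flip_points([(0, 5), (10, 5)], page_width=100))  # [(99, 5), (89, 5)]
    print(mirror_cell(0b000001))                           # 8
```

Applying `mirror_line` twice returns the original line, which is a convenient sanity check when the two readings of a page are compared.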

2.5 Cell Classification

As Braille is regularly aligned, the coordinates of the dots tell us the starting point of the cells. From this point, the dot images are sequentially analyzed, recognized and classified as cells by the deep neural network trained with the 15000 cell images. Figures 21, 22, 23, 24, 25 and 26 are binary, aligned cell images. According to the classification, a character code is assigned to each cell image and stored in a database together with a page number, a line number in the page, a word number in the line and a character number in the word. If no character code can be assigned to a cell image, a scanning error flag is also stored in the database.

Fig. 21. “1xxxxx” cell

Fig. 22. “x2xxxx” cell

Fig. 23. “12xxxx” cell

Fig. 24. “1234x6” cell

Fig. 25. “12345x” cell

Fig. 26. “123456” cell
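A database record of the kind described above could be laid out as in the following sketch, which uses SQLite from the Python standard library. The table name, the column names and the choice of an integer character code are assumptions made for illustration; the paper does not specify the actual schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS cells (
    page INTEGER NOT NULL,
    line INTEGER NOT NULL,
    word INTEGER NOT NULL,
    pos  INTEGER NOT NULL,       -- character number within the word
    code INTEGER,                -- NULL when no character code was assigned
    scan_error INTEGER NOT NULL, -- 1 when classification failed
    PRIMARY KEY (page, line, word, pos)
)
"""


def open_store(path=":memory:"):
    """Open (or create) the cell database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_cell(conn, page, line, word, pos, code):
    """Store one classified cell; code=None sets the scanning error flag."""
    conn.execute(
        "INSERT INTO cells VALUES (?, ?, ?, ?, ?, ?)",
        (page, line, word, pos, code, 1 if code is None else 0),
    )


def scan_errors(conn):
    """Return the positions of all cells flagged as scanning errors."""
    cur = conn.execute(
        "SELECT page, line, word, pos FROM cells WHERE scan_error = 1 "
        "ORDER BY page, line, word, pos"
    )
    return cur.fetchall()
```

Keying each record by page, line, word and character position lets the later correction stages look up the same cell in the normal and the mirrored reading.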

2.6 Error Correction by Scanning Redundancy

When a Braille book is printed on both sides and all pages are scanned, every page is in effect scanned twice. The character codes obtained from the normal images and those obtained from the mirrored images are stored in the database. The codes obtained from the normal and mirror images of the same cell, such as those shown in Figs. 19 and 20, should be identical to each other. By collating these codes and finding discrepancies, we can detect and possibly correct scanning errors.
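The collation of the two readings can be sketched as a cell-by-cell merge. The sketch assumes that the mirrored reading has already been flipped into the normal orientation, that both readings cover the same cells, and that an unclassified cell is represented by `None`. The function name and the conflict handling are illustrative assumptions.

```python
def collate(normal, mirrored):
    """Merge two readings of the same line, cell by cell.

    Each reading is a list of character codes, with None for a cell that
    could not be classified. Returns (merged, conflicts), where conflicts
    lists the indices at which both readings exist but disagree.
    """
    if len(normal) != len(mirrored):
        raise ValueError("readings must cover the same number of cells")
    merged, conflicts = [], []
    for i, (a, b) in enumerate(zip(normal, mirrored)):
        if a == b:
            merged.append(a)
        elif a is None:
            merged.append(b)       # only the mirrored reading succeeded
        elif b is None:
            merged.append(a)       # only the normal reading succeeded
        else:
            merged.append(None)    # disagreement: leave for later stages
            conflicts.append(i)
    return merged, conflicts


if __name__ == "__main__":
    print(collate([1, None, 5, 7], [1, 4, None, 9]))
    # ([1, 4, 5, None], [3])
```

Cells that remain in conflict are natural candidates for the grammar-based corrections of the following subsections.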

2.7 Error Correction by Braille Grammar

Braille has its own grammar. The character codes stored in the database are analyzed according to this grammar. In Japanese Braille, for example:

  • The postposition “ha” is transcribed as “wa”.

  • The postposition “ha” does not follow numeric characters.

  • A long sound written with a kana character is transcribed as a macron.

When an unacceptable sequence of characters is found, a Braille grammatical error flag is stored in the database. If a probable code can be inferred from the grammar, that code is also stored in the database as a candidate.

2.8 Error Correction by Japanese Grammar

According to Japanese grammar, we check the sequence of codes, set Japanese grammatical error flags for ungrammatical expressions, and correct the codes to probable ones where possible. Typical grammatical errors are as follows:

  • A sequence of punctuation marks;

  • A sequence of contracted sounds;

  • Non-correspondent parentheses;

  • Incorrect sonant marks.
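Two of the checks listed above, repeated punctuation marks and non-correspondent parentheses, can be sketched on the interpreted Japanese text as follows. The sets of punctuation marks and bracket pairs are small illustrative assumptions, and the checks for contracted sounds and sonant marks are not covered.

```python
PUNCTUATION = {"、", "。"}
PAIRS = {"（": "）", "「": "」"}


def repeated_punctuation(text):
    """Return indices where a punctuation mark directly follows another."""
    return [i for i in range(1, len(text))
            if text[i] in PUNCTUATION and text[i - 1] in PUNCTUATION]


def unmatched_brackets(text):
    """Return indices of brackets that have no matching partner."""
    closers = set(PAIRS.values())
    stack, bad = [], []
    for i, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((ch, i))
        elif ch in closers:
            if stack and PAIRS[stack[-1][0]] == ch:
                stack.pop()
            else:
                bad.append(i)
    bad.extend(i for _, i in stack)  # openers that were never closed
    return sorted(bad)
```

The returned indices mark the positions at which a Japanese grammatical error flag would be set.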

2.9 Output and Correction by Hand

Finally, the codes are converted into ink-spots expressing Braille and are output (see Fig. 27). For codes with an error flag, the ink-spots and the candidate Japanese characters are output and corrected by hand.

Fig. 27. Ink-spots expressing Braille

3 Concluding Remarks

In this paper, we explained our project to restore old Braille books by converting them into machine-readable electronic data. Each page of a Braille book is scanned as an image. Using machine learning, Braille is detected by image recognition technology, classified and identified. Error corrections are then applied. Finally, the machine-readable character codes are stored.

Shimomura and colleagues studied Braille 35 years ago. In recent years, Braille books have been converted into electronic data, stored, and printed by computer processing. However, earlier books have not been converted into electronic data and remain only as Braille books. For some Braille books the original text cannot be found or does not exist. The Japan Braille Library and the Ishikawa Braille Library asked us to convert Braille books into digital data before they degrade and become unreadable. Responding to this request, we resumed our research. We want to restore degraded Braille books quickly.