
1 Introduction

To achieve high recognition accuracy and system efficiency in online handwritten Chinese character recognition (OHCCR), many features and classifiers have been proposed [1], and satisfactory experimental results have been obtained on existing datasets [2, 3]. Recent vision sensors are capable of capturing 3D finger positions and movements. To offer a more natural writing experience, the concept of in-air writing has been proposed and several writing-in-the-air systems [4–6] have been developed, extending writing behavior into 3D space.

Feng et al. [4] proposed a finger-writing character recognition system based on the Kinect sensor. Using depth information and clustering algorithms, the fingertip is located and its trajectory is captured. This approach achieved high tracking accuracy on a dataset of digits and some Chinese characters. In [5], the algorithms for fingertip detection and tracking are further improved. Jin et al. [6] proposed a digit string recognition method, in which the trajectory captured by Kinect is first over-segmented and then recognized by a path-searching algorithm.

The Leap Motion Controller is a new generation of 3D interaction sensor focused on interaction by human hands. It accurately tracks the movement of hands and fingertips in three-dimensional space [7] and provides application programming interfaces (APIs) for interaction. The Leap Motion Controller has been applied in many fields [8–10], and these applications demonstrate its high performance and practical value. In our work, we use the Leap Motion Controller to provide precise, real-time fingertip positions in its 3D workspace. Writing with the Leap Motion Controller is very user-friendly owing to its excellent fingertip detection. In the proposed system, users can write a Chinese character in the air by moving their fingers quickly and fluently.

Compared with traditional OHCCR, in-air handwritten Chinese character recognition (IAHCCR) is technically more difficult for two reasons. First, in-air writing tends to be casual, which can result in great variation and distortion of character structure. Second, there is no pen-up or pen-down information when writing in the air, because the whole character is written as one single stroke. Examples from the handwritten SCUT-COUCH2009 dataset [2] and some in-air samples from our IAHCC-UCAS2014 dataset are shown in Fig. 1.

To overcome the challenges in in-air handwritten Chinese character recognition, a more robust feature is needed. We exploit the 8-directional feature [11], which is widely used in online handwritten Chinese character recognition, since the two problems have much in common. The 8-directional feature reflects the writing direction of the input Chinese character and is relatively robust. In [12], a similar directional feature is introduced, and in [13] the 8-directional feature is improved. In [14, 15] the direction-change feature, which reflects the direction variation during the writing process, is proposed and combined with the 4-directional feature. In this paper, we combine the 8-directional feature with the direction-change feature in our recognition system for IAHCCR.

Fig. 1. Examples of handwritten and in-air handwritten Chinese characters

The rest of this paper is organized as follows. First, we introduce how the writing trajectory is captured using the Leap Motion Controller. Second, we describe the combined feature built from the 8-directional feature and the direction-change feature. Third, the framework of our recognition system is introduced. Finally, we test the proposed feature on our IAHCC-UCAS2014 dataset and compare its performance with the 8-directional feature and the original direction-change feature.

2 Writing Trajectory Capturing

In our system, users can move their fingers casually in a customized 3D space above the Leap Motion Controller. Compared with the Kinect sensor, the Leap Motion Controller tracks fingertips with higher accuracy, so the proposed method allows users to write with just their fingertips and little body movement. Owing to the real-time performance of the Leap Motion Controller, users can also write relatively fast and naturally, which is hard to achieve in Kinect-based systems. The writing process with our system is shown in Fig. 2, where the user is writing the Chinese character “Shi”.

Fig. 2. A user writing the Chinese character “Shi” using our writing-in-the-air system

The 3D writing trajectory is captured by tracing the movement of the writing fingertip through the APIs of the Leap Motion Controller. The 2D writing trajectory is then obtained by projecting the 3D trajectory onto a screen plane. By adjusting the parameters of the Leap Motion Controller, the stability and accuracy of the writing trajectory can be guaranteed.
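As a minimal illustration of this projection step (in Python, for illustration; the function name `project_to_plane` is ours, not part of the Leap SDK), the sketch below assumes the 3D fingertip samples have already been read from the API as (x, y, z) triples and uses one simple choice of projection: discarding the depth axis, since in the Leap Motion frame x is horizontal, y is vertical, and z points toward the user.

```python
import numpy as np

def project_to_plane(points_3d):
    """Project a 3D fingertip trajectory onto a 2D screen plane.

    points_3d: (N, 3) array of (x, y, z) fingertip samples.
    Dropping z projects onto a plane parallel to the screen.
    """
    points_3d = np.asarray(points_3d, dtype=float)
    return points_3d[:, :2]  # keep (x, y), discard depth
```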

In practice, we find that the detection accuracy of the Leap Motion Controller is so high that a slight shake of the fingertip causes visible jitter in the 2D trajectory. We therefore apply the classic Kalman filter to smooth the 2D tracking trajectory and reduce the distortion caused by such shakes.
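A minimal sketch of such a smoother follows, assuming a constant-velocity state model; the paper does not specify its Kalman parameters, so the noise scales below are illustrative assumptions.

```python
import numpy as np

def kalman_smooth(traj, q=1e-3, r=1.0):
    """Smooth a 2D trajectory with a constant-velocity Kalman filter.

    traj: (N, 2) array of raw (x, y) samples.
    q, r: process / measurement noise scales (illustrative values).
    State is [x, y, vx, vy]; only positions are observed.
    """
    traj = np.asarray(traj, dtype=float)
    F = np.eye(4); F[0, 2] = F[1, 3] = 1.0         # constant-velocity transition
    H = np.zeros((2, 4)); H[0, 0] = H[1, 1] = 1.0  # observe positions only
    Q = q * np.eye(4)                              # process noise
    R = r * np.eye(2)                              # measurement noise
    x = np.array([traj[0, 0], traj[0, 1], 0.0, 0.0])
    P = np.eye(4)
    out = [traj[0]]
    for z in traj[1:]:
        x = F @ x; P = F @ P @ F.T + Q             # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
        x = x + K @ (z - H @ x)                    # update with measurement
        P = (np.eye(4) - K @ H) @ P
        out.append(x[:2].copy())
    return np.array(out)
```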

In OHCCR, an imaginary stroke refers to the virtual straight line between the end point of one stroke and the start point of the next. In our writing-in-the-air system, each character is written as one single stroke, so the imaginary strokes are already present for recognition. It should also be noted that while sampling points are usually dense in OHCCR, in our system they can be sparse, since some users write very fast. We therefore join consecutive sampling points using Bresenham's line algorithm to construct the final 2D trajectory, as sketched below.
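The following sketch shows this densification step; the helper names are ours, and the trajectory is assumed to consist of integer pixel coordinates after normalization.

```python
def bresenham(p0, p1):
    """Integer points on the segment from p0 to p1 (standard Bresenham)."""
    x0, y0 = p0; x1, y1 = p1
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x1 > x0 else -1
    sy = 1 if y1 > y0 else -1
    err = dx - dy
    points = []
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 > -dy:
            err -= dy; x0 += sx
        if e2 < dx:
            err += dx; y0 += sy
    return points

def densify(samples):
    """Join consecutive sparse samples into a connected 2D trajectory."""
    traj = []
    for a, b in zip(samples, samples[1:]):
        traj.extend(bresenham(a, b)[:-1])  # drop the shared endpoint
    traj.append(tuple(samples[-1]))
    return traj
```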

3 Combined Directional Feature

3.1 8-Directional Feature

After several pre-processing steps, each sample is normalized to a fixed size of \(64 \times 64\), and then the 8-directional feature is extracted. Concretely, for a given point \(P_j = (x_j,y_j)\) in the sequence of sampling points \(P_j, j=1,2,\cdots \), its direction vector \(\varvec{V}_j\) is defined as follows:

$$\begin{aligned} \varvec{V}_j = {\left\{ \begin{array}{ll} \overrightarrow{P_jP_{j+1}} &{} \text {if } P_j \text { is a start point}\\ \overrightarrow{P_{j-1}P_{j+1}} &{} \text {if } P_j \text { is a non-end point}\\ \overrightarrow{P_{j-1}P_{j}} &{} \text {if } P_j \text { is an end point} \end{array}\right. } \end{aligned}$$
(1)
Fig. 3. Axes and mapping example of the direction vector

Then the normalized vector \({\varvec{V}_j}/\Vert {\varvec{V}_j}\Vert \) is decomposed into two of the eight directions shown in Fig. 3(a): one from the direction set \(\{D1,D3,D5,D7\}\), denoted by \(d_j^1\), and the other from the set \(\{D2,D4,D6,D8\}\), denoted by \(d_j^2\). Figure 3(b) shows an example, where \(d_j^1=D1\) and \(d_j^2=D8\) for the highlighted sampling point. The corresponding mapping values \(a_j^1\) and \(a_j^2\) for directions \(d_j^1\) and \(d_j^2\) are computed by

$$\begin{aligned} \begin{aligned} a_j^1 = \frac{|d_x-d_y|}{s},\\ a_j^2 = \frac{\sqrt{2}\cdot \min (d_x,d_y)}{s}, \end{aligned} \end{aligned}$$
(2)

where \(d_x = |{x_{j+1}-x_{j-1}}|\), \(d_y = |{y_{j+1}-y_{j-1}}|\), and \(s=\sqrt{d_x^2 + d_y^2}\) for a non-end point. Then eight directional pattern images \(\{B_d = [f_d(x,y)], x, y=1,\cdots ,64, d =D1,\cdots , D8\}\) are generated by setting \(f_{d_j^1}(x_j,y_j)=a_j^1\) and \(f_{d_j^2}(x_j,y_j)=a_j^2\); all remaining values of \(f_d(x,y)\) are set to 0. The eight directional pattern images are thickened by a maximum filter and then smoothed by a Gaussian filter \(G(x,y) = \frac{4}{\lambda ^2}\exp [-\frac{2(x^2+y^2)}{\lambda ^2}]\), where \(\lambda \) is the wavelength of the plane wave of the original Gabor filter.

Finally, each directional pattern image is divided uniformly into \(8\times 8\) grids. The values in each grid are summed up to give one feature value. Since there are 8 images and each image has 64 grids, we obtain an \(8\times 64 = 512\)-dimensional feature vector. A nonlinear transformation (the square root function) is applied to form the final 8-directional feature vector. The whole extraction procedure is sketched below.
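The following condensed sketch reflects our reading of Eqs. (1) and (2). It assumes D1 points east with D1–D8 arranged counterclockwise as in Fig. 3(a), and the maximum-filter size and Gaussian sigma are illustrative stand-ins for the Gabor-derived settings above.

```python
import numpy as np
from scipy.ndimage import maximum_filter, gaussian_filter

def eight_directional_feature(points, size=64, grid=8):
    """Sketch of 8-directional feature extraction for one character.

    points: (N, 2) trajectory of integer (x, y) coordinates in [0, size),
    with N >= 2.  Returns a grid * grid * 8 = 512-dimensional vector.
    """
    pts = np.asarray(points, dtype=float)
    B = np.zeros((8, size, size))  # pattern images; index 0..7 <-> D1..D8
    for j in range(len(pts)):
        # direction vector V_j as in Eq. (1)
        if j == 0:
            v = pts[1] - pts[0]
        elif j == len(pts) - 1:
            v = pts[-1] - pts[-2]
        else:
            v = pts[j + 1] - pts[j - 1]
        dx, dy = abs(v[0]), abs(v[1])
        s = np.hypot(dx, dy)
        if s == 0:
            continue
        a1 = abs(dx - dy) / s                 # weight of the axis direction
        a2 = np.sqrt(2.0) * min(dx, dy) / s   # weight of the diagonal direction
        sx = 1 if v[0] >= 0 else -1
        sy = 1 if v[1] >= 0 else -1
        # nearest axis among D1 (east), D3 (north), D5 (west), D7 (south)
        axis = (0 if sx > 0 else 4) if dx >= dy else (2 if sy > 0 else 6)
        # quadrant diagonal among D2 (NE), D4 (NW), D6 (SW), D8 (SE)
        diag = {(1, 1): 1, (-1, 1): 3, (-1, -1): 5, (1, -1): 7}[(sx, sy)]
        x, y = int(pts[j, 0]), int(pts[j, 1])
        B[axis, y, x] = a1
        B[diag, y, x] = a2
    for d in range(8):
        B[d] = maximum_filter(B[d], size=3)      # thicken the strokes
        B[d] = gaussian_filter(B[d], sigma=2.0)  # smooth each pattern image
    cell = size // grid
    # sum over each 8x8 grid cell, then apply the square-root transform
    feat = B.reshape(8, grid, cell, grid, cell).sum(axis=(2, 4)).ravel()
    return np.sqrt(feat)
```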

3.2 Direction-Change Feature

The direction-change feature captures, from the normalized online data, the degree of direction change at each sampling point and the direction after the change. For each sampling point \(P_j\), the direction-change degree is measured by the absolute difference in direction between the vector \(\overrightarrow{P_{j-1}P_{j}}\) and the next vector \(\overrightarrow{P_{j}P_{j+1}}\). The direction-change degree, denoted \(F_{dc}\), is calculated by

$$\begin{aligned} F_{dc} = \frac{|D\theta |}{60^{\circ }} + 1 \end{aligned}$$
(3)

where \(D\theta \) \((-180^\circ \le D\theta \le 180^\circ )\) is the angle of the direction change from \(\overrightarrow{P_{j-1}P_{j}}\) to \(\overrightarrow{P_{j}P_{j+1}}\), so \(F_{dc}\) takes values in \([1, 4]\).

As with the 8-directional feature, the \(F_{dc}\) of each sampling point is mapped to the eight directions of Fig. 3(a). However, in the direction-change feature each \(F_{dc}\) is mapped to only one direction \(d_j^m\) from \(\{D1,D2,\ldots , D8\}\), namely the direction corresponding to the larger of \(a_j^1\) and \(a_j^2\). Similarly, 8 direction-change pattern images \(\{\dot{B_d}=[\dot{f_d}(x,y)],x,y=1,2,\ldots ,64,d=D1,\ldots, D8\}\) are generated by setting \(\dot{f}_{d_j^m}(x_j,y_j)=\max (a_j^1,a_j^2)\) and the remaining values to 0. The same computation as before is then carried out on the direction-change pattern images to obtain the 512-dimensional direction-change feature vector.
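As a small worked detail, the direction-change degree of Eq. (3) can be computed as follows (a sketch with our own helper name); the wrapping of the angle difference to \([-180^\circ, 180^\circ]\) is the part that is easy to get wrong.

```python
import numpy as np

def direction_change_degree(p_prev, p, p_next):
    """F_dc of Eq. (3): turn magnitude at p, mapped to [1, 4]."""
    v1 = np.asarray(p) - np.asarray(p_prev)
    v2 = np.asarray(p_next) - np.asarray(p)
    a1 = np.degrees(np.arctan2(v1[1], v1[0]))
    a2 = np.degrees(np.arctan2(v2[1], v2[0]))
    d_theta = (a2 - a1 + 180.0) % 360.0 - 180.0  # wrap to [-180, 180]
    return abs(d_theta) / 60.0 + 1.0
```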

We concatenate this feature with the 8-directional feature to form the 1024-dimensional combined feature vector. Figure 4 shows the 16 pattern images extracted from the Chinese character “Shi”, where the first row contains the eight directional pattern images and the second row the direction-change pattern images.

Fig. 4. Examples of pattern images for the directional and direction-change features

4 Framework of Our Recognition System

We have implemented a recognition system for in-air handwritten characters. The system consists of the following three stages.

(1) Pre-processing. A series of pre-processing steps reduces noise and normalizes the trajectory shapes of input characters to make the samples easier to recognize. First, we normalize the X- and Y-coordinates of the sampling points to a fixed size of \(64\times 64\) by linear mapping. Then the coordinates of each sampling point are smoothed by averaging over its neighbors, and redundant points are removed so that only one point is left at each position of the trajectory. Further, we exploit the dot-density shape normalization method [16] to adjust the trajectory shape of the input Chinese character. Finally, a re-sampling step generates a sequence of equidistant points.

(2) Feature Extraction. After pre-processing, we extract the 1024-dimensional combined feature. The computational details were presented in Subsects. 3.1 and 3.2.

(3) Two-Level Classification. To make classification more efficient, we use Linear Discriminant Analysis (LDA) to learn a projection onto a low-dimensional subspace in which the prototypes are more separable. Let \(S_W\) and \(S_B\) denote the within-class and between-class scatter matrices, respectively, and let \(\mathbf w \) denote the optimal projection (discriminant vectors). We estimate \(\mathbf w \) by maximizing the Fisher criterion:

    $$\begin{aligned} J(\mathbf w ) = \mathrm {tr}\left( (\mathbf w ^TS_W\mathbf w )^{-1} (\mathbf w ^TS_B\mathbf w ) \right) \end{aligned}$$
    (4)

    where \(\mathrm {tr}(\cdot )\) denotes the trace of a matrix. This criterion balances the within-class and between-class scatter and helps to make the data separable in the projected subspace. It can be shown that the columns of \(\mathbf w \) solve the generalized eigenvalue problem \(S_B w_i = \lambda _i S_W w_i, i = 1,2,\ldots \), where \(w_i\) denotes the eigenvector for the ith eigenvalue \(\lambda _i\). LDA thus reduces the dimension of the feature space while keeping different classes separated, and the reduced dimension also lowers the computational cost of the subsequent training process; a compact sketch of this computation is given after the list.
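The sketch below shows one standard way to obtain such a projection from training data. The ridge regularization constant is our own addition to keep \(S_W\) invertible and is not from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(X, y, dim):
    """Solve S_B w = lambda S_W w and keep the top `dim` eigenvectors.

    X: (n_samples, n_features) training features; y: class labels.
    Returns W of shape (n_features, dim) so that X @ W is the
    low-dimensional representation.
    """
    X = np.asarray(X, dtype=float); y = np.asarray(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d)); S_B = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_W += (Xc - mc).T @ (Xc - mc)           # within-class scatter
        diff = (mc - mean_all)[:, None]
        S_B += len(Xc) * (diff @ diff.T)         # between-class scatter
    S_W += 1e-4 * np.trace(S_W) / d * np.eye(d)  # ridge: keep S_W invertible
    # generalized symmetric eigenproblem; eigh returns ascending eigenvalues
    w, V = eigh(S_B, S_W)
    return V[:, ::-1][:, :dim]                   # top `dim` discriminants
```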

In our system, we design a two-level classifier to achieve both accuracy and efficiency. The classifier is based on the Nearest Prototype Classifier (NPC) rule: each unknown pattern is labeled with the class of its nearest prototype, with the Euclidean distance as the metric between samples and prototypes. The first-level classifier is a coarse classifier that removes most of the unlikely candidate classes at low computational cost: the combined feature vector is projected to a 20-dimensional subspace, and the nearest 450 prototypes are retained. In the second level, the combined feature vector is projected to a 160-dimensional subspace, and distances are computed between the testing sample and the prototypes retained by the first level. Finally, we sort these prototypes by distance and generate the candidate label list, as sketched below.
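A sketch of this coarse-to-fine search follows, with the settings above (450 retained prototypes, 20- and 160-dimensional subspaces) as default arguments; the function and variable names are ours.

```python
import numpy as np

def two_level_classify(x, prototypes, labels, W_coarse, W_fine,
                       keep=450, top_k=10):
    """Coarse-to-fine nearest-prototype classification (sketch).

    x: combined feature vector (D,); prototypes: (P, D) matrix;
    labels: class label of each prototype;
    W_coarse / W_fine: LDA projections to 20 and 160 dimensions.
    """
    # level 1: cheap 20-D distances prune the candidates to `keep`
    d1 = np.linalg.norm(prototypes @ W_coarse - x @ W_coarse, axis=1)
    cand = np.argsort(d1)[:keep]
    # level 2: 160-D distances over the survivors only
    d2 = np.linalg.norm(prototypes[cand] @ W_fine - x @ W_fine, axis=1)
    order = cand[np.argsort(d2)]
    return [labels[i] for i in order[:top_k]]  # candidate label list
```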

5 Experimental Results

We evaluate the performance of the proposed feature on the IAHCC-UCAS2014 dataset, which we constructed ourselves since no related dataset is publicly available. The dataset includes 3755 classes of Chinese characters, with 65 samples per class. The 3755 classes cover all Chinese characters in the GB2312-80 level-1 set, which makes our dataset challenging, since other works [4, 5] cover only a limited number of Chinese character classes. Some samples from our dataset are shown in Fig. 5. As described in the previous section, it can be seen from Fig. 5 that IAHCCR is technically difficult due to the great variation in character structure.

Fig. 5. Some in-air handwritten Chinese characters from our IAHCC-UCAS2014 dataset

To evaluate classification performance, the recognition accuracy on the testing data is of primary interest. We compare our recognition results with two other features: the 8-directional feature of [11] and the direction-change feature of [14]. The recognition accuracy is calculated by

$$\begin{aligned} R_{k} = N_k/N \end{aligned}$$
(5)

where \(R_{k}\) denotes the top-k recognition accuracy of the system. For each testing sample, the two-level classifier generates a candidate label list. The top-k metric checks whether the correct label is included in the top k labels of this list; \(N_k\) denotes the number of testing samples whose labels appear among the top-k candidates, while N denotes the total number of testing samples. We compare the top-1, top-5, and top-10 accuracies, which are widely used to compare Chinese character recognition performance; the metric can be computed as sketched below. In our experiments, we randomly select 10 samples from each class as testing samples and use the remaining samples for training. The experimental results are summarized in Table 1.
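For concreteness, Eq. (5) amounts to the following computation (a sketch with hypothetical names):

```python
def topk_accuracy(candidate_lists, true_labels, k):
    """R_k of Eq. (5): fraction of samples whose true label appears
    among the first k entries of the candidate label list."""
    hits = sum(1 for cands, t in zip(candidate_lists, true_labels)
               if t in cands[:k])
    return hits / len(true_labels)
```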

It can be seen from the table that the proposed feature obtains better performance than the other two features on the top-1/5/10 metrics. It is also worth noting that the same recognition framework achieves \(90.6\,\%\) top-1 accuracy on the OHCCR task using the SCUT-COUCH2009 dataset [2]. The relatively low accuracy on the IAHCC-UCAS2014 dataset indicates that IAHCCR is very challenging and needs more research effort in the future.

Table 1. Recognition accuracy comparison of three features on our in-air handwritten Chinese character dataset

Compared with the 8-directional feature alone, the proposed method appears to incur additional computational cost because it combines two features. In practice, we apply the same pre-processing steps and direction vector extraction to both features in our recognition system, so this cost is reduced. We compare the time consumption (in milliseconds) of the feature extraction step and the recognition step. The experiments are performed on a desktop computer with a 2.40 GHz CPU, and the recognition system is implemented in MATLAB. It can be seen from Table 2 that our combined feature adds little time consumption. Given the application context of IAHCCR, this increase is negligible.

Table 2. Comparison of time consumption (in milliseconds) with the 8-directional feature

6 Conclusions

In this paper, we present a novel HCI interface for writing interaction: using the Leap Motion Controller, writing can be conducted in 3D space in a natural and user-friendly way. We then propose a combined feature based on the 8-directional feature and the direction-change feature, and apply it to our in-air handwritten Chinese character recognition system. The performance of the combined feature is evaluated on our IAHCC-UCAS2014 dataset, and the experimental results show that it achieves better performance with reasonable computational cost.