1 Introduction

In recent years, a growing number of biometric technologies have emerged for identifying and verifying individuals by analyzing the fingerprint [1], face [2], iris [3], etc., or a combination of these traits [4]. Compared with other biometric methods, gait recognition offers several unique characteristics: it is non-invasive, hard to disguise or hide, and can be captured at a distance without requiring physical information from the subjects [5]. In addition, gait recognition offers great potential for recognition in low-resolution videos, where other biometric technologies may fail because there are too few pixels to identify the human subjects [6]. At present, the domestic stability of many nations is under growing threat from extremist attacks. In this context, the most attractive characteristic of gait, its unobtrusiveness, which requires neither the attention nor the cooperation of the observed subjects, is particularly important.

Generally, from the perspective of computer vision, gait recognition methods can be divided into two major categories: model-based methods and model-free methods [5]. Model-based methods [7–10] obtain a series of static or dynamic body parameters by modeling or tracking body components such as limbs, legs, arms and thighs. Gait signatures derived from these model parameters are then used to identify and recognize an individual. Yoo et al. [7] extract nine coordinates from the human body contour based on human anatomical knowledge to construct a 2D stick figure. Tafazzoli et al. [8] also present a model-based approach for human gait recognition, in which the model is created from anatomical proportions and a posterior model is constructed from the movements of the articulated parts of the body, using active contour models and the Hough transform. However, model-based approaches are sensitive to the quality of the gait sequences, so high-quality gait image sequences are required to achieve high accuracy. In addition, modeling human body structure and motion remains an unsolved problem in computer vision, and fitting the model parameters incurs a large computational cost and a long running time.

In contrast, model-free methods [11–15, 19, 20] focus on either the shapes of silhouettes or the whole motion of the human body, rather than modeling the whole body or any of its parts. They extract features directly from the binary image of the gait contour with lower computational complexity, and they are not affected by the color and texture of clothing. Furthermore, compared with model-based methods, model-free methods are insensitive to the quality of silhouettes and have low computational cost, so they are more suitable for real-life use. For instance, Han et al. [11] build on the idea of the Motion Energy Image (MEI) and put forward the Gait Energy Image (GEI) for individual recognition. GEI converts the spatio-temporal information of one walking cycle into a single 2D gait template [12], which avoids matching features in temporal sequences. Because it averages the images of a gait cycle, GEI is comparatively robust to noise; however, it loses the dynamic variation between successive frames. Lam et al. [13] adopt the Horn-Schunck (HS) optical flow method to calculate the optical flow field between two neighbouring silhouette frames in a period and construct the Gait Flow Image (GFI), which can represent the dynamic characteristics of gait. However, the long computation time of HS leads to poor real-time performance. Roy et al. [14] propose a gait recognition algorithm that fuses gait characteristics with a spatiotemporal model. Although it achieves a high recognition rate, the algorithm is complex and requires multiple fixed cameras, so it is strongly influenced by the viewpoints used.

This paper aims to establish an automatic human identification and recognition algorithm based on motion and spatiotemporal information. Firstly, following Wang et al. [15], we unwrap the outer contour counterclockwise to turn it into a distance signal composed of the distances between each boundary pixel and the centroid. Secondly, we apply ELDA to reduce the dimension and obtain the feature space. Finally, we accomplish the recognition task with a multi-class SVM classifier. Experimental results show that, compared with existing methods, our algorithm achieves a higher recognition rate, lower computational cost and faster processing speed, and is less affected by the environment.

The rest of this paper is organized as follows. Section 2 introduces the feature representation. Section 3 presents the existing problem of LDA and then details the proposed algorithm. Experimental evaluation and results are reported in Sect. 4. Section 5 concludes the paper.

2 Feature representation

Once a person has been tracked for a certain number of frames, his or her spatiotemporal gait parameters (e.g. the aspect ratio and the size of the closed area between the legs) can be estimated. It is therefore necessary to extract the target silhouette in order to obtain effective spatiotemporal gait parameters. Our feature extraction process is outlined as follows: (1) constructing the background from a small portion of the image sequence using the LMedS (Least Median of Squares) method [16]; (2) filling the holes in the silhouettes and removing excess random noise in the image by morphological processing [17]; (3) extracting a complete target silhouette from each frame by single connected component analysis [17]. Feature representation is carried out after all of the pedestrian's silhouettes in a single gait cycle have been extracted.

2.1 Period detection

In this paper, the walking direction is always perpendicular to the camera axis, with the camera position fixed. If a person's gait is frontal, the dual-ellipse fitting approach [18] can be used. Commonly, the size of the closed area between the legs and the aspect ratio vary periodically when pedestrians walk laterally. Therefore, this paper uses two period detection methods based on these quantities. Figure 1 shows the aspect ratio of a pedestrian's silhouette in (a) and the size of the closed area between the legs in (b).

Fig. 1 Gait period analysis: (a) the aspect ratio of moving silhouettes, (b) the size of closed area between legs, (c) period detection of aspect ratio, (d) period detection of closed area between legs

The signals in the upper row of Fig. 1c, d are the periodic variations used to detect the period of the silhouette aspect ratio and of the size of the closed area between the legs. We remove the background component of each signal by subtracting its mean and dividing by its standard deviation, and then smooth it with a symmetric average filter. As can be seen from the figures, both the aspect ratio and the size of the closed area change periodically. One advantage of these features is that they are not constrained by the observation angle. Next, we compute the autocorrelation of the signal to find peaks. We then compute its first-order derivative and locate the peak positions by seeking the positive-to-negative zero-crossing points, which are shown in the second row of Fig. 1c, d. Due to the bilateral symmetry of human gait, the curves sometimes have minor peaks halfway between each pair of major peaks. Hence, we estimate the real period as the average distance between each pair of consecutive major peaks. This process has proved computationally feasible on our background subtraction results.
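The period detection steps described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `estimate_gait_period`, the filter width `smooth_width`, and the rule used to separate major from minor peaks (`major_ratio`) are assumptions, since the text does not give these details.

```python
import numpy as np

def estimate_gait_period(signal, smooth_width=5, major_ratio=0.5):
    """Estimate the gait period, in frames, of a 1-D silhouette signal.

    smooth_width and major_ratio are illustrative assumptions, not values
    taken from the paper.
    """
    x = np.asarray(signal, dtype=float)
    # Remove the background component: zero mean, unit standard deviation.
    x = (x - x.mean()) / x.std()
    # Smooth with a symmetric moving-average filter.
    x = np.convolve(x, np.ones(smooth_width) / smooth_width, mode="same")
    # Autocorrelation for lags 0 .. N/2 - 1 (longer lags overlap too little).
    ac = np.correlate(x, x, mode="full")[len(x) - 1:][: len(x) // 2]
    # First-order derivative; a peak is a positive-to-negative zero crossing.
    d = np.diff(ac)
    peaks = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    if peaks.size == 0:
        return None
    # Major peaks: at least major_ratio times the strongest peak. The lower
    # minor peaks caused by left/right symmetry are discarded.
    major = peaks[ac[peaks] >= major_ratio * ac[peaks].max()]
    if major.size == 0:
        return None
    # Average spacing between consecutive major peaks; lag 0 is the first one.
    return float(np.mean(np.diff(np.concatenate(([0], major)))))
```

The input would be the per-frame aspect ratio or the per-frame closed area between the legs; the function returns `None` when no usable peak is found.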

2.2 Gait feature description

The distance between the border and the centroid is used in this paper to extract the gait feature, since it performs best in terms of computational expense and robustness to noise [21]. After a period of the gait sequence has been obtained by tracking, the centroid coordinates \(\left( {x_{c} ,y_{c} } \right)\) of the silhouette are computed by:

$$x_{c} = \frac{1}{N}\sum\limits_{i = 1}^{N} {x_{i} } \quad y_{c} = \frac{1}{N}\sum\limits_{i = 1}^{N} {y_{i} }$$
(1)

where \(\left( {x_{i} ,y_{i} } \right)\) is the \(i\)th border point of the silhouette and \(N\) is the total number of border points of the silhouette. The distance between each border point and the centroid is computed by:

$$d_{i} = \sqrt {(x_{i} - x_{c} )^{2} + (y_{i} - y_{c} )^{2} }$$
(2)

All of these distances compose the distance vector \({\mathbf{R}} = (d_{1} ,d_{2} , \ldots ,d_{N} )\).

To eliminate the influence of spatial scale and signal length, we normalize these distance signals with respect to magnitude and size.

$${\mathbf{R}}' = \frac{{\mathbf{R}}}{{\max (d_{1} ,d_{2} , \ldots ,d_{N} )}}$$
(3)

Firstly, we normalize the signal magnitude through the L∞-norm, i.e. division by its maximum value as in Eq. (3). Then, equally spaced re-sampling is used to normalize its size to a fixed length (260 for CASIA and 700 for SOTON). A complete distance signal over a single gait period, obtained according to [15], is shown in Fig. 2.
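Equations (1)–(3) and the re-sampling step can be illustrated with the following sketch. It assumes the boundary points are already ordered counterclockwise, and it uses linear interpolation for the equally spaced re-sampling, which is an assumption because the interpolation scheme is not specified in the text; the name `distance_signal` is hypothetical.

```python
import numpy as np

def distance_signal(boundary, length=260):
    """Turn an ordered (N, 2) array of boundary points into a fixed-length,
    magnitude-normalized centroid-distance signal."""
    pts = np.asarray(boundary, dtype=float)
    centroid = pts.mean(axis=0)                    # Eq. (1): (x_c, y_c)
    d = np.linalg.norm(pts - centroid, axis=1)     # Eq. (2): d_i
    d = d / d.max()                                # Eq. (3): divide by max d_i
    # Equally spaced re-sampling to a fixed length (linear interpolation is
    # an assumption made for this sketch).
    src = np.linspace(0.0, 1.0, num=len(d))
    dst = np.linspace(0.0, 1.0, num=length)
    return np.interp(dst, src, d)
```

With `length=260` the output matches the signal length quoted for CASIA; `length=700` would correspond to SOTON.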

Fig. 2 Normalized distance signals

In Fig. 2, the positions where the signal fluctuates most reflect the pedestrian's gait motion. Therefore, counterclockwise unwrapping is more suitable for feature selection in this paper than other gait description methods. However, the dimension of the feature space is high and needs to be reduced; otherwise, the curse of dimensionality will slow down the computation [22].

3 Gait recognition based on ELDA

In the training process, we first have to reduce the dimension of the feature space. Dimension reduction aims to project the data from a high-dimensional space into a low-dimensional one. Widely used dimension reduction methods include Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), among others. As an unsupervised algorithm, PCA calculates the covariance matrix of the data and obtains a better coordinate system for describing the characteristics of the data. Although this makes the data more convenient to represent, classification might become more difficult, since the resulting features are independent of each other [23]. Besides, the final recognition performance of unsupervised algorithms is often below that of supervised dimension reduction methods, because no category information about the original data is used. In contrast to unsupervised algorithms like PCA, LDA not only reduces the dimension but also preserves, as far as possible, the separation of the data according to their categories [24]. The main idea of LDA is to project the data into a lower dimension in which the deviation within each class is minimized and the deviation between classes is maximized at the same time.

3.1 Linear discriminant analysis (LDA)

LDA extracts the low-dimensional features with the strongest discriminant ability from the high-dimensional feature space. These features bring samples of the same category together and separate samples of different categories as far as possible. That is, LDA chooses the features that maximize the ratio of between-class scatter to within-class scatter.

Consider a data set \({\mathbf{X}}\), which consists of \(m\) measurements \({\mathbf{x}}_{1} ,{\mathbf{x}}_{2} ,{\mathbf{x}}_{3} , \ldots ,{\mathbf{x}}_{m}\) in a high-dimensional space \({\mathbf{R}}^{n}\). Let \(n_{i}\) denote the number of gait cycles of class \(i\), with \(c\) classes in total in the sample space. The total number of samples is \(N = n_{1} + n_{2} + \cdots + n_{c}\). \({\mathbf{u}}_{i}\) denotes the mean vector of class \(i\) and \({\mathbf{u}}\) denotes the mean of the entire data. \(P(i)\) is the prior probability of class \(i\). Then the within-class scatter \({\mathbf{S}}_{w}\) and the between-class scatter \({\mathbf{S}}_{b}\) are defined as follows:

$${\mathbf{S}}_{w} = \sum\limits_{i = 1}^{c} {P(i)E\{ ({\mathbf{u}}_{i} - {\mathbf{x}})({\mathbf{u}}_{i} - {\mathbf{x}})^{T} \,|\,{\mathbf{x}} \in \text{class }i\} }$$
(4)
$${\mathbf{S}}_{b} = \sum\limits_{i = 1}^{c} {P(i)({\mathbf{u}}_{i} - {\mathbf{u}})({\mathbf{u}}_{i} - {\mathbf{u}})^{T} }$$
(5)

The optimal projection direction minimizes the deviation within each class and maximizes the deviation between classes at the same time. In other words, \({\mathbf{S}}_{w}\) should be as small as possible and \({\mathbf{S}}_{b}\) as large as possible. Hence, the optimal projection space \({\mathbf{W}}_{opt}\) is obtained from the Fisher discriminant criterion:

$$J_{\text{Fisher}} (\varphi ) = \frac{{\varphi^{T} {\mathbf{S}}_{b} \varphi }}{{\varphi^{T} {\mathbf{S}}_{w} \varphi }}$$
(6)

where \(\varphi\) is a column vector. Fisher linear discriminant analysis selects the vector \(\varphi\) that maximizes \(J_{\text{Fisher}} (\varphi )\) as the projection direction; its physical meaning is that the between-class scatter is maximized while the within-class scatter is minimized. Maximizing Eq. (6) leads to the generalized eigenvalue equation \({\mathbf{S}}_{b} \varphi = \lambda {\mathbf{S}}_{w} \varphi\), and \({\mathbf{W}}_{opt}\) consists of the first \(d\,(d < c)\) eigenvectors, which correspond to large changes in the training patterns.
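As a rough illustration of Eqs. (4)–(6), the sketch below builds the two scatter matrices and solves the generalized eigenvalue problem with NumPy. Two choices are assumptions made for the sketch rather than details from the paper: the prior \(P(i)\) is estimated as \(n_i / N\), and the eigenproblem is solved through the pseudo-inverse of \({\mathbf{S}}_{w}\) so that a singular within-class scatter does not cause a failure. The function name `lda_projection` is hypothetical.

```python
import numpy as np

def lda_projection(X, y, d):
    """Return an (n_features, d) matrix whose columns are the leading
    eigenvectors of pinv(S_w) @ S_b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_features = X.shape[1]
    u = X.mean(axis=0)                                   # mean of all data
    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    for label in np.unique(y):
        X_i = X[y == label]
        P_i = len(X_i) / len(X)                          # prior n_i / N (assumption)
        u_i = X_i.mean(axis=0)
        centered = X_i - u_i
        S_w += P_i * (centered.T @ centered) / len(X_i)  # Eq. (4)
        S_b += P_i * np.outer(u_i - u, u_i - u)          # Eq. (5)
    # S_b phi = lambda S_w phi, solved via the pseudo-inverse (assumption).
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1][:d]           # largest eigenvalues first
    return eigvecs[:, order].real
```

Projecting the data is then a single matrix product, `X @ lda_projection(X, y, d)`.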

3.2 Generalized LDA based on Euclidean norm

LDA overemphasizes the influence on the projection direction of edge classes, which are already easy to separate from the other classes. Moreover, the definition of the between-class scatter matrix does not prevent neighboring classes, or even overlapping classes, from arising in the projected space. The classes with the largest deviation dominate the principal direction of the projected space. Such a strategy therefore separates only the classes with a large deviation and ignores the classes with a small deviation from the mean, which does not improve the overall separation of the data and leads to undesirable recognition results. As a detailed demonstration, suppose there are four classes in a two-dimensional space, distributed as shown in Fig. 3.

Fig. 3 Projection direction of LDA and ELDA for 4 two-dimensional classes

As can be seen from the illustration, classes 1, 2 and 3 are located close to each other, whereas class 4 is far from them. According to the definition of the LDA algorithm, we obtain projection direction A. However, if we project the samples along this direction, the data from classes 1, 2 and 3 overlap with each other. Such a projection reduces the separation between classes 1, 2 and 3, which may cause confusion in recognition. Since LDA overemphasizes the influence of edge classes, this paper adds distance weights to the definition of the between-class scatter. The projection direction B obtained with ELDA is clearly more appropriate: the data projected along direction B provide the largest entropy and are easier to classify.

The details of the ELDA training process are as follows:

  1. According to the description of Sect. 3.1, the whole training set can be represented by \({\mathbf{D}}_{c} = \{ {\mathbf{d}}_{11} ,{\mathbf{d}}_{12} , \ldots ,{\mathbf{d}}_{{1n_{1} }} ,{\mathbf{d}}_{21} , \ldots ,{\mathbf{d}}_{{2n_{2} }} , \ldots ,{\mathbf{d}}_{c1} , \ldots ,{\mathbf{d}}_{{cn_{c} }} \} .\)

  2. Compute the mean vector of class \(i\) by

    $${\mathbf{u}}_{i} = \frac{1}{{n_{i} }}\sum\limits_{j = 1}^{{n_{i} }} {{\mathbf{d}}_{ij} }$$
    (7)

  3. Obtain the within-class scatter matrix \({\mathbf{S}}_{w}\) and the between-class scatter matrix \({\mathbf{S}}_{b}\):

    $${\mathbf{S}}_{w} = \sum\limits_{i = 1}^{c} {\sum\limits_{j = 1}^{{n_{i} }} {P(i)({\mathbf{d}}_{ij} - {\mathbf{u}}_{i} )({\mathbf{d}}_{ij} - {\mathbf{u}}_{i} )^{T} } }$$
    (8)
    $${\mathbf{S}}_{b} = \sum\limits_{i = 1}^{c - 1} {\sum\limits_{j = i + 1}^{c} {P(i)P(j)\frac{{({\mathbf{u}}_{i} - {\mathbf{u}}_{j} )({\mathbf{u}}_{i} - {\mathbf{u}}_{j} )^{T} }}{{\Delta_{ij} }}} }$$
    (9)
    $$\Delta_{ij} = \left\| {{\mathbf{u}}_{i} - {\mathbf{u}}_{j} } \right\| = \sqrt {\left( {{\mathbf{u}}_{i} - {\mathbf{u}}_{j} } \right)^{T} \left( {{\mathbf{u}}_{i} - {\mathbf{u}}_{j} } \right)}$$
    (10)

    where \(P(i)\) and \(P(j)\) denote the prior probabilities of classes \(i\) and \(j\), and \(\Delta_{ij}\) denotes the Euclidean norm of the difference between the mean vectors of the two classes. The greater the distance between two classes, the easier they are to classify, so class pairs that are far apart tend to dominate the between-class scatter. We therefore introduce the distance between each pair of class means as a weight when calculating the between-class scatter matrix. That is to say, the direction between classes matters rather than the size of the distance.

  4. Compute the first \(k\) nonzero eigenvalues \(\lambda_{1} ,\lambda_{2} , \ldots ,\lambda_{k}\) of \({\mathbf{S}}_{w}^{ - 1} {\mathbf{S}}_{b}\) and their associated eigenvectors to form the eigenspace matrix \({\mathbf{W}}_{opt} = \left[ {{\mathbf{e}}_{1} ,{\mathbf{e}}_{2} , \ldots ,{\mathbf{e}}_{k} } \right]\).

  5. Use the transform matrix \({\mathbf{W}}_{opt}\) to project the original distance signals \({\mathbf{d}}_{ij}\) of class \(i\) into the \(k\)-dimensional eigenspace; \({\mathbf{P}}_{i}\) is the final projected trajectory of the \(i\)th class:

    $${\mathbf{P}}_{i} = \left[ {{\mathbf{d}}_{i1} ,{\mathbf{d}}_{i2} , \ldots ,{\mathbf{d}}_{{in_{i} }} } \right]^{T} {\mathbf{W}}_{opt}$$
    (11)

Accordingly, a sequential movement of gait can be mapped into a manifold trajectory in such a parametric eigenspace.
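The ELDA training steps (Eqs. 7–11) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names are hypothetical, the prior \(P(i)\) is estimated as \(n_i / N\), the pseudo-inverse replaces \({\mathbf{S}}_{w}^{-1}\) for numerical safety, and class pairs with coincident means are skipped to avoid division by zero.

```python
import numpy as np

def elda_between_scatter(means, priors):
    """Distance-weighted between-class scatter, Eqs. (9) and (10)."""
    means = np.asarray(means, dtype=float)
    c, n = means.shape
    S_b = np.zeros((n, n))
    for i in range(c - 1):
        for j in range(i + 1, c):
            diff = means[i] - means[j]
            delta = np.linalg.norm(diff)              # Eq. (10)
            if delta > 0:                             # guard (assumption)
                S_b += priors[i] * priors[j] * np.outer(diff, diff) / delta
    return S_b

def elda_fit(X, y, k):
    """Return W_opt with k columns; rows of X are distance signals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    priors = np.array([np.mean(y == label) for label in labels])  # n_i / N (assumption)
    means = np.array([X[y == label].mean(axis=0) for label in labels])  # Eq. (7)
    S_w = np.zeros((X.shape[1], X.shape[1]))
    for label, P_i, u_i in zip(labels, priors, means):
        centered = X[y == label] - u_i
        S_w += P_i * (centered.T @ centered)          # Eq. (8)
    S_b = elda_between_scatter(means, priors)
    # Pseudo-inverse instead of a plain inverse (assumption).
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1][:k]
    return eigvecs[:, order].real                     # columns e_1 .. e_k

def elda_project(D, W_opt):
    """Eq. (11): each row of D is one distance signal of a class."""
    return np.asarray(D, dtype=float) @ W_opt
```

Compared with plain LDA, only the between-class scatter changes: each pair of class means contributes in proportion to its distance rather than its squared distance, which limits the influence of far-away edge classes.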

3.3 Recognition

In the recognition phase, the choice of classifier is crucial for gait recognition [25, 26]. In this paper, we use the support vector machine (SVM), which has excellent generalization ability, for classification. SVM is widely used in statistical learning and is applied to all kinds of recognition and classification problems. It handles linearly inseparable data well by introducing kernel functions. In addition, SVM classification has two advantages: first, a linear classifier design can generate a nonlinear decision boundary; second, the use of kernel functions allows the classifier to be applied to vector spaces of flexible dimension.

However, the most basic form of SVM is a binary classifier. To use it for multi-class classification problems, we have to extend it to multi-class classifiers via various kernel functions. Moreover, many researchers have proposed combining multiple binary classifiers into a multi-class classifier to solve the multi-class classification problem. The widely used multi-class SVM methods are “one-against-one” and “one-against-rest”. This paper selects the faster “one-against-one” combination method [27]. We construct k(k − 1)/2 classifiers in total (for k sample categories), each trained on the samples of two different classes. For classification, we use a voting mechanism: every SVM classifier is considered a voter (k(k − 1)/2 voters in total), and the test sample is assigned to the class that receives the most votes.
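The one-against-one voting scheme can be illustrated independently of the binary learner. In the sketch below, `train_binary` is a deliberately simple nearest-mean linear rule that stands in for a binary linear SVM; it is an assumption made to keep the example self-contained, and all function names are hypothetical. Only the pairing and voting logic mirrors the description above.

```python
from itertools import combinations
import numpy as np

def train_binary(X_a, X_b):
    """Stand-in for a binary linear SVM (assumption): nearest-mean rule.
    The returned function is positive when x looks like the first class."""
    m_a, m_b = X_a.mean(axis=0), X_b.mean(axis=0)
    w = m_a - m_b
    b = -0.5 * np.dot(m_a + m_b, w)
    return lambda x: np.dot(x, w) + b

def fit_one_against_one(X, y):
    """Train one binary classifier per pair of classes: k(k - 1)/2 voters."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y).tolist()
    voters = {(a, b): train_binary(X[y == a], X[y == b])
              for a, b in combinations(classes, 2)}
    return classes, voters

def predict_one_against_one(model, x):
    """Each pairwise classifier casts one vote; the most voted class wins."""
    classes, voters = model
    votes = dict.fromkeys(classes, 0)
    for (a, b), decide in voters.items():
        votes[a if decide(x) > 0 else b] += 1
    return max(votes, key=votes.get)
```

For 124 subjects this scheme would train 124 × 123 / 2 = 7626 pairwise classifiers, each on the samples of only two classes.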

4 Experimental results

4.1 Gait databases

While several benchmark gait datasets are used in the literature, most of them contain only a few subjects. To better verify the efficacy of a gait recognition algorithm, a large dataset containing more subjects is strongly desirable, and the corresponding results are more convincing. So far, a few publicly available gait databases contain more than 100 subjects, such as the SOTON [28] and CASIA [29] databases, which were collected by the University of Southampton and the Institute of Automation, Chinese Academy of Sciences, respectively.

The CASIA (Dataset B) gait database was created in 2005 and contains 124 people. Every subject has 6 natural walking sequences for each view angle (ranging from 0° to 180° at intervals of 18°). The frame size is 320 × 240 pixels and the frame rate is approximately 25 fps. An example is provided in Fig. 4. The SOTON gait database was published in 2001 and includes a total of 115 people, with six or more sequences for each perspective. Figure 5 shows some images from SOTON.

Fig. 4 Sample images for a subject of CASIA (dataset B) database with centroids

Fig. 5 Sample images for a subject of SOTON database with centroids

Compared with face databases, the gait databases are smaller, so we use the leave-one-out cross-validation rule on these databases to estimate the performance of the proposed method. Each time, we leave one image sequence out as the test sample and train on the remainder. After computing the similarity differences between the test sample and the training data, the SVM is applied for classification. In other words, every subject in the databases has six normal gait sequences, so under the leave-one-out cross-validation rule we select five sequences as the training set and the remaining sequence as the test set.
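A minimal sketch of the leave-one-out protocol is given below. It assumes that samples are identified by hypothetical `(subject, sequence)` index pairs and that every subject has the same number of sequences; the generator name is illustrative.

```python
def leave_one_out_splits(n_subjects, n_sequences=6):
    """Yield (train, held_out) pairs: each sequence is held out exactly once
    and all remaining sequences form the training set."""
    samples = [(s, q) for s in range(n_subjects) for q in range(n_sequences)]
    for held_out in samples:
        train = [p for p in samples if p != held_out]
        yield train, held_out
```

For CASIA (Dataset B) with 124 subjects and six sequences each, this yields 124 × 6 = 744 splits per view angle, and the held-out subject always keeps five sequences in the training set.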

4.2 Results and analysis

In the experiments, we first obtain each gait cycle sequence as described in Sect. 2.1; the cycle lengths differ because of different stride frequencies. We then calculate the distance between the boundary pixels and the centroid on each silhouette according to Sect. 2.2. After that, we calculate the within-class and between-class scatter matrices of the whole training set according to the ELDA algorithm described in Sect. 3, and obtain the optimal projection matrix from the Fisher criterion. We keep the first 22 eigenvalues and their associated eigenvectors to form the eigenspace transformation matrix. Each gait sequence is accordingly projected into a manifold trajectory in the eigenspace; that is, each distance signal is mapped to one point in a 22-dimensional eigenspace. Finally, we recognize the test samples with the multi-class SVM classifier.

The kernel function is the key component of SVM, and the choice of kernel affects its learning and generalization ability. We therefore conducted experiments with different types of kernel to select an appropriate kernel function, recording the correct classification rate (CCR) and the recognition time (RT). The results of kernel selection on the different databases are shown in Table 1. The hardware platform is an Intel Core2 Quad at 2.5 GHz, without any hardware or software acceleration, and the algorithm is written in Visual C++ and Matlab.

Table 1 Selection of Kernel Function

The performance of the three kernels on the databases shows that the linear kernel achieves better classification than the Gaussian and polynomial kernels. This is because, in the sample space of this paper, the sample dimension is much larger than the number of samples. A non-linear classifier tends to have too many support vectors in a high-dimensional space, which causes over-fitting, so its classification rate is lower and its processing time longer than those of the linear classifier. This leads us to believe that the boundary between the gait sequences may be approximately linear. We therefore choose the linear kernel function, which has the highest correct classification rate and the shortest recognition time, for gait recognition. In addition, the linear kernel is adequate for our needs because of the low dimension and the small amount of data in this paper.

To verify that the ELDA algorithm of this paper generalizes better than the LDA algorithm for gait recognition, we conduct experiments on the SOTON and CASIA (Dataset B) databases with SVM. To show that SVM is the more suitable classifier for this algorithm, we also conduct experiments with the nearest neighbor classifier with respect to class exemplars (ENN) proposed in [15], using the distances between each boundary pixel and the centroid as features. Table 2 gives the recognition results on the two databases.

Table 2 Comparison of the Algorithm on Different Gait Databases

The experimental results show that the proposed algorithm is significantly better than the LDA algorithm in terms of accuracy and efficiency on both databases. Since the traditional LDA algorithm only emphasizes separating the edge classes from the non-edge classes, the non-edge classes are likely to overlap, which makes it difficult for the classifier to distinguish them correctly. Moreover, the computation of the support vectors becomes very slow, so the processing time is longer than with ELDA and SVM. In the ELDA and SVM method, by contrast, support vectors can be found directly after the dimension of the matrix is lowered. In addition, the SVM classifier is more suitable than ENN for the experiments in this paper, because different people do not keep the same pace and stride in different cases, so correct classification cannot be achieved merely by assigning the test sequence to the class of its nearest samples. It can be seen from Figs. 4 and 5 that the gait sequences from CASIA (Dataset B) are more standard than those from SOTON, so the CCR values on CASIA (Dataset B) are higher.

Generally, a biometric identification system has two working modes, namely identification and verification. As a biometric characteristic, gait also supports both modes, and the two tasks of a gait recognition system use different performance evaluation indexes.

As mentioned above, the identification performance of gait recognition is reflected by the CCR. In addition, the False Alarm Rate (FAR) curves of gait recognition are shown in Fig. 6 to illustrate the verification performance. FAR is defined in terms of the Number of False Alarms (NFA) and the Number of Total Outclass Data (NTO):

$${\text{FAR}} = \frac{\text{NFA}}{\text{NTO}} \times 100\;\%$$
(12)

Fig. 6 FAR curves

We estimate the FAR curves via the leave-one-out rule in verification mode. That is, we leave one example out, train the classifier on the remaining examples, and then verify the left-out sample against all 124 classes. Note that in each of these 744 iterations for each viewing angle there is one genuine attempt and 123 imposter attempts, since the left-out sample is known to belong to one of the 124 classes. A lower FAR indicates better verification performance. Table 3 shows the corresponding average FARs of the four different methods in Fig. 6.
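The bookkeeping behind Eq. (12) can be made concrete with a short sketch for one viewing angle of CASIA (Dataset B). The counts of iterations and imposter attempts follow from the protocol described above; the number of false alarms used at the end is a purely hypothetical value chosen for illustration and is not a result from the experiments.

```python
def false_alarm_rate(n_false_alarms, n_total_outclass):
    """Eq. (12): FAR in percent, NFA / NTO x 100."""
    return 100.0 * n_false_alarms / n_total_outclass

n_classes = 124                                      # subjects in CASIA (Dataset B)
n_sequences = 6                                      # sequences per subject and view
n_iterations = n_classes * n_sequences               # 744 leave-one-out iterations
n_total_outclass = n_iterations * (n_classes - 1)    # 744 x 123 imposter attempts

# Hypothetical count of accepted imposters, for illustration only.
hypothetical_false_alarms = 915
far = false_alarm_rate(hypothetical_false_alarms, n_total_outclass)
```

Sweeping the acceptance threshold of the classifier and recomputing the number of false alarms at each threshold would trace out a FAR curve of the kind plotted in Fig. 6.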

Table 3 False alarm rate using four combinations

To put our method in a broader perspective, we also compare the performance of the proposed algorithm with that of several recent methods described in [7, 13, 15]. To some extent, these reflect the latest and best work of the respective research groups in gait recognition. The best results of all algorithms are summarized in Table 4, from which we can see that our method compares favorably with the others.

Table 4 Comparison with the popular gait recognition methods in CCR

The results here provide only a preliminary performance comparison and should not be generalized to claim that a certain algorithm is always better than the others. Each method may have its own advantages and disadvantages under different testing conditions, so further evaluations and comparisons on more realistic and challenging databases are needed. Furthermore, to bring the experimental environment closer to reality, cross-view methods should be considered [30, 31].

5 Conclusion

In this paper, we have proposed a simple but effective gait recognition method based on ELDA for human identification. Compared with other algorithms, it has not only a high recognition rate but also lower computational complexity and faster processing speed. To address the inability of LDA to distinguish overlapping classes correctly, we introduce class weights into LDA; that is, we take the distance between classes into account when calculating the between-class scatter matrix. Moreover, ELDA reduces the impact of edge classes on the feature projection space, which optimizes the projection and reduces the feature dimension. In the recognition stage, we apply a “one-against-one” multi-class SVM for classification, because SVM can find the global optimum of its objective function, whereas other classification algorithms may only reach a local optimum. Finally, this paper evaluates the overall performance of the proposed algorithm in terms of identification and verification, showing that it achieves a high recognition rate even on a large database.

In the experiments of this paper, the walking direction is perpendicular to the camera axis, with the camera position fixed. In future research, we will try to add other views to further improve gait recognition.