1 Introduction

Gait recognition is a useful technique to authenticate a person from his/her walking style and has increasingly attracted attention in the computer vision community. Compared with other physiological biometric modalities, such as DNA, fingerprints, irises, and faces, gait has promising advantages because it is difficult to obscure and imitate and is available at a large distance from a camera without subject cooperation (e.g., closed-circuit television (CCTV) footage with low image resolution). Gait recognition has therefore been in increasing demand in many applications for surveillance and forensics [4, 14, 25].

Fig. 1 GEI examples of different subjects at various ages

However, gait recognition often suffers from the influence of various covariates, such as view [28, 44], walking speed [10, 45], clothing [12, 20], and elapsed time [31]. Among these covariates, a long elapsed time period, that is, aging, is one of the challenging factors that often occurs in real application situations (e.g., looking for long-lost children and recognizing fugitives who escaped many years ago). The study [31], which is the only work in which researchers have investigated the effect of elapsed time on gait recognition performance, demonstrated that gait does not drastically change within 1 year. Nevertheless, studies on gait-based age group classification [1, 6, 7, 30] and age estimation [27] have reported that gait differs clearly among children, adults, and the elderly in terms of, for example, stride length, body length, and head-to-body ratio, as can be seen in Fig. 1. In Fig. 1, the gaits of different subjects at various ages are shown using the gait energy image (GEI) [11], which is a simple yet effective gait feature frequently used in the gait recognition community. We can observe an obvious change in body shape, such as the head-to-body ratio, during the growth of children (e.g., less than 20 years old), which reflects body height growth in the size-normalized gait silhouette images. Additionally, the stride relative to the body size tends to be larger for children and teenagers. When people get older (e.g., after 50 years old), the stride tends to become smaller because of decreasing physical strength, and middle-age spread and stoop appear for most middle-aged and elderly subjects. Consequently, these common changes in gait features may significantly degrade the performance of gait recognition under age variation.

To reduce the appearance difference caused by the age gap between the gallery and probe, modeling the gait aging process is one of the possible solutions. More specifically, gait aging modeling includes a simulation with natural aging and reverse aging effects, that is, age progression (i.e., prediction of the future gait) and age regression (i.e., estimation of the previous gait), which is similar to the definition of aging and reverse aging effects considered in existing studies on face analysis [18, 38–41, 43, 47]. However, in real scenarios, such as surveillance, facial images may not work well because of low image resolution or even occlusion by a face mask. By contrast, human gait can still be perceived well in such conditions, and hence, gait-based age progression/regression has high application potential. In addition to enhancing the surveillance capabilities for locating lost children and escaped criminals, it could also help with other potential applications, such as entertainment and health examination (e.g., a young person with large middle-age spread in age progression may need to be conscious of staying in shape). However, to the best of our knowledge, there is no prior work that considers age progression/regression using gait features.

A few works [9, 18, 21] have tackled face-based age regression; however, most existing facial aging modeling works have focused on age progression, and these are mainly classified into two categories [38, 47]: (i) physical model-based, which models the change in biological facial parameters for texture/shape with age, and (ii) prototype-based, which transforms aging patterns between the prototypes (e.g., averaged faces) of two predefined age groups. Although these methods have achieved reasonable performance, it is not feasible to extend most of them to gait-based age progression/regression for the following two reasons.

  1. Most existing face-based age progression/regression methods require paired samples for different ages of the same individual, or even real aging sequences over a long age span, to better model the complex biological patterns and physical mechanisms of aging [47], particularly for physical model-based approaches. Although it is still difficult to collect multiple facial images over a long age span, short-term (e.g., 10 years) aging sequences are available in some face databases (e.g., MORPH dataset [34]) or can be obtained from social network sites (e.g., photographs of celebrities on Facebook and Twitter [38]). By contrast, no existing gait database contains aging sequences spanning more than 1 year because it is quite difficult, or almost impossible, to collect gait sequences of the same person at different ages in real-world surveillance scenarios. Therefore, most face-based age progression/regression methods cannot be applied in the case of gait.

  2. Existing face-based age progression/regression studies focus more on handling color and texture changes (e.g., wrinkles, muscles, and skin) rather than geometric variations to better reflect the aging patterns of facial images. However, these methods are unsuitable for modeling gait aging patterns because texture and color changes are not apparent in gait features during the human aging process [22], as shown in Fig. 1. By contrast, geometric transformations are more important for gait because the variations caused by aging mainly occur in shape and appearance deformations.

Motivated by these problems, we propose a baseline algorithm for gait-based age progression/regression using generic warping between two age groups, and evaluate the performance of simulations using both human perception and automatic algorithms. Note that, in this paper, we do not aim at technical novelty for the age progression/regression method, but at providing a baseline algorithm and suitable set of experimental evaluation results that contribute to the human gait analysis research community because there is no prior work on image-based gait age progression/regression to the best of our knowledge. The main contributions of this work are twofold.

1. The first attempt at gait-based age progression and regression.

This is the first work on the topic of gait-based age progression/regression to the best of our knowledge. Because of the lack of multiple gait aging samples of the same individual, we adopt a similar notion to prototype-based methods [5, 15, 35] in face-based age progression/regression as a baseline algorithm. More specifically, the GEIs are first averaged over subjects that belong to each age group (referred to as the mean-GEI), and then, the transformation between the mean-GEIs of two neighboring age groups is generated using free-form deformation (FFD) [37] to render the general aging process of humans and preserve some personalized gait characteristics (e.g., walking style and clothing appearance) for a specific subject, simultaneously.

2. Subjective and objective quantitative evaluation through age group classification and cross-age gait identification.

We evaluated the simulation results of the proposed method on the OU-ISIR Gait Database, Large Population Dataset with Age (OULP-Age) [46], which is the world’s largest gait database with age information. Similar to face-based age progression/regression works [40, 41], we conducted both subjective (human perception) experiments and objective (machine perception) experiments as quantitative measurements by implementing age group classification and cross-age gait identification to validate both the quality of aging patterns and preservation of identity for the simulation results.

The outline of the paper is as follows. In Sect. 2, we review existing face-based age progression/regression methods and studies on gait analysis relevant to age. We then introduce the proposed gait-based age progression/regression method in Sect. 3 and present various performance evaluations of the proposed method in Sect. 4. In Sect. 5, we analyze several failure cases in the experiments. Finally, in Sect. 6, we conclude this paper and discuss future work on this topic.

2 Related work

2.1 Face-based age progression/regression

Currently, extensive studies on face-based age progression have been conducted, with approaches mainly classified into two groups: physical model-based and prototype-based. Physical model-based methods correlate biological facial aging patterns with human age using complex models, such as the statistical face model [18], and-or graph [40, 41], concatenational graph evolution aging model [39], and craniofacial growth model [33]. By contrast, a limited number of works on facial age regression [9, 18, 21] have been based on physical models by simply removing textures from facial surfaces. However, to model the intricate aging mechanisms, sufficient training samples with aging sequences over a long age span for each individual are required, which are very unlikely to be collectable for gait videos.

In an early study on face-based age progression, Burt and Perrett [5] divided the training faces into several age groups and created an averaged face as the prototype of each group, with the transformation between prototypes serving as the aging model. Based on this, other prototype-based approaches [13, 15, 36] were proposed subsequently to involve more individual characteristics. Shu et al. [38] proposed an age group-specific dictionaries-based age progression method that synthesizes the aging pattern formed by the learned dictionary bases and an extra personalized facial pattern (e.g., a mole). Although some of these approaches have achieved relatively better performance, they still require short-term paired samples of the same subject. Moreover, to present bidirectional simulations, that is, both age progression and age regression, prototype-based methods may need to inversely retrain the model [47].

To date, deep learning-based approaches [43, 47] have achieved state-of-the-art performance. In particular, Zhang et al. [47] proposed a conditional adversarial autoencoder (CAAE) that first achieved age progression and regression simultaneously without using the paired samples of each person. Although it presented reasonably good simulation results, it is still unsuitable to be simply applied in the case of gait, considering the differences between face and gait aging patterns (e.g., texture changes, such as wrinkles on a face, do not exist in a gait).

Fig. 2 Overview of the proposed method

2.2 Age-related gait analysis

Effect of elapsed time on gait recognition. In [31], Matovski et al. studied the effect of elapsed time on gait recognition performance, which is the only work that considers this problem. To investigate the pure influence of elapsed time, the other covariates, such as the clothing worn by the subjects and the environment, were controlled during image capture. The experimental results illustrated that gait is relatively invariant over a short time period (i.e., 9 months), and hence, a short elapsed time period between the gallery and probe does not significantly affect gait recognition performance. However, no study has examined the effects of a long elapsed time period (e.g., 10 years) on gait recognition because of the difficulties of data collection.

Gait-based age group classification and age estimation. By contrast, there is a rich body of literature on gait-based age group classification and age estimation [1, 6, 7, 22, 24, 27, 30], which are based on the analysis of various gait features. Davis [7] classified children (3–5 years old) and adults (30–52 years old) by analyzing the gait differences in terms of leg length, stride width, and stride frequency; Begg et al. [1] classified young people (28.4 years old average and 6.4 years standard deviation) and the elderly (69.2 years old average and 5.1 years standard deviation) using minimum foot clearance data. In [30], the frequency-domain feature [28], which is another well-known appearance-based gait feature, was used to classify four age groups: children (under 15 years old), adult males, adult females, and the elderly (over 65 years old). Chuen et al. [6] investigated the correlation of gait features (e.g., stride length, body length, and head-to-body ratio) among children and adults.

Researchers have also made great progress on gait-based age estimation, with the approaches typically using the gait feature (e.g., GEI) combined with an age estimation model, including Gaussian process regression [26, 27], multilabel-guided subspace [22], and ordinary preserving manifold analysis [23, 24].

The aforementioned studies on gait-based age group classification and age estimation provide evidence that the human gait, or more specifically, the gait feature, contains discriminative aging patterns that capture the feature differences among different ages. This suggests that gait-based age progression and regression are feasible.

3 Gait-based age progression/regression using FFD

3.1 Overview

An overview of the proposed method is shown in Fig. 2. Using the training set composed of multiple GEI samples with various age values, we first compute the mean-GEIs by averaging all GEI samples that belong to each predefined age group, which represent a general gait of each age group that mitigates the gait variations among different individuals. We then generate the transformation fields between mean-GEIs from two adjacent age groups because the gait changes between neighboring age groups are relatively small, and this results in less distortion in the deformation. In the testing phase, given an input GEI sample and its age, we first morph it into its nearest neighboring age groups with the corresponding generic transformation fields and then simulate the subsequent age groups by taking the morphed GEI as the input, which renders the entire aging process from the current age step by step.

The details of the proposed method are given in the following sections.

3.2 GEI representation

Compared with the other gait feature representations [16, 28, 42], GEI is the most popular and widely used gait feature in gait analysis studies because it is simple yet effective. Hence, we also use GEI as our gait representation.

As a preprocessing step, gait silhouettes are first extracted from the captured gait video sequences through background subtraction-based graph-cut segmentation [29]. The size-normalized and registered silhouette sequences are then obtained by normalizing the height and registering the region center [28]. After detecting the gait period from the size-normalized and registered silhouette sequences [28], GEI \(G(x,y)\) is extracted by averaging the silhouettes over one gait period \(T\) [frames] as

$$\begin{aligned} G(x,y) = \frac{1}{T} \sum _{t = 1}^{T} I(x,y,t), \end{aligned}$$
(1)

where \(I(x,y,t)\) is the binary silhouette value at position \((x,y)\) from the \(t\)-th frame in the size-normalized silhouette sequence, with 0 and 255 indicating the background and foreground, respectively. Examples of GEI can be found in Fig. 1. We can see that GEI not only reflects the static body parts (i.e., where the intensity value is 255), but also represents the dynamic gait features well (i.e., where the intensity is between 0 and 255), such as arms and legs.
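As an illustration of Eq. (1), the following is a minimal, dependency-free sketch rather than the implementation used in the experiments. It assumes the silhouettes of one gait period are already size-normalized and registered, and that each frame is a nested list of values in {0, 255}; the function name `gait_energy_image` is hypothetical.

```python
def gait_energy_image(silhouettes):
    """Average T silhouette frames pixel by pixel, as in Eq. (1).

    silhouettes: list of T frames, each a 2D list (height x width)
    holding 0 (background) or 255 (foreground).
    Returns a 2D list of floats in [0, 255].
    """
    if not silhouettes:
        raise ValueError("at least one silhouette frame is required")
    num_frames = len(silhouettes)
    height = len(silhouettes[0])
    width = len(silhouettes[0][0])
    return [
        [sum(frame[y][x] for frame in silhouettes) / num_frames
         for x in range(width)]
        for y in range(height)
    ]


# Toy gait period with two 1x2 frames: the left pixel is foreground in only
# one frame (a moving part), the right pixel in both (a static part).
gei = gait_energy_image([[[0, 255]], [[255, 255]]])
```

In this toy case, the pixel that is foreground in every frame keeps the value 255, whereas the pixel that changes takes an intermediate value, which mirrors the static/dynamic interpretation given above.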

3.3 Geometric transformation using FFD

To better render the change of gait characteristics during the human aging process, we first divide the ages into several age groups. The mean-GEI is then computed by averaging over all GEI templates for each age group. Because this average is insensitive to the variations among individual subjects (e.g., different clothes and various walking poses), the gait diversity, in addition to the effects of gait fluctuations within the gait period (e.g., looking down temporarily), is mitigated by the mean-GEI. Consequently, the mean-GEI is able to represent a general gait of each age group (i.e., a prototype), and the differences among mean-GEIs reflect the inter-variations among different age groups for a general gait aging process by mitigating intra-variations for each age group. In addition, the mean-GEI also contains the gait individuality with different intensities (e.g., an intensity value close to 255 in the torso indicates a slim subject, whereas an intensity value between 0 and 255 indicates an overweight subject).

There are large variations among the GEIs of different subjects, even at the same individual age. Finer age groups (e.g., individual ages) retain more detailed aging patterns but result in more variation in the mean-GEI of each age group because of the reduced number of training samples (especially for elderly subjects). Coarser age groups, by contrast, lead to less detailed descriptions of the change in gait characteristics but create a more stable mean-GEI representation for each age group. Thus, we choose age groups that balance the stability of the mean-GEI representation against a sufficiently detailed description of the whole gait aging process.
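The prototype computation described above can be sketched as follows. This is only an illustrative sketch under simple assumptions: each training sample is given as a pair of an age-group identifier and a GEI stored as a nested list, and the function name `mean_geis` is hypothetical.

```python
def mean_geis(samples):
    """Compute one mean-GEI (prototype) per age group.

    samples: iterable of (group_id, gei) pairs, where gei is a 2D list
    and all GEIs share the same size.
    Returns a dict mapping group_id to its mean-GEI (2D list of floats).
    """
    sums = {}
    counts = {}
    for group_id, gei in samples:
        if group_id not in sums:
            sums[group_id] = [[0.0] * len(gei[0]) for _ in gei]
            counts[group_id] = 0
        accumulator = sums[group_id]
        for y, row in enumerate(gei):
            for x, value in enumerate(row):
                accumulator[y][x] += value
        counts[group_id] += 1
    return {
        group_id: [[value / counts[group_id] for value in row] for row in total]
        for group_id, total in sums.items()
    }


# Two subjects in group 0 and one subject in group 1 (toy 1x2 GEIs).
prototypes = mean_geis([(0, [[0, 255]]), (0, [[255, 255]]), (1, [[100, 0]])])
```

With few samples in a group, as in group 1 of this toy input, the prototype simply equals the individual GEI, which illustrates why finer age groups yield less stable mean-GEIs.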

We next utilize a warping field between two neighboring mean-GEIs to reflect the general gait changes between adjacent age groups. To achieve this, we use the notion of FFD with piecewise linear interpolation because FFD is suitable for representing the transformation of non-rigid objects (e.g., the human body) due to its high flexibility [37]. Moreover, it preserves the derivative continuity of adjacent regions and does not corrupt appearance-based gait features [8], which helps to maintain the personalized gait characteristics of specific subjects. Given a mean-GEI pair from neighboring age groups, that is, a source and target, a traditional transformation typically transforms the source into the target so as to minimize the difference between the target and transformed source. However, to achieve a bidirectional simulation, that is, age progression and regression, an inverse transformation from the target to the source needs to be trained separately, which causes asymmetry between the source and target and increases the computational cost. We therefore adopt an intermediate transformation that transforms both the source and target into an intermediate state, as has been applied in cross-view gait recognition [8]. This transformation can render age progression and regression simultaneously because the deformation is symmetric and equalized between the two directions (i.e., the deformation from the source to the intermediate state equals the reverse of that from the target to the intermediate state).

Fig. 3 Illustration of the transformation. (a) Original source mean-GEI from age group 5–9. (b) Intermediate warping field from the source to the intermediate state. (c) Transformed source mean-GEI using the warping field in (b). (d) Original target mean-GEI from age group 10–14. (e) Intermediate warping field from the target to the intermediate state. (f) Transformed target mean-GEI using the warping field in (e). Comparing (c) and (f), the source and target mean-GEIs become very similar after the transformation, which demonstrates the effectiveness of the FFD-based intermediate transformation

Let a pair of source and target mean-GEIs from neighboring age groups \(i\) and \(i+1\) be \(\bar{G}_{i}, \bar{G}_{i+1}\in \mathbb {R}^{H \times W}\) \((i = 1, \ldots , N-1)\), respectively, where \(W\) and \(H\) are the width and height of the GEI, respectively, and \(N\) is the number of defined age groups. We first allocate a set of control points on both \(\bar{G}_{i}\) and \(\bar{G}_{i+1}\) and then define the transformation from source \(\bar{G}_{i}\) to the intermediate state using a set of two-dimensional displacement vectors \(\vec {u}_{i}\) on the control points. An entire warping field \(F(\vec {u}_{i})\) from the source to the intermediate state is subsequently obtained using piecewise linear interpolation, and similarly, the warping field from target \(\bar{G}_{i+1}\) to the intermediate state is represented by \(F(-\vec {u}_{i})\). We then optimize the displacement vectors \(\vec {u}_{i}\) by minimizing the difference between the transformed source \(\bar{G}_{i} \circ F(\vec {u}_{i})\) and transformed target \(\bar{G}_{i+1} \circ F(-\vec {u}_{i})\) in the intermediate domain as

$$\begin{aligned} \vec {u}^{*}_{i} = \arg \min _{\vec {u}_{i}} \Vert \bar{G}_{i} \circ F(\vec {u}_{i}) - \bar{G}_{i+1} \circ F(-\vec {u}_{i}) \Vert _{F}^{2} + \lambda R(\vec {u}_{i}), \end{aligned}$$
(2)

where \(\circ \) indicates a transformation operator and \(R(\vec {u}_{i})\) is a linear elastic smoothness term on the displacements between adjacent control points [19], with a hyperparameter \(\lambda \) controlling the smoothness. The optimization of Eq. (2) is solved using gradient descent, and the visualization of the intermediate transformation is illustrated in Fig. 3.
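To make the structure of Eq. (2) concrete, the following sketch evaluates the symmetric objective for a given set of control-point displacements. It is a simplified illustration, not the authors' implementation: the warp is assumed to be a backward (pull) warp with bilinear sampling and border clamping, the control points are assumed to lie on a uniform grid, and the smoothness term is a plain sum of squared differences between adjacent control points that stands in for the linear elastic term of [19]. All function names are hypothetical, and the gradient descent loop itself is omitted.

```python
def interp_field(ctrl, height, width):
    """Piecewise linear interpolation of control-point displacements.

    ctrl: grid (at least 2x2) of (dy, dx) tuples spread uniformly over the image.
    Returns a height x width grid of per-pixel (dy, dx) displacements.
    """
    rows, cols = len(ctrl), len(ctrl[0])
    field = []
    for y in range(height):
        gy = y * (rows - 1) / max(height - 1, 1)
        r0 = min(int(gy), rows - 2)
        fy = gy - r0
        row = []
        for x in range(width):
            gx = x * (cols - 1) / max(width - 1, 1)
            c0 = min(int(gx), cols - 2)
            fx = gx - c0
            disp = []
            for k in range(2):
                top = (1 - fx) * ctrl[r0][c0][k] + fx * ctrl[r0][c0 + 1][k]
                bottom = (1 - fx) * ctrl[r0 + 1][c0][k] + fx * ctrl[r0 + 1][c0 + 1][k]
                disp.append((1 - fy) * top + fy * bottom)
            row.append(tuple(disp))
        field.append(row)
    return field


def sample(img, y, x):
    """Bilinear lookup with coordinates clamped to the image border."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0 = min(int(y), h - 2)
    x0 = min(int(x), w - 2)
    fy, fx = y - y0, x - x0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x0 + 1]
    bottom = (1 - fx) * img[y0 + 1][x0] + fx * img[y0 + 1][x0 + 1]
    return (1 - fy) * top + fy * bottom


def warp(img, ctrl, scale=1.0):
    """Apply the warping field F(scale * u) to img (assumed pull convention)."""
    h, w = len(img), len(img[0])
    field = interp_field(ctrl, h, w)
    return [
        [sample(img, y + scale * field[y][x][0], x + scale * field[y][x][1])
         for x in range(w)]
        for y in range(h)
    ]


def energy(src, tgt, ctrl, lam=0.1):
    """Data term of Eq. (2) plus a simplified smoothness penalty."""
    warped_src = warp(src, ctrl, 1.0)    # source -> intermediate, F(u)
    warped_tgt = warp(tgt, ctrl, -1.0)   # target -> intermediate, F(-u)
    data = sum(
        (p - q) ** 2
        for row_s, row_t in zip(warped_src, warped_tgt)
        for p, q in zip(row_s, row_t)
    )
    rows, cols = len(ctrl), len(ctrl[0])
    smooth = 0.0
    for r in range(rows):
        for c in range(cols):
            for k in range(2):
                if r + 1 < rows:
                    smooth += (ctrl[r + 1][c][k] - ctrl[r][c][k]) ** 2
                if c + 1 < cols:
                    smooth += (ctrl[r][c + 1][k] - ctrl[r][c][k]) ** 2
    return data + lam * smooth
```

Because the same displacement set is applied with opposite signs to the two images, a single optimization yields both warping directions, which is the property the intermediate transformation relies on.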

Fig. 4 Example of year-by-year age progression. (a) Input testing GEI with the age of 5 years old. (b) Intermediate warping field between the age groups 5–9 and 10–14. (c) Simulations from 6 to 10 years old, with the corresponding morphing ratio used for each simulation result. From the year-by-year simulations in (c), we can clearly observe the gradual transition from 5 to 10 years old

3.4 Morphing-based age progression/regression

Once we obtain the warping fields between each pair of adjacent mean-GEIs, we simulate the future/previous gaits of a given subject by first morphing the input GEI into its nearest neighboring age group and then extending it to the other age groups sequentially. Additionally, based on the assumption of a uniform transition between adjacent age groups, we can simulate any target age by adjusting the morphing ratio during the transformation. A complete year-by-year aging/reverse aging process helps to improve both the effectiveness of cross-age gait recognition in surveillance and the attractiveness of applications such as entertainment. By contrast, many face-based age progression/regression approaches [38, 40, 41, 47] only generate a single simulated face for each age group.

Let the absolute age difference between two neighboring mean-GEIs \(\bar{G}_{i}\) and \(\bar{G}_{i+1}\) be \(|\bar{a}_{i} - \bar{a}_{i+1}|\). Because we consider an intermediate transformation between the source and target mean-GEI pairs and assume a uniform transition between adjacent age groups, the absolute age difference between the intermediate state and the original source/target is approximately \(|\bar{a}_{i} - \bar{a}_{i+1}|/2\). Given input testing GEI \(G_{\mathrm{in}}\) with its age of \(a_{\mathrm{in}}\) that belongs to age group \(i\), simulation \(G_{\mathrm{sim}}\) at the age \(a_{\mathrm{sim}}\) that belongs to age group \(i+1\) is obtained as

$$\begin{aligned} G_{\mathrm{sim}} = {G}_{\mathrm{in}} \circ F(\alpha \vec {u}^{*}_{i}), \end{aligned}$$
(3)

where morphing ratio \(\alpha \) is defined as the ratio of the age difference between the simulation and input GEI to the age difference between the intermediate state and original source, that is,

$$\begin{aligned} \alpha = \frac{2|a_{\mathrm{sim}} - a_{\mathrm{in}}|}{|\bar{a}_{i} - \bar{a}_{i+1}|}. \end{aligned}$$
(4)

Thus, we render the simulation of any intermediate age inside the adjacent age groups, as shown in Fig. 4.
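As a small worked example of Eq. (4), the sketch below computes the morphing ratio for a year-by-year progression. The representative ages of the two age groups (7 and 12 for groups 5–9 and 10–14) are illustrative assumptions, not values reported in this paper, and the function name `morphing_ratio` is hypothetical.

```python
def morphing_ratio(a_in, a_sim, mean_age_i, mean_age_next):
    """Morphing ratio alpha of Eq. (4).

    a_in, a_sim: input age and target (simulated) age.
    mean_age_i, mean_age_next: representative ages of the two adjacent
    age groups whose mean-GEIs define the warping field.
    """
    group_gap = abs(mean_age_i - mean_age_next)
    if group_gap == 0:
        raise ValueError("adjacent age groups must have different mean ages")
    return 2.0 * abs(a_sim - a_in) / group_gap


# Assumed representative ages: 7 (group 5-9) and 12 (group 10-14).
ratios = [morphing_ratio(5, target_age, 7, 12) for target_age in range(6, 11)]
```

Under Eq. (4), \(\alpha = 1\) corresponds to an age difference of half the gap between the two groups (i.e., reaching the intermediate state), and \(\alpha = 2\) corresponds to the full gap between the two groups.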

4 Experiments

4.1 Datasets and parameter settings

We adopted the OULP-Age dataset [46], which is the world’s largest gait database with age information, to evaluate the proposed method. It is composed of 63,846 subjects with ages ranging from 2 to 90 years old and has a good gender balance (the ratio of males to females is close to one). The detailed distribution of subjects’ gender and age groups in 5-year intervals is shown in Fig. 5, which illustrates that the dataset provides a large number of samples for training reliable transformations for age progression/regression. Half the samples (i.e., 31,923 subjects) randomly chosen from the entire dataset were used to train the general transformation for each adjacent age group pair, and the other half were used for the qualitative and quantitative evaluation of the proposed method. One gait period is detected for constructing a single GEI image for each subject in this dataset.

We defined the age groups based on the physical growth process (e.g., body height) of humans. According to the biological constraints of humans [2, 3], children and teenagers (i.e., less than 20 years old) tend to grow relatively rapidly, whereas the growth of adults (i.e., more than 20 years old) becomes much slower because their bodies have reached a mature physical state [2, 48]. We therefore divided the ages into 11 age groups, that is, 0–4, 5–9, 10–14, 15–19, 20–29, 30–39, 40–49, 50–59, 60–69, 70–79, and over 80, where the age interval was set to be finer (i.e., 5 years) for children and teenagers and coarser (i.e., 10 years) for adults, which is also often done in face-based age progression/regression studies [38, 47].
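The age grouping above can be expressed as a simple lookup. The following sketch assumes integer ages and assumes that age 80 itself falls into the last group; the names `GROUP_LOWER_BOUNDS`, `GROUP_LABELS`, and `age_group_index` are hypothetical.

```python
from bisect import bisect_right

# Lower bound (inclusive) of each of the 11 age groups.
GROUP_LOWER_BOUNDS = [0, 5, 10, 15, 20, 30, 40, 50, 60, 70, 80]
GROUP_LABELS = ["0-4", "5-9", "10-14", "15-19", "20-29", "30-39",
                "40-49", "50-59", "60-69", "70-79", "80+"]


def age_group_index(age):
    """Return the 0-based index of the age group that contains `age`."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return bisect_right(GROUP_LOWER_BOUNDS, age) - 1


label = GROUP_LABELS[age_group_index(17)]
```

Storing only the lower bounds keeps the unequal interval widths (5 years below 20, 10 years above) in one place, so changing the grouping requires editing only the two lists.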

4.2 Qualitative evaluation

4.2.1 Comparison with ground truth

To evaluate the performance of the proposed method, we first qualitatively compared the simulation results of the mean-GEI with the corresponding real mean-GEIs for each age group in Fig. 6, because the mean-GEIs were the only samples that had ground truth. We performed long age span progression that started from the input mean-GEI of the first age group (i.e., 0–4) to validate the quality of the simulated aging patterns through the entire aging process, from a child to an elderly person. We observe that the simulation results of the proposed method were similar to the true mean-GEIs and presented obvious gait changes during the aging process, such as the head-to-body ratio reduction in the growth of children, in addition to the middle-age spread and stoop for the middle-aged and elderly.

Fig. 5 Distribution of subjects’ gender and age groups in the OULP-Age dataset

Fig. 6 Comparison of the simulated mean-GEIs and true mean-GEIs. The leftmost image is the input of the age progression, whereas the other images in the first row show the simulation results of a typical age within each age group. The second row provides the true mean-GEIs that correspond to each simulation image above. The digit under each image indicates the age of the simulation or age group to which the true mean-GEI belongs

Fig. 7 More age progressed/regressed examples using the proposed method. The first two samples are male subjects and the last two are female subjects

4.2.2 Simulation examples of specific subjects

We selected two male subjects and two female subjects from different age groups to test the simulation results of the proposed method, as shown in Fig. 7. More simulation examples can be found in Fig. 11. The simulations present the natural aging patterns in a human’s gait while maintaining the personalized gait characteristics (e.g., walking pose and body appearance) of different individuals. Moreover, although the proposed method used a generic transformation across the population, the examples in Figs. 7 and 11 illustrate that subjects with different walking styles or body shapes could undergo different gait changes in the proposed age progression/regression, and these changes are consistent with our intuition (e.g., an overweight young person may have a larger spread in middle age or old age than a slim person, and a young person who appears to stoop may become heavily stooped, whereas a young person who has an almost straight back may remain relatively straight while aging). This is because the mean-GEIs are averaged across a large number of subjects with various appearances (e.g., walking style and body shape), so the warping fields contain different transformations for different gait appearances (e.g., the displacement vectors for a slim body and an overweight body could be different). Examples of the year-by-year aging/reverse aging process using the proposed method are shown in a movie in the supplementary material.

4.3 Quantitative evaluation

Similar to face-based works [40, 41], we quantitatively evaluated the performance of the proposed method for two aspects: (i) the quality of aging patterns, that is, whether the age progressed/regressed image truly presented the age characteristics of the intended age, and (ii) the preservation of identity, that is, whether the simulations retained the identity information of the original subject. We conducted both subjective (human perception) experiments and objective (machine perception) experiments by implementing age group classification and cross-age gait identification to quantitatively measure the two criteria, respectively. Twenty-four participants took part in the subjective experiments for age group classification and cross-age gait identification.

4.3.1 Age group classification

We first investigated the human perception of classifying the full age range into fine age groups (i.e., 11 age groups as defined in the training stage) using real GEI samples. Twenty images were randomly selected for each age group from the entire testing set and then randomly reordered to present to the participants for classification. However, the overall correct classification rate averaged over all age groups and participants was only 28%, and, particularly for age groups over 20 years old, the classification accuracies were close to the chance level (i.e., approximately 9% for 11 groups), which makes this setting unsuitable for quantitative evaluation. The low accuracy in this human test was mainly because of the slow growth after 20 years old, which resulted in small appearance differences across different age groups.

Fig. 8 Correct classification rates [%] of simulations and real images for each age group in the subjective evaluation, objective evaluation using the same samples, and objective evaluation using more samples

Therefore, we reselected seven discontinuous age groups with sparse intervals, 0–4, 5–9, 10–19, 20–29, 30–39, 50–59, and over 70, to be involved in the final subjective evaluation of the performance of the proposed age progression/regression method. Specifically, we randomly chose 40 subjects (20 females and 20 males) aged 20–29 with diverse gait appearances (e.g., different walking poses and different clothes) from the testing set as the inputs, and for each subject, simulated the age progressed and regressed GEI images for the other six age groups; thus, we prepared 240 simulated images with 40 images for each age group. Similar to [40, 41], we also prepared a set of real GEI images with 40 images randomly selected for each age group to compare the performance of age group classification between simulation and real images, which validated the quality of the simulated aging patterns. Some simulated and real images are shown in Fig. 9.

Table 1 Overall correct classification rates [%] of simulations and real images for all age groups in the subjective evaluation, objective evaluation using the same samples, and objective evaluation using more samples

Similarly, we executed objective age group classification on both sets of simulations and real images using directed acyclic graph support vector machine (DAGSVM) [32], which is more suitable for classifying groups with ordered information using a rooted binary directed acyclic graph that integrates multiple binary SVM classifiers. Additionally, we increased the number of evaluated samples eightfold, that is, to 320 simulations and 320 real images for each age group, to objectively evaluate the performance of the proposed method in a more statistically reliable manner.
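The way a rooted binary directed acyclic graph combines pairwise classifiers can be sketched as follows. This is a generic illustration of decision-DAG evaluation as it is commonly described for DAGSVM, not the exact classifier used in the experiments: the pairwise decision function here is a toy nearest-center rule standing in for trained binary SVMs, and the names `ddag_predict`, `CENTERS`, and `toy_pairwise` are hypothetical.

```python
def ddag_predict(ordered_classes, prefers_first, x):
    """Evaluate a decision DAG over an ordered list of classes.

    prefers_first(a, b, x) returns True when the binary classifier for the
    pair (a, b) votes for class a on input x. Each test compares the first
    and last remaining candidates and eliminates the losing one, so K classes
    need K - 1 pairwise evaluations.
    """
    candidates = list(ordered_classes)
    if not candidates:
        raise ValueError("at least one class is required")
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        if prefers_first(first, last, x):
            candidates.pop()       # the last class is eliminated
        else:
            candidates.pop(0)      # the first class is eliminated
    return candidates[0]


# Toy stand-in for trained binary SVMs: each class has a representative
# scalar value, and the pairwise decision picks the closer one.
CENTERS = {0: 2.0, 1: 7.0, 2: 15.0, 3: 25.0}


def toy_pairwise(a, b, x):
    return abs(x - CENTERS[a]) <= abs(x - CENTERS[b])


predicted = ddag_predict([0, 1, 2, 3], toy_pairwise, 8.0)
```

Because the candidates are kept in age order and each test removes one end of the list, the evaluation repeatedly discards the extreme age group that is less plausible, which is one way to read the suitability for ordered groups mentioned above.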

The correct classification rates of simulations and real images for each age group for both subjective and objective evaluations using the same samples, in conjunction with the objective evaluation using more samples, are shown in Fig. 8. Additionally, the overall correct classification rates for each image set are reported in Table 1.

Clearly, the automatic algorithm yielded much higher accuracies than human perception because it is more difficult for a human to correctly perceive age-related variations from grayscale GEI images than from more informative facial images that contain aging patterns in, for example, muscles, wrinkles, and skin. For both subjective and objective evaluations of the same samples, the classification of simulations for the middle three age groups, that is, 5–9, 10–19, and 30–39, resulted in better or competitive accuracies compared with real images, whereas for the other three age groups, the simulations were classified relatively worse than the real images. However, such differences in classification performance between simulations and real images for specific age groups have also been observed in the face-based age progression/regression literature [17, 41], and the differences can be reduced by increasing the number of samples in the objective evaluation, which provides more statistically reliable evaluation results. Additionally, because we performed a longer age progression for the last two groups, that is, 50–59 and over 70, for which the performance for simulations was relatively worse, it is reasonable to expect that the performance would improve if we started from a middle-aged input (e.g., 40 or 50 years old), which would lead to a shorter age progression for the elderly.

The overall correct classification rates in Table 1 also illustrate that the age group classification results of the simulations were almost consistent with those of the real images in all three evaluation experiments, which indicates that our age progressed and regressed images contained most of the reasonable age-related variations.

4.3.2 Cross-age gait identification

Table 2 Rank-1 identification rate [%] for each age group, and the overall rank-1 identification rate [%] for both subjective and objective evaluations

Fig. 9 Failures and successes from the subjective evaluation of age group classification. a Samples with the ground truth age group of 0–4. b Samples with the ground truth age group of over 70. The first row shows simulation examples, whereas the second row shows real image examples. For each sample, the digits in black represent the ground truth age group and the bracketed digits represent the predicted age group; blue indicates failure and red indicates success (color figure online)

To validate how well the proposed method preserves personal identity information, we prepared one simulation set and one real image set as the probe and gallery, respectively, in the experiment for cross-age gait identification. However, the OULP-Age dataset contains only a single GEI per subject; hence, we used a sequence from another camera that was different from that used for the OULP-Age dataset, but from a similar viewing angle (i.e., \(90^{\circ }\) azimuth angle; see Footnote 3), to construct a gallery. Specifically, the 240 simulation images, together with their corresponding real input images aged 20–29 (i.e., 280 images in total) used in the experiment for age group classification, constituted the probe set. Because we found that it was difficult for a human to identify one probe subject from a large gallery set (e.g., over 40 gallery subjects) using GEI images, we further divided the probe set into two subsets of equal size (i.e., 20 subjects for each age group in each probe subset) to reduce the gallery size. For each probe subset, the gallery set was composed of 30 real images for age group 20–29, including the same 20 individuals in the probe subset and another 10 distracting samples. Thus, we could evaluate the gait identification performance (i.e., matching one probe against 30 galleries) in both the same-age and cross-age cases.

We simply adopted direct matching for the objective evaluation of cross-age gait identification: we computed the Euclidean distance between the probe and each gallery as the dissimilarity measure and chose the nearest neighbor among all the galleries as the final identity of that probe. The rank-1 identification rate for each age group, together with the overall rank-1 identification rate averaged over all probes, is shown in Table 2 for both subjective and objective evaluations.
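Direct matching and the rank-1 identification rate can be sketched as follows. This is a minimal illustration, assuming that each GEI has been flattened into a feature vector; the toy vectors, the identity labels (`s1`, `s2`, and the distractor `d1`), and the function names are hypothetical and chosen only to make the example runnable.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two flattened feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nearest_gallery_id(probe: Sequence[float],
                       gallery: Dict[str, Sequence[float]]) -> str:
    """Return the gallery identity with the smallest distance to the probe."""
    return min(gallery, key=lambda gid: euclidean(probe, gallery[gid]))


def rank1_rate(probes: List[Tuple[str, Sequence[float]]],
               gallery: Dict[str, Sequence[float]]) -> float:
    """Percentage of probes whose nearest gallery has the true identity."""
    hits = sum(nearest_gallery_id(feat, gallery) == true_id
               for true_id, feat in probes)
    return 100.0 * hits / len(probes)


# Toy gallery: two enrolled subjects plus one distractor (made-up 2-D features).
gallery = {"s1": [0.0, 0.0], "s2": [1.0, 1.0], "d1": [5.0, 5.0]}
# Toy probes as (true identity, feature); the third one lies closer to s2.
probes = [("s1", [0.1, 0.0]), ("s2", [0.9, 1.2]), ("s1", [0.8, 0.9])]
rate = rank1_rate(probes, gallery)  # two of the three probes match correctly
```

Distractors such as `d1` only matter when a probe happens to fall closer to them than to its genuine gallery, which is why they make the task harder without changing the matching rule.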

Generally, the experimental results demonstrate that the proposed method maintained most identity information of the input image during the age progression and regression. For both subjective and objective evaluations, the same-age case (i.e., 20–29) obtained the highest rank-1 identification rate, and the identification accuracy decreased as the age gap between the probe and gallery increased, which indicates the difficulty of gait identification under a large age difference, in addition to the increased challenge of age progression/regression for a longer age span. Again, the performance of human perception was much lower than that of the machine, even for the same-age case, because slight differences between the probe and genuine caused by segmentation and/or gait fluctuation easily led to an incorrect match by human eyes.

5 Discussion

In this section, several failure cases in the subjective evaluation of age group classification and cross-age gait identification are discussed.

Fig. 10 Failures from the subjective evaluation of cross-age gait identification. a Probes with the ground truth age group of 20–29. b Probes with the ground truth age group of 0–4. Each column shows a different failed example. The first row is the probe, the second row is the imposter (false match) that most participants mismatched, and the third row is the genuine (true match). The fourth and fifth rows show the difference images between the first and second rows and between the first and third rows, respectively. The imposters and genuines (i.e., galleries) were both in the 20–29 age group

5.1 Failures in age group classification

We first focus on the failure modes from the evaluation of age group classification. The samples were selected from the ground truth age group of 0–4 and over 70, which were located at the end of the age progression/regression sequences and obtained relatively worse results than the other age groups. Both failed simulation and real image examples are shown in Fig. 9 (with blue digits). Additionally, some successful examples are also shown in Fig. 9 (with red digits) for comparison.

From our observation, the failures had two main causes: (i) the inconsistency between the person’s chronological age and physiological age and (ii) the unrealistic walking pose or fashion style for the person’s age. The first type of failure existed for both simulations and real images, such as children with relatively small head-to-body ratios, and middle-aged and elderly people who looked very slim or had no apparent stoop in their gaits. These types of failures also occur in the field of gait-based age estimation [27, 46], where they often lead to a large overestimation or underestimation of a person’s age from the gait features.

The second type of failure occurred more frequently in the case of simulations. For example, simulated children appeared to have relatively mature walking poses that were rarely observed in the real 0–4 group. This is because they were transformed from mature adults aged 20–29 using the mean-GEI, which did not present an obvious immature walking pose because of the large diversity in the poses of young children. This is a limitation of the proposed method, which uses a single generic subject-independent transformation. Similarly, this problem also appeared in the fashion style of the elderly (e.g., very few elderly people wear long dresses or have long hair in real scenes), which was particularly true for females, and it is difficult to solve this well using a generic transformation for all subjects.

5.2 Failures in cross-age gait identification

We next analyze some failures in the evaluation of cross-age gait identification. Considering the large performance differences between the subjective and objective evaluations, we chose the examples in which humans failed but the machine succeeded to investigate the difference in the gait recognition capability between the human and machine using the appearance-based gait feature, which has not been discussed to the best of our knowledge.

We provided the probe, the imposter that most participants mismatched, and the genuine for each example in Fig. 10. We first focus on the failures in Fig. 10a for both probes and galleries aged 20–29, that is, gait identification for the same age. Although the probe and genuine were captured almost simultaneously by two cameras with a similar view angle, because of the different segmentation results and gait fluctuation across gait periods, there still existed slight differences between the appearances of the probe and genuine, which easily caused the mismatch for human eyes, particularly when other imposters had some similarities (e.g., similar clothes and similar body shape) to the probe. By contrast, it was much easier for the machine to make a correct match by computing the dissimilarity scores, which can be understood from the difference images shown in Fig. 10, where smaller differences are obtained for the genuine pairs compared with the imposter pairs.
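The relation between a difference image and the dissimilarity score used by the machine can be made concrete with a short sketch. It assumes that GEIs are equal-sized grayscale arrays with values in [0, 1]; the tiny 2×3 arrays and the function names below are hypothetical stand-ins for real GEIs, used only for illustration.

```python
from typing import List

Image = List[List[float]]


def difference_image(a: Image, b: Image) -> Image:
    """Pixel-wise absolute difference between two equal-sized images."""
    return [[abs(p - q) for p, q in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def dissimilarity(a: Image, b: Image) -> float:
    """Euclidean distance, i.e., the root of the summed squared differences."""
    diff = difference_image(a, b)
    return sum(v * v for row in diff for v in row) ** 0.5


# Made-up 2x3 "GEIs": the genuine differs from the probe only slightly
# (as with segmentation noise), the imposter differs in several pixels.
probe = [[0.0, 0.5, 1.0], [0.0, 1.0, 1.0]]
genuine = [[0.0, 0.6, 1.0], [0.1, 1.0, 0.9]]
imposter = [[0.5, 0.5, 0.5], [0.0, 0.5, 1.0]]

genuine_score = dissimilarity(probe, genuine)
imposter_score = dissimilarity(probe, imposter)
```

The score aggregates every pixel of the difference image, so small differences that are hard to judge by eye can still separate the genuine from the imposter numerically.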

Next, we consider the examples in Fig. 10b, in which the age difference between the probe and gallery was relatively large (i.e., 0–4 vs. 20–29). The appearance differences between the probe and genuine greatly increased, which made it more difficult for a human to determine the true match by simply observing the 30 gallery candidates and thus led to the low accuracy for this age group in the subjective evaluation. The identification rate of the objective evaluation also degraded in this case; however, it still had superior accuracy to that of the human because the genuine had a smaller dissimilarity score than the other imposters for most subjects.

Although the performance of the subjective evaluation was much worse than that of the objective evaluation, we noticed large differences among the results obtained from different participants. Two of the twenty-four participants in our experiments were experts in gait recognition, whereas the remaining twenty-two participants were members of the general public who were not familiar with observing GEI images. The statistics of the rank-1 identification rates from all participants for both same-age and cross-age identification are shown in Table 3. The standard deviation indicates large variations in identification performance among participants. Whereas the experts achieved identification rates of almost 100%, which even outperformed the machine, most of the participants obtained much lower accuracies because they were less skilled at finding dissimilarities in GEI images than in raw gait videos; consequently, the subjective identification rates were worse than those of direct matching.

5.3 Impact of gender

To analyze the impact of gender on the proposed method, we retrained two warping fields for female and male subjects, respectively (i.e., gender-dependent morphing) to compare the performance with the original proposed general warping for both genders (i.e., gender-independent morphing). Because there is no existing work that considers gait-based age progression/regression, we also compared the simulation results of our gender-dependent morphing with a state-of-the-art approach in face-based age progression/regression that considers gender in the training stage, that is, the CAAE [47], which is a unique study that does not use paired face samples from each individual. The same GEI training samples with the same age group definition used in our method were fed into the network for CAAE.

5.3.1 Qualitative evaluation

We first qualitatively compared the simulations of the proposed gender-independent and gender-dependent morphing, in addition to the results of CAAE from two male subjects and two female subjects in Fig. 11.

Table 3 Statistics of the rank-1 identification rates [%] from all participants for both the cases of same age and cross-age identification

Fig. 11 Comparison of CAAE, proposed gender-dependent and gender-independent morphing. The leftmost column shows the inputs together with their ages. For each subject, the first row on the right shows the results of the CAAE for each age group; the second and third rows show the results of the proposed gender-dependent and gender-independent morphing, respectively, for a typical age within each age group corresponding to the CAAE results above. The first two samples are male subjects and the last two are female subjects

Although the gait aging patterns reflected in the results of the CAAE were consistent with our observations from Fig. 1, we observe that the CAAE lost the gait individuality of a specific subject because it generated the same age progression/regression results for different males/females, which demonstrates that the face-based age progression/regression method is unsuitable to be directly applied to the gait because of the differences between the face and gait in terms of aging mechanisms and data types (i.e., facial images contain rich texture information that is relevant to aging, such as wrinkles, whereas silhouette-based gait does not contain such information).

By contrast, both the proposed gender-independent and gender-dependent morphing keep the personalized gait characteristics while presenting reasonable gait aging patterns. The results generated by gender-dependent warping differ slightly from those of gender-independent warping, mainly in terms of the degree of stoop that appears in the middle-aged and elderly; however, the simulations from the two morphing models are generally still very similar to each other, which indicates that females and males undergo similar gait changes during the aging/reverse aging process.

5.3.2 Quantitative evaluation

Table 4 The comparison of the CAAE, the proposed gender-independent and gender-dependent morphing in terms of correct classification rate and rank-1 identification rate

We next quantitatively evaluated the performance of CAAE and the proposed gender-independent and gender-dependent morphing, again using age group classification and cross-age gait identification. The same 40 test subjects aged 20–29 that were used in Sect. 4.3 were chosen as the inputs to simulate the other six age groups, i.e., 0–4, 5–9, 10–19, 30–39, 50–59, and over 70. Only objective evaluation was executed for the simulation images, considering its reliability and higher accuracies compared with subjective evaluation.

(a) Age group classification The correct classification rates of simulations by each method are computed for females, males and both genders from each age group, respectively, as shown in Table 4(a).

Because the CAAE generated the same simulation results for different males/females (i.e., one female and one male simulation for each age group), we observe either a 0% or a 100% correct classification rate for each gender in each age group, which illustrates that the results of CAAE render gait aging patterns to some extent, although the gait individuality is lost. By contrast, both the proposed gender-independent and gender-dependent morphing obtain better accuracies than CAAE in terms of the overall correct classification rate.
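The all-or-nothing rates follow directly from the identical outputs: a deterministic classifier assigns the same label to identical inputs, so every sample in a gender and age group cell is either right or wrong together. The sketch below illustrates this with a hypothetical threshold classifier on a made-up scalar feature; `toy_classifier`, the feature values, and the group labels are assumptions for illustration only.

```python
from typing import List


def correct_rate(predicted: List[str], truth: str) -> float:
    """Correct classification rate [%] for samples sharing one true label."""
    return 100.0 * sum(p == truth for p in predicted) / len(predicted)


def toy_classifier(feature: float) -> str:
    """Hypothetical deterministic classifier on a made-up scalar feature."""
    return "0-4" if feature < 10.0 else "over 70"


# 20 identical simulations, as when one output is shared by all subjects
# of one gender in one age group.
identical = [3.0] * 20
rate_if_right = correct_rate([toy_classifier(f) for f in identical], "0-4")
rate_if_wrong = correct_rate([toy_classifier(f) for f in identical], "over 70")

# Subject-specific simulations vary, so intermediate rates become possible.
varied = [3.0] * 15 + [12.0] * 5
rate_varied = correct_rate([toy_classifier(f) for f in varied], "0-4")
```

Only when the simulations differ across subjects, as in the last case, can the rate take values between the two extremes.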

Generally, the gender-dependent morphing yields slightly better results than the gender-independent one, which mainly results from the minor improvements for the elderly females. Therefore, the effects of gender on the proposed geometric warping are subtle considering the quality of simulated aging patterns, which is consistent with the observations from the simulation examples in Sect. 5.3.1.

(b) Cross-age gait identification Similar to Sect. 4.3.2, direct matching was applied to identify the probes, including simulations of the aforementioned six age groups, from the galleries of real images aged 20–29. We compared the rank-1 identification rates of each gender and both genders from each age group, in addition to the overall rank-1 identification rate, for the CAAE and the proposed gender-independent and gender-dependent morphing in Table 4(b).

The low identification accuracies of CAAE clearly show that the personal identity is corrupted in its simulated results, which quantitatively demonstrates that it is unsuitable to directly apply a face-based age progression/regression method to the gait scenario.

Similar to the results of age group classification, only the accuracies for females are slightly improved by considering gender in the training of the proposed geometric warping fields; hence, the gender-independent and gender-dependent morphing yield a similar ability to preserve gait identity, which is evident from the statistics for both genders.

6 Conclusion

In this paper, we presented a baseline algorithm for gait-based age progression and regression, which has not been addressed in the literature to the best of our knowledge. We first divided the entire age range into several age groups and then generated the FFD-based general geometric transformation between adjacent age groups to render age progression and regression for the full age range simultaneously. Qualitative evaluation in conjunction with the subjective and objective quantitative evaluations through age group classification and cross-age gait identification were executed to validate the proposed method for the quality of aging patterns and preservation of identity and to provide several insights for future research on this topic.

One important future direction is to extend age progression/regression to silhouette sequences to render the change in motion by simulating different gait phases, which is more beneficial for potential applications, such as entertainment and health examination. Moreover, whether the proposed baseline method can be improved by involving more individuality, such as incorporating clothing-dependent transformations, rather than using a general transformation across the population, also needs to be investigated. Additionally, collecting paired samples of individuals with a short age span (e.g., 5 years or 10 years) remains challenging but meaningful future work, which will contribute to the development of a better gait-based age progression/regression approach and to research on cross-age gait recognition tasks.