1 Introduction

Defining and extracting proper object features is a key component of any object recognition pipeline. In this paper, we deal with the problem of gait recognition and propose utilisation of dynamic properties of rough shape skeletons as a gait cycle descriptor. Our contribution is threefold.

  • We introduce the concept of the Skeleton Variance Image and demonstrate that it stores important information about moving human silhouette figures. We show that the Skeleton Variance Image is a powerful gait cycle descriptor which leads to a significant improvement over the existing state-of-the-art gait recognition rate.

  • We demonstrate that smooth distance fields yield robust extraction of rough skeletal structures which promote stability with respect to shape boundary perturbations.

  • In particular, we demonstrate that solving the so-called screened Poisson equation yields a computationally efficient way to define a family of smooth distance functions with simple and efficient control over their smoothness yielding a skeleton which is significantly more robust compared to the exact distance function.

1.1 Gait Recognition

Gait recognition seeks to identify a person by their walking manner and posture [45]. With applications including surveillance and access control, gait as a behavioural biometric has advantages over physical biometrics such as fingerprints: it can be captured without consent or cooperation, unobtrusively, at low resolution and at a distance. Early medical [48] and psychophysical [15] studies demonstrated the uniqueness of gait, and gait recognition has developed significantly since the first computer-based approach by Niyogi and Adelson [49] in 1994. In practical terms, we require robustness to real-world covariate factors that alter gait appearance and motion and are detrimental to performance, e.g. clothing, bags, shoe type and even elapsed time between capture.

Approaches are split into model-based, model-free and multi-information fusion. Model-based approaches [41, 72] construct gait signatures by modelling or tracking human body segments via anthropometrics [17, 19], model-free approaches [27, 28] disregard human body structure in favour of silhouette-based representations, while multi-information fusion approaches replicate human vision perception by utilising multiple features [40, 69] or biometrics e.g. face [32, 35]. We currently consider gait recognition with a single feature and a single biometric; this is not to say that the performance of our proposed approach could not be boosted by such fusion. We also find that the benefits of low computational cost and image quality insensitivity associated with model-free approaches outweigh the benefits of view and scale invariance associated with model-based approaches.

Considering model-free approaches in more detail, silhouettes commonly serve as the foundation and can be extracted easily from sources such as time-of-flight cameras, Microsoft Kinect and Lidar; colour and texture are rejected, which ensures that gait recognition is not biased towards appearance, given that motion is more consistent over time.

Skeleton gait representations, compared to silhouette ones, are few and far between, especially those founded on distance functions. The lack of implementations is linked to boundary perturbation sensitivity from imperfectly extracted silhouettes and the naturally self-occluding nature of gait. For example, an oversimplified skeleton can be constructed by connecting the silhouette figure centroid to its head and limbs [13], whereas anthropometrics enable a more realistic six-joint skeleton [72]. Both examples utilise a gait cycle's worth of skeletons, which is uneconomical with respect to memory and computational costs; the alternative is to apply the increasingly popular space- and time-normalisation techniques to yield a single, compact 2D gait representation [6, 27, 33, 68, 71, 74].

1.2 Generalised Distance Fields and Distance-Based Shape Features

A generalised distance field is a scalar (vector) field approximating the minimum distance (minimum distance and direction) to a shape with respect to a certain metric. Generalised distance fields and distance-related shape features such as skeletons [9] are widely used in pure mathematics in relation to analysis of Hamilton–Jacobi equations and curvature-driven manifold evolutions [1, 43], computational mathematics [50] in connection to level set methods, computer vision, pattern recognition, and image processing [23–25, 54, 76], shape matching [51], computer graphics and geometric modeling [11, 14, 34, 52, 53], computational mechanics [21], CFD and turbulence modelling [66] (the so-called wall distance, the minimum distance to a solid wall, is a key parameter in several turbulence models), medical image processing, analysis, and visualisation [36], and many other areas.

Our gait recognition approach deals with smooth distance fields approximated by solutions to the Poisson equation alongside its normalised and screened variants; results suggest our approach yields an efficient manner of extracting rough shape skeletons associated with the smooth distance fields. Given a sequence of silhouettes representing a gait cycle, the pixel-wise variance of their corresponding skeletons reflects dynamic gait patterns and turns out to be a powerful gait descriptor.

1.3 Validation

Validation of our proposed approach is performed on the largest, latest and most covariate factor rich, standardised publicly available database: TUM Gait from Audio, Image and Depth (GAID). Overall, our representation significantly boosts robustness as we focus on gait motion which is more consistent over time than gait appearance.

2 Smooth Distance Functions

It is well known that the true Euclidean distance function and its corresponding skeleton (medial axis) are very sensitive to small boundary perturbations. In our study, imperfect silhouette segmentation leads to an abundance of boundary noise. As a possible remedy, one can hope that a properly defined smoothed distance function and its corresponding skeleton are less sensitive to segmentation inaccuracies and silhouette boundary noise. Below we exploit a partial differential equation (PDE) approach and consider several PDE-based schemes for generating smooth distance functions.

To the best of our knowledge, the idea of using diffusion-type PDEs for skeleton extraction purposes was first proposed in [64] where the so-called screened Poisson equations were used. While we consider some other PDE-driven schemes for the distance function approximation and skeleton extraction, the screened Poisson equations serve as our main workhorse.

2.1 Screened Poisson Distance Function

Our first approach to constructing a family of smooth distance functions explores an asymptotic relationship between the distance function and solutions to screened Poisson equations [67, Theorem 2.3].

Consider a Dirichlet boundary value problem for a screened Poisson equation in a bounded domain \(\varOmega \).

$$\begin{aligned} v-t\Delta v=0\quad \text{ in }\quad \varOmega , \qquad v=1\quad \text{ on }\quad \partial \varOmega , \end{aligned}$$
(1)

where \(t\) is a small, positive parameter. Then, as shown in [67],

$$\begin{aligned} \lim \limits _{t\rightarrow 0}-\sqrt{t}\ln [v({\varvec{x}})] = d ({\varvec{x}},\,\partial \varOmega ), \end{aligned}$$
(2)

where \(d({\varvec{x}},\partial \varOmega )\) is the distance from \({\varvec{x}}\in \varOmega \) to \(\partial \varOmega \). In other words, \(d({\varvec{x}},\partial \varOmega )\) is approximated by

$$\begin{aligned} u({\varvec{x}})=-\sqrt{t}\ln v({\varvec{x}}), \end{aligned}$$
(3)

which defines a smooth distance field and parameter \(t\) controls the smoothing properties of \(u({\varvec{x}})\).
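As a simple illustration (a worked example added here under the assumption of a one-dimensional domain, not taken from [67]), let \(\varOmega =(0,L)\). Problem (1) then has the closed-form solution

$$\begin{aligned} v(x)=\frac{\cosh \left( (x-L/2)/\sqrt{t}\right) }{\cosh \left( L/(2\sqrt{t})\right) }, \qquad u(x)=-\sqrt{t}\ln v(x)\rightarrow \frac{L}{2}-\left| x-\frac{L}{2}\right| =\min (x,\,L-x) \quad \text{ as }\quad t\rightarrow 0, \end{aligned}$$

because \(\ln \cosh (a/\sqrt{t})=|a|/\sqrt{t}+O(1)\) for small \(t\). The limit is exactly the distance to the end points of the interval, in agreement with (2), while for finite \(t\) the crease at \(x=L/2\) is rounded off.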

Distance function approximation (3) has been previously employed to extract skeletal structures from grayscale images [64]. An inhomogeneous version of the screened Poisson equation in (1) has been employed [26, 56] to estimate the distance function from a point set. An anisotropic version of (1) was used very recently [14] for tracing geodesics on triangulated surfaces.

It is interesting that the energy corresponding to (1) is a part of the Ambrosio-Tortorelli elliptic regularisation [2] of the Mumford-Shah functional [47]. See also, for example, [57] and [5, Sect. 4.2].

An intuitive explanation of (2) is given in [26] and uses a variant of the so-called Hopf-Cole transformation [20]. Substituting

$$\begin{aligned} v({\varvec{x}})=\exp \left\{ -u({\varvec{x}})/\sqrt{t}\right\} \end{aligned}$$
(4)

in (1) yields

$$\begin{aligned} \frac{\partial v}{\partial x_i}=-\frac{v}{\sqrt{t}}\frac{\partial u}{\partial x_i}, \qquad \frac{\partial ^2 v}{\partial x_i^2}=\frac{v}{t} \left| \frac{\partial u}{\partial x_i}\right| ^2 -\frac{v}{\sqrt{t}}\frac{\partial ^2 u}{\partial x_i^2}. \end{aligned}$$

Thus (1) can be rewritten as

$$\begin{aligned} 0=v-t\Delta v =v\left[ \left( 1-|\nabla u|^2\right) +\sqrt{t}\Delta u\right] . \end{aligned}$$
(5)

This gives a regularised eikonal equation for \({u}({\varvec{x}})\)

$$\begin{aligned} \left( 1-|\nabla u|^2\right) +\sqrt{t}\,\Delta u=0 \text{ in }\varOmega , \quad u=0\text{ on }\partial \varOmega . \end{aligned}$$
(6)

Thus it is natural to expect that \(u({\varvec{x}})\), the solution to (6), approximates the true distance function \(d({\varvec{x}},\partial \varOmega )\) which satisfies the eikonal equation

$$\begin{aligned} |\nabla d|^2=1 \quad \text{ in }\quad \varOmega , \qquad d=0\quad \text{ on }\quad \partial \varOmega . \end{aligned}$$
(7)

Note that (1) is linear and can therefore be easily and efficiently solved numerically by solving a sparse system of linear equations. Figure 1 shows the graphs of smooth distance functions (3) for various values of smoothing parameter \(t\).
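A minimal sketch of how (1) and (3) might be evaluated on a binary silhouette mask is given below. It is not the implementation used in this paper: the function name `screened_poisson_distance`, the unit pixel spacing, the 5-point Laplacian and the plain Jacobi iteration (used here instead of a sparse direct solver to keep the example short) are all assumptions made for illustration.

```python
import numpy as np

def screened_poisson_distance(mask, t, n_iter=500):
    """Approximate the distance to the boundary inside a binary mask via (1), (3).

    Assumptions (not from the paper): unit pixel spacing, 5-point Laplacian,
    Jacobi iteration, and a mask that does not touch the image border.
    """
    inside = np.asarray(mask, dtype=bool)
    v = np.ones(inside.shape, dtype=float)  # v = 1 outside plays the role of the Dirichlet data
    for _ in range(n_iter):
        # Sum of the four neighbours of every pixel (edge padding; the border is outside).
        p = np.pad(v, 1, mode="edge")
        nb = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
        # Discrete form of v - t * Laplacian(v) = 0, i.e. (1 + 4t) v = t * nb.
        v = np.where(inside, t * nb / (1.0 + 4.0 * t), 1.0)
    return -np.sqrt(t) * np.log(v)  # equation (3)
```

The Jacobi iteration contracts with factor \(4t/(1+4t)\), so larger values of \(t\) need more iterations; for those a sparse direct or multigrid solver would be the more natural choice.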

Fig. 1 Graphs of smoothed distance fields (3) for an L-shaped domain, \(t=0.5\) (left), \(t=5\) (middle), and \(t=50\) (right)

2.2 Screened Poisson Distance and Mean Curvature Flow

In the two-dimensional case an interesting relationship between \(v({\varvec{x}})\), the solution to (1), and its level set curvature was derived in [64] and utilised for grayscale image skeletonisation purposes. Below we informally extend the relationship to the multidimensional case.

Let \(\partial \varOmega \) be oriented by its inner normal \( {\varvec{n}}\). It is not difficult to show [22, Appendix B] that the minus Laplacian of the distance function \(d({\varvec{x}},\partial \varOmega )\) yields the mean curvature \(H({\varvec{x}})\) of the distance function level set passing through \({\varvec{x}}\)

$$\begin{aligned} \Delta d = -H, \end{aligned}$$

where we assume that the level set of \(d({\varvec{x}},\partial \varOmega )\) passing through \({\varvec{x}}\) is smooth at \({\varvec{x}}\).
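As a quick sanity check (an example added here, assuming a planar disc of radius \(R\) centred at the origin), we have \(d=R-r\) with \(r=|{\varvec{x}}|\), so that

$$\begin{aligned} \Delta d=\frac{\partial ^2 d}{\partial r^2}+\frac{1}{r}\frac{\partial d}{\partial r}=-\frac{1}{r}, \end{aligned}$$

and the level set through \({\varvec{x}}\) is the circle of radius \(r\), whose curvature with respect to the inner normal is \(H=1/r\).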

Since \(u({\varvec{x}})\) tends to \(d({\varvec{x}},\partial \varOmega )\), as \(t\rightarrow 0\), it is natural to expect that \(\Delta u\) is close to \(\Delta d\) for small \(t\) values. Thus (5) implies that

$$\begin{aligned} |\nabla u|^2\approx 1-H\sqrt{t}\,,\quad \text{ as }\quad t\rightarrow 0. \end{aligned}$$

Now taking into account that \(t|\nabla v|^2=v^2|\nabla u|^2\), we arrive at

$$\begin{aligned} \left| \frac{\partial v}{\partial {\varvec{n}}}\right| ^2 \approx \frac{1}{t}\left( 1-H\sqrt{t}\right) v^2,\quad \text{ as }\quad t\rightarrow 0. \end{aligned}$$

Therefore, since \(v({\varvec{x}})\) is decreasing in the direction of \( {\varvec{n}}\), we have

$$\begin{aligned} v({\varvec{x}})\approx -\sqrt{t}\left( 1+H\sqrt{t}/2\right) \frac{\partial v}{\partial {\varvec{n}}}\,, \quad \text{ as }\quad t\rightarrow 0. \end{aligned}$$
(8)

In the two-dimensional case, a much more accurate asymptotic relation was derived in [47, Appendix 3, Theorem B].

Similar to [64, Section 2], (8) can be linked to a surface evolution with the normal speed component equal to

$$\begin{aligned} 2/\sqrt{t}+H. \end{aligned}$$

While (8) is not directly related to our study of gait recognition problems, it supports works on multidimensional shape symmetries [63] and may also be useful for investigating properties of more general diffuse distance fields [62].

2.3 Poisson and Normalised Poisson Distance Functions

Now let us consider a simpler approach to smooth distance function generation. The approach is based on solving a Dirichlet boundary value problem for a Poisson equation

$$\begin{aligned} \Delta \varphi =-1 \quad \text{ in }\quad \varOmega , \qquad \varphi =0\quad \text{ on }\quad \partial \varOmega . \end{aligned}$$
(9)

This problem serves as a basic mathematical model describing Brownian motion of particles which are born at a constant rate inside \(\varOmega \) and die on \(\partial \varOmega \). The solution to (9), the so-called Poisson distance function, is proportional to the particle density and therefore can be considered as a smooth approximation of the true distance function \(d({\varvec{x}},\partial \varOmega )\) from \(\partial \varOmega \).

Poisson distance functions have been employed for action recognition [24, 25], skeleton extraction [4], turbulence modelling applications [65], and geometric de-featuring purposes [70].

Although the Poisson distance function \(\varphi ({\varvec{x}})\) does not deliver an accurate approximation of the distance function \(d({\varvec{x}},\partial \varOmega )\), a simple normalisation procedure applied to \(\varphi ({\varvec{x}})\) can significantly improve the approximation of \(d({\varvec{x}},\partial \varOmega )\) near \(\partial \varOmega \). Namely, following [61, 65] let us introduce

$$\begin{aligned} \psi ({\varvec{x}})=-|\nabla \varphi |+\sqrt{|\nabla \varphi |^2+2\varphi }. \end{aligned}$$
(10)

Normalisation procedure (10) is inspired by the fact that in the one-dimensional case (9) and (10) reconstruct the distance function precisely [65]. It is straightforward to rewrite (10) as

$$\begin{aligned} \psi ({\varvec{x}})=\frac{2\varphi }{\sqrt{|\nabla \varphi |^2+2\varphi }+|\nabla \varphi |}, \end{aligned}$$

and check that

$$\begin{aligned} \psi =0\quad \text{ and }\quad \partial \psi /\partial {\varvec{n}}=1 \quad \text{ on }\quad \partial \varOmega , \end{aligned}$$
(11)

where \( {\varvec{n}}\) is the inner unit normal to \(\partial \varOmega \) (the distance function increases towards the interior of \(\varOmega \)). Thus \(\psi ({\varvec{x}})\) approximates \(d({\varvec{x}},\partial \varOmega )\) very accurately near \(\partial \varOmega \). It seems that the second normalisation condition in (11) has not been noticed before.
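The one-dimensional claim, that (9) combined with (10) reproduces the distance function exactly, can be checked numerically. The sketch below is an illustration only: the function name `normalised_poisson_1d`, the grid size and the dense finite-difference solve are assumptions, not part of the original method.

```python
import numpy as np

def normalised_poisson_1d(length=1.0, n=101):
    """Solve (9) on (0, length) by finite differences and apply normalisation (10)."""
    x = np.linspace(0.0, length, n)
    h = x[1] - x[0]
    m = n - 2  # number of interior nodes
    # Second-difference matrix for -phi'' = 1 with phi(0) = phi(length) = 0.
    A = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
    phi = np.zeros(n)
    phi[1:-1] = np.linalg.solve(A, np.ones(m))
    # Second-order one-sided differences at the ends keep the gradient exact for quadratics.
    grad = np.gradient(phi, h, edge_order=2)
    psi = -np.abs(grad) + np.sqrt(grad**2 + 2.0 * phi)  # equation (10)
    return x, phi, psi
```

Here \(\varphi (x)=x(L-x)/2\), so \(|\varphi '|^2+2\varphi =L^2/4\) and \(\psi (x)=L/2-|L/2-x|=\min (x,L-x)\), which the computed `psi` matches up to rounding error.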

Of course there are many other possibilities to achieve a similar effect. For example, one can apply the normalisation procedure used for geometric modelling purposes [55, 58]

$$\begin{aligned} \frac{\varphi ({\varvec{x}})}{\sqrt{|\nabla \varphi ({\varvec{x}})|^2+\varphi ({\varvec{x}})^2}}. \end{aligned}$$

While this and similar normalisation schemes lead to accurate approximations of the distance function near the boundary, they fail to achieve a satisfactory behaviour far from the boundary. In contrast, as demonstrated by Fig. 2, (9) combined with (10) generates a good approximation of the distance function.

Fig. 2 From left to right: Poisson distance function (9), normalised Poisson distance function (9), (10) and screened Poisson distance function (1), (3)

Both the Poisson and normalised Poisson distance functions have lower computational costs compared to the screened Poisson distance functions (1), (3). On the other hand, the latter provide us with the ability to control the amount of smoothing by tuning parameter \(t\) in (1). For example, as shown on the right of Fig. 2, for a sufficiently small \(t\), the screened Poisson distance function delivers a better approximation of the true distance \(d({\varvec{x}},\partial \varOmega )\) than the normalised Poisson and Poisson distance functions.

Figure 3 demonstrates a comparison of the Poisson, normalised Poisson, and screened Poisson distance functions. While all the smoothed distance functions demonstrate excellent properties in absorbing boundary perturbations, as shown later in this paper, the possibility to control smoothing properties of distance function approximations is vital for significant improvements in gait recognition.

Fig. 3 Distance functions for a silhouette: \( \mathbf {a} \) true distance function; \( \mathbf {b} \) Poisson distance function (9); \( \mathbf {c} \) normalised Poisson distance function (9), (10); \( \mathbf {d} \), \( \mathbf {e} \), and \( \mathbf {f} \) screened Poisson distance function (1), (3) with \(t = 0.5, 5,\) and \(50\), respectively

2.4 \(p\)-Laplacian Distance Functions and \(L_p\!\) Distance Fields

One more approach to approximate the distance function uses a quasi-linear generalisation of the Poisson equation. Namely, let us consider a Dirichlet boundary value problem for the \(p\)-Laplacian

$$\begin{aligned} \mathop {\mathrm {div}}\left( \left| \nabla \varphi _p\right| ^{p-2}\nabla \varphi _p\right) =-1 \text{ in } \varOmega ,\quad \varphi _p=0 \text{ on } \partial \varOmega , \end{aligned}$$
(12)

with \(1\le p<\infty \). Then it can be shown [8, 37] that

$$\begin{aligned} \varphi _p({\varvec{x}})\rightarrow d({\varvec{x}},\partial \varOmega )\quad \text{ as }\quad p\rightarrow \infty . \end{aligned}$$

Moreover, as demonstrated in [8], for arbitrary \(m>1\), this convergence is strong in the Sobolev space \(W^{1,m}(\varOmega )\).

While \(\varphi _p\) for sufficiently large \(p\) delivers an accurate approximation of the distance function (see Fig. 4 for a simple example), achieving an accurate numerical approximation of the solution to (12) is a complex task compared with the linear PDE problems considered before.
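The convergence as \(p\rightarrow \infty \) can be observed directly in one dimension, where (12) admits a closed-form solution. The sketch below is an illustration added here: the closed form \(\varphi _p(x)=\left( a^q-|x-a|^q\right) /q\) with \(a=L/2\) and \(q=p/(p-1)\) is derived for the interval \((0,L)\) and \(p>1\) only, and the function name `p_laplacian_distance_1d` is hypothetical.

```python
import numpy as np

def p_laplacian_distance_1d(x, length, p):
    """Closed-form solution of (12) on the interval (0, length), valid for p > 1."""
    q = p / (p - 1.0)                            # conjugate exponent, q -> 1 as p -> infinity
    a = 0.5 * length                             # midpoint of the interval
    y = np.abs(np.asarray(x, dtype=float) - a)   # distance to the midpoint
    return (a**q - y**q) / q

# Maximum deviation from the true distance min(x, L - x) for increasing p.
x = np.linspace(0.0, 1.0, 201)
true_distance = np.minimum(x, 1.0 - x)
errors = [np.max(np.abs(p_laplacian_distance_1d(x, 1.0, p) - true_distance))
          for p in (2, 10, 1000)]
```

For \(p=2\) this reduces to the Poisson distance function \(x(L-x)/2\) of (9); the deviation from the true distance shrinks as \(p\) grows.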

Fig. 4 \(p\)-Laplacian distance function (12) for a square. Note how well the creases of the true distance function are approximated by \(\varphi _p({\varvec{x}})\)

It is also worth mentioning that the so-called \(L_p\)-distance fields introduced recently in [7] also allow the user to control the amount of smoothing added to the true distance function. However, according to our numerical experiments, the screened Poisson distance functions tend to distribute smoothing uniformly over the domain, while the \(L_p\)-distance fields apply less smoothing near the boundary and more smoothing far from the boundary.

3 Rough Skeletons

After the pioneering work of Blum [9], skeleton-based shape representations have been widely utilised for the analysis and processing of static and dynamic 2D and 3D shapes [59]. Strong correlations between medial shape structures and perceptual shape organisation [38, 39] remain a subject of intensive research [3].

While the classical medial axis [9] reflects shape organisation, its main drawback is high sensitivity to small-scale boundary perturbations. As the medial axis of an object is closely connected to the distance function from the boundary of the object (the medial axis can be defined as the set of singularities of the distance function), it is natural to expect that a smooth distance function may lead to a more robust shape skeletonisation scheme. Indeed, attempts to use smooth distance functions for better (less sensitive) skeletonisation have been made, for example in [4, 18, 25, 64].

Our approach to shape skeletonisation is conceptually similar to those developed in [4, 25, 64], but instead of using second-order derivative operators (e.g. Laplacian, curvature, or curvature-based operators) as employed in these papers, we compute the squared gradient \(|\nabla u|^2\) of a smooth distance function \(u({\varvec{x}})\). We choose the gradient due to the following observation. Assume that the boundary \(C_0=\partial \varOmega \) of \(\varOmega \) is oriented by its inner normal \( {\varvec{n}}\) and consider offset curves \(C_\rho \) obtained from \(C_0\) by shifting each point of \(C_0\) in the direction of \( {\varvec{n}}\) by a distance \(\rho \). Then the skeleton \(S\) of \(\varOmega \) is formed by the first self-intersections of \(C_\rho \), as \(\rho \) increases. One can easily see that these self-intersections move along \(S\) faster than the offset curves \(C_\rho \) move along their normals. Namely, if curve \(C_\rho \) moves with unit speed, then its self-intersection point moves along \(S\) with speed equal to \(1/\sin \theta \), where \(\theta \) is the angle between \(C_\rho \) and \(S\). This means that the rate of change of the distance function \(d({\varvec{x}},\partial \varOmega )\) along \(S\) at the offset self-intersection point \( {\varvec{x}}\in S\) is given by \(\sin \theta \). Further, if \(\theta \) is small at \( {\varvec{x}}\in S\) (and therefore \(\sin \theta \) is small as well), then the orientation normals at the boundary points corresponding to \({\varvec{x}}\) have almost opposite directions and the part of \(S\) near \({\varvec{x}}\) reflects important bilateral symmetry properties of \(\partial \varOmega \). Figure 5 illustrates these simple ideas.
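The speed relation can be checked on a wedge (an illustrative configuration assumed here, not one analysed in the paper). Let \(\partial \varOmega \) locally consist of two straight lines meeting at a vertex with interior angle \(2\theta \), so that \(S\) is the bisector. A point of \(S\) at arc length \(s\) from the vertex lies at distance \(s\sin \theta \) from both lines, hence

$$\begin{aligned} d(s)=s\sin \theta , \qquad s(\rho )=\frac{\rho }{\sin \theta }, \qquad \frac{\mathrm {d}s}{\mathrm {d}\rho }=\frac{1}{\sin \theta }, \qquad \frac{\mathrm {d}}{\mathrm {d}s}\,d(s)=\sin \theta , \end{aligned}$$

where \(s(\rho )\) is the position of the self-intersection of \(C_\rho \). Away from \(S\) the distance function changes at unit rate, so a small value of the gradient magnitude singles out skeleton points with small \(\theta \).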

Fig. 5 Left: illustrating the relationship between skeleton \(S\) and offset \(C_\rho \). Middle and right: examples of computer generated offsets and the skeleton of a given closed curve

Figure 6 demonstrates the advantage of the smoothed distance function gradient for extracting a fuzzy skeleton of a given shape. In practice we use the standard \(3\times 3\) Sobel kernels to estimate the gradient. Figure 7 demonstrates how the squared gradient map \(|\nabla u({\varvec{x}})|^2\) depends on smoothing parameter \(t\) in (1) and (3).

Fig. 6 Silhouette (\( \mathbf {a} \)) and its smooth distance function (\( \mathbf {b} \)), fuzzy skeletons defined using gradient (\( \mathbf {c} \)), curvature (\( \mathbf {d} \)), and Laplacian (\( \mathbf {e} \))

Fig. 7 Cooler colours correspond to fuzzy skeletons where Sobel kernels are convolved with the smoothed distance function (a) \(t=0.5\), (b) \(t=5\), (c) \(t=50\) (Color figure online)

A rough skeleton is obtained from the fuzzy skeleton by thresholding. In practice, as seen in the left image of Fig. 8, it also detects the silhouette boundary which is subsequently removed by rejecting a small number of \(u({\varvec{x}})\) boundary layers. The resulting skeleton is considerably less sensitive to boundary noise than the true medial axis.
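The thresholding step can be sketched as follows. This is a hedged illustration rather than the procedure used for the reported results: the function names, the \(1/8\) Sobel scaling, the threshold `grad_thresh` and the rule of discarding pixels with \(u\le \) `min_u` as a stand-in for "rejecting a small number of boundary layers" are all assumptions.

```python
import numpy as np

def sobel_gradient_sq(u):
    """Squared gradient magnitude of u from 3x3 Sobel kernels (scaled by 1/8)."""
    p = np.pad(np.asarray(u, dtype=float), 1, mode="edge")
    # Horizontal derivative: right column minus left column, weights 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2]) / 8.0
    # Vertical derivative: bottom row minus top row, weights 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:]) / 8.0
    return gx**2 + gy**2

def rough_skeleton(u, grad_thresh=0.5, min_u=1.5):
    """Keep low-gradient pixels, then drop those in the layers closest to the boundary."""
    fuzzy = sobel_gradient_sq(u)            # fuzzy skeleton: small values on the skeleton
    low_gradient = fuzzy < grad_thresh      # thresholding
    away_from_boundary = np.asarray(u) > min_u
    return low_gradient & away_from_boundary
```

For a horizontal strip whose distance profile is \(\min (y,10-y)\), only the centre row has a vanishing Sobel gradient, so the sketch returns that row as the rough skeleton.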

Fig. 8 Low gradient magnitude values correspond to skeleton and silhouette boundary (left); shedding a small number of boundary layers yields the desired skeleton (right)

Note that in contrast to the classical medial axis, our rough skeleton is not a deformation retract of the original shape. For example, the rough skeleton shown in Fig. 8 contains gaps while the silhouette is a simply connected 2D shape. If necessary, Canny’s hysteresis thresholding procedure [12] can be utilised to remove such gaps.

4 Skeleton Variance Image

Over a complete gait cycle, skeleton motion can be extracted by considering how pixel intensity values vary during the skeleton sequence; this prompts our primary contribution, the Skeleton Variance Image (SVIM) gait representation.

While the data are still in silhouette form, we perform (i) size normalisation to ensure constant height silhouettes and (ii) horizontal alignment to centre silhouettes, with the centroid of the top 10 % of the figure height as a reference. We perform time-normalisation after skeleton construction to condense the skeleton sequence into a single, compact 2D gait representation by computing the pixel-wise variance. The resulting representation, seen in the rightmost column of Fig. 9, enables visualisation of high and low pixel intensity values corresponding to higher and lower degrees of body motion respectively.
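The alignment and time-normalisation steps might be sketched as below. This is an assumption-laden illustration: the function names are hypothetical, size normalisation is omitted, the shift uses `np.roll` (which wraps around, so the figure is assumed to stay well inside the frame), and the population variance is used.

```python
import numpy as np

def align_to_head(frame, top_fraction=0.10):
    """Shift a binary silhouette so the centroid of its top rows sits at the image centre."""
    frame = np.asarray(frame)
    rows = np.flatnonzero(frame.any(axis=1))          # rows containing the figure
    height = rows[-1] - rows[0] + 1
    n_top = max(1, int(round(top_fraction * height)))
    head = frame[rows[0]:rows[0] + n_top]             # top 10 % of the figure height
    cx = np.nonzero(head)[1].mean()                   # horizontal centroid of the head region
    shift = int(round(float((frame.shape[1] - 1) / 2.0 - cx)))
    return np.roll(frame, shift, axis=1)

def energy_image(frames):
    """Pixel-wise mean over a gait cycle (GEI for silhouettes, SEIM for skeletons)."""
    return np.mean(np.asarray(frames, dtype=float), axis=0)

def variance_image(frames):
    """Pixel-wise variance over a gait cycle (GVI for silhouettes, SVIM for skeletons)."""
    return np.var(np.asarray(frames, dtype=float), axis=0)
```

A pixel that is set in every frame has zero variance, whereas a pixel set in half of the frames attains the maximum variance of 0.25 for binary input, which is why moving limbs dominate the variance image.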

Fig. 9 Representation comparison: Gait Energy Image (GEI, baseline), Skeleton Energy Image (SEIM), Gait Variance Image (GVI) and Skeleton Variance Image (SVIM) for training (top) and test sequences: carrying a bag (middle), shoes (bottom); \(t=5\) where applicable

5 Experimental Procedure

5.1 Validation

The TUM GAID database [28, 30], seen in Fig. 10, is one of the latest, largest and most covariate factor rich databases and the first to utilise depth images extracted with the Microsoft Kinect. The database freely provides depth images which have been converted into silhouettes, thus enabling research to concentrate on the gait recognition problem as opposed to data preprocessing problems such as silhouette segmentation. Training sequences are based on 155 persons and contain four normal, i.e. covariate factor free, sequences; test sequences contain two sequences each for: normal (N), carrying a bag (B, consistent across the database) and shoes, i.e. wearing over shoe covers (S); see Fig. 10. Time-based test sequences are also captured three months later and contain 16 persons in two sequences each for: time and normal (TN), time and carrying a bag (TB) and time and shoes (TS); these sequences contain coupled covariate factors, i.e. time and clothing, given the change in weather season. Depth-based silhouettes are chosen over RGB-based silhouettes [30] given their cleaner appearance due to ease of extraction. We focus on persons captured from side views given their greater visibility of dynamic limb motion, which is associated with a higher discriminative nature and greater robustness [44]; this is commonplace in gait recognition. Viewpoint is another common covariate factor that is often addressed separately; however, the TUM GAID database considers side views only (see Fig. 10).

Fig. 10 TUM GAID database exemplar frames top left to right: normal (\( N \)), carrying a bag (\( B \)), shoes (\( S \)); bottom left to right: time + normal (\( TN \)), time + carrying a bag (\( TB \)), time + shoes (\( TS \))

5.2 Baseline and Comparable Representations

The Gait Energy Image (GEI) [27], seen in the leftmost column of Fig. 9, is our baseline and applies the same procedures outlined in Sect. 4 but uses the pixel-wise mean and silhouettes in place of the pixel-wise variance and skeletons respectively. This appearance-based representation permits visualisation of static and dynamic information corresponding to high and low pixel intensity values respectively. We also present two new related representations for enhanced comparison: Skeleton Energy Image (SEIM) and Gait Variance Image (GVI) seen in the middle left and middle right columns of Fig. 9 respectively. The SEIM and GVI are analogous to SVIM and GEI respectively where the pixel-wise mean replaces the pixel-wise variance and vice versa respectively. These representations permit equal comparison of appearance-based (GEI and SEIM) vs. motion-based (GVI and SVIM) representations as well as silhouette (GEI and GVI) vs. skeleton (SEIM and SVIM) representations.

5.3 Distance Function

We compare the behaviour of distance functions extracted via the Poisson, normalised Poisson and screened Poisson equations.

5.4 Smoothing Parameter

Since the smoothing parameter \(t\) dictates the skeleton thickness produced by the screened Poisson distance function, as demonstrated in Fig. 7, we choose a broad range of values to evaluate its effect on gait recognition: small values {t = 0.1, 0.5, 5} correspond to a thinner, more traditional looking skeleton, whereas large values {t = 10–90 in steps of 10} correspond to a thicker skeleton tending towards a silhouette appearance.

5.5 Dimensionality Reduction and Classification

The GEI, GVI, SEIM and SVIM serve as a means to represent gait (\(128\times 178\), typical for the TUM GAID database [31]) and describe gait when reshaped to a 1D feature vector (22784D). Dimensionality reduction transforms the feature vector into a lower dimensional space (154D) by maximising variance and class separability with Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) respectively [42]. Nearest Neighbour classification utilises the cosine distance measure [31], where rank 1 and rank 5 results are presented demonstrating the correct identity occurring first or in the top five matches respectively. This dimensionality reduction and classification combination is commonly employed by approaches utilising single, compact 2D gait representations like our baseline [27], and is advantageous in situations where training sequences are few.
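The final matching stage can be sketched as follows, with the PCA and LDA projections omitted. The function names are hypothetical, and ranking identities by their best-matching gallery sample is an assumption about how "top five matches" is counted.

```python
import numpy as np

def cosine_distances(gallery, probe):
    """Cosine distance between one probe vector and every gallery vector."""
    g = np.asarray(gallery, dtype=float)
    q = np.asarray(probe, dtype=float)
    sims = g @ q / (np.linalg.norm(g, axis=1) * np.linalg.norm(q))
    return 1.0 - sims

def ranked_identities(gallery, gallery_ids, probe):
    """Identities ordered by the distance of their best-matching gallery sample."""
    order = np.argsort(cosine_distances(gallery, probe), kind="stable")
    ranked = []
    for i in order:
        if gallery_ids[i] not in ranked:
            ranked.append(gallery_ids[i])
    return ranked

def rank_k_hit(gallery, gallery_ids, probe, true_id, k):
    """True if the correct identity is among the k best-ranked identities."""
    return true_id in ranked_identities(gallery, gallery_ids, probe)[:k]
```

In the setting described above, each \(128\times 178\) template would first be flattened with `template.reshape(-1)` to a 22784-dimensional vector and projected to 154 dimensions before being passed to these functions; the rank 1 and rank 5 rates are then the fractions of probes for which `rank_k_hit` returns `True` with `k=1` and `k=5`.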

6 Results and Discussion

6.1 Smoothing Parameter \(t\) Behaviour

We first consider how \(t\) affects the performance of the SEIM and SVIM representations seen in Fig. 11. Across covariate factors we can see that only the normal (N) and shoe (S) sequences behave consistently across \(t\), which may be attributed to their similarity to training sequences; the remaining sequences (B, TN, TB, TS) contain significant silhouette-based appearance differences compared to training sequences and the resulting skeleton variations cause covariate factors to prefer varying \(t\). Given how differently covariate factors affect silhouettes and therefore our skeletons, inconsistency in the preferred \(t\) can be seen as advantageous as we could more effectively target covariate factors, especially should covariate factor detection be applied as a future pre-processing stage.

Fig. 11 TUM GAID database rank 1 performance with respect to \(t\) for SEIM and SVIM representations and sequences: normal (N), carrying a bag (B), shoes (S), time and normal (TN), time and carrying a bag (TB), time and shoes (TS), weighted average

We are currently interested in the weighted average performance as we desire the \(t\) which is most effective over a varying range of covariate factors. The first thing to notice is the significant performance jump, regardless of covariate factor, from \(t=0.1\) to \(t=0.5\), which is attributed to \(t=0.1\) producing an overly thin skeleton that risks considerable segmentation of the skeleton, especially at branch points. In terms of weighted average performance, we can see a subtle trend where the SVIM and SEIM decrease and increase respectively with larger \(t\) values; this is linked to the SVIM preferring a thinner, more traditional looking skeleton and the SEIM preferring a thicker skeleton tending towards a silhouette appearance. We therefore suggest a small value (\(t = 5\)) for the SVIM. However, as pointed out to us by one of the reviewers of the paper, scaling the image \(\varOmega \) by a factor \(s\) (while keeping its resolution fixed) and assuming that the solution \(v({\varvec{x}})\) to (1) remains invariant leads to scaling the smoothing parameter \(t\) by \(s^2\). This means that in our current model no optimal \(t\) exists if the image size and resolution are not specified; note that \(t\) may also be database dependent.
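The scaling remark can be verified in a few lines (a short derivation added for illustration; the symbols \(\tilde{{\varvec{x}}}\) and \(\tilde{v}\) are introduced here). For \(s>0\) let \(\tilde{{\varvec{x}}}=s{\varvec{x}}\in s\varOmega \) and \(\tilde{v}(\tilde{{\varvec{x}}})=v(\tilde{{\varvec{x}}}/s)\). Since \(\Delta _{\tilde{{\varvec{x}}}}\tilde{v}=s^{-2}\Delta _{{\varvec{x}}}v\), we obtain

$$\begin{aligned} \tilde{v}-\left( s^2t\right) \Delta _{\tilde{{\varvec{x}}}}\tilde{v}=v-t\Delta _{{\varvec{x}}}v=0 \quad \text{ in }\quad s\varOmega , \qquad \tilde{v}=1\quad \text{ on }\quad \partial (s\varOmega ), \end{aligned}$$

so \(\tilde{v}\) solves (1) on the scaled domain with smoothing parameter \(s^2t\). The corresponding smooth distance is \(-\sqrt{s^2t}\,\ln \tilde{v}=s\,u\), which is the expected behaviour of a distance under scaling.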

6.2 Comparison to GEI Baseline

Table 1 compares rank 1 and rank 5 SVIM, SEIM, GVI and GEI performances across covariate factors with respect to Poisson, screened Poisson and normalised Poisson distance functions. For this table we choose the screened Poisson distance function yielding the best weighted average performance with respect to smoothing parameter \(t\).

Table 1 TUM GAID database rank 1 and rank 5 performance for representations: GEI (baseline), and SEIM, GVI and SVIM, and sequences: normal (N), carrying a bag (B), shoes (S), time and normal (TN), time and carrying a bag (TB), time and shoes (TS), weighted average; distance functions are based on: Poisson, normalised Poisson and screened Poisson schemes

6.3 Covariate Factor Performance Trends

Normal (N) and shoe (S) sequences perform highly given their appearance similarities to training sequences. Note that the shoe sequences cause little alteration of gait appearance and motion, whereas shoe types such as heels and flip flops may cause greater alterations and subsequently increased misclassification [10]. Bag carrying (B) sequences show poorer performance given the significant appearance alterations caused; bags appear as a mass of pixels around the back or a bend in silhouettes and skeletons respectively (see Fig. 9). Note that bags also cause the body to lean due to compensation for a shifted centre of gravity. Time-based sequences (TN, TB, TS) cause significant performance issues, halving performance in some cases; see [46] for further information regarding time as a covariate factor during gait recognition. The primary cause of misclassification is appearance alteration caused by clothing, which is a hidden covariate factor given the time (months) between capture. Clothing as a covariate factor is often addressed separately e.g. in the CASIA B database [73, 75]. Overall, these trends apply to both appearance-based and motion-based, and silhouette and skeleton approaches.

6.4 Appearance- vs. Motion-Based Representations

We can see significant performance differences between appearance-based and motion-based representations across the database. Especially during time-based sequences, the performance of motion-based representations is often double that achieved with appearance-based representations; this occurs because gait motion is considerably more consistent over time than gait appearance. This observation leads us to recommend motion-based representations given their ability to overcome the majority of, and especially the more complex, real world covariate factors presented by the database.
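The distinction between the two families can be illustrated with a minimal sketch: an appearance-based (energy) image averages aligned binary frames over a gait cycle, while a motion-based (variance) image records how much each pixel changes over the cycle. The function names and the toy frames are illustrative assumptions; the exact normalisation and the skeleton-based construction used for the SEIM and SVIM may differ from this simplified version.

```python
import numpy as np

def energy_image(frames):
    """Appearance-based descriptor: per-pixel mean over the gait cycle."""
    return np.asarray(frames, dtype=float).mean(axis=0)

def variance_image(frames):
    """Motion-based descriptor: per-pixel variance over the gait cycle."""
    return np.asarray(frames, dtype=float).var(axis=0)

# Toy cycle of four 2x2 binary frames (made-up data):
# pixel (0, 0) is always on (static body part),
# pixel (1, 1) toggles every frame (moving limb).
frames = np.array([
    [[1, 0], [0, 1]],
    [[1, 0], [0, 0]],
    [[1, 0], [0, 1]],
    [[1, 0], [0, 0]],
])

energy = energy_image(frames)      # static pixel dominates
variance = variance_image(frames)  # only the moving pixel is non-zero
```

In this toy case the static pixel has the largest energy value but zero variance, which mirrors the observation that a variance-based descriptor emphasises motion and suppresses static appearance.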

6.5 Silhouette vs. Skeleton Representations

A pattern exists where silhouettes pair favourably with appearance-based representations (GEI), while the pairing of skeletons with motion-based representations (SVIM) is superior overall; we therefore recommend the latter for gait recognition. The SVIM is successful as it places emphasis on body motion as opposed to covariate factor motion; for example, a rucksack undergoes motion due to natural gait motion (visible especially in the GVI in Fig. 9), but the skeleton represents the rucksack as a mere bend, whereas silhouette representations capture it as a mass of static and dynamic pixel values.

6.6 Distance Function Behaviour

While the distance function constructed from the normalised Poisson equation provides performance increases over the Poisson one, we find the screened Poisson distance function superior; it is also advantageous given the tunable smoothing parameter \(t\), which provides a performance boost. With respect to computation time, the Poisson distance function is the fastest, and the normalised and screened Poisson distance functions are successively slower. However, given that our gait recognition approach is not geared towards real-time processing, we favour the screened Poisson for its superior person discrimination.

6.7 General Recommendations

We have demonstrated the variance aspect of our SVIM to be a useful tool for gait recognition, given that gait motion is more consistent over time than gait appearance. The SVIM paired with the screened Poisson distance function offers significant flexibility due to the tunable smoothing parameter \(t\). Note that we only suggest a general recommendation for the smoothing parameter \(t\), instead of explicitly promoting an optimised parameter, because performance changes with (a) silhouette quality, e.g. missing head or limbs due to imperfect extraction, (b) silhouette creation, i.e. RGB versus depth images, (c) image size, (d) database and even (e) application. While this means we could achieve greater performance with alternative smoothing parameters \(t\), we have nonetheless demonstrated the effectiveness of the SVIM with the screened Poisson distance function.

7 Comparison to State of the Art

Table 2 compares our SVIM to state of the art approaches including the Gait Energy Volume (GEV) [60] and Depth Gradient Histogram Energy Image (DGHEI) [29]. The GEV is analogous to the GEI where 3D binary voxels are averaged in place of 2D silhouettes, while the DGHEI averages Histograms of Oriented Gradients (HOG) [16] descriptors captured during an image sequence. While the DGHEI outranks the GEV due to its covariate factor generalisation, the SVIM is superior overall, especially during time-based sequences, primarily because gait motion is more consistent over time than gait appearance. The SVIM achieves a 9.9% weighted average performance increase over the DGHEI, which we attribute to the combination of skeleton and motion-based representations achieving superior covariate factor handling and generalisation.

Table 2 Existing versus proposed TUM GAID database performances: normal (N), carrying a bag (B), shoes (S), time and normal (TN), time and carrying a bag (TB), time and shoes (TS), weighted average

8 Conclusion and Future Work

We have demonstrated an efficient approach to extract skeletons via the screened Poisson equation with tunable smoothing parameter \(t\). Combined with skeleton and motion-based representations, this yields our proposed SVIM, which is capable of superior covariate factor generalisation despite the tough time-based covariate factors posed by the TUM GAID database. The SVIM owes its success to (a) utilising gait motion, which is more consistent over time than gait appearance, and (b) skeletons, which place emphasis on gait motion as opposed to covariate factor motion for greater covariate factor handling compared to silhouette-based representations. Future work will consider extension to action recognition combined with more advanced learning and classification tools (e.g. SVM).