1 Introduction

The number of people suffering from chronic diseases is constantly rising. Today, more than three quarters of the elderly population suffer from chronic diseases, independent of economic, social, and cultural background [21]. Such diseases can often be avoided or mitigated if people undergo medical examinations regularly so that early symptoms are found [3]. It is, however, difficult for most people to receive frequent support from experts such as medical doctors and therapists. While a complete diagnosis system would have to detect a wide variety of diseases, we focus on the physical fitness of the lower body, in particular the gait motion, which is crucial for maintaining the quality of life.

As people get older, most of them develop, to a greater or lesser extent, symptoms of lesions and/or aging in the lower body. For example, the prevalence of knee osteoarthritis ranges from 19 % in women aged 50–59 to 56 % in men in the 70–90 year-old age group [6]. Such symptoms manifest in various ways, such as impaired sensation and motor malfunction [10, 23]. In terms of motor malfunction, people may have muscle weakness [12] and/or knee osteoarthritis [1]. These symptoms change the patterns of the gait motion [5], and this change increases the risk of slipping and falling [2].

The goal of this work is to develop an easy-to-use diagnosis system that supports the evaluation of physical patterns in the gait motion. For this evaluation, this paper proposes a method for classifying several symptoms of lesions in the lower body (e.g., knees and ankles) that are observed in the gait motion.

Fig. 1.

Lesioned-part identification using a local appearance (upper row) versus the entire-body appearance (lower row). In the examples shown in this figure, the right knee of the subject was tightly bandaged. At each frame, gait-phase-synchronized bodies in a natural motion (left side of each frame) and in a motion with a lesioned right knee (right side of each frame) are shown. While the differences between these two motions are not significant at each frame, the pose of the entire body with the lesioned right knee differs from the one in the natural motion.

For precisely evaluating symptoms observed in gait motions, our contribution is to employ appearance information extracted from the entire body rather than from local body parts such as knees and ankles. Figure 1 allows us to intuitively understand the effectiveness of the entire-body appearance. The upper row shows the local appearance of a lesioned right knee. This local appearance reveals little difference between a natural motion (shown on the left-hand side of each frame) and a motion with a lesioned right knee (shown on the right-hand side of each frame). On the other hand, we can easily see differences between these two motions in the appearance of the entire body shown in the lower row; for example, the upper body with the lesioned right knee is inclined backward and to the left for balancing. In this paper, a set of gait features is employed and appropriately pruned for robust lesioned-part identification.

2 Related Work

The proposed system employs a depth sensor for the evaluation of a 3D gait motion, which is expected to be more informative for lesioned-part identification than silhouette-based gait representations (e.g., [24, 25]). The depth sensor can reconstruct the 3D pose (i.e., the 3D positions of joints) of a person of interest more robustly than a conventional RGB camera [7, 13]. While such a camera can observe people only within its field of view, people are not required to carry any wearable sensors [17] and their motions are not affected by such sensors. In addition, for our goal (i.e., finding the symptoms in a gait motion), 24-hour observation using wearable sensors is not necessarily required.

A temporal sequence of a 3D body pose is defined as a gait motion. From the gait motion, we can extract several features representing physical symptoms caused by aging and/or physical disability. Walking speed, stride, pace, etc. are useful features for this purpose [8]. For example, walking speed decreases, stride becomes shorter, and pace slows down due to motor function decline [2]. In addition to walking speed, acceleration differs between healthy people and elderly and/or disabled people [4, 16]. It is also known that the anteversion of the pelvis becomes smaller and left-right asymmetric due to hemiplegia arthrosis [9].

The aforementioned features are well known in the physiotherapy and biomechanics literature. In the previous work mentioned above, however, these features are extracted only from the target body parts/joints. Intuitively, not only the motion of the lesioned part(s) but also that of the entire body is affected by symptoms of aging and/or physical disability. This paper proposes a system that identifies lesioned part(s) based on gait motion features extracted from the entire body. The closest work to our system is presented in [11]; however, this method analyzes the motion variation of the entire body under the assumption that the lesioned body part is known. For the purpose of finding such lesioned part(s), this is a kind of chicken-and-egg problem. Our proposed system, on the other hand, finds lesioned part(s) and estimates their symptoms (i.e., how severe the symptom is) from gait features extracted from a temporal sequence of a 3D body pose.

3 Overview of the System

The overview of the proposed system is illustrated in Fig. 2.

In its learning step (bottom row in Fig. 2), a number of gait patterns including the symptoms of lesions on various body parts are observed by a depth sensor. Each observed depth sequence is used to estimate a sequence of 3D body poses (i.e., skeletons) with a pose estimation model [7, 13], as shown in “Gait measurement” in Fig. 2. From the sequence of estimated 3D skeletons, a set of gait features is extracted. Since each set of gait features is labeled with the lesioned body part, a classifier (“Classification model” in Fig. 2) can be trained.

Fig. 2.

Overview of the proposed system. The learning and identification steps are shown in lower and upper rows, respectively. Data are enclosed by rectangles and their flows are visualized by arrows in the figure.

Fig. 3.

Spurious lesions given for our experiments.

When the depth sequence of a gait motion is observed for lesioned-part identification, its gait features are extracted as in the learning step. The set of gait features is then classified in order to identify the lesioned part.

4 Gait Features for Lesioned-Part Identification

4.1 Dataset Collection for Lesioned-Part Identification

For realistic data and applications, it is preferable to collect and use the data of people who actually have lesions. It is, however, difficult to collect a large amount of such data (Footnote 1). Instead of real data, in this paper, spurious lesions were given to several body parts by bandaging or immobilizing them, as shown in Fig. 3. Gait motions with these spurious lesions were then collected and used for classification. The spurious lesions, which were determined under the direction of a physiotherapist for our experiments, emulate the functional decline of joints caused by aging, for example, in bending and stretching the knee [15] and in the plantar flexion and dorsiflexion of the ankle [14].

The following gait motions were observed for classification:

  • Natural gait motion: Natural gait motions of physically-healthy people with no bandage.

  • Gait motion with bandaged knee(s): With the knee of a subject bent 90 degrees, it was bandaged either weakly or tightly; each knee was bandaged in turn. In addition to these two-by-two combinations (i.e., left/right knee bandaged weakly/tightly), the gait motion of each subject was also observed when both knees were bandaged tightly. In total, five conditions were observed. The motion of a bandaged knee is similar to a decrease in the articular range of motion due to aging.

  • Gait motion with immobilized knee(s): With the knee of a subject kept straight, it was immobilized with a splint so that it was almost fixed. This condition is similar to the symptom of a muscle strain. Each knee was immobilized separately; that is, two conditions were observed for each subject.

  • Gait motion with bandaged ankle(s): With the ankle of a subject bent 90 degrees, it was bandaged. For the experiments, the left ankle, the right ankle, and both ankles were bandaged separately. In total, three conditions were observed for each subject.

Eventually, 11 conditions (1 + 5 + 2 + 3) of gait motions were observed for each subject.
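For clarity, these 11 conditions can be encoded as class labels, e.g., as in the following minimal sketch; the label names are hypothetical and only illustrate the 1 + 5 + 2 + 3 breakdown, they are not taken from the dataset itself.

```python
# Hypothetical label set for the 11 gait conditions (1 + 5 + 2 + 3);
# the names below are illustrative only.
GAIT_CONDITIONS = [
    "natural",
    "left_knee_bandaged_weak", "left_knee_bandaged_tight",
    "right_knee_bandaged_weak", "right_knee_bandaged_tight",
    "both_knees_bandaged_tight",
    "left_knee_immobilized", "right_knee_immobilized",
    "left_ankle_bandaged", "right_ankle_bandaged", "both_ankles_bandaged",
]
assert len(GAIT_CONDITIONS) == 11
```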

4.2 Gait Features Representing the Motion of the Entire Body

As described in Sect. 3, the temporal sequence of a 3D skeleton is obtained by a depth sensor. In our experiments, a Kinect V2 sensor and its SDK were used for 3D skeleton estimation.

A set of gait features is extracted from the 3D skeletons of one gait cycle. Each gait cycle is extracted from the observed 3D skeleton sequence so that it begins and ends at the frame where the left knee is furthest in front of the pelvis. All gait cycles are temporally normalized so that they consist of the same number of frames. In the normalized gait cycle, the 3D skeleton of the i-th frame (denoted by \({{\varvec{P}}}_{i}\)) is synthesized from the observed skeletons by linear interpolation:

$$ {{\varvec{P}}}_{i} = \frac{d_{i^{(+)}}}{d_{i^{(-)}} + d_{i^{(+)}}} \hat{{{\varvec{P}}}}_{i^{(-)}} + \frac{d_{i^{(-)}}}{d_{i^{(-)}} + d_{i^{(+)}}} \hat{{{\varvec{P}}}}_{i^{(+)}} $$

where \(\hat{{{\varvec{P}}}}_{i^{(-)}}\) and \(\hat{{{\varvec{P}}}}_{i^{(+)}}\) denote the observed skeletons whose observation times are closest to that of the i-th frame; \(\hat{{{\varvec{P}}}}_{i^{(-)}}\) is observed before and \(\hat{{{\varvec{P}}}}_{i^{(+)}}\) after \({{\varvec{P}}}_{i}\). \(d_{i^{(-)}}\) and \(d_{i^{(+)}}\) denote the time differences from the i-th frame to the observation times of \(\hat{{{\varvec{P}}}}_{i^{(-)}}\) and \(\hat{{{\varvec{P}}}}_{i^{(+)}}\), respectively.
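A minimal sketch of this temporal normalization is given below, assuming the observed skeletons are stored as a (frames × joints × 3) array with per-frame timestamps; the target length of 60 frames is an assumed value, not specified in the text.

```python
import numpy as np

def normalize_gait_cycle(skeletons, timestamps, num_frames=60):
    """Resample one gait cycle to a fixed number of frames by linear
    interpolation, as in the equation above.

    skeletons:  (T, J, 3) observed 3D joint positions.
    timestamps: (T,) observation times in seconds (sorted).
    num_frames: target cycle length (assumed value).
    """
    t_new = np.linspace(timestamps[0], timestamps[-1], num_frames)
    T = len(timestamps)
    out = np.empty((num_frames,) + skeletons.shape[1:])
    for i, t in enumerate(t_new):
        k = int(np.clip(np.searchsorted(timestamps, t), 1, T - 1))
        d_minus, d_plus = t - timestamps[k - 1], timestamps[k] - t
        denom = d_minus + d_plus
        w_prev = d_plus / denom if denom > 0 else 0.5  # weight of the earlier skeleton
        out[i] = w_prev * skeletons[k - 1] + (1.0 - w_prev) * skeletons[k]
    return out
```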

From each frame in the normalized temporal sequence, the following gait features are computed in our proposed system:

  1. Relative xyz positions between the mid-spine and each joint/endpoint
  2. Relative xyz velocities between the mid-spine and each joint/endpoint
  3. Relative xyz accelerations between the mid-spine and each joint/endpoint
  4. Angle of each joint
  5. Angular velocity of each joint
  6. Walking velocity along the moving direction
  7. xyz positions of the body centroid
  8. xyz velocities of the body centroid

From a 3D skeleton reconstructed using a Kinect V2, the relative positions, velocities, and accelerations from the mid-spine to the head, neck, pelvis, both shoulders, both elbows, both wrists, both groins, both knees, both ankles, and both feet (17 points in total) are computed for features 1, 2, and 3, respectively. Joint angles and angular velocities are computed at the spine, neck, both shoulder blades, both shoulders, both elbows, both groins, both knees, and both ankles (14 joints in total) for features 4 and 5, respectively. The angle (in radians) of joint j is computed from the 3D position of j and those of j's parent and child joints. The body centroid is determined based on the weight distribution of the human body; according to a physiotherapy reference, the weights of the head, neck, both arms, torso, and both legs are 4 %, 3 %, 10 %, 48 %, and 35 % of the body weight, respectively.
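The following sketch illustrates how a joint angle and the weighted body centroid can be computed from 3D joint positions; the segment grouping and helper names are our own assumptions, while the segment weights follow the percentages quoted above.

```python
import numpy as np

def joint_angle(parent, joint, child):
    """Angle (radians) at `joint`, from the 3D positions of the joint
    and its parent and child joints."""
    u, v = parent - joint, child - joint
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Segment weights quoted in the text (fractions of total body weight).
SEGMENT_WEIGHTS = {"head": 0.04, "neck": 0.03, "arms": 0.10,
                   "torso": 0.48, "legs": 0.35}

def body_centroid(segment_centers):
    """Weighted mean of segment centers, given as {name: (3,) position}."""
    return sum(SEGMENT_WEIGHTS[s] * np.asarray(p)
               for s, p in segment_centers.items()) / sum(SEGMENT_WEIGHTS.values())
```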

In addition to the eight features listed above, a mean-normalized version of each feature is computed by subtracting the mean of the corresponding feature over the natural gait motions in the training data. In total, 8 + 8 = 16 features are extracted from each frame. All of these features extracted from all frames are concatenated to compose a gait feature vector.
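Under the assumption that the per-frame features of one gait cycle are stored as an (F × D) array (D = 188 raw components per frame, as counted below), the mean normalization and the concatenation into a single gait feature vector can be sketched as follows.

```python
import numpy as np

def build_gait_feature_vector(features, natural_mean):
    """features:     (F, D) raw per-frame features of one gait cycle.
    natural_mean: (F, D) per-frame mean over the natural gait cycles
                  in the training data.
    Returns the concatenation over all frames of the raw and the
    mean-normalized ("primed") features, i.e., a vector of length F * 2D.
    """
    mean_normalized = features - natural_mean
    per_frame = np.concatenate([features, mean_normalized], axis=1)  # (F, 2D)
    return per_frame.reshape(-1)
```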

The dimension of the above-mentioned gait feature vector is large. Specifically, the concatenation of the 16 gait features at each frame yields a 376-dimensional vector: \(( (17 \times 3) + (17 \times 3) + (17 \times 3) + 14 + 14 + 1 + 3 + 3 ) \times 2 = 376\). To improve the discriminative power of the gait feature vector, its dimension is reduced by combining two schemes, namely backward search (a.k.a. backward feature elimination) and linear discriminant analysis (LDA), as follows:

  • Step 1: Assume the current dimension of the gait feature vector is N; initially, \(N = 376\). For backward search, an LDA is trained with all N components of the training samples. In addition, N LDAs are trained with \((N-1)\) components each, such that the k-th of these LDAs omits the k-th component of the gait feature vector.

  • Step 2: All of these \((N+1)\) LDAs are evaluated on validation data with a cross-validation scheme.

  • Step 3: If the best validation score is obtained by one of the LDAs with \((N-1)\) components (i.e., the one trained without the k-th component), the k-th component is removed from the gait feature vector and the procedure returns to Step 1. Otherwise, i.e., if the LDA with all N components is the best, the backward search ends and this LDA is used for lesioned-part identification.

With the training data projected into the selected LDA space, a k-Nearest Neighbor classifier is employed for lesioned-part identification, as sketched below.
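The sketch below shows one way to realize this selection and classification with scikit-learn; whether the eliminated "components" are the 376 scalar dimensions or the 16 feature groups is left to the caller via the column set, and k = 3 for the nearest-neighbor classifier is an assumed value.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def _score(X, y, cols, cv=5):
    # Cross-validated score of an LDA projection followed by k-NN (k assumed).
    clf = make_pipeline(LinearDiscriminantAnalysis(),
                        KNeighborsClassifier(n_neighbors=3))
    return cross_val_score(clf, X[:, cols], y, cv=cv).mean()

def backward_search(X, y):
    """Greedy backward elimination: at each round, compare the model using
    all remaining components against every model that drops one component,
    and stop when dropping a component no longer improves the score."""
    selected = list(range(X.shape[1]))
    best = _score(X, y, selected)
    while len(selected) > 1:
        scores = {k: _score(X, y, [c for c in selected if c != k])
                  for k in selected}
        k_best = max(scores, key=scores.get)
        if scores[k_best] > best:
            best = scores[k_best]
            selected.remove(k_best)   # Step 3: drop the k-th component
        else:
            break                     # keeping all remaining components is best
    return selected, best
```

The final model is then an LDA fit on the selected columns, with k-NN applied in the projected space.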

5 Experiments

For the experiments, the gait motions of 10 subjects were observed in a laboratory setup (Fig. 4). All of them were adult males, whose average height and weight were 170 cm and 63 kg, respectively. Each of the 11 gait conditions was captured 10 times for each subject, and one gait cycle was extracted from each gait sequence. In total, 110 gait cycles were captured for each subject. In all experiments shown below, leave-one-out cross validation was used.

Fig. 4.

Environment for capturing our gait dataset. A subject begins to walk in the acceleration area. While the subject walks through the measurement area, the temporal sequence of gait features is extracted.

The effect of dimensionality reduction was investigated with each individual dataset (i.e., all data observed from each subject). As a result of the backward search, gait features (1), (5), (8), (1\('\)), (6\('\)), (7\('\)), and (8\('\)), which are described in Sect. 4.2, were selected (Footnote 2), where (\(l'\)) denotes the mean-normalized version of feature (l). A comparison between the results using all features and the selected features is shown in Table 1, which reports the percentage of correctly-identified lesions. The performance was increased by around 10 % on average.

Table 1. Effect of dimensionality reduction using backward search for lesioned-part identification. The mean over all gait motions is shown for each subject. For identification of the i-th subject, only the training data of the i-th subject was used to train the LDA.
Table 2. Confusion matrix of lesioned-part identification for the 11 gait motions. All results were obtained with the selected features. Values above 10 % are highlighted in red cells.

With the selected features, a confusion matrix was computed; each value in the matrix is the mean over all subjects. Table 2 shows the results of lesioned-part identification when all 11 kinds of gait motions were classified. This result is consistent with the one shown in Table 1; the mean of the diagonal values in Table 2 is 81.1 %, which is also shown as the mean in Table 1. As expected, it was difficult to discriminate between weakly- and tightly-bandaged knees.

Table 3. Confusion matrix of lesioned-part identification for 9 gait motions. For both the right and left knees, the tightly- and weakly-bandaged motions were merged. All results were obtained with the selected features.
Table 4. 3-class classification accuracy. This classification identifies the symptom on a lesioned part after the initial classification, whose results are shown in Table 3, has determined the lesioned part (the right or left knee in the examples shown in this table). The mean classification accuracy is 77.1 %.
Table 5. Results of 3-class classification using local features. While this classification employs only features extracted around the target part (i.e., the right or left knee), the results shown in Table 4 were obtained with the features of the entire body. The mean classification accuracy is 65.0 %.
Table 6. Confusion matrix of lesioned-part identification using gait features extracted from only the lower body.
Table 7. Confusion matrix of lesioned-part identification using gait features extracted from only the upper body.

To improve the discrimination between different symptoms (e.g., weak and tight bandages) on the same body part, a two-step identification scheme was tested. In this scheme, only the position of the lesion is identified initially; in our experiments, the motions of tightly-bandaged, weakly-bandaged, and immobilized joints were merged for each of the right and left knees. If a part is considered to be lesioned by the initial identification, the degree of the lesion is determined by a classification model trained only with the samples of lesions on this part; for example, the motions of weakly-bandaged, tightly-bandaged, and immobilized knees are used for training for each of the right and left knees. The results of the initial classification (i.e., classification among 7 gait motions) are shown in Table 3. Table 4 shows the results of the second step, where the motions of weakly-bandaged, tightly-bandaged, and immobilized knees are classified after the initial classification. The final mean score of this two-step identification is 83.1 % (Footnote 3), which is 2 % above the results shown in Tables 1 and 2. A sketch of this two-step scheme is shown below.
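The sketch assumes the same LDA + k-NN pipeline as before and a hypothetical `coarse_label` mapping from a detailed condition label to the merged lesion position (e.g., any right-knee condition to "right_knee"); these names are illustrative only.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def _make_clf():
    # LDA projection followed by k-NN (k assumed).
    return make_pipeline(LinearDiscriminantAnalysis(),
                         KNeighborsClassifier(n_neighbors=3))

def fit_two_step(X, y, coarse_label):
    """Step 1 classifies the merged lesion position; step 2 trains, for each
    position covering several symptoms, a classifier on only those samples."""
    y = np.asarray(y)
    y_coarse = np.array([coarse_label(t) for t in y])
    first = _make_clf().fit(X, y_coarse)
    second = {}
    for part in np.unique(y_coarse):
        idx = np.where(y_coarse == part)[0]
        if len(set(y[idx])) > 1:
            second[part] = _make_clf().fit(X[idx], y[idx])
    return first, second

def predict_two_step(first, second, x):
    part = first.predict(x.reshape(1, -1))[0]      # step 1: lesion position
    clf = second.get(part)
    return clf.predict(x.reshape(1, -1))[0] if clf is not None else part
```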

Next, the effect of gait features extracted from the entire body is validated by comparing the results obtained with the entire-body gait features against those obtained with local features. The results obtained with only local features are shown in Table 5. Specifically, the local features consist of gait features (1), (2), (3), (4), (5), (1\('\)), (2\('\)), (3\('\)), (4\('\)), and (5\('\)) described in Sect. 4.2, where (\(l'\)) denotes the mean-normalized version of feature (l). Comparison between Tables 4 and 5 reveals the superiority of the entire-body features over the local features.

For further analysis of entire-body versus local features, confusion matrices obtained with gait features of the lower- and upper-body parts are shown in Tables 6 and 7, respectively. The mean percentages of correctly-identified lesions in Tables 6 and 7 are 74.1 % and 62.4 %, respectively, while that of the entire-body features is 81.1 %. These results demonstrate that the entire-body features are useful for lesioned-part identification.

In all experiments shown above, the training data for each subject consisted of only that subject's data; that is, a classification model was trained individually. A universal model that can be used by anybody must instead be trained with the training data of all subjects. To examine the effectiveness of the proposed system for such a universal model, the percentage of correctly-identified lesions was computed with a leave-one-out cross validation procedure. The mean percentage over all subjects and all gait motions was 66 %, which is much lower than the one obtained with individual models (i.e., 81.1 %). This is a natural consequence because people have their own gait patterns, which are even exploited for gait recognition [22].
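The evaluation protocol for the universal model can be sketched as below; reading "leave-one-out" as leaving one subject out at a time is our assumption, and the LDA + k-NN pipeline with k = 3 is reused from the earlier sketch.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def universal_model_accuracy(X, y, subject_ids):
    """Mean accuracy of a single model trained on all other subjects and
    tested, in turn, on each held-out subject (leave-one-subject-out
    assumption)."""
    clf = make_pipeline(LinearDiscriminantAnalysis(),
                        KNeighborsClassifier(n_neighbors=3))  # k assumed
    scores = cross_val_score(clf, X, y, groups=subject_ids, cv=LeaveOneGroupOut())
    return scores.mean()
```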

Improving the performance of lesioned-part identification with a universal model is one of the important directions for future work. To this end, gait features robust to individual variation should be explored.

6 Concluding Remarks

This paper proposed a system that uses a depth sensor to find the symptoms of lesions in the lower body. To find the lesioned part and the degree of its lesion, the proposed system employs a set of gait features representing the motion of the entire body rather than local features around each body part. The effectiveness of the entire-body gait features was demonstrated experimentally in comparison with local features; the 11-class classification rate improved, for example, from 74.1 % with the features of the lower body to 81.1 % with the features of the entire body.

Future work includes experiments with a large number of elderly people rather than people with spurious lesions. As discussed at the end of Sect. 5, it is also important to improve lesioned-part identification using universal models for more general use. As a basic technique for estimating more accurate 3D pose sequences, a human motion prior [18–20] would be useful.