1 Introduction

Developed countries are dealing with the rapid aging of their populations, which could overwhelm a society’s capacity for taking care of its elderly members. Projections show that the percentage of the population aged over 65 in the developed countries will rise from 7.5% in 2009 to 16% in 2050. In addition, the ratio of the working-age population (between 15 and 64 years) to the population aged over 65 is predicted to decline from 4.3 to 2.3 [26]. Typically, the elderly choose one of the following options for their way of living:

Nursing homes

The elderly usually wait for a long time to be accepted into nursing homes, and living there is expensive (in some countries, the costs are higher than the average income of an older adult). The elderly can be socially active in nursing homes, but they are less independent than when in their own homes. If the employees of nursing homes are attentive enough, they can manually detect changes in movement that indicate health problems in the elderly.

Elderly health care at home

This type of living provides some independence and comfort for the elderly, since they live in their own homes. However, it is also expensive and in most cases the visits of nursing staff are rare. Therefore, health problems may develop at times when no medical staff or relatives are close enough to the elderly person to detect them.

Living in their own homes

Most elderly people prefer this option for as long as possible if they are healthy. The main reason is that this allows them to be independent. However, if the elderly person develops a health problem, it is possible that nobody detects it and that medical help is not provided soon enough.

To prevent dangerous situations from arising for elderly people who live alone in their homes, a system for the automatic, ubiquitous health care of the elderly is proposed. For this system, the presented study developed techniques for the early detection of common health problems that are manifested in the gait. If the system recognized a health problem, it would notify a physician and show him/her an explanation of the automatic diagnosis in the form of a visualization of the kinematic model. The elderly would thus receive constant health monitoring in their homes and the workload of the physicians would decrease, although the physicians would still be able to confirm or reject the automatic diagnosis. Such constant, ubiquitous monitoring would give the elderly more safety and confidence while living in their homes.

The target health problems for automatic recognition are: hemiplegia (usually the result of a stroke), Parkinson’s disease, pain in the leg and pain in the back. The fifth health state for recognition is a normal health state, used as a reference health state to the other four health states, i.e., health problems.

The gait of the user is captured with a motion-capture system, which consists of tags attached to the body. The positions of the tags are acquired by sensors that are situated in the apartment. The resulting time series of the position coordinates are analyzed with machine-learning algorithms [13] in order to recognize the specific health problem. Several investigations of privacy issues, e.g., in the FP7 project Confidence [6, 16], showed that the elderly do not like being monitored by cameras that record video. For this reason, only the time series of the tag positions, rather than video from cameras, were used in this study.

The objective of the presented research study is twofold:

  • First, to discover to what extent the automatic recognition of health problems using a motion-capture system and several machine-learning methods is feasible.

  • Second, to investigate the classification accuracy achievable using various machine-learning algorithms, various numbers and placements of tags on the user’s body and various amounts of noise in the tag coordinates. The tag placement aims for a trade-off between the usability and the technical requirements—the users prefer as few tags as possible, but too few tags cannot ensure sufficient accuracy. Findings of this investigation can affect further development of the care systems for the elderly, including the motion-capture system selection.

The paper is organized as follows. In Section 2, related work from the fields of movement recognition and sensor appliances used for the task of movement recognition is presented. In Section 3, the health-problems-recognition system, which is based on the novel semantic features, described in the same section, is proposed. After defining the machine-learning setting in Section 4, which is used for the prediction of the target health problems, the performance of various classifiers is evaluated in the experimental Section 5. In Section 6, a prototype application for the explanation of the interpreted health state is presented. Section 7 concludes the paper and presents further work.

2 Related work

Motion capture

In order to perform automatic recognition of health problems manifested in the gait (or any type of movement), the movement must first be captured. There are several approaches to motion capture; the three most appropriate for the task will be mentioned here. The first approach uses an inertial system, normally composed of MEMS (Micro-Electro-Mechanical Systems)-based accelerometers and/or gyro sensors [11, 23]. The second commonly used approach is computer vision, which uses video from cameras for the reconstruction of the human-body movement [27, 30]. The third approach uses cameras in combination with tags attached to the body. Usually, infra-red (IR) cameras are employed and the body posture is reconstructed from the positions of the retro-reflective tags [9], as in the approach presented in this paper.

There are also some specific measurement devices for the capture of tremor – a symptom in Parkinson’s disease – e.g., sensors for a measurement of the angle of the joint deflection in tremor-type joint movements [22]. However, such a sensor system is too large and would prevent users from carrying out the activities of daily living if the system was to be worn all day. It has a major drawback in comparison to the approach presented in this paper, because it cannot automatically recognize Parkinson’s disease or any other health problem.

Recognition of health problems

In related work [7, 10, 20], physicians usually diagnose health problems that are manifested in the gait by visually observing the user’s gait. If they cannot decide on a certain diagnosis easily, they use a semi-automatic approach. In that approach, the capture of the movement using a motion-capture system is followed by a manual observation of the time series of the movement (such as the time series of the angles of legs, etc.).

The same procedure is used for the pre-rehabilitation planning (e.g., physical therapy) and the post-rehabilitation evaluation to quantify the difference in movement between those states. Some special medical tests, e.g., a test for the assessment of balance, are also used. However, none of those approaches can provide a constant real-time observation of the elderly at home, for the early recognition of changes in movement that indicate a health problem or progress in rehabilitation at home.

An example of the semi-automatic approach is [19], where a system for the long-term monitoring of the gait in Parkinson’s disease is presented. The characteristics of every stride taken were acquired using an ankle-mounted sensor array that saved the data to a pocket PC. That approach also cannot be used for the early automatic recognition of health problems, but only for monitoring the progress of the disease in a known Parkinson’s disease patient.

The automatic distinguishing between health problems such as hemiplegia and diplegia with a classification accuracy of 92.5% is presented in [14]. This was achieved with self-organizing maps, whose features were wavelet-transformed gait characteristics, such as walking speed and stride length. The study used a motion-capture system similar to the one in this paper, but with more tags and less noise, which makes it difficult to apply in real cases.

In order to present the motion-recognition field more generally, fall detection and activity recognition will also be mentioned here. The studies are broken down by the choice of motion-capture system.

Fall detection and activity recognition using inertial sensors

Fall detection with accelerometers and gyro sensors is relatively common, particularly when using simple threshold algorithms [1, 12]. Several reports claim to achieve classification accuracies close to 100%, but only in specific tests. With a more advanced approach using one-class support vector machines, whose features were accelerations, changes in acceleration, etc., and which is usable for a more general set of falls, an accuracy of 96.7% was reported [29].

Accelerometers can also be used for activity recognition. Five tri-axial accelerometers distinguished 30 physical activities of various intensities with an accuracy of 94.9% [25]. This was achieved with C4.5 decision trees using various time- and frequency-domain features.

Fall detection and activity recognition from video

Fall detection and activity recognition from video are also relatively common. In an example of fall detection [27], objects in the video were first identified and then rules were applied to the aspect ratios of their bounding boxes, their horizontal and vertical gradients and angles, achieving an average accuracy of 83%. In an example of activity recognition [30], ten states related to the activities of daily living were distinguished with an average accuracy of 74.1%. This was achieved by first reconstructing the human body in the video and then matching its state to models defined using a specialized language. The problem with video is that it violates a user’s privacy and thus its use is restricted in real life.

Fall detection and activity recognition from video and tags

The authors of [24] used a motion-capture system similar to the one in this paper, consisting of 43 body tags sampled at 30 Hz. For distinguishing between seven activities related to military operations, an accuracy of 76.9% was reported. This was achieved with support vector machines, whose features were the tag coordinates belonging to two postures separated by 1/3 of a second, reduced in number to 20 using principal component analysis.

Another paper on activity recognition [21] used 41 body tags sampled at 120 Hz to distinguish between 21 dance gestures, reporting an accuracy of 99.3%. The gestures were represented with Gaussian mixture models of the joint angles. The high accuracy can in part be attributed to the high quality of the input data, the strictly defined gestures and the fact that the testing was always conducted on the same dancer as the training. It is questionable whether such a large number of tags is applicable in real life.

Investigation of the optimal placement of motion-capture devices

An important part of the research presented in this paper is the study of the impact of the placement of the tags on the user’s body and the amount of noise in the tag coordinates on the classification accuracy. The closest works found in the literature investigated the placement of accelerometers for fall detection [2, 12]. Their finding was that the head provides the optimal accuracy, but is impractical; the wrist is not appropriate; and the waist is a good option.

The review of the related work has revealed numerous research studies performed in the fields of fall detection and activity recognition (also [6, 8, 15]); however, a large amount of research is still needed in the related field of automatic recognition of health problems manifested in the movement. The work presented in this paper therefore addresses the latter.

3 Movement-based health-problem recognition system

3.1 Targeted health problems

The proposed health-problem recognition system was developed with a focus on four health problems and normal walking as a reference, in accordance with the suggestions received from the collaborating medical expert. The specific health problems for recognition were suggested on the basis of their occurrence in the elderly aged over 65, their medical significance and the feasibility of their recognition from movement. The following four health problems were chosen as the most appropriate [7, 10, 20]:

  • Parkinson’s disease: Is a degenerative disease of the brain (central nervous system) that often impairs motor skills, speech, and other functions. The symptoms are frequently tremor, rigidity and postural instability. The rate of the tremor is approximately 4–6 Hz. The tremor is present when the involved part(s), usually the arms or neck, are at rest. It is absent, or diminished with sleep, sedation, and when performing skilled acts.

  • Hemiplegia: Is the paralysis of the arm, leg and torso on the same side of the body. It is typically the result of a stroke, although diseases affecting the spinal cord and the brain are also known to cause this state. The paralysis hampers movement, especially walking, and can thus cause falls.

  • Pain in the leg: It resembles hemiplegia in that the step with one leg is different from the step with the other. In the elderly this usually means pain in the hip or in the knee.

  • Pain in the back: It is also similar to hemiplegia and pain in the leg in the inequality of steps; however, the inequality is not as pronounced as in walking with pain in the leg.

3.2 Construction of the features for machine learning

A physician usually diagnoses the target health problems while observing a patient’s gait (i.e., the walking pattern) [7, 10, 20]. For the proposed automatic recognition system to perform the same task, the relevant gait characteristics have been transformed into computable features. Since the walking patterns of patients with the observed four health problems and normal gait look similar to each other, a physician must pay attention to many details that need to be considered also in the features. Because the features are based on medical knowledge for the specific health problems, they are semantic features. Thirteen semantic features were proposed and tested in the presented study. They are based on the tag locations, placed on a person’s body, as shown in Fig. 1.

Fig. 1 Placement of tags for the definition of the features for subsequent machine learning

Before the definition of the features the measures used in more than one feature are defined:

  • The magnitude of the vector v is defined as:

    $$ \left\| v \right\| = \sqrt{\sum\limits_{j = 1}^{n} {\left( v_{j} \right)}^{2}}, $$
    (1)

    where j is the number of the current coordinate and n is the quantity of all coordinates (n=3 for (x,y,z)-coordinate system).

  • The subscripts a, b and d denote the tag labels as shown in Fig. 1 (e.g., a=5 denotes the right elbow).

  • Each recording consists of k time samples.

  • Average distance between the tag a and the tag b is defined as:

    $$ \overline{{d_{{a,b}} }} = \frac{{{\sum\limits_{i = 1}^k {{\left( {{\sqrt {{\sum\limits_{j = 1}^n {{\left( {{\left( {m_{a} {\left( i \right)}} \right)}_{j} - {\left( {m_{b} {\left( i \right)}} \right)}_{j} } \right)}^{2} } }} }} \right)}} }}}{k}, $$
    (2)

    where \( m_a(i) \) and \( m_b(i) \) denote the i-th time samples of the positions of the tags a and b (in (x,y,z) coordinates), j is the number of the current coordinate and n is the quantity of all coordinates (n=3 for (x,y,z)-coordinate system).

  • Angle \( \varphi_{a,b,a,d}(i) \) denotes the i-th time sample of the angle between the vectors \( \nu_{a,b}(i) \) and \( \nu_{a,d}(i) \), and is defined as:

    $$ {\varphi_{{a,b,a,d}}}(i) = \arccos \left( {\frac{{{\nu_{{a,b}}}(i) \cdot {\nu_{{a,d}}}(i)}}{{\left\| {{\nu_{{a,b}}}(i)} \right\| \cdot \left\| {{\nu_{{a,d}}}(i)} \right\|}}} \right), $$
    (3)

    where \( \nu_{a,b}(i) \) denotes the vector between the i-th time samples of the positions of the tags a and b, and \( \nu_{a,b}(i) \cdot \nu_{a,d}(i) \) is the scalar product of the i-th samples of the vectors \( \nu_{a,b}(i) \) and \( \nu_{a,d}(i) \).

  • \( \mathop{{\max }}\limits_{{i = 1}}^k \left( {x(i)} \right) \) and \( \mathop{{\min }}\limits_{{i = 1}}^k \left( {x(i)} \right) \) denote the maximum and minimum element of the time series x, respectively.

  • Height \( z_a(i) \) is the i-th time sample of the height (z coordinate) of the tag a.

  • Velocity \( \nu_a(i) \) of the i-th time sample is defined as:

    $$ {\nu_a}(i) = \left( {\frac{{{m_{{a, x}}}\left( {i} \right) - {m_{{a, x}}}(i-1),{m_{{a, y}}}\left( {i} \right) - {m_{{a, y}}}(i-1),{m_{{a, z}}}\left( {i} \right) - {m_{{a, z}}}(i-1)}}{t}} \right), $$
    (4)

    where \( m_{a,x} \), \( m_{a,y} \) and \( m_{a,z} \) denote the x, y and z coordinates of the position of the tag a, respectively, and t denotes the time between two consecutive time samples.
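
As a minimal sketch, the measures of Eqs. (1)–(4) could be computed as follows. The function names, the representation of a tag track as a list of (x, y, z) tuples and the direction chosen for the vector between two tags are assumptions made for illustration; they are not taken from the study. With the 60 Hz sampling reported later in the paper, t would be 1/60 s.

```python
import math

def magnitude(v):
    """Eq. (1): Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))

def vector(p, q):
    """Vector from position p to position q (direction is an assumption)."""
    return tuple(qc - pc for pc, qc in zip(p, q))

def average_distance(track_a, track_b):
    """Eq. (2): mean distance between two tags over the k samples."""
    k = len(track_a)
    return sum(magnitude(vector(a, b)) for a, b in zip(track_a, track_b)) / k

def angle(a, b, d):
    """Eq. (3): angle at tag a between the vectors a->b and a->d, in radians."""
    v_ab, v_ad = vector(a, b), vector(a, d)
    dot = sum(x * y for x, y in zip(v_ab, v_ad))
    cos_phi = dot / (magnitude(v_ab) * magnitude(v_ad))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cos_phi)))

def velocities(track, t):
    """Eq. (4): finite-difference velocity, available from the second sample on."""
    return [tuple((c1 - c0) / t for c0, c1 in zip(p0, p1))
            for p0, p1 in zip(track, track[1:])]
```

Since Eq. (4) refers to the sample i−1, the sketch returns k−1 velocity vectors for a track of k samples.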

The proposed features are listed as follows:

  1. I.

    Absolute difference between 1) the average distance between the right elbow and the right hip (\( \overline {{d_{{5,10}}}} \)) and 2) the average distance between the right wrist and the left hip (\( \overline {{d_{{6,7}}}} \)):

    $$ {F_1} = \left| {\overline {{d_{{5,10}}}} - \overline {{d_{{6,7}}}} } \right|. $$
    (5)
  2. II.

    Average angle of the right elbow:

    $$ {F_2} = \frac{{\sum\limits_{{i = 1}}^k {{\varphi_{{5,4,5,6}}}(i)} }}{k}. $$
    (6)
  3. III.

    The quotient between the maximum angle of the left knee and the maximum angle of the right knee:

    $$ {F_3} = \frac{{\mathop{{\max }}\limits_{{i = 1}}^k \left( {{\varphi_{{8,7,8,9}}}(i)} \right)}}{{\mathop{{\max }}\limits_{{i = 1}}^k \left( {{\varphi_{{11,10,11,12}}}(i)} \right)}}. $$
    (7)
  4. IV.

    Difference between the maximum and minimum angle of the right knee:

    $$ {F_4} = \mathop{{\max }}\limits_{{i = 1}}^k \left( {{\varphi_{{11,10,11,12}}}(i)} \right) - \mathop{{\min }}\limits_{{i = 1}}^k \left( {{\varphi_{{11,10,11,12}}}(i)} \right). $$
    (8)
  5. V.

    Difference between the maximum and minimum height of the left shoulder:

    $$ {F_5} = \mathop{{\max }}\limits_{{i = 1}}^k \left( {{z_1}(i)} \right) - \mathop{{\min }}\limits_{{i = 1}}^k \left( {{z_1}(i)} \right). $$
    (9)
  6. VI.

    Difference between the maximum and minimum height of the right shoulder:

    $$ {F_6} = \mathop{{\max }}\limits_{{i = 1}}^k \left( {{z_4}(i)} \right) - \mathop{{\min }}\limits_{{i = 1}}^k \left( {{z_4}(i)} \right). $$
    (10)
  7. VII.

    Quotient between 1) the difference between the maximum and minimum height of the left ankle and 2) the difference between the maximum and minimum height of the right ankle:

    $$ {F_7} = \frac{{\mathop{{\max }}\limits_{{i = 1}}^k \left( {{z_9}(i)} \right) - \mathop{{\min }}\limits_{{i = 1}}^k \left( {{z_9}(i)} \right)}}{{\mathop{{\max }}\limits_{{i = 1}}^k \left( {{z_{{12}}}(i)} \right) - \mathop{{\min }}\limits_{{i = 1}}^k \left( {{z_{{12}}}(i)} \right)}}. $$
    (11)
  8. VIII.

    Absolute difference between 1) the difference between the maximum and minimum speeds (magnitudes of velocity) of the left ankle and 2) the difference between the maximum and minimum speeds of the right ankle:

    $$ {F_8} = \left| {\left( {\mathop{{\max }}\limits_{{i = 1}}^k \left( {\left\| {{\nu_9}(i)} \right\|} \right) - \mathop{{\min }}\limits_{{i = 1}}^k \left( {\left\| {{\nu_9}(i)} \right\|} \right)} \right) - \left( {\mathop{{\max }}\limits_{{i = 1}}^k \left( {\left\| {{\nu_{{12}}}(i)} \right\|} \right) - \mathop{{\min }}\limits_{{i = 1}}^k \left( {\left\| {{\nu_{{12}}}(i)} \right\|} \right)} \right)} \right| . $$
    (12)
  9. IX.

    Absolute difference between 1) the average distance between the right shoulder and the right elbow (\( \overline {{d_{{4,5}}}} \)) and 2) the average distance between the left shoulder and the right wrist (\( \overline {{d_{{1,6}}}} \)):

    $$ {F_9} = \left| {\overline {{d_{{4,5}}}} - \overline {{d_{{1,6}}}} } \right| . $$
    (13)
  10. X.

    Average speed (magnitude of velocity) of the right wrist:

    $$ {F_{{10}}} = \frac{{\sum\limits_{{i = 1}}^k {\left( {\left\| {{\nu_6}(i)} \right\|} \right)} }}{k} . $$
    (14)
  11. XI.

    Frequency of the angle of the right elbow passing the average angle of the right elbow:

    $$ {F_{{11}}} = \frac{{\sum\limits_{{i = 1}}^k {\# \left( {\left( {{\varphi_{{5,4,5,6}}}(i) < {{\overline \varphi }_{{5,4,5,6}}}} \right)\& \left( {{\varphi_{{5,4,5,6}}}\left( {i + 1} \right) \geqslant {{\overline \varphi }_{{5,4,5,6}}}} \right)} \right)} }}{k} , $$
    (15)

    where #(condition) denotes the number of samples for which the condition holds, and \( {\overline \varphi_{{5,4,5,6}}} \) is defined as:

    $$ {\overline \varphi_{{5,4,5,6}}} = \frac{{\sum\limits_{{i = 1}}^k {{\varphi_{{5,4,5,6}}}(i)} }}{k} $$
    (16)
  12. XII.

    Average angle between 1) the vector between the right shoulder and the right hip and 2) the vector between the right shoulder and the right wrist:

    $$ {F_{{12}}} = \frac{{\sum\limits_{{i = 1}}^k {{\varphi_{{4,10,4,6}}}(i)} }}{k} . $$
    (17)
  13. XIII.

    Difference between the average height of the right shoulder and the average height of the left shoulder:

    $$ {F_{{13}}} = \frac{{\sum\limits_{{i = 1}}^k {{z_4}(i)} }}{k} - \frac{{\sum\limits_{{i = 1}}^k {{z_1}(i)} }}{k} . $$
    (18)
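
To illustrate how such features reduce a time series to a single number, the following sketch computes the range used in \( F_4 \)–\( F_6 \), the crossing frequency of \( F_{11} \) and the difference of averages of \( F_{13} \). The function names and the toy series are hypothetical. One assumption is needed for \( F_{11} \): Eq. (15) refers to the sample i+1, which does not exist for i=k, so the sketch counts crossings over the k−1 consecutive pairs and still divides by k.

```python
def range_feature(series):
    """Maximum minus minimum of a time series, as in F_4, F_5 and F_6."""
    return max(series) - min(series)

def upward_crossing_rate(series):
    """F_11: number of upward crossings of the series mean, divided by k."""
    k = len(series)
    mean = sum(series) / k
    crossings = sum(1 for cur, nxt in zip(series, series[1:])
                    if cur < mean and nxt >= mean)
    return crossings / k

def mean_difference(series_a, series_b):
    """F_13: average of series_a minus average of series_b."""
    return sum(series_a) / len(series_a) - sum(series_b) / len(series_b)

# Toy elbow-angle series in radians (illustrative values, not measured data).
elbow = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
print(range_feature(elbow), upward_crossing_rate(elbow))
```

A series that oscillates quickly around its mean, as expected for a tremor, yields a higher crossing rate than a slow arm swing of the same duration.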

The features for the identification of the chosen four health problems were designed after discussions with a medical expert. They are intended to capture the following characteristics of the observed health problems [7, 10, 20]:

  • Parkinson’s disease: The tremor, which is not seen in otherwise similar disorders, can be used to help make a firm diagnosis, even when the other signs are absent. The features \( F_{10} \) and \( F_{11} \) are used to capture the tremor. Rigidity is defined as a resistance (increased muscle tone) to passive movement; it mostly affects the neck, the torso and the knees (\( F_4 \)).

  • Hemiplegia: The affected leg is swung in a semi-circle from the hip with the pelvis tilted upward. The knee is hyperextended due to inappropriate quadriceps activity (\( F_3 \) and \( F_4 \)). The stiff knee inhibits the advancement of the leg and deprives the patient of shock-absorbing knee flexion during weight acceptance. The outer part of the foot slides on the floor (\( F_7 \)). The arm is held flexed and close to the torso with minimum swing (\( F_1 \), \( F_2 \), \( F_9 \) and \( F_{12} \)). The compensatory movements of a hemiplegic patient include a decrease in walking velocity with a shorter duration of stance, decreased weight bearing, and an increased swing time for the affected leg. The unaffected leg has an increased stance time (\( F_8 \)).

  • Pain in the leg: A person with such a problem steps slowly on the affected leg, leans the torso laterally to the side of the affected leg, trying not to put too much weight on it, steps quickly on the unaffected leg and moves the torso back to the vertical position (\( F_8 \)). Leaning the torso from the vertical position means that one shoulder is lowered in comparison to the other, and later returned to the normal position (\( F_5 \), \( F_6 \) and \( F_{13} \)). In addition, the knee is bent when stepping on the affected leg (\( F_3 \) and \( F_4 \)).

  • Pain in the back: Similar to pain in the leg, it causes a lateral deviation of the torso, but the deviation is largely constant (\( F_5 \), \( F_6 \) and \( F_{13} \)). In order to minimize the pain, the affected person usually supports his/her back with the arm(s) (\( F_2 \)). There is also a similarity to hemiplegia and pain in the leg in the inequality of the steps; however, the inequality is not as pronounced as in walking with pain in the leg (\( F_8 \)).

The final set of thirteen features is used for modeling using the machine-learning methods, which is presented in the following section.

4 Modeling target health problems using machine learning

To construct a predictive model that can be subsequently used to automatically recognize health states in subjects yet to be observed (the first objective from the Introduction), supervised learning methods from the field of machine learning (a sub-field of artificial intelligence) were employed. In supervised learning, a training data set of already-labeled recordings (i.e., classified into one of the five target classes) is used to construct a model, which is later used to predict the class of a new recording.

The task of the presented study was therefore to classify the recordings of walking into five classes: four with selected health problems (classes hemiplegia, Parkinson's disease, pain in the leg, and pain in the back) and one without a problem (normal).

4.1 Data acquisition

Data for the evaluation of the proposed approach were collected by recording the walking patterns of 9 test subjects, of whom 4 had hemiplegia (3 subjects had right and 1 had left hemiplegia) and 5 were healthy (each subject was recorded 4–5 times). Due to the unavailability of test subjects with the other target health problems, some of the data were acquired artificially under the supervision of an expert physician. These data were captured by recording healthy test subjects who were imitating particular health problems by following the physician’s instructions. The final data set of 141 recordings consisted of:

  • 25 recordings of normal walking,

  • 45 recordings of walking with hemiplegia,

  • 25 recordings of walking with Parkinson’s disease,

  • 25 recordings of walking with a limp due to a pain in the leg,

  • 21 recordings of walking with a limp due to a pain in the back.

The recordings consisted of the position coordinates for the 12 tags worn on the shoulders, the elbows, the wrists, the hips, the knees and the ankles, as presented in Fig. 1, sampled at 60 Hz. The tag coordinates were acquired with a Smart IR motion-capture system consisting of six IR cameras, tags on the body, and a computer that captures the positions with a 0.5-mm standard deviation of noise.

For each subject, the locations of the tags were recorded in a session that lasted 5–8 s, from which a vector of 13 features (see Sect. 3.2) was computed. These learning examples were labeled with the type of the represented health state, yielding the final data on which the classifier was trained.

4.2 Classification models

For the testing, the following machine-learning algorithms were used and compared:

  • support vector machines (SVM) [5]: the SVM implementation in Weka with a radial basis function kernel, defined as \( k\left( u,v \right) = e^{ - \gamma \cdot {\left\| u - v \right\|}^{2} } \), where the parameter \( \gamma = \frac{1}{number\;of\;attributes} \) defines the shape of the kernel; the cost parameter C = 1, which controls the trade-off between allowing training errors and forcing rigid margins; and the precision parameter ε = 0.1,

  • decision tree (DT) [4]: recursive partitioning trees with information gain as the node-splitting criterion. Each leaf is assigned the majority class of the examples in the leaf,

  • k-nearest neighbors (KNN): locality-based, lazy-learning algorithm which uses the five nearest neighbors for the classification of a new example,

  • random forest (RF) [3]: the number of trees in the forest was set to 10,

  • naïve Bayes (NB) classifier: standard probabilistic classifier, no special parameters,

  • neural network (NN) [18]: three-layered perceptron with 9 neurons in the hidden layer and the backpropagation learning rule using an adaptive gradient descent,

  • majority class (MC): predictor, used only for comparative-evaluation purposes. It defines the lowest acceptable classification accuracy, which can be trivially achieved by classifying all the examples into the majority class, i.e., the class with the largest number of training examples.
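
The radial basis function kernel given for the SVM above can be evaluated directly. The sketch below is plain Python rather than the Weka implementation, and the function name is hypothetical; it only assumes that the number of attributes equals the 13 features of Sect. 3.2, so that γ = 1/13.

```python
import math

def rbf_kernel(u, v, gamma):
    """k(u, v) = exp(-gamma * ||u - v||^2) for two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

NUM_FEATURES = 13            # size of the feature vector from Sect. 3.2
gamma = 1.0 / NUM_FEATURES   # gamma = 1 / number of attributes

same = rbf_kernel([0.0] * NUM_FEATURES, [0.0] * NUM_FEATURES, gamma)
apart = rbf_kernel([0.0] * NUM_FEATURES, [1.0] * NUM_FEATURES, gamma)
print(same, apart)
```

Identical vectors give a kernel value of 1, and the value decays towards 0 as the feature vectors move apart, with γ controlling how quickly.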

5 Experiments and results

The focus of the experimental work was on analyzing the classification accuracies of various models built using the machine-learning methods. The experimental classification accuracies were obtained using a stratified 10-fold cross validation. The machine-learning algorithms implemented in Weka [28] were used.
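
The idea of stratification can be sketched as follows: the examples of each class are dealt out to the folds in turn, so that every fold keeps roughly the class proportions of the whole data set. This is an illustrative round-robin assignment, not Weka's implementation; the function name, the seed and the single-letter labels (borrowed from the caption of Table 2) are assumptions. The class sizes are those of the data set in Sect. 4.1.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=10, seed=0):
    """Assign example indices to folds so that class proportions stay similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the examples of this class to the folds in turn.
        for index in indices:
            folds[position % n_folds].append(index)
            position += 1
    return folds

labels = ["N"] * 25 + ["H"] * 45 + ["P"] * 25 + ["L"] * 25 + ["B"] * 21
folds = stratified_folds(labels)
print([len(fold) for fold in folds])
```

Each fold serves once as the test set while the remaining nine folds are used for training, and the reported accuracy is computed over all ten test folds.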

The classification accuracies of the selected and tested classifiers are shown in Table 1.

Table 1 Classification accuracies (in [%]) of the tested classifiers

The results show that the k-nearest neighbors and the neural network algorithms both recognized the health states of all the recordings correctly (i.e., they classified all the examples correctly), as is evident from the classification accuracies of 100% in Table 1. The decision tree achieved the lowest classification accuracy of the first six classifiers with 90.1%; however, this is still considerably higher than the 31.9% of the baseline majority class classifier.
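
The baseline value follows directly from the class sizes listed in Sect. 4.1, as the short calculation below shows. It is a sketch on the full data set; the variable names are illustrative.

```python
from collections import Counter

# Number of recordings per class, as listed in Sect. 4.1.
class_counts = Counter({
    "hemiplegia": 45,
    "normal": 25,
    "Parkinson's disease": 25,
    "pain in the leg": 25,
    "pain in the back": 21,
})

majority_class, majority_count = class_counts.most_common(1)[0]
total = sum(class_counts.values())
baseline_accuracy = 100.0 * majority_count / total

print(majority_class, total, round(baseline_accuracy, 1))
```

Classifying every recording as hemiplegia is correct for 45 of the 141 recordings, which gives the 31.9% of the majority class classifier.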

Table 2 shows the confusion matrices, i.e., how many examples of a certain true class (in rows) are classified in one of five possible classes (in columns). Again, it is evident that the k-nearest neighbors and the neural network algorithms correctly classified all the examples and that the decision tree performed the worst of the six classifiers. It is also noticeable that classifying the pain in the back was the easiest task for all used classifiers, since each of them classified all 21 examples of the pain in the back correctly.

Table 2 Confusion matrices of the tested classifiers, where H = hemiplegia, L = pain in the leg, N = normal (healthy) subject, P = Parkinson’s disease and B = Pain in the back. The numbers denote the quantities of the classified examples

For real-world cases, the confusion matrices from Table 2 can be used to estimate three quantities:

  1. A.

    False positives (false alarms): If the system reported a false alarm in the real world, e.g., by classifying normal walking as a health problem, an ambulance would be sent to pick up the elderly person without need, which would result in unnecessary costs. The following false positives were observed in the experiments:

    ○ normal walking was classified as hemiplegia in 2 out of 25 examples using the decision tree and in 1 out of 25 examples using the naïve Bayes.

  2. B.

    False negatives: False negatives could mean a potentially risky situation for the elderly person, as his/her health problem would not be recognized automatically. The experiments resulted in the following false negatives:

    ○ hemiplegia was classified as normal walking using the decision tree and using the naïve Bayes classifier in 1 out of 45 examples.

  3. C.

    Errors (misclassifications): After locating the misclassifications, additional features can be added to prevent them in the future. The following misclassifications were detected:

    ○ normal walking was classified as hemiplegia in 2 out of 25 examples using the decision tree and in 1 out of 25 examples using the naïve Bayes,

    ○ Parkinson’s disease was classified as hemiplegia in 5 out of 25 examples using the decision-tree classifier,

    ○ pain in the leg was classified as hemiplegia in 2 out of 25 examples using the decision tree,

    ○ hemiplegia was classified as Parkinson’s disease in 4 out of 45 examples, as normal walking in 1 out of 45 examples and as pain in the leg in 2 out of 45 examples, using the decision tree and using the naïve Bayes.
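
The three quantities above can be read off a confusion matrix mechanically. The sketch below shows how, using a nested dictionary indexed by true class and then by predicted class. The counts in the toy matrix are placeholders invented for illustration and are NOT the values of Table 2; the function names are hypothetical as well.

```python
def false_alarms(confusion, healthy="N"):
    """Healthy recordings that were predicted as any health problem."""
    return sum(n for predicted, n in confusion[healthy].items()
               if predicted != healthy)

def missed_problems(confusion, healthy="N"):
    """Recordings with a health problem that were predicted as healthy."""
    return sum(row.get(healthy, 0) for true, row in confusion.items()
               if true != healthy)

def misclassifications(confusion):
    """All recordings whose predicted class differs from the true class."""
    return sum(n for true, row in confusion.items()
               for predicted, n in row.items() if predicted != true)

# Placeholder counts for illustration only; NOT the values of Table 2.
toy = {
    "N": {"N": 8, "H": 2, "P": 0},
    "H": {"N": 1, "H": 7, "P": 2},
    "P": {"N": 0, "H": 3, "P": 7},
}
print(false_alarms(toy), missed_problems(toy), misclassifications(toy))
```

In this view a false alarm lies in the row of the normal class, a false negative lies in the column of the normal class, and every off-diagonal cell is a misclassification.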

The results show that false positives are rare in the proposed approach, i.e., the approach would not result in many unnecessary ambulance expenses. Moreover, since it accurately classified most of the true health problems, apart from two classifiers making a mistake in 1 out of 45 examples, it offers high confidence and safety for potential use in elderly care.

5.1 Variation of noise

To test the robustness of the approach, Gaussian noise with varying standard deviation (and zero mean) was added to the raw coordinates. The standard deviation of the noise was varied from 0 mm to 50 mm in steps of 5 mm.
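
A minimal sketch of this noise model is given below. It assumes that the coordinates are expressed in millimetres and that the noise is drawn independently for every coordinate of every sample; the function name, the seed and the toy track are illustrative.

```python
import random

def add_gaussian_noise(track, sigma_mm, rng):
    """Add zero-mean Gaussian noise to every coordinate of every sample."""
    return [tuple(c + rng.gauss(0.0, sigma_mm) for c in position)
            for position in track]

# Standard deviations used in the experiments: 0, 5, ..., 50 mm.
noise_levels_mm = list(range(0, 55, 5))

rng = random.Random(0)  # fixed seed so that the sketch is repeatable
track = [(0.0, 0.0, 1000.0), (10.0, 0.0, 1000.0)]  # toy tag positions in mm
noisy = add_gaussian_noise(track, 5.0, rng)
print(noisy)
```

Repeating the feature extraction and the cross validation for each entry of `noise_levels_mm` would give one accuracy per classifier and noise level.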

As a preprocessing step, a Kalman filter was used to smooth the potentially unrealistic difference between the positions of two consecutive time samples, caused by the addition of Gaussian noise to the captured positions [17]. The classification accuracies of the six tested classifiers, modeled on noisy data, are shown in Table 3.
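To illustrate the kind of smoothing involved, the sketch below applies a scalar Kalman filter with a random-walk state model to one coordinate of one tag. It is a generic textbook formulation, not the filter configuration used in the study (which is described in [17]); the function name, the variance values and the sample data are assumptions.

```python
def kalman_filter_1d(measurements, process_var, measurement_var):
    """Scalar Kalman filter with a random-walk model for one coordinate of one tag."""
    estimate = measurements[0]      # start from the first measurement
    error_var = measurement_var     # initial uncertainty of the estimate
    filtered = [estimate]
    for z in measurements[1:]:
        # Predict: the position is assumed unchanged, but its uncertainty grows.
        error_var += process_var
        # Update: blend the prediction with the new measurement.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        filtered.append(estimate)
    return filtered


def max_jump(series):
    """Largest absolute difference between two consecutive samples."""
    return max(abs(b - a) for a, b in zip(series, series[1:]))


# Hypothetical noisy x-coordinates in mm.
noisy_x = [100.0, 130.0, 95.0, 120.0, 104.0, 125.0]
smooth_x = kalman_filter_1d(noisy_x, process_var=4.0, measurement_var=400.0)
```

A small process variance relative to the measurement variance makes the filter trust its prediction more, which damps the jumps between consecutive samples at the cost of a slower response to real movement.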

Table 3 Classification accuracies (in [%]) of the classifiers built on data with added noise. The table cells contain the classification accuracy and the significance level of the two-tailed, paired t-test that was performed between the cross-validation folds of the initial setting (0 mm noise) and the corresponding amount of noise denoted by the table row. Dark shading denotes the experiments in which the classification accuracy changed significantly (p < 0.05)

The results presented in Table 3 and in Fig. 2 show that varying the amount of noise in the range 0–50 mm affects the classifiers differently. The decision tree is the only classifier whose performance does not degrade significantly as the noise increases; however, its classification accuracy in the initial setting is the lowest among all the models used. Among the models with an initial classification accuracy of over 99%, the random forest retains its performance longest as the noise increases: its classification accuracy does not drop significantly for any level of noise except the highest, i.e., 50 mm.
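The significance test behind Table 3 can be sketched as a paired t-test on per-fold accuracies. The fold accuracies below are invented for illustration, and the use of 10 folds (hence 9 degrees of freedom and a two-tailed critical value of 2.262 at the 0.05 level) is an assumption, since the number of folds is not restated here.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i].

    Assumes the differences are not all identical (non-zero standard deviation).
    """
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample standard deviation
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


# Hypothetical per-fold accuracies [%] for one classifier.
folds_no_noise = [100.0, 100.0, 99.0, 100.0, 98.0, 100.0, 100.0, 99.0, 100.0, 100.0]
folds_with_noise = [97.0, 98.0, 96.0, 99.0, 95.0, 98.0, 97.0, 97.0, 98.0, 96.0]

t_stat = paired_t_statistic(folds_no_noise, folds_with_noise)

# Two-tailed critical value of Student's t for 9 degrees of freedom at the 0.05 level.
T_CRIT_DF9 = 2.262
significant = abs(t_stat) > T_CRIT_DF9
```

A cell would be shaded in the table when `significant` is true for the corresponding classifier and noise level.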

Fig. 2 Classification accuracies of classifiers built on data with added noise

Even with the largest amount of noise added, the classification accuracy of the tested classifiers was considerably higher than the accuracy of the majority class classifier. Therefore, the models are robust to noise.

5.2 Reduction of the number of tags

Since wearing the full complement of twelve tags may be uncomfortable for the user, ways to reduce the number of tags were investigated in the presented study. The experiments started with all twelve tags, which were then removed one by one, from twelve down to one, in the order that retained the largest number of features at each step. This procedure defined the best placement for each number of tags. The classification accuracies for the chosen tag placements without the addition of noise are shown in Table 4 and illustrated in Fig. 3. The best tag placement for each number of tags for the recognition of specific health problems is illustrated in Fig. 4.
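One way to read the removal procedure is as a greedy backward elimination: at each step, drop the tag whose removal leaves the most features computable. The sketch below follows that reading; the tag names, the feature names and their tag dependencies are hypothetical, and ties are broken alphabetically only to keep the example deterministic.

```python
def greedy_tag_removal(tags, feature_tags):
    """Remove tags one at a time, each time dropping the tag whose removal
    keeps the largest number of features computable.

    feature_tags maps a feature name to the set of tags it needs.
    Returns a dict: number of tags -> sorted list of the tags kept.
    """
    remaining = set(tags)
    placements = {len(remaining): sorted(remaining)}
    while len(remaining) > 1:
        def kept_features(tag):
            left = remaining - {tag}
            return sum(1 for needed in feature_tags.values() if needed <= left)

        # Sorting the candidates makes tie-breaking deterministic (alphabetical).
        drop = max(sorted(remaining), key=kept_features)
        remaining.remove(drop)
        placements[len(remaining)] = sorted(remaining)
    return placements


# Hypothetical tags and feature dependencies (not the feature set of the study).
tags = ["shoulder", "elbow", "wrist", "hip", "knee", "ankle"]
feature_tags = {
    "arm swing": {"shoulder", "wrist"},
    "elbow angle": {"shoulder", "elbow", "wrist"},
    "knee angle": {"hip", "knee", "ankle"},
    "step length": {"ankle"},
    "hunch": {"shoulder", "hip"},
}

placements = greedy_tag_removal(tags, feature_tags)
```

In this toy setup the elbow tag is removed first, because only one feature depends on it and four of the five features remain computable without it.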

Table 4 Classification accuracies achieved with the best tag placements for each number of tags. The table cells contain the classification accuracy and the significance level of the two-tailed, paired t-test that was performed between the initial setting (12 tags) and the corresponding number of tags. Dark shading denotes the experiments in which the classification accuracy decreased significantly (p < 0.05)

Fig. 3 Classification accuracies achieved with the best tag placements for each number of tags

Fig. 4 Best tag placement for each number of tags for the recognition of the target health problems

The results show that decreasing the number of tags reduces the classifiers' performance. Table 4 shows that six is the smallest number of tags for which the performance of all the tested classifiers is not significantly different from the initial setting. Figure 3, like Table 4, illustrates a trend of decreasing classification accuracy for fewer than six tags. Figure 4 illustrates that for recognizing the target health states, the positions of the tags from the most to the least important are: shoulder, wrist, elbow, hip, ankle and knee.

5.3 Dependence of the classification accuracy on the tag placement and noise level

In addition to experimenting with the number of tags, the standard deviation of the noise was varied from 0 to 50 mm for every tag placement. Figure 5 shows the dependence of the classification accuracy (CA) on the number of tags and the noise level. In each subfigure (a–f), the horizontal axis shows the standard deviation of the noise from 0 to 50 mm and the vertical axis shows the best tag placement for each number of tags from 12 to 1. Each curve of a different style (e.g., dotted, dashed) connects points of one particular classification accuracy and thus forms a borderline between regions of higher and lower accuracy.

Fig. 5 Classification accuracy with respect to the number of tags and the noise level for the recognition of the target health problems using the a) support vector machines, b) decision tree, c) k-nearest neighbors, d) naïve Bayes, e) random forest, f) neural network
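The borderline curves of Fig. 5 can be thought of as the answer to the question "for a given noise level, how few tags still reach a given accuracy?". The sketch below computes that boundary from a grid of accuracies; the grid values are invented for illustration and the function name `min_tags_for_accuracy` is an assumption.

```python
def min_tags_for_accuracy(grid, threshold):
    """For each noise level, return the smallest number of tags whose accuracy
    reaches the threshold, or None if no tag count reaches it."""
    boundary = {}
    for noise_mm, by_tags in grid.items():
        reaching = [n_tags for n_tags, acc in by_tags.items() if acc >= threshold]
        boundary[noise_mm] = min(reaching) if reaching else None
    return boundary


# Hypothetical accuracies [%]: grid[noise in mm][number of tags].
grid = {
    0: {12: 100.0, 8: 99.5, 6: 98.0, 4: 92.0},
    25: {12: 98.5, 8: 97.0, 6: 95.5, 4: 88.0},
    50: {12: 94.0, 8: 91.0, 6: 86.0, 4: 75.0},
}

boundary_95 = min_tags_for_accuracy(grid, 95.0)
```

For this invented grid, six tags suffice for 95% accuracy at 0 mm and 25 mm of noise, while no tag count reaches it at 50 mm.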

Analysis of Fig. 5 revealed that overall the support vector machines, k-nearest neighbors, random forest and neural network achieved the best and comparable accuracies, while the decision tree achieved the worst classification accuracies of all the classifiers used. The four most accurate classifiers exceeded 99%, the naïve Bayes exceeded 98%, and the decision tree exceeded only 90% classification accuracy.

The k-nearest neighbors proved to be the best option for a classification accuracy over 99%, as only 8 tags with up to 15 mm of noise are needed to achieve it. The support vector machines, random forest and neural network are comparable in this respect. A classification accuracy of 98% was achieved with the four most accurate classifiers using at least 6–8 tags and up to 20 mm of noise, and with the naïve Bayes using at least 10 tags and up to 5 mm of noise. The support vector machines, k-nearest neighbors, random forest and neural network reached a classification accuracy of 95% using at least 5–6 tags with up to 25 mm of noise or using approximately 6–8 tags with up to 40 mm of noise. The naïve Bayes needed at least 8 tags and at most 35 mm of noise to achieve the same classification accuracy; the decision tree also achieved it, but the result was not statistically significant. A classification accuracy of 90% was achieved with all the tested classifiers using at least 4–5 tags with up to 30 mm of noise or about 6–8 tags with higher values of noise, except for the decision tree, which could not achieve 90% with noise over 25 mm. Common to all the classifiers used is that surpassing an accuracy of 80% requires at least 3–4 tags with up to 25 mm of noise or at least 4–6 tags with 25–50 mm.

Although the decision tree and naïve Bayes achieved slightly lower classification accuracies than the other classifiers, they both have an important advantage over the others: the results of their classification are easily interpretable through the Bayes nomogram or the decision-tree visualization. If the system recognized some health state in practice, the physician could check why it classified the observed gait as that particular health state.

6 Explanation of the interpreted health state

Since it is important for the physician to be able to observe the reasons that led the system to classify the movement as a particular health state, a control-panel prototype with the explanation of the interpreted health state was developed (shown in Fig. 6).

Fig. 6 Control-panel prototype for the explanation of the interpreted health state

In the middle of the prototype screen there is a visualization of the kinematic model of an elderly person moving through the room. In the upper-right-hand corner, controls for saving and loading the captured movement are placed. The time series of the calculated angles are located underneath. In the lower-left-hand corner the time series of the x, y and z positions are shown for all 12 tags. When a health problem is recognized, the red alarm sign appears in the upper-left-hand corner of the screen with a description of the recognized health problem.

7 Conclusion

A system for the automatic recognition of health problems that are manifested in the gait of older adults was proposed in this study. Time series of the detected positions of the body parts from the motion-capture system were transformed into a form suitable for supervised machine-learning methods using novel semantic features for the recognition of health problems.

Despite the large amount of research devoted to the related fields of activity recognition and fall detection, the automatic health-problem recognition from the movement patterns is not an extensively researched field. This study took up the challenge to contribute to the latter and it achieved successful results.

The research work had two objectives. The first objective was to discover to what extent the automatic recognition of health problems with a motion-capture system is feasible.

The results show that the k-nearest neighbors and neural network algorithms performed the best of all the classifiers used in the initial setting (no noise, all tags), with classification accuracies of 100%. The lowest classification accuracy of all the classifiers used was 90.1% for the decision tree, which is considerably higher than the 31.9% of the reference majority class. False positives and false negatives were rare; thus, in practice, the approach would not result in a significant amount of unnecessary ambulance expenses or undetected health problems. Therefore, the presented approach shows that automatic recognition of health problems with a motion-capture system is not only feasible, but would also provide high confidence and safety in practical use in elderly care.

The second objective was to investigate the classification accuracy achievable using various machine-learning approaches, various numbers and placements of tags on the user’s body, and various amounts of noise in the tag coordinates. The results of varying the amount of noise show that the decision tree is the only classifier whose performance does not degrade significantly as the noise increases; however, its classification accuracy in the initial setting is the lowest among all the models used. Among the models with an initial classification accuracy of over 99%, the random forest retains its performance longest as the data noise increases. The proposed approach is therefore robust to the addition of noise.

The results show that decreasing the number of tags reduces the classifiers' performance. Six is the smallest number of tags for which the performance of all the tested classifiers is not significantly different from the initial setting with all tags. For recognizing the target health states, the positions of the tags from the most to the least appropriate are: shoulder, wrist, elbow, hip, ankle and knee.

The investigation of the dependence of the classification accuracy on the tag placement and noise level shows that the support vector machines, k-nearest neighbors, random forest and neural network proved to be the best option for a classification accuracy over 99%, as only 8 tags with up to 20 mm of noise are needed to achieve it. Common to all the classifiers used is that surpassing an accuracy of 80% requires at least 3–4 tags with up to 25 mm of noise and at least 4–6 tags with 25–50 mm. Although the decision tree and naïve Bayes achieved slightly lower classification accuracies than the other classifiers, they both have an important advantage over the others, namely the interpretability of their classifications by physicians.

Since it is very important for the physician to obtain an explanation for the interpreted health state, a control-panel prototype for the explanation of the interpreted health state was developed. When a health problem is recognized, the alarm sign appears with the necessary description of the recognized health problem.

Because the system is embedded in the domestic environment of the elderly person, uses an artificial-intelligence approach to interpret the health state and provides a natural explanation of the hypothesis, it is an example of semantic ambient media for ambient-assisted living.

In future work, additional features can be added to better distinguish between the classes for which the confusion matrices showed misclassifications. The present research was intended to study the feasibility of using machine-learning methods for the detection of gait-related health problems. Given the results obtained, the study is intended to be expanded with the inclusion of more patients. The use of a wearable (e.g., inertial) motion-capture system instead of an infra-red one would allow the recognition of the health problems of the elderly even outside their homes. This would be possible with a small adaptation of the presented algorithms. A modification of the proposed recognition system could also be used for an automatic evaluation of the rehabilitation process (e.g., after a stroke) at home.

The important problems of rapid aging of a population could be mitigated if the approach proposed in this paper or in related work were to be transferred into practice. It is realistic to believe that this will eventually happen since supporting such research is an important objective of the Ambient Assisted Living joint research-and-development funding programme of the European Union.