Ageing is associated with a decline in physical activity and a decrease in the ability to perform activities of daily living, affecting physical and mental health. Elderly people or patients could be supported by a human activity recognition (HAR) system that monitors their activity patterns and intervenes when their behavior changes or a critical event occurs. Such a system could enable these people to lead a more independent life.
In our approach, we apply machine learning methods from the field of human activity recognition (HAR) to detect human activities. These methods require a large database of structured datasets containing human activities. Compared to existing data recording procedures for creating HAR datasets, we present a novel approach, since our target group comprises elderly and diseased people, who do not possess the same physical condition as young and healthy persons.
Since our targeted HAR system aims at supporting elderly and diseased people, we focus on daily activities, especially those to which clinical relevance is attributed, such as hygiene activities, nutritional activities, or lying positions. We therefore propose a methodology for capturing data with elderly and diseased people within a hospital under realistic conditions using wearable and ambient sensors. We describe how this approach is first tested with healthy people in a laboratory environment and then transferred to elderly people and patients in a hospital environment.
We also describe the implementation of an activity recognition chain (ARC), which is commonly used to analyse human activity data by means of machine learning methods and aims to detect activity patterns. Finally, the results obtained so far are presented and discussed, along with remaining problems that should be addressed in future research.
The proportion of people aged 60 and over is growing faster than any other age group worldwide. The older adult population will reach approximately 1.2 billion in 2025 and 2 billion by 2050, with 80% of them living in developing countries (WHO 2002). This increase poses a major challenge to the healthcare system, since ageing is associated with a decline in physical activity, affecting physical and mental health. Strategies are needed to enable older people to continue their daily living, to prevent diseases, or to support rehabilitation, since illness in old age is costly to individuals and the healthcare system (WHO 2002).
Over the last few years, a wide variety of wearable devices (or just “wearables” for short) have appeared in the market promising to enhance the quality of life (Seneviratne et al. 2017). Wearables can sense and collect physiological data and are therefore used to provide services such as physical and mental health monitoring (Seneviratne et al. 2017). Specific applications of wearables such as activity recognition (Lara and Labrador 2013) and assisted living (Baig et al. 2013) are part of more recent investigation. The combination of advanced sensing, smart algorithms and medical benefits results in advanced healthcare services, which aim to support elderly people.
Automatic recognition of physical activities – commonly referred to as human activity recognition (HAR) – has been a research topic for decades, and many open issues still motivate further development (Lara and Labrador 2013). HAR aims to recognize the physical activities of a person based on sensor and/or video data. A HAR system thus provides information about user behavior and may prevent situations of risk or predict events that might happen.
The HAR topic offers several degrees of freedom when it comes to system design and implementation. First of all, there is no common definition or description of human activities that explains how a specific activity is characterized. A second, related aspect is that human activity is highly diverse, and the recognition of specific activities therefore requires an accurate selection of sensors and their placement (Bulling et al. 2014). Some of the major challenges are therefore the selection of sensor measurements and the collection of data under realistic conditions (Lara and Labrador 2013).
The HAR problem is not feasible to be solved deterministically since the number of combinations of sensor measurements and activities can be very large (Lara and Labrador 2013). Therefore, machine learning algorithms are widely used for the development of HAR systems in order to recognize patterns of human activity in the sensor data.
In general, machine learning algorithms are applied to gain knowledge from data through the use of statistical models that aim to find relations and patterns between the variables or features of the data. As with other machine learning methods, HAR requires a training and a test stage. During the training stage a model is developed based on a training dataset, while the test stage serves to test (also called evaluate) the performance of the model. The performance of a model on a test dataset is an indicator of how well the model might perform in the future on previously unseen data (Kuhn and Johnson 2013).
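The two stages can be illustrated with a small sketch; the toy data and the nearest-centroid model are stand-ins for illustration only, not the datasets or classifiers used in this work:

```python
import numpy as np

# Toy stand-in for a HAR dataset: each row is a feature vector extracted
# from one window of sensor data, y holds the activity label (0, 1, 2).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(50, 4)) for c in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 2], 50)

# Shuffle, then split into a training and a test set (70% / 30%).
idx = rng.permutation(len(y))
train, test = idx[:int(0.7 * len(y))], idx[int(0.7 * len(y)):]

# Training stage: fit a minimal nearest-centroid model on the training set.
centroids = np.stack([X[train][y[train] == c].mean(axis=0) for c in (0, 1, 2)])

# Test stage: evaluate on data the model has not seen before.
pred = np.argmin(np.linalg.norm(X[test][:, None] - centroids, axis=2), axis=1)
accuracy = (pred == y[test]).mean()
print(f"test accuracy: {accuracy:.2f}")
```

The accuracy on the held-out windows, not on the training windows, is what indicates future performance.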
The HAR problem, in the context of machine learning, can be regarded as a classification task, where the task of the model is to relate each input data point (e.g., sensor measurements) to a category out of a predefined number of categories (e.g., human activities). Classification is a supervised task, meaning that the models learn in the training stage by associating the input data with its corresponding ground truth label, which indicates which activity was taking place at the moment the data was recorded (James et al. 2013).
While the HAR problem ideally seeks a perfect segmentation of activities, this is difficult to achieve, since humans perform activities fluently and segmenting time series into intervals that represent single activities is not straightforward (Bulling et al. 2014). Even semantically, it is difficult to define where one activity ends and the next begins. Therefore, most work in HAR divides the measurements into fixed intervals and recognizes the activity that took place within each interval. This is known as the relaxed HAR problem; an example is shown in Fig. 1 (Bulling et al. 2014).
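The labeling step of the relaxed HAR problem can be sketched as follows, assuming per-sample ground truth labels and a majority vote per fixed interval (the window length and overlap are illustrative values):

```python
import numpy as np

def window_labels(sample_labels, window_size, step):
    """Assign one ground-truth label per fixed interval by majority vote,
    since an interval may span the boundary between two activities."""
    out = []
    for s in range(0, len(sample_labels) - window_size + 1, step):
        values, counts = np.unique(sample_labels[s:s + window_size],
                                   return_counts=True)
        out.append(values[np.argmax(counts)])
    return np.array(out)

# Per-sample labels of a stream: 90 samples of activity 0 ("walk"),
# then 110 samples of activity 1 ("sit"); 2 s windows with 50% overlap
# at a notional 20 Hz sampling rate.
stream = np.array([0] * 90 + [1] * 110)
print(window_labels(stream, window_size=40, step=20))  # [0 0 0 0 1 1 1 1 1]
```

Windows that straddle the transition receive the label of the dominant activity, which is exactly the relaxation discussed above.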
Most research in the HAR context follows a specific sequence of processing principles to interpret human activities from sensor data, called activity recognition chain (ARC) (Roggen et al. 2010). An ARC is a sequence of signal processing, pattern recognition and machine learning techniques that are implemented for a specific activity recognition task (Bulling et al. 2014).
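The stages of an ARC can be sketched as follows; the function names and the simple statistical features are illustrative placeholders, not the specific chain implemented in this work:

```python
import numpy as np

# Sketch of an ARC: raw signal -> preprocessing -> segmentation
# -> feature extraction -> classification.

def preprocess(raw):
    # e.g. remove the per-channel offset (bias) from the raw sensor stream
    return raw - raw.mean(axis=0)

def segment(signal, size=40, step=20):
    # divide the stream into fixed-length, overlapping windows
    starts = range(0, len(signal) - size + 1, step)
    return [signal[s:s + size] for s in starts]

def extract_features(window):
    # simple statistical features per channel (mean and standard deviation)
    return np.concatenate([window.mean(axis=0), window.std(axis=0)])

# 10 s of dummy 3-axis accelerometer data at 20 Hz.
raw = np.random.default_rng(1).normal(size=(200, 3))
features = np.array([extract_features(w) for w in segment(preprocess(raw))])
print(features.shape)  # (9, 6) -> 9 windows, 6 features each
```

A trained classifier would then map each feature vector to one of the target activities, closing the chain.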
A HAR system based on an ARC can then be used to support elderly or diseased people by detecting their activity patterns and intervening in case of changes in behaviour or a critical event (e.g., by sending recommendations or information to the person affected). Above all, a HAR system can enable these people to lead a more independent life.
The design of such a system depends on the activities to be recognized. Since the work described in this paper aims at designing a system that supports elderly and diseased people in their daily living, the design of the HAR system focuses on daily activities, especially those to which clinical relevance is attributed. The main contribution of this paper is the development of a methodology for capturing data with elderly and diseased people within a hospital under realistic conditions, a novel approach compared to existing data recording procedures for creating HAR datasets. Section 2 presents the main design issues regarding the data acquisition methodology. Section 3 describes the performed data acquisition trials. Section 4 introduces the design of an ARC. Finally, Section 5 gives an overview of some of the remaining problems and the results achieved so far. Note that the specific medical aspects of the patients are not included in this paper.
Design of a Methodology
The data collection for HAR is a challenging and time-consuming task that involves general challenges concerning data collection, synchronization and multi-modal sensor arrangements (Calatroni et al. 2011). Some challenges are common pattern recognition problems and are associated with the robustness of the system, like intraclass variability (i.e., people perform the same activity in different ways), interclass similarity (i.e., some activities can be similar to each other, e.g., drinking and taking a pill), and the NULL class problem (i.e., most of the time a person is not performing target activities) (Bulling et al. 2014).
The NULL class problem and interclass similarity are addressed by collecting large datasets that include many samples of the target activities. The additional training data allows the models to discover more complex patterns and thus to differentiate effectively between similar classes and NULL class activities (Vargas Toro 2018). To cope with intraclass variability, data from several subjects needs to be collected, so that the model can learn the variability of single activities.
Moreover, the ground truth labeling (assigning a ground truth label to each time stamp of the data stream, which is also called annotation for short) is a highly time-consuming task. It is mostly done by first recording the data collection session on video and then labeling the performed activities based on these recordings.
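The annotation step can be illustrated with a short sketch; the interval format, sampling rate, and activity names below are hypothetical examples, not the actual annotation scheme used:

```python
import numpy as np

# Hypothetical annotation result from the video recordings of a session:
# (start_s, end_s, activity) intervals.
annotations = [(0.0, 4.0, "lying"), (4.0, 6.5, "sit_up"), (6.5, 10.0, "walking")]

def annotate_stream(annotations, n_samples, fs, null_label="NULL"):
    """Expand interval annotations to one ground-truth label per timestamp.
    Timestamps not covered by any interval fall into the NULL class."""
    t = np.arange(n_samples) / fs
    labels = np.full(n_samples, null_label, dtype=object)
    for start, end, activity in annotations:
        labels[(t >= start) & (t < end)] = activity
    return labels

labels = annotate_stream(annotations, n_samples=240, fs=20)  # 12 s at 20 Hz
print(labels[0], labels[100], labels[-1])  # lying sit_up NULL
```

The per-timestamp labels produced this way are what the windowing of the relaxed HAR problem later aggregates into one label per interval.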
Before starting to collect data according to our own requirements, we conducted a review of various public datasets and selected one both for a proof of concept and as a guideline for designing our data collection methodology.
Review of Public Datasets
Over the past few years several research groups have attempted to establish benchmark datasets for the HAR task by simulating real life scenarios, variabilities, and activities. These datasets serve as a standard point of reference for machine learning algorithms applied in HAR.
Datasets for HAR are typically recorded in controlled environments, where persons wear body sensors and perform a series of scripted activities (Vargas Toro 2018). Some groups also included sensors that are attached to objects, as well as ambient sensors. The following list contains the considered HAR datasets and gives a brief description (see (Vargas Toro 2018) for further information):
The OPPORTUNITY dataset has a more limited set of locomotion activities but includes hand gestures and high-level activities in a setting with both body-worn and object sensors, simulating a scenario of activities of daily life (Chavarriaga et al. 2013).
The PAMAP dataset focuses on locomotion activities using body-worn sensors and includes a heart rate monitor in the measurements to address the physical intensity of an activity (Reiss and Stricker 2012).
The HAR dataset focuses on recognizing some physical activities by measuring data from only one smartphone (Anguita et al. 2013).
The REALDISP dataset attempts to simulate some of the variability that may occur in the day to day usage of sensors by inducing in some measurements a degree of displacement of body-worn sensors. It focuses on recognizing physical activities (Baños et al. 2012).
The HHAR (Heterogeneity HAR) study analysed various heterogeneities in motion sensor-based sensing (i.e., sensor biases, sampling rate heterogeneity and sampling rate instability) and their impact on HAR by sensing a set of activities with 13 different smartphones (Stisen et al. 2015).
The AReM dataset measures the Received Signal Strength (RSS) between body-worn sensors in an experiment focused on recognizing physical activities (Palumbo et al. 2016).
We have selected the Opportunity dataset (Chavarriaga et al. 2013) among the publicly available HAR datasets to conduct a proof of concept of a HAR system (Section 4). This public dataset was chosen because of the number and placement of wearable sensors, the variety and complexity of the target activities, the relevance of the target activities in a real-life setting, and the number of participants in the data collection sessions (Vargas Toro 2018). Furthermore, both the data collection process and the ground truth labeling were carried out thoroughly by recording each session on video, which was later used for annotation with special software (Chavarriaga et al. 2013).
However, the Opportunity dataset was measured with healthy and young persons only. Thus, we propose a data acquisition methodology for creating a specific HAR dataset containing daily activities of elderly people and patients. This newly obtained database will be the basis for the development of an ARC that aims to support this group in their daily living.
Conception of Data Collection
Unlike other attempts at building such datasets (e.g., as proposed by (Roggen et al. 2010)), in this paper the target test group comprises elderly people and current patients. This approach is challenging, since healthy people move much faster and more safely than elderly or diseased people. Furthermore, the type of disease also affects the movements. This approach will therefore increase the chance of generating data for classifiers that are suitable for patient analysis in the future. Another challenging aspect during data recording is that the endurance of the participants is lower than that of healthy persons.
The design of the data collection aims at covering realistic scenarios as well as clinically relevant activities. Hence, we have distinguished three main aspects pertaining to the collection of a HAR dataset, namely, (1) Definition of realistic scenarios, (2) Selection of activities of special interest, and (3) Selection of sensors and equipment. The main aspects and solutions are described next.
Definition of Realistic Scenarios
Following (Roggen et al. 2010), we have decomposed human activities into four hierarchical levels:
High-level activities (e.g. preparing breakfast, relaxing, cleaning up, etc.)
Mid-level activities (e.g. slicing bread, open drawer, etc.)
Low-level activities (e.g. moving bread, reach glass, etc.)
Modes of locomotion (e.g. walking, standing, sitting, etc.)
There are two categories of target activities in which we are interested: activities of daily life (ADL) and modes of locomotion. Ni et al. (2015) define ADL as the self-care and domestic activities that a person performs in daily living, e.g., feeding oneself, bathing, dressing, grooming, homemaking, and leisure. These activities are typically the first ones that require outside support, and a progressive functional loss across them has been found, with hygiene being an early-loss activity (i.e., one of the first ADL for which a person is likely to need help from others), toilet use a mid-loss activity, and eating a late-loss activity (Morris et al. 2013). The measurement of ADLs allows conclusions to be drawn about the physical and cognitive status, provides information about frailty, and allows predictions about the risk of falling ((Nourhashemi et al. 2001), (Hellström et al. 2013), (Tinetti et al. 1994)). Frailty is defined as a syndrome of physiological decline in late life, characterized by increased vulnerability to adverse health outcomes and a reduced ability to adapt to stressors. Procedural complications, falls, institutionalization, disability, and death are often associated with frailty (Clegg et al. 2013). Monitoring ADLs would make it possible to estimate how independently older adults manage their daily living and to recommend activities that might improve a patient’s health status.
Additionally, modes of locomotion like walking, sitting and lying can serve as a reference for the physical activity level of an individual. Furthermore, they can be useful for detecting hazardous situations such as falling (Ni et al. 2015). Falls are the second most frequent cause of fatal accidents worldwide (an estimated 646,000 fatal falls per year), with elderly people (60+) suffering the most fatal falls. In addition, approximately 37.3 million falls severe enough to require medical attention occur each year. Such falls are responsible for over 17 million lost DALYs (disability-adjusted life years). Moreover, elderly people with a disability due to a fall are exposed to a considerably enhanced risk of needing long-term care and institutionalization (WHO 2018).
Hence, we have designed the data acquisition methodology in a way that it incorporates some typical daily activities, modes of locomotion and object interactions that may happen during the daily routines. This includes the usual locomotion in a room, morning activities, hygienic activities, taking medicine, drinking and eating as well as some leisure activities. Two types of sessions have been designed to cover both a realistic daily routine and the need for a large database for each activity. The first one, called ADL session, covers a short version of a realistic daily routine and is divided into the following sequence of phases:
Bed phase: The person starts the session by lying down in bed. A short simulation of sleeping follows (taking on different sleeping positions) before the person becomes active again, as during a typical wake-up phase. The person sits up in bed and interacts with a smartphone (simulating a phone call and the writing of a message). Then the person moves to sit on the edge of the bed, leaves the bed and puts on some pants.
Bath phase: The person leaves the bedroom and enters the bathroom. In the bathroom, the test subject executes a set of hygienic activities. These activities include washing hands, brushing teeth, using a hair comb, and taking a seat on the toilet.
Table phase: The person leaves the bathroom again and moves towards the table in order to take a seat. This requires moving the chair appropriately. The person then sits down and drinks from a glass of water. Afterwards, the person takes a pencil and some paper to write down some notes.
Door phase: While the person is still sitting at the table, someone knocks at the door. The person stands up and opens the door. First, the person receives a tray with food, carries the tray to the table and then returns to the door. There, the person receives a plate with cookies and closes the door. The person carries the plate to a cupboard and returns to the table to take a seat again.
Table phase: Back at the table, the person first eats with cutlery, then takes a pill and finishes with drinking from a glass of water. All of these activities should be carried out in a natural way, with some short breaks during the transitions.
Cupboard phase: The person takes the glass, leaves the table and goes towards the cupboard. Standing next to the cupboard, the person closes the session by first eating one of the cookies and then drinking from the glass of water.
We call the sequence of these phases and associated activities the “activity protocol”; it can be read to the participants to guide them through the ADL session.
Selection of Activities of Special Interest
Complementary to the ADL sessions - which follow a natural pattern - activities of special interest (ASI) were performed in ASI sessions in a repeated pattern with small variations. These ASI sessions aim to generate more training data for a selection of activities within a short time. From a clinical perspective, the following activities were deemed most important and are explained below:
Changing positions in bed
Getting out of the bed
Using the toilet
Drinking
Nutrition intake
Micro-mobility (movements in bed) and nutrition status are important indicators for estimating the risk profile for decubitus, especially in elderly patients. These patients constitute the single largest group (more than 60%) among all patients with decubitus ulcers (Anders et al. 2010), and their ability to change position in bed is essential to avoid it (Harris 1996). For timely prevention it is necessary to monitor mobility and nutrition intake and to detect any change indicating an increased risk of decubitus. In addition to the discomfort for the patient and the enhanced risk of developing an infection, the treatment is also care intensive and time consuming, and therefore an important economic factor (Anders et al. 2010).
Getting out of bed unsupervised may lead to falls, which are a major problem in hospitals and contribute to a substantial healthcare burden, e.g., through significant injuries or an increased length of stay ((Oliver et al. 2004), (Oliver et al. 2010)). To prevent falls in patients who need support when getting up, it is necessary to detect the activity before the patient leaves the bed. This is an important part of a fall prevention program (Oliver et al. 2010).
The activity “Using the toilet” is clinically important because incontinence is one of the most common symptoms in neurological rehabilitation and has considerable physical, psychological and social consequences, which can significantly impair the quality of life of affected patients (Irwin et al. 2006). Other impediments to independent toilet use may be motor impairments, apraxia, or dementia. Independent toilet use is often an important therapy goal, essential for increased privacy of the patients and a self-determined way of life. Moreover, in hospitals a vast amount of caregiver time is spent supporting patients in using the toilet.
Sufficient hydration is necessary to maintain intra- and extracellular volume homeostasis and to avoid cognitive impairment, confusion, reduced concentration and irritability (Popkin et al. 2010). Especially elderly neurological patients often show decreased thirst, which leads to dehydration and exsiccosis (Lauster and Mertl-Rötzer 2014). Therefore, it is important to monitor the drinking habits of persons with an enhanced risk of dehydration (Shells and Morrell-Scott 2018).
Malnutrition in elderly patients undergoing rehabilitation is a prevalent and often neglected problem associated with a lower rehabilitation effect and lower physical function (Pirlich et al. 2006). In addition to monitoring the patient’s weight, it is important to recognize changes in dietary habits for timely intervention, i.e., before weight and muscle loss occur. Therefore, nutrition intake was also selected as an activity of special interest.
The ASI session contains the described activities of special interest, which are repeated several times to generate more training data. It is divided into the following sequence of phases:
Bath phase: In the bathroom, the person first simulates the toilet use. After washing and drying hands, the person also drinks from a glass of water.
Table phase: In the bedroom, the person sits down at the table to drink water again and to eat with cutlery. Then, the person takes a pill and drinks again.
Bed phase: The person sits down on the bed and then lies down. While lying down, the person moves from supine position once to the right side and once to the left side and back.
Selection of Sensors and Equipment
Two types of sensors find application in the HAR context: (1) wearables and (2) ambient sensors.
In order to record the activity data of a person, several wearables were used. Since acceleration information is often the most promising input for human motion detection (Roggen et al. 2010), we selected various devices that include accelerometers.
The activPAL micro (PAL Technologies Ltd. 2019) is a small and slim activity monitor that includes a 3-axis accelerometer (see Table 1). It is frequently used in clinical trials, including trials with older residents and hospital patients (Chan et al. 2017). The activPAL software allows setting the start time for recording while the device is plugged into the docking station. If no stop time is defined, the recording stops when the memory is full, the battery runs out, or the device is plugged into the docking station again. The activPAL monitor is covered with a finger cot to prevent direct skin contact and then attached to the skin with medical patches. The correct orientation, as specified by the manufacturer, has to be observed.
Additionally, electromyographic (EMG) data was recorded in order to investigate the potential use of EMG wearables for this kind of human activity recognition. The Myo armband (see Table 1) includes eight EMG sensors and a 9-axis inertial measurement unit (IMU) consisting of a 3-axis gyroscope, a 3-axis accelerometer and a 3-axis magnetometer. The device is placed on the forearm so that the EMG sensors can measure the electrical activity of the forearm muscles to detect hand gestures, whereas the IMU collects data about the arm movement. Before recording data with the Myo armband, the user has to calibrate it in order to adjust the sensors to the respective muscle constitution. This is done with a simple hand gesture, to which the Myo armband responds with vibration feedback. The Myo armband communicates with other devices via Bluetooth Low Energy using a Bluetooth adapter. This interface can also be used to store the raw data of a Myo armband with a terminal program on a computer. Since each computer can establish only one such Bluetooth connection, two computers are necessary to capture the data of two Myo armbands.
A prototype of the SmartCardia wearable (see Table 1) was used additionally to record acceleration data of the upper body. This device is primarily designed for continuously measuring physiological and vital parameters, including the Electrocardiogram (ECG), heart rate, pulse rate and others (SmartCardia SA, 2019). It is worn like a patch on the chest.
Additionally, two types of ambient sensors were selected, which are a Body Pressure Measurement System (BPMS) and cameras (only necessary for subsequent data annotation).
The BPMS from Tekscan (Tekscan, Inc. Tekscan 2016) was used to gather data for the analysis of the transitions between lying and sitting or between sitting and standing. The BPMS provides a pressure distribution image of a person lying in bed, with a resolution of 34 × 52 sensors per sensor layer; each layer measures 940 × 640 mm². In order to cover a full bed, three sensor layers are placed next to each other in a cloth cover (see Fig. 3), providing a total sensor surface of 940 × 1920 mm² with 5304 pressure sensors. With a sensor density of 0.3 sensels/cm², the pressure mattress provides a pressure distribution image, as visible in Fig. 2.
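As a cross-check, the stated geometry can be reproduced from the figures in the text (assuming the three 34 × 52 layers are laid side by side along their 640 mm edge):

```python
# Sanity check of the BPMS geometry described in the text; the values
# are taken from the text, the arithmetic only reproduces the stated figures.
layers = 3
rows, cols = 34, 52              # sensels per layer
layer_mm = (940, 640)            # dimensions of one layer in mm

sensels = layers * rows * cols   # total pressure sensors
area_cm2 = (layer_mm[0] / 10) * (layers * layer_mm[1] / 10)  # 940 x 1920 mm²
print(sensels, round(sensels / area_cm2, 2))  # 5304 0.29
```

The computed density of about 0.29 sensels/cm² matches the 0.3 sensels/cm² stated above.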
In order to ensure that all activities are recorded for the later annotation, we decided to use wall-mounted stationary cameras. The “Akaso Action Cam EK7000” (see Table 1) was used, since this camera provides an ultra-wide-angle lens of 170° as well as high-resolution recording ranging from 1280 × 720 up to 4096 × 2160 pixels. Furthermore, a hand camera (see Table 1) was used to record a close-up of the test subject.
We have launched our data acquisition methodology in two stages, which are, (1) Data acquisition with healthy people, and (2) Data acquisition with elderly people and patients. The first stage was additionally divided into a first trial in the laboratory environment and a second trial in the hospital environment. The second stage with elderly people and patients was performed in the previously examined hospital.
Due to the use of various sensors and other technical equipment, we split the data collection procedure into three phases. The preparation phase was used to set up the required equipment at the beginning of a session, whereas the postprocessing phase served for data storage and resetting the equipment at the end. The actual execution phase took place between these two. During this phase, the participant followed the activity protocol while the data from wearables and ambient sensors were collected either on the sensor devices themselves or via wireless communication.
Data Acquisition with Healthy People
In this first stage, data from a total of five healthy persons were recorded: four persons in the laboratory environment and one additional participant in the hospital environment. Their ages ranged from 24 to 55 years.
For the first trial, a test apartment (approx. 30 m²) at the Technical University of Munich (TUM) was used (see Fig. 3). The test apartment resembles a hospital room, consisting of a bedroom, where a bed and a table are placed, along with a separate bathroom. We used four wall-mounted stationary cameras (see Fig. 4) in order to cover both rooms and avoid blind spots.
Preparation of Sensors and Equipment
We used nine identical activPAL monitors, two Myo armbands and one SmartCardia wearable to record motion information from separate body parts, since the sensor position has a significant effect on the signal quality for distinct types of movement (e.g., walking or drinking). After setting up the wearables, they were attached to the participant according to Fig. 5.
Three types of sensors were used, each with a different sampling frequency: Myo armbands (50 Hz), SmartCardia wearable (31.25 Hz) and activPAL monitors (20 Hz). After collecting the sensor data, a downsampling procedure was performed to set all sensors to the same sampling frequency. More precisely, the recordings of the SmartCardia sensor and Myo armbands were resampled at 20 Hz. The reduction of the sampling frequency (and hence of the available data) for these sensors is not problematic, as Maurer et al. (2006) studied the behavior of the accuracy of models (based on accelerometer sensor data) as a function of the accelerometer sampling rate. In this study, the sampling rate was varied from 1 to 30 Hz and no significant gain in accuracy was found when using sampling frequencies above 20 Hz.
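The downsampling to the common 20 Hz time base can be sketched as follows; linear interpolation and the sine test signal are illustrative assumptions, as the exact resampling method is not detailed here:

```python
import numpy as np

def resample_to(signal, fs_in, fs_out, duration):
    """Resample a 1-D signal to a common target rate by linear interpolation.
    A simple sketch; production code would typically low-pass filter first."""
    t_in = np.arange(int(duration * fs_in)) / fs_in
    t_out = np.arange(int(duration * fs_out)) / fs_out
    return np.interp(t_out, t_in, signal)

# 10 s of a 1 Hz sine as a test signal at the original sampling rates.
duration = 10
myo = np.sin(2 * np.pi * np.arange(duration * 50) / 50)                  # 50 Hz
smartcardia = np.sin(2 * np.pi * np.arange(int(duration * 31.25)) / 31.25)  # 31.25 Hz

myo_20 = resample_to(myo, 50, 20, duration)
sc_20 = resample_to(smartcardia, 31.25, 20, duration)
print(len(myo_20), len(sc_20))  # 200 200
```

After resampling, all streams share one 20 Hz time axis and can be stacked sample by sample.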
In order to synchronize the relative recording times of the different wearables, we defined a synchronization gesture that generates a strong peak in the sensor records, triggered by a simple and clearly visible movement (see Fig. 6). The gesture consisted of manually patting the right and the left shoulder, followed by a triple clap over the head. It generated clearly visible impacts in the data records of all sensors and thus made it possible to later synchronize the time axes of the sensor records with the camera records. To make sure the time synchronization would be successful, the participants were asked to perform the synchronization gesture clearly visibly in front of the hand camera. An additional synchronization gesture was performed for the BPMS by pressing both hands into the mattress.
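One way to exploit the synchronization peaks can be sketched on dummy data; the threshold, sampling rate, and peak positions below are illustrative assumptions:

```python
import numpy as np

def sync_offset(a, b, threshold):
    """Estimate the offset (in samples) between two sensor streams from the
    first strong peak that the synchronization gesture produced in each."""
    def first_peak(x):
        return int(np.argmax(np.abs(x) > threshold))
    return first_peak(a) - first_peak(b)

# Dummy accelerometer magnitudes: the clapping impact appears at sample 120
# in one record and at sample 95 in the other (both notionally at 20 Hz).
rng = np.random.default_rng(2)
a = rng.normal(scale=0.1, size=400); a[120] = 5.0
b = rng.normal(scale=0.1, size=400); b[95] = 5.0

offset = sync_offset(a, b, threshold=2.0)
print(offset)  # 25 samples, i.e. 1.25 s at 20 Hz
```

Shifting one stream by the estimated offset aligns its time axis with the other; the same peak is also located in the video for camera alignment.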
The use of various technical devices comes along with different data storage issues. The video files had to be downloaded from the cameras via a USB interface, as did the data from the activPAL sensors. In contrast, the data recorded by the Myo armbands was already saved on two computers via the Bluetooth interfaces, and the BPMS data was stored on yet another separate computer.
In order to accelerate the data storage process, a local server was used to upload all data to one hard drive (see Fig. 7). The server also served as a hotspot and therefore did not require an existing Wi-Fi infrastructure.
The second trial was arranged in order to review the data acquisition procedure in the target environment of a hospital (see Fig. 8).
This trial was performed at the Schön Klinik Bad Aibling Harthausen (SKBA) and aimed to investigate potential issues caused by the changed environment (e.g., no Wi-Fi available in the room, limited tool availability, a limited number of power sockets, a different room size, etc.). Due to the security regulations of the hospital, no Wi-Fi was available, but this was solved by the hotspot provided by the local server.
All in all, data from five different healthy participants were collected in two different laboratory environments at TUM and SKBA. A summary of the number of recorded sessions per person is shown in Table 2.
Although the recorded activities are part of daily life, performing several sessions in a row is physically demanding. In order to adapt the protocol to the lower resilience of elderly people and patients, we removed the leisure activities (smartphone interaction and writing) from the activity protocol, as they have no clinical relevance.
The first stage of our data acquisition methodology showed that no technical difficulties are to be expected in the hospital environment. It also demonstrated that the most time-consuming task was storing the data. To optimize these procedures, two technical assistants are needed to manage the equipment, as well as two (medical) assistants to prepare the participants and to guide them through the session according to the activity protocol. This stage also allowed us to improve the activity protocol by removing less relevant activities. Finally, it served to fine-tune the methodology for the data recording with elderly people and patients.
Data Acquisition with Elderly People and Patients
During the final data acquisition, it was planned to collect data from both elderly people and patients. Eligible elderly people were aged 60 and over, while eligible patients were able to walk (with or without support) and had no terminal diseases. Overall, we recruited a group of 15 participants consisting of eight elderly persons and seven neurological patients. A summary of the number of recorded sessions per person is shown in Table 2.
Environmental and Technical Aspects
This trial was performed in a conventional patient room at SKBA. Figure 9 shows the final setting in the hospital. The only change of technical equipment concerned the prototypic SmartCardia wearable, which could not be used due to a missing CE certificate.
Target Group Aspects
In general, elderly persons are considered vulnerable, and therefore conceptual adjustments had to be made to address the different aspects of vulnerability (Schroder-Butterfill and Marianti 2006).
It is required by law to have the data acquisition ethically reviewed to safeguard the patients’ rights and well-being. A positive ethics vote (EC number 18084) had to be obtained before the first participant was included in the trial. The activity protocol and other study-related documents were reviewed by the Ethics Committee of the Bavarian State Medical Association, Munich. All legal requirements were fulfilled, i.e., insurance, qualification of study personnel, certification of sensor equipment, data protection and a positive risk-benefit profile. It was also stated that at no time would diagnostic or medical statements or therapy adaptations be based on the data acquisition trials or data analysis.
Before participating in the study, all participants signed a written consent and a data protection form, approved by the local ethics committee in accordance with the Declaration of Helsinki, and gave permission for the use of their video material for data annotation. The participants were explicitly informed about their rights according to the General Data Protection Regulation (GDPR). All assistants involved in the data acquisition also signed a data privacy statement.
In addition, the physical and cognitive status of each participant was tested. A functional test of general fitness and frailty (five-times sit-to-stand test (5XSST); Treacy and Hassett 2018), a grip force test (Allen and Barnett 2011) and a test of the participants’ cognitive status (Montreal Cognitive Assessment (MoCA); Nasreddine et al. 2005) were administered. The participants were also interviewed with questionnaires about their medical history, intrinsic motivation and affinity for technology.
Based on the previous trials, we could estimate the expected strain during the sessions. Thus, the activity protocol was kept flexible to allow the adaptation of procedures to the abilities of the participants (respecting cognitive, mental, and physical limitations). The medical assistants were attentive at all times in order to adapt the activities or to interrupt or terminate the participation. The well-being of the participants was the first priority under all circumstances and was placed above scientific interests.
During this trial, eight healthy elderly persons and seven neurological patients participated. The group of healthy elderly people consisted of 5 men and 3 women aged 67 to 86 years with a mean age of 74.4 years (SD 6.5). The values of the 5XSST were in the range from 7.2 to 10.0 s with a mean of 8.6 s (SD 1.1). The hand grip force ranged from 16.6 to 35.7 kg with a mean of 26.0 kg (SD 6.8). The MoCA results were between 24 and 27 points with a mean of 25.1 points (SD 1.0).
The selected group of neurological patients represented different levels of motor and cognitive function: two patients after ischaemic stroke, two with Parkinson’s Disease, one with Guillain-Barré-Syndrome, one with Alzheimer’s Disease and one with Multiple Sclerosis. All patients were in rehabilitation phase B. The group consisted of 5 men and 2 women aged 30 to 84 years with a mean age of 64.0 years (SD 17.5). The 5XSST could be performed by only 4 patients, as 3 patients were not able to stand up without an assistive device. The values ranged from 15.9 to 31.5 s with a mean of 21.8 s (SD 7.0). The hand grip force ranged from 8.9 to 36.5 kg with a mean of 16.7 kg (SD 9.8). The MoCA results were between 21 and 28 points with a mean of 24.3 points (SD 2.4).
As in the preliminary trials, at the beginning of the trial the wearables were attached following the scheme in Fig. 5 (without SmartCardia wearable), and a synchronization gesture was performed for sensor synchronization purposes (manual patting on the right and the left shoulder, a triple clapping over the head and pressing both hands into the pressure mattress).
Although a standardized protocol was followed for the acquisition, activities were performed with slightly more natural variation, and the participants were allowed to modify their actions (e.g., taking the soap from the left or the right side of the sink when washing hands, or individual variation in pill taking). If necessary, the participants were supported by a physical therapist throughout the entire session for safety.
Depending on their physical condition, enormous differences were observed among the elderly persons and patients during the data collection. Participants who were pain-free performed the ADL and ASI sessions as fast as the trial assistants who had tested the approach in the laboratory environment. On the other hand, patients with pain (e.g., back pain) or patients who needed mobility aids (e.g., rollators) needed more time for the different tasks, especially standing up and walking from the bed to the bathroom or back to the table. In those cases it was decided to execute fewer sessions, depending on the condition of the patient. A summary of the number of recorded sessions per person is shown in Table 2.
Design of an ARC
As mentioned before, the machine learning task of an ARC is the classification of human activities. Since the classification requires the ground truth labels of the data, we describe in (1) Building a HAR database how the labels were generated for the collected data. (2) Implementation of an ARC describes the stages of our system and (3) Preliminary results shows the evaluation of our proof of concept.
Building a HAR Database
The training stage of the ARC requires the availability of the ground truth labels (i.e., an annotation for each data point that indicates which activity was performed). Therefore, the data had to be annotated with the corresponding activities.
In Section 2.2.1 we introduced four hierarchical levels of human activities, ranging from basic actions like hand gestures (low-level activities) to complex composite activities (high-level activities); modes of locomotion are treated as a separate level. According to these predefined hierarchical levels, we converted the activity protocol of our data acquisition sessions into definitions for each activity at all hierarchical levels. We denote the result as the annotation vocabulary (see Appendix Table 4), which serves as a template in the annotation software to specify the activities that can be labeled. Moreover, we introduced an additional label to specify the objects with which the participants were interacting with their hands (e.g., door, chair, glass).
An example of the annotation process is shown in Fig. 10: a video recording on the left shows a person about to drink, which serves as a reference for creating the annotations displayed on the right. Each hierarchical level of an annotation gives different information about the activity being performed. Here, the person is sitting (locomotion) during lunch time (high-level activities II and I). Specifically, the person is drinking (mid-level activity) from a glass using the right hand. The figure also shows how the drinking activity can be decomposed into a succession of low-level activities, namely hand gestures: grasping the glass, moving it to the mouth, taking a sip, moving the glass away from the mouth and placing it back on the table.
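A hierarchical annotation such as the one in Fig. 10 could, for instance, be stored as one record per labeled time span. The field names, label strings and timestamps below are hypothetical illustrations, not the exact annotation vocabulary of Appendix Table 4:

```python
# Hypothetical record for one annotated time span, mirroring the
# hierarchical levels described in the text (all values illustrative).
annotation = {
    "start": "00:12:03.400",
    "end": "00:12:07.900",
    "locomotion": "sitting",
    "high_level_I": "lunch time",
    "high_level_II": "having lunch",
    "mid_level": "drinking",
    "low_level": "move glass to mouth",  # one of the hand gestures
    "object": "glass",                   # additional object label
    "hand": "right",
}
print(annotation["mid_level"], annotation["object"])
```

Each level can then be used independently as a classification target, depending on which granularity a model is trained for.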
On the one hand, the hierarchical levels provide a level of detail that increases the possibilities for model generation with the data. On the other hand, the generation of labeled training data becomes a very time-consuming task. So far we have annotated a selection of mid-level activities (washing hands, brushing teeth, toileting, pulling pants up/down, eating, drinking and pill taking) and the corresponding hand gestures for the last two.
Implementation of an ARC
In general, an ARC consists of the following stages:
Sensor data acquisition
Preprocessing raw data
Feature extraction and selection
Training and classification
Performance evaluation
Our proposed implementation of an ARC, divided into these stages, is shown schematically in Fig. 11. In the first stage, a set of sensors deliver raw and unprocessed data streams. The subsequent pre-processing includes timestamp conversions, synchronization of sensor clocks, renaming according to naming conventions for data harmonization, missing data imputation, and data merging.
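As a sketch of this pre-processing stage, the following pandas snippet converts raw timestamps, renames columns according to a naming convention, merges two hypothetical sensor streams onto a common time grid and imputes a missing reading. All column names, rates and values are illustrative assumptions, not the project's real data layout:

```python
import pandas as pd

# Hypothetical raw streams: an IMU (one reading missing at 100 ms)
# and the pressure mattress, each with its own clock column.
imu = pd.DataFrame({"timestamp_ms": [0, 50, 150, 200],
                    "accX": [0.10, 0.20, 0.30, 0.20]})
mat = pd.DataFrame({"timestamp_ms": [0, 100, 200],
                    "press": [5.0, 6.0, 7.0]})

# Timestamp conversion and renaming for data harmonization
imu["t"] = pd.to_datetime(imu.pop("timestamp_ms"), unit="ms")
mat["t"] = pd.to_datetime(mat.pop("timestamp_ms"), unit="ms")
imu = imu.rename(columns={"accX": "imu_acc_x"}).set_index("t")
mat = mat.rename(columns={"press": "bpms_pressure"}).set_index("t")

# Merge onto a common 50 ms grid and impute missing values
merged = imu.join(mat, how="outer").resample("50ms").mean()
merged = merged.interpolate()   # simple linear missing-data imputation
print(merged)
```

After this step, all sensor channels live in one harmonized table indexed by a common clock, which is the input assumed by the later stages.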
The additional data transformations aim to create one common sampling rate by aggregating the data from sensors with higher sampling rates, and possibly to create new inputs based on calculations on the existing measurements, such as the magnitude vector for every group of tri-axial measurements (Vargas Toro 2018). An example of raw acceleration data taken with a Myo band is shown in Fig. 12. The data are then split into equally sized time segments (also known as windows), and an overlap between two consecutive windows is introduced to capture the dynamics of the signals, as shown in Fig. 13.
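The windowing step described above can be sketched as follows; the window length, the 50% overlap and the synthetic tri-axial signal are assumptions for illustration, with the magnitude vector added as a derived channel:

```python
import numpy as np

def sliding_windows(data, window_size, overlap):
    """Split a (samples, channels) array into fixed-size windows with a
    fractional overlap between consecutive windows."""
    step = int(window_size * (1 - overlap))
    starts = range(0, len(data) - window_size + 1, step)
    return np.stack([data[s:s + window_size] for s in starts])

# Synthetic tri-axial signal plus its magnitude as a derived channel
signal = np.random.default_rng(1).normal(size=(1000, 3))
magnitude = np.linalg.norm(signal, axis=1, keepdims=True)
channels = np.hstack([signal, magnitude])        # shape (1000, 4)

windows = sliding_windows(channels, window_size=100, overlap=0.5)
print(windows.shape)  # (19, 100, 4): 19 windows of 100 samples, 4 channels
```

Each of the resulting windows is the unit on which features are computed and a single activity label is predicted.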
A set of features is then calculated for each window: statistical features, e.g., mean, variance, kurtosis and interquartile range, are popular for HAR problems due to their simplicity as well as their performance (Bulling et al. 2014). A subsequent feature selection process selects the features that will be provided as input to the classifier model. For example, features that have a near-zero variance are completely removed.
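A minimal version of this feature stage, computing the four statistics named above per channel and discarding near-zero-variance columns, could look as follows (all data synthetic; the variance threshold is an illustrative choice):

```python
import numpy as np

def window_features(window):
    """Mean, variance, excess kurtosis and interquartile range per
    channel for one (samples, channels) window."""
    mu = window.mean(axis=0)
    var = window.var(axis=0)
    z = (window - mu) / np.sqrt(var)
    kurt = (z ** 4).mean(axis=0) - 3.0
    iqr = np.percentile(window, 75, axis=0) - np.percentile(window, 25, axis=0)
    return np.concatenate([mu, var, kurt, iqr])

def drop_near_zero_variance(X, threshold=1e-8):
    """Remove feature columns whose variance across windows is near zero."""
    keep = X.var(axis=0) > threshold
    return X[:, keep]

rng = np.random.default_rng(2)
windows = rng.normal(size=(20, 100, 4))               # 20 synthetic windows
X = np.stack([window_features(w) for w in windows])   # (20, 16) feature matrix
X[:, 0] = 1.0                                         # simulate a constant feature
X_sel = drop_near_zero_variance(X)
print(X.shape, X_sel.shape)  # (20, 16) (20, 15): constant column removed
```

The retained feature matrix is then handed to the classifier described next.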
We selected a Random Forest (RF) algorithm as the classifier for our ARC, since Zhu et al. (2017) compared several classifiers on a HAR problem with a high-dimensional feature space and reported that the random forest classifier yielded the best results. During the training stage the classifier model learns to associate the features of each time window with a performed activity by deciding on one of the target activities. Afterwards, the pretrained classifier is applied to a previously unseen test dataset, where it predicts a target activity for each time window. The performance of the classification model is evaluated by comparing the predictions with the ground truth labels, i.e., the labels created by annotation.
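The training and prediction stages can be sketched with scikit-learn's RandomForestClassifier; the window features, activity labels and class separability below are synthetic, and scikit-learn is our assumed implementation rather than the one used in the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic window features and activity labels (0=sit, 1=stand, 2=walk);
# one class is shifted so that the toy problem is partly separable.
rng = np.random.default_rng(3)
X = rng.normal(size=(600, 16))
y = rng.integers(0, 3, size=600)
X[y == 2] += 2.0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)      # training stage: learn feature/activity mapping
y_pred = clf.predict(X_test)   # predict one activity per unseen window
accuracy = float((y_pred == y_test).mean())
print(f"accuracy: {accuracy:.2f}")
```

Comparing `y_pred` against the held-out ground truth is exactly the evaluation step described above.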
Traditionally, the performance of a classifier is measured via the F-score, which reaches its best value at 1 and its worst at 0. Ward et al. (2011) developed specialized metrics to measure the performance of a HAR classifier, defined as frame analysis and event analysis. Frame analysis corresponds to error categories that are calculated with regard to the single frames (time windows) of the classifier, whereas an event is a sequence of windows that represent the same activity. The performance of the classifier is therefore built upon single window assignments on the one hand and event assignments on the other. These analyses extend the traditional classification errors (false positives and false negatives) into more descriptive categories: deletion, insertion, merge, and fragmentation.
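The traditional frame-level F-score can be computed per activity class with scikit-learn; the window labels below are invented for illustration, and this sketch covers only the frame analysis, not the event-level categories of Ward et al. (2011):

```python
from sklearn.metrics import f1_score

# Frame-level evaluation: one predicted/true activity per time window
# (labels invented for illustration).
y_true = ["walk", "walk", "sit", "sit", "sit", "stand", "walk", "sit"]
y_pred = ["walk", "sit",  "sit", "sit", "walk", "stand", "walk", "sit"]

labels = ["walk", "sit", "stand"]
per_class = f1_score(y_true, y_pred, average=None, labels=labels)
macro = f1_score(y_true, y_pred, average="macro")
print(dict(zip(labels, per_class.round(2))), round(macro, 2))
```

Per-class scores make it visible which activities a model confuses, which the averaged score alone would hide.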
The first versions of the ARC were trained and tested using the publicly available Opportunity dataset (Chavarriaga et al. 2013). The authors of the dataset released a public challenge to recognize hand gestures and locomotion activities. Table 3 shows the performance of the challenge participants, as reported in Chavarriaga et al. (2013), and the performance of our approach (all details of this implementation can be found in Vargas Toro (2018)). These results show that our first ARC version already outperforms the other approaches in recognizing locomotion and hand gestures.
Future Research Considerations
The annotation process of the HAR database is currently in progress. Afterwards, the database will be ready for model evaluation, which will be divided into two parts: the application of our implemented ARC using data from healthy participants, followed by an evaluation using data from elderly people and patients. In addition to the model evaluation, we seek to analyse in detail the major differences between these two groups of people with respect to the activities of interest mentioned here. Finally, we intend to use the mattress data not only to recognize in-bed postures, but also to detect relevant movements of a patient, such as the moment when they attempt to stand up from the bed by themselves.
Acceptability of a HAR System
To assess the potential acceptability of our HAR system, all participants filled out a standardized questionnaire on their general affinity for technology. To this end, the Affinity for Technology questionnaire (TA-EG) for electronic devices was used, which is one of the few scales not restricted to computer use (Karrer et al. 2009; Attig et al. 2017). The TA-EG is a 19-item questionnaire covering four dimensions: enthusiasm for technology, competence in dealing with technology, and positive and negative attitudes towards technology. The responses to all items are rated on a 5-point Likert scale ranging from “fully agree” to “fully disagree” (transferred to a 5-point scale as shown in Fig. 14). Although all healthy elderly persons were highly interested and motivated to participate in a study aiming at a monitoring or intervention system, their overall general affinity for technology was only moderate, with a mean value of 3.0 points (SD 0.6). Enthusiasm for technology was the lowest-rated dimension, but each individual’s ratings nevertheless showed the highest value in positive attitude towards technology. Hence, the general acceptability of the envisaged system can be estimated to be high enough that elderly people are likely to use it. The implementation strategy, however, needs to be adapted to the users’ low enthusiasm for seeking out and testing new technological equipment.
Long-Term Impact of a HAR System
The data acquisition and the proposed HAR system aim to enhance the quality of life, privacy, and autonomy of elderly people and patients while reducing carer effort. The HAR system is intended to support the target group at home as well as in the hospital, especially during resource-intensive or private activities. These activities relate to personal hygiene, eating, and safe transfers in the patient room. By using unobtrusive sensor systems in the respective room, the assistance of professional caregivers could be reduced, leading to a more private environment. For instance, the sensors could send an alarm if a patient starts to leave the bed or skips all hygiene activities. Both the recognised situation and the individual condition should determine whether only the person concerned, relatives or carers are informed.
Critical situations (e.g., falls, immobility or dehydration) could then be recognised both for patients in the hospital and for elderly people at home, such that care staff could intervene earlier. The analysis of activity patterns and possible pattern deviations therefore allows risk assessments without the affected person being permanently observed and without intrusion into the person’s privacy.
Allen, D., & Barnett, F. (2011). Reliability and validity of an electronic dynamometer for measuring grip strength. International Journal of Therapy and Rehabilitation, 18(5), 258–265.
Anders, J., Heinemann, A., Leffmann, C., Leutenegger, M., Pröfener, F., & von Renteln-Kruse, W. (2010). Decubitus ulcers: Pathophysiology and primary prevention. Deutsches Ärzteblatt International, 107(21), 371–382. https://doi.org/10.3238/arztebl.2010.0371.
Anguita, D., Ghio, A., Oneto, L., Parra, X., & Reyes-Ortiz, J. L. (2013). A public domain dataset for human activity recognition using smartphones. Computational Intelligence and Machine Learning: European Symposium on Artificial Neural Networks.
Attig, C., Wessel, D. & Franke, T. (2017). Assessing personality differences in human-technology Interaction: An Overview of Key Self-report Scales to Predict Successful Interaction. Springer International Publishing. https://doi.org/10.1007/978-3-319-58750-9_3.
Baig, M. M., Gholamhosseini, H., & Connolly, M. J. (2013). A comprehensive survey of wearable and wireless ECG monitoring systems for older adults. Medical and Biological Engineering and Computing, 51(5), 485–495. https://doi.org/10.1007/s11517-012-1021-6.
Baños, O., Damas, M., Pomares, H., Rojas, I., Tóth, M. A. & Amft, O. (2012). A benchmark dataset to evaluate sensor displacement in activity recognition. Proceedings of the 2012 ACM Conference on Ubiquitous Computing, 1026. https://doi.org/10.1145/2370216.2370437.
Bulling, A., Blanke, U., & Schiele, B. (2014). A tutorial on human activity recognition using body-worn inertial sensors. ACM Computing Surveys (CSUR), 46(June), 1–33. https://doi.org/10.1145/2499621.
Calatroni, A., Roggen, D., & Tröster, G. (2011). Collection and curation of a large reference dataset for activity recognition. In IEEE international conference on systems, man and cybernetics (pp. 30–35). https://doi.org/10.1109/ICSMC.2011.6083638.
Chan, C. S., Slaughter, S. E., Jones, C. A., Ickert, C., & Wagg, A. S. (2017). Measuring activity performance of older adults using the activPAL: A rapid review. Healthcare, 5(94), 1–13.
Chavarriaga, R., Sagha, H., Calatroni, A., Digumarti, S. T., Tröster, G., Millán, J. D. R., & Roggen, D. (2013). The Opportunity challenge: A benchmark database for on-body sensor-based activity recognition. Pattern Recognition Letters, 34(15), 2033–2042. https://doi.org/10.1016/j.patrec.2012.12.014.
Clegg, A., Young, J., Iliffe, S., Rikkert, M. O., & Rockwood, K. (2013). Frailty in elderly people. The Lancet, 381, 752–762.
Harris, M. D. (1996). The prediction, prevention, and treatment of pressure ulcers using the Agency for Health Care Policy and Research guidelines. Home Healthcare Now, 14, 349–350.
Hellström, K., Sandström, M., Heideken Wågert, P., Sandborgh, M., Söderlund, A., Thors Adolfsson, E., & Johansson, A.-C. (2013). Fall-related self-efficacy in instrumental activities of daily living is associated with falls in older community-living people. Physical & Occupational Therapy In Geriatrics, 31, 128–139.
Irwin, D. E., Milsom, I., Kopp, Z., Abrams, P., & Cardozo, L. (2006). Impact of overactive bladder symptoms on employment, social interactions and emotional well-being in six European countries. BJU International, 97, 96–100. https://doi.org/10.1111/j.1464-410X.2005.05889.x.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning. https://doi.org/10.1007/978-1-4614-7138-7.
Karrer, K., Glaser, C., Clemens, C., & Bruder, C. (2009). Technikaffinität erfassen – der Fragebogen TA-EG. ZMMS Spektrum., 29.
Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. Springer. https://doi.org/10.1007/978-1-4614-6849-3.
Lara, O. D., & Labrador, M. A. (2013). A survey on human activity recognition using wearable sensors. IEEE Communications Surveys & Tutorials, 15(3), 1192–1209. https://doi.org/10.1109/SURV.2012.110112.00192.
Lauster, F., & Mertl-Rötzer, M. (2014). Ernährung und Flüssigkeitszufuhr [Nutrition and fluid intake]. In Müller, Walther, & Herzog (Eds.), Praktische Neurorehabilitation: Behandlungskonzepte nach Schädigung des Nervensystems (p. 275). W. Kohlhammer GmbH.
Leutheuser, H., Schuldhaus, D., & Eskofier, B. M. (2013). Hierarchical, multi-sensor based classification of daily life activities: Comparison with state-of-the-art algorithms using a benchmark dataset. PLoS One, 8(10). https://doi.org/10.1371/journal.pone.0075196.
Maurer, U., Smailagic, A., Siewiorek, D. P. & Deisher, M. (2006). Activity recognition and monitoring using multiple sensors on different body positions. Proceedings of the international workshop on wearable and implantable body sensor networks, 113-116.
Morris, J. N., Berg, K., Fries, B. E., Steel, K., & Howard, E. P. (2013). Scaling functional status within the interRAI suite of assessment instruments. BMC Geriatrics, 13, 128.
Nasreddine, Z. S., Phillips, N. A., Bédirian, V., Charbonneau, S., Whitehead, V., Collin, I., Cummings, J. L., & Chertkow, H. (2005). The Montreal cognitive assessment, MoCA: A brief screening tool for mild cognitive impairment. Journal of the American Geriatrics Society, 53(4), 695–699.
Ni, Q., García Hernando, A. B., & De la Cruz, I. P. (2015). The Elderly’s independent living in smart homes: A characterization of activities and sensing infrastructure survey to facilitate services development. Sensors, (15), 11312–11362. https://doi.org/10.3390/s150511312.
Nourhashemi, F., Andrieu, S., Gillette-Guyonnet, S., Vellas, B., Albarède, J. L., & Grandjean, H. (2001). Instrumental activities of daily living as a potential marker of frailty: A study of 7364 community-dwelling elderly woman (the EPIDOS study). Journal of Gerontology, 56A, M448–M453.
Oliver, D., Daly, F., Martin, F. C., & McMurdo, M. E. (2004). Risk factors and risk assessment tools for falls in hospital in-patients: A systematic review. Age and Ageing, 33, 122–130.
Oliver, D., Healey, F., & Haines, T. P. (2010). Preventing falls and fall-related injuries in hospitals. Clinics in Geriatric Medicine, 26, 645–692.
PAL Technologies Ltd (2019). PAL Technologies - Providing the Evidence. Retrieved from: www.palt.com.
Palumbo, F., Gallicchio, C., Pucci, R., & Micheli, A. (2016). Human activity recognition using multisensor data fusion based on reservoir computing. Journal of Ambient Intelligence and Smart Environments, 8(March), 87–107. https://doi.org/10.3233/AIS-160372.
Pirlich, M., Schutz, T., Norman, K., Gastell, S., Lubke, H. J., Bischoff, S. C., Bolder, U., Frieling, T., Guldenzoph, H., Hahn, K., Jauch, K. W., Schindler, K., Stein, J., Volkert, D., Weimann, A., Werner, H., Wolf, C., Zurcher, G., Bauer, P., & Lochs, H. (2006). The German hospital malnutrition study. Clinical Nutrition, 25, 563–572.
Popkin, B. M., D'anci, K. E., & Rosenberg, I. H. (2010). Water, hydration, and health. Nutrition Reviews, 68(8), 439–458. https://doi.org/10.1111/j.1753-4887.2010.00304.x.
Reiss, A., & Stricker, D. (2012). Introducing a new benchmarked dataset for activity monitoring. In Proceedings - International Symposium on Wearable Computers (ISWC) (pp. 108–109). https://doi.org/10.1109/ISWC.2012.13.
Roggen, D., Calatroni, A., Rossi, M., Holleczek, T., Förster, K., Tröster, G., et al. (2010). Collecting complex activity datasets in highly rich networked sensor environments. In INSS 2010 - 7th international conference on networked sensing systems, (00) (pp. 233–240). https://doi.org/10.1109/INSS.2010.5573462.
Schroder-Butterfill, E., & Marianti, R. (2006). A framework for understanding old-age vulnerabilities. Ageing and Society, 26(1), 9–35. https://doi.org/10.1017/S0144686X05004423.
Seneviratne, S., Hu, Y., Nguyen, T., Lan, G., Khalifa, S., Thilakarathna, K., Hassan, M., & Seneviratne, A. (2017). A survey of wearable devices and challenges. IEEE Communications Surveys and Tutorials, 19(4), 2573–2620. https://doi.org/10.1109/COMST.2017.2731979.
Shells, R., & Morrell-Scott, N. (2018). Prevention of dehydration in hospital patients. British Journal of Nursing, 27, 565–569.
SmartCardia SA. (2019). Retrieved from: www.smartcardia.com.
Stisen, A., Blunck, H., Bhattacharya, S., Prentow, T. S., Kjærgaard, M. B., Dey, A., et al. (2015). Smart devices are different: Assessing and MitigatingMobile sensing heterogeneities for activity recognition. Proceedings of the 13th ACM conference on embedded networked sensor systems, 127–140. https://doi.org/10.1145/2809695.2809718.
Tekscan, Inc. (2016). BPMS user manual 7.6X . With Clinical Software.
Tinetti, M. E., Baker, D. I., McAvay, G., Claus, E. B., Garrett, P., Gottschalk, M., Koch, M. L., Trainor, K., & Horwitz, R. (1994). A multifactorial intervention to reduce the risk of falling among elderly people living in the community. The New England Journal of Medicine, 331.
Treacy, D., & Hassett, L. (2018). The short physical performance battery. Journal of Physiotherapy, 64(1), 61.
Vargas Toro, A. (2018). Recognition of multi-level human activities by implementing an "activity recognition chain" in a benchmark dataset. Bachelor Thesis: Fraunhofer IAIS, Universidad EAFIT https://github.com/Fustincho/bachelor-thesis.
Ward, J. A., Lukowicz, P., & Gellersen, H.-W. (2011). Performance metrics for activity recognition. ACM Transactions on Intelligent Systems and Technology, 2(1), 111–132. https://doi.org/10.1145/1889681.1889687.
WHO World Health Organization (2002). Active ageing: A policy framework. Retrieved from: http://apps.who.int/iris/bitstream/10665/67215/1/WHO_NMH_NPH_02.8.pdf
WHO World Health Organization (2018). Fact-sheets: Falls, key facts. Retrieved from: www.who.int/news-room/fact-sheets/detail/falls. (12.09.2019).
Yang, A. Y., Kuryloski, P. & Bajcsy, R. (2009). WARD : A wearable action recognition database. CHI Conference on Human Factors in Computing Systems.
Zhu, J., San-Segundo, R., & Pardo, J. M. (2017). Feature extraction for robust physical activity recognition. Human-centric Computing and Information Sciences, 7, 1–16. https://doi.org/10.1186/s13673-017-0097-2.
Open Access funding provided by Projekt DEAL. The work presented in this paper received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 690425. We would like to express our gratitude to the men and women who made our contribution possible: the participating seniors and patients.
Schrader, L., Vargas Toro, A., Konietzny, S. et al. Advanced Sensing and Human Activity Recognition in Early Intervention and Rehabilitation of Elderly People. Population Ageing 13, 139–165 (2020). https://doi.org/10.1007/s12062-020-09260-z
- Active ageing
- Human activity recognition
- Machine learning
- Supervised learning
- Data acquisition