Keywords

1 Introduction

Smart home services are an important provider of systems of technology. They focus on constructing, operating and maintaining buildings in the most cost and effective efficient manner. At the most primary level, smart homes deliver functional building services for occupants by providing them security, air quality, thermal comfort, appropriate illumination, at the environmental impact as well as the lowest cost and over the building life cycle [20]. Reaching this vision needs adding intelligence from the design phase’s beginning throughout to the end of the buildings useful life. They use information technology during operation to connect different subsystems, which operate independently, so that these systems can share information to optimize the performance of the building. They look beyond the building equipment within their four walls. They are connected to the smart power grid, and they interact with building operators and occupants to empower them with practical information and new visibility levels.

Building system is decomposed into two parts: the physics with an important number of appliances and building envelope, controllers for energy management, as well as a human part associated to building occupants. These two parts interchange: the physical part is modeled and controlled by the occupants, while the occupants waiting for new services from the physical part. Research works about buildings mostly focuses on physical part and disregards the human one. However, the elaborate hardware and the most advanced software in the real applications be nothing but transistors and wires and without the people that use them to work more functionally. In that sense, the occupants that run a smart home are a crucial component of its intelligence. Consequently, in order to define the state of the buildings, non-measured values should be known not only in the physical part (i.e. air flow) but also human (i.e. occupancy and activities). Household occupants carry out different activities each day as part of their normal quotidian behavior. These include getting up, taking a shower, switching on lights and preparing breakfast. While such activities are common to us all, there will be surely differences [11]. For example, one household may use the shower at different times of the day, while others might prefer to take a bath.

In the literature, an important number of HAR techniques are available. They have significantly improved the use of deep learning techniques, and they could be classified into two categories. The first one is real-time. The HAR systems are a reactive system that depend on their environments and are in permanent interaction with them. They must respond to the latter’s stimuli while respecting certain constraints, the most important of which is the time constraint. These systems use real-time data (i.e. the activities are detected while they are being performed) and in which the data describing activities are partially observed. The second is offline using historical data (i.e. the activities are recognized when they are completed) and in which the data describing activities are completely observed [3]. By way of explanation, regardless of the rate of sampling, the data captured from uncompleted activities contain fewer details about the activities compared to the data captured from completed activities, and this makes real-time HAR a more challenging problem compared to the offline HAR.

This work presents an overview about the latest advances in the field of real-time HAR in smart home to obtain a better understanding of the difficulties, challenges and opportunities. The main contribution is to highlight the trends regarding the real-time HAR task from pattern classification (Sect. 2) and to propose new solutions for data segmentation (Sect. 3), data variability (Sect. 4), datasets pre-processing issues (Sect. 5) and complex real-time HAR (Sect. 6). A discussion and a conclusion are provided respectively in Sects. 7 and 8.

2 Pattern Classification

Various deep learning models have frequently been used by researchers in the HAR field. The learning generally involves creating a statistical or probabilistic activity model that is augmented with large training data. The task of the model is to learn and recognize patterns that differentiate various classes in the training data, and apply this knowledge for the classification of test data.

The HAR task proceeds from pattern recognition, where the techniques are decomposed on two categories.

2.1 Ontology Based Approaches

Various techniques have been proposed in the literature to tackle the HAR task, and most of the basic ones focus on location because it is a crucial part of the context. They assume that the semantics of a spatial position are static, that’s why they focus on how to specify the semantics of a spatial position and also focus on how to identify the spatial position of occupants [31]. Theses techniques work well to some scope. However, they are unable to recognize human activities with enough accuracy. For example, if the occupant stays at a kitchen, she or he may be cooking food or putting the dishes away; the efficacy of this method is limited because a location offer various activities. As an example, a living room can support various activities such as eating, playing TV games or reading. For now, research works fails to handle these changes in semantics.

Another type of ontology-based approaches is thing-based techniques that identify dynamically the change in activities by using remote sensors to identify the objects that the occupant is interacting with [28]. For example, to recognize an activity, [32] developed a system of semantics identification that identified the activity space from objects forming the immediate environment of the occupant. These techniques try to overcome the diversity of activities in the most comprehensive way possible. However, they require a complete knowledge of the domain to elaborate activities models. They are not robust to handle settings change and uncertainty, and they have implementation problems regarding the recognition accuracy of the used sensors depending on their cost. In addition, they propose to follow expert knowledge to model activities, which is time-consuming and difficult to maintain in case of environment evolution. In this paper, we propose modeling this type of activity as a multi-label classification problem. We could use a semantic segmentation of sensors and activities and inspire from techniques developed in the literature such as [23] for example. Another solution is to exploit CNN architectures. This allows the model to relate the possibility that different activities could be distinguished for the same occupant.

2.2 Data Driven Approaches

Another approach to recognize human activities in real-time is to monitor the status of different appliances to understand the behavior of the different loads in the household. In the literature, several load monitoring techniques can be implemented. They can be divided into two categories [7, 27]. The Intrusive Load Monitoring (ILM) is a data-collection method where the devices are installed at each appliance node to detect the sensor events and therefore characterize in detail the occupant activities using deep learning techniques, for example.

The databases generated by these systems can be labeled manually (i.e. the user label the monitored appliance), or automatically (i.e. the system is trained with examples from distinctive appliances and then recognizes the appliance that is being used). Generally, manual setup ILM systems master automatic setup ILM systems. Result’s accuracy is the main profit of this method, but it requires expensive and complex installation systems.

Non-Intrusive Load Monitoring technique (NILM) is an alternative process, in which one single monitoring device is installed at the main distribution board at the household, and an algorithm is applied to determine the state of operation for each individual appliance. The main advantage of NILM is the fact that only one single monitoring device is needed. Therefore, it lowers significantly the cost and the intrusion at the household level. The main inconvenience is the lower accuracy compared to ILM systems, in particular, those with manual labeling.

In general, the appliance of a household can be categorized in the following classes: (i) finite-state appliances such as dishwashers or fridges, which have different states, each one had its duration (cyclic or fixed) and its own power demand; (ii) continuously varying appliances, such as computers, which behavior is not periodic and have different states; (iii) on/off appliances such as light bulbs, which have only two states: on and off, with a fix power demand. The duration depends on the user; (iv) permanent demand appliances, such as alarm-clocks, which are always ON with a fixed power request.

The appliances can be identified by “event-based” techniques that detect the On/Off changes, or by “non-event-based” methods that detect whether an appliance is ON during the sampled duration. These techniques can use different sensor measurements and features data, such as current and voltage measurements and signal waveform. The sampling rate is an important parameter in the complexity level of the methods of discretion. It affects not only the type of feature that can be measured, but also the type of algorithm that can be used. A detailed review about the features and algorithms is discussed in [22].

A high frequency sample data rate (1 s–1 min) permits more detailed analysis and accuracy in the detection of appliances loads. However, the large amount of data needs higher quality hardware and requires a capacity of processing and storage (locally or in the cloud) to run the algorithms. Another important issue in data driven approaches is related to the quality of data. In this work, we are interested in dealing ambiguous outliers detections in both training and testing data and insufficiency of labeled data. Outliers could be detected using a multiclass deep autoencoding model for example. Some autoencoder variants, such as a variational autoencoder, can be used to better extract high-level features for the estimation network. We are interested also to perform experimental results to prove the outstanding performance of the proposed solutions and compare them with state-of-the-art rival methods.

3 Data Segmentation

In HAR task, each sensor data need to be divided into chunks, in a windowed manner. On each window, the features are computed, and then are used as an instance for learning phase. This task is difficult since occupants perform activities continuously, and successive activities can not be clearly distinguished. This section details the most used data segmentation techniques in real-time HAR context.

3.1 Time Windows

Time windowing (TW) techniques consists on dividing the data stream into time segments with a regular time interval. The selection of optimal duration of the time interval is the biggest challenge for these techniques. In fact, if the time interval is very large, the events could be common to many activities and consequently the predominant activity in the time window will have the more influence in the label’s choice [4]. TW techniques are commonly used in the segmentation of sensor event for real-time activity recognition. This technique is more favorable to the sensor time series with regular or continuous time sampling. However, in the smart home context, sensor data are often generated in a discrete form along the timeline where a fixed size time window is not suitable.

3.2 Sensor Events Window

Sensor events windowing technique consists on dividing the whole sequence into a set of sliding windows with an equal number of sensor events [16]. Each window is labeled with the last event’s label in the window, and the sensor events that precede the last event in the window define the last event’s context.

It is easier to implement; however, the challenge consists on the fact that the actual occurrence of the activities is not intuitively reflected [18]. In fact, one sliding window could cover two or more activities, or sensor events belonging to one activity can be bust into several sliding windows. Furthermore, if several activities occur simultaneously, this technique is unable to segment sensor events. In addition, this window type differs in terms of duration, and it is consequently impossible to interpret the time between events.

To solve this problem, a method that segment the sensor events into portions that are consistent with the incidence of each activity is proposed by [15]. This approach can correctly outline the extremities of the activity, but it could take a longer time until sufficient information to define one segmentation is received. Moreover, the question that arise is how to determine if two sequential sensor events belong to similar activities or not? This is a challenge.

3.3 Dynamic Window

Unlike the previous techniques, dynamic windowing uses a non-fixed window size. In the literature, [26] proposed a dynamic segmentation model where the window is reduced and expanded based on the use of the current state of activity recognition, the sensor data and the temporal activity information. It is a two-stage methodology [21]. An offline phase consists in splitting the data stream into events windows, then extracting the “best-fit sensor group” from the event window. In the real-time phase, the dataset is streamed to the classification algorithm. When the “best-fit sensor group” is identified in the stream, the corresponding label is associated with the given segment’s input by the classifier. A non properly annotated dataset is the most challenging for the use of the technique. Also, it is not able to handle complex HAR in real-time [1].

3.4 Fuzzy Time Window

Another important type of data segmentation is fuzzy time windowing (FTW) [14]. The objective of this window’s type is the generation of features for each sensor time series according to its evolution for a given interval of time. It is created for encoding multi-varied sequences of binary sensors.

To predict the right activity label, one should not rely solely on the past because in some cases, delaying the recognition time allows a better decision to be made. Consider the following example, if a binary door contact sensor is activated, the activation can be associated with the following activity “the resident has left the house”. However, it can happen that the inhabitant only opens the front door to speak to another person at the entrance of the house and returns home without going out. To improve the precision, the use of the activation of the following sensors is a good alternative, and it will be useful for example to introduce a delay in the decision-making. The longer the delay, the greater the accuracy. However, a problem may arise if this delay is too long and, in effect, the delay prevents real time recognition. While a long delay may be acceptable for some types of activities, others require very short decision time in the event of an emergency, such as a resident falling. Furthermore, this method is applicable only to binary sensors’ data, although this is not always the case because a smart building also contains non-binary sensors such as \(CO_2\) concentrations sensor.

3.5 Outline

Many applications implement real-time activity recognition using wearable sensors such as accelerometers. HAR using these devices is less challenging because the data are generated continuously with fixed frequency, allowing data to be segmented based on the number of sensor signals or time intervals, and they are limited only sample on simple activities such as walking, cycling. In contrast, smart home ambient sensors are normally generating data in a discrete manner; so, this still remains a challenge for dynamic segmentation. Besides, the classified activities are often complicated, composed of a number of sub-activities where the duration of sensor segmentation and the boundary are difficult to determine.

In this work, we aim to develop a new dynamic real-time sensors’ event segmentation approach to sensor data analysis which incorporates two components: sensor correlation computation, and time correlation calculations.

4 Data Variability

In the case of HAR, most variations occur in settings and data. The following subsections detail the variability in settings and data issues.

4.1 Variability of Settings

Involving inhabitants from residential buildings is still a key challenge because each smart home is unique due to its architecture, configuration of sensors and equipment, but mostly because of its inhabitants’ different profiles. The site models, the sensor locations and human activities/preferences are mostly not available, however, these are required for mass deployment solutions [27].

Some may be small, such as a single apartment, where the sensors may be fewer and have more overlap and noisy footage. Others may have multiple rooms and contain many sensors and devices. Indeed, the performance of an activity recognition system can be influenced by the number, type and location of smart home sensors. Therefore, a model optimized in one house may not work well in another because of this difference in architecture.

In this work, we are interested to test the transfer learning methods for solving this problem [10]. These techniques allow the use of pre-trained deep learning models with different data distributions. We are also motivated to analyze the different types of knowledge that can be updated with deep learning algorithms and taking advantage of recent advances in transfer learning for deep learning.

4.2 Temporal Drift

Thanks to the large number of sensors as well as the interaction between the occupants in a smart home, the initial training data constitutes a mirror of the activities carried out at the time of registration. A model is generated and trained using this data.

Habits and behavior of occupants may change over time. The data collected in real time is no longer the same as the data used in the training phase. This is translated by the concept of temporal drift as presented in [24], and means a change in the distribution between the training data and the test data. Let’s consider the following example. A primary approach, which is prevalent in many buildings, is to use passive infrared (PIR) sensors for activity estimation. However, motion detectors fail to detect activity when the occupants remain relatively still, which is quite common during activities like regular desk work. Furthermore, drifts of warm or cold air on objects can be interpreted as motion, leading to false positive detection.

The statistical properties of a variable that the model attempts to predict change over time in unexpected ways. Taking this drift into account requires real-time adaptation to changes in human activities using new HAR algorithm data in smart homes [19]. Based on real-time monitoring of human activities, we are interest to perform Fault Detection and Diagnostics (FDD) but also predictive maintenance. It is indeed possible by artificial intelligence algorithms to detect drifts and intervene before it is too late. Numerous research studies still make it possible to develop these techniques to make them truly operational, in particular to adapt easily to changes in piloting mode and to different configurations of the building [29].

5 Datasets Pre-processing

Following a data collection, the performance of activity recognition is evaluated with datasets generated through different sensors.

In a dataset, most of the activation data of sensors are not annotated because the dataset has been built with predefined activity sets to be labeled. Consequently, the activation of sensors linked to other activities are annotated under the “Other" label. This special class represents more than 50% of the datasets studied, provided by the Center for Advanced Studies in Adaptive Systems (CASAS) [8]. Event sequences of this class can be similar to event sequences of normal classes, as they can be quite different from each other. The difficulty of classification increases in proportion to this growth of the “Other” class. This leads to class confusion. In addition, each sensor activation gives only some information about the current activity. For instance, the activation of the kitchen motion sensor could indicate different activities like cooking, housekeeping or washing dishes. Therefore, the information offered by this sensor is exploitable only in coincidence with neighboring sensor activation. Thus, the time series is irregular and sparse, and it contains highly variable data and unbalanced classes.

Labeling is also challenging. In fact, in supervised learning techniques, which are extensively used in a lot of HAR applications, the problem of the required target arises in the determination of the activities of occupants i.e. the labeling issue is usually addresses using video cameras. In order to accept the privacy of occupants, the use of cameras is generally not acceptable in many places.

Also, the labeling is time-consuming. In general, the datasets for an actual building are labeled by the residents themselves using a graphical user interface and then used post-processed for the validation of research works. Missing labels are the most challenging problem in datasets. Let’s consider the example of CASAS dataset developed by Washington State University, the activation of sensors may not be correctly ordered in a chronological order of timestamps. Also, duplication of days and events (i.e. same activity label, timestamp, sensor and value) could arise in a dataset and a cleaning step should be considered before the training process. Figure 1 presents an extract from the Milan dataset where labels are missing.

Fig. 1.
figure 1

CASAS dataset anomaly

The contribution in this work is to take into account the annotation of events in the judge of performance using evaluation metrics weighted by the number of representations in a dataset, F1 weighted score or balanced accuracy when the dataset classes are unbalanced [12].

6 Complex Human Activities

An activity consists of a pattern of multiple actions over time. Typical examples include reading, coffee time and cooking. The occupant’s activities could be classified into two categories. Simple activity refers to primitives that realize a simple purpose or function, while complex activity consists of the temporal combinations of multiple simple activities over time.

In the literature, an important number of research works have been developed in the framework of real-time HAR. However, they focused mostly on simplified scenarios involving simple activity recognition and a single occupant, and there is no much attention on the real-time complex HAR.

In a house setting, occupants perform frequently different actions simultaneously in a variety of temporal combinations. That’s why, activities are much more complex than actions, but they represent more the real life of residents.

Occupant activities are frequently carried out in a complex manner. Activities can be carried out in an interleaved or concurrent manner. An occupant may alternately wash dishes and cook, or listen to music and cook at the same time, but could just as easily wash dishes and cook, alternately, while listening to music. The possibilities are limitless in terms of activity planing. However, some activities could not appear in the dataset and could be also abnormal, such as cooking while the individual sleeps in his room. Modeling this type of activity is a challenge and only few research works are presented in the scientific literature. However, it could be modeled as a problem of multi-label classification. For example, in [23] a semantic segmentation of sensors and activities has been investigated and this allows the model to relate the possibility that certain activities may or may not occur for the same resident simultaneously.

Multi-user activity is another challenge. Besides, while the monitoring of a daily single resident activities is already a hard task. The complexity increases more with many residents. The same activities become more difficult to recognize for the following reasons. Firstly, in a group, a resident may interact to do common activities. In this case, the sensor activation reflects the same activity for each resident in the group. Secondly, everyone can perform simultaneously different activities. This produces a concurrent sensors’ activation for different activities. These activations are then merged in the activity sequences, where an activity performed by one occupant is a noise for the activities of another occupant. Some researchers are interested in this problem [17]. However, most of the existing methods are of the offline type and there is no interest to date for the recognition of complex activities in real time.

The real-time activities that a smart homes want to recognize can be on the contrary seen as sequences of micro actions. These sequences generally follow a certain pattern, but there are no strict constraints on their compositions or the order of micro actions. To solve this problem, an important number of research works are present. A two-layer based LSTM technique is investigated to address the diversified composition of actions for HAR based on wearable sensors in [30] and based on videos in [9]. Another idea consists by exploiting the techniques developed by the natural language processing field, where the word’s context vary, the words vary and the texts have a multilevel hierarchical structure.

Embedding techniques, such as BERT [5] and ELMo [6] have been investigated to operate sequential data. These techniques help the processing of long sequences. However, all these techniques are only experimenting with offline HAR. The study presented by [5] is limited by a certain size of possible sensor activations and vocabulary, and it is consequently impossible to obtain a representation of sensor values that have never been observed. In this work, we are motivated to investigate methods that could capture more semantics in real-time, i.e. that could split words and represent sensor activation into sub-words such as byte pair encoding (BPE) [25].

7 Discussion

In this survey, we have highlighted the key challenges in real-time HAR in smart buildings. A taxonomy of the main components of a real-time HAR algorithm (time series analysis, data segmentation, data variability, data pre-processing and classification) has been introduced.

The data variability is still an unsolved problem because sensor data are very sensitive to the localization of sensors as well as the house configuration. We are interested in (i) performing Fault Detection and Diagnostics (FDD) to detect drifts using artificial intelligence algorithms to solve the temporal drift, and (ii) the use of transfer learning algorithms to solve the variability of settings.

Multi-occupant and concurrent activities’ recognition are the most challenged problems in human activity recognition task using ambient sensors. We are motivated to work on this issue by modeling this type of activity as a multi-label classification problem or using CNN architectures.

Taking into account the quality of the training data was a problem rarely discussed in the literature. In fact, as the available information changes over time, the structure of the training data should also be readjusted to deal with such dynamic aspects. In [13], the authors have evaluated big data quality using different indicators: precision, accuracy, completeness, volume, timeliness and consistency. In this work, we are motivated by using other indicators to test the data quality, such as spread rate technique proposed in [2] which considers the global space of the data and does not look at each class alone.

The unavailability of sensor data over long periods of time, missing data as well as non-annotated events are also challenging ones. In this direction, we are interested to (i) take into account the missing data by adding a penalization function in the optimization algorithm when the data are not available, and (ii) to judge the performance of algorithms using evaluation metrics weighted by the number of representations in a dataset, F1 weighted score or balanced accuracy when the events are not annotated.

Finally, pattern recognition analysis and feature extraction challenges are solved by deep learning algorithms, such as RNN, CNN, LSTM, or hybrid architectures (RNN and LSTM for example). Both approaches based on CNN and LSTM would give equivalent performance levels, but CNNs are faster in the training phase and therefore more suitable for real-time HAR. However, challenges related to sequence analysis still remain largely unresolved. We argue that the application of language processing techniques can bring advances in solving some of these challenges as they deploy sequence analysis methods. In this context, our contribution to investigate byte pair encoding (BPE) methods [25] able to split words and represent sensor activation into sub-words and therefore capture more semantics in real-time.

8 Conclusion

This work provides an insight of new challenges for real-time HAR in smart buildings. Based on the existing methods and applications, different trends are analyzed and deals with pattern classification, data segmentation, data variability data sets pre-processing and complex human activities. Under these issues, various opportunities and solutions are analyzed along with its recent application-based approaches. Based on the observations, the findings of the survey are summarized to improve the field of real-time HAR. In this work point of view, it is recommended to improve deep learning techniques by developing neural network architectures that takes into account the quality of training data (missing values in time series and non-annotated event), the variability of data, the data segmentation and ontology of activities for better performance improvement.

Future work will be around the analysis of different software/hardware architectures for real-time HAR in smart buildings.