
1 Introduction

A smart home environment is an intelligent and inclusive residential space designed to enable aging adults to live comfortably and independently [1]. It is retrofitted with a network of sensors and IoT devices that can detect events, monitor and recognize residents' activities of daily living (ADLs), and respond and adapt to their needs [2,3,4]. For example, a smart home can warn residents of a potential fire hazard if they forget to turn off the stove. ADLs, such as using technology, preparing meals, and sleeping, are important indicators used to assess an aging adult's functional ability [5]. For instance, if an aging adult consistently leaves ADLs unsupervised (e.g., frequently forgets to turn off the TV before going to bed or does not turn off the light after using the restroom), it may indicate a decline in their health and well-being. Several studies have shown a strong correlation between the amount of energy consumed in homes and how aging adults perform their ADLs. The population of aging adults (aged 65 and above) is projected to exceed 70 million by 2030 [6], and as they continue to age in place, home energy consumption will increase, resulting in higher costs for home energy services. Aging adults belong to an economically vulnerable group with limited options for consistent income [7, 8]. It is therefore crucial to develop a robust framework for energy management in older adults’ homes that mitigates the increase in energy consumption caused by unattended ADLs and alleviates the associated financial burden on aging adults who choose to age in place. To address this, our study presents a situation-aware framework for mitigating the increase in energy consumption resulting from detected unattended/unsupervised ADLs in smart home environments. The proposed framework is designed to help residents sense and perceive unattended ADLs in the smart home environment, as well as the states of smart home devices/appliances, especially during episodes of impaired awareness. The remainder of this paper is organized as follows: Sect. 2 presents the related work, Sect. 3 highlights the proposed framework and methodology, Sect. 4 discusses the experiment design, the results and interpretation are presented in Sect. 5, and Sect. 6 presents the conclusion and future work. Please note that we use the terms ‘unattended’ and ‘unsupervised’ interchangeably throughout this paper.

2 Related Work

This section presents the vision of this paper and concepts used in this study.

2.1 Impaired Awareness in Activities of Daily Living

Activities of daily living (ADLs) are self-care tasks performed to enhance an individual's quality of life (QoL) [9]. ADLs may include basic tasks such as grooming, dressing, and toileting, as well as more complex tasks that require stepwise procedures for their execution, for example, preparing meals, watching TV, and doing laundry [10]. As people age, they are more prone to age-related cognitive decline, which affects their reasoning and thinking skills, thereby impeding their ability to successfully manage or supervise complex ADLs [11]. When aging adults are unable to supervise ADLs effectively, it may suggest that they suffer from impaired awareness [12]. Impaired awareness causes aging adults to underestimate their functional decline and can have economic consequences and compromise their safety when they choose to age in place [13, 14].

For example, an aging adult with mild memory loss may forget to turn off the TV and/or the air conditioner (AC)/fan in the living room before going to bed, may forget to turn off the light bulbs after using the restroom, or may forget to turn off the stove and/or faucet after preparing a meal. Each of these scenarios has economic and safety consequences. For instance, if a resident leaves the TV, AC, fan, and light bulbs on for a long period of time in moments that suggest the ADLs are incomplete or unattended, home energy consumption increases and, consequently, the cost of home energy services rises. Moreover, when the stove or faucet is left on and unsupervised for a long period of time, it may constitute a hazard or risk of accidents in the home (e.g., a fire outbreak or slips due to a wet floor). However, the main objective of this study is to mitigate the potential economic consequences, in terms of energy consumption, of unsupervised ADLs.

2.2 Situation Awareness for Energy Use Management in Smart Homes

Situation awareness refers to the ability to perceive or sense devices in the environment at a given time and location, understand their operational context, and predict their future states [15]. Smart home environments consist of smart appliances and IoT devices that are connected and communicate over a wireless network. These smart appliances and IoT devices are the tools used for the execution of ADLs. When aging adults develop impaired awareness, their ability to properly supervise and manage ADLs is impeded, causing them to leave one or more appliances operating and unattended, which increases home energy consumption. To address this problem and enable aging adults to continue to live optimally and independently, a situation-aware framework can be integrated into smart home environments. Such a framework enables aging adults to understand and perceive the contextual and operating states of the appliances used for the execution of their ADLs in the smart home environment and consequently helps to mitigate unintended energy consumption. For example, when smart home appliances such as a TV, fan, or AC are in an operational state, each produces a distinct sound. These sounds provide useful context data that can be leveraged to gain an understanding of an ADL scene. Similarly, passive infrared (PIR) sensors can detect motion or human presence within a space at a given time to indicate whether an ADL is supervised or unsupervised. By leveraging the predictive power of advanced machine learning algorithms and applying them to the rich context data from the smart home environment, aging adults can regain the awareness required to control the state of home appliances left unattended. This can help prevent unintended energy consumption from unsupervised ADLs and alleviate the potential financial burden of increased energy use.

Fig. 1. Situation-Aware Framework for Energy Savings in a smart home

3 Proposed Framework and Methodology

The objective of our proposed situation-aware framework is to enable aging adults to understand and perceive the contextual and operating states of the appliances used for the execution of their ADLs in the smart home environment, and to mitigate unintended energy consumption when ADLs are left unsupervised. Specifically, we focus on three smart home appliances that consume considerable energy in homes: the TV, AC, and fan. These appliances were used as a case study for our experiment. Additionally, the proposed framework includes sensors for sensing light from lightbulbs in spaces (e.g., the restroom) where human presence is not detected after an ADL has been initiated but not completed as would be expected for a normal execution of the ADL. In such scenarios, the user receives a notification to turn off the lightbulbs.

Figure 1 provides a description of the framework and its components. The smart devices are connected and controlled through the Alexa Echo Dot. The Raspberry Pi serves as the control hub of the smart home environment, detecting the presence of light and sound through its connected sensors. This helps to detect the presence of a resident within a space. If no human presence is detected in the space, it sends a notification to the resident's phone instructing them to turn off any detected devices that were left unattended. The Alexa Echo Dot receives requests initiated by the user from our custom-built mobile app, allowing them to turn off the smart devices when they receive a notification on their phone from the Raspberry Pi.
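To make this control flow concrete, the following is a minimal sketch of the hub's monitoring loop. It is illustrative only: the sensor reads and the push-notification call are stand-in stubs (hypothetical names, not our implementation), and the 30-minute absence threshold is an assumed value.

```python
import time

CHECK_INTERVAL_S = 60       # how often the hub re-evaluates the scene
ABSENCE_THRESHOLD_S = 1800  # notify after 30 minutes without detected presence

# The four functions below are stand-in stubs for the PIR sensor, the audio
# classifier, the photoresistor, and the mobile push notification.
def pir_motion_detected() -> bool:
    return False            # stub: replace with a GPIO read of the PIR sensor

def classify_active_appliance():
    return "TV"             # stub: replace with the audio classifier of Sect. 4

def light_detected() -> bool:
    return True             # stub: replace with the photoresistor reading

def send_push_notification(message: str) -> None:
    print("NOTIFY:", message)  # stub: replace with the mobile-app push call

def monitor_once(last_presence: float) -> float:
    """One pass of the control loop; returns the updated last-presence timestamp."""
    now = time.time()
    if pir_motion_detected():
        last_presence = now
    appliance = classify_active_appliance()
    if (now - last_presence) > ABSENCE_THRESHOLD_S and (appliance or light_detected()):
        send_push_notification(
            f"Unattended device detected: {appliance or 'lights'}. "
            "Open the app to turn it off."
        )
    return last_presence

if __name__ == "__main__":
    last_seen = time.time()
    while True:
        last_seen = monitor_once(last_seen)
        time.sleep(CHECK_INTERVAL_S)
```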

3.1 Audio Source Separation

Blind source separation (BSS) refers to recovering a set of source signals from a mixed signal, or a set of mixed signals, with very little information about the corresponding source signals [16]. Our experiment requires a separator, and we propose to use BSS to split a mixture of sounds into its respective audio waveforms. We considered four main smart appliances within the smart home set-up.

IVA Variations Used.

We employ four different variations of Independent Vector Analysis (IVA) in our study. Two of these variations are gradient-based methods: NaturalGradLaplaceIVA and GradLaplaceIVA. The other two variations use the auxiliary-function approach: Auxiliary Laplacian IVA (AuxLaplaceIVA) and Auxiliary Gaussian IVA (AuxGaussIVA). In the auxiliary approach, AuxLaplaceIVA uses the Laplacian function as its source model, while AuxGaussIVA uses the Gaussian function. Both gradient-based methods use the Laplacian function as their source model; however, they differ in how they minimize the KL loss through gradient descent. Specifically, GradLaplaceIVA behaves like gradient descent with momentum, as it incorporates an additional momentum hyper-parameter along with the learning rate.

Why Dual Channel?

As part of our experiment, we assumed the presence of two functioning appliances at a given time. This assumption is based on the observation that the AC, fan, and heater are typically used independently, without overlapping. When the room temperature is warm, residents are more likely to turn on the AC at a higher setting rather than running both the fan and the AC at the same time. Similarly, a resident would likely turn on the heater when the room temperature is cold. The fan is considered in scenarios with moderate temperatures. Additionally, we assumed the TV would always be turned on in all three scenarios. We initially tested the separator using three channels; however, since this significantly increases processing time and computational complexity, we opted for a dual-channel method. When only one device is functioning in the dual-channel scenario, we modified the system to separate it as a blank output. For instance, an output prediction may include ‘TV’ and ‘blank’ or ‘fan’ and ‘blank’.

Why Noise?

An important assumption when using IVA is that the input sources should not be correlated with each other. However, in the case of a mixture such as the TV and fan, the IVA model encounters an error due to their high correlation: despite having different frequencies and amplitudes, the rate of change of the signal components is equivalent for both sources. To address this, we introduce noise to one of the source samples. The IVA function has several important parameters. These include basic machine learning parameters such as the type of loss, the number of epochs, and the number of iterations. Additionally, two significant parameters are algorithm spatial and apply projection back. The algorithm spatial parameter can take the values ‘IP’ or ‘ISS’, which determine how the algorithm iterates its demixing filter. The second parameter, apply projection back, is a scaling function applied to the source and mixture to prevent large values and reduce complexity. It can be set to either True or False, depending on whether we want to apply it.
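As a small illustration of this decorrelation step, the NumPy sketch below adds noise scaled by a coefficient n to one of two highly correlated stand-in signals before they are mixed. The synthetic "TV" and "fan" signals, the value of n, and the mixing matrix are all assumptions made for illustration; the study itself uses recorded appliance audio and the IVA parameters described above.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16_000
t = np.arange(fs * 5) / fs                      # 5 s of audio at 16 kHz

tv = 0.6 * np.sin(2 * np.pi * 220 * t)          # stand-in "TV" source
fan = 0.4 * np.sin(2 * np.pi * 220 * t + 0.1)   # stand-in "fan": nearly the same shape

n = 0.3                                          # noise coefficient (illustrative value)
fan_noisy = fan + n * rng.standard_normal(fan.shape)

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

print(f"correlation before noise: {corr(tv, fan):.3f}")
print(f"correlation after  noise: {corr(tv, fan_noisy):.3f}")

# The decorrelated sources are then mixed into two channels and handed to the
# IVA separator (with algorithm spatial set to 'IP' or 'ISS' and projection
# back enabled or disabled, as discussed above).
mixing = np.array([[1.0, 0.8],
                   [0.7, 1.0]])
mixture = mixing @ np.vstack([tv, fan_noisy])    # shape: (2, n_samples)
```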

Hyper-parameters.

The essential hyper-parameter for the model is the noise coefficient (n), which refers to the degree of noise added to the input source. This prevents the sources from being highly correlated with each other. The other hyper-parameters include the learning rate and the threshold value, which are common hyper-parameters in machine learning models. The learning rate is used for optimization, while the threshold value stops the iterations once the loss reaches a minimum.

4 Experiment Design

4.1 Audio Classification

In most scenarios, audio data for classification is pre-processed by conversion to mel spectrograms or by manipulating the data through the short-time Fourier transform (STFT) [17]. The short-time Fourier transform is a Fourier-based transform that determines the sinusoidal frequency and phase content of local sections of a signal as they change over time [18]. Once we obtain the spectrograms, the research problem is reduced to a computer vision problem; hence, we rely on powerful convolutional neural network (CNN) architectures for feature extraction.

STFT.

The audio recordings were downsampled to 16 kHz and then transformed from waveforms into time-frequency-domain signals by computing the STFT. The STFT splits the signal into windows of time and runs a Fourier transform on each window, preserving some time information and returning a 2D tensor on which standard convolutions can be run.

Spectrogram.

In the following step, the STFT (nperseg = 255, noverlap = 124, nfft = 256) was applied to the waveform signals to obtain spectrogram images of size 129 x 124 (frequency x time), which were then fed into a simple convolutional neural network to train the model.
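A minimal pre-processing sketch consistent with the parameters reported above is shown below, using SciPy's STFT. The file path is a hypothetical placeholder, and the stereo-to-mono mixdown is an added assumption.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import resample, stft

TARGET_SR = 16_000  # target sample rate (Hz)

def wav_to_spectrogram(path: str) -> np.ndarray:
    sr, audio = wavfile.read(path)
    if audio.ndim > 1:                 # mix multi-channel recordings down to mono
        audio = audio.mean(axis=1)
    audio = audio.astype(np.float32)
    if sr != TARGET_SR:                # resample to 16 kHz
        audio = resample(audio, int(len(audio) * TARGET_SR / sr))
    _, _, Z = stft(audio, fs=TARGET_SR, nperseg=255, noverlap=124, nfft=256)
    return np.abs(Z)                   # magnitude spectrogram (frequency x time)

if __name__ == "__main__":
    spec = wav_to_spectrogram("dashhome/fan_sample_001.wav")  # placeholder path
    print(spec.shape)  # 129 frequency bins x a clip-length-dependent number of frames
```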

CNN Model.

The CNN model has a convolution layer to downsample the input, enabling the model to train faster, and a normalization layer to normalize each pixel in the image based on its mean and standard deviation. The CNN model consists of four weight layers used sequentially as follows: conv2–32, conv2–64, maxpool2, FC-128, and FC-4.

Important Parameters.

In our experiment, the model was trained for 10 epochs with a batch size of 64 on 712 audio samples with a 4-class label: “TV”, “AC” and “Fan”. The loss function used is sparse categorical cross-entropy, since this is a classification problem, and the optimizer used is Adam. We use early stopping to end training once the minimum loss is reached. We also set the train-test split parameter to 0.6–0.2–0.2, with 60% of the data used for training, 20% for testing, and 20% for validation.
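The sketch below illustrates a Keras model and training setup consistent with the architecture and parameters described above (conv2–32, conv2–64, max pooling, FC-128, FC-4; 10 epochs, batch size 64, sparse categorical cross-entropy, Adam, early stopping, 60/20/20 split). Kernel sizes, the resized input dimensions, the early-stopping patience, and the placeholder data arrays are assumptions, not the exact configuration used in the study.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Placeholder arrays standing in for the 712 labeled spectrograms (129 x 124 x 1),
# split 60/20/20 into train, validation, and test sets.
X = np.random.rand(712, 129, 124, 1).astype("float32")
y = np.random.randint(0, 4, size=712)
n_train, n_val = int(0.6 * len(X)), int(0.2 * len(X))
X_tr, X_val, X_te = np.split(X, [n_train, n_train + n_val])
y_tr, y_val, y_te = np.split(y, [n_train, n_train + n_val])

norm = layers.Normalization()   # normalizes pixels using mean/std statistics
norm.adapt(X_tr)                # fit the statistics on the training set

model = models.Sequential([
    layers.Input(shape=(129, 124, 1)),
    layers.Resizing(32, 32),    # downsample the input so the model trains faster
    norm,
    layers.Conv2D(32, 3, activation="relu"),   # conv2-32
    layers.Conv2D(64, 3, activation="relu"),   # conv2-64
    layers.MaxPooling2D(),                     # maxpool2
    layers.Flatten(),
    layers.Dense(128, activation="relu"),      # FC-128
    layers.Dense(4),                           # FC-4: one logit per class
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(
    X_tr, y_tr,
    validation_data=(X_val, y_val),
    epochs=10,
    batch_size=64,
    callbacks=[tf.keras.callbacks.EarlyStopping(patience=2, restore_best_weights=True)],
)
print(model.evaluate(X_te, y_te, verbose=0))   # [test loss, test accuracy]
```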

4.2 DashHome Audio Dataset Collection

The DashHome dataset consists of 60 s audio recordings of smart device sounds, including the TV, AC, fan, and heater. There are 712 samples in .wav file format. The recordings were collected in a quiet room with a recording device at a sample rate of 44 kHz and later resampled to 16 kHz. The AC and fan sounds were captured at different temperatures and fan speeds, the TV sound was recorded with music or a show playing at volume 40, and the heater sound was captured in high- and low-temperature modes. We consider the heater as an input to extract features from; however, it is not an output label and is therefore not considered in our output computations. The separator was trained with dual audio mixtures of fan-AC, fan-TV, AC-TV, and TV-heater. Each mixture was 10 min long, recorded at the highest sensitivity level of the digital recorder, with the TV playing at a volume of 40, the fan at speed four, and the AC on high. It should be noted that although a couple of mixtures were trained on the heater, it was not considered in the calculation of the ground-truth device probability, since the heater sound is very difficult to separate and there are few samples that give us heater outputs.

4.3 Smart Devices Setup

The smart devices used in this study were Alexa-enabled devices connected to the Amazon Echo. They include a Sony X90J 55-inch TV, a Vornado 660 AE fan, a Rollicool 14000 BTU smart air conditioner, and Philips Hue Smart A19 lights. In the Alexa app, the IFTTT trigger skill (a virtual button that can be used to perform a request when the button is pressed) was enabled and linked to the Alexa account. Then, an Alexa routine was created so that when the trigger button is pressed, the devices are instructed to turn off. Finally, a webhook URL for triggering the button from a web request was generated using an applet on ifttt.com with the logic “If a web request is received, then trigger the IFTTT button”; the applet then issues the corresponding command to the Alexa device to turn off the smart devices automatically. A Raspberry Pi 4 with 8 GB of RAM and 64 GB of storage, running the Raspbian OS, was used in this study. The sensors connected to the Raspberry Pi include a microphone for collecting sounds from the environment, a photoresistor for detecting light from the light bulbs, and a motion sensor for tracking the user's location. The function of the Raspberry Pi is to continuously collect the environmental sounds and detect whether a sound is classified as one of the smart devices.
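The snippet below sketches how the Raspberry Pi can issue the web request that "presses" the IFTTT trigger button once an unattended appliance is confirmed. The event name and webhook key are placeholders for the values generated when the applet is created on ifttt.com.

```python
import requests

# Placeholders for the event name and key generated when the applet is created.
IFTTT_EVENT = "turn_off_devices"
IFTTT_KEY = "YOUR_WEBHOOK_KEY"
WEBHOOK_URL = f"https://maker.ifttt.com/trigger/{IFTTT_EVENT}/with/key/{IFTTT_KEY}"

def trigger_turn_off(detected_appliance: str) -> bool:
    """Fire the webhook; the linked Alexa routine then turns the smart devices off."""
    resp = requests.post(WEBHOOK_URL, json={"value1": detected_appliance}, timeout=10)
    return resp.ok

if __name__ == "__main__":
    if trigger_turn_off("TV"):
        print("Alexa routine triggered: smart devices are being turned off.")
```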

5 Results and Interpretation

5.1 Audio Classification

We use a custom-built CNN model for classification and achieved an accuracy of 84%. Further, we considered a threshold value of 0.30: if a predicted probability is greater than this BSS threshold, we consider both outputs for the mixture.

5.2 Audio Separation and Classification

Initial Studies. We considered the following separator models, each trained on the respective source appliances. ‘Ag’ represents Auxiliary Gaussian, while ‘Al’ represents Auxiliary Laplacian. We considered only the auxiliary models, as opposed to the gradient-based ones, since they perform considerably better. ‘ip’ and ‘iss’ denote the demixing-filter iteration variants, and the number at the end denotes how many iterations the model is trained for. We considered an effective mix of the different iterations, models, and methodologies to derive our output.

  1. Ag_ip_TVAC_200
  2. Ag_ip_TVFAN_200
  3. Ag_ip_TVHEATER_200
  4. Ag_iss_FANAC_300
  5. Ag_iss_FANAC_200
  6. Ag_iss_FANAC_100
  7. Ag_iss_TVAC_300
  8. Al_iss_FANAC_200
  9. Al_ip_TVAC_200
  10. Ag_ip_TVFAN_200

Table 1. The probability of each ground truth smart device calculated from the BSSP

About Table 1.

  • The first column represents the input source.

  • The second column represents the output given by the Classifier alone (experiment 1). The Classifier gives a probabilistic output; sources with a probability above the threshold value of 0.30 are considered.

  • The third column is the output obtained when we pass the input through 11 different separation models coupled together and then classify each output received from the separator (experiment 2).

  • The fourth column gives the probability calculated by a probabilistic formula that we developed in this paper, the BSSP (blind source separation probabilistic) formula.

The BSSP Formula.

$$ P_i = \frac{x_i }{N} $$
(1)

where,

  • i indexes a particular input source, ranging over the total number of input sources involved (e.g., Fan, TV, and AC).

  • N is the total number of output samples received from the coupled separator models; each of the 11 dual-channel models produces two outputs (11 * 2 = 22 total outputs).

  • x (written \(x_i\) in Eq. (1)) is the number of times predicted output source i occurs among these outputs; a small worked example follows below.
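As a small illustration of Eq. (1), the snippet below computes BSSP probabilities from a set of 22 predicted labels. The label counts are invented for illustration; in the experiment they come from classifying the outputs of the coupled separator models.

```python
from collections import Counter

# Invented labels standing in for the classified outputs of the 11 coupled
# dual-channel separator models (11 * 2 = 22 outputs in total).
predicted_outputs = ["TV"] * 12 + ["Fan"] * 6 + ["AC"] * 2 + ["blank"] * 2
N = len(predicted_outputs)                   # total number of separator outputs
counts = Counter(predicted_outputs)

sources = ["TV", "AC", "Fan"]
bssp = {s: counts[s] / N for s in sources}   # P_i = x_i / N for each input source
print(bssp)   # e.g. {'TV': 0.545..., 'AC': 0.090..., 'Fan': 0.272...}
```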

Model Comparisons. Based on Table 1, we can conclude and compare the models as follows:

  • Ag_iss_TVAC is the only model that was able to detect the AC as AC; hence, it performed better in all scenarios. Training it on the AC dataset makes the difference that yields the AC as an output instead of the Fan.

  • Models trained on TV and fan will never give an AC output, irrespective of the iterations and model.

  • Ag_iss_FANAC_300 and Ag_iss_FANAC_200 give the same output; therefore, it can be inferred that the output of an auxiliary-based model is independent of the number of iterations.

  • Compared to the Laplacian model, the Gaussian model produces a more refined output and a higher probability in general.

Fig. 2. Loss curves for the separator model.

Understanding the Loss Curves. The following points explain the loss curves represented in Fig. 2:

  • (a) and (c) are gradient-based approaches, and we can see in the output plots that the loss decreases with each iteration.

  • (b) and (d) are auxiliary approaches, which show a sharp decrease in loss; hence, their output is independent of the number of iterations.

  • Comparing (a) and (c), we see that the natural gradient Laplace (ngl) algorithm converges to a lower loss value than the basic gradient algorithm (gl). Also, ngl reaches the minimum loss at a lower iteration count than gl.

  • On the other hand, comparing (b) and (d), the Gaussian auxiliary (ag) method minimizes the loss to a lower level than the Laplacian, making it the best model of all four.

Table 2. Output of each machine learning model.

5.3 Energy Savings Outcome

Suppose an older adult resident turns on the TV, AC, and fan in the living room at 7:00 PM and then goes to bed at 8:00 PM and forgets to turn off the devices in the living room. When the resident wakes up the next day, the energy consumed during that period would have incurred a cost with zero service value because the resident was not present in the living room area where the ADLs were left unattended.

However, the Raspberry Pi component of our proposed framework can periodically detect human presence, listen to the sounds emitted by the appliances, and infer the source as the TV, fan, AC, or heater. This ADL context information is used to determine whether a notification needs to be sent to the resident’s mobile phone, informing them of appliances that were left unattended and should be turned off to save energy and reduce cost. To provide a more specific estimate of energy savings, we can calculate the energy consumed every 30 min or every hour based on the power consumption shown in Table 2. The assumptions made in these calculations are as follows (Table 3):

Table 3. Energy savings from smart devices in unattended ADLs.
  • We assume the resident turns off the appliance as soon as they receive the notifications.

  • It takes the model 30 s to a minute to determine which appliance is operating. We therefore neglect this short span and consider periods of half an hour and an hour to calculate the energy that would have been consumed if the resident had left the appliances on for that period.

  • The power rating of the Raspberry Pi is four watts, which is taken into account in our experiment when calculating the energy spent to run it. This leads to our worst-case scenario: if the resident has not used any appliance for a day but has left the Raspberry Pi running, energy is wasted.

  • We calculate the energy saved in kWh, which is given by the formula:

    $$ E = P_a \cdot T_s - P_m \cdot T $$
    (2)

where E is the energy saved, \(P_a\) is the power rating of the appliance, \(T_s\) is the time of study (the period for which the appliance would otherwise have been left on), \(P_m\) is the power rating of the energy management module (Raspberry Pi), and \(T\) is the time for which the module is kept running. A worked numeric example under these assumptions is sketched below.
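```python
# Illustrative worked example of Eq. (2). The 120 W appliance rating is an
# assumed value and does not reproduce Table 2; the 4 W Raspberry Pi rating
# is taken from the assumptions above.
P_APPLIANCE_W = 120.0   # e.g., a TV rated at 120 W (assumed value)
P_MODULE_W = 4.0        # Raspberry Pi power rating (W), per the assumptions above
T_STUDY_H = 1.0         # the appliance would have stayed on, unattended, for 1 hour
T_MODULE_H = 24.0       # the Pi runs continuously for the whole day

def energy_saved_kwh(p_a_w: float, t_s_h: float, p_m_w: float, t_h: float) -> float:
    """E = P_a * T_s - P_m * T, converted from watt-hours to kWh."""
    return (p_a_w * t_s_h - p_m_w * t_h) / 1000.0

print(f"{energy_saved_kwh(P_APPLIANCE_W, T_STUDY_H, P_MODULE_W, T_MODULE_H):.3f} kWh saved")
# Worst case from the assumptions: no appliance used all day, but the Pi keeps running.
print(f"{energy_saved_kwh(0.0, 0.0, P_MODULE_W, T_MODULE_H):.3f} kWh (net loss)")
```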

6 Conclusion and Future Work

Many aging adults belong to the low-income segment of the population and are susceptible to age-related impaired awareness. Impaired awareness limits aging adults’ ability to successfully supervise ADLs and results in unanticipated economic consequences. For example, when home appliances are left operating and unattended, home energy consumption increases and, consequently, the cost of home energy services rises. It is therefore imperative to develop an effective solution that enables aging adults with impaired awareness to mitigate unintended energy use from unattended ADLs so that they can continue to afford to age in place. In this study, we proposed a situation-aware framework for mitigating unintended energy consumption in unattended ADLs. The framework includes low-cost sensors and IoT devices such as a Raspberry Pi and sound, light, and motion sensors. In addition, the framework incorporates machine learning models that use the rich context data generated by smart home appliances to enable users to understand their ADL environment and perceive smart home appliances and their operating states, thereby mitigating unintended energy consumption when ADLs are unattended. The results obtained from the experiments are promising. For future work, we plan to test the efficacy of the proposed framework in senior retirement homes using a longitudinal study.