1 Introduction

Integrated networks are communications networks that use devices from multiple manufacturers, protocols, and service operators. Currently, using integrated networks, it is possible to interact with sensors and everyday devices, deployed in what is known as the Internet of Things. IoT is a concept that is being applied in multiple areas, among which we can highlight Smart Home, Smart Cities, Smart Agriculture, Smart Grid, Smart Health, and Wearables.

One of the areas in which the application of the IoT is increasing day by day is the Smart Home. According to [1], the smart home sector had a global value of $47 billion in 2015, with forecast growth of 14% per year from 2016 to 2022. According to data from the observatory [2], purchases in the Italian smart home market were estimated at 380 million euros in 2018, an increase of 52% compared to 2017. The same source reports figures for other European countries: Germany (€1.8 billion, a 39% increase), the United Kingdom (€1.7 billion, a 39% increase), France (€800 million, a 47% increase) and Spain (€300 million, a 59% increase).

We can define Smart Home as a system that allows the easy monitoring and control of a home or office. It is used by both companies and individuals. In order to use it, we must have access, through an integrated network, to the devices located in the home or office. Its implementation can be very broad and span very different devices. It can range from simple camera monitoring for security control to something as complex as smart refrigerators that place orders automatically when they detect a lack of food. Its most common applications are the control of lighting, gas, water, air conditioning, doors and windows, security cameras, weather sensors, irrigation, the home theater, etc. Some of these applications are multimedia applications. In multimedia services, the Quality of Experience (QoE) is the most important metric, since it provides a measure of how good a service is from the user’s point of view [3].

If we observe carefully, we will realize that electrical energy is necessary for most of the activities that we carry out in our homes throughout the day. We must use energy responsibly, due both to the depletion of certain energy sources and to its impact on the environment. At home, we regularly have a large number of appliances, such as a refrigerator, freezer, dishwasher, washing machine, dryer, oven, stove, etc., some of which we use several times a day. These appliances can often account for half of the energy consumption in the home. It is also important to control the consumption of lighting and of the appliances used for air conditioning. In addition, the consumption of electronic devices is usually higher than it should be. Televisions, multimedia players, etc. generally have a stand-by mode in which they remain connected to the power supply, so they consume electricity permanently.

Artificial Intelligence (AI) can be used to improve and optimize energy consumption in our homes. One of the main keys to using AI is its learning phase, and the key to optimizing our energy consumption at home is data. As a smart home is an automated environment, we can monitor and capture the patterns of daily activities, which are provided by sensors through information technologies. The more information is available about our habits, the better the energy savings can be tuned. Using learning, automatic or supervised, we can make the control devices capable of self-programming. In addition, we can apply Deep Learning, which allows us to use logical structures very similar to the organization of the nervous system of a mammal. By using Deep Learning on the control devices that we have at home, we will be able to optimize energy consumption in our smart home.

In this paper, we present an intelligent system for Smart Home environments that manages the set of nodes and services of the environment, enabling or disabling them based on predictions of the user’s service consumption. This system aims to improve the results of a deep learning classification system while the algorithm is still learning from the user. Deep Learning works better with a large amount of data, and our proposal, based on reinforcement learning (RL), aims to give the system greater flexibility.

The rest of this document is structured as follows. Section 2 presents some of the most relevant works related to our study. The network and system environment and architecture are described in Sect. 3. Section 4 briefly defines the data preprocessing and the classification model, based on deep learning. Section 5 describes the RL algorithm and all its components. Section 6 presents the evaluations of the system. Finally, Sect. 7 concludes the paper and presents some future works.

2 Related work

There is a large number of articles in which the authors have studied smart homes.

Some authors like [4,5,6,7,8] present studies on IoT and Smart Home. Vasicek et al. [4] present the IoT concept in a Smart Home. They use IoT devices to create a smart home, without the need to rebuild the home. Jie et al. [5], present a highly scalable architecture, applying IoT technologies, where they can integrate many applications using a uniform interface, to develop a smart home. Khan et al. [6] present the design of an IoT Smart Home System (IoTSHS), which was designed, programmed, manufactured and tested with excellent results. The authors stress that it can benefit all parts of society by providing advanced remote control for the smart home. Malche and Maheshwary [7] analyze the functions of a smart home and its applications and introduce an architecture that they call FLIP (Frugal Labs IoT Platform), to build smart homes enabled for IoT. Yang et al. [8] present a study examining service characteristics of smart homes in 216 samples in Korea. They also do a study of personal characteristics in the behavior of users.

There are authors like [9,10,11,12,13] who present reviews about IoT and Smart homes. Risteska Stojkoska and Trivodaliev [9] propose a holistic framework, which incorporates different components from IoT architectures/frameworks proposed in the literature. In addition, they identify a management model for the proposed framework, identifying the tasks to be carried out. Alaa et al. [10] present a review study, where they formalize a taxonomy focused on three areas, (1) smart homes, (2) apps, and (3) IoT, in three major databases, namely, Web of Science, ScienceDirect, and IEEE Xplore. Kuzlu et al. [11] present a review where they compare wireless and wired communication technologies in local area networks applied to Smart homes, taking into account the standards, protocols, data rate, coverage, and adaptation rate. Kamel and Memari [12] present a review whose main objective is to classify the different types of Smart homes into three main groups, from the energy point of view. The groups are homes with energy monitoring systems, systems with control capabilities, and systems with advanced data processing capabilities. Finally, Jia Chen et al. review, in [13], the use of IoT in-home systems and applications for health monitoring. They reviewed the key factors that caused the growth of IoT monitoring at home. Then, they presented the latest advances in the architecture of these systems. Lastly, they discussed future outlooks and personal recommendations.

Other authors [14,15,16,17,18] present studies from the point of view of access security to IoT devices in smart homes. Apthorpe et al. [14] present the study of four Smart home IoT devices. They find that network traffic rates can show user interactions, even when the traffic is encrypted. For this reason, they indicate the need for technological solutions to protect the privacy of users. Augusto-Gonzalez et al. [15] present the GHOST framework (Safe-Guarding Home IoT Environments with Personalized Real-time Risk Control), which aims to provide cyber security to residents of smart homes. They do it through a new reference architecture, for smart home security and its users. Lin and Bergmann [16] establish key requirements to give reliability in smart homes in the future. They propose a gateway architecture, to have a high availability of the system and devices with limited resources. Meng et al. [17] present the most popular architecture for smart home platforms, detailing the functions of each of its components. They also comment on the main security and privacy challenges of the platforms and review the state of the art of the proposed countermeasures. Ammi et al. [18] propose a novel Blockchain-based solution for secure smart home systems, using a combined hyperledger fabric and hyperledger composer. Another important aspect of the proposed solution is the mapping of the attributes of a smart home to those from the hyperledger composer. This mapping allows for a customized, designed-for-purpose solution that can meet the security requirements for IoT-based smart homes.

Atat et al. [19] present the cyber-physical systems (CPS) taxonomy by providing a broad overview of data collection, storage, access, processing, and analysis. These systems are foreseen to revolutionize our world via creating new services and applications in a variety of sectors, such as environmental monitoring, mobile-health systems, intelligent transportation systems, and so on. This is the first panoramic survey on big data for CPS, where their objective is to provide a panoramic summary of different CPS aspects. Also, they provide an overview of the different security solutions proposed for CPS big data storage, access, and analytics. In addition, they discuss big data meeting green challenges in the contexts of CPS.

There are authors [20,21,22,23] who study energy management in the Smart home environment. Collotta and Pau [20] propose an energy management system for smart homes, using Bluetooth Low Energy (BLE) [21] for communications, together with a home energy management (HEM) scheme. Their results show the efficiency of the proposed system, since it reduces the peak load demand and the charges for electricity consumption, thus increasing the comfort of its users. Al-Ali et al. [22] present an Energy Management System (EMS) for smart homes. They use Business Intelligence (BI) and Big Data analytics software to manage energy consumption, satisfying consumer demand. Xia et al. [23] propose an edge-based energy management framework, which provides a low cost of electricity and saves on infrastructure construction. They have implemented a prototype, the results of which show a reduction in the cost of electricity of 82.3%, compared to similar cases. Celik et al. [24] present a review of the antecedents in modeling of residential load, demand-side management (DSM) and demand response (DR), in the settings of a home and of a neighborhood area. The objective is to classify the structure and coordination techniques of energy management from previous research. Wu et al. [25] explore the relations between the trend of the big data era and that of the new generation green revolution, through a comprehensive and panoramic literature survey of big data technologies toward various green objectives and a discussion of relevant challenges and future directions.

Amjad et al. [26] propose a cognitive edge-computing-based framework solution that integrates the advances in edge computing resource requirement schemes with the resource allocation schemes found in the literature for enterprise cloud, in order to attain a universal resource allocation framework for IoT. Other authors, such as Jararweh et al. [27], present a novel experimental framework for IoT-based environmental monitoring applications, using concepts from Data Fusion (DF) and software defined systems (SDS). It is built on top of the software defined networking platform, where the core components (the host, the switch and the controller) are expanded to support other software defined systems components (such as software defined storage and security) and to enable the application of different DF techniques in IoT environments.

Studies [28,29,30,31,32] related to the application of AI in the field of Smart Home can also be cited. Sodhro et al. [28] state that the convergence of IoT and AI promotes energy-efficient communication in smart homes. Their main objective is to optimize the Quality of Service (QoS) of video transmission, which is carried out using wireless micro medical devices (WMMD), in smart healthcare homes. Guo et al. [29] review the literature and existing products to define the functions and roles of AI in Smart homes. They point out the existence of a delay between the literature and the products. Sepasgozar et al. [30] reviewed the applications of the IoT in homes, to make them intelligent, automated and digitized in many of their aspects. They have studied the literature on the use of IoT, AI and geographic information systems (GIS) in smart homes. They state that there is a considerable gap in the integration of AI and IoT and the use of geospatial data in the field of Smart Home. Song et al. [31] present frameworks of centralized and distributed AI-enabled IoT networks. Key technical challenges are analyzed for different network architectures. Deep reinforcement learning (DRL)-based strategies are introduced, and neural network-based approaches are utilized to efficiently realize the DRL strategies for system procedures. Different types of neural networks that could be used in IoT networks to conduct DRL are also discussed. Lloret et al. [32] proposed an intelligent system for detecting problems experienced by elderly people and assisting them. They proposed a communication architecture and designed a software application.

Several works have been carried out about Smart Home from different points of view. However, the role that an automated intelligent management system can play in Smart Home environments needs to be discussed. Consequently, we propose a new role for AI in this scenario. Our proposed system works along with the user to reduce the intrusion that an automatized service management can introduce to the user’s experience. Moreover, the system is oriented to especially reduce the impact of bad predictions on multimedia services, trying to guarantee a good QoE from the user’s point of view. In order to overcome the possible difficulties that a deep learning classifying method may provide to the system, we introduce the use of an RL-adapted method into the Smart Home. RL has been used in other works like in [33] due to its performance.

3 Proposed architecture

In this section, the architecture of the proposal is detailed. First, the network architecture is explained. Then, the intelligent system is detailed, explaining the function of each module that composes it.

3.1 Network architecture

The architecture of this proposal has centralized management based on AI [34, 35]. This architecture is divided into five logical layers to organize and keep separate the functions of each object connected to the network. The management layer (layer 4) centralizes the information and stores the data of the features of the connected objects (dataset) and the parameters and statistics of the network. The AI uses this information to create workgroups (grouping), perform routing, and carry out other functions that keep the network operational. The multiprotocol IoT Gateway is the device in this layer in charge of this work; it is able to handle different interconnection technologies, store information, host an AI, process information and control internet access. In the internet layer (layer 5), the AI within the cloud can choose the IoT Platform according to the type of parameter and the capacity to process large volumes of data. The remaining layers are located below: Artificial Intelligence Assistants (AIA) in layer 3, smart things (th) in layer 2, and smart sensors and actuators in layer 1. In this way, the functionalities are separated and kept as a stand-alone system. Fig. 1 shows the hierarchical distribution of the architecture with the objects represented as nodes.

Fig. 1 IoT Smart architecture

This architecture is used to design Smart IoT-Networks, which contain interconnected objects with integrated AI (Smart Things). Examples of these networks are Smart Home, Smart Office, Smart City and Smart Factory, among others. For this case study, the scenario is based on the connected objects in a Smart Home. The IoT Gateway's AI classifies the objects connected to the network into workgroups and roles by layer [34]. Then, when an object requires resources, the IoT Gateway routes the request, selecting the best node to provide them. The AI creates these groups, based on the objects' functions, resources, and capacities, to provide an automatic service to a user. Consider the following service within the Smart Home: when a visitor arrives at the house, the AI selects the objects with suitable features to provide the service. The system checks whether there are users inside the house and automatically attends to the visitor at the door. The resources needed for this service are multimedia resources, such as video and sound, to show the image of the visitor on the objects that have these resources and are close to the user inside the house. If there are no users inside the house, the system sends the image and sound to objects or mobile devices in any location. One function that would be activated is facial recognition and identity verification, whose data would be sent to the cloud for processing. The first object to interact in the service would be the smart main door at the front of the house. This would be the node requesting the resource, and it would execute the recognition and verification function. The IoT Gateway's AI would be in charge of distributing and transmitting the video and voice to the target objects that have this resource, are close to a user, and are able to play it. Fig. 2 shows a case study of a workgroup that attends the door's visit service in a Smart Home through this architecture.
This workgroup is organized according to the layers of the architecture in Fig. 1. When the smart door attends a visit, it activates a request to send video and image processing data. The image processing data (facial recognition) is sent through the AI interfaces until the visitor's identification is obtained. The video is routed through the IoT Gateway to each connected object with multimedia playback functions that is close to the user. The smart home thus announces a visit and identifies the person at the main door. If the user is away from home, the IoT Gateway sends the information through the cloud to the objects with multimedia functions that are closest to the user (e.g., Smart Car, Smartphone, Tablet).
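
The selection of target objects in this case study can be illustrated with a minimal sketch. It is not the gateway's actual routing logic: the `SmartObject` type, the object names, and the idea of modelling "close to the user" as "in the same room" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SmartObject:
    name: str                 # hypothetical object identifier
    has_multimedia: bool      # can the object play video and sound?
    location: Optional[str]   # room inside the house; None for mobile objects reached through the cloud


def select_targets(objects: List[SmartObject], user_location: Optional[str]) -> List[str]:
    """Pick the objects that should show the visitor's video.

    Assumption: an object is 'close' to the user when it is in the same room.
    If the user is not at home (user_location is None), only mobile objects
    with multimedia functions are selected.
    """
    if user_location is not None:
        return [o.name for o in objects if o.has_multimedia and o.location == user_location]
    return [o.name for o in objects if o.has_multimedia and o.location is None]


objects = [
    SmartObject("smart_tv", True, "living_room"),
    SmartObject("thermostat", False, "living_room"),
    SmartObject("kitchen_display", True, "kitchen"),
    SmartObject("smartphone", True, None),
]
at_home = select_targets(objects, "living_room")
away = select_targets(objects, None)
```

In this sketch, a user in the living room receives the video on the smart TV only, while a user away from home receives it on the smartphone.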

Fig. 2 Case study with a single workgroup

In a Smart Home with this architecture, the IoT Gateway can provide multimedia services through the IoT nodes. Through requests given to the smart assistant placed in the IoT Gateway, either by voice commands or by a smartphone application, the user can play their favorite music on the speakers distributed around the house, watch movies on demand, or automatically record surveillance videos of the house.

3.2 Intelligent system architecture

Once the network architecture has been explained, the system architecture must be discussed. The aim of the system is to improve the QoE of the services in the Smart Home, focusing on multimedia services. To achieve that, the intelligent system must enable, or ask the user to enable, the services that the user is likely to want after the ones already consumed, while learning to interrupt the user’s activity as rarely as possible.

The intelligent system is composed of several modules, located in the IoT Gateway, that are interconnected to provide the desired functionality. The architecture of the system is depicted in Fig. 3.

Fig. 3 Intelligent system architecture

The first module in the system is the recording module. This module is not an intelligent system, but it is the necessary first step to get the data the system needs. Each time the user consumes a service, the IoT Gateway records a data array with the structure shown in Table 1. Due to space constraints in the table, the user id column has been omitted, and fields such as the time, the options and the duration are each shown as a single column. In the dataset, however, they are split. For instance, the time field is split into day, hour, minute and second, and the duration field into hour, minute and second. As regards the options, the entry is a set of columns, from option 1 to option 5, represented by binary values. The options have no meaning for the system; only the service interprets them. In Table 1, the first row means that the user enabled the heating service on the second day of the week at 5:39 PM for 1 hour 42 min. Moreover, that day was a working day (as the type of day field shows) and the service is not a multimedia service. The options here show that the system was enabled in heat mode. The IoT Gateway manages equivalency tables to transform the meaning of these fields into their values. Consequently, the smart system works with integers, which makes the data easier to process. Table 2 shows an example of an equivalency table. The options used in this record can be non-exclusive.
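
As an illustration of how a record could be flattened into integers through equivalency tables, consider the following sketch. The field order, the table contents (`SERVICE_CODES`, `DAY_TYPE_CODES`) and the choice of option 1 as the heat mode flag are assumptions made for this example; they are not the values of Tables 1 and 2.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical equivalency tables: names and integer codes are illustrative only.
SERVICE_CODES: Dict[str, int] = {"heating": 0, "music": 1, "tv": 2}
DAY_TYPE_CODES: Dict[str, int] = {"working": 0, "holiday": 1}


@dataclass
class Record:
    user_id: int
    day: int              # day of the week, 1..7
    start: str            # start time as "HH:MM:SS", 24-hour clock
    duration: str         # duration as "HH:MM:SS"
    day_type: str
    service: str
    multimedia: bool
    options: List[int]    # five binary flags, opaque to the system


def encode(record: Record) -> List[int]:
    """Flatten a record into the integer array handled by the smart system."""
    if len(record.options) != 5 or any(o not in (0, 1) for o in record.options):
        raise ValueError("options must be five binary values")
    hour, minute, second = (int(p) for p in record.start.split(":"))
    d_hour, d_minute, d_second = (int(p) for p in record.duration.split(":"))
    return [
        record.user_id,
        record.day, hour, minute, second,
        d_hour, d_minute, d_second,
        DAY_TYPE_CODES[record.day_type],
        SERVICE_CODES[record.service],
        int(record.multimedia),
        *record.options,
    ]


# Row described in the text: heating, second day of the week, 5:39 PM,
# 1 hour 42 min, working day, not multimedia. Mapping "heat mode" to
# option 1 is an assumption.
row = encode(Record(1, 2, "17:39:00", "01:42:00", "working", "heating", False, [1, 0, 0, 0, 0]))
```

Because the options are kept as independent binary columns, several of them can be set at once, which matches the non-exclusive options mentioned above.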

Table 1 Record data example
Table 2 Equivalency table example

The second module of the system’s architecture is the data preprocessing module. This is a software module that computes the data to transform it into the input of the next modules. The data preprocessing process is detailed in Sect. 4.

After the data has been processed, the datasets extracted from the logs provided by the recording module are sent to the classifying module. Here we have to distinguish between the training phase and the prediction phase. In the training phase, the extracted dataset is used to train the classifier. Therefore, training, validation and test datasets are extracted. Once the classifying system has been trained, the classifying module is used to predict the next service to be consumed. In this prediction phase, the data sent by the data preprocessing module are the inputs for the classification, and the classifying module returns the predicted service.

The next module in the system is the RL module. The RL module receives information about the services from the data preprocessing module. This information is used to build the initial states and to calculate the required metrics. For instance, the RL module needs to know if a specific service is a multimedia service. When the classifying module predicts a service consumption, that prediction is an input for the RL module. The RL module chooses the best action, as explained in Sect. 5, and that action is the output of the RL module.

Finally, the IoT Gateway has an actuator module that performs the action chosen by the RL algorithm.

In the next section, the data preprocessing and the classifying modules are described.

4 Preprocessing and classifying algorithms

In this section, the data preprocessing and the classifying modules are explained. First, the data preprocessing process is described. Its process and algorithms are detailed. Then, the classifying model chosen is described.

4.1 Data preprocessing process

Once the logs are provided to the preprocessing module, this module starts extracting some characteristics from them. Firstly, the RL module uses some data characteristics that are outputs of the preprocessing module. For the RL module, the statistical probability of changing from one state to another and the mean of the timestamp when it does are important data. Consequently, the preprocessing module transforms the data extracting these statistics. In order to achieve this, the module manages Markov chains. The definition of these chains adapted to the problem is described in (1):

$$ P[X_{n + 1} = x_{n + 1} \mid X_{0} = x_{0},\ X_{1} = x_{1},\ \ldots,\ X_{n} = x_{n}] = P[X_{n + 1} = x_{n + 1} \mid X_{n} = x_{n}] $$
(1)

Where \(X_{n+1}\) is the next service consumed, \(X_{n}\) is the service consumed in iteration \(n\), and \(x_{n}\) is the service number \(n\) in the service set \(S\).

In this system, the data will be processed regardless of time. That is, as far as the preprocessing is concerned, time does not change the probability of the transition between services. This is expressed in (2):

$$ P[X_{n + 1} = j \mid X_{n} = i] = P[X_{1} = j \mid X_{0} = i] \quad \forall i,j \in S $$
(2)

The fact that the Markov chain does not consider time to set the probability does not mean that in the system the time is not considered an important input. However, the RL algorithm will use time in a different manner.

Once the records have been read, the preprocessing module turns them into matrices so that the RL algorithm can operate efficiently with them. The first data the RL will need is the probability of consuming a service. To set this probability, depending on the last service consumed, the preprocessing module builds a transition matrix. This matrix, given a consumed state \(i\) and a possible next consumed state \(j\), defines the probability \(p\) of demanding \(j\). The preprocessing module must satisfy the constraints defined in (3) and (4):

$$ p\left( {i,j} \right) \ge 0 \quad ~\forall i,j \in S $$
(3)
$$ \sum\limits_{{j \in S}} p \left( {i,j} \right) = 1\quad ~\forall i \in S $$
(4)

Where \( p\left( {i,j} \right) \) is the probability of consuming the service \( j \) after consuming the service \( i \).
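
A minimal sketch of how such a transition matrix could be estimated from an ordered sequence of consumed services is given below. The function name, the service labels and the uniform fallback for services with no observed outgoing transition are assumptions for illustration; the fallback is only there so that constraint (4) holds for every row.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence

Matrix = Dict[Hashable, Dict[Hashable, float]]


def transition_matrix(sequence: Sequence[Hashable], services: Sequence[Hashable]) -> Matrix:
    """Estimate p(i, j) as the share of transitions from i that go to j."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1

    matrix: Matrix = {}
    for i in services:
        total = sum(counts[i].values())
        if total == 0:
            # Assumption: with no observed transition from i, use a uniform row.
            matrix[i] = {j: 1.0 / len(services) for j in services}
        else:
            matrix[i] = {j: counts[i][j] / total for j in services}
    return matrix


services = ["heating", "tv", "music"]
sequence = ["heating", "tv", "music", "heating", "tv", "heating"]
matrix = transition_matrix(sequence, services)

# Constraints (3) and (4): non-negative entries and rows that sum to 1.
rows_are_valid = all(
    all(p >= 0 for p in row.values()) and abs(sum(row.values()) - 1.0) < 1e-9
    for row in matrix.values()
)
```

In this toy sequence, heating is always followed by tv, while tv is followed by music and by heating once each, so the tv row splits its probability evenly between the two.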

In this system, the probability of consuming a service is not enough. Another important statistic is the time between transitions. If the time is not considered, the system could ask for the right service hours before the user wants to enable it. Although the time will not be treated as a feature in the definitions of the RL algorithm, it is necessary information to implement some of the actions of the RL system (see Sect. 5). For these calculations, the preprocessing module uses the time and duration of the records for each service. To take advantage of the incremental processing of records, the means of time and duration are calculated with the recursive formula defined in (5).

$$ \mu_{n} = \frac{\left( n - 1 \right) \mu_{n - 1} + t_{n}}{n} $$
(5)

Where \(\mu_{n}\) is the mean of \(n\) records, \(n\) is the number of records and \(t_{n}\) is the time or duration of record number \(n\). For this equation, we need to define the base case as in (6):

$$ \mu_{1} = t_{1} $$
(6)

The variance is calculated in a similar way. The recursive formula described in (7) is used to avoid iterating through each past record when a service consumption ends.

$$ s_{n} ^{2} = ~s_{{n - 1}} ^{2} + ~\frac{{\left( {t_{n} - ~\mu _{{n - 1}} } \right)^{2} ~}}{n} - ~\frac{{s_{{n - 1}} ^{2} ~}}{{n - 1}} $$
(7)
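
The recursive mean and variance can be sketched as a single update step that is applied each time a record arrives. The function name and the sample durations are illustrative; setting the variance to 0 for the first record is an assumption, since the recursion for the variance is only defined from the second record onwards.

```python
import statistics
from typing import Tuple


def update_stats(n: int, mean_prev: float, var_prev: float, t_n: float) -> Tuple[float, float]:
    """Return (mean, variance) after record number n with value t_n.

    mean_prev and var_prev are the statistics of the first n - 1 records.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        # Base case: the first record is the mean; variance set to 0 by assumption.
        return t_n, 0.0
    mean = ((n - 1) * mean_prev + t_n) / n
    variance = var_prev + (t_n - mean_prev) ** 2 / n - var_prev / (n - 1)
    return mean, variance


# Hypothetical durations (in minutes) of four consumptions of the same service.
durations = [102.0, 95.0, 110.0, 88.0]
mean, variance = 0.0, 0.0
for n, t in enumerate(durations, start=1):
    mean, variance = update_stats(n, mean, variance, t)

# The incremental result agrees with the batch sample statistics.
batch_mean = statistics.mean(durations)
batch_variance = statistics.variance(durations)
```

Only the previous mean, the previous variance and the record count need to be stored per service, which is what makes the update independent of the number of past records.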

With these definitions, we can set out the algorithm of the data preprocessing module, which is shown as a flow diagram in Fig. 4. First, the data needed is initialized. Then, all the records are processed until there are no more records left. The Markov structure and the statistics are calculated for each day of the week, which is why the next condition is compared with 7. The pseudocode of the algorithm is described in Algorithm 1. This algorithm defines the main procedures of the data preprocessing module. Given a set of records from the IoT Gateway, the module processes the records to assign them to users and days (User_Records). Then, the Markov transition matrix (Markov) is calculated. This is done in an iterative way. The algorithm of the Markov data building is shown in Algorithm 2 and explained later. Then, the other statistics needed are extracted from the time and duration data. This subroutine is explained in Algorithm 3. Finally, the datasets for training and validating the classification module are extracted.

Fig. 4 Data preprocessing algorithm

As regards the Markov structure, it is calculated as shown in Fig. 5, which depicts the flow diagram of this algorithm. In order to calculate the transition matrix, we need to know how many times a certain service is consumed after another one, and we need to store this data in the Markov structure. Fig. 5 shows how this information is read from the record. The structure is indexed by the day, the first service and the second service, which is consumed after the first one. Moreover, for each first service, we need the total number of transitions, totalCases. If we find a record and the structure has not been created, we need to create it and set totalCases to 1, as shown in Fig. 5. If that is the first time the service is consumed, the following service consumed accounts for 100% of the transitions. Otherwise, we need to iterate through all the previous transitions from that service to calculate the probability for each second service. That is shown in the last loop in Fig. 5. Algorithm 2 describes the same process in more detail.
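
The day-indexed structure and its incremental update can be sketched as follows. This is not a transcription of Algorithm 2: the class and method names are hypothetical, and the sketch stores raw counts and derives the probabilities on request, whereas the text describes recalculating the stored probabilities when a record is read. Both approaches yield the same values.

```python
from collections import defaultdict
from typing import Dict


class MarkovStructure:
    """Transition counts indexed by day, first service and second service."""

    def __init__(self) -> None:
        # counts[day][first][second]: times `second` followed `first` on `day`
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        # total_cases[day][first]: total transitions leaving `first` on `day`
        self.total_cases = defaultdict(lambda: defaultdict(int))

    def add_transition(self, day: int, first: int, second: int) -> None:
        """Register that `second` was consumed after `first` on `day`."""
        self.counts[day][first][second] += 1
        self.total_cases[day][first] += 1

    def probabilities(self, day: int, first: int) -> Dict[int, float]:
        """Probability of each second service after `first` on `day`."""
        total = self.total_cases[day][first]
        if total == 0:
            return {}
        return {second: c / total for second, c in self.counts[day][first].items()}


markov = MarkovStructure()
markov.add_transition(day=2, first=0, second=1)
first_seen = markov.probabilities(2, 0)   # a single transition: 100% for service 1

markov.add_transition(day=2, first=0, second=3)
markov.add_transition(day=2, first=0, second=1)
updated = markov.probabilities(2, 0)      # service 1: 2 of 3, service 3: 1 of 3
```

The service codes 0, 1 and 3 and the day index 2 are placeholders for the integer values produced by the equivalency tables.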

Fig. 5 Markov transition matrix calculation algorithm

Finally, the statistics calculation is shown in Fig. 6, the flow diagram that corresponds to Algorithm 3. It depicts the two different ways of calculating the statistics. If there were no previous statistics for a specific day when the record is read, the data structure is created and initialized. Otherwise, the statistics must be recalculated. By using equations (5), (6) and (7), these statistics can be recalculated each time a new service is consumed after another one on the same day, or each time the new service ends its consumption (for calculating the duration).

Fig. 6 Statistics calculation algorithm

In the following subsection, the classifying module is explained. That module uses some of the data provided by this module. In the training phase, the records are split to create the datasets. In the prediction phase, each new record is sent to the classifying module.

4.2 Classifying module

The classifying module is based on neural networks, so we have a deep learning method to predict the next service consumed by the user. The inputs of the system are the different measurements of the record (time, day, type of day, service consumed, duration and so on). In the records, we have 20 different features. The outputs are the different services provided. We define seven different services, shown in Table 3. A weak point of having a classifying system as the only method to predict service consumption is that, if a new service is added to the system, the classifying model has to be redefined and trained again. However, with RL as a supervisor, that process can be delayed, and a provisional action can be added to avoid a QoE reduction until the classifying model is trained again.

Table 3 Services description

The classifying model will be based on a neural network whose architecture is depicted in Fig. 7. The number of neurons in the input layer is equal to the number of features in the data. In our case, we have 20 features, so there are 20 neurons in the input layer. The number of hidden layers, \({n}_{h}\), and the number of neurons in each hidden layer, \({m}_{h}\), will be parameters of the model. After testing the model varying these parameters, the model used will be the one with the highest accuracy. This evaluation will be presented in Sect. 6.

Fig. 7 Classifying neural network architecture

As regards the output layer, it will be composed of 7 neurons, one for each class to be detected by the system. If the number of different services in the Smart Home increases, the system would need to extract more relevant features and the model would need to be adapted. However, this adaptation is not going to be considered in this paper, and it will be considered as future work.

The loss function will be cross-entropy and the optimization method will be Root Mean Square Propagation (RMSprop), which achieves good results in multi-layer neural networks [36].

Another aspect of the model that has to be chosen is the activation function. We will use the ReLU function as the activation function of the hidden layer, \({a}_{h}\) [37]. The softmax function will be used as the activation function of the output layer, \({a}_{o}\), to get the probabilities of belonging to each class [38].
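As an illustration of the architecture described above, the following minimal sketch builds the layer sizes from \({n}_{h}\) and \({m}_{h}\) and runs one forward pass with ReLU in the hidden layers and softmax in the output layer. It is written in plain Python and is not the implementation used in the experiments. The function names, the random weight initialisation and the 0-based class indices are assumptions made for the example, and the RMSprop training loop is omitted.

```python
import math
import random

N_FEATURES = 20   # input neurons, one per feature of the record
N_SERVICES = 7    # output neurons, one per service class


def layer_sizes(n_h, m_h):
    """Neurons per layer: input, n_h hidden layers of m_h neurons, output."""
    return [N_FEATURES] + [m_h] * n_h + [N_SERVICES]


def init_weights(sizes, seed=0):
    """Small random weights and zero biases (illustrative initialisation only)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((w, b))
    return layers


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, features):
    """ReLU on every hidden layer (a_h) and softmax on the output layer (a_o)."""
    a = list(features)
    for idx, (w, b) in enumerate(layers):
        z = [sum(wi * ai for wi, ai in zip(row, a)) + bi for row, bi in zip(w, b)]
        a = softmax(z) if idx == len(layers) - 1 else relu(z)
    return a


def cross_entropy(probs, true_class):
    """Cross-entropy loss for a single record with an integer class label."""
    return -math.log(probs[true_class])


# Example with five hidden layers of ten neurons and a dummy feature vector.
sizes = layer_sizes(n_h=5, m_h=10)
probs = forward(init_weights(sizes), [0.5] * N_FEATURES)
predicted_service = probs.index(max(probs))
```

The output `probs` holds one probability per service, and the predicted service is the class with the highest probability.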

With the model presented, the next service consumed by the user can be predicted. In the next section, we define the RL module, which will choose the best action to perform based on this prediction.

5 Reinforcement learning module

This section describes the RL algorithm that will be used to reduce the impact of bad predictions. First, the environment, the states and the actions are defined. Then, the data structures that the algorithm needs to work are described. Finally, the rewards calculation and the policy of the system are detailed.

5.1 Environment, states and actions

In this subsection, the environment, the states and the actions that the algorithm will use to implement the reinforcement learning will be described.

Firstly, we have to define the environment. The reinforcement learning will act in the Smart Home environment. Specifically, it will notify the IoT Gateway which command it must perform. Therefore, the agent of the reinforcement learning algorithm will be the IoT Gateway. Initially, it could be thought that the user would be the agent, but the user will only be the source of information about the actions performed by the agent. For instance, when the user consumes a service and the algorithm decides to start another service after a certain amount of time, the user may not need that service and may give the order to turn it off. The user, then, is an input for the algorithm. In this case, the user is saying that the action performed was not chosen correctly. This fact would affect the reward of the action. We will discuss that in the next subsections. For now, it is important to notice that the user will be an input, not the agent that modifies the environment. Despite this fact, the user is the one who starts enabling the services. Consequently, if the definition of the states were based only on the current service running in the Smart Home, the definition of the environment and the agent would be more complicated. Furthermore, the algorithm would be incomplete, because we need to know which service is expected to be activated next. This is where we need the classifying module. Moreover, due to the user patterns, the day and the current time are also relevant data.

From the previous discussion, the following definition of the states is obtained. The RL algorithm will decide which action needs to be performed based on the current service consumed, the next service predicted and the current day. This presents a problem that will not be addressed in this paper: how new states are generated through the use of the system. We will consider in this paper that the states are static and are generated from the transition matrix obtained from the preprocessing module. However, new patterns can be adopted by the user, and this will be discussed in future work.

Usually, the states of an RL algorithm can be depicted in a state diagram. However, in this scenario, the state diagram can be composed of a high number of states, depending on the number of services. This can make the algorithm not scalable. Nevertheless, the Smart Home environments do not have a high number of services. Furthermore, not all the services can be important enough to define a state. In this paper, we will use all of them but, in future works, a categorization of services can be proposed to reduce the space of states.

Fig. 8 shows an example of a state diagram with three different services to enable. In this case, there is no probability of transitioning from state 2 (S2) to state 3 (S3). Therefore, there is no transition between those states in the graph. Furthermore, depending on whether the user actually consumes the predicted service, the next state might have the same service being consumed. That may happen when the classifying module does not predict the next service accurately or when the prediction was to consume the same service again. The state diagram is not, then, a direct representation of the Markov chain derived from the transition matrix. There are transitions to states with the same current service that do not represent the same service being consumed twice. In Fig. 8, to make the diagram easier to read, the states are composed only of the current service. However, the actual diagram is more complex because the algorithm defines the state as a pair: the first element is the current service and the second one is the predicted service. Fig. 9 depicts the state diagram of the state S2 of Fig. 8. Within each state of Fig. 8, there would be a subdiagram with similar transitions. Although this can make the algorithm more difficult to understand, the number of states is finite and not large enough to require a deep-learning algorithm for the policy function as in [39]. However, this could be studied in future work.

Fig. 8 State diagram with three services

Fig. 9 Subdiagram of state S2

For each state defined, a set of actions can be performed. The actions, however, must be classified based on how much they interrupt the user’s activity. This helps the algorithm to limit the reduction of the user’s QoE if the predictor fails to predict the next service. For this proposal, six different actions are defined and classified depending on this intrusion level. The actions are shown in Table 4, where the level of intrusion and the description are detailed. The simplest action is to wait, without executing any service. This has the lowest impact on the user because they do not have to do anything. However, this does not mean it is always the best action, because sometimes enabling a service or turning on a node can reduce the waiting time for the user or bring other advantages (for instance, saving energy or heating up the house before the user arrives). An alternative with low impact would be to ask if the user wants to enable the predicted service or another one from a list of similar alternatives (services from the same group). This is not too different from a manual service selection. If the system only asks to enable a certain service, or if it turns on a node, it will have a higher intrusion from the user’s point of view. We have to take into account that certain services are more intrusive than others. This is reflected in the actions. The system can automatically start a predicted service with some options or subfunctionalities that are not too intrusive, such as performing a search on the Internet or increasing or decreasing the temperature by a few degrees. However, other services, like opening a door, turning on the TV or starting to play a song or video, have a bigger impact on the user. Finally, stopping services that the user is using or turning off nodes automatically has the biggest impact on the user, so transitions that require disabling services will have the highest level.

Table 4 Actions

The actions are not simple actions as could be found in other RL algorithms; instead, they depend on the predicted service. That is, enabling a node will enable a node that provides the predicted service, starting a service will start the predicted service or a module of that service, and so on.

These actions provoke changes in the network performance, and, depending on the next values obtained from the Statistic Analyzer, the reward value of the action taken will be updated.

The way the rewards are assigned and calculated and how the actions are performed are described in the next subsections. However, we first need to know the data structures and concepts.

5.2 Data structure

In this subsection, the data structures that the RL algorithm uses are described.

First of all, the RL algorithm needs a structure where the reward of each action for each state is stored. This data structure will be a matrix, where the rows will be the states and the columns will be the different actions of the algorithm. In this case, the states are a pair of current service and predicted service values. Table 5 shows an example of the data structure for three different services, following the same state diagram that is shown in Fig. 8.

Table 5 Reward-state matrix
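The reward-state matrix can be sketched as a mapping from each state, a pair of current and predicted service, to one accumulated reward per action. This is only an illustrative sketch: the service identifiers and function names are hypothetical, all pairs of services are generated even though the transition matrix may rule some of them out, and the floor at zero is one possible reading of the condition \(R_{j} \ge 0\) that accompanies the total reward.

```python
from itertools import product

SERVICES = ["S1", "S2", "S3"]  # hypothetical service identifiers
N_ACTIONS = 6                  # six actions, indexed 0..5 in this sketch


def build_reward_matrix(services, n_actions):
    """One row per (current, predicted) state, one column per action."""
    return {state: [0.0] * n_actions for state in product(services, repeat=2)}


def add_reward(matrix, state, action, reward):
    """Accumulate a reward; the floor at zero is an assumption of this sketch."""
    matrix[state][action] = max(0.0, matrix[state][action] + reward)


def best_action(matrix, state):
    """Index of the action with the highest accumulated reward for a state."""
    row = matrix[state]
    return row.index(max(row))


rewards = build_reward_matrix(SERVICES, N_ACTIONS)
add_reward(rewards, ("S1", "S2"), action=3, reward=0.5)
```

A dictionary keyed by the state pair keeps the lookup simple when only a subset of the possible pairs is actually reachable.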

Secondly, the algorithm will need information about the services. This information is given by the data preprocessing module. For each service, the following data is required: the group to which the service belongs, the impact that the service has on the user and whether the service is multimedia. The group of the service is known by the IoT Gateway, because it is that agent that categorizes the services according to the architecture presented in Sect. 3. Whether the service is multimedia also comes from the IoT Gateway. Like the group of the service (workgroup), this information is present in the dataset. However, the last piece of data that is needed, the impact of the service on the user, must be set by the RL algorithm. A simpler classifying module could be added in the system architecture just before the RL module to determine which services have a bigger impact on the user. However, to make the proposal simpler, the IoT Gateway provides the RL module with this information, obtained statically from the group of the service. Table 6 shows an example of this data structure, which we will name infoServices. The impact field is a positive integer. The bigger the impact value, the more intrusive the service is.

Table 6 InfoServices matrix example

The last data structure needed to implement this RL algorithm is the Input User Matrix. This data structure will store how much user input is needed for each action. That is, the algorithm will be able to know the impact of a certain user input for each action chosen. This will be useful for knowing how appropriate a certain action was for a state, that is, how the RL algorithm should be rewarded. For instance, if a set of services is given to the user to choose which one should be started, the reward obtained if the user selects the predicted one will be different from the one obtained if the user discards all the possibilities. This will be discussed in more detail in the next subsection, where the policy of the RL algorithm is detailed. Table 7 shows an example of this matrix.

Table 7 Input user matrix example

5.3 Rewards, policy and objective function

In order to define the reward properly, we have to analyze the problem we are dealing with, because it is the problem, together with the scenario, that defines the appropriate method to obtain and calculate the reward. The reward will indicate to the system whether the action chosen was effective in a given state or, otherwise, whether there are better options. If reinforcement learning is applied to a game, the effect that a performed action has on the player determines the reward. If the game provides an actual reward such as finishing a level or defeating an enemy, the reward will be positive. If other actions usually lead to losing a life, losing objects and so on, the reward will be lower and will decrease each time we choose that action, and it may even become negative.

In the environment previously defined, the goal of the system is to spare the user from having to enable services or nodes manually. However, it is also important to reduce the intrusion of the system, especially regarding multimedia services. Therefore, the input of the user will be quite important to know if the action chosen by the system was appropriate.

Applying the RL system to the result of a classifying algorithm modifies the way the reward is calculated. In this case, the reward will be a measure of the number of times the classifying module has predicted correctly which service would be consumed. Therefore, if the module provides high accuracy with certain services, the actions with a high level of intrusion can work well. On the other hand, when it has low accuracy, performing an action with a low level of intrusion could be a better option.

The algorithm has a set of actions, \(J\). We are going to describe the general case where several actions can belong to the same level classification. However, this definition will also be valid when there can be only one action per level. Consequently, we can define the reward obtained after choosing an action \(j\) as in (8).

$$ r_{j} = u, \quad \text{where} $$
(8)
$$ u = 1 \leftrightarrow Input\left( {User} \right) < \beta $$
(9)
$$ u = -1 \leftrightarrow Input\left( {User} \right) \ge \beta $$
(10)

where Input(User) is a parameter returned by the IoT Gateway whose value depends on the action chosen and the action the user performs afterwards. The possible values were defined previously, as an example, in Table 7. Moreover, \(\beta \) is a threshold defined to classify the action performed by the user as corrective or not. Then, if the action was corrective, the reward should be decreased.

Nonetheless, if we only consider the reward in this way, we can drive the system to a point where the lower-level actions, because they are less intrusive, present a higher reward. In order to solve this, the reward should be scaled by the level of the action chosen. Consequently, the reward given to an action \(j\) in the iteration \(i\) will be defined by (11).

$$ r_{{i,j}} = u_{{i,j}} *~\frac{{level(j)}}{{Max(level)}} $$
(11)

Where level(j) is the level of the action \(j\) chosen.

The last adjustment that needs to be done is to give extra importance to multimedia services. That means, for those actions coming from a state whose current service is multimedia, the reward should increase or decrease at a higher rate. In (12) we can see this adjustment.

$$ r_{{i,j}} = \left( {1 + multimedia(s)} \right)*u_{{i,j}} *~\frac{{level(j)}}{{Max(level)}} $$
(12)

Where \(multimedia(s)\) is the flag in the dataset that identifies the service \(s\) as a multimedia service.
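The reward of equations (8) to (12) can be sketched as follows. The function and parameter names are hypothetical, and the numeric values in the example (a threshold \(\beta = 2\), six levels and the user input values) are chosen only for illustration, since the actual values depend on Table 4 and Table 7.

```python
def corrective_sign(user_input, beta):
    """u in (9)-(10): +1 if the user input stays below beta, otherwise -1."""
    return 1 if user_input < beta else -1


def reward(user_input, beta, level, max_level, multimedia):
    """r_{i,j} in (12); `multimedia` is the 0/1 flag of the current service."""
    u = corrective_sign(user_input, beta)
    return (1 + multimedia) * u * level / max_level


# Hypothetical values: a level-3 action out of 6 levels, threshold beta = 2.
accepted = reward(user_input=1, beta=2, level=3, max_level=6, multimedia=0)
corrected = reward(user_input=4, beta=2, level=3, max_level=6, multimedia=1)
print(accepted, corrected)  # 0.5 -1.0
```

In the second call the user input reaches the threshold, so the action counts as corrected, and the multimedia flag doubles the size of the penalty.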

We can define, then, the total reward of an action \(j\) as (13).

$$ R_{j} = \sum\limits_{{i = 0}}^{n} {r_{{i,j}} }\quad \text{where}\quad R_{j} \ge 0~\forall ~j \in J $$
(13)

Initially, the rewards are calculated depending on the level and the probability of transitioning from the initial state \({\rm {S}}_{a}\) to \({\rm {S}}_{b}\) as is defined in (14):

$$ r_{{0,j}} = \left( {1 + multimedia} \right)*(P\left( {S_{a} ,S_{b} } \right)*~\frac{{level\left( j \right)}}{{Max\left( {level} \right)}} + ~(1 - P\left( {S_{a} ,S_{b} } \right))*~\frac{{Max\left( {level} \right) - ~level\left( j \right)}}{{Max\left( {level} \right)}}) $$
(14)
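A minimal sketch of the initial reward in (14) is given below, with hypothetical function and parameter names. It shows how a high transition probability favours the high-level (more intrusive) actions, whereas a low probability favours the low-level ones.

```python
def initial_reward(p_transition, level, max_level, multimedia):
    """r_{0,j} in (14) for an action of the given level.

    p_transition is P(S_a, S_b), and multimedia is the 0/1 flag of the service.
    """
    if_correct = p_transition * level / max_level
    if_wrong = (1 - p_transition) * (max_level - level) / max_level
    return (1 + multimedia) * (if_correct + if_wrong)


# Hypothetical example with six levels and a likely transition (P = 0.8).
high = initial_reward(0.8, level=6, max_level=6, multimedia=0)  # about 0.8
low = initial_reward(0.8, level=1, max_level=6, multimedia=0)   # about 0.3
```

When \(P(S_{a}, S_{b}) = 0.5\), both terms balance and every level receives the same initial reward, so the initialisation gives no preference to any action.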

We then need to define the policy of the system. In subsection 5.2 we defined a matrix that contains the rewards for each pair of state and action. Usually, RL algorithms choose the action with the highest reward. However, in this case, the policy will have a component of exploration. This component will force the algorithm to try actions that have not been tried for a high number of iterations. Equation (15) defines the function ρ that represents the probability of choosing an action \(\alpha \) that has less reward than the action with the maximum reward, \(n\), for the same state.

$$ \rho _{\alpha } = ~\frac{{R_{n} - ~R_{\alpha } }}{{\left| {level\left( n \right) - level(\alpha )} \right|}} + ~\omega *i $$
(15)

Where i is the number of iterations that have passed since the last time action α was chosen and ω is a parameter that sets a weight to the exploration.
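The evaluation of (15) for a single action can be sketched as follows. The names are hypothetical, and the handling of two actions that share the same level, a case in which (15) is undefined, is an assumption of this sketch and is not taken from the definition above.

```python
def exploration_score(r_best, r_alpha, level_best, level_alpha, omega, idle_iterations):
    """rho_alpha in (15) for an action alpha compared with the best action n.

    idle_iterations is i, the number of iterations since alpha was last chosen.
    """
    level_gap = abs(level_best - level_alpha)
    if level_gap == 0:
        # (15) divides by zero when both actions share a level; using a gap
        # of 1 here is an assumption of this sketch.
        level_gap = 1
    return (r_best - r_alpha) / level_gap + omega * idle_iterations


# Hypothetical values: the score grows with the number of idle iterations.
recent = exploration_score(2.0, 1.0, 5, 3, omega=0.1, idle_iterations=4)
stale = exploration_score(2.0, 1.0, 5, 3, omega=0.1, idle_iterations=10)
```

The second term grows linearly with the number of idle iterations, which is what eventually forces an action that has not been tried for a long time to be selected.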

The policy function can be described then by (16):

$$ \pi \left( s \right) = a_{j} |~\rho _{j} \ge \rho _{n} \quad ~\forall\;j,\;n \in J $$
(16)

6 Methodology and results

In this section, the tests performed and the results obtained are discussed. First, the classifying model will be tested in order to obtain good parameters. Then, the QoE will be measured to check if the introduction of an RL algorithm improves it.

6.1 Classification analysis

In Sect. 4, the classifying model was defined based on two parameters. These parameters, the number of hidden layers \({n}_{h}\) and the number of neurons in each hidden layer \({m}_{h}\), are analyzed in this subsection. In order to get the best values for these parameters, the model will be trained, given the same dataset, for the values of \({n}_{h}\) and \({m}_{h}\) shown in Table 8.

Table 8 Parameters for Classification analysis

The results of the evaluation of the model for each pair of parameter values are shown in Fig. 10. In the figure, the accuracy for each number of neurons is shown for each model with a different number of hidden layers. The size of the training dataset corresponds to two weeks of user activity. Although the mean of the values of each series could be interesting for other experiments, here we only need the maximum value of all the series to select the parameter values with the best accuracy. In this case, the model works best with this kind of data when it has five hidden layers and ten neurons per hidden layer. With those values, the accuracy of the classifying model is 54%. Although the accuracy might be improved with further feature selection or with larger datasets, this accuracy is sufficient to test the hypothesis of this work.

Fig. 10 Classification model results

Furthermore, the results of the deep learning model are tested against another classifying algorithm. We chose the KNN algorithm to implement a classifier that predicts the next service. The KNN algorithm is parametrized by the number of neighbors. Fig. 11 shows the results obtained from the KNN algorithm for a number of neighbors from 1 to 18. The maximum accuracy obtained, with the same dataset as the deep learning model, is 41.21% with three neighbors.
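For reference, a KNN baseline of this kind can be sketched in plain Python as shown below. This is not the implementation used in the experiments: the function names are hypothetical, and the Euclidean distance and the tie-breaking of the majority vote are assumptions, since the distance metric is not specified here.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training records closest to `query`."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance (assumed)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test records whose service is predicted correctly."""
    hits = sum(
        knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y)
    )
    return hits / len(test_y)


def best_k(train_x, train_y, test_x, test_y, k_values=range(1, 19)):
    """Neighbour count (1 to 18 by default) with the highest accuracy."""
    return max(k_values, key=lambda k: accuracy(train_x, train_y, test_x, test_y, k))
```

The function `best_k` mirrors the sweep over the number of neighbors and returns the smallest value of k that reaches the maximum accuracy.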

Fig. 11 KNN classification results

Fig. 12 shows an example of how preparing the data from the Smart Home environment may help to get good levels of accuracy. The figure depicts the violin graph of the feature “type of day”. This feature has two values: 1 if the day is a working day and 0 if it is not. Since the services with a higher ID are the services related to multimedia and leisure (see Table 3), the probability of having a high service value is higher when the type of day is 0.

Fig. 12 Violin graph of service depending on the type of day

Finally, Table 9 shows the running time of the different parts of the algorithm. We have measured the mean of the running times of the different parts of the proposal. The experiments were carried out on an AMD Ryzen 5 5600X with 32 GB of DDR4 RAM. The results show that the times fit the environment restrictions, which are not extremely demanding in terms of running time.

Table 9 Running time

6.2 QoE in smart home results

In this section, the most important metric for this work, QoE at Smart Home, will be calculated for a KNN classifying system, the deep learning system and the deep learning system with the RL as a supervisor.

However, we first need to define how we calculate QoE for the Smart Home scenario. In the Smart Home, unlike in multimedia scenarios, delay, bandwidth and jitter will not be our concern. We are going to put aside the network resources and focus on the user. Thereby, the QoE decreases when the user is interrupted with queries about services they do not want to use. Moreover, if the system starts a service the user does not want to consume or stops a service the user wants to keep consuming, the QoE will also decrease. After each prediction, the QoE will be updated based on the loss using the function λ (17).

$$ \lambda ~ = L\left( {y,\hat{y}} \right)*\left( {1 + multimedia} \right) $$
(17)

Where \( L\left( {y,\hat{y}} \right) = \begin{cases} 0 & \text{if } y = \hat{y} \\ 1 & \text{if } y \ne \hat{y} \end{cases} \) and multimedia refers to the current service.

Note that the loss defined is only applied to the classifying module. For the RL algorithm, the loss is defined by the user input matrix, depending on the action chosen.

Therefore, letting \(\alpha \) be a parameter that weights the decreases, the total QoE after the \(n\) iterations (transitions) of the system is determined by (18).

$$ QoE\left( n \right) = Max\left( {QoE} \right) - \frac{{\sum\limits_{{i = 0}}^{n} {\lambda _{i} } }}{\alpha } $$
(18)

In order to maintain similarity with the QoE defined in multimedia, the max value of QoE in our experiments will be 10. As regards \(\alpha \), it can be adjusted depending on the length of the measurement. We will show three different experiments where the parameter \(\alpha \) is different, but the number of iterations does not vary. The number of iterations will be 88 for each measurement.
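The QoE calculation of (17) and (18) can be sketched as follows. The function names and the short sequence of transitions are hypothetical and serve only to show the effect of \(\alpha \). The real measurements use 88 iterations.

```python
def prediction_loss(actual, predicted, multimedia):
    """lambda in (17): 0 for a correct prediction, otherwise 1 + multimedia."""
    return (0 if actual == predicted else 1) * (1 + multimedia)


def qoe(losses, alpha, max_qoe=10.0):
    """QoE(n) in (18): the maximum QoE minus the accumulated loss over alpha."""
    return max_qoe - sum(losses) / alpha


# Hypothetical transitions: (actual service, predicted service, multimedia flag).
transitions = [(3, 3, 0), (5, 2, 1), (1, 4, 0), (6, 6, 1)]
losses = [prediction_loss(y, y_hat, m) for y, y_hat, m in transitions]

for alpha in (20, 15, 10):
    print(alpha, round(qoe(losses, alpha), 2))
```

For the same sequence of losses, a smaller \(\alpha \) yields a lower QoE, which corresponds to a user who penalizes wrong predictions more heavily.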

Fig. 13 depicts the QoE obtained during all the iterations with an \(\alpha \) value of 20. With this \(\alpha \) value, the RL algorithm obtains a QoE of 8.7, the deep learning algorithm a QoE of 6.56 and the KNN algorithm a QoE of 5.89. Although the RL obtains a high QoE, the other two methods are not too far from each other in terms of QoE. Despite the good QoE obtained with the classifying methods without RL, the accuracy of their predictions has been low. This \(\alpha \) value only reflects realistic behavior when the user cares less about interruptions.

Fig. 13 QoE after 88 iterations with \(\alpha =20\)

The QoE values obtained with an \(\alpha \) value of 15 are shown in Fig. 14. Again, the RL algorithm obtains a better QoE than the other alternatives, with 8.26. In this experiment, the deep learning algorithm obtains a QoE of 5.53. The KNN algorithm is again the one that provides the worst performance, with a QoE of 4.26. These results might represent more accurately the average user, whose QoE decreases with the interruptions. In addition, the RL presents the smallest decrease with the change of the \(\alpha \) value.

Fig. 14 QoE after 88 iterations with \(\alpha =15\)

Finally, Fig. 15 illustrates the results when the parameter \(\alpha \) is reduced to 10. In this case, the bad predictions that result in interrupting ongoing services quickly decrease the QoE. The RL supervisor, by choosing actions that prevent interruptions when the classifying algorithm does not give an accurate prediction, presents the best QoE with 7.39. Once more, the deep learning classifying model presents better results than KNN, with 3.59 and 1.53, respectively. The pace at which the QoE decreases in this case may reflect an unrealistic situation where the user punishes the system too much. However, it shows how the RL algorithm improves the performance of the system.

Fig. 15 QoE after 88 iterations with \(\alpha =10\)

7 Conclusion and future work

IoT has provided new ways of networking, communicating and sensor development. In addition, with its introduction, several new applications have been designed in several fields. Smart Home is a new way of understanding services at home. With the communication that this technology offers, users can access new services at home. In addition, AI can help to automate tasks at home using these services and architectures.

In this paper, we have introduced an intelligent system to automate the management of Smart Home services. The aim of the intelligent system was to avoid user interruptions in order to guarantee a good QoE, prioritizing the multimedia services. In the system, we have designed the data preprocessing process, the classifying algorithm and the RL system that improves the performance, and we have defined all the related concepts needed to describe the scenario so that the system can interact and provide functionality.

We have also defined QoE for this scenario and we have measured it for different parameters and systems.

Results show that the proposed deep learning classifying model achieves better accuracy than other algorithms like KNN, improving on their performance by around 33%. The QoE of the deep learning algorithm, for the different values of the parameters shown, has always been higher than that of the KNN algorithm. With high values of the parameter \(\alpha \), the QoE obtained is higher, being in the experiment 5.53 for the deep learning algorithm and 4.26 for the KNN. This difference, based on the difference in accuracy, gets larger when the weight of the predictions increases, that is, when \(\alpha \) has lower values. This difference rose to 2.06, with QoE values of 3.59 for deep learning and 1.53 for KNN. Nevertheless, the most remarkable results are the ones obtained from the RL system, which manages the predictions to reduce the impact and obtains QoE values of 8.7 when \(\alpha \) has the highest value and 7.39 when it has the lowest. That makes a difference of 3.17 in the first case and 3.8 in the second one. That shows that the inclusion of an RL system improves the QoE in Smart Home environments when the classifying module cannot predict the next service with high accuracy.

There are several aspects that can be studied in future work. Firstly, the system can be evaluated against better classifying models. That would include studying how to improve the accuracy obtained in this work. The feature selection could be enhanced by reducing dimensionality [40], or an automated labeling system could be included in the system [41]. Other parameters, such as the activation function or the optimization parameters, could be changed to improve the accuracy. The scalability of the system could be studied. Mechanisms to adapt the system to new services could be defined to efficiently control a larger number of states. Moreover, the system could be compared to other classifying models besides KNN. The QoE metric could add an intelligent method to select the \(\alpha \) parameter based on the user profile. Thereby, some actions to correct low QoE could be included in the system. Finally, this system could be adapted to face other problems in IoT, such as video surveillance or Industrial IoT environments, working together with other solutions [42] or being integrated in other networks like 5G [43].