
1 Introduction

United Nations projections show that the number of people aged 65 or older will double by 2050 (United Nations 2020a, b). This increase will significantly affect the share of the population suffering from geriatric diseases such as cognitive impairments and, consequently, the share of the population needing care due to loss of independence (Berryhill et al. 2012).

Advanced care planning, supported decision-making, and availability of assistive devices can enhance autonomy regardless of an elderly person’s level of capacity (World Health Organization 2017).

Nevertheless, current solutions are still far from the predictive environments envisioned by Ambient Assisted Living (AAL) policies (World Health Organization, Regional Office for Europe 2017). Digital Twin (DT) technology could narrow this gap. Accordingly, this paper presents prototype DT models that support elderly people within the home environment by detecting anomalies in daily scenarios.

2 Literature Review

An increasing number of DT models are currently emerging within the built environment. These innovative systems pursue a variety of objectives depending on the domain (Liu et al. 2021; Opoku et al. 2021; Sharma et al. 2020). Among the different assets currently mirrored by DTs, a building-level DT can both prevent and predict probable events throughout a building's life cycle, while also enhancing the building's efficiency. Building-level DTs consider both the environment and its user, processing real-time information to offer appropriate services.

Hence, they can also be referred to as cognitive buildings (Yitmen et al. 2021).

AAL is a field that combines information and communication technologies, sociological sciences, and medical research; its purpose can be summarized as the development of products and services that counter the effects of a growing elderly population (Li et al. 2015; Dobre et al. 2017). Cognitive environments in the AAL domain should be able to learn at scale, reason with purpose, and cooperate with users in a natural way. Accordingly, some cognitive human-centered environments have recently begun to appear (De Paola et al. 2017; Rafferty et al. 2017; Patel and Shah 2020; Calderita et al. 2020), owing to the significant influence that the home environment has on AAL's objectives.

Since pursuing and completing Activities of Daily Living (ADLs) allows autonomous well-being in older age, these systems usually aim at encouraging, supporting, and assisting users in their ADLs. A variety of sensors and devices collect data that AI algorithms can process to analyze either user or environmental conditions. However, visual sensors (e.g., cameras) are not yet fully exploited.

3 Methodology

3.1 Mirroring Real Environment

A consistent virtual representation of the context must include both the building elements and the user. Accordingly, a real-time representation of the user is combined with BIM information within a game engine (GE), i.e., Unity. The user's virtual counterpart is synthesized through his or her posture, typically referred to as the Skeleton.

A LiDAR camera (Intel RealSense L515) is used as the visual sensor. Nuitrack AI is used as the Skeleton tracking algorithm to process the camera's raw 3D data and yield the user's Skeleton. In addition, dynamic features such as appliance states and environmental temperature can be associated with BIM elements in the GE to retrieve further information from sensor readings.
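As an illustration of the data acquisition step, the following minimal sketch streams depth frames from a RealSense device through the pyrealsense2 SDK; the stream resolution, frame rate, and the hand-off to the Skeleton tracker are assumptions, since the paper does not detail this low-level configuration.

```python
# Minimal sketch: streaming depth frames from an Intel RealSense LiDAR camera.
# Resolution, frame rate, and the hand-off to the Skeleton tracker are assumptions.
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
# The L515 exposes a depth stream; 640x480 @ 30 fps is an illustrative choice.
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)

try:
    for _ in range(300):  # roughly 10 s of frames at 30 fps
        frames = pipeline.wait_for_frames()
        depth = frames.get_depth_frame()
        if not depth:
            continue
        # Example query: distance (in meters) at the image center,
        # before the raw 3D data are passed to the Skeleton tracking algorithm.
        distance_m = depth.get_distance(320, 240)
        print(f"center distance: {distance_m:.2f} m")
finally:
    pipeline.stop()
```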

However, further information is required to define complete semantics for a user suffering from cognitive disorders. The activities that he or she performs can be detected through an Activity Recognition (AR) model. The AR task can be approached with either data-driven or knowledge-driven methods (Rafferty et al. 2017). A data-driven approach is followed in this work since it enables the modeling of uncertainty and exploits increasingly available activity datasets. Specifically, the model developed in Liu et al. (2020) is integrated as the AR agent of the system. The MS-G3D model is based on Spatial–Temporal Graph Convolutional Networks (ST-GCNs), first proposed in Yan et al. (2018) for the AR task, and is therefore fed with the 3D coordinates of the Skeleton joints instead of the RGB images required by earlier Convolutional Neural Network (CNN) approaches. This results in a lightweight model that outperforms existing AR methods.
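The sketch below illustrates how per-frame Skeleton joints could be packed into the (batch, channels, frames, joints, persons) layout commonly used by ST-GCN-style models such as MS-G3D; the sliding-window length and the single-person assumption are illustrative choices, while the 25-joint layout follows the NTU RGB+D / Kinect v2 convention mentioned later in the paper.

```python
# Minimal sketch: packing tracked Skeleton joints into the input layout
# typically used by ST-GCN-style models (batch, channels, frames, joints, persons).
# Buffer length and normalization are illustrative assumptions.
import numpy as np
from collections import deque

NUM_JOINTS = 25     # Kinect v2 / NTU RGB+D skeleton
NUM_FRAMES = 64     # sliding window length (assumption)
NUM_PERSONS = 1     # single tracked user

frame_buffer = deque(maxlen=NUM_FRAMES)

def push_frame(joints_xyz):
    """joints_xyz: array of shape (25, 3) with the filtered joint coordinates."""
    frame_buffer.append(np.asarray(joints_xyz, dtype=np.float32))

def build_input_tensor():
    """Returns an array of shape (1, 3, T, 25, 1), ready to be converted
    to a framework tensor and fed to the AR model."""
    frames = np.stack(frame_buffer)                 # (T, 25, 3)
    frames = frames.transpose(2, 0, 1)              # (3, T, 25)
    return frames[np.newaxis, :, :, :, np.newaxis]  # (1, 3, T, 25, 1)
```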

3.2 Knowledge Contextualization

BIM information, combined with its dynamic features and the user-related information, defines low-level knowledge about the real asset within its virtual counterpart. Contextualizing this information allows the real world to be interpreted, and thus scenarios to be twinned that include the environment and the user with his or her behaviors, habits, intentions, activities, and situations. The emergent scenarios that the system aims to detect are those including anomalies. Accordingly, knowledge contextualization is achieved through an agent acting as the reasoner of the system. A rule-based reasoner was proposed in De Paola et al. (2017): their module consists of if–else conditional rules and takes basic decisions, such as turning the heating/cooling system on or off depending on user satisfaction. Since complex rules do not scale well and are not easily reusable, especially when considering an elderly person suffering from cognitive disorders, a probabilistic approach is followed here by means of Bayesian Networks (BNs).

Such a probabilistic model is based on the conditional probability that an event occurs given evidence or other variables. BNs are probabilistic graphical models that represent a set of variables and their conditional dependencies via a directed acyclic graph. Expert knowledge can be elicited in the Conditional Probability Tables (CPTs) of the nodes that represent the events. BNs are well suited to taking an observed event and estimating the likelihood that any one of several possible known causes was the contributing factor (De Grassi et al. 2009). In this work, BNs are used to infer different types of scenario anomalies (an illustrative sketch follows the list below):

  • Wasteful and senseless situations (e.g., window open while heating system is turned on)

  • Unusual behaviors (e.g., skipping meals)

  • Dangerous situations (e.g., something dropped on the ground)

  • Emergencies (e.g., falls).
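As a minimal illustration of the first anomaly type, the following sketch builds a toy two-parent Bayesian Network with the pgmpy library; the node names, structure, and CPT values are illustrative assumptions and do not reproduce the CPTs elicited in this work.

```python
# Toy Bayesian Network for a "wasteful situation" (window open while heating is on).
# Structure, node names, and CPT values are illustrative assumptions.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([("WindowOpen", "WastefulScenario"),
                         ("HeatingOn", "WastefulScenario")])

cpd_window = TabularCPD("WindowOpen", 2, [[0.7], [0.3]])   # P(closed), P(open)
cpd_heating = TabularCPD("HeatingOn", 2, [[0.5], [0.5]])   # P(off), P(on)
# P(WastefulScenario | WindowOpen, HeatingOn): high only when both are "true".
cpd_wasteful = TabularCPD(
    "WastefulScenario", 2,
    [[0.99, 0.95, 0.90, 0.05],   # P(not wasteful | ...)
     [0.01, 0.05, 0.10, 0.95]],  # P(wasteful | ...)
    evidence=["WindowOpen", "HeatingOn"], evidence_card=[2, 2])

model.add_cpds(cpd_window, cpd_heating, cpd_wasteful)
model.check_model()

# Evidence as it would arrive from sensor readings and the AR model.
inference = VariableElimination(model)
result = inference.query(["WastefulScenario"],
                         evidence={"WindowOpen": 1, "HeatingOn": 1})
print(result)  # high probability of a wasteful scenario
```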

Once the real-world scenario has been recognized by the system, it should offer the appropriate supportive services. A dialog system is implemented through a flow-based programming tool (i.e., Node-RED) to enable bidirectional interaction between the user and the system itself. This platform bridges the gap between the reasoner and the services to be delivered.
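One simple way the reasoner could hand its output over to such a flow is an HTTP call into Node-RED; the sketch below is hypothetical, since the paper does not specify the endpoint or payload, and assumes an "http in" node configured at the path shown.

```python
# Minimal sketch: forwarding the reasoner's output to a Node-RED flow over HTTP.
# The endpoint path and payload fields are hypothetical assumptions.
import requests

def notify_automation_hub(scenario: str, probability: float) -> None:
    payload = {"scenario": scenario, "probability": probability}
    # A local Node-RED instance listens on port 1880 by default.
    response = requests.post("http://localhost:1880/anomaly", json=payload, timeout=5)
    response.raise_for_status()

notify_automation_hub("LeavingHomeUndressed", 0.87)
```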

4 System Architecture

The architecture of the proposed system has been outlined following the guidelines stated in Lu et al. (2020), which define the structure of a building-level DT. Thus, our model consists of five layers: the data acquisition layer, transmission layer, digital modelling layer, data/model integration layer, and service layer. The proposed architecture defines a system able to autonomously perform high-level reasoning to detect anomalies in daily scenarios and consequently offer support to the user. Figure 16.1 shows the architecture of the system.

Fig. 16.1
System architecture. Arrows indicate data flow (transmission layer)

4.1 3D Real-Time Representation

The data acquisition and digital modelling layers host the computation that manages the 3D real-time representation of the context. The virtual scenario built upon BIM and Skeleton data requires some adjustments. Two different filtering algorithms are developed, for the following reasons:

  • Filtering non-confident Skeleton data. Some Skeleton joints may have low confidence values due to an obstructed camera field of view and can thus be less reliable, leading to distortions of the user’s avatar within the GE. Consequently, a threshold is introduced to discard data from Skeleton joints with a confidence value below 10%.

  • Enhancing Skeleton stabilization. Once non-confident data are discarded, the avatar should move with natural movements. The avatar is stabilized through an autoregressive filter that acts on each joint’s position and orientation and on the avatar’s height.

    $$X_{(t+1)} = (1 - a)\cdot X_{(t)} + a\cdot X_{(t+1)}^{\text{raw}}$$
    (16.1)

where \(X_{(t+1)}\) is the processed data at time \((t+1)\), \(X_{(t)}\) is the processed data at time \((t)\), \(a\) is a corrective factor ranging between 0 and 1, and \(X_{(t+1)}^{\text{raw}}\) is the raw data value at time \((t+1)\).
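A minimal sketch of how the two filtering steps could be combined is shown below; the confidence scale, the joint data layout, and the value of the corrective factor are assumptions, since only the 10% threshold and Eq. (16.1) are given in the text.

```python
# Minimal sketch of the two filtering steps: discarding low-confidence joints
# and smoothing the remaining data with the autoregressive filter of Eq. (16.1).
# The confidence scale (0-1), data layout, and value of `a` are assumptions.
import numpy as np

CONFIDENCE_THRESHOLD = 0.10  # joints below 10% confidence are discarded
A = 0.3                      # corrective factor in [0, 1] (illustrative value)

def filter_joints(prev_xyz, raw_xyz, confidence):
    """
    prev_xyz:   (25, 3) processed joint positions at time t
    raw_xyz:    (25, 3) raw joint positions at time t+1
    confidence: (25,)   per-joint confidence in [0, 1]
    Returns the processed joint positions at time t+1.
    """
    smoothed = (1.0 - A) * prev_xyz + A * raw_xyz   # Eq. (16.1)
    keep = confidence >= CONFIDENCE_THRESHOLD
    # Low-confidence joints keep their previous (stable) value instead of the raw one.
    smoothed[~keep] = prev_xyz[~keep]
    return smoothed
```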

These filters remove the ambiguities that previously affected the avatar. Furthermore, cleaner movements mean more consistent output data, which subsequently feed the AR model. To this end, the avatar joints have been remapped to the Kinect v2 Skeleton used to build the NTU RGB+D dataset (Shahroudy et al. 2016), on which the MS-G3D model is pre-trained.

4.2 Scenario Awareness

The data/model integration layer is responsible for analyzing and processing the data to achieve scenario awareness. It consists of three intelligent agents that, respectively, detect the activities performed by the user, reason on the current scenario to detect anomalous ones, and act accordingly to support the individual if necessary.

The agent acting as the reasoner of the system is formalized through an Object-Oriented Bayesian Network (OOBN), composed in turn of four sub-modules (Fig. 16.2).

Fig. 16.2
The reasoner consists of an Object-Oriented Bayesian Network, decomposed into four underlying OOBN modules. Zooming in shows the Nonsense and Dangerous Scenarios Detector module and the CPT of the “FeelingCold” situation node. Dashed gray-edged nodes represent input nodes; solid gray-edged nodes are output nodes

The approach followed to formalize the OOBN first considers general symptoms that could lead to anomalous scenarios (confusion, depression, loss of memory, and so forth) (Berryhill et al. 2012; Dillon et al. 2013; scie.org, https://www.scie.org.uk/dementia). Then, semantics regarding probable events, situations, scenarios, and anomalies in an AAL environment are built: evidence is captured by sensors (turning appliances on/off, indoor/outdoor temperature, open/closed window, and so forth), by the results of the AR model, and by user–system vocal interactions; situations are combinations of evidence and represent feelings, behaviors, events, or intentions (feeling hot/cold, getting dressed, something on the ground, leaving home, and so forth).

By associating and combining the available evidence and recognizable situations, probable scenarios are hypothesized as the anomalies that may occur. Anomalous scenarios therefore include time disorientation, difficulties arranging, indifference to the environment, becoming easily overwhelmed, mishandling appliances, and changes in eating patterns.

The Automation HUB, based on Node-RED, can integrate applications to offer appropriate support to the user. In this work, a dialog system is implemented: Speech-to-Text (STT) and Text-to-Speech (TTS) services that rely on Machine Learning models are integrated so that bidirectional vocal interactions between the user and the cognitive layer of the building can be performed.

It is essential not to rely on Hot Phrases (HPs), since the user may forget them due to cognitive impairments. HPs are the phrases typically used to trigger common dialog systems such as Alexa and Google Assistant.

5 System Implementation

Combining BIM data and the Skeleton yields a reliable real-time virtual representation of the physical asset, shown in Fig. 16.3. BIM data from a home environment are converted to the Industry Foundation Classes (IFC) format using Autodesk Revit. When the IFC files are imported into the Unity game engine, all BIM objects are recognized as Prefabs, which preserve all information related to the BIM objects. Working with its physics engine, Unity allows additional properties to be assigned to Prefabs to achieve greater realism: the mesh collider attribute is applied to all tangible components to avoid inconsistencies. Additionally, dynamic features can be added to Prefabs to extract real-time sensor readings about BIM objects. The LiDAR camera is placed at a height of 1 m and leveled horizontally. Tests show that the line of sight between the user and the camera should be unobstructed and that the distance should not exceed 5 m to obtain consistent results.
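As an illustration of how the IFC export could be inspected before import into the GE, the following sketch uses the ifcopenshell library; the file name and the element types queried are assumptions, not part of the described pipeline.

```python
# Minimal sketch: inspecting the IFC export of the home environment before it is
# imported into the game engine. The file name and queried types are assumptions.
import ifcopenshell

model = ifcopenshell.open("home_environment.ifc")  # hypothetical file name

# List the building elements that will become Prefabs in the GE.
for ifc_type in ("IfcWall", "IfcDoor", "IfcWindow", "IfcFurnishingElement"):
    elements = model.by_type(ifc_type)
    print(f"{ifc_type}: {len(elements)} elements")
    for element in elements[:3]:
        # GlobalId and Name can later be used to bind dynamic features
        # (e.g., a door contact sensor) to the corresponding Prefab.
        print(" ", element.GlobalId, element.Name)
```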

Fig. 16.3
Real-time 3D representation of the context

To evaluate the effectiveness of the Object-Oriented Bayesian Network developed in this work, the nodes’ CPTs are filled by eliciting the authors’ knowledge. Figure 16.2 shows the CPT of the “FeelingCold” situation node. Possible combinations of evidence are then set up by manually activating input nodes, and the expected consequences reach high probability values, meaning that predictable anomalies within the scenario are fully recognized. Figure 16.4 shows an example of anomaly detection within the Nonsense and Dangerous Scenario module. Four input nodes are set up to represent a scenario in which the user is barefoot, not wearing a jacket or hat, and is opening the door. Specifically, the ShoeOn, CapOn, and JacketOn input nodes are set to false (activities recognizable through the AR model), while the DoorSensor input node is set to open. The user is not preparing to leave (98.90% false) but is actually leaving home (90.09% true). The “Leaving Home Anomalies” output node detects a likelihood of 87.34% that the user is leaving undressed.

Fig. 16.4
Detected anomaly within the Nonsense and Dangerous Scenario module

Figure 16.5 shows the dialog system prototype built upon the STT and TTS processes. The STT process starts by recording the user’s speech without requiring an HP. The recording is then processed by IBM’s Watson STT service, which returns a transcription of the speech; the transcription is shown in the Node-RED debug tab. By contrast, the TTS process is triggered automatically by the system depending on the output of the reasoner. Tailored messages can be played depending on the needs of the user: the written messages are converted through IBM’s Watson TTS service, and the resulting speech is played through the speakers.
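Outside of the Node-RED flow, the same Watson services could be exercised directly with IBM’s Python SDK; the sketch below is illustrative only, and the API key, service URLs, audio file, and voice name are placeholders rather than values used in this work.

```python
# Minimal sketch of the STT and TTS calls using IBM's Watson Python SDK.
# API key, service URLs, audio file, and voice name are placeholders.
from ibm_watson import SpeechToTextV1, TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator("YOUR_API_KEY")

stt = SpeechToTextV1(authenticator=authenticator)
stt.set_service_url("https://api.eu-de.speech-to-text.watson.cloud.ibm.com")

tts = TextToSpeechV1(authenticator=authenticator)
tts.set_service_url("https://api.eu-de.text-to-speech.watson.cloud.ibm.com")

# STT: transcribe a recorded user utterance (no Hot Phrase required).
with open("user_speech.wav", "rb") as audio:
    result = stt.recognize(audio=audio, content_type="audio/wav").get_result()
transcript = result["results"][0]["alternatives"][0]["transcript"]
print("User said:", transcript)

# TTS: synthesize a tailored supportive message decided by the reasoner.
message = "It seems cold outside. Remember to put on your jacket before leaving."
with open("message.wav", "wb") as out:
    out.write(tts.synthesize(message, accept="audio/wav",
                             voice="en-US_AllisonV3Voice").get_result().content)
```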

Fig. 16.5
STT and TTS processes within the Node-RED editor

6 Conclusion

The increase in the number of elderly people, and the consequent future increase in the occurrence of geriatric cognitive disorders, requires new AAL solutions. Therefore, this research work proposes the development of Cognitive Buildings through the exploitation of the DT paradigm.

The grounded multi-agent system architecture defines a model able to autonomously perform real-time high-level reasoning, allowing the detection of anomalies in daily scenarios and, consequently, offering support to the user. A major strength is the knowledge development applied here: raw data captured by multi-modal sensors (visual and non-visual) are reported in 3D in real time, and high-level reasoning is applied when anomalies are detected. AR is performed using a neural network model that leverages 3D data derived from the user’s pre-processed real-time 3D representation. The OOBN, in turn, can recognize wasteful, meaningless, or dangerous behaviors, environmental distress, changes in behavioral patterns, and serious medical situations or events, and then trigger specific services. Two-way voice interaction with the individual is performed by the dialog system, implemented in the Automation HUB agent based on Node-RED. A number of improvements can be addressed as future work of this study: implementing the MS-G3D model; learning the OOBN modules from data captured in a real-world AAL environment; and fully testing the entire pipeline end-to-end.