Introduction

An increasing number of Internet of Things (IoT) initiatives have been proposed in recent years to improve the quality of human life. These initiatives pose real-time challenges which have been the focus of many researchers and practitioners [1,2,3,4,5]. Indeed, IoT and big data sources can be found in a number of applications, e.g., smart homes [6, 7], smart buildings [8, 9], smart grids [10], transportation [11], healthcare [12], disaster management [13], the financial sector [14], retail management [15], and smart cities [16, 17]. IoT sensors can be deployed in a smart building environment to continuously monitor various environmental parameters, including smoke, parking lot usage, user comfort, energy consumption, waste management, and many others. The aim of this paper is to facilitate the use of analytics, and the handling of the concomitant large data sets, in smart buildings for their effective control and management.

Within a smart building, the number of sensors could range from a few hundred to several thousand. Big data analytics and machine learning techniques can only be effective if the data from sensors are effectively managed and made available for real-time analytics. This real-time ‘Big Data’ needs to be extracted and ingested into a centralized location from where it can be cleaned, transformed, analyzed, and visualized on demand or in real time [18] to obtain useful insights, make effective decisions, and eventually trigger alerts and actuate various controls in a smart building.

The strict definition of real time is that “an upper bound” on the response time actually exists [19]. We use the term ‘near real-time’ in this work, as there is an insignificant data processing delay involved when analyzing IoT sensor data [20]. Strictly speaking, near real-time can be defined as “in more than 95% of cases, an upper bound on the response time of 1 s will not be exceeded”. In the context of smart buildings, the challenges include responding to emergency situations in real time and, where possible, autonomously eliminating or mitigating them.

To deal with the challenges of real-time big data management and analytics in the smart building context, a coherent framework which incorporates a metamodel and a reference architecture is needed. This research is aimed at bridging this gap in the literature. The metamodel provides the list of essential elements in a smart building ecosystem and describes how these elements interact with each other. The reference architecture, on the other hand, provides an end-to-end blueprint to enable real-time management and analytics of the huge amounts of IoT data coming from various IoT sensors. It also intends to provide autonomous near real-time control of smart buildings by analyzing, monitoring, and controlling various facilities within them. The reference architecture and the metamodel are linked to each other through five contextual elements.

IoT sensors deployed inside a smart building gather useful information such as residents’ occupancy, oxygen levels, and luminosity levels, which helps manage and secure the smart building more efficiently. IoT is the core building block of today’s smart buildings, and it enables artificial intelligence and big data analytics for smart building operations. With IoT, data from various buildings can be observed, gathered, and analyzed, and the IoT sensors can be updated with the latest software from anywhere across the globe. This paper is an extension of our work presented in [21] and [22]. In [21], we presented the idea of the IBDMA (Integrated Big Data Management and Analytics) framework, which comprises a metamodel and a reference architecture. The proposed IBDMA framework is aimed at addressing two issues: (1) how to effectively manage and analyze data generated by IoT sensors deployed inside smart buildings, and (2) how to holistically identify all the elements, and the relationships between these elements, needed to effectively manage and analyze data in IoT-enabled smart buildings. The first issue is addressed by this paper, while the second issue has been addressed in [21]. There is currently no coherent framework which provides both a metamodel and a reference architecture to address the issues outlined above: the existing frameworks in the literature provide either a reference architecture or a metamodel, but none provides a coherent view of the two. The aim of the IBDMA framework is to enable developers to design smart buildings by providing, through the IBDMA metamodel, a comprehensive list of the components required in a smart building. The metamodel also supports converting an existing building into a smart building. The IBDMA reference architecture enables researchers and practitioners to manage and analyze IoT data in a smart building efficiently. The metamodel work has been presented in [21]. In this paper, we present the second component of the IBDMA framework: the IBDMA reference architecture. Its aim is to enable big data management and analytics within smart buildings while autonomously monitoring and controlling various facilities within the smart building.

The applicability of the IBDMA reference architecture is demonstrated by applying it to smart building experimental scenarios to monitor and control oxygen concentration levels, luminosity levels, smoke levels, parking lot spaces, and waste management, with a view to improving residents’ safety, health, and comfort. The IoT data are presented using multiple data visualization tools. We use the ARIMA (AutoRegressive Integrated Moving Average) [23] model to forecast IoT sensor values, and suggest that the reference architecture can be employed and extended in the machine learning domain by data scientists and machine learning practitioners. However, the choice of the ARIMA model, its fine tuning, and its evaluation are beyond the scope of this paper.

The rest of the paper is organized as follows: “Research Background and Related Work” presents the research background and related work. “Research Method” discusses the method used in this research. “The IBDMA Framework” presents the conceptual level IBDMA framework. “Reference Architecture Development Process” discusses the development and iterative process of the IBDMA reference architecture. “Reference Architecture Implementation” provides implementation details of the architecture. “Framework Evaluation Results” presents the evaluation details and the evaluation results. "Conclusions and Future Work" concludes and discusses future research directions.

Research Background and Related Work

Research Background

This section discusses the research background related to the development of the IBDMA framework. It elucidates the background knowledge required to understand the IBDMA framework. Since this research focuses on a reference architecture for IoT-enabled smart buildings, there are two important concepts to understand, which are explained below:

Internet of Things

The Internet of Things (IoT) refers to a network of interconnected devices with the ability to exchange information via the Internet [24]. IoT has become an increasingly popular topic of interest both in academia and industry. It includes everything from wearable devices, mobile phones, and heart monitor implants to any other type of sensor (oxygen, luminosity, garbage detection, etc.) with the ability to transfer data over the Internet. IoT applications can be found in many domains, ranging from precision agriculture, smart cities, smart buildings, and smart grids to healthcare, transportation, and many more.

IoT has seen tremendous growth in the past few years, and this growth rate is expected to increase in the upcoming years. It holds a lot of promise [25]. According to Ericsson [26], the number of interconnected objects was expected to rise above the 50 billion mark by 2020. IoT devices range from sensors used inside homes for home automation [27], sensors deployed in smart buildings [28], sensors installed in vehicles [29], and sensors inside warehouses [30] to sensors integrated into wearable devices [31], and many others [32, 33]. According to another report, given this tremendous pool of sensing devices, it is anticipated that the number of IoT devices will reach into the trillions [34] in the upcoming years. However, such an increase in the number of IoT devices will also increase the amount of data generated by these sensors. This increasing number of sensors and the huge amount of data raise new challenges and concerns for data management and analytics practitioners, scientists, researchers, and data architects. An IoT system comprises four high-level layers or blocks, as shown in Fig. 1. The perception layer consists of sensors and actuators: sensors gather data from the environment, and actuators are activated based on the data gathered from the sensors. The network layer enables sensors to connect to the Internet. The middleware layer consists of data storage and computing engines. The application layer consists of applications, including dashboards and reports.

Fig. 1
figure 1

IoT architecture [35]

In this research, we use five different types of sensors which monitor the environment, sense various parameters depending on the type of the sensor, and generate data at a specified frequency. The generated data are transmitted via TCP/IP (network layer) and stored in a central location (middleware layer). The data are analyzed (middleware layer) and reported using visualizations (application layer). The next section explains the data generated by IoT sensors in more detail.

Real-Time Big Data

The concept of real-time (or near real-time) data management and analytics (within the IoT paradigm) refers to capturing, storing, and analyzing data streams as soon as they are received from the IoT sensors. IoT sensors deployed in a smart building generate a lot of data at a high velocity, constituting big data. Big data management refers to the sourcing, storage, and distribution of data. Analytics refers to finding patterns and valuable insights using various analytical techniques and algorithms [36]. In today’s fast-moving digital age, businesses want to stay ahead of their competitors by focusing on the immediate implications of managing and analyzing real-time data. Real-time data management and analytics can be categorized into two types, On-Demand and Continuous [37], which are distinguished by reactive and proactive approaches [38], or pull and push [39]. On-Demand Real-Time Analytics is reactive: it waits for a user to initiate a query and then delivers the results. Continuous Real-Time Analytics, on the other hand, is proactive and keeps delivering analytics results to users in real time, as sketched below. Both types of real-time data analytics have their own use cases and can be used to provide valuable information and insights that enable a business to make effective decisions. In this research, we focus on both.
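To make the pull/push distinction concrete, the following minimal Python sketch contrasts an on-demand query over already-stored readings with a continuous handler invoked for every incoming reading. All names, the in-memory store, and the threshold are illustrative assumptions, not part of the IBDMA implementation.

```python
# Minimal sketch contrasting on-demand (pull) and continuous (push) analytics.
# All names here are illustrative; they are not part of the IBDMA implementation.

stored_readings = []  # stand-in for a central data store (e.g., HDFS)

def on_demand_average(sensor_id):
    """Pull style: compute a result only when a user asks for it."""
    values = [r["value"] for r in stored_readings if r["sensor_id"] == sensor_id]
    return sum(values) / len(values) if values else None

def continuous_handler(reading, threshold=14.0):
    """Push style: evaluate every reading as it arrives and emit alerts immediately."""
    stored_readings.append(reading)
    if reading["value"] < threshold:
        print(f"ALERT: sensor {reading['sensor_id']} below threshold: {reading['value']}")

# Example usage
continuous_handler({"sensor_id": "oxygen_3", "value": 13.2})  # push: alert fires at once
print(on_demand_average("oxygen_3"))                          # pull: computed when requested
```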

The combination of IoT and near real-time big data management and analytics is complex in nature, and there is a lack of an integrated and coherent framework. This research integrates a metamodel and a reference architecture to address the real-time big data management and analytics challenges in smart buildings. It does so by proposing the IBDMA framework’s reference architecture using the well-known DSR [40] method. The IBDMA framework has five key contextual elements: people, process, technology, information, and facility. The metamodel and the reference architecture are connected through these five key elements.

Related Work

Increasing interest in the areas of IoT, big data management, and big data analytics has been observed in recent years, both in academia and industry. In [41], the authors put forward different big data analytics techniques and specifically discuss Apache Spark in the context of smart grid big data. In [42], the authors propose a real-time semantic annotation reference architecture for Smart City IoT streaming applications; their work provides a foundation for the development of a comprehensive framework that could be useful in improving the performance capability of a smart city. In another recent study [36], an IoT-based wireless sensing and monitoring platform has been proposed to detect environmental conditions in the context of building automation. That work does not, however, discuss real-time data analytics for IoT-enabled smart environments.

In [43], the authors discuss the advantages of sensing and analyzing big data from various sensors in a smart city. The paper provides a conceptual overview of, and the advantages of applying, big data techniques to IoT data coming from sensors deployed in a smart city. It also highlights the difference between static and mobile data sources and proposes suitable data extraction techniques for both types of data sources. Similarly, in [44], the authors discuss different big data analytics techniques suitable for a smart city scenario, along with some of the major challenges in the data analytics process for smart city data. In [45], the authors survey IoT, cloud computing, big data, and sensor technologies with the aim of finding their common operations and combining them; they then propose new methods to collect and manage sensor data in a smart building. In [46], the authors present a distributed system for storing and processing building data; based on big data technologies, the platform enables new potential in terms of data analytics for smart building applications. These papers, however, do not provide concrete guidelines or an implementation architecture, as discussed in this paper, showing which components can be integrated, and how, to design real-time data management and analytics architectures and solutions.

In [47], the authors propose a social media data analytics platform that uses Twitter posts to improve the smart city experience for the residents of the city. It aims to improve the residents’ experience by analyzing real-time Twitter posts, and the overall results suggest that the platform can help improve the effective management of a smart city. This paper, however, only takes into account residents’ sentiments and Twitter posts, without considering IoT sensor data or the actuation of controls. An IoT-based system has been implemented in [48], which focuses on obtaining real-time sensor data from IoT sensors deployed in a smart city and performing real-time analytics on them. A practical demonstration using the Hadoop ecosystem has been presented in the paper. It does not, however, address smart city control or the end-to-end data management scenario.

In [49], the authors present a scalable architecture for ingesting and analyzing IoT data called the hut architecture. It utilizes historical data analysis to provide context for real-time analysis. The applicability of the architecture is demonstrated using two real-world smart city scenarios in transportation and energy management.

In [50], the authors present an initial version of a big data analytical framework for Internet of Things and smart city applications. The work demonstrates how such a framework can be used by presenting a case study in the smart grid domain. However, the framework is a high-level, initial version addressing only some of the volume and velocity challenges. The implementation details around the use of tools and the data ingestion pipelines are not made clear. Moreover, the results obtained from the analytics have not been used to autonomously control the smart city or smart grid based on the received data.

In [51], the authors presented a big data architecture for the smart supply chain field. Data were ingested into Hadoop, and machine learning models were used to address data-related challenges in supply chains. However, the paper lacks detail in explaining the significance of the work and how it can be applied in real-life scenarios. It presents only the architecture of an IoT pipeline, which is missing the near real-time visualizations presented in our work. More importantly, it lacks the framework we have developed with five elements (people, process, technology, information, and facility) and an account of how these elements relate to the underlying big data management architecture and to smart buildings.

In [52], the authors presented the use of machine intelligence and data analytics algorithms on data acquired from the sensing networks integral to smart city applications. However, the paper presents only a very high-level architecture and lacks both a conceptual framework and the implementation details of how the authors performed their experiments and evaluation. It is a generic paper listing some of the machine learning and data analytics challenges in smart cities.

In [53], the authors explore the IoT issues in smart buildings and compare two network protocols used for IoT devices to improve energy efficiency in smart buildings. However, this paper lacks a focus on the big data management and analytics of IoT data in smart buildings. It also lacks the discussion on the control and management of various controls and facilities within the smart building.

In [54], the authors present a technique for facial recognition in smart cities, but the paper lacks a reference architecture for smart buildings. Kuma et al. [55] discuss IoT applications and challenges in various domains; the work highlights big data analytics as a key challenge but does not focus on it. Kuma et al. [56] present an IoT-based fog computing model; it lacks a focus on big data management and analytics for IoT data and does not address the smart building domain.

In [57], the authors focus on service-oriented architecture and the networking layer; they do not address big data management and analytics, near real-time visualization, or the autonomous control of smart buildings. In [58], the authors focus primarily on the networking layer, identifying the available protocols and comparing those communication protocols. The paper identifies future challenges, but it does not mention data management and analytics in smart buildings; it is a survey paper which does not specify any framework or reference architecture like the IBDMA, and it also lacks a discussion of real-time analytics and control of smart buildings. In [59], the authors suggest a big data mining IoT system. It does not focus on autonomous control of facilities in general, or smart building controls in particular; it mentions generic data gathering systems, but details about the implementation and evaluation of the suggested system are missing. In [60], Gubbi et al. present the architectural elements of the IoT paradigm, but lack a focus on (1) data management and analytics, (2) near real-time analysis, (3) near real-time visualization, and (4) near real-time control of facilities within a smart building. The work also lacks the integration of the architecture with a metamodel, which the IBDMA framework provides.

In [24], the authors present a high-level IoT architecture. It does not provide any details about big data management and analytics, near real-time visualization, or the autonomous control of smart buildings. In [61], the authors present a high-level IoT architecture and the challenges faced in the IoT domain. It lacks autonomous control of facilities in general and smart building controls in particular; it is a survey paper which does not specify any framework or reference architecture like the IBDMA, and it also lacks a discussion of real-time analytics and control of smart buildings. In [21], the IBDMA framework’s first component, i.e., the metamodel, is presented. The metamodel presents the key elements, and the relationships between these elements, required in the big data management and analytics ecosystem for smart buildings. However, that paper does not present the reference architecture, and hence, this research focuses on the second component of the framework, i.e., the reference architecture.

Table 1 summarizes the research gaps and the corresponding studies where we observed the research gap.

Table 1 Research gap

Based on the literature review and Table 1, it is evident that the literature lacks a focus on real-time big data management and analytics, on the integration of a reference architecture and a metamodel, and on real-life validation scenarios in the smart building context; hence, there is an urgent need for a vendor-independent, practical, research-based, integrated, and comprehensive framework for IoT real-time big data management and analytics. In this research, we present the IBDMA framework as an attempt to fill this research gap. This sets a foundation for more studies in this important area of research.

Research Method

This research adopts the DSR approach [62]. DSR proposes a practical research approach supporting the creation of artifacts to solve real-life problems [63]. DSR encompasses the formation of new knowledge through the design of novel artifacts, and it involves the performance analysis of such artifacts to understand and improve the behavioral aspects of IS (Information Systems) [64]. These artifacts may include algorithms, system design methodologies, and human/computer interfaces. DSR researchers can be found in various domains, e.g., engineering, information systems, and computer science. In [65], DSR activities are described for the IS discipline using a conceptual framework. We adopt the guidelines from [63, 65] in conducting this research, as shown in Table 2. The rationale behind using the DSR approach for this research is threefold: first, the research involves incremental development of the reference architecture; second, DSR focuses on solving real-world problems, and in this research, we address the challenge of big data management and analytics by developing a reference architecture; third, DSR tries to address the gap between theory and practice, and in this research, we are trying to address the gap in the area of BDMA for IoT-enabled smart buildings.

Table 2 DSR guidelines [63, 65]

For assessing the quality of our DSR, we follow the checklist questionnaire, also from [65]. The checklist is presented in Table 3.

Table 3 Checklist to assess design science research [65]

We adopt the ‘Three Cycle’ DSR framework from [66] for conducting our research. The Relevance Cycle links the contextual environment with the DSR activities. The Rigor Cycle bridges the DSR activities with the knowledge base of scientific theories and methods that inform the research. The centrally located Design Cycle continuously iterates between the development and evaluation elements of the DSR. The three cycles mentioned above must exist in a DSR project and must be clearly distinguished from each other. Following these research cycles and the checklist questions of Table 3, we first identified the research question. Then, we defined the artifacts and the design processes that would be used to build those artifacts. The literature was reviewed to determine whether the knowledge base provides support for the artifact design. The artifacts were then designed, and an evaluation method was proposed to test them. Finally, the research is communicated in the form of publications. Figure 2 represents these three research cycles, with each checklist question from Table 3 mapped onto the appropriate phase of the three-cycle DSR approach.

Fig. 2
figure 2

Checklist questions mapped to three DSR cycles [65]

There are five main steps involved in the DSR, as shown in Fig. 3. In the first step, the problem is identified by finding the research gap. We performed a literature review, which helped us identify the research gap and, eventually, the problem. We then proposed the design of the IBDMA framework, which consists of a metamodel and a reference architecture, to bridge the research gap. The IBDMA framework components (metamodel and reference architecture) were designed and developed to address big data management and analytics challenges in smart buildings by providing a holistic view of all the elements required in a smart building ecosystem. The IBDMA framework components were then evaluated, and the evaluation results are presented as outcomes of the research. As mentioned earlier, the metamodel development and evaluation have been published in [21]. This paper focuses on the IBDMA reference architecture, which is the second component of the IBDMA framework. Following the DSR approach, we went through five iterations of the IBDMA reference architecture before arriving at the final reference architecture, as previous iterations did not satisfy the evaluation criteria. The details of all five iterations, and of the development and evaluation processes, are explained in the next sections.

Fig. 3
figure 3

DSR steps

The framework we have developed has been evaluated against the ECs (evaluation criteria) shown in Fig. 4. In [67], the authors present artifact evaluation criteria for design science research. We shortlisted 13 of the 20 evaluation criteria from [67] based on their relevance to our research, as some of the criteria were not applicable to our research or could not be evaluated concretely (such as style, homomorphism, and level of detail). For this research, we thus chose 13 ECs, i.e., EC1, EC2, EC3, …, EC13, for evaluating the IBDMA framework and reference architecture. The details of these ECs are presented in Fig. 4. These ECs were evaluated against our research objectives of:

  • Ability to have both batch and streaming analytics.

  • Ability to do near real-time analytics and visualization.

  • Ability to autonomously control facilities within smart building.

  • Ability to provide building management and relevant authorities with alerts in near real time.

  • Ability to scale up for any building size and for other smart environments.

  • Ability to provide a comprehensive framework comprising a reference architecture and a metamodel.

  • Ability to validate the proposed design using real-life scenarios.

Fig. 4
figure 4

ECs for IBDMA evaluation (adopted from [67])

The evaluation results are presented in “Framework Evaluation Results” in Table 11.

Table 4 provides details of the research objectives and how they are linked to the research gaps and research aims.

Table 4 Research objectives linked to research gaps and research aims

The IBDMA Framework

The proposed IBDMA framework, as shown in Fig. 5, consists of two components. This paper specifically covers the reference architecture component; the metamodel was developed earlier and is presented in [21]. The reference architecture is encircled in Fig. 5 to indicate that it is the focus of this research. With the reference architecture, the IBDMA framework will assist professionals and researchers working in the big data and IoT domains in the smart building context. Figure 5 shows that the context of the research is limited to smart buildings, and that the IBDMA framework (applicable within the smart building) has two components: (1) the metamodel and (2) the reference architecture. The metamodel and the reference architecture are linked to each other through the contextual elements.

Fig. 5
figure 5

IBDMA framework and scope of the paper

The IBDMA framework as adopted from [68,69,70,71] has five main contextual elements, as shown in Fig. 6.

Fig. 6
figure 6

IBDMA framework—contextual elements

The IBDMA framework will help building developers, and IoT and big data professionals, to have a holistic view of which elements they need to deploy when designing a smart building or converting an existing building into a smart building. The metamodel helps identify those elements and the relationships between them, while the reference architecture provides the ability to link those elements to the physical design of the end-to-end data management, analysis, and process flow.

The reference architecture (which is the focus of this paper) is novel in that it provides a scalable architecture focused on big data management and analytics for smart buildings which can be extended to other smart environments (smart homes, smart grids, and smart cities). The reference architecture also provides the ability to mitigate risks and improve residents’ experience in smart buildings by autonomously controlling various facilities within the building.

The integration of the metamodel and the reference architecture is important to provide a holistic view of the IoT-enabled smart building ecosystem for big data management and analytics of IoT data. This integration not only assists in developing new smart buildings but also provides the ability to transform an existing building into a smart building by enabling big data management and analytics for IoT data. In the initial phase of the transformation, the metamodel is used by the building developers, architects, and administrators, along with IoT experts, to identify which elements they need and to understand the relationships between these elements. Once this has been achieved, the big data developers and architects can utilize the reference architecture to implement the big data management and analytics processes in the smart building ecosystem.

As shown in Fig. 6, the ‘People’ element is the core IBDMA element. This includes the ‘residents’, the ‘policy-makers’, and the developers of the smart buildings. The ‘policy-makers’ make policies which enable and govern smart building ecosystems. The ‘developers’ develop the smart buildings adhering to the policies compiled by the ‘policy-makers’. The ‘residents’ may include students, staff, home owners, shop owners, etc.; these are the beneficiaries of the smart building ecosystem. The ‘developers’ and ‘policy-makers’ make policies which help identify the ‘Process’ element of IBDMA. The ‘Process’ element encompasses all processes which are required for the effective management and analysis of smart building data. The ‘Technology’ element consists of the technology stack that supports the processes defined by the ‘Process’ element of the framework. The overlap of these elements results in useful information, which makes up the fourth element of IBDMA, known as ‘Information’. The information is then autonomously used to control various facilities within smart buildings, which fall under the ‘Facility’ element of IBDMA. The ‘Process’ element links all other elements, as shown in Fig. 6. The way these different elements are linked and interact is explained in the upcoming sub-sections.

People

The ‘People’ element is the first element of IBDMA. It includes policy-makers, developers, and building residents, as shown in Fig. 7. These can be broken down into two groups, one consisting of policy-makers and developers, and another consisting of building residents.

Fig. 7
figure 7

The first element of IBDMA—people

Policy-Makers and Developers

The policy-makers define the policies which govern the smart building. These policies capture the key requirements of the stakeholders and help propose the optimum solution to meet the expectations of the stakeholders and residents.

The developers include the building developers who develop the smart building in line with the policies and regulations. Their role is to ensure the safety, security, and comfort of the residents of the building.

Residents

The residents of the building may include students, staff, tenants, homeowners, shopkeepers, etc., depending on the nature of the smart building. They are the users, or beneficiaries, of the smart building ecosystem.

IBDMA proposes that, based on the policies defined by the policy-makers, the “processes” required for the effective execution of these policies are defined. These “processes” help identify the “technology” stack required for their effective execution. This includes the tools and software applications required for executing the “processes”, e.g., Microsoft Power BI [72] and Tableau [73] for data visualization, Apache Flume [74] for data ingestion, and Apache Spark [75] for data analysis. The applicability and usability of each of these tools and the process elements are explained in detail in the next sections.

Policy-makers and developers (people) clearly articulate the requirements for the smart building. Processes are then identified and executed; this includes ingesting, storing, and analyzing IoT data. Hence, the ‘Process’ element is the second element that follows when applying the framework.

Process

“Process” is the second element of the IBDMA framework. It performs a key role in defining the strategy for the implementation of the IBDMA framework. Processes outline the operations and how various operations should be integrated for a concise and effective solution. Hence, to have an effective solution, the processes defined in IBDMA should be transparent and streamlined.

The goals, requirements, and policies defined by the “people” serve as input to defining the processes and form the basis for choosing and implementing the “processes”. Since this research focuses on ingesting, managing, and analyzing data generated by IoT sensors deployed in smart buildings, the IBDMA framework proposes the following processes to be implemented: monitoring of the smart building environment, sourcing of IoT data, ingestion of data, storing of data at a centralized location, analyzing data in near real time, decision-making, visualizing data in near real time, and autonomously controlling various smart facilities in the smart building in near real time, as presented in Fig. 8. The processes depicted in Fig. 8 represent the data flow and hence are sequential (monitoring, data sourcing, ingestion, analysis, decision-making, and actuation). There may be several facilities that could be controlled autonomously within the smart building using the IBDMA framework. However, to keep the scope of this research realistic, we consider five facilities: managing oxygen levels, luminosity levels, garbage, parking, and fire.

Fig. 8
figure 8

The second element of IBDMA—process

Monitoring

In all IoT systems, the first process in implementing a big data management and analytics infrastructure is the ‘monitoring’ of the environment in which the IoT sensors are deployed, which for this research is the smart building. There are various types of IoT sensors available these days that could be deployed to monitor various parameters and attributes of the smart buildings depending on the use cases and requirements of the residents and stakeholders.

Data Sourcing

On monitoring the environment in which they are deployed, these sensors generate data. The output from these sensors could be binary or continuous depending on the nature and type of the IoT sensors.

Data Ingestion

The data generated from these sensors are then ingested into a centralized repository using an ingestion pipeline.
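As a simple illustration of such an ingestion pipeline, the Python sketch below pushes one sensor reading per line over TCP towards a collector. It assumes a Flume agent with a netcat-style source listening on a local port; the host, port, and record fields are illustrative placeholders, not the exact configuration used in this research.

```python
import json
import socket
import time

# Minimal sketch of pushing sensor readings into an ingestion pipeline.
# Assumes a collector (e.g., a Flume netcat-style source) listens on this host/port;
# the address, port, and record fields are illustrative, not the paper's exact setup.
FLUME_HOST, FLUME_PORT = "localhost", 44444

def send_reading(sensor_id, value):
    record = json.dumps({"sensor_id": sensor_id, "value": value, "ts": time.time()})
    with socket.create_connection((FLUME_HOST, FLUME_PORT)) as sock:
        sock.sendall((record + "\n").encode("utf-8"))  # one event per line

if __name__ == "__main__":
    send_reading("oxygen_12", 14.7)
```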

Data Storage

The centralized repository is where the data are stored for cleaning, manipulation, and further processing. Once the data are at a centralized location, they are made ready for analysis.

Data Analytics

The nature of the data analysis is dependent on the particular use case or the requirements of the stakeholders. The analysis process enables us to obtain useful insights about the smart building.

Decision-Making

The output of the analysis helps us in decision-making to manage and control the smart building. The decision-making process involves deciding whether to activate or deactivate controls in the smart building based on the data received from the IoT sensors, as sketched below. The decision-making processes for this research include: (1) deciding whether an HVAC system needs to be turned ON or OFF based on the oxygen levels in the building, (2) deciding whether the lights need to be turned ON or OFF based on the luminosity levels in the building, (3) deciding whether a fire extinguisher and a fire alarm need to be triggered if the smoke levels are above a given threshold, (4) deciding whether the garbage bins need to be emptied if the bins are filled above a threshold level, and (5) deciding whether the parking lot is full and incoming vehicles need to be directed to another parking lot.
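A minimal Python sketch of such threshold-based decision rules is given below. The oxygen threshold of 14 follows the value given later in the paper; all other thresholds, field names, and actuator messages are illustrative assumptions rather than the exact values used in the implementation.

```python
# Minimal sketch of the threshold-based decision rules described above.
# The oxygen threshold of 14 follows the value given later in the paper;
# all other thresholds and actuator names are illustrative assumptions.

THRESHOLDS = {"oxygen": 14.0, "luminosity": 300.0, "smoke": 0.5, "garbage": 0.8}

def decide(sensor_type, location, value):
    """Return the actuation decision for a single sensor reading."""
    if sensor_type == "oxygen" and value < THRESHOLDS["oxygen"]:
        return f"HVAC {location} turned ON"
    if sensor_type == "luminosity" and value < THRESHOLDS["luminosity"]:
        return f"Lights at {location} turned ON"
    if sensor_type == "smoke" and value > THRESHOLDS["smoke"]:
        return f"Fire alarm {location} turned ON"
    if sensor_type == "garbage" and value > THRESHOLDS["garbage"]:
        return f"Garbage bin at {location} is full"
    if sensor_type == "parking" and value == 1:
        return f"Car parked at {location}"
    return f"{sensor_type} at {location} ok"

print(decide("oxygen", "room_3", 13.2))  # -> "HVAC room_3 turned ON"
```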

Actuation/Control

Based on the data analytics results and the decision-making process, the smart building controls are actuated in an autonomous manner, so that the building can be managed effectively.

All these different processes, from monitoring to ingestion, from storage to analysis, and from decision-making to autonomous control of the smart building, fall under the ‘Process’ element of IBDMA, which performs the core function of integrating all the elements of IBDMA, as shown in Fig. 6. A more detailed implementation is presented in “Reference Architecture Implementation”, where it will become more evident how the different elements of IBDMA interact with each other to form an effective solution.

Once the processes are defined following the requirements compiled by the ‘people’, the implementation of the ‘processes’ requires a ‘technology’ stack. Choosing the right tools and software packages is imperative to the success of an effective solution. Within IBDMA, these tools, technologies, and software packages fall under the ‘Technology’ element, which is hence the third element to be discussed.

Technology

‘Technology’ is the third element of the framework. It has a pivotal role in the effective implementation of the big data management infrastructure and strategy. Hence, choosing the right technology stack is imperative. Technology includes the tools and software packages deployed for effectively designing and deploying IBDMA, as shown in Fig. 9. In general, the technology stack would include a data ingestion tool, a data storage tool, a data visualization tool, and a near real-time data analysis tool. However, for implementation and evaluation purposes, details about the specific tools used are provided below.

Fig. 9
figure 9

The third element of IBDMA—technology

IoT Devices/IoT Application

For this research, in the initial two iterations, we used physical IoT devices for the data analytics and building control. The details of these iterations and the use of physical IoT devices are presented in the IBDMA reference architecture development process section (see iteration 1 and iteration 2 details). However, to have a scalable reference architecture which can take into account hundreds or thousands of sensors, we implemented a virtual IoT sensor-based application in the Python programming language, using PyCharm (a Python IDE) for the development. This application simulated data generation from IoT sensors deployed in smart buildings, along the lines of the sketch below. Iterations 3, 4, and 5 present details of the IoT application.
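The sketch below outlines how such a virtual sensor application can generate readings at a fixed interval. The sensor types follow those used in this research, but the value ranges, identifiers, sensor counts, and emission interval are assumptions for illustration; the actual application may differ.

```python
import random
import time

# Illustrative sketch of a virtual IoT sensor application.
# Value ranges, counts, and the emit interval are assumptions for demonstration;
# the actual application used in this research may differ.
SENSOR_RANGES = {
    "oxygen": (10.0, 21.0),       # % concentration
    "smoke": (0.0, 1.0),          # normalized smoke level
    "luminosity": (0.0, 1000.0),  # lux
    "parking": (0, 1),            # 0 = free, 1 = occupied
    "garbage": (0.0, 1.0),        # bin fill level
}

def generate_reading(sensor_type, sensor_id):
    low, high = SENSOR_RANGES[sensor_type]
    value = random.randint(low, high) if sensor_type == "parking" else round(random.uniform(low, high), 2)
    return {"sensor_id": f"{sensor_type}_{sensor_id}", "value": value}

if __name__ == "__main__":
    while True:
        for sensor_type in SENSOR_RANGES:
            for sensor_id in range(1, 6):                        # e.g., five sensors of each type
                print(generate_reading(sensor_type, sensor_id))  # or send to the ingestion pipeline
        time.sleep(5)                                            # emit a new batch every few seconds
```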

HDFS

The data generated by these sensors are stored in HDFS (Hadoop Distributed File System), a high-performance distributed file system that provides reliable data access across Hadoop clusters.

Apache Flume

The data generated by the sensors are ingested into HDFS using data pipelines developed with Apache Flume, a reliable data collection, aggregation, and transportation tool for ingesting huge amounts of batched and streaming data (including logs, IoT data, financial data, etc.) and moving them to a centralized location. Flume is fault tolerant, provides failover and recovery mechanisms, and uses a simple, extensible data model that allows for online analytical applications.

Apache Spark

Apache Spark is used to analyze the data generated by the IoT sensors. Apache Spark is an in-memory data processing engine which provides fast data analysis and processing capabilities for various streaming and batched applications. Its architecture is based on the Resilient Distributed Dataset (RDD), which provides a fault-tolerant way of maintaining multisets of data items distributed over a cluster of machines. For this research, we use Python to write the Spark code for data analysis. The analysis supports decision-making and, in turn, enables the system to control and maintain the smart building and its various facilities autonomously. The aim of this autonomy is to make the smart building comfortable and secure for the building residents.
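A minimal PySpark sketch of this kind of analysis is shown below. The HDFS path, the schema, and the output message are illustrative assumptions (only the oxygen threshold of 14 comes from the paper), and the actual Spark code used in this research is more involved.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal PySpark sketch: read sensor readings from HDFS and flag low-oxygen locations.
# The HDFS path and column names are illustrative assumptions.
spark = SparkSession.builder.appName("iot-analysis-sketch").getOrCreate()

readings = spark.read.csv(
    "hdfs:///iot/sensors/oxygen/",      # assumed ingestion directory
    schema="sensor_id STRING, value DOUBLE",
)

low_oxygen = readings.filter(F.col("value") < 14)  # oxygen threshold used in the paper

for row in low_oxygen.collect():
    # In the implementation, this is where the actuation decision would be triggered.
    print(f"HVAC {row['sensor_id']} turned ON")

spark.stop()
```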

Power BI

For visualization, Microsoft Power BI is used. Power BI comes with a built-in connector to HDFS and allows R and Python code to be written to perform predictive analytics within its environment. Hence, it was a natural choice as the data visualization tool for this research. However, like any other tool, Power BI has some limitations, and it is hard to build a near real-time dashboard in Power BI.

Elasticsearch and Kibana

Since this research works with IoT sensor data, we needed the capability to present the data in a near real-time dashboard, to gain greater insight when monitoring the smart building environment so that any alarms can be addressed in near real time. To have this near real-time visualization capability, we chose Elasticsearch [76] and Kibana [77]. Elasticsearch is an open-source tool built on Apache Lucene [78] which provides a distributed search and analytics engine. New incoming data are stored as documents in Elasticsearch using either its API or an ingestion tool such as Logstash [79]. Elasticsearch receives and stores incoming data and adds a searchable reference to each document in the cluster’s index. These documents can then be searched and retrieved using the Elasticsearch API. For data visualization, Elasticsearch provides an open-source plug-in called Kibana, which provides near real-time visualization capabilities over the documents in an Elasticsearch cluster.
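As a brief illustration, the sketch below indexes a single sensor reading as a document using the official elasticsearch Python client (assuming a local cluster and the 8.x client; the index name and document fields are illustrative). Once indexed, such documents can be visualized in a Kibana dashboard.

```python
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

# Minimal sketch: index one IoT reading as an Elasticsearch document.
# Assumes a local Elasticsearch node and the 8.x Python client;
# the index name and document fields are illustrative.
es = Elasticsearch("http://localhost:9200")

doc = {
    "sensor_id": "luminosity_7",
    "value": 240.5,
    "timestamp": datetime.now(timezone.utc).isoformat(),
}

resp = es.index(index="iot-sensors", document=doc)
print(resp["result"])  # e.g., "created"
```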

Information

‘Information’, which is the fourth element of IBDMA, originates from the overlap of the first three elements shown in Fig. 6. As discussed in the previous sections, ‘people’ outline the policies and requirements for the smart buildings. Based on these policies, processes are identified, which in turn define the ‘technology’ stack for their implementation. Information is generated from the data produced by the IoT sensors once the processes and the technology infrastructure are deployed successfully. Various forms and types of information can be obtained from the data generated by the IoT sensors, enabling us to control various parameters and aspects of the smart buildings for improved residents’ comfort and safety.

Information for Building Control

There could be numerous facilities within the smart building that could be autonomously controlled as a result of the information generated, including smart lighting based on luminosity levels, smart parking based on parking sensors, elevator operation, the HVAC system, vending machine operations, and many others. However, to keep the scope of this research limited, IBDMA proposes the autonomous monitoring and control of five facilities: the HVAC system, luminosity levels, parking management, garbage management, and fire incident management. One example scenario: if the luminosity sensor indicates that the luminosity levels at a particular location of the building are below a certain level, IBDMA proposes that this ‘information’ be used to improve the luminosity levels by turning on the lights at that location.

Generally, IBDMA proposes that the ‘information’ element also includes the visualization of the IoT data in Power BI and Kibana, the analysis results generated by Apache Spark, and the results of the autonomous control of facilities within the smart building (prescriptive analytics), as shown in Fig. 10. The Spark program analyzes the data received from the IoT sensors and, based on the received data, decides what action to take. For instance, if a smoke detection sensor sends a value indicating there is a fire in a certain location of the smart building, the Spark program will trigger the fire alarm and the fire extinguisher at that location. More details and specific use cases are provided in the next sections. As mentioned earlier, the “processes” integrate all the elements of the framework; hence, the “processes” connecting the ‘information’ element to the rest of the IBDMA elements include ‘data visualization’, ‘data analysis’, and ‘decision-making’, as represented in Fig. 17.

Fig. 10
figure 10

The fourth element of IBDMA—Information

Facility

‘Facility’ is the final element of the framework. It includes the numerous facilities of the smart building that are aimed at enhancing residents’ comfort, safety, and security. The ‘information’ generated from the IoT data enables the autonomous control of these ‘facilities’, as shown in Fig. 6. The facilities may include, but are not limited to, elevator maintenance and management, the HVAC system, and garbage management. For this research, we consider the following five facilities: the HVAC system, smart parking, smart garbage management, smart lighting, and smart fire management, as presented in Fig. 11. The target facilities that need to be controlled in the smart building should be identified and considered before defining the policies and requirements of the smart building ecosystem. This reduces the possibility of encountering major roadblocks when deploying the infrastructure for big data management of IoT-enabled smart buildings.

Fig. 11
figure 11

The fifth element of IBDMA—facility

As discussed earlier, the ‘process’ element of IBDMA integrates all the elements of IBDMA; the process that integrates and controls the facilities of the smart building, based on the information generated by the system, is called ‘action’. The action process enables the autonomous control and management of the five facilities that fall under the scope of this research, as represented in Fig. 17.

Reference Architecture Development Process

The development of the IBDMA reference architecture was carried out in five iterations, ultimately reaching its current final state (iteration 5). In this section, we also provide details of how the contextual-level elements of IBDMA are related to the physical-level elements; this is demonstrated in iterations 4 and 5 in the sub-sections below.

The five iterations for the development of the Big Data and Analytics architecture, for our use case, are discussed in detail below:

  • Iteration 1 (Physical)

  • Iteration 2 (Physical, real time)

  • Iteration 3 (Virtual, smart building)

  • Iteration 4 (Virtual, smart building, improved)

  • Iteration 5 (Virtual, smart building, improved and finalized)

Iteration 1: In the first iteration, we extracted UTS (University of Technology Sydney) sensor data from UTS Building 11 and imported it into RStudio by utilizing web-scraping techniques. The data are publicly available on UTS’s web portal. These data were graphed and plotted for various sensors, and predictions were made about the sensor data using the ARIMA (AutoRegressive Integrated Moving Average) [23] model. The graphs and ARIMA model were implemented in RStudio using R. This exercise was done to become familiar with the sensor types and the data available. It also helped us familiarize ourselves with the tools and techniques that we could use for future research at that time. Figure 12 demonstrates the components and the steps taken in Iteration 1.

Fig. 12
figure 12

Iteration 1—initial architecture design (Analysis of UTS Building 11 sensor data)

Iteration 2: In the second iteration, we prototyped a physical system consisting of an Arduino microprocessor board, physical sensors, and a linear actuator. The data were sent to HDFS (Hadoop Distributed File System) for storage, from where they were imported into RStudio for predictive analytics. The problem with Iteration 1 was that only batched data were available. Our focus was on both batched and real-time data, so we decided to prototype a system with a couple of physical sensors connected to a microprocessor. The sensors considered for this iteration were temperature and smoke detection sensors, which produced real-time data at regular intervals. A linear actuator was also connected to the system to simulate the behavior of a fire extinguisher scenario. The sensors generated the data in real time, the data were stored in HDFS, and from HDFS we could perform predictive analytics as well as visualize the data in Tableau [80, 81]. The data generated from the sensors were also analyzed in real time as they were generated. If the values generated by the temperature sensor and the smoke detection sensor went above the threshold (simulating a fire scenario), the linear actuator was activated for 5 s, simulating that the fire extinguisher had been activated to suppress the fire. The linear actuator switched off once the sensors read values below the threshold. The steps and processes followed in Iteration 2 are shown in Fig. 13.

Fig. 13
figure 13

Iteration 2—real-time data analysis and actuation using physical devices

Iteration 3: In the third iteration, we focused on scaling up the architecture developed in the second iteration. For this iteration, we considered a smart building application scenario by introducing big data pipelining, storage, and analysis tools. It was not possible to have access to a large number of physical IoT sensors and actuators in a lab environment, and thus we decided to virtualize the IoT sensors by simulating the sensor data. Similarly, we simulated (virtualized) the actions taken based on the data received from the virtual sensors. This work has been published in [22]. The architecture developed for this iteration is shown in Fig. 14.

Fig. 14
figure 14

Iteration 3—real-time data management, analysis and actuation for smart building

Data Sourcing

For this iteration, we virtualized the data generation from fifteen virtual sensors using a Python application. These 15 sensors comprise five (IoT) oxygen sensors, five smoke detection sensors, and five luminosity sensors deployed in a smart building. The 15 sensors are assumed to be deployed at five different locations (e.g., different rooms or floors) of the smart building, in such a way that each location has a set of the three different sensor types, i.e., oxygen, smoke, and luminosity.

Data Ingestion and Storage

The data generated by these IoT sensors (source) are ingested into HDFS (sink) using Apache Flume over a TCP (Transmission Control Protocol) port. For the implementation, we made use of the Cloudera [82, 83] big data platform (a virtual machine for the Apache Hadoop environment) for extraction, ingestion, data pipelining, storage, and analysis of the data. For ingesting data into HDFS, Flume was the tool of choice because of its robust integration with HDFS compared to Kafka [84]. MQTT is a widely used protocol for IoT data; however, it is primarily a machine-to-machine protocol for transferring data between two physical systems. Since our goal is to move data to HDFS, we use Flume for data ingestion. There are a number of other tools available, including Apache Beam, Apache Flink, Apache Storm, Apache NiFi, and Apache Ignite, that can be used for streaming data analysis and event processing. However, for the purpose of this research and the proof-of-concept prototype, we used Flume to ingest data and Apache Spark for its analysis.

Data Analysis and Building Control

For the analysis of the data to enable decision-making, we developed an Apache Spark algorithm using PySpark [20]. The algorithm reads and analyzes the data from the three different types of IoT sensors stored in HDFS in near real time to enable effective decision-making. For instance, if the oxygen sensors generate data indicating a low oxygen concentration in a given location of the smart building, the Spark algorithm in turn turns the HVAC system ON to ensure that comfortable oxygen concentration levels are attained in that location. The system represents this by outputting “HVAC X turned ON” on the Cloudera terminal, where X represents the room or floor in the smart building. For turning the HVAC system ON or OFF, we defined an oxygen concentration threshold value of 14. If the oxygen concentration levels are above the threshold, the deployed infrastructure outputs “Oxygen level at X ok” on the Cloudera terminal, indicating that the oxygen concentration level at that particular location is above the comfortable threshold and that no further action is required to enable or disable the HVAC system. If the HVAC system had been turned ON due to low oxygen concentration levels and the levels have returned to normal, the system turns the HVAC system OFF again. Similarly, if a smoke detection sensor reports values above its threshold, the system triggers the fire alarm and represents this by outputting “Fire alarm X turned ON”, where X represents the room or level where smoke is detected.

Similarly, if during the data analytics process a particular luminosity sensor detects luminosity levels lower than the minimum, the system turns on the lights located at that location. This is represented by the system displaying “Lights at X turned ON” on the Cloudera terminal, where X represents the particular room or level of the smart building.

Iteration 4: In iteration 4, we improved and extended the architecture to conceptualize the elements in terms of people, process, technology, information, and facility, linking the contextual elements (Fig. 6) with the physical-layer components (Fig. 14). The architecture for iteration 4 is shown in Fig. 15. The architecture developed in iteration 3 was scaled up and tested for a smart building application scenario by considering 1000 virtual IoT sensors.

Fig. 15
figure 15

Iteration 4—updated real-time data analysis and actuation architecture

Data Sourcing and Ingestion

For this iteration, we considered 200 of each of the five different types of IoT sensors, which include oxygen sensors, smoke detection sensors, light sensors, parking space sensors, and garbage detection sensors.

Data Storage

Ten Flume agents were configured with IoT sensor data as the source and HDFS as the sink. The data were then visualized in Tableau.

Data Analysis and Building Control

Apache Spark was used to analyze the data in near-real time as it gets stored in HDFS. Based on the algorithm developed in PySpark, various messages were printed on the terminal screen simulating the feedback actuation behavior.

For the oxygen sensors, if the value sent by a sensor is below the threshold, the PySpark algorithm prints a message on the terminal stating that the HVAC system associated with that particular oxygen sensor has been turned ON. For the smoke detection sensors, if the value generated by a sensor exceeds the threshold (i.e., a fire has occurred), the PySpark algorithm detects this and outputs a message on the terminal stating that the fire alarm at the location of that particular smoke detection sensor has been turned ON. In the case of the luminosity sensors, if a particular luminosity sensor generates an output value below the threshold, indicating that it is dark, the PySpark algorithm outputs a message on the terminal stating that the lights associated with that particular luminosity sensor have been turned ON. For the parking space sensors, if the value generated by a particular parking space sensor is 1, the PySpark algorithm displays a message on the terminal stating that a car has been parked at that particular parking spot. For the garbage detection sensors, if the value generated by a particular sensor is above the threshold, the PySpark algorithm displays a message on the terminal stating that the garbage bin associated with that sensor is full.

Iteration 5: The earlier iterations had limitations in terms of real-time data visualization and predictive analytics. In the fifth and final iteration, we worked on improving the architecture to enable real-time visualizations by introducing Elasticsearch [76, 85] and Kibana [77, 86]. We also introduced MS (Microsoft) Power BI [87] for the visualization of the data stored in HDFS. The main reason for introducing MS Power BI was that it integrates well with R scripts; this integration provides the ability to do data analysis and predictive analytics within Power BI in an interactive way.

Moreover, we adopted a hybrid model that considers both batched and streaming data sources.

The high-level and generic reference architecture is presented in Fig. 16.

Fig. 16 Iteration 5—high-level architecture

However, for implementation and validation purposes, we chose specific tools, and the resultant architecture is presented in Fig. 17.

Fig. 17 Iteration 5—updated and improved near real-time data analysis and actuation architecture

Batched Data

We chose open data as the batched data source for our architecture. These open data are scraped using R (this can also be done using Python) and ingested into HDFS.
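As a rough illustration of this batched path, the sketch below uses Python (the implementation used R) to download an open data file and push it into HDFS with the Hadoop command-line client; the URL and paths are placeholders rather than the actual UTS open-data endpoints.

# Sketch of the batched-data path in Python (the paper used R); URL and paths are placeholders.
import subprocess
import requests

OPEN_DATA_URL = "https://example.org/uts-building11-sensors.csv"  # placeholder URL
LOCAL_FILE = "/tmp/uts_building11.csv"
HDFS_DIR = "/user/cloudera/opendata"

# 1. Download the open data set to the local disk.
response = requests.get(OPEN_DATA_URL, timeout=60)
response.raise_for_status()
with open(LOCAL_FILE, "wb") as f:
    f.write(response.content)

# 2. Ingest the downloaded file into HDFS (the equivalent of the manual upload described later).
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", HDFS_DIR], check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", LOCAL_FILE, HDFS_DIR], check=True)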

Streaming Data

For streaming data sources, the virtual IoT sensors send the data to two sinks: (1) HDFS and (2) Elasticsearch. As the data from the IoT sensors land in HDFS, they are analyzed in near real-time by the Apache Spark algorithm to enable decision-making for the effective management and control of the five facilities described earlier within the smart building. Once stored in HDFS, these data are also visualized in batches using Power BI.

Predictive and Near Real-Time Analytics

For predictive analytics, we used R scripts to develop an ARIMA model within Power BI. For the second destination, i.e., Elasticsearch, the data are indexed as they land in Elasticsearch. Kibana, a data visualization tool that integrates with Elasticsearch, enables the near real-time visualization of the IoT data.

The updated architecture is presented in Fig. 17. It shows how the five contextual elements shown in Fig. 6 are related to the physical elements of the framework. People are at the topmost level and represent the stakeholders of the smart building environment, such as building developers, building management, IT professionals, and residents of the building. The Process element defines the data-driven processes relevant to the smart building: monitoring via sensors, data sourcing, ingestion, storage, analysis, visualization, decision-making, and finally actuation. The Technology element includes the technology stack comprising Flume, R, Elasticsearch, HDFS, Kibana, Spark, and Power BI. The Information element includes the near real-time data visualizations in Kibana, the Power BI dashboards for IoT data visualization, and the output of the decision-making process using Spark. Finally, the Facility element represents the facilities in the smart building, including HVAC systems, fire alarms, lights, parking spaces, and garbage bins.

Table 5 summarizes the big data tools used in the development of the IBDMA. It lists the processes in which each tool is used and its purpose in the IBDMA reference architecture implementation.

Table 5 Elements in the IBDMA architecture and their purpose

Reference Architecture Implementation

The proposed design of the IBDMA architecture is implemented for a smart building application scenario following the architecture presented in Fig. 17. As shown in Fig. 17, we considered both streaming and static data. For streaming data, we created 1000 virtual IoT sensors; this is presented in the section marked Streaming Data in Fig. 17. These virtual sensors are implemented using a software application developed in Python, which generates data from each sensor at regular intervals. Each sensor generates its sensor id and the value it measures from the building environment; the sensor id represents the location of the sensor in the building. The IoT application has a defined range of values for each type of sensor, and the values are generated randomly within that range. This keeps the validation scenario simple while ensuring that all possible scenarios are considered during the implementation and validation phases.

For the IoT sensors, we consider five different types. Out of the 1000 sensors, we simulate 200 oxygen sensors, 200 smoke detectors, 200 luminosity sensors, 200 parking space sensors, and 200 garbage detection sensors. It is assumed that these 1000 sensors are deployed at 200 distinct locations (rooms or levels) of the smart building. We implemented the example scenario using the Cloudera VM (Virtual Machine) Hadoop distribution and used Python to create the virtual sensor application that generates the IoT data. The VM provides most of the big data tools (Apache Flume, Apache Spark, HDFS, and Hive) required for the implementation of the IBDMA architecture; the other software packages (PyCharm IDE, Elasticsearch, and Kibana) were installed on the VM. The virtual sensor application forwards the IoT data to two destinations. The first destination is Elasticsearch, where the data are indexed and stored so that Kibana can be used to visualize them. The second destination is the set of Flume agents ingesting data into HDFS; we configured ten Flume agents, each serving 100 sensors. Once ingested into HDFS, the data are analyzed in near real-time using PySpark (the Python API for Apache Spark) [20].

For the static data, we considered the publicly available UTS smart building open data and ingested them into HDFS using R. This is presented in the Static Data section of Fig. 17.

The ‘Process’ element of the IBDMA framework integrates all other elements. The implementation of the IBDMA architecture, in effect, relies on the implementation of these processes, as shown in Figs. 6 and 17. Hence, we follow the processes shown in Fig. 8 to explain the implementation of the IBDMA architecture.

Monitoring

As shown in Fig. 17, ‘Monitoring’ is the first process in the IoT-enabled smart building. It includes the monitoring of various parameters of the IoT-enabled smart building ecosystem. For static data, we chose oxygen and gas detection sensors, which monitor the oxygen levels and gas levels, respectively, within UTS Building 11.

For streaming data, the monitoring process is accomplished by the virtual IoT sensors developed using a Python application, which simulates the monitoring of oxygen levels, temperature levels (fire detection), luminosity levels, garbage levels, and parking spaces in the smart building.

Data Sourcing

Sourcing is the second process in the IBDMA implementation, as presented in Fig. 17. As discussed above, we consider both static and near real-time streaming data as our data sources. For the static data source, open data from UTS (University of Technology Sydney) smart building sensors were extracted from the web using R and stored in HDFS. These data comprised historical data for two types of sensors, oxygen sensors and gas detection sensors, for one of the levels/floors of UTS Building 11.

As a streaming data source, real-time streaming data for 1000 virtual IoT smart building sensors were generated using the Python virtual sensor application. These data were ingested and stored in HDFS, and we deployed big data tools to achieve the data ingestion task, as explained in detail in the next section. Each virtual sensor has a unique sensor id by which it is identified: sensors with ids 1–200 are oxygen sensors, ids 201–400 are smoke detection sensors, ids 401–600 are parking space sensors, ids 601–800 are luminosity sensors, and ids 801–1000 are garbage detection sensors (all ranges inclusive). The Python virtual sensor application for data generation was developed using the PyCharm IDE Community Edition [88, 89]. Figure 18 shows the logic flow diagram of the data generated from the IoT sensors.
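This id-to-type convention can be expressed as a small helper that is useful when parsing ingested records; the function name below is illustrative and not part of the original application.

# Mapping of sensor ids to sensor types as described above (all ranges inclusive).
def sensor_type(sensor_id):
    if 1 <= sensor_id <= 200:
        return "oxygen"
    if 201 <= sensor_id <= 400:
        return "smoke"
    if 401 <= sensor_id <= 600:
        return "parking"
    if 601 <= sensor_id <= 800:
        return "luminosity"
    if 801 <= sensor_id <= 1000:
        return "garbage"
    raise ValueError("sensor id out of range")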

Fig. 18 Data sourcing—logic flow diagram

Figure 26 (Appendix) shows a screenshot of the data generation part of the Python virtual sensor application.

As seen from Fig. 26 (Appendix), first, the required modules are imported, including the socket and Elasticsearch modules, and the TCP IP address and port are defined. The sensor class is defined, and its attributes, including sensor id, sensor value, and sensor location (room or floor of the smart building), are initialized. Then, in a while loop, the sensor id and sensor location are incremented by 1, and the sensor value is generated randomly within a specified range of values for each type of sensor. For example, Fig. 26 (Appendix) is the screenshot for the oxygen sensors, in which random values between 8 and 21 are generated, denoting the percentage oxygen concentration in air.

For this research, we set up the data generation such that, at any one time, 10 sensors generate data simultaneously, served by the ten Flume agents running in parallel. Data are generated at 1 s intervals; with 1000 sensors reporting 10 at a time, the interval between two consecutive readings from a single sensor is therefore 100 s, which we found to be a reasonable latency for reporting the smart building's environmental conditions. This interval can be reduced by slightly modifying the Python application.

The virtual sensor application pushes data to two destinations: (1) to Elasticsearch, to enable near real-time data visualization using Kibana, and (2) to the Flume agents, to enable near real-time ingestion of data into HDFS. Elasticsearch provides a Python API, which we used to store and index the data. In the virtual sensor application, an Elasticsearch document is defined which includes the sensor id, sensor value, sensor location, and the time of generation of the data. This document is then sent to the Elasticsearch index named “iot”, as can be seen in Fig. 26 (Appendix).

Figure 26 (Appendix) shows the data generation code for the first 100 oxygen sensors. For the oxygen sensors, values are generated randomly between 8 and 21, denoting the percentage concentration of oxygen in air; values above 14 indicate normal oxygen levels. For the smoke detection sensors, the same range of values is generated randomly, with values above 14 denoting a possible fire scenario. For the parking space sensors, each sensor outputs a high (1) or low (0) value denoting whether a particular parking space is occupied or empty. For the luminosity sensors, random values between 8 and 21 are generated, with values above 14 considered normal and values below 14 representing below-normal luminosity. For the garbage detection sensors, the sensors generate a high (1) or low (0) value to indicate whether a garbage bin is full or empty.

In the last part of the code, the sensor id is reset to 1 once it reaches 100, after which the value for sensor 1 is generated again and the cycle repeats. Finally, the TCP connection and the connection to the Elasticsearch cluster are closed when the program exits.
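A condensed sketch of this generator for the oxygen sensors is given below, assuming the structure described above (the actual code is shown in Fig. 26, Appendix); the host, port, field names, and Elasticsearch client calls are illustrative and may differ from the original application.

# Sketch of one virtual oxygen-sensor generator instance serving 100 sensors.
import random
import socket
import time
from datetime import datetime
from elasticsearch import Elasticsearch

TCP_IP, TCP_PORT = "127.0.0.1", 5005    # one of the ten Flume netcat ports
es = Elasticsearch()                     # local Elasticsearch cluster
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((TCP_IP, TCP_PORT))

sensor_id, location = 1, 1
try:
    while True:
        value = random.randint(8, 21)    # % oxygen concentration; above 14 is normal
        record = f"{sensor_id},{value},{location}\n"
        sock.sendall(record.encode())    # to the Flume agent, and on to HDFS
        es.index(index="iot", body={     # to Elasticsearch, for Kibana visualization
            "sensor_id": sensor_id,
            "value": value,
            "location": location,
            "timestamp": datetime.now(),
        })
        sensor_id += 1
        location += 1
        if sensor_id > 100:              # this instance serves 100 sensors; start over
            sensor_id, location = 1, 1
        time.sleep(1)                    # data generated at 1 s intervals
finally:
    sock.close()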

Data Ingestion

Data ingestion is the third process, as shown in Fig. 17. For static data, the UTS building data, once extracted in .csv format, are ingested and stored in HDFS.

For streaming data, we deploy Apache Flume to ingest the IoT sensor data into HDFS. Ten Apache Flume agents are configured and used for data ingestion. These agents listen on ten different TCP ports, as specified in the virtual sensor application, to reduce latency, increase data throughput, and prevent data loss. Each agent is configured such that the virtual IoT data generated by the Python virtual sensor application acts as the source and HDFS acts as the sink, so the data are stored in HDFS as soon as they arrive from the virtual sensors.

Figure 27 (Appendix) shows the contents of one of the ten Flume configuration files. The configuration file has three key elements: Source, Sink, and Channel. The ‘roll-over interval’ for each Flume agent was left at the default 30 s setting, which means that every 30 s each agent finishes writing to the current file and creates a new .tmp file in HDFS; once the 30 s have elapsed, the .tmp file is converted to a permanent file. Specifying 0 for the roll-over interval would disable rolling and cause all events to be written to a single file.

As seen from Fig. 27 (Appendix), the Flume agent defined in the configuration file is named ‘a1’. ‘Source’ binds to the incoming source of data, ‘Sink’ binds to the destination where the data are to be stored, and ‘Channel’, as the name suggests, provides the channel that transfers data from the source to the sink. The Source defined in the configuration file is a netcat source bound to TCP IP 127.0.0.1 on port 5005. The Sink is an HDFS sink with the path hdfs://quickstart.cloudera:8020/user/cloudera/virtualsensor1. The Channel is a memory channel with a capacity of 1000 and a transaction capacity of 100; capacity defines the maximum number of events stored in the channel, while transaction capacity defines the maximum number of events the channel takes from a source or gives to a sink per transaction.
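Based on the values just described, one such agent configuration would look roughly as follows (Fig. 27 in the Appendix shows the actual file); property names follow the standard Flume syntax, and any property not mentioned above is omitted here.

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Netcat source listening for the virtual sensor data
a1.sources.r1.type = netcat
a1.sources.r1.bind = 127.0.0.1
a1.sources.r1.port = 5005

# HDFS sink writing the events under this agent's folder
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://quickstart.cloudera:8020/user/cloudera/virtualsensor1
a1.sinks.k1.hdfs.rollInterval = 30

# Memory channel connecting the source and the sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1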

For batched data, the ingestion process involves web-scraping the open data using R or Python. We used R to download the data to the local disk and then manually uploaded the extracted data into HDFS.

Data Storage (Big Data Management)

Data storage is the fourth process, as shown in Fig. 17. For static data, we downloaded the IoT sensor data from UTS Building 11 by performing web-scraping using R. The downloaded data were first saved to the local disk as a single text file and then, after some data manipulation, manually uploaded to a specified folder in HDFS. The static data included readings from temperature, luminosity, humidity, and oxygen sensors and comprised the sensor ids, the values generated by the sensors, and the timestamps, separated by commas.

For the streaming data, we store the data both in Elasticsearch and in HDFS. The data are stored in Elasticsearch to enable near real-time visualization using Kibana, while HDFS is used as the other storage destination to enable near real-time analysis using Apache Spark.

Elasticsearch is a scalable distributed search engine which provides powerful APIs to enable extremely fast data search for enterprise-grade data discovery applications. Data are stored in Elasticsearch in the form of indexed documents using these APIs, and Elasticsearch adds a searchable reference to each document in the cluster’s index. The documents can then be retrieved using the Elasticsearch API. To manage Elasticsearch and the data stored in it, we used the Kopf plug-in (based on JavaScript, AngularJS, jQuery, and Twitter Bootstrap), which provides an easy-to-use web-based administration tool for the Elasticsearch cluster.
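For illustration, the indexed documents can be queried back through the Elasticsearch Python API along the following lines; the “iot” index name and the field names follow the data sourcing section, while the query itself is only an example and not part of the implementation.

# Retrieve the most recent readings for one sensor, newest first (illustrative query).
from elasticsearch import Elasticsearch

es = Elasticsearch()
result = es.search(index="iot", body={
    "query": {"term": {"sensor_id": 42}},
    "sort": [{"timestamp": {"order": "desc"}}],
    "size": 10,
})
for hit in result["hits"]["hits"]:
    doc = hit["_source"]
    print(doc["sensor_id"], doc["value"], doc["location"], doc["timestamp"])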

The IoT sensor application sends the data to Elasticsearch, where they are indexed and stored in the form of documents; Kibana is then used to visualize the data in near real-time. The IoT sensor application also sends data to HDFS via the ten Flume agents, as discussed in the previous section. Flume has a default naming convention for the files stored in HDFS which includes the timestamp. Since the roll-over interval for all ten Apache Flume agents was 30 s, a new .tmp file was created every 30 s in the specified HDFS folder for each Flume agent, and once the next 30 s elapsed, the .tmp file was automatically converted to a permanent file on HDFS.

The data generated from the Python IoT sensor application contain sensor id, sensor value, and the location of the sensor. Both the storage destinations, i.e., Elasticsearch and HDFS, store the data generated by the Python IoT sensor application.

Data Analysis (Big Data Analytics)

Data analysis is the fifth process in the IBDMA architecture, as presented in Fig. 17. To analyze the IoT sensor data in near real time, we used PySpark, the Spark Python API (Application Programming Interface). For the streaming data, we developed an algorithm in PySpark which monitors the incoming data in near real-time as they are ingested into HDFS. Depending on the values generated by the virtual sensors, the algorithm decides whether any controls within the smart building need to be activated. If, for instance, a particular smoke detection sensor sends a value that is too high, indicating a possible fire scenario, the PySpark algorithm detects it and outputs a text message that the fire alarm located in the vicinity of that sensor has been activated. In the case of batched data, the PySpark algorithm reads the whole file and outputs a detailed message on the terminal for each sensor, indicating whether a problem was detected or whether all the values received were within range. This helps to identify problems within the system and to improve the comfort and safety of the residents. The PySpark code for data analysis is provided in the GitHub repository [90].
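One way to wire up such near real-time monitoring is with Spark Streaming watching the HDFS folders that the Flume agents write to, as in the sketch below; the paths, batch interval, and helper names are illustrative assumptions, and the actual PySpark code is available in the repository [90].

# Minimal Spark Streaming sketch: watch an HDFS folder and emit actuation messages.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="IBDMA-NearRealTimeAnalysis")
ssc = StreamingContext(sc, batchDuration=5)   # look for new HDFS files every 5 s

# New files written by a Flume agent appear in this folder as they roll over.
lines = ssc.textFileStream("hdfs://quickstart.cloudera:8020/user/cloudera/virtualsensor1")

def decide(line):
    """Map one 'sensor_id,value,location' record to an actuation message."""
    sensor_id, value, location = line.strip().split(",")
    sensor_id, value = int(sensor_id), int(value)
    if sensor_id <= 200:                      # oxygen sensors
        return (f"HVAC {location} turned ON" if value < 14
                else f"Oxygen level at {location} OKAY")
    if sensor_id <= 400:                      # smoke detection sensors
        return (f"Fire alarm {location} turned ON" if value > 14
                else f"No fire at {location}")
    return f"Sensor {sensor_id}: value {value} at {location}"  # remaining types omitted

lines.map(decide).pprint()                    # simulated actuation messages on the terminal
ssc.start()
ssc.awaitTermination()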

Data Visualization

Data visualization is the sixth process in the IBDMA architecture, as presented in Fig. 17, and follows the data analysis process. For the real-time streaming data, we considered both near real-time and batched visualizations. For near real-time visualization, we use Kibana, which integrates with Elasticsearch and produces near real-time visualizations of the received data. These visualizations help to identify problems in the smart building in near real time without going through large data sets or running algorithms on the data. For batched visualization, we concatenated the multiple files generated in HDFS by the data storage process into a single file to make the visualization process easier. We adopted various tools for the batched visualizations. First, we chose the Cloudera built-in visualization tool by importing the data into a Hive table. Hue was used to create the Hive tables within Cloudera, and the data stored in HDFS were imported into the table. The visualization done in Hue is shown in Fig. 19. The limitation of visualizing data in Hive within Cloudera is that no more than 1000 rows of data can be visualized.

Fig. 19 Data visualization in Cloudera using Hive tables

The second tool we adopted for data visualization was Tableau. Tableau has a connector which allows it to connect to the data in HDFS. Figure 20 shows snapshots of the data visualization done in Tableau. Compared to Hive, visualization in Tableau is much more flexible, as the results can be customized to create more useful visualizations. Figure 20 shows the minimum and maximum values generated by a particular sensor in a top–bottom view, with minimum values at the top in red and maximum values at the bottom in blue.

Fig. 20 Data visualization in Tableau (min and max values displayed top and bottom)

The third tool we adopted for batched data visualization was MS Power BI. We found MS Power BI to be the best tool for our research as it has an easy-to-use interface, integrates well with HDFS, and allows R scripts to be written within it. The advantage of using R scripts within Power BI is that predictive models can be developed and run within Power BI. The visualization done in Power BI for the open batched data from a temperature sensor over the last 13 months is presented in Fig. 21.

Fig. 21 Temperature sensor data visualization in MS Power BI

The data from the sensor were generated every 7 min. We developed an R script to forecast the next 1000 values of the temperature sensor, which corresponds to approximately the next 5 days. We chose an ARIMA model for the prediction. The forecast results are presented in Fig. 22.

Fig. 22 Forecasting sensor data in Power BI using R

The discussion on the selection of the appropriate model, its fine-tuning, and its evaluation is beyond the scope of this research. However, the code we used to generate a simple ARIMA model and to forecast the next 1000 values for the temperature sensor is presented in Fig. 28 (Appendix). Because temperature sensor values depend on seasonal factors, i.e., the quarter of the year, the month, or the day of the week, we set seasonality to ‘true’ in our ARIMA model. The modeling could be extended to identify and predict various parameters of the smart building, but that discussion is beyond the scope of this paper.
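The forecasting step itself was performed with an R script inside Power BI (Fig. 28, Appendix). As a rough illustration of the same idea outside Power BI, an equivalent non-seasonal forecast can be produced in Python with statsmodels; the file name, column name, and ARIMA order below are assumptions for illustration only.

# Illustrative Python analogue of the forecasting step (the paper used R within Power BI).
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Historical temperature readings (one value roughly every 7 minutes); placeholder file/column.
series = pd.read_csv("temperature_sensor.csv")["value"]

# Fit a simple ARIMA model and forecast the next 1000 values (~5 days of readings).
model = ARIMA(series, order=(2, 1, 2)).fit()
forecast = model.forecast(steps=1000)
print(forecast.head())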

Power BI is a strong visualization tool for extracted or static data. However, for visualizing near real-time IoT streaming data, Power BI, like many other tools available in the market, falls short. That is why we chose Elasticsearch and Kibana to visualize the streaming data in near real-time.

Once the data are indexed in Elasticsearch, as explained in Sect. 6.4, Kibana, an open-source data visualization tool, is used to develop near real-time visualizations and dashboards for the documents indexed in the Elasticsearch cluster. Figure 23 shows the near real-time data visualization in Kibana for one particular IoT sensor; it shows the minimum and maximum values received from that sensor.

Fig. 23 Near real-time sensor data visualization in Kibana

Decision-Making

Decision-making is the seventh process in the IBDMA architecture, as shown in Fig. 17. The decision-making process includes understanding and evaluating the results from the ‘Data Analysis’ and ‘Data Visualization’ processes. It involves analyzing and visualizing the predictive analytics results obtained by integrating R predictive modeling scripts with the visualization of the virtual IoT sensor data in MS Power BI, as well as analyzing the results of the PySpark algorithm. The decision-making process helps the smart building stakeholders make better data-driven decisions to improve the comfort, safety, security, and efficiency of the smart building.

Action: Smart Building Control

Action is the last process in the IBDMA architecture, as shown in Fig. 17. Once a decision is made based on the data analysis and data visualization results, the smart building facilities are controlled autonomously by the IBDMA. This is accomplished by triggering the appropriate controls of the facilities deployed in the smart building in response to the data produced by the corresponding IoT sensors. To simulate this behavior, the system displays the results as textual output on the Cloudera terminal.

The oxygen concentration is controlled and maintained by analyzing the data generated from the oxygen sensors deployed in the smart building. As soon as the data from an oxygen sensor are ingested into HDFS, the Apache Spark algorithm analyzes these incoming data, and if the value received from a sensor is below 14 (reflecting the oxygen concentration level), the Spark algorithm considers it below the appropriate threshold and turns the corresponding HVAC system ON. For the purpose of our research, we simulate this by displaying “HVAC system X turned ON” on the Cloudera terminal, where X represents a particular location in the building. Once the HVAC system has been ON for some time, the oxygen concentration at that location will rise and soon reach the acceptable range; when this happens, the system turns the HVAC system OFF. For the purpose of this research, we simulate this behavior by displaying “HVAC system X turned OFF”. On the other hand, if an oxygen sensor deployed at another location within the smart building is already reading acceptable oxygen concentration levels, the proposed IBDMA reference architecture takes no triggering action; in this case, the PySpark algorithm outputs “Oxygen level at X OKAY”, indicating that the oxygen levels detected by sensor X are okay.

To maintain a safe and productive environment for the residents, smoke detection sensors are installed to monitor for smoke or fire hazards. The smoke detection IoT sensors continuously monitor the environment, and in case of a fire, they send a value above a pre-defined threshold indicating a fire in the smart building. When this happens, the proposed architecture triggers the fire alarm in the same location as the sensor that detected the fire. For the purpose of our research, the Spark algorithm outputs “Fire alarm X turned ON” on the Cloudera terminal, simulating the behavior of turning a fire alarm ON, where X represents the location or room of the smart building where the sensor detected smoke or fire. In normal situations, where no fire or smoke is detected, the IBDMA architecture does not produce alerts and keeps operating normally by outputting “No fire at X” on the Cloudera terminal for all the locations of the smart building.

To monitor and effectively maintain the parking spaces of the smart building, parking sensors are deployed in the smart building. When a parking space is occupied, the IBDMA architecture displays “Parking X is occupied”, where X represents the parking spot occupied by a vehicle. When the vehicle leaves the parking spot, the architecture prints “Parking X is empty”. These data can be collected and analyzed to make decisions on how to improve the parking areas for the residents.

The luminosity sensors deployed in the smart building monitor the luminosity levels in numerous locations of the smart building. The data generated by these sensors, once ingested into HDFS, are analyzed by the Spark algorithm, and if the value sent by a sensor is below the threshold luminosity level, the proposed architecture turns ON the lights at that location. For the purpose of our research, we simulate this by outputting “Lights at X turned ON” on the Cloudera terminal, where X represents the sensor id or location of the smart building where low luminosity levels were detected. On the other hand, if the luminosity levels in a particular room are already within the acceptable range, the proposed architecture displays “Luminosity level at X OKAY”, indicating that the luminosity levels at that location are okay and no action is required. In cases where the lights need to be turned OFF at a particular location, the proposed architecture handles this as well and simulates it by displaying “Lights at X turned OFF” on the Cloudera terminal.

To effectively and efficiently manage the garbage in the building, garbage detection sensors are deployed which monitor the fill levels of the bins placed in the garbage areas of the smart building. When a garbage bin becomes full, the proposed architecture generates a warning by displaying “Garbage at X is Full”, where X is the location of the garbage detection sensor that detected the full bin. For all the garbage bins which still have space, the proposed architecture continuously monitors the garbage levels and prints “Garbage at X has space”. This enables more effective and efficient management of the garbage within the smart building. Further decisions about installing more garbage bins and about garbage collection times can also be made by examining the trend of the data generated by the garbage detection sensors.

Figure 24 demonstrates the outputs displayed by the reference architecture on the Cloudera terminal as a result of the analysis of the incoming data from the IoT sensors. The actuation process is presented as a simplified flowchart in Fig. 25.

Fig. 24 Smart building control messages

Fig. 25 Flowchart of actuation process

Framework Evaluation Results

The IBDMA framework and reference architecture were evaluated by testing five test cases against the 13 ECs presented in Fig. 4 of “Research Method”. Streaming data from the virtual sensor application were generated, and the Cloudera big data platform was set up as explained in the previous section. The details of the test cases are given in Tables 6, 7, 8, 9 and 10.

Table 6 Test case 01—detection of oxygen levels
Table 7 Test case 02—detection of fire
Table 8 Test case 03—detection of luminosity
Table 9 Test case 04—detection of parking space usage
Table 10 Test case 05—detection of garbage

The five test cases discussed above were evaluated against the 13 ECs presented in Fig. 4, and the results of the evaluation are presented in Table 11.

Table 11 Evaluation results for evaluating each test case against the 13 ECs

Conclusions and Future Work

This paper presented the IBDMA framework and its reference architecture, which were developed and evaluated for a smart building example scenario. The framework has five key integrated components: (1) People, (2) Process, (3) Technology, (4) Information, and (5) Facility. The reference architecture was developed and evaluated over five iterations for the smart building scenario. The final iteration was implemented with a total of 1000 sensors, comprising 200 virtual oxygen sensors, 200 virtual smoke/hazardous gas detectors, 200 luminosity sensors, 200 parking sensors, and 200 garbage detection sensors. The evaluation was performed following the DSR approach [65], using the five test cases against thirteen evaluation criteria taken from the literature. The results of the evaluation indicate how a systematic framework like IBDMA can assist in managing and analyzing near real-time big data streams for smart buildings. They also demonstrate how various elements of a smart building, including IoT sensors and control systems, can be autonomously monitored and controlled in near real-time using the proposed IBDMA framework. The proposed IBDMA framework is thus unique in that it integrates end-to-end sensing, data management, and autonomous actuation with near real-time data management and analytics, a combination that has not been discussed before, particularly in the smart building context; this is the main contribution of this research. The framework has a number of applications and can be further extended to other smart environments, such as smart homes, smart cities, and smart grids. In this research, we considered all sensor data values and parameters independently; the work can be extended by considering the interdependence of these sensors and parameters to extract further insights from the data.

IBDMA Framework Evaluation Repository

The configuration files and the code for the IBDMA reference architecture are available at the GitHub repository [90].