VR digital twin of office space with computer vision-based estimation of room occupancy and power consumption

Mukhopadhyay, Abhishek; Talwar, Naveen R.; Viswakarma, Himanshu; Rajshekar Reddy, G. S.; Srivastava, Shakti; Pena-Rios, Anasol; Biswas, Pradipta

doi:10.1007/s44257-024-00008-z


Research
Open access
Published: 15 April 2024

Volume 2, article number 6, (2024)
Discover Analytics

Abstract

In recent years, energy consumption has increased rapidly due to many factors, including the rise in technology adoption. This has many downsides, from higher costs to CO$_2$ emissions. Human activities in offices and houses account for a considerable share of energy usage. A digital twin (DT) of an open-plan common space is created to enable remote room occupancy monitoring and automatic detection of energy consumption. A virtual reality (VR) model is developed and integrated with temperature, humidity, and imaging sensors. To maintain privacy, images are processed on local computers to measure occupancy levels, and live video feeds are never transmitted. The same set of imaging sensors is also used in a bespoke computer vision module for energy consumption estimation. The human avatars were mapped to actual positions on the floor with high correlation ($R^{2} = 0.85$). Our energy consumption estimation algorithm achieved a true positive rate of $91.58\%$ and an F1 score of $81.96\%$. Finally, all this information is transmitted to and visualized in the 3D digital twin for remote monitoring and simulation.


1 Introduction

Electricity consumption has continued to increase rapidly worldwide. Between 1990 and 2021, the world's annual electricity consumption rose from 10,000 TWh to 25,100 TWh [1]. Electricity consumption rises along with technology adoption and population. Buildings are an important target for optimizing energy use based on actual occupancy. Holmin et al. [2] discuss reductions in electricity use and cost based on the layout and size of the Origo office space. They make three proposals based on the ratio of occupancy to the number of available workspaces. An open office design led to an annual electricity consumption reduction of 686 kWh. An activity-based workplace led to a reduction of 540 kWh, and an individual office design to 452 kWh [2]. A white paper published by Dassault Systèmes presented five instances where digital twin technology could create an economic benefit of $\$1.3$ trillion. The same paper estimated that these instances could reduce CO$_2$ emissions by 7.5 gigatonnes between the years 2022 and 2023 [3]. This paper proposes a human-in-the-loop system that supports the intelligent use of electricity. It does so by observing room occupancy and the corresponding power consumption in a digital twin of any office space. The proposed system will allow building management teams to make informed decisions and create plans to minimize consumption by

- estimating room occupancy while maintaining the privacy of individuals,
- calculating total energy consumption and monitoring individual equipment using legacy webcams, and
- visualizing the real estate and its inhabitants in 3D.

A VR digital twin of an office and laboratory space has been developed. In this context, a VR digital twin is a virtual replica of a physical environment or object, recreated in a virtual reality (VR) setting. It is used to simulate, analyze, or interact with real-world entities in a digital space. It allows immersive exploration and manipulation of physical elements within a virtual environment. Using a legacy monocular webcam, the person detection algorithm attained an accuracy of $96.04\%$. The power consumption estimation algorithm achieved a true positive rate of $91.58\%$. The system has already been deployed in a British Telecom office space to measure room occupancy and power consumption in real time.

This paper is structured as follows. Section 2 reviews the literature on digital twins, mapping between real and virtual worlds, and electric energy consumption estimation using computer vision and machine learning techniques. Section 3 describes the proposed methods for mapping and for measuring energy consumption. Results are discussed in detail in Sect. 4, followed by a general discussion in Sect. 5 and the conclusion and future work in Sect. 6.

2 Literature survey

Alongside efforts to improve room occupancy monitoring and power consumption management, considerable research has focused on giving occupants more convenient ways to interact directly with appliances. Some systems display a subset of all appliances based on the user's estimated location, but they struggle to differentiate between appliances of the same type. Alternatives incorporate various sensors and actuators to help users select appliances, and some even provide feedback. However, these approaches often entail significant deployment overhead or are less suitable for commercial buildings. The following subsections describe the related literature on three topics: digital twins, mapping between real and virtual worlds, and electric energy consumption estimation using computer vision and machine learning techniques.

2.1 Digital twins

The digital twin concept traces back to NASA's Apollo program [4], where twins were used during live missions to replicate critical scenarios faced by the crew. In 2012, NASA [5] formally defined a Digital Twin as an integrated, multifaceted simulation of an as-built vehicle or system. This simulation integrates various physics, scales, probabilities, sensor updates, and historical fleet data to faithfully mimic the life and functions of its real-life counterpart. In our context, digital twins are real-time, data-rich models that accurately mirror and synchronize with physical or logical assets, processes, or systems. Using real-time data, these models forecast and optimize system behavior in advance. This enables informed decision-making and significant savings in time and resources [6]. A traditional model can only give a snapshot of behavior at a specific moment. In contrast, digital twins describe change over time and have a bi-directional connection to their real-world counterparts via real-time data feedback [7]. Tao et al. [8] highlighted the state of the art in industrial DTs. Khajavi et al. [9] explored the use of DTs in a smart building scenario by replicating part of the building's front facade. The facade was visualized by assigning different shades of yellow to the lux values received from the sensors. Several commercial solutions have also emerged because of the diverse possibilities and benefits. One example is Azure Digital Twins (ADT) [10], a cloud-based service that supports DT deployment as a software-as-a-service solution. Steelcase, a company known for designing workspaces, developed a space-sensing sensor network using ADT [11]. Using a suite of wireless infrared sensors, they generated analytics on how their spaces were being used and applied them to enhance reliability and efficiency.
ICONICS [12] also used ADT to create a virtual representation of a physical space to improve energy efficiency, optimize space usage, and lower costs. In a recently published paper, British Telecom explored the use of DTs in telecommunications for energy modelling, capacity management, in-building network design, and knowledge transfer [6]. Mukhopadhyay et al. [13] created a virtual environment to generate a dataset for training a machine learning model. They presented it as an alternative to the conventional dataset preparation that any supervised learning process requires.

2.2 Mapping between real and virtual worlds

Replicating real-world movement in virtual space is not a straightforward problem. A few studies have addressed it using raycasting [14, 15], a planar map [16], ECEF coordinates [17], GPS [18], or a wearable computer system [19]. Mukhopadhyay et al. [14, 15] calculated a direction vector from the virtual camera toward a centroid point in order to place the humanoid in VR. To obtain this direction vector, they mirrored the real-world camera position in the virtual world. Sun et al. [16] proposed a system that pairs the virtual and physical worlds using a planar map. They first computed a planar map between the real and virtual floor plans that minimizes angular and distance distortions. In Singapore, the ArtScience Museum and Google Zoo used Augmented Reality (AR) to help people experience the effects of deforestation [17]. They mapped between real and virtual world spaces using Google Tango. Tango provides position and orientation as Earth-centered, Earth-fixed (ECEF) coordinates in WGS84, the reference system used by GPS. They placed the Unity world on top of the ECEF coordinates so that the virtual world correctly overlapped the real world. Hanke et al. [18] used real-to-virtual world mapping in a parallel reality game. Cheok et al. [19] presented two interactive games that use real-world and virtual-world mapping. They converted the real world into a fantasy virtual playground by imbuing the latter with direct physical correspondence.

2.3 Measurement of energy using intelligent techniques

Measurement of energy consumption by electrical appliances using computer vision techniques is challenging. The existing literature can be put into multiple subgroups based on their working principles.

Electric consumption prediction models:
- Jiang et al. [20] proposed non-intrusive load monitoring using deep learning models for electric consumption prediction.
- Olu-Ajayi et al. [21] compared machine learning algorithms for predicting annual energy consumption in residential buildings.
- Rahman et al. [22] developed a deep recurrent neural network for mid-to-long-term electric load prediction.
- Gao et al. [23] introduced deep learning models and a transfer learning framework to enhance energy consumption prediction for new buildings or equipment.
Electric component detection:
- Abeykoon et al. [24] compared classifier models to detect electric components based on their parameters.
- Chui et al. [25] proposed a powerline noise transformation approach to merge electricity load disaggregation datasets.
Building energy performance prediction:
- Seyedzadeh et al. [26] reviewed machine learning approaches for forecasting and enhancing building energy performance.
- García et al. [27] provided an extensive review of machine learning methods for estimating energy consumption.
Occupancy and equipment detection for energy saving:
- Tien et al. [28] utilized computer vision for occupancy and equipment detection to predict energy savings in buildings.
Building cooling load prediction:
- Kwok et al. [29] discussed a probabilistic entropy-based neural network model for predicting building cooling loads.
- Yezioro et al. [30] used simulation tools and artificial neural networks to assess heating, cooling, and energy consumption in buildings.

In summary, work on estimating electrical power and energy consumption with intelligent techniques focuses mainly on predicting electric consumption with machine learning and deep learning models. It covers non-intrusive load monitoring, energy forecasting in buildings, component detection, and methods to estimate building cooling loads. Researchers have employed various techniques, including deep neural networks, transfer learning, and noise transformation. Their goals are to predict electric consumption accurately and to enhance building energy efficiency. In addition, studies have used computer vision for occupancy detection and combined simulation tools with neural networks for energy assessment in buildings.

2.4 Summary

In summary, past literature has primarily focused on using DTs in industrial scenarios [8]. There is some literature on using twins for workspaces. However, only Nikolakis et al. [31] focus on mapping a person's position and posture, and they use expensive depth cameras. This work proposes a cost-effective method for mapping a person's position between the real and virtual worlds. For measuring energy consumption, researchers have so far proposed predictive models, which require lengthy and costly data collection procedures. Tien et al. [28] carried out similar work. They proposed a computer vision-based approach that reduces building energy consumption by detecting room occupancy and equipment usage. However, they focused only on detecting monitor screens to correlate occupancy and energy consumption. This overlooks other appliances, such as lights and fans, that also contribute to energy consumption. In addition, their reported accuracy of 80% is lower than the true positive rate of our proposed method (91.58%). Our approach addresses these issues by covering a variety of appliances commonly used in office spaces. It also logs data on both room occupancy and energy consumption. This allows the floor management team to take pervasive actions. The information is fused into a DT that preserves the privacy of individuals while giving an overall estimate of occupancy and energy consumption in any given place.

3 Research methodologies

The digital twin (DT) of the office space was developed using the Unity 3D [32] game engine and the ProBuilder [33] modeling tool. This virtual twin accurately mirrored the dimensions of the physical space and replicated its furniture and other elements. Baked global illumination enhanced the photorealism of the virtual environment (VE). It precomputes lighting behavior into texture files, which reduces real-time computational demands. In addition, physically based rendering (PBR) materials [34] were used to authentically simulate material properties and light reflection, achieving realistic visual effects. Figure 1 illustrates the envisaged deployment of the Digital Twin. The deployment integrates real-time data from cameras and from Internet of Things (IoT) sensors such as temperature and humidity detectors. The physical-virtual linkage is established through sockets. It maps three kinds of data. The first is environmental variables, such as temperature and humidity measured by a DHT-11 sensor. The second is occupancy status. The third is energy consumption, which is captured by on-site low-cost webcams and analyzed using computer vision techniques. The mapping techniques and energy consumption estimation procedures are described in the following subsections.

3.1 Person mapping

Transfer learning was used to fine-tune an object detection model for person detection on 2022 person-annotated images from the Open Images dataset. Here, object detection means identifying and locating each person in an image and enclosing them in a bounding box for recognition and analysis. The model was trained and tested on an NVIDIA GeForce RTX 2070 GPU. Its performance was cross-validated on both synthetic and real-world images. The overall accuracy was 96.04% (std. error 0.9) for real images and 96.98% (std. error 0.13) for synthetic images. There is a plethora of techniques for reconstructing a real space using multi-view stereo (MVS) [35, 36]. One widely used technique is COLMAP [36], an end-to-end image-based 3D reconstruction pipeline. It uses the output of Structure-from-Motion (SfM) [36, 37] and applies MVS to compute depth and/or normal information for every pixel in an image. Fusing the depth and normal maps of multiple images in 3D then produces a dense point cloud of the scene. However, this technique requires a large number of images from different viewpoints with high visual overlap. This makes it slow and time-consuming for representing a real-world scene at a specific moment in time. A machine learning approach is advantageous because it needs only a single image to recreate a real-world scenario. As a result, it is faster and can capture real-time changes in the real world and reflect them accurately in the virtual world. A linear regression model was used to map persons between the real and virtual worlds, mapping 2D image coordinates to the corresponding 3D points in the virtual world.
In this context, linear regression is a statistical method that models the relationship between a dependent variable and one or more independent variables. It fits a linear equation to observed data so that the dependent variable can be predicted from the independent ones. The Y-coordinate (height) in virtual space is constant regardless of the avatar's position. The mapping is therefore from screen-space (x, y) coordinates to virtual-space (X, Z) coordinates. The regression functions are formulated as:

$$\begin{aligned} X = a_{1} x + b_{1} y + c_{1}, \end{aligned} \tag{1}$$

$$\begin{aligned} Z = a_{2} x + b_{2} y + c_{2}, \end{aligned} \tag{2}$$

where (X, Z) are virtual-space coordinates, (x, y) are screen-space coordinates, and $a_{1}, b_{1}, c_{1}, a_{2}, b_{2}, c_{2}$ are constant coefficients estimated by the regression. The 2D coordinates were obtained as the centroids of the bounding boxes of persons detected in the real-world space. Humanoids were then placed at the corresponding locations in the digital twin to obtain 3D coordinates. The real-world person coordinates and the virtual-world avatar coordinates were found to be highly correlated (coefficient of determination $R^{2} = 0.93$). Finally, Mixamo's [38] motion capture data was used to automatically rig the humanoid's armature (base skeleton rig) so that it reflects realistic human poses. Realistic humanoid poses that follow real-world movement help create more accurate simulations and predictions in the digital twin.
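As an illustration of Eqs. (1) and (2), the mapping can be fitted by ordinary least squares on pairs of bounding-box centroids and manually placed avatar positions. The sketch below is not the authors' code. It assumes NumPy, and the function names are hypothetical. The intercepts $c_1, c_2$ are handled by appending a column of ones to the design matrix.

```python
import numpy as np


def bbox_centroid(x_min, y_min, x_max, y_max):
    """Centroid of a detected person's bounding box in screen space."""
    return (x_min + x_max) / 2.0, (y_min + y_max) / 2.0


def fit_screen_to_virtual(screen_xy, virtual_xz):
    """Fit X = a1*x + b1*y + c1 and Z = a2*x + b2*y + c2 by least squares.

    screen_xy:  (N, 2) centroids (x, y) in pixels.
    virtual_xz: (N, 2) matching avatar positions (X, Z) in virtual units.
    Returns a (3, 2) matrix: rows are (a, b, c), columns are (X, Z).
    """
    screen_xy = np.asarray(screen_xy, dtype=float)
    virtual_xz = np.asarray(virtual_xz, dtype=float)
    # Design matrix [x, y, 1] so that the intercepts c1, c2 are fitted too.
    design = np.column_stack([screen_xy, np.ones(len(screen_xy))])
    coeffs, *_ = np.linalg.lstsq(design, virtual_xz, rcond=None)
    return coeffs


def map_to_virtual(coeffs, x, y):
    """Apply the fitted mapping; the virtual Y (height) stays constant."""
    X, Z = np.array([x, y, 1.0]) @ coeffs
    return float(X), float(Z)
```

Fitting X and Z jointly with a single `lstsq` call gives the same result as two independent regressions, because the two output columns share a design matrix but do not interact.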

3.2 Energy consumption measurement

The objective was to develop an end-to-end system that takes frames in real time and calculates the total power (in watts) consumed in the given space. The process was divided into two parts. Part (I) trains and validates an object detection model on an electric appliance dataset. Part (II) develops an image processing algorithm that determines the state (ON/OFF) of each detected object. Figure 2 shows the end-to-end workflow of the proposed system.

Dataset preparation: A dataset was needed to validate the object detection model. To our knowledge, however, no existing dataset included indoor electronic/electrical appliances. Consequently, a custom dataset was prepared from images captured in Indian household and office settings. People across various Indian states were asked to contribute images with at least two instances of specific target objects. The target objects were fans, TVs/monitors, tube lights, and light sources other than tube lights. A total of 441 images were collected initially. Of these, 70 were discarded because they contained only one object instance or were distorted. The final set of 371 images depicts 359 fans, 382 TV/monitor instances, 209 tube lights, and 783 other light sources. In addition, 48 new images were captured in laboratory and office environments for model evaluation. The number of instances per class is given in Table 1. Electrical appliance instances were annotated manually using the Computer Vision Annotation Tool (CVAT) [39]. CVAT generated labeled data in XML format containing bounding box coordinates, which were later converted for YOLO model training.
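The XML-to-YOLO conversion mentioned above can be sketched with the Python standard library. This example assumes a Pascal VOC-style layout (`size/width`, `size/height`, `object/name`, `object/bndbox`). The class names and their index order are hypothetical labels for the four appliance classes. YOLO label files use one line per box, `<class> <x_center> <y_center> <width> <height>`, with every coordinate normalized by the image size.

```python
import xml.etree.ElementTree as ET

# Hypothetical class identifiers and index order for the four appliance classes.
CLASSES = ["fan", "tv_monitor", "tubelight", "light_source"]


def voc_xml_to_yolo(xml_text, classes=CLASSES):
    """Convert one Pascal VOC-style annotation into YOLO label lines."""
    root = ET.fromstring(xml_text)
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls_id = classes.index(obj.find("name").text)
        box = obj.find("bndbox")
        x_min, y_min, x_max, y_max = (
            float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # YOLO stores the box centre and size as fractions of the image size.
        x_c = (x_min + x_max) / 2.0 / img_w
        y_c = (y_min + y_max) / 2.0 / img_h
        w = (x_max - x_min) / img_w
        h = (y_max - y_min) / img_h
        lines.append(f"{cls_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines
```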

Table 1 Overview of the electric appliance dataset


Model description: Object detection methods range from classical computer vision techniques to diverse deep learning models. Deep learning approaches typically fall into three primary categories. Two-stage models first propose regions and then classify objects within them. Single-stage models perform object localization and classification simultaneously. Semantic segmentation-based models delineate and categorize each pixel in an image according to its class. Different types of state-of-the-art object detection models were studied (two-stage, one-stage, and segmentation-based). YOLO was selected because of its accuracy and latency [40]. Specifically, YOLOv5 was used for this implementation and trained in a PyTorch 1.7 environment. YOLOv5 variants range from 1.9M parameters (YOLOv5n) to 86.7M parameters (YOLOv5x). The YOLOv5s variant, with 7.2M parameters, was chosen because the training dataset is modest in size. The model was trained for 25 epochs with a batch size of 16, and the number of epochs was determined through a grid search. Training used an NVIDIA GeForce GTX 1080Ti, while inference time was measured on an NVIDIA GeForce RTX 2070 with Max-Q Design.

ON/OFF detection: Once electric appliances are detected, the next step is to recognize their ON/OFF state. Detecting the ON/OFF status of an individual appliance is not a straightforward problem.

Different image processing techniques were applied to recognize these states. For example, the ON/OFF status of a tube light or light_source was detected from the mean intensity of the detected region. The accuracy of such algorithms depends on external factors such as illumination and reflectivity [41, 42]. To mitigate this issue, images were captured at different times of the day (morning, afternoon, evening) under diverse lighting conditions. These images helped establish a global threshold for mean intensity. A similar methodology was used to determine the ON/OFF status of a TV/monitor.
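The mean-intensity rule above can be sketched in a few lines of NumPy. The threshold value below is a hypothetical placeholder, not the global threshold derived in the paper. The sketch also assumes a grayscale frame and a detector box given as `(x_min, y_min, x_max, y_max)` pixel coordinates.

```python
import numpy as np

# Hypothetical placeholder on a 0-255 grayscale scale; the paper derives its
# own global threshold from images captured across the day.
LIGHT_ON_THRESHOLD = 180.0


def light_is_on(gray_image, box, threshold=LIGHT_ON_THRESHOLD):
    """Classify a detected tube light / light source as ON (True) or OFF.

    gray_image: 2-D uint8 array (grayscale frame).
    box: (x_min, y_min, x_max, y_max) from the object detector.
    """
    x_min, y_min, x_max, y_max = (int(v) for v in box)
    region = gray_image[y_min:y_max, x_min:x_max]
    if region.size == 0:
        return False  # degenerate box: treat as OFF
    # Mean intensity of the detected region compared against a global threshold.
    return float(region.mean()) > threshold
```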

During testing, two issues emerged: (I) difficulty recognizing a TV/monitor that operates at low brightness, and (II) false detection of the ON state during daytime due to shadows. To address these issues, the screen was divided into five regions (center, top left, top right, bottom left, bottom right). The algorithm analyzed the histograms of the R, G, and B channels against a fixed threshold. It checked whether each region was above or below the threshold and counted the regions above it. A positive count indicated that the TV/monitor was ON; otherwise, it was deemed OFF (Fig. 3). Detecting the FAN state posed the greatest challenge, which was tackled by considering blade visibility. When the FAN operates at full speed (ON), its blades are not visible, whereas they are fully visible when it is OFF. A segmentation technique [43] was used to compute the blade visibility ratio in both states and establish an optimal threshold for FAN state detection. The operating principle of the fan detection algorithm is illustrated in Fig. 4.
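Both rules above can be sketched as follows. Several details are assumptions because the text does not specify them. The corner regions are taken as image quadrants and the center region as the middle half of the crop. The per-region statistic is the average of the R, G, and B means computed from 256-bin histograms. Both thresholds are hypothetical placeholders. For the fan, the blade mask is assumed to come from the segmentation step [43], which is not reproduced here.

```python
import numpy as np

# Hypothetical placeholder thresholds, not the values used in the paper.
SCREEN_REGION_THRESHOLD = 60.0    # mean channel intensity, 0-255
FAN_BLADE_RATIO_THRESHOLD = 0.15  # fraction of box pixels labelled "blade"


def five_regions(crop):
    """Split a screen crop into four corner quadrants and a centre region."""
    h, w = crop.shape[:2]
    hh, hw = h // 2, w // 2
    return {
        "top_left": crop[:hh, :hw],
        "top_right": crop[:hh, hw:],
        "bottom_left": crop[hh:, :hw],
        "bottom_right": crop[hh:, hw:],
        "center": crop[h // 4: 3 * h // 4, w // 4: 3 * w // 4],
    }


def histogram_mean(channel):
    """Mean intensity of one channel, computed from its 256-bin histogram."""
    counts, _ = np.histogram(channel, bins=256, range=(0, 256))
    return float((counts * np.arange(256)).sum() / max(counts.sum(), 1))


def screen_is_on(rgb_crop, threshold=SCREEN_REGION_THRESHOLD):
    """TV/monitor is ON if the count of regions above the threshold is positive."""
    regions_above = 0
    for region in five_regions(rgb_crop).values():
        # Assumed statistic: average of the R, G and B histogram means.
        stat = np.mean([histogram_mean(region[..., c]) for c in range(3)])
        if stat > threshold:
            regions_above += 1
    return regions_above > 0


def fan_is_on(blade_mask, threshold=FAN_BLADE_RATIO_THRESHOLD):
    """Fan is ON when its spinning blades are (mostly) not visible.

    blade_mask: boolean array over the fan's bounding box, True where the
    segmentation step labelled blade pixels.
    """
    visibility_ratio = np.count_nonzero(blade_mask) / blade_mask.size
    return visibility_ratio < threshold
```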

3.3 Embodiment of electrical component detection camera module

This module enables electrical appliance detection with two degrees of freedom, allowing movement in two directions. A high-resolution (1080p) webcam with a 55° field of view is mounted on top of the module. The module rotates at a speed of 1.6°/s with a 300 ms delay. It comprises two SG90 servo motors and an Arduino Uno microcontroller. Calculations indicate that a 35° motor rotation covers the largest room area in approximately 22 s. At the camera's frame rate of 30 frames per second, each frame takes 33 ms, and processing adds another 200 ms. Factoring in GPU latency, network latency, and a 90 ms buffer, the total delay amounts to 300 ms. The proposed camera module is illustrated in Fig. 5. A digital CAD (Computer Aided Design) model was created in Fusion 360 [44], which allowed iterative concept development and prototype design through 3D printing.
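The sweep time and per-frame interval quoted above follow directly from the stated rotation speed and frame rate:

$$\begin{aligned} t_{\text{sweep}} &= \frac{35^{\circ}}{1.6^{\circ}/\text{s}} \approx 21.9\ \text{s} \approx 22\ \text{s}, \\ t_{\text{frame}} &= \frac{1}{30\ \text{frames/s}} \approx 33.3\ \text{ms}. \end{aligned}$$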

4 Validation

4.1 Experimental setup

The experimental setup comprises a GPU system, legacy webcams, and Unity software, as depicted in Fig. 1. To avoid blind spots in the target office space, multiple Logitech HD 1080p webcams were strategically placed in the physical space and connected to a local GPU system. Details of the GPU configuration are provided in Sect. 3.2. A second system runs the Unity software, as outlined in Sect. 3, to create the VR DT. The GPU and Unity systems are connected over a local network. The GPU system executes the person detection and electric power consumption algorithms. The resulting information is transmitted to the Unity system, where it is received and presented through dials, as demonstrated in Additional file 1: Video (https://youtu.be/0Gc833mJQlI).
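The text states only that results travel from the GPU system to the Unity system over sockets. The sketch below assumes one possible encoding, newline-delimited JSON over TCP. The field names, message schema, and host/port in the usage note are all hypothetical, and the Unity (C#) receiver is not shown. The sketch illustrates the privacy property described in the paper: only derived values, such as positions, states, and watts, enter the message, and image data never does.

```python
import json
import socket
import time


def build_update(room_id, person_positions, appliance_states, total_watts):
    """Assemble one privacy-preserving update for the digital twin.

    person_positions: list of (X, Z) virtual-space coordinates.
    appliance_states: dict mapping a (hypothetical) appliance id to "ON"/"OFF".
    """
    return {
        "room": room_id,
        "timestamp": time.time(),
        "occupancy": len(person_positions),
        "persons": [{"x": x, "z": z} for x, z in person_positions],
        "appliances": appliance_states,
        "power_w": total_watts,
    }


def send_update(sock, update):
    """Send one update as a newline-delimited JSON message over a socket."""
    payload = json.dumps(update).encode("utf-8") + b"\n"
    sock.sendall(payload)
```

A typical call might be `with socket.create_connection(("192.168.0.10", 5005)) as sock: send_update(sock, build_update("lab", [(1.0, 2.0)], {"fan_1": "ON"}, 53))`. The address and port are placeholders. Newline framing lets the receiver split a TCP byte stream back into individual messages.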

4.2 Person mapping

Procedure: Section 3.1 described the linear regression-based mapping procedure. Two validation techniques were used to assess the accuracy of the mapping algorithm. First, the predicted positions of the humanoids were compared with the positions of the corresponding individuals in the real world. Second, the correlation between the Euclidean distances separating individuals in the real and virtual worlds was reported. Figure 6 depicts the positions of individuals in the real world and their corresponding humanoids in the virtual world. The humanoids are generated in VR to match the 2D positions determined by the person detection algorithm. The figure labels the distances between individuals and their real-world positions, as generated by the person detection model. It also labels the predicted distances and positions derived from the mapping algorithm.

Analysis: A video lasting approximately 2 min was captured with a camera placed in a room. The video was processed by the person detection algorithm to obtain the coordinates of each detected person. The Euclidean distances between persons were then tabulated in both the real-world and VR contexts. Correlations were examined for the coordinates and distances produced by the CNN and the linear regression model. The findings revealed $R^{2} = 0.85$ for coordinates and $R^{2} = 0.5$ for distances. Figure 7 shows the correlation graphs for both coordinates and distances.
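One way to compute such correlations is sketched below. It assumes that the reported $R^{2}$ is the squared Pearson correlation of a simple linear fit, which is an assumption because the text does not state the formula. Pearson correlation is unaffected by scale and offset. Real-world distances in pixels can therefore be compared directly with virtual distances in Unity units. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def squared_correlation(a, b):
    """Squared Pearson correlation (R^2 of a simple linear fit)."""
    r = np.corrcoef(np.asarray(a, dtype=float), np.asarray(b, dtype=float))[0, 1]
    return float(r ** 2)


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of people in one frame."""
    pts = np.asarray(points, dtype=float)
    return [float(np.linalg.norm(pts[i] - pts[j]))
            for i, j in combinations(range(len(pts)), 2)]
```

In this sketch, the distance pairs from every frame of the video would be concatenated before `squared_correlation` is called, so that the result summarizes the whole recording.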

4.3 Energy consumption measurement

This section first reports the accuracy of the electric appliance detection model and then the energy measurement accuracy of the proposed system.

Object detection model: The accuracy of the model was assessed using IoU (Intersection over Union). IoU is the area of overlap between the ground-truth and predicted bounding boxes divided by the area of their union. Precision, recall, and F1 score are reported for individual classes and for the overall model. The model achieved a mean IoU of 0.32 and an accuracy of $65.61\%$. Latency measurements on the NVIDIA GeForce RTX 2070 with Max-Q Design showed a speed of 86.11 FPS. Performance details are summarized in Tables 2 and 3.
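The IoU definition above and the derived precision, recall, and F1 metrics can be written directly as shown below. This is a generic sketch of the standard formulas for boxes given as `(x_min, y_min, x_max, y_max)`. Deciding whether a detection counts as a true positive typically requires an IoU cutoff such as 0.5, but the cutoff used in the paper is not stated here.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width/height of the overlap are clipped at zero for disjoint boxes.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(tp, fp, fn):
    """Standard detection metrics from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```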

Table 2 Accuracy analysis of YOLOv5 (class wise)


Table 3 Accuracy analysis of YOLOv5 (overall)


Energy estimation accuracy: After the electric appliances are detected, the next stage is to determine their ON/OFF status. A test was conducted on 36 images captured at various times and locations. The total energy consumption of the identified devices was calculated. The image processing algorithms were then executed to determine whether each device was ON or OFF. True positives (TP), false positives (FP), and false negatives (FN) were recorded for individual appliances. From these, the true positive rate (TP/total energy consumed by the detected appliances) and the false positive rate (FP/total energy consumed by the detected appliances) were calculated.

The electric appliances considered in this study have standard power ratings: 53 W for a fan, 15 W for a TV/monitor, 20 W for a Tubelight, and 18 W for a Light_source. Precision scores were used to gauge how accurately the model estimates energy consumption within a given area. The overall true positive rate for detecting the ON or OFF state of electrical appliances is $91.58\%$, with an F1 score of $81.96\%$. The performance of the system is illustrated in Fig. 8, and a comprehensive summary is presented in Table 4.
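A minimal sketch of how the stated power ratings might turn per-appliance ON/OFF decisions into a power estimate and watt-weighted scores follows. The class keys are hypothetical identifiers. Weighting TP, FP, and FN by rated watts is an assumption, as is the choice of denominator, taken to be the power actually drawn by the detected appliances (TP + FN). Under that assumption, the true positive rate coincides with watt-weighted recall.

```python
# Power ratings stated in the text (watts); the dictionary keys are hypothetical.
POWER_RATING_W = {"fan": 53, "tv_monitor": 15, "tubelight": 20, "light_source": 18}


def estimated_power(detections):
    """Sum the rated power of every detected appliance classified as ON.

    detections: iterable of (class_name, is_on) pairs for one frame.
    """
    return sum(POWER_RATING_W[cls] for cls, is_on in detections if is_on)


def energy_weighted_scores(predicted, actual):
    """Watt-weighted TPR, FPR and F1 for the same list of detected appliances.

    predicted, actual: equal-length lists of (class_name, is_on) pairs; `actual`
    holds the ground-truth states of the appliances in `predicted`.
    """
    tp = fp = fn = 0
    for (cls, pred_on), (_, true_on) in zip(predicted, actual):
        watts = POWER_RATING_W[cls]
        if pred_on and true_on:
            tp += watts
        elif pred_on and not true_on:
            fp += watts
        elif true_on and not pred_on:
            fn += watts
    total_on = tp + fn  # assumed denominator: power actually drawn
    tpr = tp / total_on if total_on else 0.0
    fpr = fp / total_on if total_on else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return tpr, fpr, f1
```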

Table 4 Accuracy analysis of energy consumption estimation


5 General discussion

5.1 Summary

The proposed solution uses computer vision and a VR setup to monitor buildings and spaces remotely. It aims to enhance operational efficiency and to optimize space management and asset utilization while preserving individuals' privacy by design. A novel approach is introduced for measuring electrical energy consumption at any office location using computer vision techniques with legacy webcams. Traditional electrical meters require physical contact with the system being measured and are relatively costly (around USD 110). Legacy webcams offer a more cost-effective alternative (around USD 28) [45]. In addition, installing multiple traditional meters in small spaces such as cubicles, meeting rooms, and cafeterias is infeasible. Each circuit would need a dedicated current sense transformer, and cubicles and meeting rooms are rarely wired as 'one room, one circuit.' Recently, smart plugs have emerged as a solution for saving energy [46]. They are cheap, plug-and-play devices. However, smart plugs have several disadvantages in office spaces. These include dependency on internet connectivity, potential disruptions, the 'one electrical appliance, one smart plug' approach, and privacy/security concerns. Our computer vision-based solution addresses these limitations because it needs only cameras to estimate energy. Moreover, traditional meters do not reveal the energy consumption patterns of individual appliances or devices, which makes energy-wasting equipment hard to identify [47]. The proposed system adopts a Non-Intrusive Load Monitoring (NILM) method to monitor the electricity consumption of individual appliances. This takes advantage of the cost-effectiveness and convenience of NILM methods [20, 47]. The system was evaluated on the energy-intensive NVIDIA GTX 1080Ti, but edge computing boards can reduce this energy cost.
Our tests found the NVIDIA Jetson Nano highly efficient at running a YOLO model. It consumed only 5–10 W of power, significantly less than even a tube light rated at 20 W. In contrast to a 'smart plug,' the edge computing devices do not transmit images; they send only the location information of persons to the server system. Furthermore, a novel camera module is proposed to reduce the number of cameras needed for estimating power consumption.

Moreover, a practical method is introduced to measure room occupancy using computer vision techniques and the same cameras used for energy estimation. Recent work has estimated room occupancy for slip and fall risk assessment by monitoring Wi-Fi probe signals from mobile phones [48]. However, that method relies on expensive Google Cloud services for tracking, unlike our cost-effective computer vision-based approach. Data visualization techniques were applied to integrate the data into a digital twin, demonstrating the model's functionality for both real-world and digital twin applications.

5.2 Accuracy of the proposed system

Our implementation achieved a correlation of $R^{2} = 0.85$ for human mapping. CNN-based object detection is evolving rapidly, with new models appearing frequently in the literature. Electrical appliance detection accuracy (including under varying conditions) could therefore be increased further with customized CNN models. However, our focus was not on developing CNNs for detection but on proposing a novel way of detecting the ON/OFF status of electrical appliances using computer vision techniques. Moreover, the human mapping algorithm currently operates on the visual field of a webcam placed in a room. A future version will implement 3D distance measurement, such as the monocular 3D localization algorithm of Bertoni et al. [49].
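The paper reports $R^{2} = 0.85$ for human mapping and a true positive rate of $91.58\%$ with an F1 score of $81.96\%$ for energy consumption estimation. The sketch below shows how such figures are typically computed. Because the paper does not define its $R^{2}$ here, the sketch assumes it is the squared Pearson correlation between measured and mapped coordinates, computed one axis at a time. The sample values and confusion counts are hypothetical and do not reproduce the paper's results.

```python
import numpy as np


def r_squared(actual, mapped):
    """Squared Pearson correlation between measured and mapped coordinates (one axis at a time)."""
    r = np.corrcoef(np.asarray(actual, float), np.asarray(mapped, float))[0, 1]
    return float(r ** 2)


def tpr_and_f1(tp, fp, fn):
    """True positive rate (recall) and F1 score from confusion-matrix counts."""
    tpr = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return tpr, f1


# Hypothetical numbers, not the paper's data.
actual_x = [0.5, 1.5, 2.0, 3.2, 4.1]
mapped_x = [0.7, 1.3, 2.2, 3.0, 4.4]
print(round(r_squared(actual_x, mapped_x), 3))
print(tpr_and_f1(tp=90, fp=20, fn=10))
```

A coefficient of determination computed as $1 - SS_{res}/SS_{tot}$ would generally give a different value from the squared correlation. Which definition was used matters when comparing results.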

5.3 Utility

The proposed VR prototype is deployed as a VR-based digital twin of an office space. Through dashboards, it provides real-time monitoring of person occupancy, electrical energy consumption and environmental variables. This is valuable for monitoring occupancy and energy consumption in office spaces. The floor management team can then take proactive action to curb unnecessary energy consumption, which benefits the environment. A second benefit is that an observer can take a detailed remote virtual walk through the office space, which is not possible with standard multi-screen video from security cameras. Figure 9 shows the actual space and the corresponding virtual space, demonstrating that individual identities remain undisclosed in the virtual representation. At the same time, the floor management team can access detailed information about the space, including occupancy, employee postures, total power consumption, and additional sensor data.
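One way to act on unnecessary energy consumption is to combine occupancy with detected appliance states and flag zones that are empty but still drawing power. The sketch below is illustrative and not part of the paper. Its data structure is hypothetical, and only the 20 W tube light rating appears in the text; the 30 W monitor value is a placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ZoneStatus:
    name: str
    occupants: int
    # appliance label -> assumed rated power in watts, for appliances currently detected ON
    appliances_on: dict[str, float] = field(default_factory=dict)


def unnecessary_load(zones: list[ZoneStatus]) -> dict[str, float]:
    """Return watts drawn by zones that are empty but still have appliances ON."""
    report = {}
    for zone in zones:
        load = sum(zone.appliances_on.values())
        if zone.occupants == 0 and load > 0:
            report[zone.name] = load
    return report


zones = [
    ZoneStatus("meeting_room", 0, {"tube_light_1": 20.0, "tube_light_2": 20.0, "monitor": 30.0}),
    ZoneStatus("cubicle_A", 2, {"tube_light_1": 20.0}),
    ZoneStatus("cafeteria", 0),
]
print(unnecessary_load(zones))
```

A dashboard could highlight the returned zones in the digital twin so that the floor management team knows where to intervene.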

The environmental dimension of ESG (Environmental, Social, and Governance) focuses on minimizing the negative impact of business activities on the environment and promoting sustainability. This paper aims to support cost-effective environments through remote room occupancy monitoring and automatic energy consumption detection. By accurately estimating energy consumption in shared spaces, the proposed system can help identify areas or appliances with high energy consumption. This information can be used to develop strategies that reduce energy consumption and, in turn, the carbon emissions associated with operating offices and houses. By visualizing energy consumption data, stakeholders can identify opportunities for energy optimization, implement energy-saving measures, and make more sustainable choices.

5.4 Value addition

The global pandemic dramatically reshaped the use of office working space through working from home (WFH) and indoor social distancing requirements. Now that people are returning to offices and the focus is on hybrid working, current office designs must be re-evaluated to provide safer and healthier environments, optimize space, and reduce costs. How companies manage space in the emerging new normal is more important than ever.

Many machine learning-based energy consumption prediction models have been proposed in recent years. However, most of these systems are data dependent: they need a vast amount of data to build a predictive model. Our paper proposes a new way of measuring the energy consumed in a space, and our implementation can be used as a plug-and-play system to estimate the energy consumed in any space. The benefits of this approach are:

Digital twins are conventionally used to optimize or simulate the life cycles of assets and processes, or their maintenance. Our application of digital twin technology instead visualizes room occupancy and energy consumption within a workplace environment. This visualization tool helps floor management make informed decisions that improve the environmental sustainability of the office space.
The digital twin implementation preserves individuals' privacy. The human mapping algorithm provides real-time positions within office spaces. It represents people with randomly generated avatars, maintaining privacy by design without disclosing identity or gender. Although Tien et al. [28] discussed a simulation model for the working space, their work overlooked privacy, which is particularly important in a working environment.
An image processing pipeline enabled the detection of appliance ON/OFF states, which is a distinctive feature of this approach. Detecting the ON/OFF state of fans was difficult with conventional techniques, which led us to devise a new solution. Similarly, detecting the ON/OFF state of TVs/monitors required a novel approach.
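The fan detection method developed in the paper is not described in this section. For orientation only, the sketch below shows a simple conventional baseline: frame differencing inside a region of interest around the fan blades. This is explicitly not the authors' technique. A spinning fan changes the region's pixels from frame to frame, so a high mean absolute inter-frame difference suggests the fan is ON. The `roi` and `threshold` values are hypothetical.

```python
import numpy as np


def fan_is_on(frames, roi, threshold=8.0):
    """Guess whether a fan inside `roi` is spinning.

    frames: sequence of grayscale uint8 images of equal shape.
    roi: (x1, y1, x2, y2) pixel box around the fan blades.
    The mean absolute inter-frame difference inside the ROI is compared
    against a hypothetical threshold.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    x1, y1, x2, y2 = roi
    # int16 avoids uint8 wrap-around when subtracting.
    crops = [np.asarray(f)[y1:y2, x1:x2].astype(np.int16) for f in frames]
    diffs = [np.abs(b - a).mean() for a, b in zip(crops, crops[1:])]
    return float(np.mean(diffs)) > threshold


rng = np.random.default_rng(0)
static = [np.full((120, 160), 90, dtype=np.uint8)] * 5
moving = [rng.integers(0, 256, size=(120, 160), dtype=np.uint8) for _ in range(5)]
print(fan_is_on(static, (20, 20, 100, 100)), fan_is_on(moving, (20, 20, 100, 100)))
```

Sensor noise, lighting flicker and people walking through the region also produce inter-frame differences. These effects are part of why such a baseline can be unreliable for fans in practice.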

Occupancy detection and power consumption monitoring using machine vision can be extremely useful for regulating and optimizing HVAC (Heating, Ventilation, and Air Conditioning) systems in buildings. The floor manager can walk through the office space in the digital twin and adjust the HVAC systems using the occupancy of each area together with temperature and humidity data. This keeps conditions comfortable and conducive to productivity.
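The sketch below shows how occupancy, temperature and humidity could feed a per-zone HVAC decision. It is a deliberately simple rule-based example, not a controller from the paper. The comfort band of 23–26 °C and the 60% relative humidity ceiling are illustrative assumptions rather than values stated in the text.

```python
def hvac_action(occupants, temp_c, rh_percent, comfort_temp=(23.0, 26.0), max_rh=60.0):
    """Illustrative rule: relax control in empty areas, otherwise keep the zone in a comfort band."""
    if occupants == 0:
        return "setback"  # nobody present: widen the setpoint to save energy
    low, high = comfort_temp
    if temp_c > high or rh_percent > max_rh:
        return "cool"  # cooling also removes moisture from the air
    if temp_c < low:
        return "heat"
    return "hold"


# Hypothetical zone readings: (occupants, temperature in deg C, relative humidity in %).
readings = {"cubicles": (12, 27.5, 55.0), "meeting_room": (0, 28.0, 65.0), "cafeteria": (4, 24.0, 45.0)}
for zone, (people, temp, rh) in readings.items():
    print(zone, hvac_action(people, temp, rh))
```

A production system would more likely adjust setpoints gradually and account for pre-cooling ahead of expected occupancy. The rule above only illustrates which inputs the digital twin can supply.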

5.5 Implication of the study

The findings have substantial implications for energy management and building design. First, the system can estimate room occupancy and energy consumption from a single camera's visual field, at varying levels of granularity from individual rooms to entire buildings. This surpasses the capabilities of standard energy meters. The granularity gives a detailed understanding of energy utilization patterns that was previously unattainable, providing valuable insights for optimizing energy usage across spatial scales within buildings. Second, other research combines computer vision techniques with external data, such as latitude, longitude, and sunlight duration, to estimate solar energy generation and optimize energy utilization [50, 51]. Our study aligns with the Environmental dimension of ESG principles by aiming to mitigate adverse environmental impacts through remote room occupancy monitoring and automatic energy consumption detection. Moreover, combining person occupancy, energy consumption, solar power utilization, and building location data could enable the optimization of energy usage within buildings. Integrating these interior and exterior observations might lead to an intelligent energy management system that adapts dynamically to environmental conditions and human presence.
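Estimating consumption at several levels of granularity amounts to aggregating per-room estimates up a spatial hierarchy. The sketch below is a minimal illustration of that roll-up. It assumes a hypothetical (building, floor, room) labelling and made-up power readings in watts; neither is taken from the paper.

```python
from collections import defaultdict


def roll_up(readings):
    """Aggregate per-room power readings to floor and building level.

    readings: iterable of (building, floor, room, watts).
    Returns a dict keyed by (building,), (building, floor) and (building, floor, room).
    """
    totals = defaultdict(float)
    for building, floor, room, watts in readings:
        totals[(building,)] += watts
        totals[(building, floor)] += watts
        totals[(building, floor, room)] += watts
    return dict(totals)


readings = [
    ("office", "F2", "cubicle_A", 60.0),
    ("office", "F2", "meeting_room", 50.0),
    ("office", "F3", "cafeteria", 200.0),
]
totals = roll_up(readings)
print(totals[("office",)], totals[("office", "F2")], totals[("office", "F3", "cafeteria")])
```

Keying the totals by tuple prefixes lets one dictionary serve room, floor and building views in the digital twin without separate queries.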

6 Conclusion and future work

This paper introduces an end-to-end system designed to estimate energy consumption in various settings. The energy consumption algorithm achieved a true positive rate of $91.58\%$ and an F1 score of $81.96\%$. Additionally, a person occupancy method based on a human mapping algorithm is proposed. The information is transmitted to a 3D digital twin of the office space, allowing the floor manager to visualize occupancy and energy consumption in real time to support strategic planning. Limitations include the assumption of uniform energy ratings for specific electrical equipment such as TVs/monitors and various light sources. Ongoing efforts address these issues by collecting more diverse data to improve the accuracy of the electrical appliance detection model. Future plans involve expanding the dataset with images from different countries and using k-fold cross-validation to validate the techniques across diverse regions, aiming to improve precision. As previously mentioned, the deployed system covers the British Telecom office area, which accommodates up to 150 employees. Initial challenges involved finding camera positions that avoid overlapping fields of view and resolving cable connectivity issues between all cameras and the base system. Switching to USB-3 cables notably improved the performance of the person and electrical appliance detection algorithms. Additionally, legacy webcams and a privacy-by-design approach proved beneficial as a cost-effective solution for real-time remote asset monitoring.

To scale the system to larger spaces, camera placements and coverage areas should be reassessed to accommodate more employees efficiently. A networked setup or other alternatives to cabled connections might resolve integration challenges and allow seamless camera-to-system connections. While USB-3 cables improve algorithm performance, continuing to use cost-effective devices such as legacy webcams keeps scaling affordable. A privacy-by-design approach to device selection and deployment remains crucial for real-time remote asset monitoring, ensuring compliance with privacy regulations while maintaining system efficacy. Furthermore, the model was tested in various lab settings, with all data transmitted to the digital twin of a real-world space (refer to: https://youtu.be/0Gc833mJQlI).
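The uniform-rating assumption noted among the limitations determines how detected ON/OFF states become an energy figure. Under that assumption, energy is $E = \sum_k P_k \, t_k$, where $P_k$ is the rated power of appliance type $k$ and $t_k$ is the time it was observed ON. The sketch below is an illustration of that assumption, not the authors' implementation. Only the 20 W tube light rating appears in the text; the monitor and fan ratings are placeholders.

```python
# Hypothetical uniform ratings in watts. Only the 20 W tube light figure appears in
# the text; the monitor and fan values are placeholders.
RATED_POWER_W = {"tube_light": 20.0, "monitor": 30.0, "fan": 60.0}


def energy_wh(on_intervals, ratings=RATED_POWER_W):
    """Estimate energy in watt-hours from detected ON intervals.

    on_intervals: iterable of (appliance_type, start_s, end_s), times in seconds.
    Every appliance of a given type is assumed to draw its rated power while ON.
    """
    total = 0.0
    for kind, start, end in on_intervals:
        if end < start:
            raise ValueError(f"interval for {kind} ends before it starts")
        total += ratings[kind] * (end - start) / 3600.0
    return total


intervals = [("tube_light", 0, 3600), ("monitor", 0, 1800), ("fan", 600, 600)]
print(energy_wh(intervals))
```

Any error in a rating propagates linearly into the estimate. This is why collecting per-device ratings, rather than one value per appliance type, would address the limitation directly.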

Data availability

Will be made available.

Code availability

Code for a demo version will be made available.

References

1. Enerdata. World Energy & Climate Statistics-Yearbook 2022. https://yearbook.enerdata.net/electricity/electricity-domestic-consumption-data.html/. Accessed 18 Apr 2023.
2. Holmin J, Levison E, Oehme S. The utilization of office spaces and its impact on energy use.
3. Morello M. Digital models of real systems can improve efficiency and boost the circular transition. https://www.renewablematter.eu/what-are-digital-twins-and-how-they-can-help-sustainability/. Accessed 18 Apr 2023.
4. Ferguson S. Apollo 13: The First Digital Twin. https://blogs.sw.siemens.com/simcenter/apollo-13-the-first-digital-twin/. Accessed 10 Mar 2021.
5. Glaessgen E, Stargel D. The digital twin paradigm for future NASA and US Air Force vehicles. In: Proceedings of the 53rd AIAA/ASME/ASCE/AHS/ASC structures, structural dynamics and materials conference; 2012. p. 1818–1831.
6. Peña-Rios A, Di Cairano-Gilfedder C, Krug L, Glover T, Conway A, Owusu G. Uncovering the power of digital twins for Telcos. ITP J. 2022;04(16):38–44.
7. Bolton A, Enzer M, Schooling J, et al. The Gemini Principles: Guiding values for the national digital twin and information management framework. https://www.cdbb.cam.ac.uk/system/files/documents/TheGeminiPrinciples.pdf. Accessed 10 Mar 2021.
8. Tao F, Zhang H, Liu A, Nee AY. Digital twin in industry: state-of-the-art. IEEE Trans Industr Informat. 2018;15(4):2405–15.
9. Khajavi SH, Motlagh NH, Jaribion A, Werner LC, Holmström J. Digital twin: vision, benefits, boundaries, and creation for buildings. IEEE Access. 2019;7:147406–19.
10. Khelifi I. Azure Digital Twins: powering the next generation of IoT connected solutions. https://azure.microsoft.com/en-in/blog/azure-digital-twins-powering-the-next-generation-of-iot-connected-solutions/. Accessed 10 Mar 2021.
11. Microsoft. Steelcase CEO: Building the Smart + Connected Workplace. https://www.steelcase.com/eu-en/research/videos/steelcase-ceo-building-smart-connected-workplace/. Accessed 10 Mar 2021.
12. Lange A. ICONICS makes smart buildings even smarter with Azure Digital Twins. https://www.linkedin.com/pulse/iconics-makes-smart-buildings-even-smarter-azure-digital-andr%C3%A9-lange/. Accessed 10 Mar 2021.
13. Mukhopadhyay A, Rajshekar Reddy G, Mukherjee I, Kumar Gopa G, Pena-Rios A, Biswas P. Generating synthetic data for deep learning using VR digital twin. In: Proceedings of the 5th International Conference on Cloud and Big Data Computing (ICCBDC); 2021. p. 52–56.
14. Mukhopadhyay A, Reddy GR, Saluja KS, Ghosh S, Peña-Rios A, Gopal G, et al. Virtual-reality-based digital twin of office spaces with social distance measurement feature. Virt Real Intell Hardw. 2022;4(1):55–75.
15. Mukhopadhyay A, Reddy GR, Ghosh S, LRD M, Biswas P. Validating social distancing through deep learning and VR-based digital twins. In: Proceedings of the 27th ACM Symposium on Virtual Reality Software and Technology; 2021. p. 1–2.
16. Sun Q, Wei LY, Kaufman A. Mapping virtual and physical reality. ACM Trans Graph (TOG). 2016;35(4):1–12.
17. MediaMonks. How to map a virtual world to a real space. https://medium.com/mediamonks/how-to-map-a-virtual-world-to-a-real-space-39094426ada2. Accessed 11 Nov 2022.
18. Hanke JV, Aubin MA, Dinkelacker J, Keslin PC, Spirakis C. Mapping real world actions to a virtual world associated with a location-based game. Google Patents. US Patent 9539498.
19. Cheok AD, Hwee GK, Wei L, Teo J, Lee TS, Farbiz F, et al. Connecting the real world and virtual world through gaming. In: Building the information society. Berlin: Springer; 2004. p. 45–50.
20. Jiang J, Kong Q, Plumbley MD, Gilbert N, Hoogendoorn M, Roijers DM. Deep learning-based energy disaggregation and on/off detection of household appliances. ACM Trans Knowl Discov Data (TKDD). 2021;15(3):1–21.
21. Olu-Ajayi R, Alaka H, Sulaimon I, Sunmola F, Ajayi S. Building energy consumption prediction for residential buildings using deep learning and other machine learning techniques. J Build Eng. 2022;45:103406.
22. Rahman A, Srikumar V, Smith AD. Predicting electricity consumption for commercial and residential buildings using deep recurrent neural networks. Appl Energy. 2018;212:372–85.
23. Gao Y, Ruan Y, Fang C, Yin S. Deep learning and transfer learning models of energy consumption forecasting for a building with poor information data. Energy Build. 2020;223:110156.
24. Abeykoon V, Kankanamdurage N, Senevirathna A, Ranaweera P, Udawalpola R. Electrical devices identification through power consumption using machine learning techniques. Int J Simul Syst Sci Technol. 2016;17.
25. Chui KT, Gupta BB, Liu RW, Vasant P. Handling data heterogeneity in electricity load disaggregation via optimized complete ensemble empirical mode decomposition and wavelet packet transform. Sensors. 2021;21(9):3133.
26. Seyedzadeh S, Rahimian FP, Glesk I, Roper M. Machine learning for estimation of building energy consumption and performance: a review. Visual Eng. 2018;6(1):1–20.
27. García-Martín E, Rodrigues CF, Riley G, Grahn H. Estimation of energy consumption in machine learning. J Parallel Distrib Comput. 2019;134:75–88.
28. Tien PW, Wei S, Calautit J. A computer vision-based occupancy and equipment usage detection approach for reducing building energy demand. Energies. 2020;14(1):156.
29. Kwok SS, Lee EW. A study of the importance of occupancy to building cooling load in prediction by intelligent approach. Energy Convers Manage. 2011;52(7):2555–64.
30. Yezioro A, Dong B, Leite F. An applied artificial intelligence approach towards assessing building performance simulation tools. Energy Build. 2008;40(4):612–20.
31. Nikolakis N, Alexopoulos K, Xanthakis E, Chryssolouris G. The digital twin implementation for linking the virtual representation of human-based production tasks to their physical counterpart in the factory-floor. Int J Comput Integr Manuf. 2019;32(1):1–12.
32. Unity. For all the creators. https://unity.com/. Accessed 10 Mar 2021.
33. Unity. ProBuilder. https://unity3d.com/unity/features/worldbuilding/probuilder. Accessed 11 Nov 2022.
34. Riveccie M. What are PBR materials. https://info.e-onsoftware.com/blog/what-are-pbr-materials. Accessed 10 Mar 2021.
35. Hu Y, Fu T, Niu G, Liu Z, Pun MO. 3D map reconstruction using a monocular camera for smart cities. J Supercomput. 2022;78(14):16512–28.
36. Schönberger JL, Frahm JM. Structure-from-motion revisited. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. p. 4104–4113.
37. Schönberger JL, Zheng E, Frahm JM, Pollefeys M. Pixelwise view selection for unstructured multi-view stereo. In: Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part III 14. Springer; 2016. p. 501–518.
38. Adobe. Mixamo. https://www.mixamo.com/#/?page=1&type=Motion%2CMotionPack/. Accessed 18 May 2023.
39. Intel. O. S. Computer Vision Annotation Tool (CVAT). https://github.com/openvinotoolkit/cvat/. Accessed 18 May 2022.
40. Mukhopadhyay A, Mukherjee I, Biswas P. Comparing CNNs for non-conventional traffic participants. In: Proceedings of the 11th International Conference on Automotive User Interfaces and Interactive Vehicular Applications: Adjunct Proceedings; 2019. p. 171–175.
41. Mukhopadhyay A, Mukherjee I, Biswas P. Comparing shape descriptor methods for different color space and lighting conditions. AI EDAM. 2019;33(4):389–98.
42. Mukhopadhyay A, Murthy L, Arora M, Chakrabarti A, Mukherjee I, Biswas P. PCB inspection in the context of smart manufacturing. In: Research into design for a connected world. Berlin: Springer; 2019. p. 655–63.
43. Qin X, Zhang Z, Huang C, Dehghan M, Zaiane OR, Jagersand M. U2-Net: going deeper with nested U-structure for salient object detection. Pattern Recogn. 2020;106:107404.
44. Autodesk. Fusion 360 Software. https://www.autodesk.in/products/fusion360/overview/. Accessed 20 Jan 2023.
45. Energysage. Best Home Energy Monitors in 2023. https://www.energysage.com/energy-products/energy-monitors/. Accessed 18 Jul 2023.
46. Asurion. Smart plugs: what they do and how to best use them. https://www.asurion.com/connect/tech-tips/smart-plugs-what-they-do-and-how-to-best-use-them-in-your-home/. Accessed 18 May 2023.
47. Aladesanmi E, Folly K. Overview of non-intrusive load monitoring and identification techniques. IFAC-PapersOnLine. 2015;48(30):415–20.
48. Jinson K. Google will help insurers measure slip and fall risks in buildings. https://techiestechguide.com/google-will-help-insurers-measure-slip-and-fall-risks-in-buildings/. Accessed 18 May 2023.
49. Bertoni L, Kreiss S, Alahi A. Perceiving humans: from monocular 3D localization to social distancing. IEEE Trans Intell Transport Syst. 2021;23(7):7401–18.
50. Carballo JA, Bonilla J, Berenguel M, Fernández-Reche J, García G. New approach for solar tracking systems based on computer vision, low cost hardware and deep learning. Renewable Energy. 2019;133:1158–66.
51. Omdena. Computer Vision for Roof Detection and Solar Energy Output. https://www.youtube.com/watch?v=MnFkPazZULU/. Accessed 14 Dec 2023.


Funding

This work was supported by the British Telecom IISc Research Consortium.

Author information

Authors and Affiliations

CPDM, Indian Institute of Science, Bengaluru, Karnataka, 560012, India
Abhishek Mukhopadhyay, Naveen R. Talwar, Himanshu Viswakarma & Pradipta Biswas
Computer Science, University of Colorado Boulder, 552 UCB, Boulder, CO, 80309, USA
G. S. Rajshekar Reddy
BT RMZ Ecoworld, Bellandur, Bengaluru, Karnataka, 560103, India
Shakti Srivastava
BT Adastral Park, Martlesham Heath, Ipswich, Martlesham, IP5 3RE, UK
Anasol Pena-Rios


Contributions

AM worked on conceptualization, data curation, and writing the draft. NRT worked on software development and writing the draft. HV worked on data curation and developing the camera module. GSRR developed the Virtual Reality (VR) environment. APR and SS supplied data and helped with implementation and deployment. PB supervised the research.

Corresponding author

Correspondence to Abhishek Mukhopadhyay.

Ethics declarations

Consent for publication

The authors confirmed that human research participants (here, authors themselves) provided informed consent for publication of the images in Figs. 6 and 9. The authors took all the images, and they have complete copyright.

Competing interests

None.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Mukhopadhyay, A., Talwar, N.R., Viswakarma, H. et al. VR digital twin of office space with computer vision-based estimation of room occupancy and power consumption. Discov Anal 2, 6 (2024). https://doi.org/10.1007/s44257-024-00008-z


Received: 28 September 2023
Accepted: 05 January 2024
Published: 15 April 2024
DOI: https://doi.org/10.1007/s44257-024-00008-z
