Abstract
Big Data is steadily expanding beyond the boundaries of its foundational constructs: the three primary Vs (Volume, Velocity and Variety) and the two secondary Vs (Veracity and Value). The advent of 5G networks, Edge computing and IoT technologies has placed Big Data in this modern context. With these new manifestations of Big Data, the focus is not only on the data itself but also on the context it lends to its immediate environment, as well as on the human and societal perception of this context. It is increasingly challenging for conventional AI algorithms to process and transform this data, analyse and visualise a broad spectrum of insights, and then formulate the explainability of such insights in terms of bias, transparency, safety, ethics, and causality. Self-Structuring Artificial Intelligence (SSAI) addresses the limitations of conventional AI by adapting to the inherent structure of the data, incrementally learning and abstracting from this structure. SSAI has not yet been investigated in a cloud-based setting for generating explainable insights from these new types of Big Data. In this paper we propose a cloud-based architecture for explainable Big Data analytics using SSAI in highly connected 5G and Edge computing environments. The proposed architecture is empirically evaluated on a commercial-scale Big Data use case: the Smart Grid for Smart Cities. The results of these experiments confirm the functionality and effectiveness of the proposed architecture.
1 Introduction
In the age of Artificial Intelligence (AI), the proliferation and ubiquity of data have taken on new dimensions. As several studies have demonstrated, Generative AI models depend heavily on data to develop their capabilities [1, 2]. In this new paradigm of Big Data, we can no longer depend solely on the batch processing strategies of the not-too-distant past; we need to be able to capture and process many different streams of data at scale. In our conceptualisation of the platform we explore architectural and data processing strategies that trade off between real-time and batch-based processing paradigms, accommodating both low-latency and high-latency data streams. We adopt an event-based streaming philosophy of data ingestion and consumption, which allows processing at very high scale and speed. The incoming data streams are treated as streams of events that mirror events occurring in the environment. The dual processing paradigm described above also allows the data that we use as the source of truth to be continually refreshed with the latest data as it becomes available. This in turn allows the models and applications that rely on the data to generate insights in real time while still augmenting and supporting those insights with new information as it arrives [3, 4].
The rest of this paper is organised as follows. Section 2 explores related work on Big Data analytics platforms. Section 3 proposes a cloud-based analytics platform architecture for SSAI and selects an existing SSAI technique suitable for showcasing the platform. Experiments conducted using household energy consumption data generated by the SGSC initiative are reported in Sect. 4. Section 5 concludes the paper with a discussion of limitations and directions for future work.
2 Related work
The proliferation of data in cloud systems and recent advances in technologies for tackling Big Data environments have been discussed extensively in the literature. However, existing solutions do not provide a holistic answer, and open research questions remain in data staging, distributed storage, security and analysis [5,6,7]. The scale of data in the new Big Data paradigm demands non-conventional methods to process and generate insights from the data streams being generated. Al-Jarrah et al. discuss data modelling on large datasets for machine learning from theoretical and experimental perspectives, with a view to optimising computational complexity [8]. It is necessary to look instead to non-traditional methods of harnessing Big Data’s potential [9]. An important consideration, though, is that invasive techniques that attempt to impose structure on Big Data are generally cumbersome and ultimately self-defeating in the face of the exponential growth being observed and predicted [10]. Harnessing the power of AI is one way of answering the challenges posed by the new Big Data. O’Leary explores some of the ways in which Artificial Intelligence can be used to facilitate the processing and analysis of Big Data in this manner [11]. This intersection of Big Data analytics and AI is demonstrated in a number of recent studies, such as situational awareness from IoT data streams [12], intelligent detection of driver behaviour changes [13], human activity recognition [14] and emotion detection [15]. However, most such studies neither propose a cloud-based strategy nor focus on the seamless integration of real-time and batch-processed data, and they do not address the need for explainability of the insights generated.
As the world comes to depend further on machine learning models to generate insights that in turn inform important decisions, it becomes even more important to formulate the explainability of such insights in terms of bias, transparency, safety, ethics, and causality [16, 17]. Significant research has gone into the challenges of applying Explainable AI and into the workarounds and solutions that are available [18,19,20]. Any platform that applies Artificial Intelligence to generate insights should therefore build into its architecture a framework that can include Explainable AI models and explainers, which can account for why those insights were generated.
Advances are also being made in working with the class of Big Data that encompasses sensor data. Plageras et al. propose a sensor management system in which remote sensor deployments communicate with a cloud-based building management server [21]. This enables centralised management of sensor data in the cloud and exploits the elasticity of the cloud to manage irregular surges in Big Data. Mavromoustakis et al. take this idea further by distributing compute and storage across local networks using “edge cloudification”, where small clouds operating at the network edge accommodate the storage and compute needs of the local area [22]. Distributing AI along the continuum from cloud to edge is an area that can be explored further, with demonstrated potential in diverse applications such as energy [23, 24] and healthcare [25, 26].
3 The proposed cloud-based architecture for explainable Big Data analytics
3.1 Theoretical discussion
The proposed architecture is composed of three distinct layers: Process, Store and Serve. The Process layer serves as the gateway to the platform, taking care of discovery, registration, provisioning and acclimatisation of the data streams [27]. Because the incoming data is variable, the Process layer provides a frame of reference against which the platform can adapt and synchronise the data. This means that the platform can handle data arriving at various velocities and sample it within a single frame of reference. By managing data stream discovery, registration and acclimatisation, the Process layer ensures data integrity and transparency, which act as building blocks for the explainability of the system.
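As a minimal sketch of what sampling streams in a single frame of reference could look like, the Python below floors each timestamp to a shared fixed-width grid and averages readings that fall in the same bucket. The helper name `align_to_frame`, the 30-minute bucket width (chosen to match the SGSC interval used in Sect. 4) and the mean-aggregation policy are assumptions for illustration, not the platform's actual Process-layer logic.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def align_to_frame(readings, width=timedelta(minutes=30)):
    """Average (timestamp, value) readings into fixed-width time buckets.

    Hypothetical helper: streams sampled at different rates are mapped onto
    one shared time grid so downstream components see a common frame.
    """
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    for ts, value in readings:
        # Floor the timestamp to the start of its bucket.
        index = (ts - epoch) // width
        buckets[epoch + index * width].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# A 5-minute stream collapses onto the same 30-minute grid as a 30-minute stream.
fast = [(datetime(2013, 1, 1, 0, m), 1.0) for m in range(0, 60, 5)]
print(align_to_frame(fast))
```

Averaging is only one possible policy; summing (for energy counters) or keeping the last value may suit other stream types better.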
The Store layer will provide the high-performance stack required for handling, at scale, the data coming through the Process layer. It will manage the data and make it available to the platform in line with its freshness and frequency of use, utilising both “hot” and “cold” paths of delivery. Partitioning the data in this manner also allows it to be augmented and masked in line with contextualisation and security considerations, which in turn allows the data to participate efficiently in any analytic activity. All metadata extracted and collected from each stream will be democratised within the platform and used to inform the processing and storage processes as well as the upstream layers. The Store layer’s data partitioning and management strategies facilitate data traceability, enabling insights to be linked back to their data sources; this enhances transparency and accountability, which in turn support the explainability of the system.
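The sketch below illustrates one possible hot/cold routing policy together with simple lineage stamping and field masking. It assumes that freshness alone decides the path (the text also mentions frequency of use, which is omitted here) and that `customer_id` is the only sensitive field; the `Record` type and the `route` and `mask` helpers are hypothetical and do not describe the platform's Store layer as implemented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Record:
    source: str                      # originating stream, kept for traceability
    ts: datetime                     # event time
    payload: dict
    lineage: list = field(default_factory=list)


def route(record, now, hot_window=timedelta(days=1)):
    """Return 'hot' for fresh records and 'cold' otherwise, stamping lineage.

    Assumed policy: only freshness is considered.
    """
    path = "hot" if now - record.ts <= hot_window else "cold"
    record.lineage.append(f"{record.source}->{path}")
    return path


def mask(record, sensitive=("customer_id",)):
    """Return a copy of the payload with sensitive fields redacted."""
    return {k: ("***" if k in sensitive else v) for k, v in record.payload.items()}
```

Because every record carries its lineage list, an insight computed later can be traced back to the stream and delivery path it came from.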
The final layer of the platform is Serve, which will function as the wheelhouse of the platform; the framework will hold suitable adapters to retrofit different learning mechanisms. These learning mechanisms can operate individually or as an ensemble. From an Edge perspective, the learning mechanisms will surface latent models that can be deployed to run in low-capacity environments such as those prevalent at the edge of the network.
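A minimal sketch of the adapter idea, assuming a hypothetical `Learner` protocol with a single `predict` method: any learning mechanism wrapped to satisfy the protocol can be used on its own or combined in an `Ensemble`, which itself satisfies the same protocol. The toy `ThresholdLearner` merely stands in for a real model.

```python
from typing import Protocol, Sequence


class Learner(Protocol):
    """Hypothetical adapter interface each learning mechanism is wrapped in."""

    def predict(self, x: Sequence[float]) -> float: ...


class ThresholdLearner:
    """Toy learner: 1.0 if the mean reading exceeds a threshold, else 0.0."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def predict(self, x: Sequence[float]) -> float:
        return 1.0 if sum(x) / len(x) > self.threshold else 0.0


class Ensemble:
    """Runs several adapted learners and averages their outputs."""

    def __init__(self, learners):
        self.learners = list(learners)

    def predict(self, x: Sequence[float]) -> float:
        return sum(learner.predict(x) for learner in self.learners) / len(self.learners)


ensemble = Ensemble([ThresholdLearner(0.5), ThresholdLearner(1.5)])
print(ensemble.predict([1.0, 1.0]))
```

Because `Ensemble` exposes the same `predict` method, the Serve layer can treat a single learner and a group of learners interchangeably.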
As mentioned, the proposed platform operationalises SSAI, which addresses the limitations of conventional AI by adapting to the inherent structure of the data, incrementally learning and abstracting from this structure. The information processing functionality of the platform that supports SSAI is based on the principles of the lambda architecture [28]: we enable processing of a real-time stream (Real time processing layer) and a separate batch stream for higher-latency data (Batch processing layer). One inherent problem with the lambda architecture is the duplication of function and code across the Speed and Batch layers; we overcome this by maintaining a single repository of components that is used across both layers. The platform is illustrated in Fig. 1.
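The following sketch shows, under simplifying assumptions, how a single shared component can serve both layers of a lambda architecture: the same `daily_total` function builds the batch view from historical events and the speed view from recent events, and `serve` merges them. It assumes the two event sets are disjoint, i.e. the speed layer only covers data not yet absorbed by the last batch run; all names are hypothetical.

```python
from collections import defaultdict


def daily_total(events):
    """Shared component: sum kWh per (household, date).

    The same function is reused by the batch and speed layers, so the
    logic exists in one place only.
    """
    totals = defaultdict(float)
    for household, date, kwh in events:
        totals[(household, date)] += kwh
    return dict(totals)


def serve(batch_view, speed_view):
    """Merge views; speed-view entries cover events newer than the batch run."""
    merged = dict(batch_view)
    for key, value in speed_view.items():
        merged[key] = merged.get(key, 0.0) + value
    return merged


historic = [("h1", "2013-01-01", 1.0), ("h1", "2013-01-01", 2.0)]
recent = [("h1", "2013-01-01", 0.5), ("h2", "2013-01-02", 1.0)]
print(serve(daily_total(historic), daily_total(recent)))
```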
In the proposed platform, we use Apache Kafka [29] as our Streaming Manager to publish and subscribe to all streams entering the system, making it the de facto gatekeeper to the system. Kafka provides a fast message bus and will be the delivery point for all event streams into the system, including data from 5G massive machine-type communication (mMTC) and ultra-reliable low-latency communication (URLLC) networks. The Streaming Manager will work alongside Apache Spark Streaming [30] to implement the real-time processing branch. Alternatives here include Apache Storm [31] and Apache Samza [32]; however, we have chosen Apache Spark Streaming for its machine learning, graph processing and SQL-based querying capabilities, and for the cohesion it brings to the systems in the processing layers. Apache Spark also fulfils the role of the Batch processing branch; Hadoop could have been used here, but again we have chosen Spark for the reasons mentioned above.
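To illustrate why a log-based message bus lets the real-time and batch branches consume the same streams independently, the sketch below models a single-partition topic as an append-only list with a read offset per consumer group. This is a conceptual stand-in only; it is not the Kafka client API, and `EventLog`, `publish` and `poll` are hypothetical names.

```python
from collections import defaultdict


class EventLog:
    """Toy append-only topic log with per-consumer-group offsets.

    Conceptual stand-in for a single-partition topic, not Kafka's API.
    """

    def __init__(self):
        self.events = []
        self.offsets = defaultdict(int)  # consumer group -> next offset to read

    def publish(self, event):
        self.events.append(event)
        return len(self.events) - 1      # offset assigned to the new event

    def poll(self, group, max_events=100):
        start = self.offsets[group]
        batch = self.events[start:start + max_events]
        self.offsets[group] = start + len(batch)  # commit progress for this group
        return batch


log = EventLog()
for i in range(3):
    log.publish({"kwh": i})
print(log.poll("speed"))                 # real-time branch reads everything so far
print(log.poll("batch", max_events=2))   # batch branch reads at its own pace
```

Since reading never removes events, each branch tracks its own position and neither slows the other down.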
Both branches of processing will load data into the Store layer, which will be implemented using Apache Druid. Apache Druid is a column-oriented, high-performance, open-source, distributed data store [33]. It provides flexible, highly available, low-latency queries and fast slice-and-dice analytics ("OLAP" queries) on large data sets.
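Druid can pre-aggregate ("roll up") rows at ingestion time by truncating timestamps to a query granularity and grouping by dimensions. The sketch below reproduces that idea in plain Python; it does not use Druid's API, and the `suburb` dimension, the `kwh_sum` metric and the granularity options are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime


def rollup(rows, granularity="hour"):
    """Pre-aggregate (timestamp, suburb, kwh) rows to a coarser granularity.

    Mirrors the idea of ingestion-time roll-up; names are hypothetical.
    """
    fmt = {"hour": "%Y-%m-%dT%H", "day": "%Y-%m-%d"}[granularity]
    agg = defaultdict(lambda: {"count": 0, "kwh_sum": 0.0})
    for ts, suburb, kwh in rows:
        key = (ts.strftime(fmt), suburb)   # truncated time + dimension
        agg[key]["count"] += 1
        agg[key]["kwh_sum"] += kwh
    return dict(agg)


rows = [
    (datetime(2013, 1, 1, 8, 0), "A", 0.5),
    (datetime(2013, 1, 1, 8, 30), "A", 1.5),
    (datetime(2013, 1, 1, 9, 0), "A", 1.0),
]
print(rollup(rows))
```

Rolling up half-hourly readings to hourly or daily rows shrinks storage and speeds up slice-and-dice queries, at the cost of losing the finer time resolution for that table.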
The AI capabilities of the platform will be initiated on top of this high-performance data delivery stack. The different learning mechanisms will be instantiated as pipelines that sequence and apply machine learning capabilities. For the purpose of this article, we demonstrate the workings of the IPCL algorithm, which has been developed on the principles of SSAI to incrementally characterise patterns in stream data and correlate these across time [24].
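A pipeline that sequences learning capabilities can be sketched as a composition of named stages; recording which stages ran gives a simple trace that an explainability component could later consume. The `make_pipeline` helper and the stage names are hypothetical.

```python
def make_pipeline(*stages):
    """Compose (name, function) stages into one callable returning (result, trace)."""
    def run(data):
        trace = []
        for name, stage in stages:
            data = stage(data)
            trace.append(name)   # record the order in which stages were applied
        return data, trace
    return run


pipeline = make_pipeline(("scale", lambda xs: [x * 2 for x in xs]), ("peak", max))
print(pipeline([1, 3, 2]))
```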
3.2 IPCL algorithm
Incremental Pattern Characterisation Learning (IPCL) is an unsupervised incremental learning algorithm in which existing learned knowledge is incrementally extended and updated as new data arrives. Incremental learning is essential for high-velocity, low-latency data streams, as such streams constantly evolve: new, unseen patterns will continue to appear and previously seen patterns may reappear later, so learning techniques must keep previously learned knowledge intact rather than discarding it. The IPCL algorithm supports the four key characteristics of an incremental learning technique [24]:
1. Learn additional information from new data.
2. Not require access to the past data that it has already processed.
3. Address catastrophic forgetting, i.e. preserve previously acquired knowledge.
4. Accommodate new classes that may be introduced with new data.
IPCL self-learns a layered structure across time, generalising the knowledge embodied in the data. Each layer learns from a buffered batch of data using the GSOM self-structuring technique [34]. IPCL preserves the acquired knowledge in a generalised form; therefore, it does not require access to the past data that it has already processed. The generalised version of the knowledge acquired by each layer (n) is used as the basis for knowledge acquisition in the subsequent layer (n + 1), which avoids catastrophic forgetting of past knowledge. Moreover, while using the previously acquired knowledge as its base, it incrementally acquires the new knowledge embodied in incoming data. This incremental learning capability enables IPCL to handle high-velocity, low-latency data streams, as it does not need to revisit past data to learn the patterns.
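The layering described above can be sketched as follows. This is a deliberately simplified stand-in, not the published IPCL or GSOM implementation: prototypes are unordered (there is no GSOM grid or neighbourhood update), and when a prototype's accumulated error exceeds the GSOM-style growth threshold GT = -D ln(SF) a new prototype is simply placed at the offending input. What the sketch does preserve is the incremental structure: each layer is trained only on its own batch, is seeded with a copy of the previous layer's prototypes, and never deletes nodes. All names and parameter values are assumptions.

```python
import math


class GrowingLayer:
    """One IPCL-style layer: a simplified growing prototype map (GSOM stand-in)."""

    def __init__(self, seed, growth_threshold, lr=0.1):
        # Copy the seed so the previous layer's knowledge is never mutated.
        self.prototypes = [list(p) for p in seed]
        self.errors = [0.0] * len(self.prototypes)  # accumulated quantisation error
        self.gt = growth_threshold
        self.lr = lr

    def bmu(self, x):
        """Index of the best-matching (nearest) prototype."""
        return min(range(len(self.prototypes)),
                   key=lambda i: math.dist(self.prototypes[i], x))

    def train(self, batch, epochs=5):
        for _ in range(epochs):
            for x in batch:
                i = self.bmu(x)
                self.errors[i] += math.dist(self.prototypes[i], x)
                if self.errors[i] > self.gt:
                    # Grow: place a new node at the poorly represented input.
                    self.prototypes.append(list(x))
                    self.errors.append(0.0)
                    self.errors[i] = 0.0
                else:
                    # Adapt: move the winning prototype towards the input.
                    p = self.prototypes[i]
                    for d in range(len(p)):
                        p[d] += self.lr * (x[d] - p[d])
        return self


def ipcl_sketch(batches, seed, spread_factor=0.5):
    """Train one layer per buffered batch, seeding layer n+1 with layer n."""
    dim = len(seed[0])
    gt = -dim * math.log(spread_factor)  # GSOM growth threshold GT = -D ln(SF)
    layers = []
    for batch in batches:
        layer = GrowingLayer(seed, gt).train(batch)
        layers.append(layer)
        seed = layer.prototypes  # carry generalised knowledge; raw batch is discarded
    return layers


layers = ipcl_sketch([[[0.1, 0.0]] * 5, [[5.0, 5.0]] * 3], seed=[[0.0, 0.0]])
print([len(layer.prototypes) for layer in layers])
```

In this toy run the second batch contains an unseen pattern, so the second layer grows a new node while keeping the node learned from the first batch.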
The learning outcomes determined by the IPCL algorithm are presented to the Orchestrator, which is responsible for choosing, applying and coordinating these pipelines based on the data. The metadata required by the pipelines will be augmented and delivered from the data factory. The Explainability Engine module at the end of the pipeline will be responsible for explaining the models generated and for creating the appropriate augmentations to the data to make the results human-friendly. The reasoning function of the engine will employ an ensemble of explainers, including LIME [35] and SHAP [36] based techniques, for exploring results; these will be integrated with feature relevance to provide an Explainability Graph that rationalises the results delivered by the machine learning pipeline. The Explainability Engine interface will also allow human input for consolidating and validating the reasoning provided, and will expose interfaces through which new models of reasoning can be added in the future. This forms the framework by which insights generated by SSAI are presented hand in hand with the ability to interrogate them in terms of their explainability. The Serve layer, through the Explainability Engine, directly tackles explainability by providing understandable interpretations of AI decisions. The Process and Store layers provide explainability-enabling metadata, while the Serve layer provides actual explanations of model outputs using that metadata and XAI techniques. Together, these layers ensure that the architecture not only supports but enhances the explainability of AI-driven insights.
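SHAP-based explainers approximate Shapley values. For a model with only a handful of inputs, such as a score computed from four 6-hour day segments, the exact values can be computed by enumerating coalitions, as in the sketch below. Features outside a coalition are replaced by a baseline value, which is one common (assumed) choice; this is not the `shap` library's API, and `shapley_values` is a hypothetical helper.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley attributions of f(x) relative to f(baseline).

    Cost is exponential in the number of features, so this only suits
    small inputs.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition take their actual value, others the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Linear toy model: attributions equal coefficient * (x - baseline).
print(shapley_values(lambda z: 2 * z[0] + 3 * z[1], [1, 2], [0, 0]))
```

By construction the attributions sum to f(x) - f(baseline) (the efficiency property), which is a convenient sanity check on any explainer in the ensemble.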
3.3 Comparative evaluation
Several studies have proposed cloud-based, scalable architectures for Big Data analytics, including [37, 38]; however, a shortcoming of these has been their inability to integrate a Self-Structuring AI model that can handle drift both in the data streams being processed and in the data itself. Nor do they combine this with a suitable set of reasoning capabilities. The Orchestrator and the Explainability Engine modelled in the Serve layer are designed to address this gap. The framework also provides the ability to serve the generated latent representations of the model out to the edge of the network. These latent representations can be executed on lower-capacity hardware, which suits them for use at the edge. In this way, most of the data can be processed at the edge and the resulting information ingested back into the framework through the Process layer. This is another notable point of difference from the conventional architectures that we evaluated; see Table 1 for this evaluation.
4 Experiments and results
In the push for smart cities, smart grids are key components in understanding and addressing the burgeoning demand for energy, while fulfilling the requirements for energy efficiency, sustainable energy management, and carbon neutrality. The real-time monitoring capabilities of smart grids make it possible to understand the demand for energy and allow energy providers to adjust rates to match supply and demand. In response, household and industrial consumers can change the behavioural and lifestyle factors that directly shape their usage patterns, reducing the load on the grid. This setting, with multiple stakeholders holding varied objectives in a multi-layered composition, is well positioned for the evaluation and demonstration of the proposed cloud-based architecture for explainable Big Data analytics. However, processing smart grid data streams from a metropolitan city is technically challenging, as they encompass frequent updates from thousands of household smart meters and exhibit the qualities of a high-velocity, low-latency data stream of hundreds of millions of data points per day.
For this experiment, we used 30-min interval load data collected for the Smart Grid, Smart City (SGSC) project [39]. The dataset consists of 30-min interval readings from 78,000 households in New South Wales, Australia, collected from 2010 to 2014. This dataset is streamed to the proposed platform, emulating real-time smart-meter interval readings. The Process layer of the platform receives the data stream and pushes it into the IPCL algorithm, which is instantiated in the machine learning pipeline. Each IPCL layer is represented by readings from a 24 h period; since the readings are half an hour apart, this creates a 48-dimension vector (24 × 2 half-hourly reads). The IPCL algorithm develops a columnar arrangement that represents patterns and maintains continuity in learning across each time period. As IPCL builds on GSOM (Sect. 3), the initial four aggregate nodes generated in the learning phase then expand in subsequent phases, incrementally growing and learning in this manner for all time periods. Nodes that do not grow in a certain phase are not lost; they are retained, and learning continues in subsequent phases when relevant. The streamlined nature of the platform allows the algorithm to run in near real time for processing and insights. The metadata and customer demographic data are processed through the batch processing layer and augmented to the results from the machine learning pipeline. The results are run through the Explainability Engine to generate explained insights, such as the ontological representation shown in Fig. 2.
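The construction of the 48-dimension daily vectors can be sketched as below, assuming each timestamp marks the start of its half-hour interval and that readings arrive as `(household, timestamp, kwh)` tuples; missing slots are left as `None` rather than imputed. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

SLOTS_PER_DAY = 48  # 24 h x 2 half-hourly readings


def daily_vectors(readings):
    """Group (household, timestamp, kwh) readings into 48-slot daily vectors.

    Slot k covers the half hour starting k * 30 min after midnight.
    """
    days = defaultdict(lambda: [None] * SLOTS_PER_DAY)
    for household, ts, kwh in readings:
        slot = ts.hour * 2 + ts.minute // 30
        days[(household, ts.date())][slot] = kwh
    return dict(days)


vectors = daily_vectors([
    ("h1", datetime(2013, 5, 1, 0, 0), 0.2),
    ("h1", datetime(2013, 5, 1, 23, 30), 0.4),
])
print(len(vectors), len(next(iter(vectors.values()))))
```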
As discussed above, the platform clusters the energy consumption data over time into different Energy Consumption Pathways, or Usage Profiles, based on the characteristics learned by the algorithm. The execution resulted in 12 distinct profiles encapsulating the energy usage of the households. These profiles were then extrapolated and averaged into a representative energy consumption pattern visualised over time. Figure 3 shows the consumption of energy over the days of the month and the four segments of the day (the 30 min interval loads were split into 6 h segments from midnight to midnight).
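The split into four 6 h segments and the averaging of a profile's member days can be sketched as follows; the helper names are hypothetical and missing readings (`None`) are simply skipped.

```python
def day_segments(vector, segments=4):
    """Sum a 48-slot daily vector into equal segments (4 x 6 h by default)."""
    size = len(vector) // segments  # 12 half-hour slots per 6 h segment
    return [sum(v for v in vector[k * size:(k + 1) * size] if v is not None)
            for k in range(segments)]


def mean_profile(vectors):
    """Element-wise mean of equal-length vectors, e.g. a profile's average day."""
    return [sum(col) / len(col) for col in zip(*vectors)]


print(day_segments([0.5] * 48))
```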
The visualisation shows the peaks and troughs commonly displayed across peak and off-peak use (day and night), as well as slopes denoting weekday and weekend usage. It also shows the usage characteristics of different user profiles: Cluster 11 (denoted by P24_11) shows a relatively flat, low consumption pattern in which energy usage varies little, whereas Cluster 3 (denoted by P24_3) shows significant peaks in usage. The data was then averaged over a 24 h period to take a closer look at the energy usage characteristics of the different Usage Profiles, as shown in Fig. 4. The patterns are consistent with the views in Fig. 3. Here we can also see that Clusters 12 and 2 (denoted by P24_12 and P24_2) show behaviour similar to Cluster 3, albeit at a lower energy consumption scale.
In the daily energy consumption profile depicted in Fig. 4, there is a peak in the morning between 8 a.m. and 9 a.m. followed by an immediate dip, possibly as occupants get ready to leave for the day’s activities, be it school or work. Consumption gradually picks up again from around 4 p.m. to 8 p.m., as people arrive home and carry on with activities through to dinner, after which they wind down for the night and consumption dips. Interestingly, P24_3 and P24_1 both show a decrease in energy consumption from 12 a.m. to 1 a.m., which may suggest occupants who stay up later at night.
Next, to delve deeper into the usage profiles and the energy usage patterns they exhibit, the platform cross-referenced the demographic profiles of the customers with the Energy Consumption Profiles derived by the platform. A median value for each demographic characteristic was established, compared and cross-tabulated against the mean of each Energy Usage Profile. The results are shown in Fig. 5. This allows a more granular analysis of the Energy Pathways adopted by the different user groups and allows potential cause-and-effect phenomena to be explored.
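One way to read the cross-tabulation described above is sketched below: households are grouped by their assigned profile, and each profile receives its mean daily consumption alongside the median of every numeric demographic field. The field names (`profile`, `daily_kwh`, `occupants`) and the input layout are assumptions for illustration, and every household in a profile is assumed to carry the same fields.

```python
from statistics import mean, median


def profile_table(households):
    """Cross-tabulate demographics against Energy Usage Profiles.

    `households` maps id -> dict with a 'profile' label, 'daily_kwh' and
    numeric demographic fields (hypothetical names).
    """
    groups = {}
    for h in households.values():
        groups.setdefault(h["profile"], []).append(h)
    table = {}
    for profile, members in sorted(groups.items()):
        fields = [k for k in members[0] if k not in ("profile", "daily_kwh")]
        row = {"mean_kwh": mean(m["daily_kwh"] for m in members)}
        row.update({f: median(m[f] for m in members) for f in fields})
        table[profile] = row
    return table


households = {
    1: {"profile": "P24_10", "daily_kwh": 30.0, "occupants": 4},
    2: {"profile": "P24_10", "daily_kwh": 26.0, "occupants": 5},
    3: {"profile": "P24_11", "daily_kwh": 6.0, "occupants": 1},
}
print(profile_table(households))
```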
Here we can see that Cluster 10 (denoted by P24_10) exhibits the largest consumption profile; a closer look at its demographic profile shows a high percentage of high-income families with children and a median of four occupants, living in individual houses. Utilities include split-system and ducted heating, and a good percentage of these households also have pools. This is consistent with the energy consumption curve, which is high throughout the day. Cluster 3 (denoted by P24_3) are also generally high-income earners who live in their own houses, with high electricity usage, ducted or split-system air conditioning, a median of four occupants including children, and two refrigerators. This again bears out in the high energy usage and in the pronounced peaks and troughs as occupants depart for and return from work and school. The late-night energy consumption might suggest teenagers staying up late.
In contrast, Cluster 11 (P24_11) shows a relatively flat energy profile through the day, with quite low average energy usage. Its demographic profile shows a single occupant with no children who is at home during the day, living in a rented unit. There is an emphasis on gas for heating water and cooking, and the income profile is low. This might suggest elderly pensioners living on their own. Some pool pump and solar usage is also reported, which may mean that these units are part of a complex where the landlord has installed a pool or solar generation.
The Explainability Graph generated by the system to explain these insights further bears out this analysis.
Figure 6 illustrates household attributes across the different 24-h profile segments, indicating diverse energy consumption patterns. Certain segments, such as Segment 2 and Segment 4, show a higher prevalence of both heating and air conditioning use, suggesting households whose energy needs are driven by climate control, which could reflect regions with more extreme temperatures. Segment 9 and, notably, Segment 10 present the highest attribute sums, particularly in technological connectivity and air conditioning use, hinting at potentially larger or more technology-dependent households with constant energy demands. Conversely, other segments display lower sums for features such as pool pumps and solar panels, possibly pointing to urban dwellings with less space for such amenities or to more energy-efficient lifestyles. The variability across segments provides insight into different energy needs and usage patterns.
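Attribute sums of the kind shown in Fig. 6 can be computed by counting, per profile, the households that report each binary attribute; the most common attributes per profile can then serve as weighted edges of a simple profile-to-attribute graph. This is an illustrative sketch of one possible construction, not the system's actual Explainability Graph, and the attribute names are hypothetical.

```python
from collections import Counter


def attribute_sums(households, attributes):
    """Count households per profile that report each binary attribute."""
    sums = {}
    for h in households:
        counts = sums.setdefault(h["profile"], Counter())
        for a in attributes:
            counts[a] += 1 if h.get(a) else 0
    return sums


def graph_edges(sums, top=2):
    """Edges (profile, attribute, count) for each profile's most common attributes."""
    return [(p, a, c) for p, counts in sorted(sums.items())
            for a, c in counts.most_common(top) if c > 0]


households = [
    {"profile": "P24_10", "pool": True, "ducted_heating": True},
    {"profile": "P24_10", "pool": True},
    {"profile": "P24_11", "gas_cooking": True},
]
sums = attribute_sums(households, ["pool", "ducted_heating", "gas_cooking"])
print(graph_edges(sums, top=1))
```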
The results of the experiment were positive: we successfully demonstrated the effectiveness of the proposed cloud-based platform architecture in hosting and applying Self-Structuring AI (SSAI) to streams of energy consumption data and in presenting an explainable narrative to validate the model. The experiment also demonstrated the practicality of applying IPCL using the platform on energy consumption data. With the results generated by the system, energy providers and regulators can correlate groupings of attributes with the demographic profiles of people groups and attribute energy consumption patterns and load characteristics to those profiles. This will make it possible to propose the most appropriate energy plans, maximising the value and returns offered. Owing to the flexible nature of the platform and its ongoing learning, these results can be obtained incrementally and in near real time. This has far-reaching consequences for Smart Grid applications, including efficient load distribution, providing benefits for consumers and power generators.
4.1 Addressing the challenges of Big Data at the Edge
The proposed platform addresses several of the challenges faced when working at the edge of the network. The first is the limited availability of general compute resources at the edge: in practice, end nodes are often not capable of handling analytical workloads. The platform’s ability to serve a minimal latent representation of the model that can be executed at the edge, and that returns a reduced data footprint for further processing in the cloud, alleviates this issue. This also tackles the further issue of the large data load that would otherwise have to be passed on to the cloud. The final issue addressed is the limited energy budget available at the edge. The platform addresses this in its Process layer, which can handle data at various velocities; coupled with the models that can run at the edge, this means that edge devices can run on less demanding schedules, stretching the energy budget further. The inherent architecture of the platform also lends itself to addressing challenges such as additional security and service discovery through the Store and Process layers; however, these areas have been relegated to future work.
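As a sketch of how a latent model could reduce the data footprint at the edge, the function below assumes the served model is a list of prototype vectors (for example, nodes learned by IPCL) and replaces each 48-value daily vector with the index of its nearest prototype and the distance to it, so two numbers travel upstream instead of 48. The function name and this encoding are assumptions.

```python
import math


def encode_on_edge(daily_vector, prototypes):
    """Edge-side step: replace a 48-value day with (prototype index, distance).

    `prototypes` is the compact latent model served from the cloud
    (hypothetical encoding).
    """
    dists = [math.dist(daily_vector, p) for p in prototypes]
    best = min(range(len(prototypes)), key=dists.__getitem__)
    return best, dists[best]


prototypes = [[0.0] * 48, [1.0] * 48]
print(encode_on_edge([0.9] * 48, prototypes))
```

The returned distance can also act as a cheap novelty signal: an unusually large value suggests a day the served model does not represent well, which could be forwarded in full for learning in the cloud.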
5 Conclusion
We proposed a cloud-based architecture for a platform that can encapsulate and execute a Self-Structuring AI algorithm. We demonstrated the viability of the platform by applying the IPCL algorithm to the SGSC household energy consumption data. The results showed distinct energy usage pathways for distinct demographic household profiles, including peaks and troughs in daily usage patterns, which further attests to the performance of the model within the platform. Further, the system demonstrated meaningful visualisation and a mechanism for explainability, as validated in the results section.
Future work will involve correlating the pathways with temperature and other relevant data streams using the data fusion capabilities of IPCL. The platform will also be integrated with further Self-Structuring AI models, such as the DGSOM [12], to approach different use cases such as traffic and congestion. Further investigation will be undertaken on improving the visualisation and explainability provisions of the platform. Further extensions to the platform will be considered in terms of distributing its processing from the cloud, through the Edge, down to the IoT devices, providing end-to-end governance of the data stream and exploring how processing can be distributed and insights provided at varying levels at each stage of the platform.
Data availability
The Smart-Grid Smart-City Customer Trial Data is available from the Australian open government data repository: https://data.gov.au/dataset/ds-dga-4e21dea3-9b87-4610-94c7-15a8a77907ef/details.
References
Hoffmann J, Borgeaud S, Mensch A, Buchatskaya E, Cai T, Rutherford E, Casas DdL, Hendricks LA, Welbl J, Clark A, et al. Training compute-optimal large language models. 2022. arXiv preprint arXiv:2203.15556
Biderman S, Schoelkopf H, Anthony QG, Bradley H, O’Brien K, Hallahan E, Khan MA, Purohit S, Prashanth US, Raff E et al. Pythia: a suite for analyzing large language models across training and scaling. In: International conference on machine learning. PMLR. 2023. p. 2397–430.
De Silva D, Burstein F, Jelinek HF, Stranieri A, et al. Addressing the complexities of Big Data analytics in healthcare: the diabetes screening case. Aust J Inf Syst. 2015. https://doi.org/10.3127/ajis.v19i0.1183.
Nawaratne R, Bandaragoda T, Adikari A et al.: Incremental knowledge acquisition and self-learning for autonomous video surveillance. In: IECON 2017-43rd annual conference of the IEEE industrial electronics society. IEEE. 2017. p. 4790–5.
Hashem IAT, Yaqoob I, Anuar NB, Mokhtar S, Gani A, Khan SU. The rise of “Big Data” on cloud computing: review and open research issues. Inf Syst. 2015;47:98–115. https://doi.org/10.1016/j.is.2014.07.006.
Ji C, Li Y, Qiu W, Awada U, Li K. Big Data processing in cloud computing environments. In: 2012 12th international symposium on pervasive systems, algorithms and networks. 2012. p. 17–23.
Chen M, Mao S, Liu Y. Big Data: a survey. Mobile Netw Appl. 2014;19(2):171–209.
Al-Jarrah OY, Yoo PD, Muhaidat S, Karagiannidis GK, Taha K. Efficient machine learning for Big Data: a review. Big data, analytics, and high-performance computing. Big Data Res. 2015;2(3):87–93. https://doi.org/10.1016/j.bdr.2015.04.001.
Katal A, Wazid M, Goudar RH. Big Data: Issues, challenges, tools and good practices. In: 2013 Sixth international conference on contemporary computing (IC3). 2013. p. 404–9.
Evans D. The internet of things: how the next evolution of the internet is changing everything. CISCO White Paper. 2011;1(2011):1–11.
O’Leary DE. Artificial intelligence and Big Data. IEEE Intell Syst. 2013;28(2):96–9.
Mills N, de Silva D, Alahakoon D. Generating situational awareness of pedestrian and vehicular movement in urban areas using IOT data streams. IEEE Intern Things J. 2020;7(5):4395–402.
Nallaperuma D, De Silva D, Alahakoon D, Yu X. Intelligent detection of driver behavior changes for effective coordination between autonomous and human driven vehicles. In: IECON 2018-44th annual conference of the IEEE industrial electronics society. IEEE. 2018. p. 3120–5.
Nawaratne R, Alahakoon D, De Silva D, et al. Hierarchical two-stream growing self-organizing maps with transience for human activity recognition. IEEE Trans Ind Inf. 2019;16(12):7756–64.
Chamishka S, Madhavi I, Nawaratne R, et al. A voice-based real-time emotion detection technique using recurrent neural network empowered feature modelling. Multimedia Tools Appl. 2022;81(24):35173–94.
Arrieta AB, Díaz-Rodríguez N, Ser JD, Bennetot A, Tabik S, Barbado A, Garcia S, Gil-Lopez S, Molina D, Benjamins R, Chatila R, Herrera F. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fus. 2020;58:82–115. https://doi.org/10.1016/j.inffus.2019.12.012.
Yu H, Shen Z, Miao C, Leung C, Lesser VR, Yang Q. Building ethics into artificial intelligence. arXiv preprint arXiv:1812.02953. 2018.
Castelvecchi D. Can we open the black box of AI? Nat News. 2016;538(7623):20.
Lipton ZC. The mythos of model interpretability. Queue. 2018;16(3):31–57.
Došilović FK, Brčić M, Hlupić N. Explainable artificial intelligence: A survey. In: 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). IEEE. 2018. p. 0210–5.
Plageras AP, Psannis KE, Stergiou C, Wang H, Gupta BB. Efficient IoT-based sensor Big Data collection-processing and analysis in smart buildings. Future Gen Comput Syst. 2018;82:349–57. https://doi.org/10.1016/j.future.2017.09.082.
Mavromoustakis CX, Batalla JM, Mastorakis G, Markakis E, Pallis E. Socially oriented edge computing for energy awareness in IoT architectures. IEEE Commun Mag. 2018;56(7):139–45.
Gamage G, Kahawala S, Mills N, De Silva D, Manic M, Alahakoon D, Jennings A. Augmenting industrial chatbots in energy systems using ChatGPT generative AI. In: 2023 IEEE 32nd international symposium on industrial electronics (ISIE). IEEE. 2023. p. 1–6.
De Silva D, Yu X, Alahakoon D, Holmes G. Semi-supervised classification of characterized patterns for demand forecasting using smart electricity meters. In: 2011 international conference on electrical machines and systems. IEEE. 2011. p. 1–6.
Hartmann M, Hashmi US, Imran A. Edge computing in smart health care systems: review, challenges, and research directions. Trans Emerg Telecommun Technol. 2022;33(3):3710.
Adikari A, De Silva D, Ranasinghe WK, Bandaragoda T, Alahakoon O, Persad R, Lawrentschuk N, Alahakoon D, Bolton D. Can online support groups address psychological morbidity of cancer patients? an artificial intelligence based investigation of prostate cancer trajectories. PLoS ONE. 2020;15(3):0229361.
De Silva D, Alahakoon D. An artificial intelligence life cycle: from conception to production. Patterns. 2022;3(6):100489.
Kiran M, Murphy P, Monga I, Dugan J, Baveja SS. Lambda architecture for cost-effective batch and speed Big Data processing. In: 2015 IEEE international conference on Big Data (Big Data). 2015. p. 2785–92. https://doi.org/10.1109/BigData.2015.7364082.
Kreps J, Narkhede N, Rao J, et al. Kafka: a distributed messaging system for log processing. In: Proceedings of the NetDB, vol. 11. 2011. p. 1–7.
Maarala AI, Rautiainen M, Salmi M, Pirttikangas S, Riekki J. Low latency analytics for streaming traffic data with Apache Spark. In: 2015 IEEE international conference on Big Data (Big Data). 2015. p. 2855–8.
Iqbal MH, Soomro TR. Big Data analysis: apache storm perspective. Int J Comput Trends Technol. 2015;19(1):9–14.
Noghabi SA, Paramasivam K, Pan Y, Ramesh N, Bringhurst J, Gupta I, Campbell RH. Samza: stateful scalable stream processing at linkedin. Proc VLDB Endow. 2017;10(12):1634–45.
Yang F, Tschetter E, Léauté X, Ray N, Merlino G, Ganguli D. Druid: A real-time analytical data store. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data. 2014. p. 157–68.
Osipov E, Kahawala S, Haputhanthri D, Kempitiya T, De Silva D, Alahakoon D, Kleyko D. Hyperseed: unsupervised learning with vector symbolic architectures. IEEE Trans Neural Netw Learn Syst. 2022.
Ribeiro MT, Singh S, Guestrin C. “Why should I trust you?”: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, San Francisco, CA, USA, August 13–17, 2016. 2016. p. 1135–44.
Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, editors. Advances in neural information processing systems 30. Curran Associates, Inc., 2017. p. 4765–74.
Zhou Z, Chen X, Li E, Zeng L, Luo K, Zhang J. Edge intelligence: paving the last mile of artificial intelligence with edge computing. Proc IEEE. 2019;107(8):1738–62.
Amazon Web Services. AWS Well-Architected Framework: data analytics lens. Amazon Web Services; 2019.
Motlagh O, Foliente G, Grozev G. Knowledge-mining the Australian smart grid smart city data: a statistical-neural approach to demand-response analysis. London: Springer; 2015. p. 189–207.
Funding
This research was partially funded by the La Trobe University Net Zero Program and the Australian Government's International Collaboration Networks Grant for Renewable and EV Grid Integration (ICN4CEEV).
Author information
Contributions
N.M, A.M, Z.I, T.B: conceptualization, methodology, software, validation. N.M, T.B, D.D, M.M, A.J: data curation, formal analysis, writing (original draft preparation). N.M, A.M, Z.I: visualization, investigation. D.D, M.M, A.J: supervision, writing (reviewing and editing), project administration. All authors contributed to the study. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mills, N., Issadeen, Z., Matharaarachchi, A. et al. A cloud-based architecture for explainable Big Data analytics using self-structuring Artificial Intelligence. Discov Artif Intell 4, 33 (2024). https://doi.org/10.1007/s44163-024-00123-6