Predictive intelligence to the edge: impact on edge analytics
Abstract
We build on the edge computing paradigm, where pushing processing and inference to the edge of the Internet of Things (IoT) allows the complexity of predictive analytics to be distributed into smaller pieces physically located at the source of the contextual information. This enables a huge amount of rich contextual data to be processed in real time, which would be prohibitively complex and costly to deliver on a traditional centralized Cloud. We propose a lightweight, distributed, predictive intelligence mechanism that supports communication-efficient aggregation and predictive modeling within the edge network. Our idea is based on the capability of the edge nodes to (1) monitor the evolution of the sensed time series contextual data, (2) locally determine (through prediction) whether to disseminate contextual data in the edge network or not, and (3) locally reconstruct undelivered contextual data, with the aim of minimizing the required communication at some expense to the accuracy of the analytics tasks. Based on this online decision making, we avoid unnecessary data transfer at the edge of the network, thus saving network resources by exploiting the evolving nature of the captured contextual data. We provide a comprehensive analytical, experimental and comparative evaluation of the proposed mechanism against other mechanisms found in the literature over real contextual datasets and show the benefits stemming from its adoption in edge computing environments.
Keywords
Edge analytics, Predictive intelligence, Evolving data streams, Communication efficiency, Context prediction, Exponential smoothing
1 Introduction
Edge analytics (Satyanarayanan et al. 2015) is an approach to efficient contextual data analysis in which computation is performed on sensing devices (sensors, actuators, controllers), network switches or other devices (concentrators) instead of transmitting all the data to a centralized computing environment/Cloud. Sending all the data from billions of IoT devices to the cloud can overwhelm the existing infrastructure. To overcome this issue, Edge Computing (EC) (The mobile-edge computing initiative 2016; Stojmenovic and Wen 2014) is emerging, bringing contextual data processing, networking, and analytics closer to the IoT devices and applications. EC represents a shift in which intelligence is pushed from the cloud to the edge, localizing certain kinds of analysis, e.g., aggregation operators over data streams, regression analyses, information inference and reasoning, and local decision-making (Yi et al. 2015). This enables quicker response times, unencumbered by network latency, as well as reduced traffic, by intelligently processing and relaying the appropriate analyzed data. Pushing analytics algorithms to IoT devices alleviates the processing strain on enterprise data management as the number of connected devices and the amount of data generated and collected increase (Vulimiri et al. 2015; Cheng et al. 2016).
1.1 Motivation
In IoT environments, contextual information sources are considered as continuous/evolving data streams (multivariate time series), to which analytics tasks are applied to extract statistical dependencies, aggregate analytics, and infer new knowledge. Context-aware applications, crowdsensing applications (Ganti et al. 2011; Lane et al. 2010), environmental monitoring (Oliveira and Rodrigues 2011), forest monitoring (Awang and Suhaimi 2007; Zervas et al. 2011; Kang et al. 2013; Anagnostopoulos et al. 2016) (through unmanned vehicles), agriculture monitoring (Nittel 2009), road traffic monitoring, surveillance, video analytics (Simoens et al. 2013), marine environment monitoring (Xu et al. 2014), and watershed monitoring systems (Eidson et al. 2009; Nguyen et al. 2010) over large-scale data streams require efficient, accurate and timely data analysis in order to facilitate (near) real-time decision-making, data stream mining, and situational context awareness (Kolomvatsos et al. 2016).
We abstract an edge network architecture through edge nodes forming a layer between sensing/actuator nodes and the cloud. Several Sensing and Actuator Nodes (SANs) are connected to each Edge Node (EN), e.g., cloudlet, sink node. Since ENs are located close to the SANs, contextual data should be intelligently transferred to them in real time and in an energy-efficient manner. Each SAN performs measurements and locally determines whether to transfer these measurements to the ENs or not, with the aim of minimizing the required communication (overhead) at some expense to the accuracy of the analytics tasks performed on the ENs. Our idea therefore rests on locally predicting whether to disseminate sensed data within an edge network, so as to achieve quality analytics while remaining communication efficient, by exploiting the evolving nature of the captured contextual data and its reconstruction. However, this comes at the expense of the quality of analytics tasks. The fundamental requirements to materialize such predictive intelligence at the edge network are: (1) the autonomous nature of SANs to locally perform sensing and disseminate data under analytics quality-driven rules and (2) the capability of the ENs to locally perform lightweight data reconstruction and robust analytics tasks.
1.2 Literature review

push the analytics tasks close to the contextual data sources, i.e., to the ENs, which have to follow the evolution of the contextual data streams;

push intelligence to SANs and ENs to collaboratively support edge analytics. ENs have to intelligently communicate with the SANs in an energy-efficient way, since communication efficiency is crucial to the prolonged lifetime of the edge network to support edge analytics.

Distributed analytics This methodology is based on the observation that the SANs and ENs create the possibility of analyzing and building (training) predictive analytics models in a distributed way. In this class of edge analytics, e.g., Simonetto and Leus (2014), Kejela et al. (2014) and Gemulla et al. (2011), contextual data and/or model’s metadata are circulated within the edge network, which evidently requires energy for data and metadata dissemination adding extra communication overhead.

Group-based centralized analytics This methodology refers to a group-based communication and single localized computation/processing scheme, e.g., Anagnostopoulos et al. (2012, 2014, 2016); Anagnostopoulos and Hadjiefthymiades (2014); McConnell and Skillicorn (2005); Papithasri and Babu (2016); Manjeshwar and Agrawal (2001). Specifically, an EN is responsible for a group of SANs and maintains a set of historical contextual data of each SAN within the group. Such a localized method is communication efficient due to the reduced length of the routing path from SANs to the cloud. To support this type of edge analytics, energy is consumed on communication, i.e., sending and receiving data between SANs and the EN, and on computation, i.e., ENs processing local data. However, since the cost of local processing and analytics tasks is non-trivial, we should take into account the trade-off between intra-edge-network communication and localized computation (Jiang et al. 2011).
In this work, we depart from the mechanism of selective data delivery (Jiang et al. 2011) and provide a generalization of this mechanism to be adopted in EC environments with the aim of supporting communication-efficient edge analytics. Our generalized mechanism relies on the principle of bounded-loss approximation at the ENs through context prediction at the SANs. SANs locally decide on delivering contextual data to ENs based on local predictions, while ENs locally reconstruct the contextual data given an approximation (reconstruction) error bound. This error bound is controlled by the SANs.
Evidently, there is a trade-off, to which we should pay attention, between contextual data communication and the accuracy of analytics due to approximation/reconstruction. On the one hand, selectively transmitting contextual data increases the edge network lifetime and the available bandwidth, since less data is circulated. On the other hand, this comes at the expense of the quality of the predictive analytics tasks, due to local data reconstruction at the ENs.
1.3 Research objectives and contribution
The research objectives of our generalized mechanism are: (1) ENs employ a mechanism to reconstruct the undelivered contextual data by following the evolving nature of the data streams; (2) SANs are equipped with real-time context prediction (time series forecast) and data delivery decision. To secure an upper bound on the reconstruction error at ENs (which plays a significant role in the quality of the aggregation and predictive analytics tasks), SANs control their local prediction/forecast error based on the principle of selective data delivery. This is achieved by splitting this predictive intelligence between SANs and ENs: the former nodes locally predict the expected data and locally decide on their delivery given a prediction error bound; the latter nodes locally reconstruct the undelivered data given a reconstruction error bound controlled by the SANs. The mechanism is applied when SANs need to communicate with ENs and when ENs need to reconstruct the undelivered data before proceeding with the scheduled analytics task. Provided that the IoT applications tolerate a certain loss of quality in the derived analytics tasks, e.g., in prediction accuracy, model fitting approximation error and misclassification error, our mechanism is shown to be communication efficient in Sect. 5.

We present a distributed, communication efficient predictive intelligence mechanism for local prediction and local reconstruction within an edge network;

We provide the theoretical prediction and reconstruction error boundaries and their relationship;

We provide a comprehensive sensitivity analysis of the basic parameters of our model and showcase the tradeoff between accuracy (quality) of edge analytics (focusing on aggregation and multivariate linear regression) with communication overhead;

We provide a comparative theoretical and experimental assessment with the selective data delivery mechanism (Manjeshwar and Agrawal 2001) where the EN neighborhood’s formation is adopted from Papithasri and Babu (2016) in light of reconstruction, aggregation, data prediction errors and communication overhead.

We provide a computational complexity analysis of our mechanism, which is highly computationally efficient with O(d) prediction and reconstruction time over d-dimensional contextual data streams, and its relationship to the Autoregressive Integrated Moving Average (ARIMA) model (Muth 1960).

We experiment with real contextual data from sensors and actuators networks.
1.4 Organization
The paper is organized as follows: in Sect. 2 we present our rationale and the basic concept of the edge predictive intelligence, formulated through certain definitions, preliminaries, and the fundamental metrics for evaluating our mechanism. Section 3 reports on the predictive intelligence split between the SAN and EN perspectives, elaborating on certain policies for data delivery and reconstruction. In Sect. 4 we provide a theoretical analysis of the prediction and reconstruction error boundaries, the computational complexity of our mechanism and its relation to the linear forecasting Autoregressive Integrated Moving Average (ARIMA) model (Muth 1960). In Sect. 5 we provide a sensitivity analysis of our mechanism with respect to the basic model parameters and a theoretical and experimental comparative assessment with the selective data delivery mechanism (Manjeshwar and Agrawal 2001), and showcase the performance of the proposed mechanism with two real contextual datasets. Finally, Sect. 6 concludes the paper with a future research agenda on edge analytics.
2 Edge predictive intelligence
2.1 Rationale

Case 1 If the predicted \(\hat{\mathbf {x}}_{t}\) differs from the actual sensed \(\mathbf {x}_{t}\) with respect to a decision threshold \(\theta \in [0,1],\) i.e., \(e_{t}> \theta \), then the SAN i sends the actual \(\mathbf {x}_{t}\) to the EN j.

Case 2 Otherwise, i.e., \(e_{t} \le \theta \), the SAN i does not send \(\mathbf {x}_{t}\) to the EN j. In this case, the EN j is responsible for reconstructing a context vector locally for further processing.
Given a decision threshold \(\theta \in (0,1)\) at SAN i, we study the performance of certain predictive analytics tasks on EN j. We qualitatively derive sufficient conditions for this and reveal that the decision is a function of both the desired error bound and the correlation among the sensed contextual data values. When the decision threshold is very tight or the correlation is not significant, the SAN i always has to send its context vectors to the EN j. Due to the characteristics and inherent dynamics of the SANs’ contextual data, when the underlying data stream distribution evolves over time, prediction/forecasting techniques may not work efficiently for a set of less predictable contextual data. Moreover, there might be correlations among contextual data from neighboring SANs (data locality in \(\mathcal {N}_{j}\)); thus, the EN j is capable of learning those statistical correlations in a communication-efficient way, as will be shown later. We provide certain definitions and preliminaries before elaborating on our distributed intelligence mechanism.
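The two cases above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `prediction_error` and `should_send` are hypothetical, and the error \(e_{t}\) is assumed here to be the Euclidean distance between \(\mathbf{x}_{t}\) and \(\hat{\mathbf{x}}_{t}\) divided by \(\sqrt{d}\), so that it lies in [0,1] for vectors in the unit cube. The paper's own definition of \(e_{t}\) should be substituted where it differs.

```python
import math

def prediction_error(x, x_hat):
    """Assumed error: Euclidean distance scaled to [0, 1] for vectors in [0, 1]^d."""
    d = len(x)
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
    return dist / math.sqrt(d)

def should_send(x, x_hat, theta):
    """Case 1 (send x_t) when e_t > theta; Case 2 (suppress) otherwise."""
    return prediction_error(x, x_hat) > theta

x_hat_t = [0.52, 0.21]  # hypothetical local prediction at SAN i
send_small_drift = should_send([0.50, 0.20], x_hat_t, theta=0.05)
send_large_drift = should_send([0.90, 0.20], x_hat_t, theta=0.05)
```

In this toy run the first measurement stays within the threshold and is suppressed, while the second deviates strongly from the prediction and is delivered.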
2.2 Definitions and preliminaries
Definition 1
(Sliding window) A sliding window \(\mathcal {W}\) is specified by a fixed-size temporal extent \(N>0\) (‘horizon’); new context vectors are appended and older ones are discarded in order of their appearance.
For instance, at time t, a sliding window \(\mathcal {W}\) is a sequence of all context vectors observed from \(t-N\) to \(t-1\), i.e., \(\mathcal {W} = (\mathbf {x}_{t-N}, \mathbf {x}_{t-N+1}, \ldots , \mathbf {x}_{t-1})\). As an example, an analytics query over \(\mathcal {W}\) could be: ‘continuously return all context vectors of the past hour, i.e., N=60 min’. The sliding window is the most widely used construct in continuous aggregation and fusion analytics functions (Dallachiesa et al. 2015; Patroumpas and Sellis 2011; Abadi et al. 2003, 2005).
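As an illustration of Definition 1, a fixed-size window can be kept with a bounded double-ended queue. The sketch below uses Python's standard `collections.deque`; the window size and the stream values are made up for the example.

```python
from collections import deque

N = 3                        # window horizon (illustrative value)
window = deque(maxlen=N)     # appending to a full deque drops the oldest item

stream = [[0.1], [0.2], [0.3], [0.4]]   # hypothetical 1-dimensional context vectors
for x_t in stream:
    window.append(x_t)

contents = list(window)      # the N most recent context vectors, oldest first
```

After the fourth vector arrives, the oldest one has been discarded and the window holds the last three.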
The aggregation analytics tasks are evaluated over the contents of a window \(\mathcal {W}\). The aggregated results change over time as the window slides. We use the classification from Gray and Chaudhuri (1997) that divides aggregation functions into three categories: distributive, algebraic, and holistic. Let \(\mathcal {W}\), \(\mathcal {W}_{1}\), and \(\mathcal {W}_{2}\) be windows. An aggregation analytics function \(h: \mathcal {W} \rightarrow \mathbb {R}^{d}\) is distributive if \(h(\mathcal {W}_{1} \cup \mathcal {W}_{2})\) can be computed from \(h(\mathcal {W}_{1})\) and \(h(\mathcal {W}_{2})\) for all \(\mathcal {W}_{1}\), \(\mathcal {W}_{2}.\) An aggregation analytics function h is algebraic if there exists a ‘synopsis function’ \(\sigma \) such that for all \(\mathcal {W}\), \(\mathcal {W}_{1}\), and \(\mathcal {W}_{2}\): (1) \(h(\mathcal {W})\) can be computed from \(\sigma (\mathcal {W})\); (2) \(\sigma (\mathcal {W})\) can be stored in constant memory; and (3) \(\sigma (\mathcal {W}_{1} \cup \mathcal {W}_{2})\) can be computed from \(\sigma (\mathcal {W}_{1})\) and \(\sigma (\mathcal {W}_{2})\). An aggregation analytics function h is holistic if it is not algebraic. Among the standard aggregates, MAX and MIN are distributive; AVG is algebraic, since it can be computed from a synopsis containing SUM and COUNT; and QUANTILE and MEDIAN are holistic.
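The distinction can be made concrete with a small sketch. Here AVG over the union of two disjoint windows is obtained from their (SUM, COUNT) synopses, and MAX is obtained directly from the two partial results. The helper names are hypothetical and scalar values are used for brevity.

```python
def synopsis(window):
    """Constant-size synopsis (SUM, COUNT) of a window of scalar values."""
    return (sum(window), len(window))

def merge(s1, s2):
    """Synopsis of the union of two disjoint windows."""
    return (s1[0] + s2[0], s1[1] + s2[1])

def avg_from_synopsis(s):
    return s[0] / s[1]

w1, w2 = [1.0, 2.0, 3.0], [4.0, 5.0]

# AVG is algebraic: computed from the merged synopses, not from the raw data.
merged_avg = avg_from_synopsis(merge(synopsis(w1), synopsis(w2)))
direct_avg = sum(w1 + w2) / len(w1 + w2)

# MAX is distributive: computed from the two partial maxima.
merged_max = max(max(w1), max(w2))
```

A holistic function such as MEDIAN admits no such constant-size synopsis, which is why it needs access to the window contents.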
Example 1
We can define the AVG and MAX analytics functions over the N context vectors in \(\mathcal {W}\): \(h^{avg}(\mathcal {W}) = \frac{1}{N}\sum _{k=t-N}^{t-1}\mathbf {x}_{k}\) and \(h^{max}(\mathcal {W}) = [\max \{x_{1k}\}, \ldots , \max \{x_{dk}\}]_{k=t-N}^{t-1}\), respectively.
In our case, the aggregation analytics function h is running on EN j for each sliding window \(\mathcal {W}\) containing M received and/or reconstructed context vectors from the SAN \(i \in \mathcal {N}_{j}\) depending on Case 1 and Case 2. Note that such functions are builtin constructs in IoTapplication specific continuous analytics queries.
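A per-dimension implementation of the AVG and MAX functions of Example 1 might look as follows. This is a plain-Python sketch over a window stored as a list of d-dimensional vectors; the function names mirror \(h^{avg}\) and \(h^{max}\), and the data are invented.

```python
def h_avg(window):
    """Component-wise mean of the context vectors in the window."""
    n, d = len(window), len(window[0])
    return [sum(x[k] for x in window) / n for k in range(d)]

def h_max(window):
    """Component-wise maximum of the context vectors in the window."""
    d = len(window[0])
    return [max(x[k] for x in window) for k in range(d)]

# Hypothetical window of three 2-dimensional vectors (e.g., temperature, humidity).
window = [[0.2, 0.6], [0.4, 0.5], [0.6, 0.7]]
avg_vec = h_avg(window)
max_vec = h_max(window)
```

On the EN, the same functions would be applied to a window that mixes delivered and reconstructed vectors.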
Example 2
The aggregation analytics query ‘every minute find the average temperature and the maximum humidity over context streams ‘temperature’ and ‘humidity’ collected during the past hour’ in Continuous Query Language (Arasu et al. 2006) involving AVG and MAX operators in a sliding window \(\mathcal {W}, N=60\,{\text {min}}\) can be expressed as follows:
SELECT AVG(temperature), MAX(humidity) FROM Context Streams [RANGE 60 MINUTES SLIDE 1 MINUTE]
Note that typical progressive aggregates like SUM, MIN and AVG require constant time O(1) per value, since there is no need to scan the entire window (Patroumpas and Sellis 2006, 2010). However, more advanced aggregation analytics functions like outlier detection or concept drift detection in a sliding window \(\mathcal {W}\) require multiple scans of \(\mathcal {W}\). Aggregation analytics functions can also be combined on an EN to infer certain events that might trigger decision making.
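The constant-time claim for SUM and AVG can be illustrated by keeping a running total that is adjusted when a value enters or leaves the window. The class below is a hypothetical sketch covering SUM and AVG only; a constant-time sliding MIN needs an additional structure and is not shown.

```python
from collections import deque

class RunningAverage:
    """Sliding-window AVG with O(1) work per incoming value."""

    def __init__(self, n):
        self.n = n
        self.window = deque()
        self.total = 0.0

    def push(self, value):
        self.window.append(value)
        self.total += value
        if len(self.window) > self.n:
            # Subtract the expired value instead of rescanning the window.
            self.total -= self.window.popleft()
        return self.total / len(self.window)

avg = RunningAverage(n=3)
outputs = [avg.push(v) for v in [1.0, 2.0, 3.0, 4.0]]
```

Each call touches only the newest and the expired value, independently of the window size.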
Example 3
Consider the evaluation of a situational context (localized event stream processing) for the past 10 min as the activation of the following rule with conjunctive predicates associated with AVG and MAX aggregation analytics functions over ‘temperature’ and ‘windspeed’ sliding windows from two corresponding SANs:
EVENT := IF AVG(temperature) \(\ge \) 90 AND MAX(windspeed) \(\in \) [10,20] WITHIN 10 minutes THEN ACTION is ‘warning’
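Evaluated over the two 10-minute windows, the rule amounts to a conjunction of two predicates. The sketch below uses a hypothetical function name and made-up readings.

```python
def warning_event(temperature_window, windspeed_window):
    """True when AVG(temperature) >= 90 and MAX(windspeed) lies in [10, 20]."""
    avg_temp = sum(temperature_window) / len(temperature_window)
    max_wind = max(windspeed_window)
    return avg_temp >= 90 and 10 <= max_wind <= 20

fires = warning_event([88, 92, 93], [5, 12, 15])    # both predicates hold
silent = warning_event([88, 92, 93], [5, 12, 25])   # wind speed out of range
```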
Definition 2
The aggregation analytics difference \(\beta _{i}\) denotes how much the aggregation results over the window \(\mathcal {W}\) on EN j with context vectors \(\mathbf {u}\) differ from the aggregation results over the window \(\mathcal {W}^{*}\) with context vectors \(\mathbf {x}\), i.e., the window that would be obtained had SAN i sent all context vectors to EN j. Obviously, if we encounter only Case 1, then \(\beta _{i} = 0, \forall i \in \mathcal {N}_{j}\). Since we allow SAN i to decide on sending context vectors w.r.t. \(\theta \) and EN j to reconstruct undelivered context vectors, in general \(\beta _{i} \ge 0\). The question is how much of this difference in analytics results an IoT application tolerates in exchange for communication efficiency in the edge network.
Example 4
Consider SAN i with context vector \(\mathbf {x} = [x_{1}, x_{2}, x_{3}]\) referring to the contextual parameters humidity, wind speed and temperature. The corresponding EN j is responsible for learning the statistical dependency \(\mathbf {w}_{i}\) between temperature (dependent variable \(y^{out} = x_{3})\) with humidity and wind speed (independent variables \(\mathbf {x}^{in}=[x_{1},x_{2}])\).
Moreover, the regression analytics task on EN j can be applied to context vectors coming from different SANs.
Example 5
Consider the SAN i and SAN \(\ell \) with \(i,\ell \in \mathcal {N}_{j}\) sensing context vectors \(\mathbf {x}_{i} = [x_{i1},x_{i2}]\) and \(\mathbf {x}_{\ell } = [x_{\ell 1}, x_{\ell 2}, x_{\ell 3}],\) respectively. The EN j is responsible, e.g., for learning the linear dependency \(y^{out} = x_{i2}\) and \(\mathbf {x}^{in} = [x_{\ell 1}, x_{\ell 2}]\) between the contextual parameters from those SANs in \(\mathcal {N}_{j}\).
Definition 3
The RMSE \(\epsilon ^{*}_{i}\) is the linear regression error we obtain over the actual training pairs \((\mathbf {x}^{in},y^{out})\), where \(\mathbf {w}_{i}^{*}\) is the actual regression coefficient. However, since EN j may not receive the actual pairs all the time due to Case 2, the derived regression coefficient \(\mathbf {w}_{i}\) results in an RMSE \(\epsilon _{i} \ne \epsilon _{i}^{*}\). We aim to keep the difference \(\gamma _{i}\) low while remaining communication efficient in the edge network.
Statistical learning analytics with local computation, like the discussed linear regression analytics, are suited to the EC paradigm. The regression learning task should therefore be iterative in nature, processing a single training pair at a time. Since the computation is carried out on the EN, the training algorithm should be lightweight and robust. The optimization algorithms suitable for these cases are based on the method of online learning (Bottou 2010), and Stochastic Gradient Descent (SGD) is the most prominent among them. In this context, the EN j incrementally updates the coefficient vector \(\mathbf {w}_{i,t}\) at time instance t by moving a small step size (learning rate) \(\eta \in (0,1)\) along the negative gradient of the minimization function in Eq. (6), as shown in Algorithm 1. The training algorithm converges when there is no significant improvement of the \(\mathbf {w}_{i,t}\) coefficient, i.e., when \(\Vert \Delta \mathbf {w}_{i,t}\Vert \le \delta \), given a convergence threshold \(\delta> 0\).
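A minimal sketch of such an online update is given below. It is not Algorithm 1 itself: it assumes a squared-error loss for a single pair, uses hypothetical function names, and checks convergence once per pass over a small set of pairs rather than per time instance. The training pairs are synthetic and noise-free.

```python
def sgd_step(w, x_in, y_out, eta):
    """One SGD update for the loss 0.5 * (y_out - w . x_in)^2."""
    y_pred = sum(wk * xk for wk, xk in zip(w, x_in))
    err = y_out - y_pred
    # Step along the negative gradient, which is -err * x_in.
    return [wk + eta * err * xk for wk, xk in zip(w, x_in)]

def train(pairs, d, eta=0.1, delta=1e-6, max_epochs=1000):
    """Process one pair at a time; stop when w changes by at most delta in a pass."""
    w = [0.0] * d
    for _ in range(max_epochs):
        w_prev = list(w)
        for x_in, y_out in pairs:
            w = sgd_step(w, x_in, y_out, eta)
        change = sum((a - b) ** 2 for a, b in zip(w, w_prev)) ** 0.5
        if change <= delta:
            break
    return w

# Synthetic pairs generated from y = 2 * x1 - 1 * x2 (assumed for illustration).
pairs = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
w = train(pairs, d=2)
```

Each update costs O(d) time and memory, which is what makes this style of training attractive on an EN.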
Hence, given a decision threshold \(\theta>0\), our aim is to examine the impact of our predictive intelligence mechanism on (1) the reconstruction difference \(a\), (2) the aggregation analytics difference \(\beta \), and (3) the regression analytics difference \(\gamma \) in light of communication efficiency by saving significant network bandwidth.
3 Distributed predictive intelligence
The intelligence of the proposed mechanism is split into two parts: (1) the SAN’s intelligence with respect to the local prediction algorithm \(f_{i}\) following the evolving nature of the data streams and (2) the EN’s intelligence with respect to the local reconstruction algorithm \(g_{j}\) that supports the analytics tasks introduced in Sect. 2.2.
3.1 Sensor–actuator node intelligence
Consider a SAN i and let us elaborate on the first part. Very complex prediction models are not practical in the discussed EC paradigm due to the limited (energy-constrained) computational capacity of the SANs. Fortunately, simple linear predictors are sufficient to capture the temporal correlation of realistic contextual data, as shown by previous studies (Chu et al. 2006; Chowdappa et al. 2015; Anagnostopoulos et al. 2010). A sliding window-based linear predictor is one of the popular approaches to predicting the future based on the past N measurements.
In this work, we seek to reduce the computational power needed for prediction and to use only a small fraction of the SAN’s computing power by adopting a predictive function with low complexity and computational effort. Multivariate exponential smoothing, used for time series forecasting, is well suited to our case, as its computational complexity is O(d) in a d-dimensional space (elaborated in Sect. 4). Simple exponential smoothing weighs the current sensed context vector \(\mathbf {x}_{t}\) against the historic context vectors (Durbin and Koopman 2012). This simple smoothing function is adopted as the prediction function \(f_{i}\) for the \(\theta \)-based decision making.^{1}
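For illustration, the standard simple exponential smoothing recursion can be written component-wise as below. The exact form of \(f_{i}\) is not reproduced in this section, so the textbook recursion \(\hat{\mathbf{x}}_{t+1} = \alpha \mathbf{x}_{t} + (1-\alpha )\hat{\mathbf{x}}_{t}\) is assumed; the function name and the data are hypothetical.

```python
def smooth(x_t, x_hat_t, alpha):
    """Assumed recursion: x_hat_{t+1} = alpha * x_t + (1 - alpha) * x_hat_t.

    One multiply-add per component, i.e. O(d) time and O(d) memory.
    """
    return [alpha * a + (1 - alpha) * b for a, b in zip(x_t, x_hat_t)]

stream = [[0.2, 0.8], [0.4, 0.8], [0.4, 0.6]]   # made-up 2-dimensional measurements
x_hat = stream[0]                               # initialise with the first vector
for x_t in stream[1:]:
    x_hat = smooth(x_t, x_hat, alpha=0.5)
```

Only the previous prediction has to be stored, so no window of past measurements is needed on the SAN for this predictor.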
3.2 Edge node intelligence
On the other side, the EN j at time instance t either receives \(\mathbf {x}_{t}\) (Case 1) or nothing (Case 2). In Case 1, the EN j simply inserts the delivered \(\mathbf {x}_{t}\) into its corresponding window \(\mathcal {W}\) (which is associated with the SAN \(i \in \mathcal {N}_{j}\)), discarding the oldest context vector, i.e., \(\mathbf {u}_{t} = \mathbf {x}_{t}\). In Case 2, the EN j encounters an undelivered vector problem, since there is nothing to push into the sliding window \(\mathcal {W}\). Such undelivered context vectors must be reconstructed from the context vectors \(\mathbf {u}\) currently residing in \(\mathcal {W}\) at EN j. To achieve this, we propose three reconstruction policies, i.e., variants of the reconstruction function \(g_{j}(\mathcal {W})\). We stress that we require a computationally efficient reconstruction function on EN j, so that it adds only a small overhead relative to the analytics tasks. Those policies are introduced below.
Policy 1 This policy, in Case 2, uses the most recent context vector from \(\mathcal {W}\) at EN j, i.e., the first element of the sliding window, as the reconstructed context vector. The reconstructed context vector is then inserted into \(\mathcal {W}\) and the oldest context vector in the window is discarded. Note that, after this insertion, there are two copies of the most recent context vector in the window. It may also happen that the entire window (of length N) contains the same context vector, if the SAN i has not sent a context vector in the last N time instances. This indicates that, during this recent history of N time instances, the maximum difference of the sequentially sensed context vectors measured on SAN i is less than \(\theta \). In this case, it is redundant to send similar context vectors to EN j given a threshold \(\theta \). In Case 1, the EN j simply inserts the delivered \(\mathbf {x}_{t}\) into the window and discards the oldest context vector. Policy 1 at EN j is provided by Algorithm 2.
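The behaviour of Policy 1 can be sketched as follows. This is an illustrative reading of the description above rather than a transcription of Algorithm 2; the function name is hypothetical, and the window is stored oldest-first, so the most recent vector is the last element of the deque.

```python
from collections import deque

def en_update_policy1(window, delivered=None):
    """Insert u_t into the window: x_t if delivered, else the most recent vector."""
    if delivered is not None:
        u_t = delivered       # Case 1: use the delivered context vector
    else:
        u_t = window[-1]      # Case 2: reuse the most recent vector in the window
    window.append(u_t)        # a full deque with maxlen discards the oldest item
    return u_t

window = deque([[0.1], [0.2], [0.3]], maxlen=3)
en_update_policy1(window, delivered=[0.4])   # SAN i sent x_t
en_update_policy1(window)                    # SAN i stayed silent
final_window = list(window)
```

After the silent step the window holds two copies of the latest delivered vector, as noted in the description of the policy.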
4 Theoretical analyses
4.1 Prediction error and reconstruction difference analysis
In this section, we analyze the relation of the local prediction error \(e_{t}\) at SAN i and its corresponding reconstruction difference \(a_{t}\) at EN j provided by a reconstruction algorithm (policy) \(g_{j}(\mathcal {W})\). The aim of this analysis is to demonstrate the evolving feature of our distributed mechanism to follow the contextual data streams on the SANs and then to decide on their delivery or not to their corresponding ENs. Specifically, we analyze the evolution of the local conditional expectation of the prediction error \(\mathbb {E}[e_{t} \mid e_{t} \le \theta ]\) conditioned on the event (Case 2): \(\{ e_{t} \le \theta \}\) and its relation with the expected reconstruction difference \(\mathbb {E}[a_{t}]\). The idea is that our mechanism ‘remotely’ monitors the evolution of the reconstruction difference in EN j by experiencing a local prediction error at SAN i. This provides us with further insight on the upper bound of the reconstruction difference, based on the local prediction (forecasting) error. The derived knowledge of this relation provides us with further insights on the adopted policy (Policy 1, 2, or 3) at the EN j, thus making it possible to adapt to different policies based on the evolution of the data streams.
1. Case A: \(\hat{\mathbf {x}}_{t} = \tilde{\mathbf {x}}_{t}\). In this case, the predicted context vector at SAN i is the same as the reconstructed context vector at EN j, e.g., SAN and EN are adopting the same algorithms for prediction and reconstruction.
2. Case B: \(\hat{\mathbf {x}}_{t} = \tilde{\mathbf {x}}_{t} + \varvec{\rho }_{t}\), where \(\varvec{\rho }_{t}\) is the vector discrepancy of the predicted context vector and the reconstructed context vector, given that \(e_{t} \le \theta \) at SAN i, with \(\mathbb {E}[\varvec{\rho } ] < \infty \).
Proposition 1
The expected reconstruction difference \(\mathbb {E}[a]\) at ENs is bounded by the expected prediction error \(\mathbb {E}[e]\) at SANs, i.e., \(\mathbb {E}[a] \le \mathbb {E}[e]\).
Proof
In Case A, where the predicted vector at SAN i is the same as the reconstructed vector at EN j, the expected reconstruction difference is bounded by the expected prediction error. Hence, the evolution of the reconstructed context vectors at the EN j is known to the SAN i, since the latter produces those vectors from its local predictor \(f_{i}\). This means that the SAN i knows the upper bound of the expected reconstruction error that the EN j will experience and can thus adjust the decision threshold \(\theta \) to satisfy the accuracy needs of the launched IoT analytics application.
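The argument for Case A can be written out as the following sketch. Since the defining equations are not reproduced in this section, it assumes that \(e_{t} = \Vert \mathbf{x}_{t} - \hat{\mathbf{x}}_{t}\Vert \) and \(a_{t} = \Vert \mathbf{x}_{t} - \mathbf{u}_{t}\Vert \), where \(\mathbf{u}_{t}\) is the vector inserted into \(\mathcal{W}\) at EN j; a normalization constant common to both definitions would not affect the inequality.
If \(e_{t} > \theta \) (Case 1), then \(\mathbf{u}_{t} = \mathbf{x}_{t}\) and \(a_{t} = 0\). If \(e_{t} \le \theta \) (Case 2), then \(\mathbf{u}_{t} = \tilde{\mathbf{x}}_{t} = \hat{\mathbf{x}}_{t}\) and \(a_{t} = \Vert \mathbf{x}_{t} - \hat{\mathbf{x}}_{t}\Vert = e_{t}\).
Taking expectations over both cases gives \(\mathbb{E}[a] = P(e \le \theta )\, \mathbb{E}[e \mid e \le \theta ] = \mathbb{E}[e\, \mathbf{1}\{e \le \theta \}] \le \mathbb{E}[e]\).
Under the same assumptions, \(\mathbb{E}[e \mid e \le \theta ] \le \theta \), so the expected reconstruction difference also satisfies \(\mathbb{E}[a] \le \theta \).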
Note that the expected prediction error \(\mathbb {E}[e]\) can be incrementally approximated at the SAN i by adopting the recursive approximation of the error mean \(\tilde{e}_{t}\) as: \(\tilde{e}_{t} = \tilde{e}_{t-1} + \frac{1}{t}(e_{t} - \tilde{e}_{t-1})\) for a large \(t>0\). Consider now Case B.
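The recursive approximation of the error mean needs only the previous estimate and a counter, as the short sketch below shows. The function name and the error values are invented for illustration.

```python
def update_mean(mean_prev, e_t, t):
    """Recursive mean: mean_t = mean_{t-1} + (e_t - mean_{t-1}) / t, for t >= 1."""
    return mean_prev + (e_t - mean_prev) / t

errors = [0.02, 0.04, 0.03, 0.07]   # hypothetical prediction errors e_1..e_4
mean = 0.0
for t, e_t in enumerate(errors, start=1):
    mean = update_mean(mean, e_t, t)

batch_mean = sum(errors) / len(errors)
```

The running estimate coincides with the batch mean up to floating-point rounding, without storing the error history on the SAN.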
Proposition 2
The expected reconstruction difference \(\mathbb {E}[a]\) at ENs is bounded by the expected prediction error \(\mathbb {E}[e]\) at SANs and the expectation of \(\xi \), i.e., \(\mathbb {E}[a] \le \mathbb {E}[e] + \mathbb {E}[\xi ]\).
Proof
The evolution of the time series \(\xi _{t}\) can be monitored in a training phase where the EN j sends the reconstructed vectors \(\tilde{\mathbf {x}}_{t}\) to the SAN i. After this training phase, the SAN i is aware of the expected discrepancy by approximating the mean of the time series \(\tilde{\xi }_{t} = \tilde{\xi }_{t-1} + \frac{1}{t}(\xi _{t} - \tilde{\xi }_{t-1})\). Based on this learned evolution of the time series \(\xi \), the SAN i knows the upper bound of the expected reconstruction error that the EN j will experience and can thus adjust the application-specific decision threshold \(\theta \). Moreover, during this training phase, the SAN i can send the pairs \((\mathbf {x}_{t},\hat{\mathbf {x}}_{t})\) to the EN j to locally approximate both the expected prediction error and the expected discrepancy. In this context, the EN j can adjust the current reconstruction policy (Policy 1, 2, or 3) by selecting the policy that corresponds to the minimum \(\mathbb {E}[a]\). In Sect. 5, we experiment with the proposed policies adopted by EN j to demonstrate the applicability of our model.
4.2 Predictability analysis and computational complexity
The derived ARIMA model in our case consists of two parts: the linear autoregressive (AR) part and the moving average (MA) part. The AR part involves regressing the variable on its own lagged/past contextual vector. In our case, the lag is \(p=1\). The MA part involves modeling the error term as a linear combination of error terms occurring contemporaneously and at one time instance in the past. The predictions adopted by the EWMA (i.e., ARIMA with only one past context vector; \(p=1\)) produced by the recursion in Eq. (17) are the minimum mean square error predictions (Muth 1960), obtained by minimizing the expected squared prediction error: \(\mathbb {E}\left[ \Vert \mathbf {x}_{t} - \hat{\mathbf {x}}_{t}\Vert ^{2}\right] \). Moreover, when adopting the multivariate AR(p) with lag \(p>1\), i.e., storing more than one past context vector \(\mathbf {x}_{t-1}, \ldots , \mathbf {x}_{t-p}\) to proceed with \(\hat{\mathbf {x}}_{t}\) forecasting, the space complexity is O(dp). In this case, in terms of prediction on SAN i, the computational time for calculating the linear autoregressive coefficients is \(O(d^{2}p)\) based on the ordinary least squares (OLS) autoregressive estimation. Given some evidence of non-stationarity, those coefficients should be re-estimated regularly, which implies considerable computational cost not only on the SAN but also on the EN, where the latter node has to maintain n multivariate AR(p) models, one for each connected SAN. In that case, the computational complexity for reconstruction is \(O(npd^{2})\) at the EN. For those reasons of predictability, computational complexity, and scalability, we chose the EWMA in SAN and in EN for prediction and reconstruction (as provided in Policy 3).
5 Performance evaluation
5.1 Datasets and experimental setup
In our experiments, we used two real datasets for assessing the performance of the proposed edge prediction intelligence mechanism. The first contextual dataset (DS1) was obtained from the UCI Machine Learning Repository (Vito et al. 2008). This dataset contains twelve SANs of chemical compounds and environmental parameters: CO, PT08.S1 (tin oxide), Non Metanic HydroCarbons, Benzene, PT08.S2 (titania), NOx, PT08.S3 (tungsten oxide), NO_{2}, PT08.S4, PT08.S5 (indium oxide), temperature, relative humidity, and absolute humidity. All these contextual parameters are required to measure the air pollution of a specific area. These data are collected every hour and refer to \(T=9357\) 12-dimensional measurements with \(n = 12\) SANs and one EN. This dataset contains missing values. For each SAN, we impute those missing values by linear interpolation. This method exploits two data points \((x_0,y_0)\) and \((x_1,y_1)\) to reconstruct a linear function and find, for a specific x value, the missing value y as follows: \(y = y_0 +\frac{y_1 - y_0}{x_1 - x_0} (x - x_0)\).
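The imputation step can be sketched as follows. The helper names are hypothetical, missing values are represented as `None`, time indices play the role of x, and only gaps between two known values are filled (leading or trailing gaps are left untouched in this sketch).

```python
def interpolate(x0, y0, x1, y1, x):
    """y = y0 + (y1 - y0) / (x1 - x0) * (x - x0)."""
    return y0 + (y1 - y0) / (x1 - x0) * (x - x0)

def impute(series):
    """Fill interior None gaps by linear interpolation between known neighbours."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            filled[i] = interpolate(left, series[left], right, series[right], i)
    return filled

imputed = impute([1.0, None, None, 4.0, None, 6.0])
```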
The second contextual dataset (DS2) refers to 4-dimensional contextual data collected by Raspberry Pi SANs deployed at the School of Computing Science, University of Glasgow (Hentschel et al. 2016). We used four different SANs that measured two different room temperatures (room F121 and room S123), humidity and sound (room F121). The data are collected at an interval of 10 min and comprise \(T=1000\) 4-dimensional measurements with \(n=4\) SANs and one EN. For comparison and reproducibility, both datasets are normalized and scaled, i.e., each contextual parameter \(x \in \mathbb {R}\) is mapped to \(\frac{x - \mu }{\sigma }\) with mean value \(\mu \) and variance \(\sigma ^{2}\), and scaled in [0,1] using \(\frac{\max \{x\} - x}{\max \{x\} - \min \{x\}}\). That is, each context vector \(\mathbf {x}\) is normalized and scaled in the d-dimensional unit cube \(\mathbf {x} \in [0,1]^{d}\), with \(d \in \{12, 4\}\) for DS1 and DS2, respectively, and \(\theta \in [0, 1]\).
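A minimal Python sketch of this two-step preprocessing for a single contextual parameter is given below. It assumes the population standard deviation for \(\sigma \) and follows the scaling expression exactly as written above, so the largest value maps to 0 and the smallest to 1; the function names and the toy temperature readings are illustrative only.

```python
from statistics import fmean, pstdev
from typing import List


def standardise(values: List[float]) -> List[float]:
    """Map each x to (x - mu) / sigma (population standard deviation assumed)."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0.0:
        raise ValueError("a constant series cannot be standardised")
    return [(x - mu) / sigma for x in values]


def scale_unit_interval(values: List[float]) -> List[float]:
    """Map each x to (max - x) / (max - min), which lies in [0, 1]."""
    hi, lo = max(values), min(values)
    if hi == lo:
        raise ValueError("a constant series cannot be scaled")
    return [(hi - x) / (hi - lo) for x in values]


temperatures = [18.0, 20.0, 22.0, 26.0]
scaled = scale_unit_interval(standardise(temperatures))
```

Because both steps are affine, the composition only shifts and rescales the series, which keeps the threshold \(\theta \in [0, 1]\) comparable across parameters with different units.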
Based on the sensitivity analysis of our mechanism in Sect. 5.4, the experimental assessment was carried out with different values of the decision threshold \(\theta \in \{10^{-5}, 10^{-3}, 10^{-2}, 0.05, 0.06, 0.1, 0.2\}\). Using the normalized and scaled datasets DS1 and DS2, the physical meaning of \(\theta \) is interpreted as the percentage change of a measured/sensed time series value \(x_{t}\) by 0.0002, 0.02, 2, 10, 12, 20 and 40%, respectively, for the chosen \(\theta \) values. Those \(\theta \) values are derived from our sensitivity analysis, which examines the impact of \(\theta \) on the local prediction error \(e_{t}\) in Eq. (2) in the SAN and on the reconstruction error in the EN. The local predictor/exponential smoother in the SAN uses, in our experiments, \(\alpha \in \{0.5,0.7\}\) as suggested in Durbin and Koopman (2012). Moreover, in Policy 3, the reconstruction smoother function in the EN adopts \(\alpha =0.7\). The sliding window size is set to \(N=10\). This size represents a history of the last 10 h for DS1 and of the last 100 min for DS2. Our experimental setup includes seven \(\theta \) values and two \(\alpha \) values over three different policies (Policy 1, 2, and 3) for reconstruction on the EN. This leads to an overall of 42 experiments for each of the aggregation analytics functions \(h(\mathcal {W})\), i.e., AVG, MAX and MIN, for the linear regression analytics function, and for the reconstruction difference. In order to objectively assess the performance of our mechanism, we implemented the baseline mechanism and also compare our mechanism with the TEEN model (Manjeshwar and Agrawal 2001). This leads to an overall of 315 experimental results and one baseline solution. The baseline mechanism captures the continuous contextual data and transmits them from all SANs to an EN, without any predictive intelligence on the SAN or the EN.
The TEEN model, along with a theoretical comparative assessment, is provided in Sect. 5.3, while in Sect. 5.4 we provide the comparative assessment of our model and TEEN.
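The experiment counts can be checked by enumerating the parameter grid. The sketch below is one reading that reproduces the reported numbers; it assumes that the TEEN comparison corresponds to an additional setting \(\alpha = 1\) evaluated under all three policies, and that the five evaluated quantities are AVG, MAX, MIN, linear regression and reconstruction. The variable names are illustrative.

```python
from itertools import product

thetas = [1e-5, 1e-3, 1e-2, 0.05, 0.06, 0.1, 0.2]
alphas_ewma = [0.5, 0.7]
# Assumption: the TEEN comparison is treated as the extra setting alpha = 1.
alphas_all = alphas_ewma + [1.0]
policies = ["Policy 1", "Policy 2", "Policy 3"]
evaluations = ["AVG", "MAX", "MIN", "linear regression", "reconstruction"]

# 7 thresholds x 2 smoothing factors x 3 policies for each evaluated quantity.
per_evaluation = list(product(thetas, alphas_ewma, policies))

# 7 thresholds x 3 settings (two EWMA, one TEEN) x 3 policies x 5 quantities.
full_grid = list(product(thetas, alphas_all, policies, evaluations))

print(len(per_evaluation), len(full_grid))
```

Under these assumptions the first grid has 42 entries and the second 315, matching the counts stated above.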
5.2 Performance metrics
5.3 Model comparison
From an error analysis perspective, let us denote with \(\tilde{\mathbf {e}}_{t}\) and \(\mathbf {e}_{t}\) the prediction error of the TEEN-SAN i predictor and our SAN i predictor, respectively.
Proposition 3
Proof
Based on Proposition 3, the SAN predictor performs better than the TEEN-SAN predictor in terms of context vector prediction. Section 5.4 reports on the experimental comparative assessment and sensitivity analysis of our mechanism and the TEEN model.
5.4 Sensitivity analysis and comparative assessment
In this section we provide a sensitivity analysis of our mechanism, focusing on the decision threshold \(\theta \). Based on this analysis, we demonstrate the impact of \(\theta \) on the data reconstruction quality for a given pair of SAN i and EN j. The aim of this analysis is to provide insight into the appropriate range of \(\theta \) values that ensures high-quality reconstructed data. To investigate this impact, we assess the sensitivity of our mechanism using the following metrics adopted in signal processing for time series data reconstruction: (1) Kullback–Leibler (KL) divergence, (2) sum of squared residuals (SSR), (3) variance of the actual and the reconstructed time series, and (4) coefficient of variation (CV) of the actual and the reconstructed time series. For the sake of readability, we suppress the dimension subscript k from the variable \(x_{kt}\) for \(k \in \{1, \ldots , d\}\) and focus on the communication pair (SAN i, EN j).
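To make the four metrics concrete, the following Python sketch computes them for a pair of one-dimensional series scaled in [0,1]. It uses textbook definitions as assumptions: SSR as the sum of squared pointwise differences, CV as standard deviation over mean, and the KL divergence of the actual from the reconstructed distribution estimated from equal-width histograms with a small constant added to every bin to avoid empty bins. The exact forms used in our analysis are those of the numbered equations, which this sketch does not claim to reproduce.

```python
import math
from statistics import fmean, pvariance
from typing import List, Sequence


def ssr(actual: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Sum of squared residuals between the two series."""
    return sum((a - r) ** 2 for a, r in zip(actual, reconstructed))


def coefficient_of_variation(series: Sequence[float]) -> float:
    """Standard deviation divided by the mean (assumed definition)."""
    return math.sqrt(pvariance(series)) / fmean(series)


def histogram(series: Sequence[float], bins: int, eps: float = 1e-9) -> List[float]:
    """Equal-width histogram over [0, 1], smoothed by eps so no bin is empty."""
    counts = [0] * bins
    for v in series:
        counts[min(int(v * bins), bins - 1)] += 1
    total = len(series) + eps * bins
    return [(c + eps) / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


actual = [0.1, 0.2, 0.4, 0.8]
reconstructed = [0.1, 0.1, 0.4, 0.4]
residual = ssr(actual, reconstructed)
variances = (pvariance(actual), pvariance(reconstructed))
p, q = histogram(actual, bins=4), histogram(reconstructed, bins=4)
information_loss = kl_divergence(p, q)
```

A reconstructed series that never changes has zero variance and hence a CV of zero under this definition, which is the degenerate case that appears for large \(\theta \).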
Applying the sensitivity metrics in Eqs. (26), (27) and (28), we obtain Figs. 2, 3, 4, 5 and 6. The purpose of analyzing these metrics is to identify the upper bound of \(\theta \), using a one-dimensional time series selected from DS1; similar sensitivity results are obtained using DS2. Besides the sensitivity metrics, choosing appropriate and reasonable \(\theta \) values also requires considering the relationship between an increase of \(\theta \) and the percentage of communication overhead between SAN and EN (see Fig. 7, top left). Combining Fig. 7 with Fig. 2, it is evident that, given a small \(\theta \) value, the reconstructed time series \(\tilde{x}\) follows the actual time series x. This is because a high amount of communication still occurs between SAN and EN. Increasing \(\theta \) leads to less communication overhead, and with \(\theta > 0.2\) one can observe that only a couple of values are sent from SAN to EN. This is represented by the green lines in Fig. 2. It should also be mentioned that the number of communications depends strongly on the \(\alpha \) value chosen in the SAN. In Fig. 2 with \(\alpha = 1\), which is equivalent to the TEEN model in the SAN, only one value is sent from SAN to EN with \(\theta = 0.3\), whereas for the same \(\theta \) value and \(\alpha =0.5\) more measurements are sent in between. Closely related to the number of communications between SAN and EN, as well as to the comparison of the reconstructed time series with the actual one, is the probability density function p(x), which serves as an indicator of the behavior for high \(\theta \) values. Figure 3 shows the probability density functions (shown as histograms) of the actual and the reconstructed time series with fixed \(\alpha =0.5\) for different \(\theta \in \{0.05,0.1,0.3,0.7\}\).
We can observe that an increase of \(\theta \) decreases the possibility of reconstructing the actual distribution in the EN, i.e., the reconstructed time series follows a significantly different probability density function from that of the actual time series, especially when \(\theta > 0.2\).
The aforementioned relationship between small \(\theta \) values and a tight fit of the distribution is corroborated by the analysis of the sensitivity metrics. Not only is it impossible to follow the probability density function of the actual time series as \(\theta \) increases; the SSR metric in Fig. 4 also clearly shows that using \(\theta > 0.2\) causes a loose fit of the EN reconstruction model to the actual SAN model. For \(\theta \le 0.2\), the SSR increases over time but remains within a reasonable range, given that the SSR grows in an exponential way. Besides the SSR metric, the coefficient of variation (CV) metric reflects the effect of increasing \(\theta \) on the probability distribution. Figures 5 (right) and 6 show that using \(\theta > 0.5\) produces a CV value of 0, which holds true for all policies and aggregation methods as well as for the reconstruction case. From the CV in Eq. (27) it can be seen that either the mean or the variance has to be zero if the CV is zero. This can clearly be observed in Fig. 5 (left), where the variance is zero for \(\theta > 0.5\). This finding and the CV effect can be explained by the insights gained from the KL and SSR sensitivity metrics. For \(\theta > 0.5\), only limited communication between SAN and EN occurs and, therefore, a time series with low or zero variance is reconstructed in the EN. Moreover, from the behavior of the CV metric, we can observe that the reconstruction of the actual CV depends on (1) the policy adopted for reconstruction and (2) whether exponential smoothing or TEEN is adopted on the SAN. A comprehensive comparison between those cases is provided in Sect. 5.5.
Lastly, by evaluating the KL divergence of the reconstructed probability distribution in Fig. 4 (right), we observe that, depending on \(\alpha \), the KL divergence increases only slightly up to a \(\theta \) value of approximately 0.1. For \(\theta > 0.1\), the loss of information grows linearly. When TEEN is adopted in the SAN, it is evident that a high loss of information is reached at a lower value of \(\theta \) than when using exponential smoothing with \(\alpha = 0.5\). The maximum loss of information is 3.5, which indicates an information loss of 350%. This maximum is reached for TEEN with \(\theta > 0.3\), and for \(\alpha = 0.5\) with \(\theta >0.5\). In order to obtain an information loss of less than 100%, \(\theta \) should be less than 0.2.
5.5 Performance assessment
Based on the results of the sensitivity analysis of our model in Sect. 5.4, in this section we evaluate the performance of our proposed method over the datasets DS1 and DS2 using the defined performance metrics, in order to illustrate the tradeoff between communication and error for the reconstruction, aggregation analytics and regression analytics differences. Independent of the specific difference, our aim is to reduce the percentage of communication with only a slight increase of the error.
Generally, over all evaluated differences, increasing the value of \(\theta \) decreases the number of communications towards the EN, as is evident from Fig. 7. The reason is that \(\theta \) represents the tolerance for the deviation between the expected value and the actual/sensed one. High values of \(\theta \) therefore mean that values can vary within a larger range before they are sent towards the EN (refer also to the sensitivity analysis in Sect. 5.4). Furthermore, it is worth noting that the number of communications is highly dependent on the exponential smoother parameter \(\alpha \). Given the same \(\theta \) value, the number of communications decreases with higher values of \(\alpha \), where \(\alpha =1\) is equivalent to the TEEN model under comparison. Increasing \(\alpha \) means reducing the influence of previous/historical measured data. Having \(\alpha = 1 \) means that the current measured value is compared only against the previous one for the forwarding decision inside the SAN. The following assessments are obtained by adopting the three reconstruction policies in the EN with \(\alpha \in \{0.5,0.7, 1\}\) and \(\theta \) values up to the upper bound of 0.2 resulting from Sect. 5.4, i.e., \(\theta \in \{10^{-5}, 10^{-3}, 10^{-2}, 0.05, 0.06, 0.1, 0.2\}\). In the following evaluation, the figures shown represent only some of the results of the 315 experiments conducted for each dataset; they are representative of the general outcome. Moreover, it should be noted that each dot displayed in the figures corresponds to a \(\theta \) value, positioned according to its difference and its communication.
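The interplay between \(\theta \), \(\alpha \) and the amount of communication can be illustrated with a small simulation for a one-dimensional series. This is a sketch under stated assumptions and not our implementation: the SAN forwards a value when the absolute difference between the measurement and its EWMA prediction exceeds \(\theta \), the first value is always sent, and the EN simply reuses the last delivered value whenever nothing arrives. That EN fallback is a hypothetical placeholder and is not necessarily one of Policies 1 to 3.

```python
from typing import List, Tuple


def simulate(series: List[float], alpha: float, theta: float) -> Tuple[List[float], float]:
    """Return (series as seen by the EN, fraction of measurements transmitted)."""
    smoothed = series[0]        # SAN-side EWMA state over its own measurements
    last_delivered = series[0]  # EN-side state; the first value is always sent
    reconstructed = [series[0]]
    sent = 1
    for x in series[1:]:
        prediction = smoothed              # forecast made before seeing x
        if abs(x - prediction) > theta:    # assumed forwarding rule
            last_delivered = x
            sent += 1
        reconstructed.append(last_delivered)  # hypothetical EN fallback
        smoothed = alpha * x + (1.0 - alpha) * smoothed
    return reconstructed, sent / len(series)


readings = [0.10, 0.12, 0.11, 0.40, 0.42, 0.41, 0.10]
_, share_ewma = simulate(readings, alpha=0.5, theta=0.05)
_, share_teen_like = simulate(readings, alpha=1.0, theta=0.05)
flat, share_large_theta = simulate(readings, alpha=0.5, theta=0.5)
```

On this toy series, \(\alpha = 0.5\) with \(\theta = 0.05\) transmits 5 of 7 values, \(\alpha = 1\) transmits 3 of 7, and \(\theta = 0.5\) transmits only the first value, so the EN sees a constant series.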
5.5.1 Reconstruction difference assessment
It is worth noting that, as shown in Figs. 8 and 9, using our proposed predictive intelligence for edge analytics on DS1, a communication overhead of 30% can be saved by tolerating a reconstruction error of less than 1%. If IoT applications can tolerate up to 2% error in their analytics accuracy, it is possible to save between 50 and 60% of communication. Saving this amount of communication with a given tolerated error would increase the lifetime of an edge network by between 30 and 50%. Moreover, Fig. 9 shows that for DS2 an even more pronounced tradeoff is achieved. Depending on the chosen value of \(\alpha \), up to 50% of communication can be saved without producing any error. By tolerating a relatively slight 2.5% error, it is possible to save up to 70% of communication. This can be explained by the fact that DS2 is measured every 10 min, which yields similar or even identical consecutive measurements because of only slight changes in the indoor environment of the School of Computing Science. In contrast, DS1 is measured every hour; thus, the surrounding environment can change significantly relative to the previous measurement.
5.5.2 Aggregation analytics assessment
For DS1, Fig. 10 shows that a 20% reduction in communication for MIN and MAX, and a 30% reduction for AVG, generates an error of only 0.5%. Therefore, it is possible to increase the lifetime of a network by up to 30% by tolerating a slight difference from the true result. IoT applications that can tolerate a higher discrepancy for this kind of aggregation function can save up to 50% with an error of 1.5–2%. By comparison, for DS2, as illustrated in Fig. 11 for all three aggregation functions, it is possible to use only 60% of the communication without any difference in the aggregation analytics output. A further 20% can be saved by tolerating an accuracy difference of at most 0.5%.
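As an illustration of how such an aggregation difference can be measured, the sketch below evaluates AVG, MAX and MIN over a sliding window on both the actual and the reconstructed series and averages the absolute gap between the two outputs. The mean absolute gap is an assumed stand-in for the performance metric defined in Sect. 5.2, and the helper names, window size and toy series are illustrative.

```python
from collections import deque
from statistics import fmean
from typing import Callable, Dict, List, Sequence

AGGREGATES: Dict[str, Callable] = {"AVG": fmean, "MAX": max, "MIN": min}


def windowed(series: Sequence[float], size: int, h: Callable) -> List[float]:
    """Apply the aggregation function h to the most recent `size` values at each step."""
    window: deque = deque(maxlen=size)
    out = []
    for x in series:
        window.append(x)
        out.append(h(window))
    return out


def aggregation_gap(actual: Sequence[float], reconstructed: Sequence[float],
                    size: int) -> Dict[str, float]:
    """Mean absolute gap between aggregates of the actual and reconstructed series."""
    result = {}
    for name, h in AGGREGATES.items():
        a = windowed(actual, size, h)
        r = windowed(reconstructed, size, h)
        result[name] = fmean([abs(u - v) for u, v in zip(a, r)])
    return result


actual = [0.2, 0.4, 0.6, 0.8]
reconstructed = [0.2, 0.2, 0.6, 0.6]
gaps = aggregation_gap(actual, reconstructed, size=2)
```

When the reconstructed series equals the actual one, all three gaps are zero, which corresponds to the case where communication is saved without any difference in the aggregation output.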
5.5.3 Predictive analytics assessment
Figure 12 presents the Regression Analytics Difference \(\gamma \) for \(\alpha \in \{0.5,1\}\) over all three reconstruction policies in the EN for DS1. Correspondingly, Fig. 13 shows \(\gamma \) for DS2 and \(\alpha \in \{0.5,0.7\}\). For DS2 it is not possible to illustrate TEEN, as no change in \(\gamma \) occurs with increasing \(\theta \). For better illustration, the figures showing the Regression Analytics Difference only show \(\theta \) values up to 0.06. Increasing \(\theta \) beyond this threshold increases the error and decreases the communication; these values are omitted from the figures for better readability of the smaller values.
It is evident from both figures that Policy 3 offers the best tradeoff between communication saving and regression analytics quality. This is independent of the \(\alpha \) value chosen in the SAN, as seen when comparing the left and right figures of each dataset. Moreover, for DS1, our mechanism produces the same linear regression model even with only 80% of the communication. For a communication saving of 50%, a \(\gamma \) of around 0.0002 needs to be tolerated by the IoT application for DS1 using \(\alpha = 0.5\). For DS2, an identical regression can be produced by using only 15–20% of the complete communication, as shown in Fig. 13. Considering both outcomes, our proposed mechanism increases the lifetime of an edge network for predictive analytics tasks.
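The idea of comparing regression models built from actual and reconstructed data can be sketched as follows. The code fits a simple one-variable least-squares line on each version of the data and reports the Euclidean distance between the two coefficient pairs. This distance is an illustrative stand-in and not the definition of \(\gamma \); the choice of input and output variables, the function names and the toy data are likewise assumptions.

```python
from statistics import fmean
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def coefficient_gap(actual_x: Sequence[float], actual_y: Sequence[float],
                    recon_x: Sequence[float], recon_y: Sequence[float]) -> float:
    """Euclidean distance between the coefficients of the two fitted lines."""
    s1, b1 = fit_line(actual_x, actual_y)
    s2, b2 = fit_line(recon_x, recon_y)
    return ((s1 - s2) ** 2 + (b1 - b2) ** 2) ** 0.5


x_actual = [0.0, 0.25, 0.5, 0.75, 1.0]
y_actual = [0.1, 0.3, 0.5, 0.7, 0.9]
y_reconstructed = [0.1, 0.1, 0.5, 0.5, 0.9]  # two values were not delivered

slope, intercept = fit_line(x_actual, y_actual)
gap_same = coefficient_gap(x_actual, y_actual, x_actual, y_actual)
gap_recon = coefficient_gap(x_actual, y_actual, x_actual, y_reconstructed)
```

A gap of zero means that the EN would learn exactly the same line from the reconstructed data as from the full data, which is the situation reported for the lower \(\theta \) values.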
Overall performance of Policy 1, 2, 3 and Policy 1 with TEEN in SAN for reconstruction, aggregation, and predictive analytics
Method            Reconstruction   Aggregation analytics   Regression analytics
Policy 1          +                ++                      +
Policy 1 (TEEN)   +                +                       – –
Policy 2          – – –            –                       – –
Policy 3          ++               ++                      +++
6 Conclusions and future work
We focus on the edge computing paradigm, where pushing aggregation and predictive analytics to the edge of the IoT network allows the complexity of analytics tasks to be distributed into many smaller and more manageable pieces that are physically located at the source of the contextual information. We introduce a lightweight, distributed, predictive intelligence mechanism that supports communication-efficient aggregation and predictive modeling within the edge network of SANs and ENs. The mechanism follows the evolving nature of the multivariate time series (context vectors) based on the idea of locally deciding whether to deliver contextual data or not, with the aim of minimizing the induced communication overhead while providing high-quality analytics tasks. By splitting this intelligence into prediction (through exponential smoothing) and decision making at the SANs, and context reconstruction at the ENs (for which we propose three policies), we eliminate data transfer at the edge of the network by exploiting the predictability of the captured contextual data. We provide fundamental theoretical analyses of the upper bounds of the reconstructed data quality and a comprehensive sensitivity analysis of the most important model parameter. We provide a comprehensive comparative (theoretical and experimental) assessment with baseline solutions found in the literature and an experimental evaluation of the proposed mechanism over two real multidimensional contextual datasets for aggregation and linear regression analytics tasks. We show the benefits stemming from its adoption in edge computing environments and experiment with the tradeoff between accuracy (quality) of edge analytics tasks and communication overhead. Our mechanism demonstrated its efficiency in supporting high-quality edge analytics by tolerating a relatively low error while significantly decreasing the communication overhead in an edge network.
Our future agenda includes investigating intelligent delay-tolerant mechanisms for further minimizing the induced analytics errors in favor of saving communication. Moreover, future work is focused on certain modifications of our mechanism to support advanced analytics tasks including outlier detection, nonlinear predictive models, and concept drift in multidimensional contextual data streams in edge computing environments.
Footnotes
 1. Double exponential smoothing (Holt–Winters time series smoothing) could be adopted with the same computational complexity.
References
 Abadi DJ, Carney D, Cetintemel U, Cherniack M, Convey C, Lee S, Stonebraker M, Tatbul N, Zdonik S (2003) Aurora: a new model and architecture for data stream management. VLDB J 12(2):120–139
 Ahmad Y, Berg B, Cetintemel U, Humphrey M, Hwang JH, Jhingran A, Maskey A, Papaemmanouil O, Rasin A, Tatbul N, Xing W, Xing Y, Zdonik S (2005) Distributed operation in the Borealis stream processing engine. In: Proceedings of the 2005 ACM SIGMOD international conference on Management of data (SIGMOD '05). ACM, New York, NY, USA, pp 882–884. doi: 10.1145/1066157.1066274
 Anagnostopoulos C, Hadjiefthymiades S, Georgas P (2012) PC3: principal component-based context compression. J Parallel Distrib Comput 72(2):155–170
 Anagnostopoulos C, Hadjiefthymiades S, Katsikis A, Maglogiannis I (2014) Autoregressive energy-efficient context forwarding in wireless sensor networks for pervasive healthcare systems. Pers Ubiquitous Comput 18(1):101–114
 Anagnostopoulos C, Triantafillou P (2014) Scaling out big data missing value imputations: Pythia vs. Godzilla. In: 20th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’14), New York, pp 651–660
 Anagnostopoulos C, Hadjiefthymiades S, Kolomvatsos K (2016) Accurate, dynamic, and distributed localization of phenomena for mobile sensor networks. ACM Trans Sensor Netw 12(2). doi: 10.1145/2882966
 Anagnostopoulos C, Hadjiefthymiades S (2014) Advanced principal component-based compression schemes for wireless sensor networks. ACM Trans Sensor Netw 11(1). doi: 10.1145/2629330
 Anagnostopoulos C, Anagnostopoulos T, Hadjiefthymiades S (2010) An adaptive data forwarding scheme for energy efficiency in wireless sensor networks. In: 5th IEEE international conference intelligent systems, London, pp 396–401
 Anagnostopoulos C, Triantafillou P (2017a) Query-driven learning for predictive analytics of data subspace cardinality. ACM Trans Knowl Discov Data 11(4):46. doi: 10.1145/3059177
 Anagnostopoulos C, Triantafillou P (2015) Learning set cardinality in distance nearest neighbours. In: IEEE international conference on data mining (IEEE ICDM 2015), Atlantic City, pp 691–696
 Anagnostopoulos C, Triantafillou P (2017b) Efficient scalable accurate regression queries in in-DBMS analytics. In: IEEE international conference on data engineering (ICDE), San Diego
 Arasu A, Babu S, Widom J (2006) The CQL continuous query language: semantic foundations and query execution. VLDB J 15(2):121–142
 Awang A, Suhaimi MH (2007) RIMBAMON©: A forest monitoring system using wireless sensor networks. In: International conference on intelligent and advanced systems 2007, Kuala Lumpur, pp 1101–1106. doi: 10.1109/ICIAS.2007.4658555
 Bottou L (2016) Large-scale machine learning with stochastic gradient descent. In: Proceedings of the 19th international conference on computational statistics (COMPSTAT’2010), Springer, Paris, pp 177–187
 Bottou L, Curtis FE, Nocedal J (2017) Optimization methods for large-scale machine learning. arXiv:1606.04838. [stat.ML]
 Box GEP, Jenkins G (1990) Time series analysis, forecasting and control. Holden-Day, Incorporated
 Cheng B, Papageorgiou A, Bauer M (2016) Geelytics: enabling on-demand edge analytics over scoped data sources. In: IEEE international congress on big data (BigData Congress), San Francisco, pp 101–108
 Chowdappa VP, Botella C, Beferull-Lozano B (2015) Distributed clustering algorithm for spatial field reconstruction in wireless sensor networks. In: IEEE 81st vehicular technology conference (VTC Spring), Glasgow, pp 1–6
 Chu D, Deshpande A, Hellerstein JM, Hong W (2006) Approximate data collection in sensor networks using probabilistic models. In: Proceedings of the 22nd international conference on data engineering (ICDE '06). IEEE Computer Society, Washington, DC, USA, p 48. doi: 10.1109/ICDE.2006.21
 Dallachiesa M, Jacques-Silva G, Gedik B, Wu KL, Palpanas T (2015) Sliding windows over uncertain data streams. Knowl Inf Syst 45(1):159–190. doi: 10.1007/s1011501408045
 De Vito S, Massera E, Piga M, Martinotto L, Di Francia G (2008) On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario. Sens Actuators B Chem 129(2):750–757 (ISSN 09254005)
 Durbin J, Jan Koopman S (2012) Time series analysis by state space methods. Oxford Statistical Science Series
 Eidson GW et al (2010) The South Carolina digital watershed: end-to-end support for real-time management of water resources, vol 2010. In: Proc. 4th intl. symposium on innovations and real-time applications of distributed sensor networks (IRADSN 09), USA
 Ganti R, Ye F, Lei H (2011) Mobile crowdsensing: current state and future challenges. IEEE Commun Mag 49(11):32–39
 Gemulla R, Nijkamp E, Haas PJ, Sismanis Y (2011) Large-scale matrix factorization with distributed stochastic gradient descent. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ’11). ACM, New York, pp 69–77
 Goel S, Imielinski T (2001) Prediction-based monitoring in sensor networks: taking lessons from MPEG. ACM SIGCOMM Comput Comm Rev 31(5):82–98
 Gray J, Chaudhuri S et al (1997) Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Min Knowl Discov 1(1):29–53
 Hentschel K, Jacob D, Singer J, Chalmers M (2016) Supersensors: Raspberry Pi devices for smart campus infrastructure. In: 4th IEEE international conference on future internet of things and cloud, FiCloud, Vienna, pp 58–62
 Jiang H, Jin S, Wang C (2011) Prediction or not? An energy-efficient framework for clustering-based data collection in wireless sensor networks. IEEE Trans Parallel Distrib Syst 22(6):1064–1071
 Kang J et al (2013) High-fidelity environmental monitoring using wireless sensor networks. Article 67. In: Proc. 11th ACM conference on embedded networked sensor systems (SenSys ’13), USA
 Kejela G, Esteves RM, Rong C (2014) Predictive analytics of sensor data using distributed machine learning techniques. In: IEEE 6th International Conference on cloud computing technology and science, Singapore, 2014, pp 626–631. doi: 10.1109/CloudCom.2014.44
 Kim JJ et al (2010) Wireless monitoring of indoor air quality by a sensor network. Indoor Built Environ 19(1):145–150
 Kolomvatsos K, Anagnostopoulos C, Hadjiefthymiades S (2017) Data fusion and type-2 fuzzy inference in contextual data stream monitoring. IEEE Trans Syst Man Cybern Syst 47(8):1839–1853. doi: 10.1109/TSMC.2016.2560533
 Kuhn M, Johnson K (2013) Applied predictive modeling. Springer, Berlin. doi: 10.1007/9781461468493 (ISBN 9781461468493)
 Lane N, Miluzzo E, Lu H, Peebles D, Choudhury T, Campbell A (2010) A survey of mobile phone sensing. IEEE Commun Mag 48(9):140–150
 Manjeshwar A, Agrawal DP (2001) TEEN: a routing protocol for enhanced efficiency in wireless sensor networks. In: Proceedings of the 15th international parallel and distributed processing symposium (IPDPS ’01). IEEE Computer Society, Washington, DC, p 189
 McConnell SM, Skillicorn DB (2005) A distributed approach for prediction in sensor networks. In: Proceedings of 1st international workshop on data mining in sensor networks as part of the SIAM International Conference on data mining, pp 28–37
 Muth J (1960) Optimal properties of exponentially weighted forecasts. J Am Stat Assoc 55(290):299–306
 Nguyen N et al (2010) A real-time control using wireless sensor network for intelligent energy management system in buildings. In: Proc. IEEE workshop on environmental energy and structural monitoring systems (EESMS 10), pp 87–92
 Nittel S (2009) A survey of geosensor networks: advances in dynamic environmental monitoring. Sensors 9:5664–5678
 Oliveira LM, Rodrigues JJ (2011) Wireless sensor networks: a survey on environmental monitoring. J Commun 6(2):143–151
 Papithasri K, Babu M (2016) Efficient multihop dual data upload clustering based mobile data collection in wireless sensor network. In: 2016 3rd international conference on advanced computing and communication systems (ICACCS), Coimbatore, pp 1–6
 Patroumpas K, Sellis T (2011) Maintaining consistent results of continuous queries under diverse window specifications. Inf Syst 36(1):42–61
 Patroumpas K, Sellis T (2010) Multi-granular time-based sliding windows over data streams. In: 2010 17th international symposium on temporal representation and reasoning (TIME), pp 146–153
 Patroumpas K, Sellis T (2006) Window specification over data streams. In: Proc. international conference on current trends in database technology (EDBT’06). Springer, Berlin, pp 445–464
 Satyanarayanan M et al (2015) Edge analytics in the internet of things. IEEE Pervasive Comput 14(2):24–31
 Silberstein A, Braynard R, Filpus G, Puggioni G, Gelfand A, Munagala K, Yang J (2007) Data-driven processing in sensor networks. In: 3rd Biennial Conference on innovative data systems research (CIDR), Jan 7–10, 2007, Asilomar, California, USA
 Simoens P, Xiao Y, Pillai P, Chen Z, Ha K, Satyanarayanan M (2013) Scalable crowdsourcing of video from mobile devices. In: Proceeding of the 11th annual international conference on mobile systems, applications, and services (MobiSys ’13). ACM, New York, pp 139–152
 Simonetto A, Leus G (2014) Distributed maximum likelihood sensor network localization. IEEE Trans Signal Process 62(6):1424–1437
 Stojmenovic I, Wen S (2014) The fog computing paradigm: scenarios and security issues. In: 2014 Federated conference on computer science and information systems, Warsaw, pp 1–8
 The mobile-edge computing initiative. http://www.etsi.org/technologiesclusters/technologies/mobileedgecomputing (Online)
 Tofallis C (2015) A better measure of relative prediction accuracy for model selection and model estimation. J Oper Res Soc 66(8):1352–1362
 Tulone D, Madden S (2006) An energy-efficient querying framework in sensor networks for detecting node similarities. In: Proceedings of the 9th ACM international symposium on modeling analysis and simulation of wireless and mobile systems (MSWiM '06). ACM, New York, NY, USA, pp 191–300. doi: 10.1145/1164717.1164768
 Vulimiri A, Curino C, Godfrey PB, Jungblut T et al (2015) WANalytics: geo-distributed analytics for a data intensive world. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, pp 1087–1092
 Xu G et al (2014) Applications of wireless sensor networks in marine environment monitoring: a survey. Sensors 14(9):16932–16954
 Xu Y, Lee WC (2003) On localized prediction for power efficient object tracking in sensor networks. In: Proceeding of the 23rd international conference on distributed computing systems workshops, 2003, pp 434–439. doi: 10.1109/ICDCSW.2003.1203591
 Yi S, Li C, Li Q (2015) A survey of fog computing: concepts, applications and issues. In: Proceedings of the 2015 workshop on mobile big data, pp 37–42
 Zervas E et al (2011) Multisensor data fusion for fire detection. Inf. Fusion 12(3):1566–2535. Elsevier
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.