1 Introduction

Traffic data collected at roadside sensors can offer significant value to transport managers. The raw data is typically transformed into a time series format, capturing metrics such as the vehicle count or average speed over the road network. This information can be used to make forecasts about the state of the road network in the near future, which can enable proactive responses when heavy or unusual load on the network is predicted.

A wide range of forecasting approaches have been applied to the traffic prediction task, from statistical methods such as ARIMA [2, 17, 32], to Deep Learning (DL) models such as LSTM [20]. In recent years, Graph Neural Network (GNN) approaches have achieved state-of-the-art results, due to their ability to capture spatial dependencies between sensors [19, 27, 30, 34, 36, 51, 62, 68]. GNNs typically model the road sensor network as a graph structure, whose weighted adjacency matrix reflects the strength of inter-sensor relationships.

Despite an extensive body of research into the traffic forecasting problem, there are still several challenges to overcome when building practical solutions. First, there are operational requirements around the scalable handling and pre-processing of the streaming traffic data, to enable its use for real-time forecasting. Typically, forecasting models are developed for offline use, and do not consider the challenges of producing forecasts on streaming data. The real-time forecasting problem requires that the prediction process takes place continuously within a given time lag of each real-world traffic event occurring (i.e., vehicles passing a sensor). This is an important problem to resolve for practical data-driven systems, as transport managers need to be able to take action based on responsive short-term forecasts. It has also been identified as an open research issue, and entails significant data management challenges, particularly when DL models are employed [9]. Furthermore, existing research has largely considered forecasting 1 hour ahead, mostly on static data, with only a few works seeking to predict further ahead [6, 67]. It would be beneficial to transport managers if accurate forecasts could be made further into the future, allowing more time for responsive action to be taken. Next, in addition to data captured at roadside sensors, dynamic urban events (DUE) and vehicle-level flow data should also be incorporated into forecasting models to improve predictive performance. Finally, many inference workloads are sporadic in nature, with queries arriving at irregular intervals and being distributed over multiple forecasting models/scales. Whilst an inference platform built using always-available virtual machine (VM) resources may not be a cost-effective solution for such workloads, there are challenges associated with achieving efficient inference performance on pay-as-you-go alternatives such as serverless computing. 
These include provider-imposed restrictions on vCPU, memory, and maximum function runtime [49].

The Foresight cloud-based forecasting system [12] achieves real-time forecasting over urban traffic data, and effectively leverages DUE and vehicle-level flow data to improve predictive performance. In this work, we extend Foresight with several novel enhancements; we term the improved system Foresight Plus.

First, we present an approach for extending the forecasting scale in Foresight Plus beyond the ‘1 hour ahead’ horizons typically seen in this domain. Our extended solution provides efficient inference while enabling forecasting many hours into the future, with little to no degradation in the quality of predictions. Further, we design a fully serverless inference solution for traffic forecasting. Foresight Plus is more cost-effective than a provisioned inference solution for many workloads, and seamlessly handles requests over multiple predictive models with an attractive cost-to-performance ratio.

The contributions of this work are as follows:

  • We present Foresight Plus, a cloud-based real-time traffic forecasting system which extends the forecasting scales catered for in Foresight, efficiently enabling predictions further into the future.

  • We design a fully serverless inference solution which handles sporadic inference workloads. We also present a cost model for serverless forecasting, and consider the implications of several design choices in this context.

  • We observe that GNN forecasting models are robust to extensions of the forecasting scale up to 48 hours ahead, and can achieve improved performance as the scale grows.

  • We identify fully serverless inference as a cost-effective and efficient solution for sporadic inference workloads. We study the scalability, cost and performance characteristics of serverless offerings to optimize resource configurations.

The rest of the paper is organized as follows. Section 2 presents related work. Section 3 formalizes the traffic forecasting problem, and describes its real-time extensions. Section 4 illustrates Dynamic Urban Events, before Section 5 describes the Flow-based GNN Adjacency Matrix. Section 6 presents our approach for extending the forecasting scale. Section 7 then describes our fully serverless inference solution for sporadic workloads. The Foresight Plus system architecture is illustrated in Section 8. Section 9 covers our experimental analysis, both of Foresight and Foresight Plus. Finally, Section 10 concludes the paper.

2 Related work

Identifying the future state of a system via forecasting has been applied in a wide range of disciplines including economics [31], energy and environmental studies [3, 4, 23, 41], epidemiology [22, 53], crowd flow prediction [26] and transport management [13, 16, 47, 59, 65].

Various systems and models have been utilized to achieve valuable predictive outcomes such as accurate traffic predictions on the road network. The Autoregressive Integrated Moving Average (ARIMA) and its variations have been consistently popular time-series models [2, 17, 32]. Machine Learning (ML) approaches have also been applied, with the Support Vector Machine (SVM) [63, 71], XGBoost [15] and the Random Forest [5, 43, 63] being the most commonly used. Deep Learning (DL) solutions based on Artificial Neural Networks have been increasingly utilized due to their improved forecasting accuracy and the ability to account for non-linear dependencies [25, 33, 38, 54, 55]. Long Short-Term Memory (LSTM) and Feed Forward Neural Networks (FFNN) are among the popular models applied to forecast traffic flows [20, 37, 39, 56], with several hybrid approaches also investigated [57, 70]. Finally, Graph Neural Networks (GNNs), which can capture the spatial dependencies between the traffic monitoring sensors by representing the road network as a graph structure, have further improved prediction accuracy. Hence, multiple GNN applications for traffic flow forecasting have been presented in recent years [9, 19, 27-30, 34-36, 51, 62, 68].

2.1 Dynamic urban events

Urban events such as roadworks have been demonstrated to significantly impact traffic flow [7, 48]. Hence, the incorporation of auxiliary information about such events can further improve traffic forecasting performance. For example, roadwork and accident information has been utilized in traffic simulation systems and ML models [1, 8, 36]. A combination of roadworks and weather conditions has been added to a bi-directional LSTM Autoencoder for short-term traffic prediction [18].

2.2 GNN adjacency matrix

In GNN models, the underlying graph structures are usually represented with an adjacency matrix which captures the spatial relationships between the nodes of a graph [19]. Although GNN adjacency matrices are typically binary [21], multiple variations have been proposed [28]. For example, a real-valued distance-based adjacency matrix is a common alternative for representing the spatial dependencies between nodes, and has been applied in numerous traffic forecasting studies with GNNs [10, 45, 52, 60, 69]. The travel time between nodes has also been considered as an alternative to distance-based metrics [61]. More recently, dynamic matrices which capture changes in the spatial dependencies of the graph have been introduced [14]. Coarse origin-destination (OD) data has also been applied as a substitute for a distance-based adjacency matrix [64]. Our work instead leverages vehicle-level flow information, obtained from traffic cameras in the West Midlands, to realistically model the propagation of traffic through the network.

2.3 Forecasting systems

In addition to statistical and ML/DL modelling approaches, forecasting systems have also been developed as specialized tools for time series prediction and road network management. For example, the AutoAI for Time Series Forecasting (AUTOAI-TS) [50] automates the selection, training and optimization of forecasting models for a given dataset. DeepTRANS [58] combines the DeepTTE system [40] with DCRNN [34] for bus travel time estimation. The system uses archive information about bus and traffic flow from sensor data, and DCRNN is used to estimate traffic speed at buses’ locations. The TrafficStream forecasting system leverages GNNs and Continual Learning (CL) [11]. It constructs a sub-graph to capture network expansion, and constraints are applied on the current training model to integrate information from historical data. In contrast, Foresight Plus focuses on the design and evaluation of practical pipelines to satisfy real-time forecasting requirements, supporting extended predictive scales and using a fully serverless inference architecture.

3 Real-time spatio-temporal forecasting

In this section, we first describe the traffic forecasting problem, before introducing its real-time variant. A key requirement of this procedure is that the aggregation, pre-processing and inference of the traffic data must take place within a certain time period. These practical aspects of forecasting have attracted relatively little attention in the large body of research on the topic.

3.1 Traffic forecasting problem

We first present the definition of the traffic forecasting problem, where the goal is to predict the future state of the road network, given a sequence of previously observed time series readings. Traffic information is typically obtained from roadside sensors, which can capture features such as traffic counts or average speed, to form a (multivariate) time series. Given a set of sensors S, we denote the traffic information observed across all sensors as \(\varvec{X} \in \mathbb {R}^{H \times |S| \times P}\), where H is the total number of historical traffic readings, and P is the number of predictive features used. Let \(\varvec{X}^{(t)} \in \mathbb {R}^{|S| \times P}\) denote the traffic signal observed at time t, and \(\varvec{Y}^{(t')} \in \mathbb {R}^{|S| \times Q}\) denote the traffic signal to be predicted at time \(t'\). Note that the number of target features Q may be different to P. We aim to learn a function \(f(\cdot )\) which maps from \(T'\) historical traffic signals to T future traffic signals:

$$\begin{aligned}{}[\varvec{X}^{(t - T' + 1)} , \dots , \varvec{X}^{(t)}] \xrightarrow {f(\cdot )} [\varvec{Y}^{(t+1)}, \dots , \varvec{Y}^{(t+T)}] \end{aligned}$$
(1)
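The mapping in (1) can be made concrete with a minimal shape-level sketch. The dimensions below and the naive persistence rule are purely illustrative placeholders for a trained model such as a GNN; they are not the Foresight Plus configuration.

```python
import numpy as np

# Hypothetical dimensions: T' = 12 historical signals, |S| = 50 sensors,
# P = 2 input features, T = 12 forecast signals, Q = 1 target feature.
T_hist, T_fut, n_sensors, P, Q = 12, 12, 50, 2, 1

def f(history: np.ndarray) -> np.ndarray:
    """Placeholder for the learned function f(.) mapping T' signals to T
    signals. A real model would replace this persistence rule, which
    simply repeats the last observed value of the first feature."""
    assert history.shape == (T_hist, n_sensors, P)
    last = history[-1, :, :Q]                          # shape (|S|, Q)
    return np.repeat(last[None, :, :], T_fut, axis=0)  # shape (T, |S|, Q)

history = np.random.rand(T_hist, n_sensors, P)
forecast = f(history)
assert forecast.shape == (T_fut, n_sensors, Q)
```

The key point is the interface: whatever model implements \(f(\cdot)\), it consumes a \(T' \times |S| \times P\) tensor and emits a \(T \times |S| \times Q\) tensor.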

3.2 Real-time forecasting

The real-time variant of the traffic forecasting problem adds the constraint that all processing takes place within a specified duration following the end of each time bin. Performing real-time forecasting, particularly with DL models, has been identified as a significant challenge [9]. In Foresight Plus, anonymized streaming traffic data is collected at road cameras and ingested into the platform via an API endpoint. Further details of this procedure are illustrated in Section 8.1.

The real-time forecasting routine begins at the end of each time bin, each of which is B minutes long. First, the raw vehicle-level data (held in cloud storage) is aggregated for the most recent time bin (i.e., the B minutes from \(\varvec{X}^{(t-1)}\) to \(\varvec{X}^{(t)}\)). This aggregation entails iterating over all individual vehicle captures that arrived during the time bin, and summing the counts for each vehicle class (e.g., petrol car, HGV), camera and lane. Note that while the vehicle class information is not required for our forecasts, it can be utilized for other data analysis tasks. We denote the time taken for this aggregation as \(T_{Agg}\). Next, the aggregated data is pre-processed so that it is appropriately formatted for model inference. This includes fetching and processing the aggregated traffic count information for the last \(T'\) time bins, as well as retrieving any additional model-specific data used for inference (e.g., roadwork time series, adjacency matrix). The time taken for this phase is referred to as \(T_{PreProc}\). Once the required data have been produced, the inference API endpoint is invoked to perform the forecast. The time taken for inference processing to occur, as in (1), is denoted by \(T_{Inf}\).
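The \(T_{Agg}\) phase described above amounts to a grouped count over the bin's raw captures. The following sketch illustrates this; the field names are illustrative, not the actual Foresight schema.

```python
from collections import Counter

def aggregate_time_bin(captures):
    """Sum the vehicle captures of one B-minute bin into counts keyed by
    (camera, lane, vehicle_class). `captures` is any iterable of dicts
    holding one record per observed vehicle."""
    counts = Counter()
    for c in captures:
        counts[(c["camera"], c["lane"], c["vehicle_class"])] += 1
    return counts

captures = [
    {"camera": "cam_01", "lane": 1, "vehicle_class": "petrol_car"},
    {"camera": "cam_01", "lane": 1, "vehicle_class": "petrol_car"},
    {"camera": "cam_01", "lane": 2, "vehicle_class": "HGV"},
]
counts = aggregate_time_bin(captures)
# counts[("cam_01", 1, "petrol_car")] == 2
```

In the deployed system this aggregation is executed as serverless SQL over columnar storage (Section 8.1) rather than in application code.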

We require the following expression to be satisfied for a system to be capable of real-time forecasting:

$$\begin{aligned} T_{Total} = T_{Agg} + T_{PreProc} + T_{Inf} \le B \end{aligned}$$
(2)

A value of \(T_{Total} \le B\) ensures that the shortest forecasting horizon still pertains to information that is yet to be aggregated in the system, and is therefore relevant to network managers.

4 Dynamic urban events

DUE data, in our use case, is any information beyond the metrics gathered by the network's own sensors which may affect the performance of a traffic prediction solution. Many types of such data influence real-world outcomes, and hence the reliability of the predictive model: scheduled or unscheduled roadworks can slow or impede traffic flows, as can traffic accidents; events at venues linked to the network can change traffic levels and flows; weather conditions affect driving speeds and the likelihood of accidents; delays, cancellations or industrial action on other modes of transport can substantially alter traffic volumes on the road network; and school and public holidays are clearly impactful. Foresight is able to leverage DUE data to improve the accuracy of its forecasts. We use roadworks data as an illustrative example, but other such information (e.g., social event data) could readily be applied in a similar fashion. In the context of traffic forecasting, planned and unplanned roadworks frequently influence the volume and nature of traffic propagation through the road network [7, 48], so incorporating roadwork schedules into predictive models is intuitively helpful for accurate predictions. Foresight automatically ingests DUE data and processes it into a format which forecasting models can easily exploit.

Roadworks data is ingested into Foresight via the Street Manager API, which is invoked to receive a feed of planned roadwork events. We denote the set of all roadworks listed by a given API call as R. For each roadwork \(r \in R\), we obtain its latitude/longitude, as well as its start and end dates \(T_s\) and \(T_e\). In order to associate the live roadworks on a given day T with the road sensor network S, we first select only those roadworks where \(T_s \le T \le T_e\). Next, we calculate the road network distance (using an indicative driving speed over a shortest path calculation on the road network) between each \(r \in R\) and each \(s \in S\). These distances populate an \(|R| \times |S|\) matrix \(\varvec{W}\), with each entry (i, j) denoting the road network distance from live roadwork i to traffic sensor j in the network.

To incorporate this roadwork-to-camera influence information into the forecasting models, we convert \(\varvec{W}\) into a time series format at the same temporal granularity as the observed traffic data. This has been shown to be an effective method for adding roadwork data to forecasting models [36]. We define this as a new feature set \(\varvec{\hat{X}} \in \mathbb {R}^{H \times |S|}\). Each entry \(\varvec{\hat{x_i}} \in \varvec{\hat{X}}^{(t)}\) has a value between 0 and 1 which denotes the strength of the influence of the nearest active roadwork to sensor i at time t. We consider two approaches to approximate this influence. The first is a binary thresholding approach, where entries are activated if there is a roadwork within threshold distance d metres of the sensor. The second method involves first calculating the distance from each sensor to its nearest live roadwork, before normalizing these distances into [0, 1]. We perform this normalization using a thresholded Gaussian kernel, with threshold k.

Combining \(\varvec{X}\) and \(\varvec{\hat{X}}\), a new matrix \(\varvec{\tilde{X}} = \begin{bmatrix} \varvec{X}\\ \varvec{\hat{X}} \end{bmatrix}\) is constructed, which forms the new feature input passed to the forecasting models. We evaluate these approaches within the context of a GNN model in Section 9.
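The two influence approximations described above can be sketched as follows. The parameter names and default values (distance threshold `d`, kernel threshold `k`, bandwidth `sigma`) are illustrative assumptions, not the settings used in Foresight Plus.

```python
import numpy as np

def roadwork_influence(dist_to_nearest, d=500.0, k=0.1, sigma=None,
                       mode="gaussian"):
    """Per-sensor roadwork influence in [0, 1], given the distance (in
    metres) from each sensor to its nearest live roadwork.
    - "binary": 1 if a roadwork lies within d metres, else 0.
    - "gaussian": thresholded Gaussian kernel; weights below k are zeroed.
    """
    dist = np.asarray(dist_to_nearest, dtype=float)
    if mode == "binary":
        return (dist <= d).astype(float)
    # Default bandwidth: standard deviation of the distances (guarded
    # against a zero spread); this choice is an assumption for the sketch.
    sigma = (np.std(dist) or 1.0) if sigma is None else sigma
    w = np.exp(-(dist ** 2) / (sigma ** 2))
    w[w < k] = 0.0
    return w

dist = np.array([100.0, 800.0, 3000.0])
print(roadwork_influence(dist, mode="binary"))  # [1. 0. 0.]
```

One such vector is produced per time bin t, and the vectors are stacked over time to form \(\varvec{\hat{X}}\).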

5 Flow aggregated adjacency matrix

Graph Neural Networks (GNNs) are widely used in state-of-the-art forecasting models [10, 14, 27, 30, 34, 35, 45, 51, 52, 60, 62, 69]. These methods typically represent the traffic sensor network as a graph structure, whose adjacency matrix aims to capture spatial relationships between the sensors. The objective of GNN message passing and node aggregation approaches in the context of traffic forecasting, such as diffusion convolution [34], is to simulate traffic propagation in the network. This method of extracting features is typically embedded into a wider learning structure so that temporal features can be learnt along with spatial features in an integrated fashion.

The graph structure which models the traffic sensor network is described by an \({|S|} \times {|S|}\) (weighted) adjacency matrix. The value at position (i, j) approximates the strength of the relationship between sensor \(s_i\) and sensor \(s_j\). A popular method to assign weights in the adjacency matrix is to calculate pairwise sensor distances measured in the road network [34, 60, 62].

The aim of our approach is to more realistically reflect the actual flow of traffic in the network, compared to coarse sensor separation measures such as Euclidean distance. Simple distance-based measures alone are insufficient, as sensor separation per se does not necessarily indicate traffic flow levels. Even when two sensors are spatially co-located, traffic might rarely pass between them consecutively, or may flow in one direction significantly more than the other; these properties cannot be easily captured by a distance-based approach.

We therefore develop a method for computing the adjacency matrix weights which uses vehicle-level flow data to more accurately determine the relationships between sensors. By leveraging the properties of granular ANPR (Automatic Number Plate Recognition) data, our method can capture, in order, the sequence of sensors which (anonymized) vehicles pass as they traverse the road network. By aggregating this information, we are able to determine actual flows within the network. The new adjacency matrix retains the same dimensions used in most GNN methods for spatio-temporal forecasting, so it is directly applicable within these models.

The Flow Aggregated Adjacency Matrix (FAAM), denoted as \(\varvec{F} \in \mathbb {R}^{\left| S\right| \times \left| S\right| }\), is constructed by aggregating observed flow between cameras within a given time frame. One unit of flow is recorded between sensor \(s_i \in S\) and \(s_j \in S\) when a vehicle is observed at \(s_i\) at time t, and is then next observed at \(s_j\) no later than \(t + \tau \), where \(\tau \) is a parameter given in seconds which denotes the acceptable transition period. Note that we operate on a network of urban roadside sensors, rather than on the underlying road network. Therefore, one cannot claim that the vehicle has not traversed any other roads between \(s_i\) and \(s_j\), but rather that it has not passed any other sensors. Indeed, if the vehicle had passed sensor \(s_k\) between \(s_i\) and \(s_j\), we would record \(s_i \rightarrow s_k\), and then \(s_k \rightarrow s_j\) (as two separate units of flow). This is conceptually different to an origin-destination (OD) approach. To construct \(\varvec{F}\), each entry \(\varvec{F_{i,j}}\) is incremented by 1 for each observed unit of flow. \(\varvec{F_{i,j}}\) is then averaged over all the time periods during which flow was observed, before being normalized into [0, 1]. Each entry \(\varvec{F_{i,j}}\) thus approximates the likelihood of a vehicle transitioning directly from \(s_i\) to \(s_j\) within transition period \(\tau \). The matrix can be periodically updated to reflect changes in the network over time, such as seasonality. We note that a more granular time scale would be possible in this formulation, e.g., to capture shifting traffic patterns throughout the day, but we leave this to be explored in future work.
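The FAAM construction can be sketched as below. The trajectory representation and the simple max-scaling normalization are simplifying assumptions for illustration; the paper additionally averages counts over the observed time periods before normalizing.

```python
import numpy as np

def build_faam(trajectories, sensor_ids, tau=600.0):
    """Sketch of the Flow Aggregated Adjacency Matrix. Each trajectory is
    a time-ordered list of (sensor_id, timestamp) pairs for one anonymized
    vehicle. One unit of flow s_i -> s_j is recorded when consecutive
    observations are at most tau seconds apart."""
    idx = {s: i for i, s in enumerate(sensor_ids)}
    F = np.zeros((len(sensor_ids), len(sensor_ids)))
    for traj in trajectories:
        for (s_i, t_i), (s_j, t_j) in zip(traj, traj[1:]):
            if t_j - t_i <= tau:
                F[idx[s_i], idx[s_j]] += 1
    if F.max() > 0:
        F /= F.max()  # simple normalization into [0, 1]
    return F

trajs = [
    [("A", 0), ("B", 120), ("C", 900)],  # A->B counted; B->C exceeds tau
    [("A", 0), ("B", 300)],
]
F = build_faam(trajs, ["A", "B", "C"], tau=600.0)
# F[0, 1] == 1.0 (A->B observed twice); F[1, 2] == 0.0 (transition too slow)
```

Note the asymmetry of \(\varvec{F}\): flow from \(s_i\) to \(s_j\) is recorded separately from flow in the opposite direction, which a symmetric distance-based matrix cannot express.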

We select a value for \(\tau \) based on a review of typical elapsed travel times between the most separated sensors in terms of road network distance, with some overhead added to capture outliers. Setting \(\tau \) on a global basis was done for simplicity in this case, but there are alternative approaches which could be explored. A \(\tau \) value could be allocated per sensor pair, based on the road distance between them, or on previous vehicle flow times across them. The latter would take account of factors other than distance which may impact traversal time, such as the presence of service/refuelling stations. The \(\tau \) value could even be learned using a separate ML model. Mechanisms for obtaining optimal \(\tau \) settings would be an interesting area for further study.

6 Foresight plus: extending the forecasting scale

The majority of DL-based traffic forecasting models are evaluated on a small number of benchmark datasets such as METR-LA [24], and seek to forecast a maximum of 1 hour ahead [9, 27, 30, 34, 36, 51, 68]. While this is useful in short-term forecasting applications, it would be advantageous to extend the forecasting scale further into the future. The ability to forecast traffic patterns multiple hours ahead would provide more time for transport network managers to perform interventions such as traffic re-routing.

Whilst the short-term (up to 1 hour) prediction problem is now well-studied and recent solutions based on GNNs and Transformers continue to set benchmarks, there is a research gap around models capable of longer-term predictions [72]. Where existing solutions have been applied to this problem, the standard approach has been to produce short-term predictions and to use these as ‘ground truth’ to create the next set of short-term predictions, repeatedly. This presents two main challenges. First, errors are propagated forward and accumulated at each phase. Second, computational complexity (and cost) becomes an issue. It has been estimated that following the above approach, using 7 days of historical data and predicting 12 hours ahead, the training time for DCRNN (for METR-LA on a single NVIDIA TITAN RTX 16 GB GPU) would be over 7,000 hours [72]. Clearly a more efficient approach is required, and new studies which predict further into the future without using predictions as ground truth are now appearing [66].

We develop a lightweight approach using the aggregation of multiple historical timesteps to enable longer-term forward predictions in a single pass. We employ a GNN forecasting model but do not rely on the use of predictions as ground truth. The approach performs well, as confirmed by our experimental analysis. In light of this, we extend the architecture of the existing Foresight system to offer multiple forecasting scales, henceforth denoted k. In particular, we cater for traffic forecasts multiple hours into the future (\(k \in \{1, 3, 6, 12, 24, 36, 48\}\) hours ahead at present). The user is able to dynamically select from the supported scales at inference time with no additional provisioning time/costs. Foresight Plus forecasts further into the future than several previous approaches [6, 67], which have considered forecasts up to 4 and 10 hours ahead, respectively.

Whilst we acknowledge that (absolute) predictive errors will naturally increase as forecasts extend further into the future, we show that this happens in a stable manner. We present an efficient method to enable DL-based models to forecast at multiple scales. For the original 1 hour ahead forecasts generated by Foresight, the underlying data was aggregated into B-minute bins, as discussed in Section 3. In Foresight Plus, we introduce an efficient aggregation and upscaling pipeline. This procedure is abstracted away from the forecasting models, which still execute the same prediction function \(f(\cdot )\) illustrated in Section 3.1; \(T'\) historical traffic signals are used to forecast T future signals.

Recall that for a set of sensors S, the entire historical traffic data is denoted as \(\varvec{X}\) \(\in \) \(\mathbb {R}^{H \times |S| \times P}\), where H is the total number of historical B-minute bins. To upscale the traffic data to a new scale k, we sum each group of k rows of \(\varvec{X}\). This results in a new historical time series \(\varvec{X_k} \in \mathbb {R}^{\frac{H}{k} \times |S| \times P}\). We then train bespoke forecasting models on each of the upscaled datasets (requiring only minimal adjustments to the models themselves). To process an inference query at scale k, data from the appropriate (\(k \times T'\)) B-minute bins are first collected. These can be efficiently upscaled as described above, to produce inference input data \(\varvec{X_k'} \in \mathbb {R}^{T' \times |S| \times P}\). \(\varvec{X_k'}\) is then sent to the appropriate inference endpoint as input; further details of the Foresight Plus inference workflow are given in Section 7. We evaluate the effectiveness of this approach in Section 9.4.
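The upscaling step, summing each group of k rows of \(\varvec{X}\), reduces to a reshape-and-sum. A minimal sketch (assuming, for simplicity, that H is divisible by k; a trailing partial bin would be dropped in practice):

```python
import numpy as np

def upscale(X, k):
    """Aggregate B-minute traffic data to scale k by summing each group
    of k consecutive rows, yielding X_k of shape (H // k, |S|, P)."""
    H, S, P = X.shape
    H_k = H // k
    # Group rows into (H_k, k, |S|, P) blocks, then sum within each block.
    return X[:H_k * k].reshape(H_k, k, S, P).sum(axis=1)

X = np.arange(24, dtype=float).reshape(6, 2, 2)  # H=6, |S|=2, P=2
X3 = upscale(X, 3)
assert X3.shape == (2, 2, 2)
assert X3[0, 0, 0] == X[0, 0, 0] + X[1, 0, 0] + X[2, 0, 0]
```

The same routine serves both training (upscaling the full history) and inference (upscaling the latest \(k \times T'\) bins into \(T'\) input signals).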

7 Foresight plus: cost-effective serverless inference for sporadic workloads

The original Foresight architecture, which was initially designed with short-term forecasting in mind, provides a pipelined MLOps solution for regularly scheduled predictions. This satisfies the operational requirements of Transport for the West Midlands (TfWM), and integrates with existing traffic reporting systems. However, spatio-temporal forecasting requests that are spread over multiple scales may be more ad-hoc in nature (for example, in response to unusual traffic/cultural events). In scenarios such as this, inference requests are often triggered manually (e.g., via mobile apps) and arrive in a sporadic fashion. While provisioned inference solutions are well-suited to handling frequent and regularly scheduled predictions, they may not be cost-effective for sporadic workloads due to low utilization. Further, several endpoint instances may be required to handle bursty traffic and/or multiple models (for different forecasting scales).

With this in mind, we extend Foresight’s MLOps suite in Foresight Plus by adding a fully serverless inference solution across multiple models/scales. Upon the receipt of an inference request at a given scale (as described in Section 6), a lightweight serverless instance (i.e., an AWS Lambda function invocation) performs the necessary data extraction, upscaling and pre-processing, before invoking a further serverless inference endpoint. We maintain a unique serverless inference endpoint for each forecasting scale. These incur no cost when not in use (as is also the case with the aforementioned serverless pre-processing instance), and can rapidly scale to accommodate parallel requests. A serverless inference solution can be significantly more cost-effective than a provisioned alternative for many workloads [44].

7.1 Serverless inference cost model

We now formalize the cost model for fully serverless ML inference. This procedure consists of a pre-processing phase, followed by the invocation of a serverless inference endpoint:

$$\begin{aligned} C_{Total} = C_{PreProc} + C_{Inf} \end{aligned}$$
(3)

Both stages run on lightweight AWS Lambda Function-as-a-Service (FaaS) instances.

We first consider the detailed cost model for the pre-processing phase. \(C_{PreProc}\) consists of the expenses incurred by running the FaaS instance, as well as those of the corresponding requests to object storage to fetch the necessary data. It is defined as follows:

$$\begin{aligned} C_{PreProc} = C_{\lambda (PreProc)} + C_{S3(PreProc)}\end{aligned}$$
(4)
$$\begin{aligned} C_{\lambda (PreProc)} = C_{\lambda (Inv)} + T_{PreProc}M_{PreProc}C_{\lambda (Run)}\end{aligned}$$
(5)
$$\begin{aligned} C_{S3(PreProc)} = LC_{S3(List)} + GC_{S3(Get)} \end{aligned}$$
(6)

where \(C_{\lambda (Inv)}\) is the cost of invoking a single FaaS instance, \(T_{PreProc}\) is the duration of pre-processing (i.e., the runtime of the FaaS invocation), \(M_{PreProc}\) is the memory assigned to the instance (in MB), and \(C_{\lambda (Run)}\) is the cost per MB-second of FaaS runtime. Note that increasing the amount of AWS Lambda memory leads to a larger vCPU allocation, introducing a cost-to-performance trade-off which we examine in Sections 9.4.2 and 9.4.3. L and G correspond to the number of required object storage LIST and GET operations, respectively, with \(C_{S3(List)}\) and \(C_{S3(Get)}\) representing their costs. In Foresight Plus, we leverage the metadata provided in object storage solutions to filter out redundant files, hence minimizing the number of LIST and GET requests.

Next, we define the cost model for inference \(C_{Inf}\) as follows:

$$\begin{aligned} C_{Inf} = T_{Inf}M_{Inf}C_{ServInf} + YC_{Byte(In)} + ZC_{Byte(Out)} \end{aligned}$$
(7)

As above, \(T_{Inf}\) and \(M_{Inf}\) reflect the serverless inference instance runtime and memory allocation, respectively. The running cost of the AWS Lambda worker used for SageMaker inference, \(C_{ServInf}\), is currently \(\sim 19\%\) more expensive per MB-second than a regular FaaS instance. The final two terms represent the costs of data transfer in/out of the serverless inference instance. As described in Section 3, the forecast output has dimensionality \(T \times |S|\), so is the same for all scales.
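Equations (3)-(7) can be evaluated directly. In the sketch below, all prices are placeholder values in the rough vicinity of published AWS rates, not authoritative figures; current provider pricing should be consulted before using such a model for budgeting.

```python
def serverless_inference_cost(
    t_preproc, m_preproc,        # T_PreProc (s), M_PreProc (MB)
    t_inf, m_inf,                # T_Inf (s), M_Inf (MB)
    n_list, n_get,               # L and G object storage operations
    bytes_in, bytes_out,         # Y and Z transfer volumes
    c_inv=2e-7,                  # placeholder cost per FaaS invocation
    c_run=1.6e-8,                # placeholder cost per MB-second of runtime
    c_serv_inf=None,             # serverless inference rate, ~19% above c_run
    c_list=5e-6, c_get=4e-7,     # placeholder S3 LIST / GET request costs
    c_byte_in=0.0, c_byte_out=0.0,
):
    """Total cost of one fully serverless forecast, per equations (3)-(7)."""
    if c_serv_inf is None:
        c_serv_inf = 1.19 * c_run
    c_lambda_preproc = c_inv + t_preproc * m_preproc * c_run      # eq. (5)
    c_s3_preproc = n_list * c_list + n_get * c_get                # eq. (6)
    c_preproc = c_lambda_preproc + c_s3_preproc                   # eq. (4)
    c_inf = (t_inf * m_inf * c_serv_inf
             + bytes_in * c_byte_in + bytes_out * c_byte_out)     # eq. (7)
    return c_preproc + c_inf                                      # eq. (3)

cost = serverless_inference_cost(t_preproc=2.0, m_preproc=1024,
                                 t_inf=1.5, m_inf=2048,
                                 n_list=2, n_get=10,
                                 bytes_in=1e5, bytes_out=1e4)
```

Because every term scales with actual usage, \(C_{Total}\) is incurred per request, which is precisely why the model favours sporadic workloads over an always-on endpoint.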

The standard alternative to a serverless inference solution is to permanently provision one or more inference endpoints (hosted on VM instances). Such a solution can incur high passive running costs over time, while achieving poor resource utilization under sporadic workloads. We evaluate the cost savings of our fully serverless inference solution in Section 9.4.

7.2 Enhanced prediction flexibility

The Foresight system was able to produce pre-defined outputs at regularly scheduled intervals. These outputs can be used to implement timely interventions when unusual network conditions are likely. Foresight Plus, on the other hand, provides a more ad-hoc predictive capability which, apart from offering multiple extended forecasting scales, can address different use-cases. If, for example, a road traffic accident occurs in the network, a network manager may wish to envisage the likely impact on traffic flows a few hours ahead. Mobile applications could trigger such inference requests off-site. Roadwork operators may wish to submit ‘what-if’ inference requests to determine optimal periods (in terms of reducing disruption) for such works to take place. Foresight Plus addresses these requirements, supplementing the regularly scheduled inference outputs of the Foresight system.

7.3 Serverless architecture

The ad-hoc nature of the typical Foresight Plus use-case is an ideal fit for a cloud-based serverless architecture. Serverless computing is a resource delivery model in which the cloud provider is responsible for the provision and management of the underlying infrastructure and services. The attractive properties of serverless computing include elasticity, high availability, and cost-effectiveness with granular billing. In particular, users are only charged while resources are being used. In recent years, serverless computing has been successfully applied to machine learning (ML) inference, in situations where ML models are suitably sized and have achievable latency/throughput requirements on such platforms. Whilst the Foresight system uses pre-provisioned AWS SageMaker endpoints (servers) which incur continuous costs, Foresight Plus benefits from the serverless inference approach using lightweight Function-as-a-Service (FaaS) compute instances (including via AWS SageMaker Serverless Inference). Although the serverless architecture brings certain challenges, including restricted compute/memory capacities and (in some situations) a short ‘cold-start’ delay when processing requests [49], it is well-suited to the Foresight Plus usage scenario.

8 Foresight Plus system architecture

In this section we present an overview of the Foresight Plus system architecture, as illustrated in Fig. 1. Foresight Plus was built in collaboration with our partners at TfWM, and extended their existing AWS platform. With this in mind, we chose to use AWS services throughout the architecture. It should be stressed that our methods could be readily implemented on any comparable cloud platform. We will first describe how streaming traffic data is ingested and aggregated, before presenting the MLOps suite and fully serverless inference procedure. Details of how DUE data and flow information are processed are given in Sections 4 and 5 respectively. See Sections 6 and 7 for discussion of our extended forecasting scales and serverless inference, respectively.

8.1 Streaming data ingestion, aggregation and storage

Figure 2 illustrates the data ingestion, aggregation and storage pipeline of Foresight Plus. The primary data source is anonymized ANPR vehicle capture information from the West Midlands road network managed by TfWM. This data enters the system via POST requests to an API endpoint, before being forwarded to a streaming ETL service (AWS Kinesis Data Firehose). Individual vehicle captures (each including a timestamp, a salted hash of the vehicle registration, the camera/lane of observation and the vehicle type) are buffered by this service, and are periodically flushed to object storage (once the buffer fills, or a short time period elapses). The buffered file is also converted to a columnar format (Apache Parquet) for improved query performance.

Fig. 1: High-level architecture of the Foresight Plus cloud-based forecasting system

We next use a serverless data integration offering (AWS Glue) to periodically crawl (i.e., scan) the object storage buckets and catalogue these intermediate files. This enables the use of AWS Athena to run serverless SQL queries over the columnar Parquet data. These queries generate aggregated traffic count data, giving the total number of vehicles of each type (e.g., petrol car, HGV) that have passed each roadside camera within the current time bin, i.e., the last B minutes. We use scheduling functionality in a cloud monitoring service (AWS CloudWatch) to trigger the SQL processing (via lightweight serverless functions) for the current time bin. This procedure writes a single file to object storage (AWS S3) per time bin, which can later be used as an input to ML workflows.
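The binning logic behind these aggregation queries can be sketched in Python (a simplified stand-in for the Athena SQL; the record layout and function names are illustrative, not the production schema):

```python
from collections import Counter
from datetime import datetime

def bin_start(ts, B=15):
    """Floor a timestamp to the start of its B-minute bin."""
    return ts.replace(minute=(ts.minute // B) * B, second=0, microsecond=0)

def aggregate_counts(captures, B=15):
    """captures: iterable of (timestamp, camera_id, vehicle_type) tuples,
    a stand-in for the buffered ANPR records. Returns a dict mapping
    (bin_start, camera_id, vehicle_type) -> vehicle count."""
    counts = Counter()
    for ts, camera, vtype in captures:
        counts[(bin_start(ts, B), camera, vtype)] += 1
    return dict(counts)
```

In the deployed pipeline this grouping is expressed as a SQL `GROUP BY` over the Parquet data rather than in application code.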

8.2 MLOps suite, training and inference

We leverage an AWS SageMaker MLOps pipeline to create and deploy forecasting models. Data scientists can run experiments (e.g., in SageMaker Studio Notebooks) over data held in object storage, using standard libraries such as NumPy, PyTorch and TensorFlow. Once a model has been developed, its source code can be pushed to one of two Git repositories (test, production) hosted in AWS CodeCommit.

Repository updates then trigger the MLOps pipeline to provision a compute instance and perform the necessary pre-processing and training of the model. The MLOps pipeline can be configured to re-train the model periodically, e.g., once per week, to continually incorporate the latest traffic data.

Fig. 2: High-level overview of Foresight Plus streaming data ingestion, aggregation and storage pipeline

The trained model is then deployed to a SageMaker inference endpoint. In Foresight, this used a VM instance; in Foresight Plus, we employ a fully serverless inference architecture. Both inference solutions are illustrated in Fig. 1. In Foresight Plus, bespoke trained models for each forecasting scale (see Section 6) are first produced in the MLOps suite. We then deploy a SageMaker Serverless Inference (AWS Lambda-backed) endpoint for each scale. This entails uploading the trained model artifacts and associated inference code to each Lambda container. Note that configuring these functions incurs no cost; serverless instances are billed only when invoked, and incur no passive costs over time. Our inference routine then proceeds in two stages. End users first submit HTTP inference requests to a pre-processing Lambda function, specifying the desired forecasting scale. This function fetches the required inference input data from object storage (e.g., the last 3 hours of traffic data for \(k = 3\)) and performs the necessary pre-processing. It then forwards the processed inference data to the relevant SageMaker Serverless Inference endpoint. Once inference has been completed, results are returned to the user.
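A minimal sketch of the pre-processing stage follows. Here `make_handler`, `fetch_window` and `invoke_endpoint` are hypothetical names introduced so the routing logic can be exercised outside AWS; they are not the actual Foresight Plus code:

```python
import json

# The seven forecasting scales offered by Foresight Plus (Section 6).
VALID_SCALES = (1, 3, 6, 12, 24, 36, 48)

def make_handler(fetch_window, invoke_endpoint):
    """Builds a Lambda-style handler. fetch_window(k) returns the last k hours
    of aggregated counts from object storage; invoke_endpoint(k, payload)
    calls the per-scale serverless inference endpoint. Both are injected
    dependencies (assumptions), keeping the handler testable locally."""
    def handler(event, context=None):
        k = int(json.loads(event["body"])["scale"])
        if k not in VALID_SCALES:
            return {"statusCode": 400,
                    "body": json.dumps({"error": "unsupported scale"})}
        window = fetch_window(k)               # e.g., last 3 hours for k = 3
        forecast = invoke_endpoint(k, window)  # forward to per-scale endpoint
        return {"statusCode": 200, "body": json.dumps({"forecast": forecast})}
    return handler
```

With stub dependencies, `make_handler(lambda k: [k] * 4, lambda k, w: w)` yields a handler that validates the requested scale and relays the fetched window.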

9 Experimental analysis

9.1 Dataset and experimental setup

In this section we present the results of our experiments which test the effectiveness of popular traffic forecasting methods in a new setting. We then evaluate the impact of incorporating DUE data as an additional dimension to the input feature vector. We also consider the performance impact of using the FAAM, in place of a distance-based adjacency matrix, in a GNN forecasting model. After exploring the error profiles of our models, and their efficiency within Foresight, we evaluate the enhancements made in Foresight Plus. In particular, we assess forecasting performance at longer scales, as well as the cost-effectiveness and performance of our fully serverless solution (compared against non-serverless alternatives).

9.1.1 Road camera dataset

The anonymized and aggregated data used for the experiments is from a set of ANPR cameras in the West Midlands region of the UK, covering several large conurbations including Birmingham and Coventry. The precise locations of cameras remain private. Cameras are located along a variety of route types, including busy interconnections, inner-city streets and suburban/rural roads. This differs from many prior datasets, such as METR-LA [24], where road sensors are typically located on freeways with a high volume of free-flowing traffic. The quality of our data is high: the rate of missingness is only 2.3%, compared with 8.1% for METR-LA. We use linear interpolation to impute these missing values.
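The imputation step can be sketched as a generic linear interpolation over NaN gaps (an illustrative helper, not the exact production routine):

```python
import numpy as np

def impute_linear(series):
    """Linearly interpolate NaN gaps in a 1-D traffic count series."""
    s = np.asarray(series, dtype=float)
    idx = np.arange(s.size)
    missing = np.isnan(s)
    # np.interp fills each missing index from its nearest known neighbours.
    s[missing] = np.interp(idx[missing], idx[~missing], s[~missing])
    return s
```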

9.1.2 Experimental setup

Unless stated otherwise, the vehicle count data used in the following experiments was collected between August 5th and December 5th 2021 (inclusive), and was aggregated at 15-minute intervals. DUE data was collected for the same period. Vehicle-level flow was measured between August and November 2021 to compute the FAAM. Experiments on Foresight Plus are described in Section 9.4; the dataset used for those experiments was collected over a different date range.

The data was split into training, validation and test sets in a 70/10/20 ratio. We evaluate performance using mean absolute error (MAE) and mean absolute percentage error (MAPE). We also calculate the error distribution’s coefficient of variation, which we refer to as the error coefficient of variation (ECV). We refer to the set of absolute errors across all test samples as \(\mathcal {E}\), and hence \(ECV = \frac{\sigma (\mathcal {E})}{\mu (\mathcal {E})}\). The ECV allows us to compare the dispersion of the error terms across different distributions (i.e., the sets of errors made by different models), as it normalizes by the mean error. A high ECV indicates that predictions are inconsistent.
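A direct NumPy implementation of the ECV metric (population standard deviation assumed):

```python
import numpy as np

def ecv(abs_errors):
    """Error coefficient of variation: ECV = sigma(E) / mu(E), where E is the
    set of absolute errors across all test samples. np.std defaults to the
    population standard deviation (ddof=0)."""
    e = np.asarray(abs_errors, dtype=float)
    return float(e.std() / e.mean())
```

Identical errors give ECV 0; the more dispersed the errors relative to their mean, the higher the ECV.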

We evaluate the results firstly over all time periods in the test data, which we refer to as ‘Any Time’ (AT) experiments. We also perform evaluation focusing only on ‘Peak Times’ (PT). We identify peak times as those that have historically shown high average traffic counts, but also high levels of variability. High average traffic counts indicate heavy load on the network, which we assume are periods of interest for transport managers. High levels of variability are a sign of challenging forecasting conditions, and may denote periods of unusual traffic conditions on the network. We identify these periods of interest by first dividing the dataset into weekends and weekdays, and then further splitting each of these into hourly subsets. The mean and coefficient of variation of each subset are then calculated. Any subset with both mean and coefficient of variation in the upper two quartiles is classified as peak time. The only time periods which satisfy this are 7am-8am and 8am-9am on weekdays, hence we select these as our peak times. This selection also conforms closely to the domain knowledge of our partners at TfWM.
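The selection procedure above can be sketched as follows (the dictionary layout is illustrative: subsets are keyed by a weekend/weekday flag and hour, and "upper two quartiles" is taken as at or above the median):

```python
import numpy as np

def peak_subsets(counts):
    """counts: dict mapping (day_type, hour) -> list of historical counts.
    Returns the keys of subsets whose mean AND coefficient of variation both
    lie in the upper two quartiles (>= median) across all subsets."""
    keys = list(counts)
    means = np.array([np.mean(counts[k]) for k in keys])
    cvs = np.array([np.std(counts[k]) / np.mean(counts[k]) for k in keys])
    m_med, c_med = np.median(means), np.median(cvs)
    return [k for k, m, c in zip(keys, means, cvs) if m >= m_med and c >= c_med]
```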

9.2 Forecasting models

As discussed in Section 2, numerous methods have been applied to the traffic prediction problem over many years. Our objective is not to identify the current state of the art in this domain, but rather to select a leading, popular GNN-based solution, and to adapt and enhance its performance using our methods and the real-world data available via our partnership with TfWM. DCRNN [34] is such a solution and is widely referenced in surveys of leading traffic prediction models [25, 29, 55]. As comparative baselines, we select a small number from the plethora of alternative solutions, with representatives from some of the common categories of prediction models. One of our selection criteria was to find solutions which had been implemented on the METR-LA [24]/PEMS-BAY datasets, as these are comparable to the traffic count data available to us for this study.

The following forecasting models have been evaluated on the road camera dataset.

  • Historical Average (HA): We produce a historical average matrix based on the training set. The average reading over the training set is calculated at each sensor in S for each of the 672 (4x24x7) weekly time steps. To perform inference, we give the historical average value of the target time period as our prediction (the notion of \(T'\) historical traffic signals is not applicable to this method).

  • ARIMA: We iterate over all sensors and all test examples. In each iteration, we train an ARIMA model, using the previous \(T' = 100\) values as the training input.

  • Feed Forward Neural Network (FFNN): We implement an FFNN, where the input consists of the previous \(T'\) readings across all sensors \(s \in S\). The model produces predictions for the next T forecasting horizons. The network is constructed with two hidden linear layers, with ReLU activation functions. Model parameters are learned using backpropagation, with an L1 loss function.

  • Long Short Term Memory (LSTM): This is implemented similarly to FFNN, except using LSTM layers in place of linear layers. Within the LSTM layers, input data is treated as a sequence and temporal patterns are learnt using an additional hidden layer to capture the cell state, which passes information along the sequence.

  • Diffusion Convolutional Recurrent Neural Network (DCRNN-Base): We select DCRNN [34] as an illustrative example of an effective GNN method. This method has been previously identified as one of the best-performing (GNN) approaches for traffic forecasting on benchmark datasets [9]. The model utilizes a distance-based adjacency matrix to model the spatial relationships between road sensors, and employs diffusion convolution and bidirectional random walks to simulate traffic propagation in the network. We utilize the PyTorch implementation of DCRNN [34]. Note that we term this DCRNN-Base to avoid confusion with the variants below.

  • DCRNN-RW-T / DCRNN-RW-G: DCRNN with DUE adaptation to include roadwork data. DCRNN-RW-T associates live roadworks with all sensors within a 1000m distance threshold. DCRNN-RW-G uses thresholded Gaussian kernel normalization (threshold \(k = 0.1\)).

  • DCRNN-F: DCRNN with the FAAM representing the underlying graph structure. An acceptable transition period \(\tau \) between sensors is given as 3600 seconds and thresholded Gaussian kernel normalization (\(k = 0.1\)) is applied on the matrix.

  • DCRNN-RW-F: DCRNN with roadworks (using Gaussian kernel method) and FAAM.
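The thresholded Gaussian kernel normalization used by the -G and FAAM variants can be sketched as follows (following the construction popularized by DCRNN; taking sigma as the standard deviation of the distances is an assumption of this sketch):

```python
import numpy as np

def thresholded_gaussian_adjacency(dist, threshold=0.1):
    """Build a weighted adjacency matrix from pairwise distances (or flow
    transition times): W_ij = exp(-(d_ij / sigma)^2), with entries below
    `threshold` pruned to 0. sigma is the std of the distance entries."""
    dist = np.asarray(dist, dtype=float)
    sigma = dist.std()
    w = np.exp(-np.square(dist / sigma))
    w[w < threshold] = 0.0  # sparsify: drop weak inter-sensor relationships
    return w
```

The same kernel can be applied to a FAAM-style matrix by substituting flow-derived transition times for physical distances.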

All models are implemented in AWS SageMaker Studio using Python 3.6, on a ml.g4dn.xlarge instance. We use PyTorch 1.8 to implement FFNN, LSTM, and DCRNN (including all variants). Unless stated otherwise, \(T' = T = 4\), and \(B = 15\) minutes. In practice, we make predictions over horizons of 15, 30, 45 and 60 minutes (henceforth referred to as 15m, 30m, 45m, 60m). Note that we make predictions over all horizons simultaneously, i.e., the model does not gain information about the observed time series at \(t+1\) when predicting for \(t+2\). The more distant forecasting horizons (i.e., 45m, 60m) offer transport managers more time to implement pre-emptive interventions on the road network. Hence, performance gains here are particularly valuable.
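The windowing implied by \(T' = T = 4\) can be sketched as follows (an illustrative helper; all four target horizons are emitted together, so no horizon observes later data):

```python
import numpy as np

def make_windows(series, t_in=4, t_out=4):
    """Build (input, target) pairs from a binned count series: each sample
    uses the last t_in bins as input and the next t_out bins as the target,
    with all t_out horizons predicted simultaneously."""
    X, Y = [], []
    for i in range(len(series) - t_in - t_out + 1):
        X.append(series[i:i + t_in])
        Y.append(series[i + t_in:i + t_in + t_out])
    return np.array(X), np.array(Y)
```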

Table 1 Table of results for all time periods (AT)
Table 2 Table of results for peak time (PT) periods only

9.3 Experimental results: Foresight

We describe the key findings from our experimental results, which are presented in Tables 1 (AT) and 2 (PT). First, we compare the performance of several existing forecasting approaches in our new data setting. We then consider the impact of incorporating roadworks as an exogenous input feature, as well as using flow to determine edge weights in the adjacency matrix. Next, we discuss our findings pertaining to prediction reliability using ECV. Finally, we analyze the efficiency of Foresight.

9.3.1 Analysis of existing approaches

We first consider the performance of several existing spatio-temporal forecasting approaches in our new data setting. During the AT experiments it can be observed that DCRNN-Base makes more accurate predictions across all four time horizons (MAE improvements - 15m: 20.9%, 30m: 20.6%, 45m: 25.3%, 60m: 22.7%) compared to the next closest non-DCRNN model (LSTM), with the largest improvements seen at the longest forecasting horizons. Similarly, during the PT experiments DCRNN-Base remains the most accurate model compared to the non-DCRNN options. However, it is interesting to note that the improvements compared to LSTM are now much smaller (MAE improvements - 15m: 18.5%, 30m: 15.1%, 45m: 14.2%, 60m: 8.3%), and the trend at longer horizons is reversed, with the smallest MAE improvements seen there. ARIMA tends to be a competitive model for shorter horizons, during both PT and AT experiments; however, its performance deteriorates quickly at longer forecasting horizons, which indicates that this model requires fresh data to support accurate predictions. HA and FFNN make the least accurate predictions across all forecasting horizons.

Different trends emerge when MAPE performance is considered. DCRNN-Base now exhibits poorer performance than LSTM across all forecasting horizons during the AT experiments (MAPE degradation - 15m: 22%, 30m: 25.8%, 45m: 21.3%, 60m: 11.3%); these discrepancies are further exacerbated in PT experiments (MAPE degradation - 15m: 37.7%, 30m: 36.6%, 45m: 41.2%, 60m: 18.3%). This is an interesting result as it suggests that while LSTM makes poorer predictions on average (i.e., MAE), it makes fewer errors of significant magnitude, leading to a lower MAPE (this metric is highly sensitive to outliers in the error term). It may therefore be inferred that LSTM is better than DCRNN-Base at predicting unusual traffic patterns, especially at peak times. In terms of MAPE, ARIMA was shown to be a highly competitive model across all forecasting horizons, outperforming DCRNN-Base in most cases, with more pronounced gains in PT experiments. As ARIMA is retrained on the most recent data when evaluating each test sample, it will naturally be more responsive to unusual traffic patterns than models trained using a conventional train/test split. LSTM still largely outperforms ARIMA with regard to MAPE. HA performs particularly poorly on this metric, due to its inability to dynamically respond to current network conditions.
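A toy example illustrates how a model can achieve a lower MAE yet a much higher MAPE; the counts and predictions below are fabricated purely for illustration:

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(yhat, float))))

def mape(y, yhat):
    """Mean absolute percentage error (%): sensitive to large relative errors."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs((y - yhat) / y)) * 100)

y = [100.0, 100.0, 10.0, 10.0]     # two busy bins, two quiet bins
model_a = [95.0, 95.0, 9.5, 9.5]   # errors spread proportionally
model_b = [100.0, 100.0, 5.0, 5.0] # errors concentrated on quiet bins
```

Model B has the lower MAE (2.5 vs 2.75), but its MAPE is five times higher (25% vs 5%), mirroring the LSTM vs DCRNN-Base contrast above.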

9.3.2 DUE analysis

We also evaluate the impact of adding dynamic urban events to GNN models. Mixed results are achieved when MAE is considered. DCRNN-RW-G, which associates roadworks using a thresholded Gaussian kernel, generally yields higher MAE than DCRNN-Base across both AT and PT experiments. These discrepancies in MAE are particularly pronounced for long forecasting horizons during peak times (MAE degradation - 45m: 5.4%, 60m: 11.4%). On the other hand, DCRNN-RW-T (binary thresholding) achieves lower MAE compared to DCRNN-Base over all AT experiments. However, it still yields inferior performance at more distant forecasting horizons at peak times (MAE degradation - 45m: 1.4%, 60m: 2.1%).

The results for MAPE present a contrasting picture, where DCRNN-RW-G outperforms DCRNN-RW-T across all experiments. During AT experiments (especially at longer horizons), DCRNN-RW-G achieves significant improvements compared to DCRNN-Base (MAPE improvements - 45m: 29.8%, 60m: 29.1%). We observe a similar pattern during peak times. These findings indicate that using a thresholded Gaussian kernel when associating roadworks with sensors yields a reduction in large outlier errors (likely resulting in improved performance under unusual road network conditions).

9.3.3 FAAM analysis

Our experimental results indicate that using vehicle-level flow data to model inter-sensor relationships is an effective strategy. For AT experiments, DCRNN-F achieves lower MAE than DCRNN-Base across all time horizons, and is particularly effective at long forecasting horizons (MAE improvement - 60m: 3.2%). Further, we observe even larger MAE gains for DCRNN-F compared to DCRNN-Base at peak times (MAE improvement - 60m: 8.8%). These findings support the inclusion of vehicle-level flow data into GNN models for improved predictive performance.

We note that leveraging the FAAM in place of a distance-based adjacency matrix (i.e., DCRNN-Base) yields MAPE improvements in all cases. However, the most significant MAPE gains are still experienced by DCRNN-RW-G, indicating that incorporating roadworks is a more effective strategy for minimizing outlier errors. It should be noted that DCRNN-F mitigates much of the degradation in MAPE performance at peak times that DCRNN-Base suffers in comparison to LSTM, while also offering leading MAE results.

9.3.4 Error coefficient of variation

As illustrated in Tables 1 and 2, DL models, particularly those which have been enhanced by DUE data or the FAAM, experience the highest ECV (especially at longer forecasting horizons). As shown above, it is at these more distant horizons that the biggest performance improvements (MAE/MAPE) are observed for our augmented models. This suggests that while these solutions produce the best forecasts on average, their errors are the least consistent. This finding is noteworthy, and we would recommend further investigation to better understand its implications.

9.3.5 Efficiency analysis

As discussed in Section 3.2, the real-time forecasting task requires that \(T_{Total} = T_{Agg} + T_{PreProc} + T_{Inf} \le B\). In the current version of Foresight, we allow for \(T_{Agg} \le 40\) seconds. For all of the implemented forecasting models, \(T_{PreProc} \le 6\) seconds. Each model except ARIMA achieves \(T_{Inf} \le 2\) seconds. As discussed above, at inference time we train an ARIMA model over the previous \(T' = 100\) values for each \(s \in S\). For ARIMA, \(T_{Inf} \le 16\) seconds. Hence, all of the presented models achieve \(T_{Total} \approx 1\) minute, satisfying (2) with significant headroom for \(B = 15\) minutes. Further, these results conform to alternative notions of real-time forecasting [42], where predictions were produced in a single-digit order of minutes.
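The constraint can be checked directly (times in seconds; the figures used below are the worst-case values quoted above):

```python
def within_budget(t_agg, t_preproc, t_inf, B_minutes=15):
    """Real-time forecasting constraint from Section 3.2:
    T_Agg + T_PreProc + T_Inf <= B (all times in seconds)."""
    return (t_agg + t_preproc + t_inf) <= B_minutes * 60
```

The worst case (ARIMA) totals 40 + 6 + 16 = 62 seconds, roughly one minute against a 900-second budget.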

9.4 Experimental results: Foresight Plus

9.4.1 Extending the forecasting scale

We first evaluate the resilience of DL-based forecasting models to the extension of the forecasting scale further into the future, as well as the impact of a larger historical traffic time series. As discussed in Section 6, Foresight Plus offers 7 different forecasting scales; namely \(k = 1, 3, 6, 12, 24, 36, 48\) hours ahead. Each still forecasts 4 horizons into the future, with the time period (B minutes) covered by each horizon expanding (uniformly) with the scale. For instance, the ‘1hr ahead’ scale includes forecasting horizons 15mins, 30mins, 45mins and 60mins into the future, while the horizons for ‘12hrs ahead’ reflect 3hrs, 6hrs, 9hrs and 12hrs ahead. The forecasting scales, together with their corresponding horizons, are presented in Table 3.
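The uniform expansion of horizons with the scale \(k\) can be expressed with a small helper (illustrative, assuming the 15-minute base bin):

```python
def horizon_minutes(k, n_horizons=4, bin_minutes=15):
    """Horizon end-times (minutes ahead) for the 'k hours ahead' scale:
    the k-hour window is divided uniformly into n_horizons steps,
    so each horizon spans k * bin_minutes minutes."""
    step = k * bin_minutes
    return [h * step for h in range(1, n_horizons + 1)]
```

For \(k = 1\) this yields 15, 30, 45 and 60 minutes; for \(k = 12\), 3, 6, 9 and 12 hours, matching Table 3.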

Table 3 Table illustrating the time periods encapsulated by each horizon, for all seven forecasting scales (\(k = 1, 3, 6, 12, 24, 36, 48\)).
Fig. 3: Mean Absolute Error (MAE) results for all forecasting scales (\(k = 1, 3, 6, 12, 24, 36, 48\)). Each scale has four unique horizons (see Table 3)

We plot MAE and MAPE results in Figs. 3 and 4, respectively. Our experimental findings for the ‘1hr ahead’ scale indicate that DL/GNN-based forecasting models such as DCRNN benefit significantly from a larger historical dataset of traffic signals. Expanding from 4 to 18 months of ANPR readings leads to a reduction in MAPE from \(\sim 27\)-\(40\%\) to \(\sim 17\)-\(24\%\) (over the 4 horizons for \(k=1)\). Beyond the obvious benefit of learning from a greater number of training samples, the fact that the model has observed data from each month in the calendar year allows it to better capture the seasonality of traffic patterns.

Next, considering MAE over all scales, we see that errors increase in an approximately linear fashion as the forecasting scale grows. This is expected, since as k increases, so does the overall volume of traffic being predicted (note that we forecast traffic count, rather than speed), as well as the likelihood of an unusual traffic event influencing predictions. Forecasting errors increase over the 4 horizons in a steady fashion over all scales, apart from a slightly increased error for horizon 2 at \(k=36\).

Considering MAPE, we see that prediction errors remain stable as the forecasting scale expands significantly further into the future, and can even improve. We observe minimal degradation between \(k = 1, 3, 6, 12\), outside of minor lapses in performance at \(k = 12\) (horizon 1). Forecasting errors then significantly reduce for \(k=24, 36, 48\). While the particularly low errors at these longer timescales may indicate the relative stability of daily traffic patterns, and/or the reduced impact of isolated unusual traffic events such as roadworks, it is nonetheless impressive to see such strong predictive accuracy when predicting much further into the future (using only a lightweight upscaling method). These results indicate that Foresight Plus is a reliable solution across all forecasting scales, highlighting its utility for a variety of use cases. It is interesting to observe the contrast between the errors at \(k=24, 48\) and \(k=36\). The slight increase in MAPE at \(k=36\) may indicate that forecasting traffic at multiples of 24 hours is a special case which offers additional stability. Further work could investigate this effect in more detail.

Fig. 4: Mean Absolute Percentage Error (MAPE) results for all forecasting scales (\(k = 1, 3, 6, 12, 24, 36, 48\)). Each scale has four unique horizons (see Table 3)

9.4.2 Fully serverless real-time inference: cost

We now evaluate Foresight Plus’ fully serverless solution for real-time forecasting inference. In this section, we consider three variants of Foresight Plus, each with a different memory allocation. Namely, we configure serverless inference pipelines with 1GB, 3GB and 6GB memory (note that this applies to both \(M_{PreProc}\) and \(M_{Inf}\)). These configurations were selected as 1GB/6GB are the current min/max memory allocation for AWS SageMaker Serverless Inference. Note that increasing the memory allocation to AWS Lambda instances (which both the pre-processing and inference stages are executed on) entails an increase in vCPU allocation and network bandwidth.

Our cost model only considers inference expenditure, as our work focuses on sporadic inference workloads across multiple scales, rather than minimizing the number/duration of training jobs. As an illustrative example of training overheads, our models incurred costs of \(\sim \) $0.30 (\(k = 48\)) to \(\sim \) $10 (\(k = 1\)) on an AWS ml.g4dn.xlarge instance. These one-off costs are amortized relative to daily inference costs, particularly at large query volumes. While the models will periodically need to be re-trained, the frequency/duration of these jobs is beyond the scope of our work, and has been explored previously [46]. Further, the large date range covered by the ANPR dataset (18 months, vs 4 months for METR-LA) ensures that seasonality is captured by the model; this should reduce the frequency at which the models require re-training.

We first compare the cost-effectiveness of Foresight Plus to that of a standard inference solution; that is, a provisioned endpoint, running on VM instances (analogous to the original Foresight solution). In this experiment, we model the non-serverless solution as running with two AWS t2.medium instances (2 vCPU, 4GB memory). We allocate two instances to handle overlapping queries (which could exceed the memory of one instance), and to offer redundancy. To simulate the sporadic inference workloads which Foresight Plus targets, we model queries being received over a 24-hour period, which are uniformly distributed over the 7 forecasting scales (\(k = 1, 3, 6, 12, 24, 36, 48\)).

Fig. 5: Daily query cost against daily query volume, for each serverless inference solution as well as a provisioned endpoint alternative. Assumes a uniform distribution over forecasting scales and cold/warm requests

Fig. 6: Per-request runtime and cost for each serverless inference implementation, across all forecasting scales. Left: cold inference requests. Right: warm inference requests

We also consider both cold and warm inference requests. Each AWS Lambda request runs in a dedicated container. Upon the receipt of a request, the AWS Lambda service attempts to allocate it to a container. If no containers are immediately available, one must be started; this incurs a short delay, and is known as a ‘cold start’. Once a request has been completed, its container remains warm for \(\sim 15\) minutes. If another request is received during this period, it can be rapidly assigned to the warm container (this effect is termed ‘container re-use’), hence avoiding the cold start delay. It should be noted that container re-use only applies to instances of the same Lambda function, so consecutive requests relating to different forecasting scales will not benefit from this effect. However, all inference requests initially invoke a shared pre-processing function, which will benefit from container re-use for all queries. In this experiment, we assume a 50/50 split of cold/warm inference requests.

On the other hand, the provisioned endpoint solution will have a fixed cost regardless of utilisation, as VM instances are simply billed at an hourly rate. In Fig. 5, we observe that each serverless variant is significantly more cost-effective than the provisioned alternative, until the daily query volume becomes very high (1GB: 4120 queries/day, 3GB: 2424 queries/day, 6GB: 1270 queries/day). This highlights the significant cost savings which a lightweight serverless inference solution can provide.
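The break-even analysis reduces to a simple calculation; the prices used in the test are placeholders, not AWS's actual rates:

```python
def per_query_cost(cold_cost, warm_cost, cold_frac=0.5):
    """Expected serverless cost per query under an assumed cold/warm mix."""
    return cold_frac * cold_cost + (1 - cold_frac) * warm_cost

def break_even_volume(vm_daily_cost, cold_cost, warm_cost, cold_frac=0.5):
    """Daily query volume at which per-request serverless spend matches the
    flat daily cost of a provisioned (always-on) endpoint pair."""
    return vm_daily_cost / per_query_cost(cold_cost, warm_cost, cold_frac)
```

Below the break-even volume, the serverless configuration is strictly cheaper; above it, the provisioned endpoint wins.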

9.4.3 Fully serverless real-time inference: performance

We next consider the cost-to-performance ratio of each serverless solution. As illustrated in Fig. 6, while 1GB is clearly the cheapest solution, it is also the least performant. We identify the 3GB configuration as a strong compromise of cost and performance, for both cold and warm inference requests. Its end-to-end inference runtimes are significantly faster than 1GB, and are on par with 6GB for all tests. It also costs significantly less than three times as much as the 1GB configuration; this is because Lambda functions are billed per second of active runtime, so improving computational efficiency also reduces costs. While the 6GB solution is usually the most performant, it typically only achieves small improvements over 3GB while incurring significantly higher costs.

However, it should be noted that all three solutions comfortably satisfy the real-time inference requirements described in Sections 3.2 and 9.3.5. Therefore, even the 1GB solution is a viable option for deployment, particularly if the inference workload will predominantly focus on shorter forecasting scales (1-3hrs ahead). That said, if responsiveness is especially important (e.g., if queries are performed by members of the public via a mobile app), and/or many queries will target the longer-to-execute scales, then 3GB is likely the more suitable configuration.

10 Conclusion

In this work, we develop Foresight Plus (in collaboration with Transport for the West Midlands), which builds upon the existing Foresight system to enhance its real-time spatio-temporal forecasting capabilities. We present a novel method for extending the forecasting scale, enabling predictions to be made 1, 3, 6, 12, 24, 36 or 48 hours ahead. This is a significant improvement when compared to the 1 hour ahead scale which is commonly seen in forecasting literature (and was studied in Foresight). It greatly improves the utility of the system for users who require accurate forecasts multiple hours in advance. Our experimental analysis shows that GNN forecasting models such as DCRNN can achieve impressive performance under extended forecasting horizons, with MAPE frequently reducing as the forecasting scale increases. These results highlight the effectiveness of Foresight Plus as both a short and longer-term forecasting system. Further, we consider the fact that many inference workloads will be sporadic in nature, with queries arriving at irregular intervals and being spread over multiple different predictive models/scales. We identify that a fully serverless, pay-as-you-use inference solution is far more cost-effective for such workloads than a provisioned approach (as was utilized in Foresight). We develop an optimized serverless inference procedure in Foresight Plus, and evaluate its scalability, cost and performance across multiple resource configurations. Further work could evaluate the robustness of additional forecasting model types to extensions in the forecasting scale. Also, additional optimisations could be made to accelerate queries made in quick succession (e.g., caching of previous results for a short period of time). The system is generalizable to any similar setting. The dataset used in this study, although relatively complete when compared to benchmark traffic datasets, is not particularly specialized or feature-rich. 
Any transport authority with a sensor network incorporating ANPR capabilities (which enable the capture of vehicle-level flow information), and with access to sources of DUE data, could readily adopt this cloud-based system. Our incorporation of roadworks data into the predictive model, using simple distance metrics from the sensor network, is a demonstration of how DUE data can improve predictive accuracy. Indeed, there is much scope for leveraging other types of exogenous datasets into such solutions.