Predicting dynamic spectrum allocation: a review covering simulation, modelling, and prediction

The advent of the Internet of Things and 5G has further accelerated the growth in devices attempting to gain access to the wireless spectrum. A consequence of this has been commensurate growth in spectrum conflict and congestion, which has begun to impose a significant cost upon innovation in both the public and private sectors. One potential avenue for resolving these issues, and for improving the efficiency of spectrum utilisation, is for devices to make intelligent decisions about their access to spectrum through Dynamic Spectrum Allocation. Changing to a system of Dynamic Spectrum Allocation would require the development of complex and sophisticated inference frameworks that can be deployed at a scale able to support significant numbers of devices. The development and deployment of these systems cannot exist in isolation, but rather would require the development of tools that can simulate, measure, and predict Spectral Occupancy. To support the development of such tools, this work reviews not just the available prediction frameworks for networked systems with sparse sensing over large scale geospatial environments, but also holistically considers the myriad technological approaches required to support Dynamic Spectrum Allocation.


Introduction
Spectrum conflict is on the rise, due to the growth of devices seeking access to a finite amount of spectral resources. The competition for access to this spectrum already has a significant impost on innovation in both the public and private sectors. These costs will only grow as it is projected that there will be up to 50 billion internet enabled wireless devices by 2020 (Al-Fuqaha et al. 2015), attempting to exchange 50 exabytes of data per month by 2021 (Zhou et al. 2018). Supporting this growth will require a commensurate increase in capacity, through both capital investment and the development of new technologies. Failing to address this growth in demand will result in an increasingly congested spectral environment, cost increases, and access constraints that will affect mobile devices, the Internet of Things, satellites, and radar systems.
These issues can broadly be seen as a consequence of static spectrum allocation, which has produced a spectral environment that is simultaneously congested and underutilised (Shared Spectrum Company 2010). One approach for improving the efficiency of access to the spectrum environment would be to move to policies of Dynamic Spectrum Allocation, in which devices negotiate with the spectral environment for access, in a fashion that balances public and private needs against demand (Marţian et al. 2010). Such systems can be built around Software Defined Radios (SDRs) and Cognitive Radios, which allow devices to dynamically and intelligently manage their access to the spectral environment without necessarily requiring any centralised control (Iii 2000; Sithamparanathan and Giorgetti 2012), essentially performing local optimisation to improve the individual device experience (Jiang et al. 2017b).
Performing such optimisation is an inherently complex task, as it requires a deep understanding of the multifaceted nature of signal behaviour, and the interactions between signals and the environment in which they exist. This behaviour is driven not just by technological factors (like the use of WiFi and cellular networks, or the level of data throughput expected by users) but also by how we move through the built and natural environments, and how the signals that we produce interact with said environments. Moreover, additional complexity results from how contributions to the signal environment can be driven by the state of the signal environment itself. While it is clearly impossible to predict the conscious and unconscious decisions that drive an individual's contribution to the spectral environment, it may well be possible to accurately predict the aggregate contributions of a population group.
Armed with the data to make such predictions and the appropriate methodologies to leverage said data, SDRs would significantly improve the efficiency of spectral utilisation. However, the pathway to deployment of such systems is particularly complex, due to the lack of representative data sets to test the accuracy of prospective predictive frameworks. While it would be possible to collect such data, the lead time to designing and deploying the systems required would be significant, which would in turn lead to significant costs. Furthermore, deploying fixed measurement systems prior to designing the predictive framework would mean that lessons learned from the predictive approaches could not be leveraged to improve the design of the systems used to measure Spectral Occupancy.
Clearly, a linear developmental approach, in which the systems engineering first considers measurement, then prediction, and finally the hurdles of deployment, would require significant investments in both financial resources and time. Disrupting this sequential design process, and moving to a more efficient, circular design methodology, is possible by introducing simulated data to the design process. By constructing a simulacrum of the spectral environment, the performance of predictive techniques, the requisite engineering specifications for the sensors, and the designs of the sensor networks themselves can all be considered in parallel. While constructing such a simulacrum is not without its challenges (due to the multifaceted nature of spectral occupancy, and the geographic and temporal scales over which it varies), doing so has the potential to significantly decrease the development time required to build the systems required to support Dynamic Spectrum Allocation.

Summary of contributions
The task of predicting dynamic spectrum allocation intuitively feels as if it would be a natural extension of predictive modelling in other fields. However, the spectral and spatiotemporal dimensionality, the scale of underlying processes, and the inherent difficulties imposed by the dimensionality of the underlying data present unique challenges that do not have many analogues in related fields. Even when considered in the context of these sub-problems, the strong inter-dependencies between problem facets exacerbate the overall complexity. The nature of these inter-dependencies, as is emphasised within Fig. 1, inherently means that a good solution to one sub-problem may be infeasible in the context of the broader problem space.
As such, this review takes a holistic approach to the systems and techniques required to support dynamic spectrum allocation. This includes discussing the motivation of the problem space in Sect. 2; the measurement of spectrum usage in Sect. 3; constructing future predictions within Sect. 4; and the deployment of such systems in Sect. 5.
In taking such a holistic perspective, the contribution of this work is not only to review the performance of techniques (drawn from works within and outside the field of dynamic spectrum allocation), but also to provide a big-picture perspective to researchers considering facets of the spectrum allocation problem. It is hoped that, by doing so, future work can be directed in a fashion that is complementary to the wider problem space.
In aid of this, our work makes the following specific contributions:
1. A concise definition of Spectral Occupancy that can be applied to a range of domains, while being well suited for spectral-spatio-temporal predictive modelling, as covered within Sect. 2. This chapter also reviews the considerations and techniques required for the measurement of Spectral Occupancy.
Fig. 1 Flowchart emphasising the inherent inter-dependencies in attempting to support Dynamic Spectrum Allocation. Here fundamental problem scales are selected, prior to either simulating or measuring data. This data is then modelled, and based upon the outcomes of this process, the fundamental assumptions regarding the appropriate scales may need to be revised

Motivation and background
To date, studies of dynamic spectral allocation have typically focused upon specific subproblems. One common feature is the reliance upon univariate signals, representing the capture of spectral data at a single point of space, over some defined time scale. These issues persist across both works considering the measurement of Spectral Occupancy (Xue et al. 2013;Subramaniam et al. 2015) and those testing the applicability of predictive regimes (Wang and Salous 2011). While such work does demonstrate the applicability of localised spectral sensing, it inherently misses opportunities to understand key drivers of the signals environment, especially in urban environments. In such spaces, there is both the potential for there to be inherent correlations between sensor locations, and an inherent interdependency between the use of spectrum and the surrounding built environment. As such, rather than consider spectrum allocation at a single point in space, this work is the first to consider not just the suite of potential approaches, but how these approaches can be built into a cohesive, holistic predictive system.
In aid of this, we begin by defining how Spectral Occupancy should be considered quantitatively. Any measure of Spectral Occupancy should reflect the difficulty of accessing the spectral environment. This environment is inherently inhomogeneous across not just time and space, but also the frequency at which a device is able to communicate. To support Dynamic Spectrum Allocation, the metric should also reflect not just the level of congestion within the spectral environment, but also the ability of the device to transmit signals relative to such congestion. In doing so, the measure should reflect that different devices will have a different perspective on what constitutes occupancy, as a mobile phone has far less power, and thus less ability to access and impact congested spectrum, than a television transmission tower.
The Spectrum Occupancy O at any point of space can be considered as a time series that represents the proportion of the time window (t, t + Δt) that a signal would face congestion from another transmitter, whose received signal would be above the considered threshold power in the bandwidth region of interest. This corresponds to the probability that measured signals at a point and time will be strong enough to induce congestion within the spectrum. Such a measure is consistent with the latest 802.22 wireless standard for SDRs (Cordeiro et al. 2005). Building upon the Duty Cycle model of Spectral Occupancy (Wellens and Mähönen 2010) to incorporate spectral and spatiotemporal variation, we propose measuring occupancy as
$$O(x, t, \Delta t, b) = \frac{1}{\Delta t} \int_{t}^{t + \Delta t} \mathbb{1}\left[ P(x, \tau, b) > P_T \right] \, d\tau \qquad (1)$$
where t and Δt are the time and the length of the time window respectively, x is the location at which the occupancy is to be measured, P is the maximum received power at a point in space and time, P_T is the threshold power (in dBW), b represents the bandwidth spanning the frequency range [f_0, f_1], and 𝟙[·] is the indicator function. Treating this as an integrable measure only with respect to time, and not space, is a consequence of imposing that the measure of occupancy relates to received power at any point. Moreover, considering the occupancy in terms of bandwidth, rather than frequency, reflects the potential for frequency modulation to occur, and also allows a degree of quantisation to be introduced to the model, which can reduce the complexity of the task of predicting Spectral Occupancy while simultaneously reflecting the nature of different device behaviours.
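As a minimal sketch of how this occupancy measure might be estimated from sampled data, the Python below computes the fraction of time steps in a window for which the maximum received power across the band exceeds a threshold. The function name, the sample layout (one list of per-channel powers per time step), and all numerical values are illustrative assumptions rather than anything specified by the works reviewed here.

```python
from typing import Sequence


def spectral_occupancy(
    power_dbw: Sequence[Sequence[float]],
    threshold_dbw: float,
) -> float:
    """Estimate occupancy over one time window at a single location.

    power_dbw[k][c] is the received power (dBW) at time step k in
    channel c of the band of interest. A time step counts as occupied
    when the maximum power across the band exceeds the threshold.
    Samples are assumed to be uniformly spaced in time, so the time
    integral reduces to a simple average.
    """
    if not power_dbw:
        raise ValueError("window must contain at least one sample")
    occupied = sum(1 for step in power_dbw if max(step) > threshold_dbw)
    return occupied / len(power_dbw)


# Hypothetical window: 5 time steps, 3 channels within the band.
window = [
    [-120.0, -118.0, -121.0],
    [-119.0, -95.0, -120.0],   # one channel above the threshold
    [-90.0, -92.0, -118.0],    # two channels above the threshold
    [-121.0, -122.0, -119.0],
    [-117.0, -116.0, -99.0],   # one channel above the threshold
]
occupancy = spectral_occupancy(window, threshold_dbw=-100.0)
print(occupancy)
```

Taking the maximum across channels before thresholding mirrors the description of P as the maximum received power within the band; a per-channel variant would simply apply the same average to each channel separately.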

Measuring spectral occupancy: spectral sensing
Before considering the prediction of spectral occupancy, it is important to understand how deployed systems are able to capture the spectral environment. An exemplar sensing environment is shown in Fig. 2, in which a sparse network of sensors is able to capture the local spectral occupancy. These can be considered as local observations at distinct x ⊂ Ω at time t. Depending upon the context of the deployed system, any predictions may need to be constructed at either the discrete sensor locations x ⊂ Ω; or across a broader domain x ⊂ Ω.
Fig. 2 An example sensing domain, containing a base station antenna, four mobile devices, three fixed sensors, and both natural and constructed environmental factors
To support the development of these systems, it is important to understand the behaviour of sensors of Spectral Occupancy, as their physical behaviours will inform the range of available prediction methodologies. The tools for such spectrum sensing fall into two broad categories, known as narrowband or wideband sensors (Sithamparanathan and Giorgetti 2012), with the demarcation between the two determined by the number of frequency channels the sensor can access. Narrowband sensors consider one distinct frequency at each point in time, whereas wideband sensors aim to analyse a frequency bandwidth, in a manner that operates similarly to a number of narrowband sensors operating in parallel (Ali and Hamouda 2016), which introduces complexity to the implementation, but requires less energy while producing faster results than sequential-sensing (Lu et al. 2017).
Narrowband sensors can be constructed based upon a number of frameworks, with matched-filter detection (Veen and Van Der Wiellen 2003), energy detection (Sun et al. 2010), and cyclostationarity feature detection (Sutton et al. 2008) being the most commonly implemented. Other narrowband sensing techniques include covariance based detection (Zeng and Liang 2008) and cooperative machine learning based sensing (Wang and Yang 2016). With matched-filtering, the received signal-to-noise ratio is maximised under the assumption that the primary user signal can be demodulated, which imposes additional constraints. Energy detection is a simplification of matched-filtering that introduces an energy metric to perform sub-optimal non-coherent detection, in a manner that can require fewer samples to meet a detection constraint as compared to matched-filtering. Cyclostationarity feature detection is constructed around the assumption that detected signals contain periodic components that can be leveraged to decrease the number of samples required for a detection event. However, the process of cyclostationarity feature detection is computationally expensive, due to the need to both make multiple frequency transforms and calculate the signal's autocorrelation.
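To make the energy detection idea concrete, the following is a minimal sketch that compares the mean energy of a block of samples against a fixed threshold. The sinusoidal primary-user signal, the unit-variance Gaussian noise, the sample count, and the threshold are all illustrative assumptions; a deployed detector would typically derive its threshold from a target false alarm rate and an estimate of the noise floor.

```python
import math
import random
from typing import Sequence


def energy_detect(samples: Sequence[float], threshold: float) -> bool:
    """Return True when the mean sample energy exceeds the threshold."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_energy = sum(s * s for s in samples) / len(samples)
    return mean_energy > threshold


rng = random.Random(0)  # fixed seed so the example is repeatable
n_samples = 1000

# Noise-only channel: unit-variance Gaussian noise (mean energy near 1).
noise_only = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

# Hypothetical primary-user signal: a sinusoid of amplitude 2
# (mean power 2) observed in the same noise.
with_signal = [
    2.0 * math.sin(2.0 * math.pi * 0.05 * k) + rng.gauss(0.0, 1.0)
    for k in range(n_samples)
]

# Threshold placed between the noise power and the signal-plus-noise power.
threshold = 1.5
print(energy_detect(noise_only, threshold), energy_detect(with_signal, threshold))
```

The detector needs no knowledge of the signal's structure, which is what makes it non-coherent; the cost is that its decision depends entirely on how well the threshold separates noise from signal-plus-noise.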
To support Dynamic Spectrum Allocation, sensing systems require the ability to provide information about wide swathes of the frequency range, in order to exploit extant opportunities. While techniques have been proposed for capturing a representative sample of the environment through a distributed network of sensors (Blum et al. 1997), they are still fundamentally limited in the amount of information that they can capture. In contrast, wideband sensors are able to sample the broader spectral environment with a self contained device, producing consistent and coherent results, without any of the potential robustness concerns stemming from a reliance on data from a series of networked narrowband sensors.
While the ability to capture larger swathes of the spectral environment with a single sensor is inherently appealing for SDRs, conventional wideband spectrum sensing often leads to a sampling rate so high that it becomes unaffordable, or introduces implementation issues that significantly increase the complexity of designing and integrating these systems. As such, novel signal processing approaches have been required to leverage the potential of wideband sensing. As channels in the wideband frequency spectrum are inherently sparse (Do et al. 2004), there is a clear application case for sparse sampling and Compressed Sensing (CS) (Candes et al. 2006). While most frequency-sampling approaches assume that the ability to reconstruct a signal is limited by the Nyquist frequency, CS incorporates knowledge of the signal's sparsity to reconstruct signals sampled significantly below the Nyquist frequency. This is made possible by formulating the frequency reconstruction problem as an under-determined linear system, and then using L1-norm minimisation to construct a single solution to the under-determined system.
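The reconstruction step can be illustrated with a small sketch. The code below uses iterative soft-thresholding (ISTA) to solve the L1-regularised least-squares problem, a commonly used relaxation of strict L1-norm minimisation, on an under-determined system with 20 measurements of a length-40 sparse vector. The Gaussian measurement matrix, problem sizes, regularisation weight, and iteration count are assumptions chosen purely for illustration, and ISTA is only one of many solvers that could be used here.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Compute A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rmatvec(a: Matrix, r: Vector) -> Vector:
    """Compute A^T r."""
    return [sum(row[j] * ri for row, ri in zip(a, r)) for j in range(len(a[0]))]


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def objective(a: Matrix, y: Vector, x: Vector, lam: float) -> float:
    """Evaluate 0.5 * ||A x - y||^2 + lam * ||x||_1."""
    residual = [p - q for p, q in zip(matvec(a, x), y)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in x)


def ista(a: Matrix, y: Vector, lam: float, n_iter: int) -> Vector:
    """Iterative soft-thresholding for L1-regularised least squares."""
    # 1 / ||A||_F^2 is a conservative step size: the squared Frobenius
    # norm upper-bounds the Lipschitz constant of the gradient.
    step = 1.0 / sum(v * v for row in a for v in row)
    x = [0.0] * len(a[0])
    for _ in range(n_iter):
        residual = [p - q for p, q in zip(matvec(a, x), y)]
        grad = rmatvec(a, residual)
        x = [soft_threshold(xj - step * gj, step * lam) for xj, gj in zip(x, grad)]
    return x


rng = random.Random(1)  # fixed seed for repeatability
m, n = 20, 40           # 20 measurements of a length-40 signal
a = [[rng.gauss(0.0, 1.0 / m ** 0.5) for _ in range(n)] for _ in range(m)]

x_true = [0.0] * n
x_true[5], x_true[23] = 1.0, -0.8  # hypothetical 2-sparse spectrum
y = matvec(a, x_true)

lam = 0.01
x_hat = ista(a, y, lam, n_iter=500)
largest = sorted(range(n), key=lambda j: abs(x_hat[j]), reverse=True)[:2]
print(sorted(largest))
```

When the signal is sufficiently sparse relative to the number of measurements, the largest entries of the estimate would be expected to coincide with the true support, although this sketch makes no guarantee for any particular random matrix.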
Within a signals sensing framework, sampling below the Nyquist frequency means that significantly fewer samples are required to accurately reconstruct a signal operating at a given frequency, which has implications for the speed of processing, the temporal resolutions that are possible, and the hardware requirements for spectrum sensing (Sharma et al. 2016). Because of this, numerous authors have considered applications of CS to narrowband and wideband sensing (Tian and Giannakis 2007; Ma et al. 2016). Each of these authors has presented variants of sub-Nyquist wideband sensing that leverage CS, and while they each have their own relative strengths, a particularly promising approach can be found in one-bit sensing (Laska et al. 2011), which considers only the sign information of measurements, in a manner that significantly reduces the amount of data that needs to be stored and processed (Jacques et al. 2013). By treating these one-bit sign measurements as constraints for the CS based reconstruction, sparse signals can be reconstructed with high probability. By discarding unnecessary components of the original signal, one-bit compressive sensing is able to perform fast sampling without growth in the computational and hardware complexity, while still being robust to noise.
Alternate approaches to sub-Nyquist sampling include wavelet detection, in which a continuous signal is decomposed into a series of markers, corresponding to changes in occupancy of the frequency band (El-Khamy et al. 2013). Other options include multi-band joint detection (Quan et al. 2008), adaptive thresholding (Gorcin et al. 2010) and filter bank detection (Kim and Takada 2009), although each of these techniques is limited by the necessary sampling rate, and the associated latency, complexity, and energy consumption that comes with sampling at the Nyquist frequency level.
Ultimately, the choice of narrowband or wideband, and the specific sampling technique within those frameworks, is a multifaceted optimisation task subject to competing objectives. For a system designed to be integrated into a broader machine learning modelling and forecasting framework, important factors will include the robustness to false positives, as well as the sensor's sensitivity and robustness to noise. Similarly, computational requirements, problem specific latency, and resource utilisation considerations must also be taken into account.
Another important consideration in the specification of sensor networks is the trade-off between accuracy and efficiency. In these terms, accuracy relates to both the ability of a sensor to detect transmissions above its noise floor and separate out signal detections from noise, and the false positive rate of transmissions; while efficiency refers to the proportion of time the sensor can query a part of the spectrum, given both observation and throughput constraints (Yu et al. 2012). These two factors of the sensor's performance are intrinsically coupled: as more focus is placed upon the accuracy of the sensor, system resources are consumed that reduce its throughput, which will in turn affect the efficiency of the sensor. However, it has been shown that by acting in parallel, networks of sensors can improve both accuracy and efficiency by sharing some of the sensing workload (Xie et al. 2010).
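The accuracy side of this trade-off is often summarised by a detection rate and a false positive rate. A minimal sketch of how these might be computed from a sensor's decisions, assuming ground-truth occupancy labels are available (for instance from simulated data), is given below; the function name and the example labels are hypothetical.

```python
from typing import Sequence, Tuple


def detection_rates(
    decisions: Sequence[bool],
    truth: Sequence[bool],
) -> Tuple[float, float]:
    """Return (detection rate, false positive rate) for a sensor.

    decisions[k] is the sensor's occupied/vacant decision for slot k,
    and truth[k] is the true occupancy of that slot.
    """
    if len(decisions) != len(truth):
        raise ValueError("decisions and truth must have the same length")
    true_pos = sum(1 for d, t in zip(decisions, truth) if d and t)
    false_pos = sum(1 for d, t in zip(decisions, truth) if d and not t)
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    p_detect = true_pos / positives if positives else float("nan")
    p_false = false_pos / negatives if negatives else float("nan")
    return p_detect, p_false


# Hypothetical labels for eight sensing slots.
truth = [True, True, True, False, False, False, False, True]
decisions = [True, False, True, False, True, False, False, True]
p_detect, p_false = detection_rates(decisions, truth)
print(p_detect, p_false)
```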

Distributed sensor networks
A network of sensors offers more opportunities than just improving the local performance of sensors: distributed sensor networks are a crucial tool for sampling geographic regions that would be beyond the scope of an individual sensor's capabilities. The design of such networks is inherently complicated, due to both the domain complexity (in terms of the spectral, natural, and built environments) and the range of objectives that must be considered. These objectives for resource allocation can include, but are not limited to, considerations of coverage, energy efficiency, communication and processing constraints, detection reliability, and robustness to failure modes, as well as more application specific concerns.
As the domain of interest increases in size, resource constraints make perfect sampling of the domain an almost impossible task. Instead, it is likely that there will be geographically correlated problem areas in the domain coverage, giving rise to regions known as holes (Ahmed et al. 2005). A coverage hole is a geographic region that is not covered by at least k sensors, where k is the specified level of coverage required by the application, to capture all data and to account for fault tolerance (Huang and Tseng 2005). Constructing accurate measurements of the geographic distribution of spectrum occupancy, including potential geolocation of signal sources, may require k ≥ 3 (Niculescu and Nath 2003). Peer-to-peer network topologies may also require multiple coverage to ensure robust connectivity. The design of the network may also lead to the formation of routing holes, in which network messages from a geographic region are unable to be communicated to the broader network, due to bandwidth constraints, missing links, or an improperly designed adaptive routing framework (Akkaya and Younis 2005).
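The definition of a coverage hole lends itself to a simple check. The sketch below marks every point of a unit-spaced grid that lies within range of fewer than k sensors, assuming an idealised disc sensing model in which a sensor covers everything within a fixed radius. The disc model, grid discretisation, sensor positions, and radius are all simplifying assumptions for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def coverage_holes(
    sensors: Sequence[Point],
    radius: float,
    k: int,
    width: int,
    height: int,
) -> List[Point]:
    """Return unit-grid points covered by fewer than k sensors.

    A sensor is assumed to cover every point within `radius` of its
    position (an idealised disc model).
    """
    holes: List[Point] = []
    for gx in range(width):
        for gy in range(height):
            count = sum(
                1
                for sx, sy in sensors
                if math.hypot(gx - sx, gy - sy) <= radius
            )
            if count < k:
                holes.append((float(gx), float(gy)))
    return holes


# Hypothetical deployment: three sensors on a 6 x 6 grid.
sensors = [(1.0, 1.0), (4.0, 1.0), (2.5, 4.0)]
holes_k1 = coverage_holes(sensors, radius=3.0, k=1, width=6, height=6)
holes_k3 = coverage_holes(sensors, radius=3.0, k=3, width=6, height=6)
print(len(holes_k1), len(holes_k3))
```

Raising k can only enlarge the set of holes, which is why applications such as geolocation, which may need k ≥ 3, place much heavier demands on sensor density than simple detection.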
Static distributed sensor network designs can be broadly categorised as belonging to either Target, Area, or Cell-Edge (or Barrier) coverage. For Target coverage, the sensed domain has predetermined discrete target locations that have been marked as being particularly important; Area coverage attempts to maximise the sampling across a defined geographic region; and Cell-Edge coverage attempts to sense transitions into and out of a region of interest, without focusing upon the interior region. Of these three categories, Area coverage has the greatest relevance to the problem of spectral occupancy prediction, and as such we shall focus our consideration of network design on techniques suitable for Area coverage.
Under a static regime, techniques from computational geometry like Voronoi cells (Vieira et al. 2003) and Delaunay triangulation (Wu et al. 2007) were initially popular for managing grid distribution; however, they are limited in their ability to be adapted to manage multifaceted objectives. To maximise the coverage of a sensor network, numerous iterative schemes have been successfully tested, including Ant colony optimisation (Liu and He 2014), Glow-worm swarm optimisation (Liao et al. 2011) and Genetic Algorithms (Yoon and Kim 2013), with similar techniques also being employed to maximise the connectivity between network nodes (Younis and Akkaya 2008). Connectivity maximisation is particularly important in the design of systems that are robust to the impact of environmental or adversarial interference.
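To give a flavour of how such iterative schemes operate, the sketch below uses a (1+1) evolutionary strategy, a much simpler relative of the population-based methods cited above, to improve the area coverage of a small sensor network. It is not an implementation of any of the cited algorithms. The disc sensing model, grid size, mutation scale, and iteration count are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def covered_fraction(sensors: Sequence[Point], radius: float, size: int) -> float:
    """Fraction of unit-grid points within `radius` of at least one sensor."""
    covered = 0
    for gx in range(size):
        for gy in range(size):
            if any(math.hypot(gx - sx, gy - sy) <= radius for sx, sy in sensors):
                covered += 1
    return covered / (size * size)


def evolve_placement(
    sensors: Sequence[Point],
    radius: float,
    size: int,
    n_iter: int,
    rng: random.Random,
) -> Tuple[List[Point], float]:
    """Mutate one sensor at a time, keeping changes that do not reduce coverage."""
    best = list(sensors)
    best_score = covered_fraction(best, radius, size)
    for _ in range(n_iter):
        candidate = list(best)
        idx = rng.randrange(len(candidate))
        sx, sy = candidate[idx]
        # Gaussian mutation, clamped so the sensor stays inside the domain.
        nx = min(max(sx + rng.gauss(0.0, 1.0), 0.0), size - 1.0)
        ny = min(max(sy + rng.gauss(0.0, 1.0), 0.0), size - 1.0)
        candidate[idx] = (nx, ny)
        score = covered_fraction(candidate, radius, size)
        if score >= best_score:
            best, best_score = candidate, score
    return best, best_score


rng = random.Random(0)  # fixed seed for repeatability
initial = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]  # all sensors start in one corner
initial_score = covered_fraction(initial, radius=3.0, size=10)
placement, final_score = evolve_placement(initial, 3.0, 10, n_iter=300, rng=rng)
print(initial_score, final_score)
```

Because a candidate is only accepted when coverage does not fall, the final score can never be worse than the initial one; extending the score to a weighted sum of coverage, connectivity, and energy terms would be one simple way to accommodate multiple objectives.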
The design objectives of physical implementations of wireless sensor networks are more likely to be multifaceted, and as such approaches that can consider balanced objectives must be considered as being of particular importance. In doing so, coverage and connectivity considerations can also be balanced with other design parameters, like energy efficiency, redundancy, and the expected lifespan of the network. Due to their well established efficacy for multi-objective functions, Evolutionary and Genetic algorithms (Khalesian and Delavar 2016), Particle Swarm optimisation (Pradhan and Panda 2012) and even predator scent marking and bee-colony algorithms (Hashim et al. 2016) have all been proposed as mechanisms for managing the deployment of static wireless sensor networks. The motivation for employing these techniques is their well known ability to handle multi-objective optimisation tasks; however, they do introduce computational burdens, and are inscrutable when attempting to determine the factors that have given rise to the suggested topology.

Mobile distributed sensing
Distributed sensing is not necessarily limited to fixed topologies: such sensors could be mounted on moving human agents, ground or aerial vehicles, or satellites. As such, Heo and Varshney (2003) introduced both algorithms and performance metrics for ensuring that self-deploying wireless sensor networks best meet objectives for coverage, uniformity, deployment time, and distance travelled. However, accessibility for perfect placement is not necessarily guaranteed, and as such deployment algorithms have been designed to account for uncertainty derived from aerial (Corke et al. 2004) or ground based (Chang et al. 2008) deployment systems. The performance of such a system can be further enhanced if the behaviour of individual sensors can be modified, either in terms of their sleep-wake cycles and sensor power (Abo-Zahhad et al. 2016), or using mobility to rearrange the sensor nodes (Niewiadomska-Szynkiewicz et al. 2016). Beyond relocation, node mobility can also be exploited to enhance coverage (Liang and Ren 2005), and to generate efficient data transfer structures (Zhan et al. 2017). The balance of these factors will then be driven by the resources available for deployment.
While observations of the spectral environment are critically important, their value is not able to be capitalised upon without having the supporting hardware and software to manage the data, and to support the post-processing required to generate predictions of future spectral utilisation. This task is inherently complex, due to the potential presence of structural heterogeneities in the collected data, differential delivery rates due to system latencies, and the need to be robust against adversarial attacks and security concerns (Lu et al. 2013). The implementation hurdles associated with these concerns grow nonlinearly with the size of the sampled region, as a consequence of the volume of data that must be processed and managed.

Robust network design
While the aforementioned holes can broadly be considered failures in design and implementation, there is also the possibility that they were introduced through the actions of outside actors, behaving in an inadvertent or adversarial manner (Karlof and Wagner 2003). By compromising a node in such a manner that its behaviour is disrupted, adversarial agents can generate denial-of-service conditions, or inject false data into the detection framework. A sink hole is one in which a node that has either been compromised or exploited introduces behaviour that increases the resource demands placed on neighbouring nodes, in terms of energy consumption or bandwidth. Such an attack aims to introduce a failure mode that propagates through the nodes, to induce an artificial coverage or routing hole (Kalita and Kar 2009).
Another classical attack would be jamming the communication between nodes (in a deliberate or inadvertent manner), to ensure that any information describing a detection cannot be propagated out from the jammer's domain of influence. As a consequence of this, any detections that would be able to be identified by network inference would also be denied. Such an event is known as a black hole, and there are also gray holes, in which the malicious node intermittently behaves as a black hole. An alternate adversarial approach is the insertion of a wormhole (Pathan et al. 2006), in which malicious nodes distributed across a wide geographic region create network tunnels between each other, with the aim of introducing erroneous routing behaviour, and, potentially, spurious geographic cross-correlates that could present as false detections, or hide true detections, affecting the accuracy of the sensors.

Predicting spectral occupancy
Predicting spectral occupancy (in the form of Eq. 1) is complicated by the high-dimensional spectral-spatio-temporal nature of occupancy. This complexity only grows as the evolution of the spectral environment is dependent upon a large number of confounding factors, including the environment that the signals exist in (which can include built and natural topography, as well as weather events) and human factors. As such, it is important that consideration is given to prediction frameworks that can accurately incorporate sources of data beyond observations of the historic spectral environment.
Another factor that underscores the importance of reviewing available techniques is the local nature of occupancy observations, and the likely need in an SDR context for global (over a domain of interest) predictions. Given the cost of performing observations of the spectral environment, it is likely that the observations are highly sparse (at a level beyond what has been required for other problem domains) relative to the resolution required for predictions.
In the context of these factors, which introduce challenges unique to this problem domain, the review of appropriate prediction techniques will be split into a taxonomy that independently considers predictions at, and away from, observation sites as on- and off-sensor prediction respectively. This taxonomy can be considered as a generalisation of the consideration of prediction domains in Sect. 3. From a time series perspective, a basic exemplar of the differences between on- and off-sensor predictions can be seen in Fig. 3.
For on-sensor predictions, the historic values of a single sensor, or a network of sensors, are used to predict the future state of spectral occupancy. By decomposing the task further to consider discrete frequency bands, the prediction task can be expressed as an m × n-dimensional time series, where m and n are the number of connected sensors and frequency bands respectively.
As frequency bands can be treated as being nominally independent, the prediction task can be further decomposed to being n distinct m-dimensional time series. While the overall number of time series is consistent across either of these approaches, the latter should be more computationally efficient, as the search for correlations within the time series should be proportional to the dimensionality of the time series. Unfortunately many suitable prediction frameworks are only designed for 1-dimensional time series, requiring the problem to be further decomposed, and to treat each sensor and frequency band as producing its own, distinct time series. While taking such an approach does prevent the prediction framework from leveraging information from neighbouring sensors or correlated frequency bands, it does significantly increase the range of approaches that can be applied to the task of predicting the spectral occupancy. While this lost information may be significant in some sensor network designs, as the spacing between sensors is increased the correlations will likely decrease, lending support for this decomposition approach.
Fig. 3 Example of the information available for on-sensor and off-sensor predictions. Here the 10 time series provide historical information up to the red dotted line, and the prediction methodologies need to predict all subsequent points in the time series
On-sensor predictions can be considered as a set of predictions f_{i,j}(t), for i ∈ m and j ∈ n, where each f_{i,j}(t) can be considered independent of every other sequence. It must be noted here that while this framework implies that each sensor is spatially stationary at x_i across time, it is also possible to consider moving sensors under this framework. However, as many of these techniques are unable to incorporate contextual information (like position) into their temporal predictions, it must be expected that prediction quality will likely significantly decrease as the speed of motion increases, and as that motion becomes less deterministic and patterned.
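The decomposition into independent per-sensor, per-band series can be sketched as follows. The code splits a nested list of observations, indexed by time, sensor, and band, into one-dimensional series and applies a one-dimensional forecaster to each in turn. The data layout, function names, and the use of a persistence (last value) forecaster as a stand-in are all assumptions made for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

Forecaster = Callable[[Sequence[float]], float]


def persistence(series: Sequence[float]) -> float:
    """Placeholder forecaster: predict that the last value persists."""
    return series[-1]


def forecast_per_channel(
    observations: Sequence[Sequence[Sequence[float]]],
    forecaster: Forecaster = persistence,
) -> Dict[Tuple[int, int], float]:
    """Forecast each (sensor, band) series independently.

    observations[t][i][j] is the occupancy at time step t, sensor i,
    and frequency band j.
    """
    n_sensors = len(observations[0])
    n_bands = len(observations[0][0])
    forecasts: Dict[Tuple[int, int], float] = {}
    for i in range(n_sensors):
        for j in range(n_bands):
            series = [step[i][j] for step in observations]
            forecasts[(i, j)] = forecaster(series)
    return forecasts


# Hypothetical history: 3 time steps, m = 2 sensors, n = 2 bands.
history = [
    [[0.1, 0.5], [0.2, 0.6]],
    [[0.2, 0.4], [0.3, 0.7]],
    [[0.3, 0.3], [0.4, 0.8]],
]
forecasts = forecast_per_channel(history)
print(forecasts)
```

Any one-dimensional forecaster with the same signature could be substituted for the placeholder, which is the practical appeal of the decomposition despite the cross-sensor information it discards.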
Off-sensor predictions can be considered either as a distinct task, or as a sequentially dependent extension of on-sensor predictions, in which the on-sensor predictions are extended across the spatial domain. The simplest approach for such extensions would be to use interpolation approaches to construct a geospatial representation of spectral occupancy across a domain of spatial interest.
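As one concrete instance of such an interpolation approach, the sketch below uses inverse distance weighting to estimate occupancy at an unobserved location from the values at nearby sensors. Inverse distance weighting is chosen here purely for its simplicity and is not singled out by the reviewed works; the sensor positions, occupancy values, and distance exponent are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def idw_interpolate(
    sensor_xy: Sequence[Point],
    sensor_values: Sequence[float],
    query_xy: Point,
    power: float = 2.0,
) -> float:
    """Inverse-distance-weighted estimate at query_xy.

    Each sensor contributes with weight 1 / distance**power, so nearer
    sensors dominate the estimate.
    """
    if len(sensor_xy) != len(sensor_values) or not sensor_xy:
        raise ValueError("need one value per sensor and at least one sensor")
    weight_sum = 0.0
    weighted_total = 0.0
    for (sx, sy), value in zip(sensor_xy, sensor_values):
        distance = math.hypot(query_xy[0] - sx, query_xy[1] - sy)
        if distance == 0.0:
            return value  # query coincides with a sensor location
        weight = 1.0 / distance ** power
        weight_sum += weight
        weighted_total += weight * value
    return weighted_total / weight_sum


# Hypothetical on-sensor occupancy predictions at three locations.
sensor_positions = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
sensor_occupancy = [0.2, 0.8, 0.5]
estimate = idw_interpolate(sensor_positions, sensor_occupancy, (1.0, 1.0))
print(estimate)
```

An interpolator of this kind ignores terrain and the built environment entirely, so it is best read as a baseline against which more context-aware off-sensor techniques might be compared.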
A complicating factor for both on- and off-sensor predictions is the behaviour of the sensors involved. As was discussed in Sect. 3, some sensor designs, particularly wideband sensors, capture only a subset of the frequency domain at any one time, as they need to perform a sweep of frequency space to capture the totality of spectral data. This results in a time series sequence that is inherently sparse. Depending upon the properties of the sensor, it may be possible to construct a continuous sequence by down-sampling from the sensor frequency. However, if this is not possible, then even on-sensor predictions may need to be considered using techniques more suitable for off-sensor problems, due to the inherent need for data imputation.

On-sensor predictions
Forecasting Spectral Occupancy can take numerous forms, depending upon both the nature of the requirements, and the resources available to support the prediction. In applications where the measurement of the spectral occupancy occurs at a single sensor, or where the sensors are independent and do not communicate with each other, the prediction task is well suited to the wide range of one-dimensional time series forecasting frameworks. Analogous problems have been successfully considered through statistical methods, ranging from linear extrapolation and sequential models like exponential smoothing (Gardner Jr 2006), through to more complex frameworks like ARIMA and Hidden Markov Models. Machine Learning offers another potential avenue for considering these one-dimensional forecasting problems, with Recurrent Neural Networks and Long Short-Term Memory Networks showing particular promise.
The performance of these systems can be improved for systems of multiple sensors, if the sensors are able to pass information between one another. In this case, the system becomes analogous to a multidimensional time series problem, of a form that is well suited for machine learning prediction tasks. The remainder of this subsection will explore suitable techniques for single- and multi-dimensional time series predictions, following the taxonomy of Fig. 4, with particular emphasis upon techniques that have been applied to analogous problem spaces.
Fig. 4 Taxonomy of regression algorithms for on-sensor wireless spectrum occupancy prediction

Statistical methods for temporal prediction
Historically, Bayesian models have frequently been used for time series prediction problems from varying domains, including spectral processes (Eltom et al. 2018a). The depth of a well-fit Bayesian explanatory model allows for explicit claims about the generative process that actually created the observation. Hence, the estimated latent generative model can synthesise observations based on the conditional probability relationship between the observations and latent features. However, the downside of model richness is reduced scalability and notably higher computational cost. Furthermore, such generative models often underperform their competitors if the conditional relationships between model features only partially approximate the underlying generative process (Yu and Deng 2010).
Mixture models are hierarchical models that combine the probability distributions of observation subsets to generate a hybrid distribution. Bayesian mixture models then perform inference by incorporating the mixture likelihood with prior distributions for both the component hyper-parameters and the mixture probabilities. A common extension to Bayesian mixture models is to assume the latent parameters defining the mixture component identities are connected in a Markov chain, rather than being independent identically distributed random variables. The resulting models are sequential hierarchical models such as Hidden Markov models, Kalman filters and particle filters (Arulampalam et al. 2002). Another extension is Bayesian nonparametrics (Müller and Quintana 2004), in which priors are placed upon the number of components through methods like Dirichlet point processes in order to improve accuracy.
A Bayesian Markov-based model for temporal spectrum prediction is essentially a doubly stochastic tracking problem in the time dimension. Within a doubly stochastic model the distribution of the random variable of interest is assumed, or observed, to be a standard, pre-defined distribution, and the related parameters of the distribution itself are assumed to be time varying random variables. Through this framework, numerous models have been proposed for modelling aspects of spectrum utilisation, including the Poisson distributed Markov chain models proposed by Bayhan and Alagöz (2012); the Hidden Markov Models of Eltom et al. (2018b); two-state (Csurgai-Horváth and Bito 2011) and three-state discrete-time Markov Models (Bayhan and Alagöz 2012); and higher-order Markov chains (Li et al. 2010).
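As a minimal sketch of the simplest member of this family, the following fits a two-state (idle or busy) discrete-time Markov chain to a binary occupancy history and propagates it forward. The function names, the toy history, and the uniform fallback for unvisited states are illustrative assumptions rather than the formulation of any of the cited works.

```python
def fit_two_state_chain(occupancy):
    """Estimate the 2x2 transition matrix of a chain with states 0 (idle) and 1 (busy)."""
    counts = [[0, 0], [0, 0]]
    for prev, nxt in zip(occupancy, occupancy[1:]):
        counts[prev][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: fall back to a uniform row if a state was never visited.
        matrix.append([c / total for c in row] if total else [0.5, 0.5])
    return matrix


def predict_busy_probability(matrix, current_state, steps):
    """Probability that the channel is busy `steps` time slots ahead."""
    dist = [0.0, 0.0]
    dist[current_state] = 1.0
    for _ in range(steps):
        dist = [
            dist[0] * matrix[0][0] + dist[1] * matrix[1][0],
            dist[0] * matrix[0][1] + dist[1] * matrix[1][1],
        ]
    return dist[1]


# Hypothetical occupancy history for one sensor and one frequency band.
history = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0]
P = fit_two_state_chain(history)
p_busy_next = predict_busy_probability(P, history[-1], steps=1)
p_busy_later = predict_busy_probability(P, history[-1], steps=5)
```

As the horizon grows, the predicted probability approaches the stationary distribution of the chain, which is one reason such models are usually applied to short-term prediction.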
In contrast to the aforementioned approaches, Gaussian models utilise a Kalman filter, either directly for linear models, or through a sub-optimal variant for nonlinear models. These variants include, but are not limited to, the extended Kalman filter and the unscented Kalman filter. More general nonlinear models can also be constructed using particle filter methods and Markov Chain Monte Carlo (MCMC) approximations to construct the conditional posterior probability for the relationship between the latent variables and the observations (Kobayashi et al. 2011).
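For the linear Gaussian case, the predict and update cycle of the Kalman filter can be sketched in a few lines. This is a scalar illustration only: the random-walk state model, the variance values, and the example readings (nominally received power in dBm) are assumptions chosen for the example, not parameters taken from the cited literature.

```python
def kalman_filter_1d(measurements, process_var, measurement_var,
                     initial_mean=0.0, initial_var=1.0):
    """Scalar Kalman filter for a random-walk state observed with Gaussian noise.

    Returns a list of (posterior mean, posterior variance) pairs, one per measurement.
    """
    mean, var = initial_mean, initial_var
    estimates = []
    for z in measurements:
        # Predict: a random walk keeps the mean and inflates the uncertainty.
        var = var + process_var
        # Update: blend the prediction and the measurement using the Kalman gain.
        gain = var / (var + measurement_var)
        mean = mean + gain * (z - mean)
        var = (1.0 - gain) * var
        estimates.append((mean, var))
    return estimates


# Hypothetical noisy power readings at a single sensor.
readings = [-71.2, -69.5, -70.4, -70.9, -69.8, -70.1]
estimates = kalman_filter_1d(readings, process_var=0.01, measurement_var=1.0,
                             initial_mean=readings[0], initial_var=1.0)
```

The extended and unscented variants replace the linear predict and update steps with approximations of a nonlinear model, but follow the same two-stage structure.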
In contrast to the aforementioned stochastic methods, another popular framework for deterministic modelling is the Autoregressive Moving Average (ARMA) model, and its associated techniques. These approaches assume that the future behaviour of a time series is a product of a weighted sum of past observations, and an additional set of random contributions that cannot be observed. ARMA is particularly appealing for time series modelling, due to both its simple implementation and its ease of interpretation, as, in contrast to machine learning models, the output state is not a black box. ARMA variants have been widely used across a range of time series prediction tasks, ranging from predicting short-term stock price movements (McLeod and Zhang 2008) and retail sales behaviour (Ramos et al. 2015) to residential electricity consumption forecasting (Badrinath Krishna et al. 2016). This level of support for the technique is driven by the simplicity of ARMA, on both a conceptual and computational level. This simplicity has allowed ARMA to be generalised to multivariate time series, of the form likely to be seen within the spectrum allocation context, through Vector Autoregression (VAR) and the Vector Autoregressive Moving-Average (VARMA) model (Lütkepohl 2006).
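The autoregressive half of this idea can be illustrated with a first-order model, x_t = c + phi * x_{t-1} + e_t, fitted by ordinary least squares. This is a deliberately reduced sketch: the moving-average terms are omitted, the order is fixed at one, and the noise-free toy series is an assumption chosen so that the fitted parameters are easy to check by hand.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1} to a sequence of numbers."""
    x = series[:-1]
    y = series[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    phi = cov / var
    c = mean_y - phi * mean_x
    return c, phi


def forecast_ar1(c, phi, last_value, steps):
    """Iterate the fitted recursion forward from the last observed value."""
    out = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        out.append(value)
    return out


# Toy series generated by x_t = 2 + 0.5 * x_{t-1}, without noise.
series = [10.0]
for _ in range(20):
    series.append(2.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
forecast = forecast_ar1(c, phi, series[-1], steps=3)
```

The fitted coefficients can be read directly as the weight given to the previous observation and the constant offset, which is the interpretability advantage noted above.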
However, it must be noted that ARMA-like models are limited by their reliance upon regularly sampled data which must also be stationary. The stationarity condition requires that the statistical properties of the modelled process be constant with time, which is not necessarily true for spectral occupancy, which can have seasonalities and trends on time scales ranging from days to years. To account for this, the time series can be repeatedly differenced until the resulting sequence can be considered stationary, producing what is known as an Autoregressive Integrated Moving Average (ARIMA) model. The influence of seasonalities can also be explicitly incorporated through a seasonal ARIMA (SARIMA) (Chatfield 1980, Chap. 4), the idea of which has been extended further by Facebook's Prophet algorithm (Taylor and Letham 2017), which is a variant of the Generalised Additive Model (GAM) that attempts to decompose the time series into components corresponding to the trend, the seasonality and the influence of holidays (Harvey and Peters 1990). In contrast to ARIMA-like models, Prophet has been designed to manage irregularly spaced data, subject to the assumption that this data has an underlying continuity. By adding additional features, non-stationary traffic processes, such as trends in vehicular movement (Williams 2001; Jiang 2002) and freeway traffic speeds (Chandra and Al-Deek 2009), have been modelled using extensions of ARIMA. However, even in these works, the autoregressive core of ARIMA still limits these techniques to data sets in which the mean and variance of the underlying signal do not change with time.
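The "integrated" step of ARIMA amounts to differencing the series before fitting, and cumulatively summing the forecasts afterwards. A minimal sketch, with hypothetical function names and a toy trending series, is:

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undifference(last_value, differenced_forecast):
    """Invert one level of differencing by cumulative summation."""
    out = []
    value = last_value
    for step in differenced_forecast:
        value += step
        out.append(value)
    return out


# A series with a quadratic trend becomes constant after two differences.
trend = [3, 5, 8, 12, 17]
once = difference(trend)            # [2, 3, 4, 5]
twice = difference(trend, order=2)  # [1, 1, 1]
restored = undifference(trend[-1], [6, 7])  # [23, 30]
```

Each level of differencing shortens the series by one sample, so the order is normally kept as low as stationarity allows.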
Another concern with ARIMA-like methods is their ability to resolve discontinuities and shocks (Christiano et al. 2006; Chari et al. 2008; Kascha 2012). In a signals context such behaviours are visible at very short time scales, as devices cease or begin activity; and at very long time scales, where aggregate features change, as would be the case when the observation window is too short to capture, for example, a city's transition between day and night.
Decision trees are another viable technique for exploring the relationship between observations, and their dependent outputs, in a manner that is suitable for understanding the factors that drive changes in the observed spectral environment. Such trees are hierarchically structured topologies of branches connecting decision points, connected by nodes representing bifurcations in the state after the decision point (Rokach and Maimon 2008). In aggregate, the unfolding set of decision nodes produces a model that can readily be interpreted, in contrast to many other nonlinear techniques.
A powerful approach for constructing regression trees can be found in Gradient boosting, with a variant known as eXtreme Gradient Boosting (XGBoost) (Chen and Guestrin 2016), which produces an ensemble of models that, in aggregate, exhibit strong convergence properties while preserving scalability and minimising over-fitting. The potential for XGBoost to predict spectral occupancy at a receiver is underscored by its success in other, similar domains, including electricity demand (Ben Taieb and Hyndman 2014) and oil price forecasting (Zhou et al. 2019).
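The core boosting loop can be sketched with depth-one regression trees (stumps) fitted to the residuals of the running ensemble. This is plain gradient boosting under squared error, not XGBoost itself, which adds regularisation and second-order gradient information; the single input feature (nominally hour index) and the step-shaped occupancy data are illustrative assumptions.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature under squared error.

    Assumes `xs` contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosted_stumps(xs, ys, rounds=50, learning_rate=0.3):
    """Each round fits a stump to the current residuals and adds a damped copy."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + sum(learning_rate * (left if x <= threshold else right)
                      for threshold, left, right in stumps)


# Hypothetical occupancy fraction that jumps from 0.1 to 0.9 at hour index 5.
hours = list(range(10))
occupancy = [0.1 if h < 5 else 0.9 for h in hours]
model = fit_boosted_stumps(hours, occupancy)
```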

Machine learning methods for temporal prediction
Artificial Neural Networks (ANNs) are a family of computational techniques for machine learning that draw upon ideas from mathematical biology. These tools can be used to predict, classify, or cluster data, through what are essentially large systems of nonlinear equations, subject to sets of free parameters. By exposing the model to data, the underlying parameter space can be refined, so that the nonlinear system of equations begins to reflect the data in a manner that allows for the future state of the system to be predicted.
One primary advantage of machine learning type approaches is their ability to incorporate contextual information. This can include, but is not limited to, device mobility data, or information from nearby sensors or correlated frequency bands. In doing so, we can generalise Eq. 2 to take the more flexible form where g represents the fusion of spectral-spatio-temporal contextual data. While the synthesis of any such information into the predictive framework does not guarantee increased prediction accuracy, through careful design it should be possible to improve the generalisability of the constructed predictions.

RNNs
Recurrent Neural Networks (RNNs) are a family of neural networks which are ideally suited to problems involving sequentially structured data. While traditional feedforward neural networks can capture temporal dynamics (Chen et al. 2021), they are often limited to sequences where the input to each node must be of a fixed length. In contrast, RNNs encode cyclical structural dependencies, which are more appropriate for capturing the sequential nature of time series data. These cyclic structures also allow for parameter sharing to occur, as a number of distinct nodes will all be subject to the same parameters, allowing the model to efficiently learn correlations between elements of a sequence. Furthermore, because a node with a self-loop may be unfolded an arbitrary number of times, there is no fundamental restriction placed upon the length of the input sequence to the node. This property is of particular relevance to time series problems, as it allows a single model to be constructed for time series with different lengths.
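The unfolding and parameter sharing described above can be made concrete with a single scalar recurrent unit. The sketch below performs only a forward pass; the weights are arbitrary illustrative values rather than trained parameters, and the tanh activation is an assumption of the example.

```python
import math


def rnn_forward(sequence, w_in, w_rec, bias, w_out):
    """Forward pass of a one-unit recurrent network over a sequence of any length."""
    hidden = 0.0
    for x in sequence:
        # The same three parameters are reused at every time step (parameter sharing).
        hidden = math.tanh(w_in * x + w_rec * hidden + bias)
    return w_out * hidden


# The same model is applied, unchanged, to sequences of different lengths.
short_output = rnn_forward([0.2, 0.4], w_in=1.0, w_rec=0.5, bias=0.0, w_out=1.0)
long_output = rnn_forward([0.2, 0.4, 0.1, 0.9, 0.7], w_in=1.0, w_rec=0.5, bias=0.0, w_out=1.0)
```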
Beyond these cyclical structural elements, RNNs can also be constructed in a fashion that allows them to capture: sequential inputs with non-sequential outputs (or vice versa); cases where the input and output sequence are of different lengths (as seen in encoder-decoder architectures); cases where there are output-to-hidden loops instead of hidden-to-hidden loops (teacher forcing); and bidirectional structures, in which connections extend both forward and backwards in time.
Due to the flexibility with which RNNs can be constructed, they have been successfully applied to problems with crossover to the task of spectral prediction by Yasdi (1999), Van Lint et al. (2002), and Song et al. (2016), who all considered problems stemming from traffic dynamics and human mobility. Other comparable problem domains have included predicting wireless user activity (Agarwal et al. 2016) and detecting anomalous behaviours in radio networks (O'Shea et al. 2016; Katzef et al. 2020, 2021, 2022).

LSTMs
While RNN architectures are able to conduct time series predictions, they often struggle as the time scale of interdependence increases, as the training process struggles to update weights as the sequence length increases (Bengio et al. 1994). To resolve this, Long Short-Term Memory (LSTM) cells have been introduced as hidden units, as an alternative to RNN cells. These cells take the structure of an RNN cell, subject to the addition of a logical element known as a forget gate (Hochreiter and Schmidhuber 1997). This gate, which is unique to LSTMs, takes the form of a vector of numbers between 0 and 1 that depend upon the input and previous state through trainable parameters, and that focuses attention on subsets of the previous state. Through the addition of these gates, LSTMs are able to learn which parts of the input and previous state to keep or forget, and to what degree, which allows for modelling complex and arbitrarily long dependencies, beyond what RNNs are capable of.
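The role of the gates can be seen in a single step of a scalar LSTM cell. The sketch below uses the commonly presented formulation with forget, input, and output gates; the parameter names and the two hand-picked parameter sets are illustrative assumptions, chosen so that the forget gate is either almost fully open or almost fully closed.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell; `p` maps parameter names to floats."""
    forget = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # how much old state to keep
    accept = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # how much new input to admit
    expose = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # how much state to output
    candidate = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])
    c = forget * c_prev + accept * candidate
    h = expose * math.tanh(c)
    return h, c


# Illustrative parameters: every weight is zero, so only the biases set the gates.
remember = dict(wf=0.0, uf=0.0, bf=20.0, wi=0.0, ui=0.0, bi=-20.0,
                wo=0.0, uo=0.0, bo=0.0, wc=0.0, uc=0.0, bc=0.0)
forget_all = dict(remember, bf=-20.0)

_, kept_state = lstm_step(x=1.0, h_prev=0.0, c_prev=0.8, p=remember)
_, cleared_state = lstm_step(x=1.0, h_prev=0.0, c_prev=0.8, p=forget_all)
```

With the forget gate close to 1 the cell state passes through almost unchanged, which is the mechanism that lets information persist across long sequences.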
Because of this, LSTMs have quickly become the most commonly utilised technique for regression and classification tasks in a time series context. Within a spectrum prediction context, several recent works have demonstrated that LSTMs can successfully predict temporal problems from the dynamic spectrum allocation space (Radhakrishnan et al. 2021b), in a fashion that exhibits lower computational complexity than prior approaches (Radhakrishnan and Kandeepan 2020; Radhakrishnan et al. 2021a). When considering the broader set of problem spaces with similar dynamics to those seen within dynamic spectrum allocation, LSTMs have also been shown to be accurate for predicting nonlinear vehicular traffic dynamics (Zhao et al. 2017b), weather, precipitation (Shi et al. 2015), and energy consumption (aggregate and single-meter) (Jian et al. 2017a).
LSTMs can also be used in conjunction with what are known as Deep Belief Networks (DBNs), which are a form of Bayesian neural network. Such Bayesian models attempt to predict not just a value, but its probability distribution. These stacked models have been shown to be successful for developing Bayesian nonlinear high dimensional models where other Bayesian models like HMMs have struggled (Melchior et al. 2017). For spectrum occupancy modelling, DBNs in conjunction with Long Short-Term Memory networks have the potential to capture the inherent inter-dependencies between the spatial, temporal and spectral features of the channel states.
Another potential avenue for augmenting LSTMs can be found in Physics-Guided Neural Networks (Karpatne et al. 2017). Such approaches attempt to regularise the behaviour of the solution through a loss function that considers a basic, physical model of the underlying system behaviour. For the purposes of modelling Spectral Occupancy, such a model could be used to increase the effective spatial density of sensor networks, by approximating the distribution of spectral occupancy in their neighbourhoods in terms of known, physics based relationships.

Attention mechanisms
Attention mechanisms (Bahdanau et al. 2014) have recently grown in popularity, especially within the Natural Language Processing community, for their ability to create complex, representative models by calculating the relationship between different positions within sequential data. This mechanism works by constructing a matrix between the input and output sequences, which can be thought of as a learned representation of the correlation between each element of the input sequence and each element of the output sequence. Within a Natural Language Processing context, if the input sequence is a sentence in one language, and the output the same sentence translated into another language, then the correlations measure how important an individual word is to the output translation, and how its context also affects the translation.
Through this approach, attention mechanisms are able to better capture long-range dependencies, which are particularly difficult to represent within RNN and LSTM architectures. Similar to RNNs and LSTMs, Attention mechanisms have often been embedded within encoder-decoder architectures (Luong et al. 2015), although recent work has suggested that Attention mechanisms perform best when embedded in a network architecture known as a Transformer, in which the model behaviour is entirely driven by the Attention mechanism itself (Vaswani et al. 2017).
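The matrix relating input and output positions can be sketched with scaled dot-product scoring, the form used within Transformers; the additive scoring originally proposed by Bahdanau et al. differs in how the scores are computed. The vectors below are hand-written illustrative values, whereas in a trained model the queries, keys, and values are learned projections of the data.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Return (weights, outputs); weights[i][j] relates output position i to input position j."""
    scale = math.sqrt(len(keys[0]))
    weights, outputs = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * v[d] for w, v in zip(row, values))
                        for d in range(len(values[0]))])
    return weights, outputs


# Two input positions; the single query is aligned with the first key.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0], [20.0]]
weights, outputs = attention([[5.0, 0.0]], keys, values)
```

Because every output position is scored against every input position, the weight matrix grows with the product of the two sequence lengths.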
While the development of Attention mechanisms has primarily been focused upon sequential data, like temporal sequences, recent work has also demonstrated their promise for spatial data (Xu et al. 2015). This in turn suggests that Attention mechanisms may hold promise for the spectral-spatio-temporal aspect of forecasting Spectral Occupancy, either directly or as part of a broader modelling framework. However, it must be noted that the ability of Attention mechanisms to scale with higher-dimensional data remains an open question.

Convolutional networks
Among the most commonly utilised machine learning techniques are Convolutional Neural Networks (CNNs), which utilise convolution operations to leverage structural dependencies in regular data to quickly generate accurate representative models. Unlike the techniques discussed to this point, a CNN can be thought of as acting not directly upon the data itself, but rather upon a filtered form of said data. In the context of an image, a filter could be thought of as taking a subset of, blurring, or down-sampling the image, or some combination of the aforementioned operations. A neural network that is built in terms of such filtered components reduces the cost of discovering the inter-dependency of the individual components of the neural network, while simultaneously improving convergence properties in a manner that minimises the overall computational cost (Goodfellow et al. 2016, Chap. 9). Such networks have produced successful results in a range of domains, including predicting wireless interference (Schmidt et al. 2017) and human mobility (Zhang et al. 2016b).
Typically, CNNs are employed in image-based problems, due to the inherent regularity and translation invariance of pixel structured data (Zhang et al. 1990). However, in occupancy data similar regularity can be constructed through careful design of sensor networks. Moreover, CNNs have been applied to spatiotemporal processes by simply adding time as an additional matrix dimension, which can introduce underlying structural seasonalities that are well suited to CNNs (Pyrkov et al. 2018), making them potentially appropriate for both sequential and parallel spatiotemporal prediction. Through this, CNNs have been successfully employed in predicting wireless interference (Schmidt et al. 2017) and human mobility (Zhang et al. 2016b).
One promising variant for parallel spectral occupancy prediction can be found in the WaveNet (Oord et al. 2016) variant of CNN, in which dilated convolutions are applied in an autoregressive manner through the introduction of an offset stride between temporal components. Applications of WaveNet to date have primarily been in audio prediction; however, such data shares structural similarities with the data involved in constructing predictions across a range of frequency bands and locations across the spectral-spatio-temporal domain.
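The effect of the dilation offset can be illustrated with a causal one-dimensional convolution whose taps are spaced `dilation` samples apart. The sketch below uses fixed, hand-chosen kernels rather than learned ones and omits the gating and residual connections of the full WaveNet architecture, so it shows only how stacking dilated layers widens the span of past samples that influence each output.

```python
def dilated_causal_conv(series, kernel, dilation):
    """y[t] = sum_k kernel[k] * series[t - k * dilation], with zeros before the start."""
    out = []
    for t in range(len(series)):
        total = 0.0
        for k, weight in enumerate(kernel):
            index = t - k * dilation
            if index >= 0:  # causal: only current and past samples are used
                total += weight * series[index]
        out.append(total)
    return out


# Stack three layers with dilations 1, 2 and 4 and a two-tap kernel.
impulse = [1.0] + [0.0] * 9
response = impulse
for dilation in (1, 2, 4):
    response = dilated_causal_conv(response, kernel=[1.0, 1.0], dilation=dilation)

# Count how many output positions are influenced by the single input sample.
receptive_field = sum(1 for value in response if value != 0.0)
```

With dilations doubling at each layer, the receptive field grows exponentially with depth while the number of parameters grows only linearly.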

Graph convolutional networks
In the case where data cannot be expressively discretised upon a regular grid, the performance of CNNs begins to degrade. In response to this, a substantial volume of recent research has focused upon extending convolution operators onto less regularly structured data, through the use of graph networks. These networks often utilise spectral graph theory (Bruna et al. 2013), and construct the convolution operator through a Fourier transform on the graph. This Fourier transform is in turn defined through a generalisation of the Laplacian operator to a graph-based Laplacian, and has been used across a broad range of problem domains including medical classification (Parisot et al. 2017), human activity recognition (Shi et al. 2018), and traffic forecasting (Yu et al. 2017; Hermes et al. 2022). Extensions to the concept of spectral graph networks can be found in localised spectral graph convolution (Bruna et al. 2013), and multi-scale graph convolution (Abu-El-Haija et al. 2018), which allow for computational efficiency improvements by avoiding having to perform large scale Laplacian eigendecomposition.
However, due to the inherently non-local nature of spectral graph theory, interpreting the convolution operations of such networks can be challenging. This is particularly vexing for some problem domains, like traffic or spectral occupancy forecasting, where interpreting the learned spatial dependencies can provide additional insight. An alternate approach to spectral graph theory can be found in spatial convolutional graph networks, which construct localised convolution operations across the nodes of the graph, where the magnitude of the dynamic edge weighting can give insight into the predicted dynamics (Cui et al. 2018). In the context of a deployed multi-sensor environment, such insight could be used to inform the redeployment of sensor nodes, or the prioritisation of communication between nodes.
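The localised operation underlying spatial graph convolution can be sketched as a weighted average over each node's neighbourhood. The example below shows the aggregation step only, with fixed edge weights supplied by hand; in a trained network the edge weights and an accompanying feature transformation would be learned. The three-sensor line graph and all names are assumptions made for illustration.

```python
def graph_conv(features, weighted_edges, self_weight=1.0):
    """One neighbourhood-averaging step over an undirected, weighted sensor graph.

    `features` holds one scalar per node; `weighted_edges` maps (i, j) to a weight.
    """
    totals = [self_weight * f for f in features]
    norms = [self_weight] * len(features)
    for (i, j), weight in weighted_edges.items():
        # Undirected edge: each endpoint receives the other's feature.
        totals[i] += weight * features[j]
        norms[i] += weight
        totals[j] += weight * features[i]
        norms[j] += weight
    return [total / norm for total, norm in zip(totals, norms)]


# Three sensors in a line; only the first currently observes occupancy.
occupancy = [1.0, 0.0, 0.0]
edges = {(0, 1): 1.0, (1, 2): 1.0}
smoothed = graph_conv(occupancy, edges)
```

Because each output depends only on a node and its direct neighbours, the size of an edge weight can be read as the strength of the dependency between the two sensors it connects.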

Off-sensor predictions
Extending from on-sensor to off-sensor predictions greatly complicates the task of finding analogous problem domains. This in turn increases the complexity of assessing the applicability of extant techniques to the problem at hand. The closest problem domain is numerical weather prediction, in which a finite set of weather stations are used to predict future weather across a broader geographic domain. However, even in this case the discrete sensors are augmented by satellite and radar imagery, which are able to capture the exact state across significant proportions of the spatial domain of interest.
Other problem domains that require similar spatiotemporal predictions include vehicular traffic forecasting, and both taxi and ambulance demand forecasting. In the case of traffic forecasting, the motivating data-sets are often recordings of motion through traffic loops on urban road networks. In such cases, while there is a geospatial relationship between the traffic loops, the nature of vehicular traffic means that there is no need to generalise the prediction across the whole problem domain. As such, traffic loop data, while sparse in terms of the physical space it represents, is a dense measurement of the behaviours of interest, and it is this density that makes it difficult to map solution techniques from this problem domain into a spectral sensing context.
While taxi and ambulance demand forecasting has a clear spatiotemporal context, the sparse nature of demand for these services means that data is aggregated into large geographic regions (ranging in size from several city blocks up to a suburb), with demand being assumed to be homogeneous over these regions of interest. This assumption of homogeneity introduces a problem similar to that observed with numerical weather prediction, namely that the characteristic distance scales of such a prediction task are potentially going to be far larger than would be needed to resolve spectral behaviour. Furthermore, the sparse and stochastic nature of demand for these services at any location is fundamentally different to the dense and nominally continuous nature of spectral usage. This again limits the direct applicability of techniques derived for taxi and ambulance demand forecasting to the task of spectral occupancy prediction.
As was alluded to previously, constructing these predictions across the geospatial domain can be performed through two distinct modelling approaches: parallel and sequential prediction. In parallel prediction, the spatial and temporal components (and potentially also the bandwidth) are considered simultaneously, as part of one single aggregate model; while sequential prediction would involve first constructing predictions forward in time, and then using those predictions in a separate, strictly spatial model to generalise the temporal predictions across the spatial domain. While an integrated, parallel prediction framework may seem intuitively appealing, the relative sparsity of appropriate techniques, and the different nature of occupancy as a function of space as compared to time, may lead to undesirable outcomes.
In contrast, sequential prediction allows techniques to be drawn from a wider suite and applied in the context, be it spatial or temporal prediction, that they are best suited to. At the most basic level, this could be simply adding spatial interpolation (in the form of nearest neighbour, linear, or nonlinear predictions) across the discrete sensor locations, in order to generalise the prediction across space. However, the sequential nature of these models introduces modelling complexity, in terms of how validation is performed upon the spatial layer.
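As one possible form of the spatial layer, the sketch below applies inverse distance weighting to values that are assumed to have already been predicted forward in time at each sensor. Inverse distance weighting is chosen here only as a simple nonlinear interpolant; the coordinates, values, and the `power` parameter are illustrative assumptions.

```python
import math


def idw_interpolate(sensor_xy, sensor_values, query_xy, power=2.0):
    """Inverse-distance-weighted estimate at `query_xy` from values at `sensor_xy`."""
    numerator = 0.0
    denominator = 0.0
    for (x, y), value in zip(sensor_xy, sensor_values):
        distance = math.hypot(query_xy[0] - x, query_xy[1] - y)
        if distance == 0.0:
            return value  # the query coincides with a sensor
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Temporal predictions (occupancy fractions) at two hypothetical sensor positions.
sensors = [(0.0, 0.0), (2.0, 0.0)]
predicted = [0.2, 0.8]
midpoint_estimate = idw_interpolate(sensors, predicted, (1.0, 0.0))
```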
To validate a temporal model at discrete locations, a subset of data (in time) is reserved from the training of any learnable parameters, and then the performance of the model is assessed against the reserved data. To assess the accuracy of the consolidated sequential model at predicting spectral occupancy, spatial predictions should be constructed in terms of predicted (not measured) temporal values at a subset S of the total receivers R (such that S ⊂ R), and then tested against the corresponding measured temporal values at R ⧵ S. In doing so, the relative accuracy and performance of different sequential spatio-temporal prediction frameworks can be quantitatively assessed and compared.
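This validation scheme can be sketched as follows, assuming a nearest-neighbour spatial layer and root-mean-square error as the score; both choices, along with the receiver coordinates and values, are placeholders for whichever interpolant and metric are under comparison.

```python
import math


def nearest_neighbour(known, query_xy):
    """Value at the known receiver closest to `query_xy`; `known` maps (x, y) to a value."""
    closest = min(known, key=lambda xy: math.dist(xy, query_xy))
    return known[closest]


def holdout_rmse(predicted_at_s, measured_at_rest):
    """Interpolate from temporal predictions at S and score against measurements at R minus S."""
    squared_errors = [
        (nearest_neighbour(predicted_at_s, xy) - measured) ** 2
        for xy, measured in measured_at_rest.items()
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Temporal-model predictions at the receivers in S (hypothetical values).
predicted_at_s = {(0.0, 0.0): 0.2, (10.0, 0.0): 0.8}
# Measurements at the held-out receivers in R minus S for the same time step.
measured_at_rest = {(1.0, 0.0): 0.3, (9.0, 0.0): 0.8}
rmse = holdout_rmse(predicted_at_s, measured_at_rest)
```

Repeating the calculation over several choices of S gives a score that is less sensitive to which receivers happen to be withheld.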

Data imputation
Treating off-sensor predictions has motivated a number of recent works in data imputation, in which the geospatial nature of data is used to provide contextual information that can be used to predict behaviours away from sensors. Techniques like nearest-neighbour or linear interpolation have successfully produced data reconstructions without introducing any trainable parameters (Batista and Monard 2003; Zhang 2012). While this significantly reduces the computational burden, on a design level this inflexibility limits the level of response that can be performed to predict domain specific features. To improve upon the accuracy of such approaches, trainable ensembles of interpolations have been shown to improve model accuracy across geospatial domains (Oke et al. 2002). This process has been shown to be an accurate framework for extending the geospatial coverage of weather and climate models (Sanderson et al. 2015). Another similar approach can be found in Kriging, due to its ability to better capture both dependencies and uncertainties across spatial data (Yang et al. 2018).
Drawing from statistical tools, data fusion has been successfully applied to predict the likelihood of occupancy (Eltom et al. 2016). Constructed coalition games (Zhou et al. 2017) and Bayesian non-parametric frameworks (Saad et al. 2012) have also been proposed for managing data fusion under a spectrum occupancy framework. The latter of these tracks the signal amplitude of primary users at each sensor node through a nonlinear particle filter, in order to generate a measure of correlated perception.
Monte Carlo methods (Ni et al. 2005) and Probabilistic Principal Component Analysis (Qu et al. 2009) can also be used to reconstruct the spatiotemporal data by assuming that there is some latent distribution that all points are drawn from. However, recently it has been shown that tensor-completion based mechanisms outperform matrix based mechanisms (Signoretto et al. 2011), leading to several applications of this process to spatiotemporal traffic prediction tasks (Chen et al. 2019). While tensor-completion approaches have been successfully applied to spatiotemporal data imputation, they are best suited to low levels of missing data. As the level of missing data is increased, it has been shown that Loess based Season and Trend Decomposition (Chaudhry et al. 2019) produces the most accurate data reconstruction (Li et al. 2020).

Neural processes
Neural Processes (Garnelo et al. 2018b) are a novel approach that attempts to bridge the space between deterministic neural networks and Bayesian methods (Garnelo et al. 2018a). In the context of functional approximation and regression, while deep neural networks have been shown to be highly accurate, they require access to a large training corpus for effective training, which needs to be repeated for each problem domain. While this can be somewhat mitigated by employing meta-learning, the computational cost is still significant (Erhan et al. 2009). In contrast, Gaussian Processes, while still computationally expensive, can leverage a specified prior distribution to reduce the inference cost. However, this of course relies upon the ability to construct an appropriate and accurate prior, which is a non-trivial task in and of itself (Ghosal et al. 2006). Neural Processes attempt to construct a neural approximation of a Gaussian Process for learning a posterior distribution. By doing so, accurate, probabilistic predictions can be constructed even when exposed to only a limited training corpus. One weakness of Neural Processes is their tendency to underfit data, although this has been broadly addressed by subsequent work, which incorporated attention mechanisms into the deterministic and probabilistic predictions (Kim et al. 2019). Inspired by Neural Processes, De Brouwer et al. (2019) have developed a method that can incorporate sparse, sporadically observed measurements to update predictions. Of particular interest to geospatial modelling of spectral occupancy is that the correlations between sequences are also learned, so an observation in one time series can update the prediction of all other series of interest. This would be of particular value in either off-sensor predictions, or predictions for sensors which are not able to record a measurement due to engineering or communications constraints.

Supporting technologies
The deployment of any successful system for spectral occupancy prediction requires not only the sensors and prediction techniques, but a framework for managing the deployment, processing and management of the data collected by these systems. Another factor of concern is that the successful development of any such system requires data to test and assess prediction and deployment frameworks. This would necessitate an unfeasible delay between the investment in physical deployment, and the actual deployment of the full spectral occupancy system, and may lead to an improper allocation of deployed assets.
To circumvent this, we propose that the development of such systems should be supported by the development of synthetic data sets that accurately replicate real world dynamics. In doing so, all aspects of the measurement and prediction framework can be tested prior to the investment in any physical capital, which should significantly decrease the risks of such a program. As such, this section will outline the tools and techniques required to support the deployment of predictive systems for spectral occupancy.

Deployment, processing, and data management
As was alluded to in Sect. 3, the ability to manage and process data to inform predictions of spectral occupancy is of crucial importance. Latencies in the delivery of data are of particular concern for both data processing, due to concerns around data consistency and potential overwrites, and the application of predictive techniques. These issues are particularly acute for Machine Learning algorithms, which are typically designed and trained under scenarios where information flows are predictable and consistently synchronous. In an attempt to resolve this concern, Delay Tolerant Machine Learning has recently been proposed, with authors focusing on preventing update issues between when information is provided and gradients are updated (Sun 2016), and the development of schemes that are robust to delays generated by scattered, asynchronous data sources (Mishchenko et al. 2018). However, while such an approach removes concerns relating to data consistency, and inherent latencies, it may also increase the cost of developing and deploying the sensor network, and introduce complexities in communicating domain information outside the domain sampled by an individual sensor.
A complementary approach may be found in down-sampling data, either by way of low-pass filtering strategies like the Exponential Moving Average (Haque et al. 2015), or by employing subsets of more complex decompositions (Wen et al. 2019). In doing so, stored data volumes will decrease, but the down-sampling process itself will likely introduce additional computational costs, and may prevent the development of real-time predictions, especially under certain predictive time scales.
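As a minimal sketch of the low-pass route, the following smooths a stream of readings with an Exponential Moving Average and then keeps every k-th value. The smoothing factor `alpha`, the decimation `factor`, and the example readings are illustrative assumptions rather than values drawn from the cited works.

```python
def exponential_moving_average(samples, alpha):
    """Low-pass filter a sequence with an exponential moving average.

    alpha in (0, 1] is an assumed tuning parameter: smaller values smooth
    more heavily, at the cost of a slower response to genuine changes.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = []
    state = None
    for x in samples:
        # Seed the filter with the first sample, then blend new and old values.
        state = x if state is None else alpha * x + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed


def downsample(samples, alpha, factor):
    """Smooth with an EMA, then keep every `factor`-th smoothed value."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return exponential_moving_average(samples, alpha)[::factor]


if __name__ == "__main__":
    # Hypothetical per-interval occupancy readings (1.0 = band in use).
    readings = [0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0]
    print(downsample(readings, alpha=0.5, factor=4))
```

The filter costs one multiply-add per sample, but any prediction built on the decimated output can only be refreshed once per `factor` raw samples, which is the real-time limitation noted above.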
Under either an offline or online framework, specialised hardware may be required to efficiently execute Machine Learning algorithms. These computational considerations are particularly important as machine learning typically has a computational cost that is orders of magnitude higher than classical signal processing techniques. As such, any utilisation of Machine Learning must be considered in context of the scalability of computational resources.
The ubiquity of CPU and GPU based systems has made them de rigueur for Machine Learning applications. Of these, GPU based implementations can perform machine learning tasks an order of magnitude faster than the equivalent CPU, at the cost of requiring 8 times as much energy (Zhao et al. 2017a). However, more specialised hardware has begun to gain prominence, with particular focus being placed upon Field-Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs) due to their ability to improve computational performance while minimising power consumption.
FPGAs, as their name suggests, can be used in multiple contexts, are re-programmable, and can even have parts of the chip reconfigured while other components are under load. In contrast, ASICs exhibit permanent circuitry, and are designed to be operated within a specific context. The design specificity allows ASICs to be more energy efficient than equivalent FPGA chips, however, this comes at a higher cost due to the inherent difficulties in developing and utilising a chip that has been produced for one specific purpose.
(Zhao et al. 2017a) found that FPGAs could be 2.5 times more powerful than CPUs for CNN inference using binary arithmetic, while simultaneously performing 50 times more calculations per watt of energy. This power can be explained by the ability of FPGAs to exploit pipeline parallelism, which allows for batches of multiple data-points to be considered at once, rather than individually (Lacey et al. 2016). Further advances in computational efficiency have been seen in CNNs (Zhang et al. 2016a), ResNets (Nurvitadhi et al. 2017), LSTMs, and Random Forest implementations (Nakahara et al. 2017) when employing an optimised pipeline, which in turn suggests that it may be possible to unlock further performance from FPGAs by considering device specific optimisation. The power of such optimisations is underscored in the work of (Wang et al. 2019), who produced results for LSTM time series predictions using an optimised FPGA that were between 20 and 50% better than previous work by (Ferreira and Fonseca 2016). These advantages may be further enhanced in the case of deep neural network architectures, although performance gains are not guaranteed, as care needs to be taken to manage the power budget and on-chip memory of FPGAs (Han et al. 2017). Recent work has demonstrated that careful optimisation of a single-FPGA design can yield real-time modulation classification with a latency of 8 µs and a throughput of 488k classifications per second (Tridgell et al. 2020).
Moving from FPGAs to ASIC based implementations can in turn introduce additional computational savings, at the expense of the design flexibility of FPGAs. ASICs have been shown to outperform CPU, GPU, and FPGA implementations in both deep neural networks (Srinivasan et al. 2019) and Gated Recurrent Units (a variant of RNN) (Nurvitadhi et al. 2016). These performance differences can be attributed to ASICs' inherently higher compute density and efficiency, although it must be noted that newer FPGAs will likely close the efficiency gap. ASICs have also been used to accelerate solutions to dense (Chen et al. 2014) and sparse deep neural networks (Albericio et al. 2016), with such accelerators potentially realising significant speedups while decreasing energy requirements (Zhang et al. 2016c).

Simulating spectral occupancy
While it is possible to collect the requisite data for predicting Spectral Occupancy, doing so at the scales required to assess the accuracy of predictive regimes would require a significant investment. However, this task is only one part of the overall design process, as once the data is collected tools would still be required to manage the prediction of future states of the spectral environment. Doing so would leave the measurement assets stranded with limited utility, until such time that the remainder of the design process had been completed. An alternative development approach could be to simulate the required data, and use it to inform the development of both the sensor networks, and test prediction regimes.
While such an approach would significantly decrease the initial investment required for such a scheme, it would still present significant hurdles. Any such model must take into account not just the behaviour of devices, but also human mobility. These interactions can quickly become highly complex, as human behaviour can be influenced by the interaction between devices and the spectral environment. As such, this section will cover the major facets of this predictive task, and outline tools that can be used in order to support the development of a comprehensive framework to support Dynamic Spectrum Allocation.

Empirical signal propagation models
Urban modelling of signal propagation can be traced back to, at least, the work of (Turin et al. 1972), who conducted experimental and statistical work on multi-path signal propagation of analogue and digital signals. Statistical models of urban environments typically classify the environment in terms of broad metrics that include the topographic variation, vegetation density, building density and height distribution, and density of regions of open area and water (Ibrahim and Parsons 1983).
Of the extant empirical models, the most commonly employed are the Okumura-Hata (Falli 1988), Walfisch-Ikegami (Low 1992a), SUI (Erceg et al. 2001), Longley-Rice (Longley and Rice 1968), and Log-Normal (Erceg et al. 1999; Walden and Rowsell 2005; Chrysikos and Kotsopoulos 2013) models, the features of which are summarised in Table 1, with the last of these coming into prominence recently due to its ease of fitting. Of these models, Longley-Rice is unique in that it relies less upon experimental measurements, and more on the incorporation of topographic maps, which are used empirically to estimate the geographic influence on large scale signal attenuation.
While the Okumura-Hata, Walfisch-Ikegami and SUI models are able to model signal propagation, they have limited applicability to signals in the modern urban context, due to the spatial length scales involved and the need to model transmitters located below the height of local buildings. The observed error is inversely proportional to the ratio between the transmitter height and the building height, so that the error increases as the transmitter height is decreased to, or below, the building height (Low 1992b), where diffraction around building edges and wall reflections become more dominant. These factors are further complicated by the density of transmitters in a modern urban environment.
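The ease of fitting attributed to the Log-Normal model can be illustrated with a short sketch. It assumes the standard log-distance form, in which the loss in dB grows linearly with the logarithm of distance and shadowing is a zero-mean Gaussian term in dB, so that fitting reduces to an ordinary least-squares line. The function names and parameter values are hypothetical.

```python
import math
import random


def log_distance_path_loss_db(d_m, pl0_db, exponent, d0_m=1.0,
                              shadow_sigma_db=0.0, rng=None):
    """Path loss in dB at distance d_m under a log-distance model.

    pl0_db is the loss at the reference distance d0_m, `exponent` is the
    path-loss exponent, and shadow_sigma_db is the standard deviation of the
    Gaussian shadowing term in dB (log-normal in linear units).
    """
    if d_m <= 0 or d0_m <= 0:
        raise ValueError("distances must be positive")
    mean_loss = pl0_db + 10.0 * exponent * math.log10(d_m / d0_m)
    if shadow_sigma_db == 0.0:
        return mean_loss
    rng = rng or random
    return mean_loss + rng.gauss(0.0, shadow_sigma_db)


def fit_log_distance(distances_m, losses_db, d0_m=1.0):
    """Least-squares estimate of (pl0_db, exponent) from measured losses."""
    xs = [10.0 * math.log10(d / d0_m) for d in distances_m]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(losses_db) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses_db))
    exponent = sxy / sxx
    return mean_y - exponent * mean_x, exponent


if __name__ == "__main__":
    rng = random.Random(0)
    distances = [10.0, 30.0, 100.0, 300.0, 1000.0]
    # Synthetic "measurements" generated from assumed parameters.
    losses = [log_distance_path_loss_db(d, 40.0, 3.2, shadow_sigma_db=6.0, rng=rng)
              for d in distances]
    print(fit_log_distance(distances, losses))
```

Because the fit needs only distance and loss pairs, it carries none of the building-height information whose absence limits the other empirical models in dense urban settings.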

Physics-based signal propagation models
To address these concerns, some simulations have resorted to reduced-order, physics inspired models, including simple radial propagation (where the signal is presumed to exist within a fixed radius R about the source), free-space and two-ray models. However, while such models are computationally efficient, they fail to account for the influence of local environmental conditions, and as such fail to capture physically realistic channel variability.
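The three reduced-order models named above can be written in a few lines, which makes clear both why they are cheap and why they are blind to the local environment. This is a sketch under textbook assumptions: isotropic antennas, a flat reflecting ground, and the usual far-field approximation for the two-ray model beyond its crossover distance. The function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def within_radius(d_m, radius_m):
    """Simple radial model: the signal exists only inside a fixed radius R."""
    return d_m <= radius_m


def free_space_path_loss_db(d_m, freq_hz):
    """Free-space (Friis) path loss between isotropic antennas, in dB."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * d_m / wavelength)


def two_ray_path_loss_db(d_m, freq_hz, h_tx_m, h_rx_m):
    """Two-ray ground-reflection loss in dB (far-field approximation).

    Inside the crossover distance 4*pi*h_tx*h_rx/wavelength the free-space
    value is returned, so the two branches meet continuously.
    """
    wavelength = SPEED_OF_LIGHT / freq_hz
    crossover_m = 4.0 * math.pi * h_tx_m * h_rx_m / wavelength
    if d_m < crossover_m:
        return free_space_path_loss_db(d_m, freq_hz)
    return 40.0 * math.log10(d_m) - 20.0 * math.log10(h_tx_m * h_rx_m)


if __name__ == "__main__":
    print(free_space_path_loss_db(1000.0, 2.4e9))
    print(two_ray_path_loss_db(2000.0, 900e6, h_tx_m=10.0, h_rx_m=1.5))
```

None of the inputs describe buildings, terrain, or materials: two links of equal length always receive the same loss, which is the lack of channel variability described above.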
To capture the influence of the built environment, Ray-tracing techniques were introduced to model the propagation of individual signals. Such models were initially restricted to the 2D plane (Valenzuela 1993), but have since been extended to 3D environments (Kim et al. 1999). Several authors have implemented ray-tracing models for vehicle-to-vehicle transmission under urban, suburban and rural conditions (Maurer et al. 2004), with (Pilosu et al. 2011) incorporating additional pre-processing in order to reduce the computational cost of calculating the attenuation between connected regions of interest. However, while these approaches improve the fidelity of the simulation, they still are fundamentally limited due to the nonlinear growth in computational complexity with the number of agents, and the number of buildings within the environment. As such, these models become computationally intractable as the domain of interest increases to scales of practical relevance. To address this, frameworks have been developed for constructing computationally efficient propagation models within dense urban environments, through what is known as Beam-tracing (Sridhara and Bohacek 2007). Such an approach can be considered a computationally efficient variant of Ray-tracing, that still allows for incorporating transmission, diffraction, reflection, scattering, channel-gain, and delay-spread within both interior and exterior environments. Of course, such a model requires knowledge of not just building locations and sizing, but also their interior and exterior materials, and while it is more computationally efficient than Ray-tracing, it still shares the same fundamental computational limitations as the size of the simulation region increases.
When considering large scale simulations, path loss propagation models have typically been constructed using simplified representations of the local geometry. Combining these geometric abstractions with pre-calculated statistical measures can minimise the computational cost of the ensuing calculations. An example of such a formulation can be found in the work of (Cheng et al. 2007), who simulated suburban environments using a Dual-Slope Log Distance formulation for long distance signal propagation, and a Nakagami model (Proakis and Salehi 2001) for small-scale signals. These Nakagami models leverage the Nakagami-m probability distribution (Nakagami 1960) in order to approximate the fading and spatial attenuation of wireless signals (Alouini and Goldsmith 2000). A similar Nakagami implementation was also considered by (Khan et al. 2009) for vehicular ad-hoc networks. This dual scale approach is becoming increasingly popular as a method to reduce the computational complexity of solving signal propagation, and has led to systems that scale linearly with the number of agents (Viriyasitavat et al. 2015). A broad taxonomy of these techniques is outlined in Fig. 5.
Fig. 5 Classification of spatial propagation models for wireless spectrum
While the geographic scale of city-scale environments is the primary driver of signal attenuation in urban environments, multiple other factors influence spectral propagation. The nature of the built topography is a secondary driver of attenuation, as well as a primary source of both shadowing and fading. Weather effects, especially rain, cloud cover, and smog, also influence signal attenuation through the introduction of additional scattering and diffraction (Jaruwatanadilok et al. 2004).
Under ideal circumstances, all of these factors would be accounted for in a model. However, there is presently no literature covering a comprehensive signals based model for the built environment that incorporates weather conditions, and as such, the accuracy and applicability of any model must be assessed in the context of the local weather conditions.
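The dual scale formulation described earlier in this section, with a Dual-Slope Log Distance term for large scale loss and Nakagami-m fading for small scale variation, can be sketched as follows. The sketch relies on the standard result that the power of a Nakagami-m envelope is Gamma distributed with shape m and mean Ω. All parameter values and function names are illustrative assumptions, not those used by (Cheng et al. 2007).

```python
import math
import random


def dual_slope_path_loss_db(d_m, pl0_db, n_near, n_far, breakpoint_m, d0_m=1.0):
    """Log-distance loss with exponent n_near up to the breakpoint, n_far beyond."""
    if d_m <= breakpoint_m:
        return pl0_db + 10.0 * n_near * math.log10(d_m / d0_m)
    loss_at_break = pl0_db + 10.0 * n_near * math.log10(breakpoint_m / d0_m)
    return loss_at_break + 10.0 * n_far * math.log10(d_m / breakpoint_m)


def nakagami_power_gain(m, omega=1.0, rng=random):
    """Sample an instantaneous power gain under Nakagami-m fading.

    The squared envelope follows Gamma(shape=m, scale=omega/m), so the
    mean gain is omega; m = 1 recovers Rayleigh fading.
    """
    if m < 0.5:
        raise ValueError("Nakagami shape m must be at least 0.5")
    # Guard against a zero draw so the dB conversion below stays finite.
    return max(rng.gammavariate(m, omega / m), 1e-300)


def received_power_dbm(tx_power_dbm, d_m, m, rng=random, **path_loss_params):
    """Combine large scale loss and a single small scale fading draw."""
    loss_db = dual_slope_path_loss_db(d_m, **path_loss_params)
    fading_db = 10.0 * math.log10(nakagami_power_gain(m, rng=rng))
    return tx_power_dbm - loss_db + fading_db


if __name__ == "__main__":
    rng = random.Random(0)
    print(received_power_dbm(20.0, 250.0, m=2.0, rng=rng,
                             pl0_db=40.0, n_near=2.0, n_far=4.0,
                             breakpoint_m=100.0))
```

Each link costs a constant amount of work that is independent of the number of buildings, which is consistent with the linear scaling in the number of agents reported for this family of models.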

Device mobility
Beyond the influence of the built and natural environment, a realistic simulacrum of the spectral environment must also take into account device mobility, which in turn is a product of human behaviour and vehicular patterns. While some abstracted, low-resolution data sets do exist (Barlacchi et al. 2015), there is a broad absence of real world data sets that accurately represent large scale spectral-spatio-temporal behaviour at high resolution. This in turn limits the ability to test techniques for measuring and predicting spectral occupancy, and requires models to be constructed. Moreover, constructing a representative simulacrum of such data is also a non-trivial task, due to the inherently complex and stochastic nature of human behaviour.
The most viable conceptual framework for attempting to construct such a model can be found within Agent Based Modelling (ABM), which attempts to construct rich, representative dynamics not by prescriptive modelling, but rather by establishing motivations for a suite of simulated agents, and then establishing the rules for how these agents can behave and interact. By then iteratively allowing the collective set of agents to evolve to optimise their individual behaviour, rich models full of emergent behaviour can transpire, that have been shown to reflect real world dynamics.
Computationally, the design of an ABM must balance the number of resolved agents and the spatial and temporal scales considered against the extant computational resources. Simulations over limited spatial domains, like public events (Batty et al. 2003) and indoor movement (Crooks et al. 2015), have been resolved to second-level time scales, whereas city-scale simulations are typically modelled over fundamental time scales at the minute (Crooks and Hailegiorgis 2014), hour (Groff 2007), day (Heppenstall et al. 2006), or even year long periods (Haase et al. 2010). By restricting the underlying scales to the smallest individual level of interest, additional resources are freed up for the generation of complexity in the agent dynamics.
Such models can be informed by integrating geographic and population level data, in order to refine the development of the models, and the behaviour of the individual agents. A common source of such data is census information, which can be accessed at resolutions that range from population wide to sub-suburb resolutions. This can also be combined with activity and time-use surveys, such as the UK Time Use Survey (Sullivan and Gershuny 2018) and the American Time Use Survey (ATUS) (Spencer and Aultman-Hall 2019) to inform features like the distribution of work start and end times, the likelihood of people going shopping on work days, and the amount of time spent on key tasks every day. Incorporating such data ensures that the fundamentals of the model are tied in with real world behaviours, without being overly prescriptive.
Further, more granular data can be found in physical footfall surveys, or geolocated and time stamped data from social media networks. Such Geographic Information System (GIS) data sets are particularly interesting, due to their ability to be collected in near-continuous time (Evans 2012). GAMA (GIS & Agent Based Modelling Architecture) and MOSAIIC are computational frameworks for merging GIS data with an ABM approach (Taillandier et al. 2019), and have been used for city-scale modelling. More broadly, Dynamic Data Assimilation (DDA) frameworks have been employed to refine agent based models (Ward et al. 2016), and to calibrate dynamical system models of urban crime (Lloyd et al. 2016) and traffic (Work et al. 2008). The fusion of ABMs and DDA has occurred using either Ensemble Kalman Filters (Ward et al. 2016), for which the ensemble is based upon the stochastic evolution of the underlying ABM and the updated data, or Deep Learning Networks (van der Hoog 2017). These data sets can, and should, be used not just for model development, but also to assess the veracity of the ABM as a proxy for real world dynamics. This can be achieved by conducting an exploratory data analysis on validation data sets that are distinct from the data used for training.
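To make the Ensemble Kalman Filter route concrete, the sketch below shows a single analysis step for the simplest possible case: a scalar quantity (for instance, the number of agents in one zone) that is forecast by an ensemble of stochastic ABM runs and then observed directly with noise. This is a generic textbook perturbed-observation update, not the specific scheme of (Ward et al. 2016), and the ensemble values here are synthetic stand-ins for ABM output.

```python
import random
import statistics


def enkf_update(ensemble, observation, obs_variance, rng=random):
    """Stochastic EnKF analysis step for a directly observed scalar state.

    `ensemble` holds one forecast per model run. Each member is nudged toward
    its own noisy copy of the observation, weighted by the Kalman gain.
    """
    if len(ensemble) < 2:
        raise ValueError("need at least two ensemble members")
    if obs_variance <= 0:
        raise ValueError("obs_variance must be positive")
    prior_variance = statistics.variance(ensemble)
    gain = prior_variance / (prior_variance + obs_variance)
    obs_std = obs_variance ** 0.5
    updated = []
    for member in ensemble:
        # Perturbing the observation keeps the posterior spread consistent.
        perturbed_obs = observation + rng.gauss(0.0, obs_std)
        updated.append(member + gain * (perturbed_obs - member))
    return updated


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical forecasts of a zone's agent count from 500 ABM runs.
    forecasts = [rng.gauss(100.0, 5.0) for _ in range(500)]
    analysis = enkf_update(forecasts, observation=110.0, obs_variance=25.0, rng=rng)
    print(statistics.mean(forecasts), statistics.mean(analysis))
```

In a full assimilation loop the updated members would be used to re-initialise the corresponding ABM runs before they are stepped forward to the next observation time.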
From a computational perspective, scalability is the primary constraint upon the growth of ABMs, due to the nonlinear growth in interactions, and ensuing computational cost (Bazzan and Klügl 2013). As such, city-scale ABMs tend to be built around a small suite of well understood tools, of which MATSim is one of the most popular. This code-base allows for multi-agent simulation of transportation for the purposes of traffic flow management, urban planning, and developing evacuation scenarios, and has been used to simulate more than 50,000 distinct agents at a time (Horni et al. 2016). Multiple extensions of MATSim for city-scale modelling exist, with Tangramob (Castagnari et al. 2018) being one of the more interesting due to its integration of Smart Mobility into the transit model.
For resolving scenarios that rely upon realistic traffic dynamics and interactions, SUMO (Krajzewicz et al. 2002) has proven particularly popular, both for direct simulations, and to manage small-scale traffic interactions within a larger ABM. While traditionally constructed as a 2-dimensional model (Lopez et al. 2018), recent work has shown its viability for contending with 3-dimensional scenarios (Codeca and Härri 2018). SUMO can be extended further through interaction with JADE, a Java development framework for constructing smart networks (Azevedo et al. 2016). A generalised summary of ABMs and their associated tools can be found in Table 2.

Device use patterns
Of course, when considering spectral occupancy, human dynamics models need to be augmented by models for the propagation and use of spectrum contributing devices. Agents within an ABM may contribute to the spectral environment through both static devices, such as personal routers or desktop computers, and dynamic devices like mobile phones and laptops, with multiple devices potentially being attributable to individual agents. These devices will also inherently contribute differently to the spectral environment, based upon differential use cases and power levels.
Simulating the manner through which devices access the spectral environment varies, depending upon the device, and the context of its use. For some parts of the spectrum, like those used by television transmission towers, such access is a constant transmission covering a singular component of the frequency band. For individual device level modelling, probabilistic models have frequently been used to express access demand. Of these, the most commonly utilised approach can be found in memoryless Poisson models (Huang et al. 2012), as well as Pareto distribution processes (Neame et al. 1999), Bernoulli processes (Haghani et al. 2007), and Weibull distributions. These models typically focus upon the distribution of the received time of transmissions or data packets, although they can also be used in the context of aggregated methods, or for specific use contexts.
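A minimal sketch of such arrival models is given below. Transmission start times are produced by accumulating random gaps between events: exponential gaps yield a memoryless Poisson process, and swapping in Pareto gaps gives a heavy-tailed alternative. The rates, shape, and scale values are illustrative assumptions, and the helper names are hypothetical.

```python
import random


def arrival_times(duration_s, draw_gap, rng):
    """Generate event times in [0, duration_s) by accumulating random gaps."""
    times = []
    t = draw_gap(rng)
    while t < duration_s:
        times.append(t)
        t += draw_gap(rng)
    return times


def poisson_gap(rate_hz):
    """Exponential gaps with mean 1/rate_hz, giving a Poisson process."""
    return lambda rng: rng.expovariate(rate_hz)


def pareto_gap(shape, min_gap_s):
    """Heavy-tailed gaps; paretovariate returns values >= 1, scaled here."""
    return lambda rng: min_gap_s * rng.paretovariate(shape)


if __name__ == "__main__":
    rng = random.Random(0)
    poisson = arrival_times(1000.0, poisson_gap(2.0), rng)
    bursty = arrival_times(1000.0, pareto_gap(1.5, 0.1), rng)
    print(len(poisson), len(bursty))
```

Passing the gap sampler as an argument means a Weibull variant could be added with `rng.weibullvariate` without touching the event loop.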
An alternate approach can be found in Hidden Markov models, which attempt to construct rich simulations of spectral behaviour by considering probabilistic transitions between a finite number of different activity states. A common approach has been to model the transition between on and off states for individual devices (Lee and Seeling 2014). Markov Modulated Fluid models also hold potential, as they have been shown to produce rich dynamics with limited computational cost (Ye et al. 2014). This is made possible by considering events as part of a continuous process in terms of finite flow rates.
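The on and off transition model mentioned above can be sketched as a two-state Markov chain. Note that this is the underlying state process only: a full Hidden Markov model would add an observation layer that maps each hidden state to noisy measurements, which is omitted here. The transition probabilities are illustrative assumptions.

```python
import random


def simulate_on_off(n_steps, p_turn_on, p_turn_off, rng, start_on=False):
    """Simulate a device's activity as a two-state (on/off) Markov chain.

    p_turn_on is the per-step probability of switching off -> on, and
    p_turn_off the per-step probability of switching on -> off.
    """
    states = []
    on = start_on
    for _ in range(n_steps):
        states.append(on)
        if on:
            on = rng.random() >= p_turn_off  # remain on unless switched off
        else:
            on = rng.random() < p_turn_on    # remain off unless switched on
    return states


def stationary_on_fraction(p_turn_on, p_turn_off):
    """Long-run fraction of time the chain spends in the on state."""
    return p_turn_on / (p_turn_on + p_turn_off)


if __name__ == "__main__":
    rng = random.Random(0)
    trace = simulate_on_off(200_000, p_turn_on=0.1, p_turn_off=0.3, rng=rng)
    print(sum(trace) / len(trace), stationary_on_fraction(0.1, 0.3))
```

The mean lengths of on and off bursts are 1/p_turn_off and 1/p_turn_on steps respectively, so the two parameters can be set from observed burst durations.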
The signals transmitted by these devices do not exist in isolation, but rather can be influenced and attenuated by the built and natural environments, and by weather conditions. Such attenuation can be spatially varying, temporally varying, or both. Finally, with the growth of internet enabled devices, the spectral environment can potentially influence the decisions and actions of agents as they move through the environment, for example, by disrupting GPS signals for routing purposes, or interfering with traffic signalling systems. This in turn can create potential positive or negative behavioural feedback loops, which in the context of an ABM could manifest as different forms of emergent behaviour. Another consideration is that the simulation of device level spectral access behaviour must also be flexible enough to account for behavioural changes that would occur under Dynamic Spectrum Allocation, the ramifications of which are currently a topic of active research (Jiang et al. 2016).

Challenges and future work
While this work has attempted to identify viable research and development pathways for supporting the task of dynamic spectrum allocation, a number of key challenges still remain. To support future work, we have identified some key challenges, that would be well suited for further consideration.

Data features and management
From a data perspective, broad questions still remain about the qualitative and quantitative characteristics of true spectral data at different spatial, temporal and spectral resolutions. Understanding these features has fundamental implications for simulating, measuring, and predicting spectral data. Extracting these features is of particular importance for choosing appropriate predictive frameworks, as it is unlikely that any one single predictive framework would be optimal for all potential signal characteristics. Even the ability to identify such signal characteristics may prove tricky, in the face of potential legal ramifications of conducting large scale signals measurement, storage, and post-processing. However, if that level of data collection is achievable, then one promising avenue for extracting such features algorithmically is motif identification (Alaee et al. 2020), which has been successfully applied to characterise features of time series data, although additional work would be required to extend such works to the spectral and spatiotemporal domains. The characterised qualitative and quantitative data features could then in turn be used as the basis of either a reference data set or a data set generator that exhibits the characteristics of true signals data that reflect the built and natural environment. The construction of such data would allow for reliable, repeatable experiments to be performed across multiple works.
It is highly likely that any predictive system, especially one in which sensors communicate with each other, would face bottlenecks stemming from the size of the underlying data. As such, any predictions would likely be built upon down-sampled or compressed data measurements. This process could be achieved through filtering, smoothing, or sub-sampling mechanisms, or in an adaptive fashion by incorporating encoder-style neural networks. Of these, filtering mechanisms have a long established record of facilitating the extraction of relevant features in temporal data (Baxter and King 1999; Wen et al. 2019).
A further complication on this front is the potential for the recorded data to contain personally identifying features, which may introduce legal complications. This would necessitate a process of data anonymisation, which could draw upon Differential Privacy (Dwork et al. 2006) to ensure that private and identifying information about individual users could not be revealed. However, additional research and care would be required to ensure that the temporally continuous nature of the underlying data does not give rise to opportunities for re-identification of individual signal sources (Culnane et al. 2019).
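As an illustration of how Differential Privacy might be applied to an aggregate spectral measurement, the sketch below uses the Laplace mechanism to release a noisy count, such as the number of active devices seen by a sensor in one interval. It assumes that each individual contributes at most one device to the count, so the sensitivity is 1. The privacy budget `epsilon` and the example count are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    The noise scale is sensitivity / epsilon, so a smaller epsilon gives
    stronger privacy and a noisier answer.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical: 50 devices observed by one sensor in one interval.
    print(private_count(50, epsilon=0.5, rng=rng))
```

Each release spends part of the privacy budget, and under basic sequential composition the budgets of repeated releases add up. A sensor reporting continuously therefore weakens its guarantee over time unless this is accounted for, which is one concrete form of the temporal concern raised above.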
Any consideration of data management, aggregation, anonymisation, and down-sampling must, of course, be assessed in the context of how these changes impact upon the predictability of the underlying signals. While many of these approaches should reduce the inherent variability of the underlying signals, making them easier to predict, these changes may well be offset by the added data variance of any anonymisation process. Moreover, there is the potential for these changes to introduce systemic biases in the ability to predict and understand the state of the spectral environment, which also needs to be considered in any subsequent research.

Adversarial interactions
One important consideration for deployed systems is the potential for adversarial interactions. This is especially true given the scale and socioeconomic value of the spectral environment, which would introduce incentives for malicious agents to attempt to attack any deployed systems. Such adversarial behaviours have been repeatedly studied within an image-based context (Biggio et al. 2013), with recent works also beginning to consider adversarial sensitivity in a time series context (Karim et al. 2020). As such, open questions still remain about both the potential to exploit large scale time series models, and the availability of mechanisms to deter such adversarial behaviour. Of course, the presence of adversarial attacks also opens the potential for supporting dynamic spectrum allocation with appropriate adversarial defences, or certifications against adversarial behaviour (Lecuyer et al. 2019; Cullen et al. 2022, 2023).

Anomaly detection and fault tolerance
While anomaly detection has been well studied, including relevant works relating to time series (Laptev et al. 2015), there is still scope for further work in developing systems that are reactive and responsive to anomalous behaviours. In a dynamic spectrum allocation context, an anomaly could arise from a component in the sensor network failing, or it could be the product of a large scale grouping of people, like at a sporting event. As such, there is a need for studying how tools and processes relating to both anomaly detection and fault tolerant prediction can be applied to generate predictions that are accurate in the face of both genuine faults and outlier events.

Conclusion
Growth in demand for wireless data, and the commensurate limitations on the amount of spectral resources that can be allocated, are quickly becoming significant problems worldwide. Servicing that demand necessitates a change in the manner through which devices access and understand the spectral environment. The high dimensionality of the problem space, and the difficulty in accessing the true state, make this problem particularly well suited to exploration through machine learning techniques.
To support the engineering challenge of building deployable systems, a range of prediction techniques has been reviewed, and assessed for their applicability to different facets of the prediction task. This has been performed with the aim of constructing accurate predictions of spectral occupancy across large, heterogeneous geographic environments. This included a study of state of the art machine learning techniques that could be deployed to facilitate this task, and how a distributed sensing framework could be deployed to achieve this. Also included in this review was a consideration of how realistic data could be constructed, as a proxy to measuring the true spectral environment, in order to test how various techniques performed prior to implementation.
While the presented techniques have the potential to inform spectral management and prediction, the deployment of such a system would also need to consider the economics of a dynamic spectrum environment. Such work would need to consider not just how to price spectrum access, but broader questions of the relative merits of cooperative environments or competitive market mechanisms, and how such environments would affect the motivations of adversarial actors. We believe that resolving such questions is the next frontier for developing and implementing a cooperative, Dynamic Spectrum Allocation environment that will scale with future spectral demand.