Abstract
Graph neural networks (GNNs) haven proven to be an indispensable approach in modeling complex data, in particular spatial temporal data, e.g., relating to sensor data given as time series with according spatial information. Although GNNs provide powerful modeling capabilities on such kind of data, they require adequate input data in terms of both signal and the underlying graph structures. However, typically the according graphs are not automatically available or even predefined, such that typically an ad hoc graph representation needs to be constructed. However, often the construction of the underlying graph structure is given insufficient attention. Therefore, this paper performs an indepth analysis of several methods for constructing graphs from a set of sensors attributed with spatial information, i.e., geographical coordinates, or using their respective attached signal data. We apply a diverse set of standard methods for estimating groups and similarities between graph nodes as locationbased as well as signaldriven approaches on multiple benchmark datasets for evaluation and assessment. Here, for both areas, we specifically include distancebased, clusteringbased, as well as correlationbased approaches for estimating the relationships between nodes for subsequent graph construction. In addition, we consider two different GNN approaches, i.e., regression and forecasting in order to enable a broader experimental assessment. Typically, no predefined graph is given, such that (ad hoc) graph creation is necessary. Here, our results indicate the criticality of factoring in the crucial step of graph construction into GNNbased research on spatial temporal data. Overall, in our experimentation no single approach for graph construction emerged as a clear winner. However, in our analysis we are able to provide specific indications based on the obtained results, for a specific class of methods. Collectively, the findings highlight the need for researchers to carefully consider graph construction when employing GNNs in the analysis of spatial temporal data.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
1 Introduction
The ubiquity of sensors [1] has led to a vast increase in available data, enabling comprehensive data collection on sensor networks [2], ultimately leading to complex spatiotemporal data. In such cases, sequential (e.g., time series) information is attributed with location information, forming relationships in the data.
For analyzing these relationships, graph neural networks (GNNs) provide sophisticated modeling options. For instance, in contrast to traditional methods for time series analysis such as autoregressive integrated moving average (ARIMA), which have strong assumptions with respect to a stationary process [3], GNNs enable the modeling of nonlinear dependencies and complex structure, also including spatial information. Thus, GNNs as specialized deep learning models can better capture the dynamics of time series data.
GNNs are a type of neural network that operate on graph data structures, where nodes represent data points and edges represent relationships between them. GNNs are capable of learning representations of graphs, which can capture complex patterns of data and make predictions based on those patterns. This has made them a promising tool for time series analysis, as they can naturally handle irregularly spaced time series data, capture longrange dependencies, and handle missing data.
One of the key advantages of GNNs over traditional neural networks is their ability to incorporate domain knowledge into the model through the graph structure. Sometimes this information is already given [4], such as a graph representing a metro network. Here, the connections are part of the initial problem formulation, and therefore it is clear which objects or data points are neighbors.
However, not every situation is so clear. Imagine a set of sensors that are placed in a city to measure humidity. The problem here is deciding which sensors should have an edge [5,6,7]. Should only directly neighboring sensors have an edge? Or should all sensors in the city have an edge with each other? Connecting all edges seems like a reasonable choice; however, the graph of even a relatively small set of nodes will quickly grow exponentially in size, making it computationally heavy. Therefore, defining a more sparse but still meaningful edge set is still a subject of discussion (see Fig. 1 for a visualization of the problem).
The unfortunate truth is that data (associated with the nodes of a graph, whether location data or time series data) typically exhibit one or more of the following characteristics, making graph construction difficult in most cases [4]. The information may, for instance, contain many dimensions, which would subject it to the curse of dimensionality. There is also a chance that noise will degrade the data quality. This is particularly important if the graph must be built from the signal data alone with no knowledge of the node locations.
Therefore, in this paper, we explore various techniques to create graphs from a set of points in space, and test them with GNNs on time series analysis tasks. These techniques construct a graph from either location information or from sensor data itself, since sometimes the underlying phenomenon could be more informative than the geographical position of the nodes in a network itself. The datasets used in this paper are examples of spatiotemporal data [8]. They are called spatial temporal, as they “live” in the ndimensional space formed by their spatial, temporal, and (multi)attribute dimensions [9]. Some common examples of this kind of data are daytoday temperature measurements in a city over an extended period [10], or a collection of seismic signals from distributed sensor networks [11], which are of complex nature. Our contributions are as follows:

1.
We examine and evaluate the properties of the graphs produced by each construction algorithm.

2.
We perform a variability analysis to investigate where the graph was most crucial.

3.
We provide general insights into which techniques make most sense to use (bestcase analysis), on top of only providing the best scoring combinations that might only work in a narrow usecase.
The rest of the paper is structured as follows: related work is discussed in Sect. 2. Afterward, the general goal, the used algorithms, dataset and models are described in Sect. 3. The results Sect. 4 describes and discusses the findings. Finally, Sect. 5 concludes with a summary.
2 Background and related work
In this section, we first formally introduce basic notions about graphs and related concepts. Then, we provide an introduction to the field of GNNs, where we also discuss related approaches and architectures that have been proposed in this field. Finally, we discuss approaches for graph construction.
2.1 Graphs
A graph \(G=(V,E)\) consist of a set of nodes/vertices V and a set of edges \(E \subseteq V \times V\) (the connections between the nodes). A common way to represent a graph is with an adjacency matrix, which is a \(n \times n\) matrix, with \(n = V\). The (i, j) entry of the matrix, \(A_{i,j}\), indicates whether there is a edge between node i and node j. The weight of an edge can be a binary value (0 or 1) for an unweighted graph, or a real number for a weighted graph, which can be represented by the respective entry in the adjacency matrix.
Edges can be directed and undirected. Directed edges contain information with a direction, e.g., a road can be a oneway street. Undirected edges contain no source, of which an example could be the geographical distance between two weather stations. Formally, a directed edge \(e \in E\) between nodes \(u \in V, v \in V\) is represented by an ordered tuple (u, v) whereas an undirected edge is given by the unordered variant \(\{u, v\}\). The number of neighbors of a node v is known as the degree and is denoted by \(d_{i}=\sum _j A_{ij}\).
Nodes and edges can also have a feature vector \(a=(a_{1}, a_{2}, \ldots , a_{n})\). These features could be informative about the characteristics of a node/edge. For example, node features could contain traffic speed measurements, and edge features could indicate the type of edge that exists between two nodes (which could be more than 1). Whenever a graph contains features (i.e., feature vectors) assigned to nodes and/or edges, we call it an attributed graph. Many graphbased methods rely on these attributes, e.g., GNNs.
2.2 Graph neural networks
The first GNNs were proposed in the early 2000s, but they were limited in their expressive power and scalability [12]. More recently, a new generation of GNNs has emerged that uses neural network layers to compute the message passing function. Some popular architectures include the graph convolutional network (GCN) [13], graph attention networks (GAT) [14] and GraphSAGE [15]. The choice of which method to use will depend on the specific characteristics of the data and the task at hand. For example, GraphSAGE was designed to be scalable with the use of subsampling neighbors.
Essentially, the basic idea behind GNNs is to use message passing to propagate information between nodes in the graph. Each node receives messages from its neighbors, updates its own representation, and passes messages back to its neighbors. The messages are typically computed based on the node and edge features, and the updated node representations are aggregated to form a graphlevel embedding. More specifically, the information that is related to node v stored in the feature vector \({\textbf{h}}_{v}\) is updated by combining the features of its neighbors. After l iterations, the vector \({\textbf{h}}_v^l\) will embed all structural information and node features in the lstep neighborhood of node v. The output of the lth layer in a GNN is then typically defined as:
where \(\text {A}^{(l)}\) is a function that aggregates the node features from the neighborhood of v, and \(\text {C}^{(l)}\) is a function that combines the own features with neighboring nodes [16]. Afterward, a readout R function transforms the feature vectors into the final output:
which after enough iterations can be used for various downstream tasks, such as time series analysis, where they demonstrated promising results. For example, GNNs have been successfully used for traffic forecasting [17], air quality data imputation [18], seismic analysis [19], optimal sensor placement analysis [20], anomaly detection in soil moisture data from satellites [21] and general multivariate time series classification [22].
However, one of the key challenges in using GNNs is how to construct the graph itself. Previous research showed the drastic influence of the decision of neighborhood aggregating techniques for graph related tasks, such as clustering [23]. Also, the problem of constructing the graph is often illposed, since there is no ground truth available. Therefore, there are various techniques for constructing the graph, each with its own strengths and weaknesses.
2.3 Graph construction
There are a variety of methods for constructing graphs depending on information assigned to the nodes, i.e., for constructing the appropriate connection between those nodes, e.g., [24,25,26].
Typically, the methods to connect nodes in Ndimensional space are split into two groups, depending on which information is used. The first group consists of techniques that use the location of points, i. e., the spatial information, to construct the graph. The second group consists of techniques that learn the graph from the signal data of the respective nodes. Therefore, essentially both techniques typically use distance (or similarity) estimations for deriving edges between nodes, however, they do this using different perspectives on the given data—either taking spatial or temporal information into account. This is depicted in Fig. 1.
2.3.1 Locationbased techniques
Locationbased techniques use the spatial information (latitude and longitude) of the nodes for deriving edges between pairs of nodes. The applied algorithms can be characterized by the number of parameters needed [4]. There are parameterfree techniques, such as relative neighborhood and Gabriel graphs, which create graphs with relatively few edges. However, in many cases it is desirable to have a higher level of interconnectivity between localities so that they can both receive input from and contribute output to nearby areas. Techniques that have only one parameter are KNN or qneighborhood graphs, which merely includes edges if samples are within distance of q away from each other [27]. Lastly, there are also techniques that require 2+ parameters, such as DBSCAN which needs parameters MinPoints and eps to connect nodes. Out of these techniques of the first group (that use location data), the KNN graph remains the more common approach since it is more adaptive to larger scale graphs and larger datasets, while an improper threshold value in the qneighborhood graph could result in disconnected components or subgraphs in the dataset, or even isolated nodes.
There are also locationbased techniques that require a precalculated distance matrix, in order to apply a posthoc processing technique to create the graph. A famous example of such a technique is a thresholded Gaussian kernel [28, 29]. Here, the edge weight between two connected samples \(v_{i}\) and \(v_{j}\) is computed as:
where the function dist(\(v_{i},v_{j}\)) evaluates the distance between two nodes and \(\sigma \) is the kernel bandwidth parameter, where often the standard deviation is used.
2.3.2 Signaldriven techniques
The second group of graph construction techniques relies on the signal information assigned to the nodes of the graph to infer the graph structure, i. e., the set of edges. The most clear example involves calculating the correlation between the signals of nodes in a graph, as performed in [30], in order to create edges between nodes for which the correlation between the respective time series is strong. The correlationbased method uses correlation coefficients to measure the strength of the relationship between two time series. The graph is constructed by thresholding the correlation coefficients, where edges are only included if the correlation coefficient is above a certain threshold.
Notice that many of the already discussed techniques focus on how to sparsify a graph, which is very important since sparse graphs result in more scalability and easier interpretation [31]. Especially the scalability is an important consideration when dealing with large graphs, and sparsity can play a critical role in enabling GNNs to scale to such graphs, because sparse graphs can reduce the number of computations required by GNNs.
There is also a new class of techniques that can automatically learn the graph structure with, e.g., a latent correlation layer [17] or using the node embeddings [32]. In addition, Graph Attention Networks (GATs) have been developed that aim to learn additional edge weights for an initial binary adjacency matrix as input, which benefit the subsequent message passing operations [14]. However, these techniques either modify or learn the graph during training, and therefore cannot be seen as standalone graph construction techniques. Therefore, these techniques require very specific model architectures, making them hard to compare to other graph construction techniques.
In addition, previous work often considers the problem as a graphdenoising [33, 34] or graphcompletion problem [35]. In the former, a graph might be incomplete, and the missing edges should be calculated. In the latter, the graph might contain too many edges and the noise edges should be removed. These usecases are examples of semisupervised learning, which is not the focus of this work. In our context, the underlying graph is never available, and should be inferred completely from the information assigned to the nodes. Therefore, this either relates to the spatial information about node locations (locationbased methods) or the time series data of a node (signaldriven techniques). Thus, in contrast to other approaches, this rules out all the aforementioned automatic techniques, which need at least some prior information (existing edges) to work with.
2.3.3 Summary
Overall, despite the growing interest of GNNs and time series analysis, there is limited research on the comparative performance of each technique to construct the graph. However, this is a crucial aspect, since the accuracy and effectiveness of GNNs heavily rely on the quality of the graph structure [5]. Therefore, insights and guidelines about potential effective graph construction methods and approaches are specifically relevant for an appropriate application of GNNs, in particular in the context of complex spatiotemporal data. This is the motivation for the analysis done in this paper. In particular, we compare various techniques to construct graphs from either location data or signal data and estimate their respective performance and impact.
Another potential reason for the lack of research on this topic is the relative newness of GNNs and their application in time series analysis. While GNNs have shown promising results in various domains, their performance in time series analysis is still an active area of research.
Our results provide insights into the strengths and weaknesses of different techniques for constructing graphs for time series data, and help guide the selection of appropriate methods for different applications. In the next section, we introduce each technique and explain their parameters and (dis)advantages.
3 Methods
In this section, we first introduce and explore a large set of techniques to generate graphs from geographically grounded nodes with assigned feature vectors. For that, we propose either locationbased or signaldriven techniques. Specifically, each will be discussed individually in terms of algorithm formalization as well as their advantages and disadvantages (e.g., complexity).
3.1 General goal
The task in this paper is to learn a graph \(G=(V,E)\) where V are the nodes (objects/sensors) and E are the edges (connections between them) from locationdata or signal data itself [6, 7, 36]. While this task shares a conceptual connection with related problems like graph generation (which focuses on creating unique and varied graphs [37, 38]), and graph completion (where missing edges have to be inferred [35]), in graph construction, the entire unknown graph is to be constructed according to a given approach. In our context, it is supposed to be learned from either location or signal data from the set of nodes.
Graph construction can be seen as an illposed problem, as there are multiple ways to accomplish the task and no definitive groundtruth exists. An example is given in Fig. 1, where the edges are calculated by either location information or signal information. If the technique relies on the location information of the nodes, then the goal is: given a set of longitude and latitude locations of objects/sensors X, we need to find a way to create edges/connections given certain criteria. However, if the technique relies on signal data, i.e., M observations of the N nodes (the data entities) represented in a data matrix \(X\in {\mathbb {R}}^{N\times M}\), each column on this matrix X contains a graph signal for each node, and the graph is supposed to be learned from this data.
Both directions result into different graphs with varying characteristics. For example, Fig. 2 shows three versions of a constructed graph from a seismic sensor network called CW, which will be introduced in Sect. 3.5.1 in more detail, for which we specifically refer to Table 2. Figure 2a, b shows techniques that create comparable graphs, but (c) shows a completely different graph that was constructed by the clustering algorithm kmeans that only creates connections between nodes in the same cluster. Here, we can already observe quite different graph structures, also in terms of their properties, since Fig. 2c even generates an unconnected graph, compared to the graphs in Fig. 2a, b which differ in terms of their density, i.e., Fig. 2a shows a much sparser graph.
3.2 Locationbased techniques
Locationbased techniques rely on the location information of nodes for graph construction. This study used the latitude and longitude of stations for this procedure, and distances were calculated with measures that take the curvature of the earth into account (e.g., haversine or geodesic distance). The implemented locationbased graph construction techniques are: MinMax, thresholded Gaussian kernel, KNN(weighted), kmeans, OPTICS, relativeneighborhood and Gabriel graphs.
3.2.1 Reversed MinMax
As performed in [19], the reversed MinMax technique takes the absolute distance between each set of nodes and returns the reversed MinMaxscaled value thresholded by k. It is an own implementation of the well known qneighborhood class of graph construction techniques. In qneighborhood algorithms, a cutoff distance is always in the scale of the dataset. In this version, we rescale the distances first with minmax scaling, to then have an interpretable parameter k between [0, 1] for cutoff. The resulting graph consists of all the edges that are within the qneighborhood of each node (see Algorithm 1).
The advantage of reversed MinMax is the easy to tune and interpret hyperparameter k. However, a disadvantage of reversed MinMax is its computational complexity of \({\mathcal {O}}(n^{2})\), which does not easily scale to larger graphs.
3.2.2 Thresholded Gaussian kernel
A thresholded Gaussian kernel is a mathematical function that describes a bellshaped curve. It is commonly used in signal processing, image processing, and for tasks such as smoothing, blurring, and feature extraction. The thresholded Gaussian kernel function can be used to compute a weight or strength of the connection between nodes, based on their relative distance to each other [28, 29]. The larger the value of \(\sigma \), the wider the curve, and the more nodes will be connected. The smaller the value of \(\sigma \), the narrower the curve, and the fewer nodes will be connected. In this study, the standard deviation of all distances \(d > 0\) were taken as \(\sigma \) (see Algorithm 2). Lastly, the parameter k will act as a thresholding value to sparsify the graph.
The advantage of using a thresholded Gaussian kernel is the easy to tune and interpret hyperparameter k, similar to reversed MinMax. One downside is the time complexity of \({\mathcal {O}}(n^{2})\), since all the pairwise distances have to be calculated.
3.2.3 KNN (un)weighted
The KNN algorithm can be used to define an edge set of the Knearest neighbors of a node in a graph (see Algorithm 3). This technique is one of the most popular options in the literature [27]. In the unweighted version, every edge is assigned a weight of 1, whereas the actual distance value is used in a weighted version. KNN provides an intuitive parameter (k) that directly limits the number of edges in the graph. In addition, with help of KD Tree or Balltree structures, it can be calculated very efficiently (\({\mathcal {O}}(n\log {n})\)) [39, 40]. However, a limitation of always connecting the k most close edges for a node is that whatever the distance, an edge will be created. Therefore, KNN (especially the unweighted version) is sensitive to outliers.
3.2.4 Relative neighborhood graph
The relative neighborhood graph (RNG) [41] is constructed by connecting each point \(v_{i}\) to its nearest neighbor \(v_{j}\), provided that there are no other points \(v_{k}\) that lie within the circles centered at the midpoint of point \(v_{i}\) and \(v_{j}\) (see Algorithm 4). In other words, if two close neighbors share a common closer neighbor, then no edge is drawn between them. While its typically applied in computing minimum spanning trees or creating graphs that are always planar (has no crossing edges, which is helpful in road design tasks such as urban road planning [42]), it is used in this study since it contains zero parameters to tune, which gives it an advantage in graph construction. In addition, it has a time complexity of \({\mathcal {O}}(n \log n)\) [43].
3.2.5 Gabriel graph
The Gabriel graph is a geometric proximity graph that is often used in spatial data analysis [44, 45]. More formally, it is the undirected graph G with vertex set S in which any two distinct points \(v_{i} \in S\) and \(v_{j} \in S\) are adjacent precisely when the closed disk (the region in a plane bounded by a circle) having \(v_{i}v_{j}\) as a diameter contains no other points (see Algorithm 5). There are multiple implementations available, ranging from \({\mathcal {O}}(n^{3})\) to \({\mathcal {O}}(n \log n)\) [46]. The graphs that are constructed will have few edges, but the advantage of this technique is that it is parameter free, and it creates a planar graph (no crossing edges in the visualization).
3.2.6 Kmeans
KmMeans is a clustering algorithm of which there are two categories, e.g., partitional and hierarchical [47]. Hierarchical algorithms locate nested clusters either topdown or bottomup. Instead of imposing a hierarchical structure, kmeans (a partitional algorithm) simultaneously locates all the clusters as a partition of the data.
Clustering data points using the kmeans algorithm [48] involves the partitioning of samples into N groups of equal variance, while minimizing the withincluster sumofsquares criterion (see Algorithm 6). The computational complexity of kmeans is \({\mathcal {O}}(ktn)\), where k is the number of clusters, t is the number of data points, and n is the number of iterations needed for convergence. However, most often not too many iterations are necessary to reach convergence, making kmeans a relatively fast clustering algorithm.
A key advantage of kmeans is the intuitive parameter to tune, namely the number of desired clusters. This is especially useful if the number of clusters is something that can be inferred from domain knowledge of the problem at hand.
However, this k parameter could also be a disadvantage if the number of clusters is hard to intuitively define. Also, kmeans uses Euclidean distance metrics by design [48], so the advantage of calculating distance on a surface is not used. Also, the centroid initiation heavily influences the results [49]. Lastly, all points will be added to a cluster, meaning that also outlier points will always be joined.
3.2.7 OPTICS
The basic idea of OPTICS [50] is similar to DBSCAN [51] (another popular density based cluster algorithm) but it addresses one of DBSCAN’s major weaknesses: the problem of detecting meaningful clusters in data of varying density. To do so, the data points are (linearly) ordered such that spatially closest points become neighbors in the ordering. Clusters can then be found by finding regions where the reachability is below a certain threshold. So OPTICS instead generates an enhanced ordering of the data that represents the densitybased clustering structure of data, rather than directly producing a clustering (see Algorithm 7). Overall, the time complexity can go up to \({\mathcal {O}}(n^{2})\)[52].
In Algorithm 7, two other functions need to be explained: The core distance (\( cored {} \)) is the smallest radius that a particular point must have to be designated as a core point. The supplied point’s core distance (\( cored ){ isundefined}( U ){ ifitisnotacorepoint}.{ Themaximumofthecoredistanceof}p{ andtheEuclideandistance}({ orotherdistancemetric}){ between}p{ and}q{ isthereachabilitydistance}( reachd {} \)) for those points.
The real advantage of OPTICS is that it can detect very different local densities to reveal clusters in different regions of the data space. In other words, it can detect a very dense cluster in a region, and also relatively less dense clusters in another region of the data. In addition, it achieves this while also reducing the number of parameters to tune, making it an easier choice for most situations. However, OPTICS will likely fail if there is a very gridlike structure in the points (no peaks or drops in density)
3.3 Signaldriven techniques
Signaldriven techniques rely on the signal data of nodes for graph construction (see Fig. 1b. The implemented techniques are: correlation, dynamic time warping (DTW) [53] and maximal information coefficient (MIC) [54]. Due to computational constraints, each of these techniques received 50 random data samples to construct the graph. In addition, for correlation and MIC we consider the top\(m{ correlationvaluesforeachedge},\,{ sotakingthestrongestcorrelationvaluesintoaccount}.{ Weempiricallydetermined}m=20\) via grid search in our experiments.
3.3.1 Correlation
Correlation is a statistical measure of the strength and direction of the linear relationship between two continuous variables. Given a pair of time series signals, the Pearson correlation coefficient is calculated as the covariance of the two variables divided by the product of their standard deviations (see Algorithm 8). The correlation coefficients are taken between the signals of every node, and ordered from large to small. The top \(m=20{ correlationscoresaretakenandaveraged},\,{ asdiscussedabove}.{ Afterthat},\,{ thegraphismademoresparsebythresholding}.{ Usingcorrelationhastheadvantageofbeingperfectlyboundedbetween}[1, 1],\,{ anditcanbeeasilycalculatedevenforlongsequencelengthswithatimecomplexityof}{\mathcal {O}}(n^{2}).{ Also},\,{ correlationisinvarianttothescalingofthevariables},\,{ andreportsthedirectionofthecorrelation}.{ However},\,{ itisalsolimitedtodetectingonlylineardependenciesanddisregardsvariousothertypesofrelationships}.{ Thedisadvantageofthistechniqueistheextraparameter}k\) that has to be tuned compared to the other techniques.
3.3.2 Dynamic time warping
Dynamic time warping (DTW) is a distance function that uses a warped path to align two time series and compare them [53]. It is a popular algorithm in the domain of signal processing and pattern recognition. For example, it shows better results in time series classification when the length of the sequences are relatively small (due to computational complexity) than correlation measurements [55].
DTW aims to find the optimal alignment between two time series by stretching or compressing them in time. Each point is compared to each point in the other time series, and the pair of corresponding points that minimize the cumulative distance between them are searched. The total distance is the cumulative distance of all observations in the signals (see Algorithm 9).
DTW has several advantages over Euclidean distance or correlationbased techniques. For example, it is more robust to noise and local distortions than Euclidean techniques. It can also handle time series of different lengths. However, the time complexity of DTW is \({\mathcal {O}}(n^{2}),\,{ where}n\) refers to the length of the input time series. Therefore, the distances need to be reverse minmax scaled between \([0,1]\) since DTW has no upperbound.
3.3.3 Maximal information coefficient
Introduced by [54], the maximum information coefficient (MIC) is a nonparametric measure of similarity between two variables. It is based on the concept of mutual information, which measures the amount of information of \(X{ canbeexplainedbyknowingvariable}Y.{ Itconsidersallpossiblepartitionsofthedataintobinsandcalculatesthemutualinformationofthatpartition}.{ Inturn},\,{ theMICscoreisthemaximumvaluesofallpossiblepartitions}({ seeAlgorithm}~10).{ Theaverageofthetop}20{ MICscoresofeachedgewasusedtocalculatethefinalgraph},\,{ asdiscussedabove},\,{ i}.{ e}.,\,{ thesame}m=20\) as used for the correlation.
MIC has various desirable properties. For example, it is scaleinvariant, meaning that it is not affected by the scale of variables. It can also detect nonlinear relationships (e.g., cubic or exponential), and it is robust to outliers. It is also symmetric, such that the \(\text {MIC}(X,Y)=\text {MIC}(Y,X).{ Lastly},\,{ MICisnaturallyrangedbetween}[0,\,1]{ makingiteasytointerpret}.{ However},\,{ itdoesnotreportthedirectionalityortypeofrelationshipitfinds}.{ Thedisadvantageistheextraparameterthatdeterminesthetop}k{ scoreshastobedetermined}.{ Andmostimportantly},\,{ itiscomputationallyexpensive}({\mathcal {O}}(n^{2.4})\)).
3.4 Parameter tuning
In the following paragraph, the possible parameter values of each algorithm are discussed and described in Table 1. We perform a grid search of all the possible parameter values and report the best settings in the results, to make the comparison between all algorithms fair.
The first group of algorithms, which includes correlation, DTW, MIC, thresholded Gaussian kernel, and MinMax, all have a parameter range between 0.05 and 0.95 with step size of 0.05. A higher value means that two data points must be more similar to be considered edges. The second group of algorithms, which includes kmeans, OPTICS and KNN(un)weighted have a parameter range between 2 and 40. The parameter values determine the size and shape of the clusters, or the number of neighbors. The last two algorithms, Gabriel and relative neighborhood, are parameter free.
3.5 Datasets and tasks
The datasets used in this paper are from the Italian Seismic network (CI and CW), METRLA and PEMSBAY. The datasets differ greatly in their data domain, task, number of nodes, \(\text {km}^{2}{} \) and sample rate.
In short, Table 2 shows the general characteristics of the datasets in this paper. In total, there are four datasets, of which two are used for regression and two for forecasting. To clarify the difference between the two analysis tasks, Fig. 3 combines two examples. In Fig. 3, the entirety of segment A is used to predict a value for the regression task. In other words, the task is to predict numeric values that depend on the whole segment of the series, rather than the last few observations [59]. For the forecasting task, the red part of segment B is used to directly forecast what will happen in the blue segment.
3.5.1 Regression task
The two regression datasets (CI & CW) recorded by the Italian national seismic network [11] contain sensors that measure seismic waves from 3 channels. Both datasets contain 39 nodes, but there is a difference in the number of earthquakes, 915 (CI) versus 266 (CW), and the geographical area (see [19] Figure 3). The task in this dataset is to take an initial 10 s of input data (1000 data points) of an incoming earthquake from multiple stations, in order to predict the ground shaking intensity of other faraway stations from the earthquakes’ epicenter. This 10s input window makes sure that only the nearby stations had the opportunity to actually measure the earthquake. 5 external target values of the input are calculated, which do not depend necessarily on recent values, but rather on the whole length of the time series. This kind of regression task is also referred to as time series extrinsic regression (TSER) [59], of which Fig. 3 (A) is an example.
3.5.2 Forecasting task
For the forecasting task in this paper, data from traffic sensors are used (two wellknown benchmark datasets, see [17, 29, 60]). The objective of time series forecasting is to estimate the future value of \(Y,\,{ giventhehistoricalvaluesof}Y\). Therefore, this task is different from the TSER task, as can be seen in Fig. 3.
The time series are divided with a sliding window of 12step lookback and the same horizon, with a stride of 1. Therefore, a single sample span covers a total of 24 time steps. In the forecasting task, the signaldriven graphs used a signal length of 24 due to computational constraints and to be aligned with the sample span. The average error of all 12 steps ahead is calculated. In other words, a sample of shape (\(N,\,12){ wouldbecomparedtothegroundtruthofshape}(N,\,12){ where}N{ referstothenumberofnodesinthegraph}.{ Naturally},\,{ thelongerthepredictionhorizon},\,{ thehighertheerror}.{ Therefore},\,{ theerrorofthefirstvaluesin}(N\),12) will have smaller errors than the last values.
The first of the two forecasting datasets is METRLA, which contains traffic information collected from 207 loop detectors in the highway of Los Angeles County [61]. The sample frequency was once every 5 min, for the period between March 2012 and June 2012. This dataset has a missing value percentage of 8.11%. These missing values are imputed by the last valid observation. The second forecasting dataset is PEMSBAY, collected by the California Transportation Agencies (CalTrans) Performance Measurement System (PEMS). 325 sensors were selected in the Bay Area and 6 months of data ranging from January 1, 2017, to May 31, 2017, were used. The traffic speed readings are aggregated in 5 min windows. The dataset has less missing values than the METRLA dataset, only 0.02%. Missing values are also imputed similarly to METRLA.
3.6 Models
Here, we introduce the two GNN models used for either regression or forecasting. Their general architectures and graph input are introduced. Afterward, their training procedures and evaluation are discussed.
3.6.1 Regression model
The same model as in [19] is used, which showed high performance on the task (consult [19] for more details). In short, the model uses two 1D convolutional layers (CNNs) (see Fig. 4) that act as feature extractors on the time series data. The output is then reshaped into (Nodes, Features) and node location metadata is concatenated, which showed to improve performance in the task. Afterward, two GCN layers incorporate crossstation information and the output is eventually funneled into five dense layers that predict the 5 output metrics.
The model requires a symmetrically normalized Laplacian matrix [13]. A typical Laplacian matrix can lead to vanishing gradient problems if nodes have different levels of connectivity. To circumvent this issue, we normalize the degree matrix symmetrically. Furthermore, we include each node’s unique features by adding the identity matrix to the Laplacian matrix, following the approach outlined in [13].
3.6.2 Forecasting model
For the forecasting experiments, a timethenspace GNN is used. The model (see Fig. 4) consists of a linear and RNN encoder and a nonlinear GCN readout. The first block (linear encoder) takes as input the tensor of shape [batch size, sequence length, nodes, input size] next to the adjacency matrix of the graph, and applies a linear transformation to it, resulting in a hidden representation of the inputs. The RNN encoder block outputs a sequence of hidden states, where each hidden state corresponds to one time step of the input tensor. At last, the GCN decoder block then transforms these tensors to the output shape [batch size, horizon, nodes, input size] where the horizon is the forecasting period.
3.7 Training and evaluation
Considering training split, the regression datasets are first split in an 80–20% manner. Then, the remaining 80% of training data is used for 3fold crossvalidation to generalize the results of the experiment. In each of these folds, the model is trained separately and tested on the unseen test set. The input maximum (i.e., the greatest amplitude detected across all stations during the time window) is used to normalize the data. Lastly, a batch size of 30 is used. For a more detailed explanation, we refer to [19, 62].
Considering the forecast datasets, all data were standardized using mean and standard deviation. A train, validation test set is used with a validation set of 10% and test of 20%. Lastly, a batch size of 64 is used.
All experiments are evaluated based on their mean absolute error (MAE):
and mean squared error (MSE):
where \(y_i{ aretheactualvaluesand}\hat{y_i}{} \) the predictions. A total of 100 epochs are ran, without early stopping. The best model weights were reinitiated after the last epoch.
3.8 Software and resources
We applied Python with Tensorflow and Keras, but also Pytorch in combination with TSL [63] to develop and train the models. These models were trained on a dedicated server with two Intel Xeon CPUs (3.2 GHz), 256 GB RAM and an Nvidia Quadro RTX6000 (24 GB) GPU. In addition, a supplementary page^{Footnote 1} is made available containing code of the experiments, including a python package to construct the graphs.
4 Results
This section describes the results of our analysis on the performance of all graph construction techniques. We focus on understanding the different characteristics of each technique and inferring general guidelines for graph construction.
4.1 General statistics
In realworld networks, often the majority of nodes have a relatively low degree, but a few nodes, which are connected to many other nodes, will have a very large degree. In order to obtain a better understanding of the structural properties of the constructed graphs, we investigate specific network characteristics in the constructed graph via according general properties and metrics. In particular, we examine the number of edges and the degree distributions of the graphs shown in Table 3 and Fig. 5. The reported results correspond to the best hyperparameter settings in terms of their MSE. In addition, Table 3 also reports the graph densities. The density of a graph is the fraction of the number of (observed) edges to the number of all possible edges (in a completely connected graph), which is therefore 0 for a graph without edges and 1 for a complete graph.
Immediately, it is visible that there are huge differences between the techniques in terms of the number of edges (see Table 3). Techniques such as DTW, MIC and MinMax produce graphs that contain many edges. Especially, the DTW and MIC graphs sometimes contain graph densities that approach 1, which means that all nodes are connected. The best example for this is the PEMSBAY graph constructed by MIC, which has 52,950 edges (and only just 325 nodes). However, other techniques such as RNG, Gabriel and OPTICS result in graphs with very few edges. From RNG and Gabriel this behavior is expected since the goal of those algorithms is to construct optimally sparse graphs. Therefore, techniques resulting in the construction of relatively complete graphs do essentially not support data sparsity in terms of graph structure, since very many edges are contained. On the other hand, the sparser graphs automatically tend to include fewer edges thus focusing potentially on the stronger (and thus potentially more important) connections.
In addition, some graphs show scalefree degree distributions. The most notable characteristic in a scalefree network is the relative commonness of vertices with a degree that considerably exceeds the average. Examples of these techniques are KNN, correlation, kmeans and RNG. From a complex system perspective, those methods therefore tend to model the expected distribution, i.e., the scalefreeness more closely, compared to the other methods. Specifically, there are also graphs that show a linear pattern in the number of nodes that are calculated, those are thresholded Gaussian kernel, MinMax and DTW. These techniques also belong to the ones that construct the most edges.
Another way to investigate the constructed graphs is by examining their edge weight distributions. Figure 6 shows the weight distributions of the PEMSBAY dataset for the algorithms that construct weighted graphs, in contrast to graphs with just binary adjacency matrices. Some clear differences are visible between the signaldriven and locationbased techniques. To start, the signaldriven techniques create edges that have a much narrower distribution than the locationbased techniques. This results in these techniques having less detailed control over their thresholding parameter, since almost all the edges that are created have a weight less than 0.50 (compared to thresholded Gaussian kernel, KNNW and MinMax where the distribution of weights covers the entire \([0, 1]\) range.
4.2 Regression and forecasting
Now that we have a general idea of the types of graphs that were constructed, we can investigate the performance of each technique. The MAE and MSE scores are visible in Table 4. We can observe different highperforming graph construction techniques in each dataset. MIC and correlation perform best in the CI and CW datasets. In the METRLA and PEMSBAY datasets, MIC, correlation and DTW all work best (followed closely by MinMax). However, when taking into account only the MSE, the MinMax technique showed the best performance on METRLA, indicating that the errors of MinMax are in general of smaller magnitude. This could be explained by the many edges that are constructed by the MinMax algorithm, that can act as a damping effect on the predictions.
Clear weakperforming graph construction algorithms are Gabriel, RNG and kmeans. In the case of Gabriel and RNG, this could be due to their algorithm being focused on creating too sparse graphs for the GNN to function correctly. However, in the case of kmeans different reasons could be given. Clustering methods tend to create disconnected components, which now result in low performance. In addition, kmeans is known to produce spherical clusters (also visible in Fig. 2) which may not easily be found in geographical sensor networks. Lastly, it is interesting to see that a thresholded Gaussian kernel seems to perform relatively bad. Thresholded Gaussian kernels are beside KNN or \(q\)neighborhood graphs the next most popular option for constructing the graph [28, 29], especially in the spatial temporal time series forecasting domain.
When we take a step back from individual scores to general characteristics of the techniques, some interesting patterns arise. To start, different performance is visible when characterizing techniques (see Table 4) based on their edge count (see Table 3). In almost all cases, having the most edges is most beneficial. This holds true for the CI, METRLA and the PEMSBAY datasets. In the CW dataset, we see that having less edges seems to be beneficial for good performance. This could be due to the larger geographical area that the CW usecase covers, compared to the other datasets. Here, the graph construction algorithms that can handle larger distances (in this case by ignoring those far away stations) seem to show their strength more, whereas techniques that will connect no matter what create meaningless edges that weakened performance.
When we examine the performance of the different type of techniques (signaldriven or locationbased), some general patterns emerge (see Table 4). In most cases, the signaldriven techniques show the highest performance in the tasks, suggesting that these techniques are able to discover very local patterns in the signal data, which locationbased techniques would not pick up on. For example, the traffic in one road segment could be totally different from a road nearby, which locationbased methods would not pick up on. Also, imagine an air temperature setting where a graph from weather stations has to be constructed. Figure 7 shows a 3D height plot of this scenario, where locationbased techniques would likely connect node D to node C based on their proximity, and signaldriven techniques would likely not, due to the height difference, which would be observable in the signal data. A reason for this is that height is invariant toward location in terms of latitude and longitude. Therefore, signaldriven techniques would rather connect node D and B, due to their possible similar measurements, and locationbased methods would likely not.
A supplementary explanation for the relative good performance of signaldriven techniques is their high edge count. Especially MIC shows this behavior, sometimes connecting all nodes. However, in the case of correlation, this does not necessarily hold true. Only from the PEMSBAY dataset the correlation technique creates many edges, but in the other cases other techniques many more edges.
Another method to examine the impact of the graphs is by looking at the variability in the performance across datasets. This is done by the coefficient of variation, which is the ratio of the standard deviation to the mean for all methods per dataset. The coefficient of variation proves to be valuable, as the context of the mean must always be taken into consideration while interpreting standard deviations. In general, it is noticeable that the variability in scores for each dataset differs (see the bottom row of Table 4). The CI and CW dataset contain the highest variability, with a reported coefficient of variation from the MSE scores of \(13.4\%{ and}14.4\%\). In these datasets, having a richer and more optimal graph seems to have the biggest impact on prediction performance. In the foresting datasets (METRLA and PEMSBAY), the variability is lower. A reason for the difference between the two types of datasets could be related to the nature of the data and its sample size.
Lastly, Table 5 shows the average ranking of each algorithm for all datasets. Algorithms are given a ranking between 1–11, related to the 11 used graph construction techniques, depending on their performance (in terms of MSE) in each dataset, where the lowest performance gets 1 point and the highest 11 points. The model that constructed a graph with correlation shows the best average performance compared to all other algorithms, followed by MIC, DTW MinMax and KNN. The low performing models constructed graphs with Gabriel or RelativeNeighborhood. The most important finding in this ranking is that signaldriven algorithms are overly represented in the top of the rankings. However, out of the signaldriven techniques, the correlation graph is considerably more sparse. Therefore, this leads to lower computational complexity during training. Considering that correlation graphs are also more easily constructed in the first place, this seems to be the best general option to take. A reasonable second option or baseline option seems to be using an unweighted KNN, which can be easily calculated and creates a relatively sparse graph.
One interesting thing to also note from Table 5 is that the unweighted KNN graph seems to generally outperform the weighted version. One would actually expect that the benefit of the weighted KNN technique would be the extra details added to the graph. This is especially important for samples near to the decision boundary, since they are vulnerable to noiserelated effects. Nevertheless, our results indicate that placing greater trust in your immediate neighbors, as reflected by a weight of 1, leads to better results.
5 Conclusions
This work examined the influence of the underlying graph for a GNN in time series analysis tasks. Specifically, this work performed a thorough investigation of different kinds of graph construction techniques and their performance and characteristics. To accomplish this, we used two GNN models for either a (1) time series regression (TSER) or (2) time series forecasting task. The models were both tested on sensor data with varying characteristics, which are shown in Table 2.
For each graph construction technique, a hyperparameter search was performed to compare the best performing settings to each other. In total 11 techniques were tested, including ones that construct graphs from either signaldriven information or locationbased information (geographical location of the sensors). We reported the results individually, but also provided more broad results based on the characteristics of either the graph construction method or the dataset.
The results on the degree distribution of each constructed graphs indicated that the graph construction techniques created graphs with varying characteristics. In general, signaldriven techniques calculated more edges compared to locationbased techniques. However, either graphs with linear degree distributions or longtail distributions showed best results.
Considering the performance, signaldriven techniques showed the most promising results. The entire top 3 of the general ranking is represented by signaldriven techniques. The correlation technique scored best, while also using the least computational resources out of all signaldriven techniques. Therefore, it seems that if high quality signal data are available, using correlation seems to be a great option to take.
With regard to variability in the results, some clear differences between the datasets can be observed. Most variability (with help of the coefficient of variation) was found in the regression datasets, hinting that the impact of the graphs was more prominent for the regression task.
In conclusion, our study highlights the importance of graph construction methods in graph neural network research. The choice of graph construction method has a considerable impact on the performance of GNNs in our tasks. Our findings provide insights into the strengths and weaknesses of different approaches and can serve as a useful guide for future researchers in this field.
Overall, we recommend that future research focus on developing more effective graph construction methods that can capture the underlying structure of the data while avoiding computational complexity. Therefore, for future work, we aim to investigate further methods for graph construction, e.g., using Granger causality to calculate another type of dynamics between two signals. Furthermore, using a neural network to learn the most optimal graph from signal data is another interesting direction to consider. Also, applying further (larger) datasets is another prominent option to consider for future work.
Data availability
There is a Github page (https://github.com/StefanBloemheuvel/graph_comparison) available with the corresponding data and material.
Code availability
There is a Github page (https://github.com/StefanBloemheuvel/graph_comparison) available with the corresponding code.
References
Tilak, S., AbuGhazaleh, N.B., Heinzelman, W.: A taxonomy of wireless microsensor network models. ACM SIGMOBILE Mobile Comput Commun Rev 6(2), 28–36 (2002)
Tubaishat, M., Madria, S.: Sensor networks: an overview. IEEE Potentials 22(2), 20–23 (2003)
Box, G.E., Jenkins, G.M., Reinsel, G.C., Ljung, G.M.: Time Series Analysis: Forecasting and Control. Wiley (2015)
Qiao, L., Zhang, L., Chen, S., Shen, D.: Datadriven graph construction and graph learning: a review. Neurocomputing 312, 336–351 (2018)
Wu, L., Cui, P., Pei, J., Zhao, L.: Graph Neural Networks: Foundations, Frontiers, and Applications, p. 725. Springer, Singapore (2022)
Segarra, S., Marques, A.G., Mateos, G., Ribeiro, A.: Network topology inference from spectral templates. IEEE Trans. Signal Inf. Process. Netw. 3(3), 467–483 (2017)
Shafipour, R., Segarra, S., Marques, A.G., Mateos, G.: Network topology inference from nonstationary graph signals. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5870–5874. IEEE (2017)
Kisilevich, S., Mansmann, F., Nanni, M., Rinzivillo, S.: SpatioTemporal Clustering. Springer (2010)
Guo, D., Chen, J., MacEachren, A.M., Liao, K.: A visualization system for space–time and multivariate patterns (visstamp). IEEE Trans. Vis. Comput. Gr. 12(6), 1461–1474 (2006)
Zhang, P., Huang, Y., Shekhar, S., Kumar, V.: Correlation analysis of spatial time series datasets: a filterandrefine approach. In: Proceedings of the PAKDD—Advances in Knowledge Discovery and Data Mining, pp. 532–544. Springer (2003)
Michelini, A., Margheriti, L., Cattaneo, M., Cecere, G., D’Anna, G., Delladio, A., et al.: The Italian National Seismic Network and the earthquake and tsunami monitoring and surveillance systems. Adv. Geosci. 43, 31–38 (2016)
Sperduti, A., Starita, A.: Supervised neural networks for the classification of structures. IEEE Trans. Neural Netw. 8(3), 714–735 (1997)
Welling, M., Kipf, T.N.: Semisupervised classification with graph convolutional networks. In: J. International Conference on Learning Representations (ICLR 2017) (2016)
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. In: International Conference on Learning Representations
Hamilton, W., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O., Dahl, G.E.: Neural message passing for quantum chemistry. In: International Conference on Machine Learning, pp. 1263–1272. PMLR (2017)
Cao, D., Wang, Y., Duan, J., Zhang, C., Zhu, X., Huang, C., Tong, Y., Xu, B., Bai, J., Tong, J., et al.: Spectral temporal graph neural network for multivariate timeseries forecasting. In: Advances in Neural Information Processing Systems, vol. 33, pp. 17766–17778 (2020)
Cini, A., Marisca, I., Alippi, C.: Filling the g_ap_s: multivariate time series imputation by graph neural networks. In: International Conference on Learning Representations
Bloemheuvel, S., van den Hoogen, J., Jozinovic, D., Michelini, A., Atzmueller, M.: Graph neural networks for multivariate time series regression with application to seismic data. Int. J. Data Sci. Anal. 16, 1–16 (2022)
Peng, S., Cheng, J., Wu, X., Fang, X., Wu, Q.: Pressure sensor placement in water supply network based on graph neural network clustering method. Water 14(2), 150 (2022)
Guan, S., Zhao, B., Dong, Z., Gao, M., He, Z.: Gtad: graph and temporal neural network for multivariate time series anomaly detection. Entropy 24(6), 759 (2022)
Duan, Z., Xu, H., Wang, Y., Huang, Y., Ren, A., Xu, Z., Sun, Y., Wang, W.: Multivariate timeseries classification with hierarchical variational graph pooling. Neural Netw. 154, 481–490 (2022)
Maier, M., Luxburg, U., Hein, M.: Influence of graph construction on graphbased clustering measures. In: Advances in Neural Information Processing Systems, vol. 21 (2008)
Zhou, Z., Chen, X., Zhang, Y., Hu, D., Qiao, L., Yu, R., Yap, P.T., Pan, G., Zhang, H., Shen, D.: A toolbox for brain network construction and classification (BrainNetClass). Hum. Brain Mapp. 41(10), 2808–2826 (2020)
Bagan, G., Bonifati, A., Ciucanu, R., Fletcher, G.H., Lemay, A., Advokaat, N.: gMark: schemadriven generation of graphs and queries. IEEE Trans. Knowl. Data Eng. 29(4), 856–869 (2016)
Grady, L.J., Polimeni, J.R.: Discrete calculus: Applied analysis on graphs for computational science. Springer, Berlin (2010)
Lira, H., Martí, L., SanchezPi, N.: A graph neural network with spatiotemporal attention for multisources time series data: an application to frost forecast. Sensors 22(4), 1486 (2022)
Shuman, D.I., Narang, S.K., Frossard, P., Ortega, A., Vandergheynst, P.: The emerging field of signal processing on graphs: extending highdimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 30(3), 83–98 (2013)
Li, Y., Yu, R., Shahabi, C., Liu, Y.: Diffusion convolutional recurrent neural network: datadriven traffic forecasting. In: International Conference on Learning Representations
Sun, Y., Yao, X., Bi, X., Huang, X., Zhao, X., Qiao, B.: Timeseries graph network for sea surface temperature prediction. Big Data Res. 25, 100237 (2021)
Jebara, T., Wang, J., Chang, S.F.: Graph construction and bmatching for semisupervised learning. In: Proceedings of the International Conference on Machine Learning. ICML’09, pp. 441–448. ACM, New York (2009)
Wu, Z., Pan, S., Long, G., Jiang, J., Zhang, C.: Graph wavenet for deep spatialtemporal graph modeling. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 1907–1913 (2019)
Dai, E., Jin, W., Liu, H., Wang, S.: Towards robust graph neural networks for noisy graphs with sparse labels. In: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, pp. 181–191 (2022)
Luo, D., Cheng, W., Yu, W., Zong, B., Ni, J., Chen, H., Zhang, X.: Learning to drop: robust graph neural network via topological denoising. In: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pp. 779–787 (2021)
Shafipour, R., Mateos, G.: Online topology inference from streaming stationary graph signals with partial connectivity information. Algorithms 13(9), 228 (2020)
Shang, C., Chen, J., Bi, J.: Discrete graph structure learning for forecasting multiple time series. In: International Conference on Learning Representations
Du, Y., Wang, S., Guo, X., Cao, H., Hu, S., Jiang, J., Varala, A., Angirekula, A., Zhao, L.: GraphGT: machine learning datasets for graph generation and transformation. In: Thirtyfifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2) (2021)
Erdos, P.: On random graphs. Mathematicae 6, 290–297 (1959)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
Omohundro, S.M.: Five balltree construction algorithms. In: International Computer Science Institute Berkeley (1989)
Toussaint, G.T.: The relative neighbourhood graph of a finite planar set. Pattern Recognit. 12(4), 261–268 (1980)
Watanabe, D.: A study on analyzing the grid road network patterns using relative neighborhood graph. In: The Ninth International Symposium on Operations Research and Its Applications, pp. 112–119. World Publishing (2010)
Lingas, A.: A lineartime construction of the relative neighborhood graph from the Delaunay triangulation. Comput. Geom. 4(4), 199–208 (1994)
Gabriel, K.R., Sokal, R.R.: A new statistical approach to geographic variation analysis. Syst. Zool. 18(3), 259–278 (1969)
Choo, J., Jiamthapthaksin, R., Chen, C.S., Celepcikay, O.U., Giusti, C., Eick, C.F.: Mosaic: a proximity graph approach for agglomerative clustering. In: Proceedings of the International Conference on Data Warehousing and Knowledge Discovery, pp. 231–240 (2007)
Matula, D.W., Sokal, R.R.: Properties of Gabriel graphs relevant to geographic variation research and the clustering of points in the plane. Geogr. Anal. 12(3), 205–222 (1980)
Jain, A.K.: Data clustering: 50 years beyond kmeans. Pattern Recognit. Lett. 31(8), 651–666 (2010)
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, No. 14, pp. 281–297. Oakland (1967)
Celebi, M.E., Kingravi, H.A., Vela, P.A.: A comparative study of efficient initialization methods for the kmeans clustering algorithm. Expert Syst. Appl. 40(1), 200–210 (2013)
Ankerst, M., Breunig, M.M., Kriegel, H.P., Sander, J.: Optics: ordering points to identify the clustering structure. ACM Sigmod Record 28(2), 49–60 (1999)
Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A DensityBased Algorithm for Discovering Clusters in Large Spatial Databases with Noise. AAAI Press (1996)
Kamil, I.S., AlMamory, S.O.: Enhancement of optics’ time complexity by using fuzzy clusters. Mater. Today Proc. 80, 2625 (2021)
Berndt, D.J., Clifford, J.: Using Dynamic Time Warping to Find Patterns in Time Series. AAAI Press (1994)
Reshef, D.N., Reshef, Y.A., Finucane, H.K., Grossman, S.R., McVean, G., Turnbaugh, P.J., Lander, E.S., Mitzenmacher, M., Sabeti, P.C.: Detecting novel associations in large data sets. Science 334, 1518–1524 (2011)
Ding, H., Trajcevski, G., Scheuermann, P., Wang, X., Keogh, E.: Querying and mining of time series data: experimental comparison of representations and distance measures. Proc. VLDB Endow. 1(2), 1542–1552 (2008)
Salvador, S., Chan, P.: Toward accurate dynamic time warping in linear time and space. Intell. Data Anal. 11(5), 561–580 (2007)
Shao, F., Liu, H.: The theoretical and experimental analysis of the maximal information coefficient approximate algorithm. J. Syst. Sci. Inf. 9(1), 95–104 (2021)
Jaromczyk, J.W., Toussaint, G.T.: Relative neighborhood graphs and their relatives. Proc. IEEE 80(9), 1502–1517 (1992)
Tan, C.W., Bergmeir, C., Petitjean, F., Webb, G.I.: Time series extrinsic regression: predicting numeric values from time series data. Data Min. Knowl. Discov. 35, 1032–1060 (2021)
Wu, Z., Pan, S., Long, G., Jiang, J., Chang, X., Zhang, C.: Connecting the dots: multivariate time series forecasting with graph neural networks. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 753–763 (2020)
Jagadish, H.V., Gehrke, J., Labrinidis, A., Papakonstantinou, Y., Patel, J.M., Ramakrishnan, R., Shahabi, C.: Big data and its technical challenges. Commun. ACM 57(7), 86–94 (2014)
Jozinović, D., Lomax, A., Štajduhar, I., Michelini, A.: Rapid prediction of earthquake ground shaking intensity using raw waveform data and a convolutional neural network. Geophys. J. Int. 222(2), 1379–1389 (2020)
Cini, A., Marisca, I.: Torch Spatiotemporal (2022). https://github.com/TorchSpatiotemporal/tsl
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Authors and Affiliations
Contributions
SB conceptualized the idea of the study, and designed the methodology, with input from MA. SB retrieved, prepared and processed the data used for the experiments. SB and JH collaborated on the analysis of the experiment results. All authors drafted the manuscript. All authors reviewed and edited the manuscript. SB implemented the methods and algorithms, and ran the experiments, in collaboration with JH. All authors read, reviewed and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bloemheuvel, S., van den Hoogen, J. & Atzmueller, M. Graph construction on complex spatiotemporal data for enhancing graph neural networkbased approaches. Int J Data Sci Anal 18, 157–174 (2024). https://doi.org/10.1007/s41060023004522
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s41060023004522