1 Introduction

Time series, which can be thought of as collections of observations indexed by time, are ubiquitous across domains, from climate studies and health monitoring to financial data analysis. The literature offers a plethora of statistical models adequate to describe the behaviour of time series (Shumway and Stoffer 2017). However, technological developments, such as sensors and mobile devices, have led to the gathering of large amounts of high-dimensional time-indexed data for which appropriate methodological and computational tools are required. To this end, feature-based time series characterization has recently become a popular approach among time series data researchers (Fulcher 2018; Henderson and Fulcher 2021; Wang et al. 2006) and has proved useful for a wide range of temporal data mining tasks, ranging from classification (Fulcher and Jones 2014), clustering (Wang et al. 2006), forecasting (Montero-Manso et al. 2020; Talagala et al. 2018), pattern detection (Geurts 2001), outlier and anomaly detection (Hyndman et al. 2015), motif discovery (Chiu et al. 2003) and visualization (Kang et al. 2017) to the generation of new data (Kang et al. 2020), among others.

The main idea behind feature-based approaches is to construct feature vectors that aim to represent specific properties of the time series data by characterizing the underlying dynamic processes (Fulcher 2018; Fulcher and Jones 2017). The usual methodologies for calculating time series features include concepts and methods from the linear time series analysis literature (Shumway and Stoffer 2017), such as autocorrelation, stationarity, seasonality and entropy, but also methods of nonlinear time series analysis based on dynamical systems theory (Fulcher et al. 2013; Henderson and Fulcher 2021; Wang et al. 2006). These methods usually involve parametric assumptions, parameter estimation, non-trivial calculations and approximations, as well as preprocessing tasks such as extracting time series components, differencing and whitening, thus presenting drawbacks and computational issues related to the nature of the data, such as the length of the time series.

This work contributes to the feature-based approach in time series analysis by proposing an alternative set of features based on complex networks concepts.

Complex networks describe a wide range of systems in nature and society by representing the existing interactions via graph structures (Barabási 2016). Network science, the research area that studies complex networks, provides a vast set of topological graph measurements (Costa et al. 2007; Peach et al. 2021), a well-defined set of problems such as community detection (Fortunato 2010) or link prediction (Lü and Zhou 2011), and a large track record of successful application of complex network methodologies to different fields (Vespignani 2018), including graph classification (Bonner et al. 2016).

Motivated by the success of complex network methodologies and with the objective of acquiring new tools for the analysis of time series, several network-based approaches have been proposed recently. These approaches involve mapping time series to the network domain. The mapping methods proposed in the literature may be divided into three categories depending on the underlying concept: proximity, visibility and transition (Silva et al. 2021; Zou et al. 2019). Depending on the mapping method, the resulting networks capture specific properties of the time series. Some networks have as many nodes as observations in the time series, such as visibility graphs (Lacasa et al. 2008), while others reduce the dimensionality while preserving the characteristics of the time dynamics, such as quantile graphs (Campanharo et al. 2011). Network-based time series analysis techniques have shown promising results and have been successful in the description, classification and clustering of time series. Examples include the automatic classification of sleep stages (Zhu et al. 2014), characterizing the dynamics of the human heartbeat (Shao 2010), distinguishing healthy from non-healthy electroencephalographic series (Campanharo and Ramos 2017) and analysing seismic signals (Telesca and Lovallo 2012).

In this work we establish a new set of time series features, NetF, by mapping the time series into the complex network domain. Further, we propose a procedure for time series mining tasks and address the question of whether time series features based on complex networks are a useful approach in such tasks. Our proposed procedure, represented in Fig. 1, comprises the following steps: map the time series into (natural and horizontal) visibility graphs and quantile graphs using the appropriate mapping methods, and compute five specific topological measures for each network, thus establishing a vector of 15 features. These features are then used in mining tasks. The selected network topological metrics, namely average weighted degree, average path length, number of communities, clustering coefficient and modularity, measure global characteristics, are simple to compute and to interpret in the graph context, and are commonly used in network analysis, thus being capable of providing useful information about the structure and properties of the underlying systems.

Fig. 1
figure 1

Schematic diagram of the network based features approach to time series mining tasks

To investigate the relevance of the set of features NetF, we analyse synthetic time series generated from a set of Data Generating Processes with a range of different characteristics. Additionally, we consider the problem of feature-based time series clustering on synthetic, benchmark and new time series data sets. The NetF features are assessed against two other sets of features, tsfeatures and catch22 (Hyndman et al. 2020; Kang et al. 2017; Lubba et al. 2019). The results show that network science measures are able to capture and preserve the characteristics of the time series data. We show that different topological measures from different mapping methods capture different characteristics of the data, complementing each other and providing new information when combined, rather than considered in isolation as is common in the literature. Clustering results on empirical data are comparable to those of conventional approaches: on some data sets the proposed approach obtains better results, and on others the results are quite similar across approaches. The proposed set of features has the advantage of being always computable, which is not always the case for classical time series features.

NetF, the empirical study implementation, and the data sets presented here are available on GitHub.Footnote 1

We have organized this paper as follows. Section 2 introduces basic concepts of time series and complex networks, setting the notation for the remainder of the paper, and also presents the mapping methods and network measurements used. Next, in Sect. 3 the novel time series features proposed in this work are presented and a study of these features is carried out in order to characterize properties of the time series. In Sect. 4, time series clustering tasks are performed as an example application of the network-based features: the synthetic and empirical data sets used are briefly described, and the results of our approach are compared with those of two classical time series feature approaches. Finally, Sect. 5 presents the conclusions and some final comments.

2 Background

2.1 Time series

A time series \(\varvec{Y}=(Y_1, \ldots , Y_T)\) is a set of observations obtained over time, usually at equidistant time points. A time series differs from a random sample in that the observations are ordered in time and usually present serial correlation that must be accounted for in all statistical and data mining tasks. Time series analysis refers to the collection of procedures developed to systematically solve the statistical problems posed by this serial correlation. Statistical time series analysis relies on a set of concepts, measures and models designed to capture the essential characteristics of the data, namely, trend, seasonality, periodicity, autocorrelation, skewness, kurtosis and heteroscedasticity (Wang et al. 2006). Other concepts like self-similarity, nonlinear structure and chaos, stemming from non-linear science, are also used to characterize time series (Bradley and Kantz 2015).

Several classes of statistical models that provide plausible descriptions of the characteristics of time series data have been developed with a view to forecasting and simulation (Box et al. 2015). Statistical models for time series may be broadly classified as linear and nonlinear, referring usually to the functional forms of the conditional mean and variance. Linear time series models are models for which the conditional mean is a linear function of past values of the time series. The most popular class is that of the AutoRegressive Moving Average (ARMA) models. Particular cases of ARMA models include: white noise (WN), a sequence of independent and identically distributed observations; AutoRegressive (AR) models, which specify a linear relationship between the current and past values; and Moving Average (MA) models, which specify a linear relationship between the current value and past stochastic terms. ARMA models have been extended to incorporate non-stationarity (unit roots), leading to ARIMA models, and long-memory characteristics, leading to ARFIMA models. Many time series present characteristics that cannot be represented by linear models, such as volatility, asymmetry, different regimes and clustering effects. Modelling these effects with non-linear specifications for the conditional mean and the conditional variance leads to different classes of nonlinear time series models, such as Generalized AutoRegressive Conditional Heteroskedastic (GARCH) type models, specified through the conditional variance and developed mainly for financial time series; threshold models and Hidden Markov models, which allow for different regimes; and INAR models for integer-valued time series. Definitions, properties and details about these models are given in Appendix A.
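As an illustration of the recursions behind the AR and MA classes described above, the following minimal Python sketch simulates AR(1) and MA(1) processes with Gaussian noise. The function names and the choice of Python are ours for exposition; the computations in this paper are carried out in R, and the actual DGPs are detailed in Appendix A.

```python
import random

def simulate_ar1(phi, T, seed=0):
    """Simulate an AR(1) process Y_t = phi * Y_{t-1} + e_t with Gaussian noise e_t."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(T):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y

def simulate_ma1(theta, T, seed=0):
    """Simulate an MA(1) process Y_t = e_t + theta * e_{t-1}."""
    rng = random.Random(seed)
    e_prev, y = 0.0, []
    for _ in range(T):
        e = rng.gauss(0.0, 1.0)
        y.append(e + theta * e_prev)
        e_prev = e
    return y
```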

2.2 Complex networks

Graphs are mathematical structures appropriate for modelling complex systems which are characterized by a set of elements that interact with each other and exhibit collective properties (Costa et al. 2011). Typically, graphs exhibit non-trivial topological properties due to the characteristics of the underlying system, and so they are often called complex networks.

A graph (or network), G, is an ordered pair (V, E), where V represents the set of nodes and E the set of edges between pairs of elements of V. Two nodes \(v_{i}\) and \(v_{j}\) are neighbours if they are connected by an edge \((v_{i},v_{j}) \in E\). Edges are termed directed if they connect a source node to a target node, or undirected if there is no such notion of orientation. A graph is termed weighted if there is a weight, \(w_{i,j},\) associated with each edge \((v_{i}, v_{j})\).

Network science has served many scientific fields in solving problems and analyzing data that is directly or indirectly converted to networks. Currently, there is a vast literature on problems and successful applications (Vespignani 2018), as well as an extensive set of measurements of topological, statistical, spectral and combinatorial properties of networks (Albert and Barabási 2002; Barabási 2016; Costa et al. 2007; Peach et al. 2021), capable of differentiating particular characteristics of the network data. Examples include measures of node centrality (Oldham et al. 2019), graph distances (Li et al. 2021) and clustering and community structure (Malliaros and Vazirgiannis 2013), among many others. Many of these topological measurements involve the concepts of paths and graph connectivity. A path is a sequence of distinct edges that connect consecutive pairs of nodes. Consequently, two nodes are said to be connected if there is a path between them and disconnected if no such path exists. Thus, some measurements are based on the length (number of edges) of such connecting paths (Costa et al. 2007).

2.3 Mapping time series into complex networks

In the last decade, several network-based time series analysis approaches have been proposed. These approaches are based on mapping the time series into the network domain. The mappings proposed in the literature are essentially based on concepts of visibility, transition probability and proximity (Silva et al. 2021; Zou et al. 2019). In this work we use the visibility graph and quantile graph methods, which are based on the visibility and transition probability concepts, respectively. Next, we briefly describe these methods.

2.3.1 Visibility graphs

Visibility graphs (VG) establish connections (edges) between the time stamps (nodes) using visibility lines between the observations, where nodes are associated with the natural ordering of observations. Two variants of this method are as follows.

The Natural Visibility Graph (NVG) (Lacasa et al. 2008) is based on the idea that each observation, \(Y_t\), of the time series is seen as a vertical bar with height equal to its numerical value, and that these bars are laid out in a landscape where the top of a bar is visible from the tops of other bars. Each time stamp, t, is mapped into a node of the graph and the edges \((v_i, v_j)\), for \(i,j = 1 \ldots T\), \(i \ne j\), are established if there is an unobstructed line of visibility between the corresponding data bars. Formally, two nodes \(v_{i}\) and \(v_{j}\) are connected if every other observation \((t_{k}, Y_{k})\) with \(t_{i}<t_{k}<t_{j}\) satisfies the inequality:

$$\begin{aligned} Y_{k} < Y_{j}+(Y_{i}-Y_{j})\frac{(t_{j}-t_{k})}{(t_{j}-t_{i})}. \end{aligned}$$
(1)

We give a simple illustration of this algorithm in Fig. 2.
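The visibility criterion of Eq. (1) also translates directly into a naive algorithm that checks every pair of time stamps against all intermediate bars. The Python sketch below is illustrative only (the paper's own implementation uses the faster divide-and-conquer algorithm of Lan et al. (2015), in R); node indices play the role of time stamps, assuming equidistant observations.

```python
def natural_visibility_graph(y):
    """Naive NVG: nodes are time stamps; an edge (i, j) exists when every
    intermediate bar stays strictly below the line of sight (Eq. 1)."""
    T = len(y)
    edges = set()
    for i in range(T):
        for j in range(i + 1, T):
            # Eq. (1): Y_k < Y_j + (Y_i - Y_j) * (t_j - t_k) / (t_j - t_i)
            visible = all(
                y[k] < y[j] + (y[i] - y[j]) * (j - k) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges
```

Note that consecutive time stamps are always mutually visible (the `all(...)` over an empty range is true), which is why NVGs are always connected.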

Fig. 2
figure 2

Source: Modified from Silva et al. (2021)

Illustrative example of the two visibility graph algorithms. a Toy time series and corresponding visibility lines between data bars. Solid pink lines represent the natural visibility lines corresponding to the NVG method, and dashed blue lines represent the horizontal visibility lines corresponding to the HVG method. b Network generated by the corresponding mappings. The NVG is the graph with all edges, including the edges represented by the dashed pink lines, and the HVG is the subgraph that does not include these edges

NVGs are always connected, since each node \(v_i\) sees at least its neighbors \(v_{i-1}\) and \(v_{i+1},\) and are always undirected unless we consider the direction of the time axis (Silva et al. 2021). The network is also invariant under affine transformations of the data (Lacasa et al. 2008): the visibility criterion is invariant under rescaling of both the horizontal and vertical axes, as well as under translations, that is, each transformation \(\varvec{Y}'= a\varvec{Y} + b,\) for \(a \in \mathbb {R}\) and \(b \in \mathbb {R},\) leads to the same NVG (Silva et al. 2021).

Possible sensitivity of NVGs to noise can be attenuated by assigning weights to the edges: the weight associated with the edge \((v_i, v_j)\) is defined as \( w_{i,j} = 1 / \sqrt{(t_j-t_i)^2 + (Y_j - Y_i)^2}\) (Bianchi et al. 2017), the reciprocal of the Euclidean distance between the points \((t_i, Y_i)\) and \((t_j, Y_j).\) The network resulting from the weighted NVG (WNVG) method is thus a weighted and undirected graph.

The Horizontal Visibility Graph (HVG) (Luque et al. 2009) is a simplification of the NVG method whose construction differs in the visibility definition: the visibility lines are only horizontal (see Fig. 2). Two nodes \(v_{i}\) and \(v_{j}\) are connected if, for all \((t_{k},Y_{k})\) with \(t_{i}< t_{k} < t_{j}\), the following condition is met:

$$\begin{aligned} Y_{i}, Y_{j} > Y_{k}. \end{aligned}$$
(2)

Given a time series, its HVG is always a subgraph of its NVG. This is illustrated in Fig. 2b, where all edges present in the HVG are also present in the NVG, but the converse is not true: the edges represented by dashed pink lines belong only to the NVG. HVG nodes will always have a degree less than or equal to that of the corresponding NVG nodes. Therefore, there is some loss of quantitative information in the HVG in comparison with the NVG (Luque et al. 2009). However, in terms of qualitative characteristics, the graphs preserve part of the data information, namely, the local information (the closest time stamps) (Silva et al. 2021).

In a similar way to WNVG, we can assign weights to the edges of the HVG, \(w_{i,j} = 1 / \sqrt{(t_j-t_i)^2 + (Y_j - Y_i)^2}\), resulting in a weighted HVG (WHVG).
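The HVG construction can be sketched in a few lines. The following illustrative Python function implements the condition of Eq. (2) and, optionally, attaches the inverse-Euclidean edge weights of the WHVG; this is a naive pairwise implementation for exposition, not the algorithm of Luque et al. (2009) used in the paper.

```python
import math

def horizontal_visibility_graph(y, weighted=True):
    """HVG: nodes i and j are linked when all intermediate values are strictly
    lower than both endpoints (Eq. 2). With weighted=True, each edge carries
    the WHVG weight w_ij = 1 / sqrt((t_j - t_i)^2 + (Y_j - Y_i)^2)."""
    T = len(y)
    edges = {}
    for i in range(T):
        for j in range(i + 1, T):
            # Eq. (2): Y_i, Y_j > Y_k for all intermediate k
            if all(y[k] < min(y[i], y[j]) for k in range(i + 1, j)):
                w = 1.0 / math.sqrt((j - i) ** 2 + (y[j] - y[i]) ** 2)
                edges[(i, j)] = w if weighted else 1.0
    return edges
```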

2.3.2 Quantile graphs

Quantile graphs (QG) (Campanharo et al. 2011) are obtained from a mapping based on transition probabilities. The method consists in assigning the time series observations to bins defined by \(\eta \) sample quantiles, \(q_{1}, q_{2}, \ldots , q_{\eta }\). Each sample quantile, \(q_{i}\), is mapped to a node \(v_{i}\) of the graph, and the edges between two nodes \(v_{i}\) and \(v_{j}\) are directed and weighted, \((v_{i}, v_{j}, w_{i,j})\), where \(w_{i,j}\) represents the transition probability between quantile ranges. The adjacency matrix is a Markov transition matrix: \(\sum _{j=1}^\eta w_{i,j} = 1\) for each \(i = 1, \ldots , \eta ,\) and the network is weighted, directed and contains self-loops.Footnote 2 We illustrate this mapping method in Fig. 3.

Fig. 3
figure 3

Source: Reproduced from Silva et al. (2021)

Illustrative example of the quantile graph algorithm for \(\eta = 4\). a Toy time series with coloured regions representing the different \(\eta \) sample quantiles; b network generated by the QG algorithm. Repeated transitions between quantiles result in edges with larger weights represented by thicker lines

The number of quantiles is usually much less than the length of the time series (\(\eta \ll T\)). If \(\eta \) is too large the resulting graph may not be connected, having isolated nodes,Footnote 3 and if it is too small the QG may present a significant loss of information, represented by large weights assigned to self-loops. The causal relationships contained in the process dynamics are captured by the connectivity of the QG.
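The QG construction can be sketched as follows. This illustrative Python function uses a rank-based bin assignment as a simplification of the sample-quantile scheme (it ignores tie-handling subtleties of proper quantile estimation), and row-normalises the transition counts so that the weighted adjacency matrix is a Markov transition matrix.

```python
def quantile_graph(y, eta):
    """Sketch of the QG mapping: assign each observation to one of eta
    equal-probability bins (via ranks, a simplification of the sample-quantile
    scheme) and count transitions between consecutive time stamps. Rows of the
    returned matrix W sum to 1, so W is a Markov transition matrix."""
    T = len(y)
    # Rank-based bin assignment: bin index in {0, ..., eta - 1}
    order = sorted(range(T), key=lambda t: y[t])
    rank = [0] * T
    for r, t in enumerate(order):
        rank[t] = r
    bins = [min(eta - 1, rank[t] * eta // T) for t in range(T)]
    # Count transitions between consecutive observations (self-loops included)
    counts = [[0] * eta for _ in range(eta)]
    for t in range(T - 1):
        counts[bins[t]][bins[t + 1]] += 1
    # Row-normalise counts into transition probabilities
    W = []
    for row in counts:
        s = sum(row)
        W.append([c / s if s else 0.0 for c in row])
    return W
```

For a monotonically increasing series, for instance, most of the probability mass falls on the self-loops, with small off-diagonal weights linking consecutive quantile bins.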

2.4 Complex networks topological measures

Complex networks have specific topological features which characterize the connectivity between their nodes and, consequently, are somehow reflected in the measurement processes (Costa et al. 2007). There is a wide range of network topology measures capable of expressing the most relevant features of a network. They include global network measurements, which measure global properties involving all elements of the network, node-level or edge-level measurements, which measure a given feature corresponding to the nodes or edges, and “intermediate” measurements, which measure features of subgraphs in the network.

Properties of centrality, distance, community detection and connectivity are central to understanding features of network structures. Centrality measures aim to quantify the importance of nodes and edges in the network depending on their connection topology. Path-based measures refer to sequences of edges that connect pairs of nodes, depend on the overall network structure and are useful for measuring network efficiency and information propagation capability. Communities and node connectivity are also very relevant features of networks, which measure how and which groups of nodes are better connected, measuring the clustering and resilience of the network.

In this work we propose to study the average weighted degree, \(\bar{k}\), average path length, \(\bar{d}\), global clustering coefficient, C, number of communities, S, and modularity, Q, representing global measures of the features described above.

The degree, \(k_{i}\), of a node \(v_{i}\) represents the number of edges of \(v_{i}\). It is a fairly important property that shows the intensity of connectivity in the node neighbourhood. In directed graphs we distinguish between the in-degree, \(k_{i}^{in}\), the number of edges that point to \(v_{i}\), and the out-degree, \(k_{i}^{out}\), the number of edges that point from \(v_{i}\) to other nodes. The total degree is given by \(k_{i} = k_{i}^{in} + k_{i}^{out}\). For weighted graphs, we can calculate the weighted degree by adding the edge weights instead of counting the edges. The average path length, \(\bar{d}\), is the arithmetic mean of the shortest path lengths, \(d_{i,j}\), over all pairs of nodes, where the length of a path is its number of edges, or the sum of its edge weights for weighted graphs. It is a good measure of the efficiency of information flow on a network. The global clustering coefficient, C, also known as transitivity, captures the degree to which the nodes of a graph tend to cluster, that is, the probability that two nodes connected to a given node are also connected to each other. In this work, we refer to network communities as groupings of nodes (potentially overlapping) that are densely connected internally and that can also be regarded as groups of nodes sharing common or similar characteristics. The number of communities, S, is the number of such groups in the network. The modularity, Q, measures how good a specific division of the graph into communities is, that is, how well separated the nodes belonging to different communities are. A high value of Q indicates a graph with a dense internal community structure and sparse connections between nodes of different communities.
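As an illustration of the global clustering coefficient just described, the following Python sketch computes transitivity directly from its definition on an undirected graph stored as an adjacency dictionary. This is a hypothetical helper for exposition, not the igraph function used later in the paper.

```python
from itertools import combinations

def global_clustering(adj):
    """Transitivity: the fraction of connected triples (paths a - v - b) that
    close into a triangle. `adj` is an undirected graph {node: set(neighbours)}.
    Each triangle closes 3 triples (one per centre), so this ratio equals
    3 * (number of triangles) / (number of connected triples)."""
    triples = 0
    closed = 0
    for v, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            triples += 1          # the path a - v - b, centred at v
            if b in adj[a]:
                closed += 1       # the triple closes into a triangle
    return closed / triples if triples else 0.0
```

For example, a triangle with one pendant node has 5 connected triples, of which 3 are closed, giving C = 0.6.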

3 NetF: a novel set of time series features

Over the last decades, several techniques for extracting time series features have been developed, see (Barandas et al. 2020; Christ et al. 2018; Fulcher and Jones 2017; Fulcher et al. 2013; Hyndman et al. 2020; Lubba et al. 2019; O’Hara-Wild et al. 2021) for more details. Most of these techniques have in common the definition of a finite set of statistical features, such as autocorrelation, existence of unit roots, periodicity, nonlinearity, volatility among others, to capture the global nature of the time series.

In this work we introduce NetF as an alternative set of features. Our approach differs from those previously mentioned in that we leverage different complex network mappings to offer a set of time series features based on the topology of those networks. One of the main advantages of this approach comes from the fact that the mapping methods (Sect. 2.3) do not require typical time series preprocessing tasks, such as decomposing, differencing or whitening. Moreover, our methodology is applicable to any time series, regardless of its characteristics.

3.1 The 15 features of NetF

NetF is constituted by 15 different features, as depicted in Fig. 4. These features correspond to the concatenation of five different topological measures, as explained in Sect. 2.4 (\(\bar{k}\), the average weighted degree; \(\bar{d}\), the average path length; C, the clustering coefficient; S, the number of communities; Q, the modularity), each of them applied to three different mappings of the time series, as explained in Sect. 2.3 (WNVG, the weighted natural visibility graph; WHVG, the weighted horizontal visibility graph; QG, the quantile graph).

Fig. 4
figure 4

Schematic diagram of the NetF set of features. A time series \(\mathbf {Y}\) is mapped into three complex networks (WNVG, WHVG and QG) and for each of these networks five topological measures are taken (\(\bar{k}\), \(\bar{d}\), C, S and Q), resulting in the NetF vector containing 15 features

Our main goal is to provide a varied set of representative features that expose different properties captured by the topology of the mapped networks, providing a rich characterization of the underlying time series.

The topological features themselves were selected so that they represent global measures of centrality, distance, community detection and connectivity, while still being accessible, easy to compute and widely used in the network analysis domain.

3.2 Implementation details

Conceptually, NetF does not depend on the actual details of how it is computed. Nevertheless, with the intention of both showing the practicality of our approach and providing the reader with the ability to reproduce our results, we now describe in detail how we computed the NetF set of features in the context of this work.

To compute the WNVGs we implemented the divide and conquer algorithm proposed in Lan et al. (2015), and for the WHVGs the algorithm proposed in Luque et al. (2009).Footnote 4 To both we added the weighted version mentioned in Sect. 2.3.1, assigning the respective weights to the edges. For the QGs we chose \( \eta = 50 \) quantiles, as in Campanharo and Ramos (2016), and implemented the method described in Sect. 2.3.2 to create the nodes and edges of the networks. We used the sample quantile method based on a linear interpolation scheme of the empirical distribution function (Hyndman and Fan 1996) to calculate the sample quantiles (nodes) over the support of the time series. To store the network structure as a graph, we used the igraph (Csardi and Nepusz 2006) package, which also allows us to compute the topological measures. Next, we briefly describe the methods and algorithms underlying the functions used to calculate the measures.
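For concreteness, the linear-interpolation sample quantile scheme (Hyndman and Fan 1996, type 7, the R default) can be sketched in Python as follows. This is an illustrative re-implementation for a single probability, not the code used in the paper.

```python
def quantile_type7(x, p):
    """Sample quantile with linear interpolation of the empirical distribution
    function (Hyndman & Fan type 7, the default of R's quantile())."""
    xs = sorted(x)
    n = len(xs)
    h = (n - 1) * p            # fractional position in the ordered sample
    lo = int(h)                # floor of h
    hi = min(lo + 1, n - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])
```

For instance, the median of the sample (1, 2, 3, 4) under this scheme is 2.5, matching R's `quantile(1:4, 0.5)`.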

The average weighted degree (\(\bar{k}\)) is calculated as the arithmetic mean of the weighted degrees \(k_i\) of all nodes in the graph. In this work, the average path length (\(\bar{d}\)) follows an algorithm that does not consider edge weights and uses breadth-first search to calculate the shortest paths \(d_{i,j}\) between all pairs of vertices, in both directions for directed graphs. The function we use to calculate the clustering coefficient (C) ignores edge direction for directed graphs. For this reason, before calculating C for QGs, which are directed graphs, we first transform them into undirected graphs, where each pair of nodes connected by at least one directed edge becomes connected by an undirected edge. C is then calculated as the ratio of the total number of closed trianglesFootnote 5 in the graph to the number of triplets.Footnote 6 The function we use to calculate the number of communities (S) finds densely connected subgraphs via random walks, exploiting the fact that short random walks tend to stay in the same community; see the Walktrap community finding algorithm (Pons and Latapy 2005) for more details. Finally, to calculate the modularity (Q) of a graph with respect to some division of its nodes into communities, we measure how separated the nodes belonging to different communities are, as follows:

$$\begin{aligned} Q = \frac{1}{2|E|} \sum _{i,j} \left[ w_{i,j} - \frac{k_i k_j}{2|E|} \right] \delta \left( c_i, c_j \right) , \end{aligned}$$

where |E| is the number of edges, \(c_i\) and \(c_j\) are the communities of \(v_i\) and \(v_j\), respectively, and \(\delta (c_i, c_j) = 1\) if \(v_i\) and \(v_j\) belong to the same community (\(c_i = c_j\)) and \(\delta (c_i, c_j) = 0\) otherwise. We performed all implementations and computations in R (R Core Team 2020), version 4.0.3, using a set of packages.
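For a small unweighted graph, the modularity formula above can be evaluated directly. The Python sketch below is our illustrative helper (with the adjacency matrix entries \(A_{ij}\) in place of the weights \(w_{i,j}\), i.e., the unweighted undirected case), making each term of the sum explicit.

```python
def modularity(adj, communities):
    """Q = (1 / 2|E|) * sum_ij [A_ij - k_i * k_j / 2|E|] * delta(c_i, c_j)
    for an unweighted undirected graph {node: set(neighbours)};
    `communities` maps each node to its community label."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # 2|E|: each edge counted twice
    q = 0.0
    for i in adj:
        for j in adj:
            if communities[i] != communities[j]:
                continue                      # delta(c_i, c_j) = 0
            a_ij = 1.0 if j in adj[i] else 0.0
            q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m
```

For two disconnected triangles, with each triangle as one community, this gives the well-known value Q = 0.5.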

For reproducibility purposes, the source code and the datasets are made available in https://github.com/vanessa-silva/NetF.

3.3 Empirical evaluation

In this section we investigate, via synthetic data sets, whether the set of features introduced above are useful for characterizing time series data.

To this end, we consider a set of eleven linear and nonlinear time series models, denoted Data Generating Processes (DGP), which present a wide range of characteristics, summarized in Table 1. A detailed description of the DGPs and computational details are given in Appendix A. For each of the DGPs in Table 1 we generated 100 realizations of length \(T = 10000.\) Following the steps presented in Fig. 1, we map each realization into three networks and extract the corresponding topological measures. The resulting time series features, organized by mapping, are summarized (mean and standard deviation) in Tables 5 to 7. Note that the values have been Min-Max normalized for comparison purposes, since the ranges of the different features vary across models.
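The Min-Max normalization step can be sketched as follows; this is an illustrative Python helper (the naming is ours) that rescales each feature column of a feature matrix into [0, 1].

```python
def min_max_columns(X):
    """Min-Max rescale each column of the matrix X (a list of rows) into
    [0, 1]; a constant column is mapped to all zeros to avoid division by zero."""
    cols = list(zip(*X))            # transpose rows into columns
    scaled = []
    for col in cols:
        lo, hi = min(col), max(col)
        rng = hi - lo
        scaled.append([(v - lo) / rng if rng else 0.0 for v in col])
    return [list(row) for row in zip(*scaled)]  # transpose back
```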

Table 1 Summary of the data generating processes (time series models) of the synthetic data
Fig. 5
figure 5

Plot of one instance of each simulated time series model

WNVGs (Table 5) present the lowest values of the clustering coefficient (C) for ARIMA models. Models producing time series with more than one state (HMM and SETAR) present a lower average weighted degree but a higher number of communities (S). The latter values are comparable to those for AR(2) time series, a fact that can be explained by the pseudo-periodic nature of the particular AR(2) model entertained here. WHVGs (Table 6) present average weighted degrees \((\bar{k})\) of approximately 0 for HMMs and approximately 1 for GARCH and EGARCH. This indicates that HMM time series have, on average, horizontal visibility for more distant points (in time and/or value), while the opposite is true for heteroskedastic time series. The clustering coefficient (C) is lowest (approximately 0) for networks obtained from INAR time series, indicating that most points have visibility only to their two closest neighbors. QGs (Table 7) present high values of the average path length \((\bar{d})\) for ARIMA, contrasting with all other DGPs, which present low values. On the other hand, C for ARIMA presents low values while all other DGPs present high values.

Fig. 6
figure 6

Bi-plot of the first two PC’s for the synthetic data set. Each Data Generating Process (DGP) is represented by a color and the arrows represent the contribution of the corresponding feature to the PC’s: the larger the size, the sharper the color and the closer to the red the greater the contribution of the feature. Features grouped together are positively correlated while those placed on opposite quadrants are negatively correlated

The next step is to study the feature space to understand which network features capture specific properties of the time series models. Figure 6 represents a bi-plot obtained using the 15 features (5 for each mapping method), with the two PC's explaining 68.8% of the variance. It is noteworthy that the eleven groups of time series models are clearly identified and arranged in the bi-plot according to their main characteristics. Overall, we can say that the numbers of communities, S, of the VGs are positively correlated among themselves and negatively correlated with the average weighted degrees, \(\bar{k},\) of NetF. The average path lengths, \(\bar{d},\) of the WHVGs and QGs and the clustering coefficient, C, of the WHVG are positively correlated with each other, but negatively correlated with \(\bar{d}, C\) and Q of the WNVG, Q of the WHVG and C of the QGs. The features that contribute most to the dimensions formed by the PCA are: \(\bar{k}, S, Q\) and \(\bar{d}\) of the QGs, \(\bar{k}\) of the WNVGs, and \(\bar{k}, S\) and Q of the WHVGs (see Fig. 10).

The (stochastic) trend of the ARIMA, in fact the only non-stationary DGP in this data set, is represented by high average path lengths, \(\bar{d},\) in the WHVG and QG. Discrete states in the data (HMM, SETAR, INAR) are associated with the number of communities, S. The bi-plot further indicates that a high average weighted degree, \(\bar{k},\) mainly that of the WHVG, represents heteroskedasticity in the time series, e.g., GARCH and EGARCH. Cycles (AR(2)) are captured by the clustering coefficient, C.

We also performed an empirical study of the NetF features in the context of clustering these synthetic data sets, and the results show that using the entire feature set leads to better performance than any proper subset. This showcases how the different features complement each other and how they capture different characteristics of the underlying time series. The details of this study can be seen in Appendix C.

4 Mining time series with NetF

In this section we illustrate the usefulness of complex-network-based time series features in data mining tasks, with a case study on time series clustering via a feature-based approach (Maharaj et al. 2019). Within this mining task we analyse the synthetic data set introduced in Sect. 3.3, benchmark empirical data sets from the UEA & UCR Time Series Classification Repository (Bagnall et al.), the M3 competition data from package Mcomp (Hyndman 2018), the set “18Pairs” from package TSclust (Montero and Vilar 2014) and a new data set regarding the production of several crops across Brazil (Instituto Brasileiro de Geografia e Estatística), using NetF and two other sets of time series features, namely catch22 (Lubba et al. 2019) and tsfeatures (Wang et al. 2006); see Appendix D.

4.1 Clustering methodology

The overall procedure proposed here for feature-based clustering is represented in Fig. 7.

Fig. 7

Schematic diagram for the time series clustering analysis procedure

Given a set of time series, we compute the feature vectors, which are then Min-Max rescaled into the [0, 1] interval and organized in a feature data matrix. Principal components (PCs) are computed (the rescaling makes z-score normalization within PCA unnecessary) and finally a clustering algorithm is applied to the PC scores. Among the several algorithms available for clustering analysis, we opt for k-means (Hartigan and Wong 1979) since it is fast and widely used. Its main disadvantage is the need to pre-define the number of clusters; this issue is discussed within each data set example. The clustering results are assessed using appropriate evaluation metrics: Average Silhouette (AS), and, when the ground truth is available, Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI).
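The procedure above can be sketched with scikit-learn. The feature extraction step is assumed to have been done already, so a random two-group matrix stands in for the real NetF feature vectors; all names and parameter choices here are illustrative, not the paper's exact settings:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score, adjusted_rand_score,
                             normalized_mutual_info_score)

rng = np.random.default_rng(1)
# Stand-in feature matrix (rows = series, cols = features) and ground truth.
F = np.vstack([rng.normal(0, 1, (50, 15)), rng.normal(4, 1, (50, 15))])
truth = np.repeat([0, 1], 50)

# 1) Min-Max rescale each feature into [0, 1].
F = (F - F.min(axis=0)) / (F.max(axis=0) - F.min(axis=0))

# 2) Principal components of the rescaled matrix (no z-scoring needed).
pcs = PCA(n_components=2).fit_transform(F)

# 3) k-means on the PC scores, with k fixed in advance.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pcs)

# 4) Evaluation: AS always; ARI/NMI only when ground truth exists.
print("AS :", silhouette_score(pcs, labels))
print("ARI:", adjusted_rand_score(truth, labels))
print("NMI:", normalized_mutual_info_score(truth, labels))
```

Note that ARI and NMI compare the partition against the known labels, while AS is computed from the PC-space geometry alone, which is why it remains available for data sets without ground truth.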

4.2 Data sets and experimental setup

We report the detailed results of the clustering exercise for the eleven data sets summarized in Table 2. A brief description of the data and the clustering results for the remaining data sets are presented in Table 10.

The data sets in Table 2 come from the UEA & UCR Time Series Classification Repository (Bagnall et al.), widely used in classification tasks; the M3 competition data from package Mcomp (Hyndman 2018), used for testing the performance of forecasting algorithms; and the set “18Pairs”, extracted from package TSclust (Montero and Vilar 2014), which comprises pairs of time series from different domains. For all of these the true clusters are known and therefore the clustering assessment measures ARI and NMI may be used. Additionally, we also analyse a set of observations comprising the production, over forty-three years, of nine agricultural products in 108 meso-regions of Brazil (Instituto Brasileiro de Geografia e Estatística). We note that the size of the ElectricDevices data set, 16575 time series, differs from the total available in the repository: exactly 62 time series return missing values for the entropy feature of the tsfeatures set (see Appendix D), so we decided to exclude these series from our analysis.

Table 2 Brief description of the empirical time series data sets

4.3 Results

First, we investigate the performance of NetF, catch22 and tsfeatures in the automatic determination of the number of clusters, k, using the clustering evaluation metrics ARI, NMI and AS. The results (see Table 9) show overall similar values, but we note that NetF more often provides a value of k equal to or closer to the ground truth value (when available). For the Production in Brazil data, for which there is no ground truth, values of k are obtained by averaging 10 repetitions of the clustering procedure and using the silhouette method. The results of the 10 repetitions are represented in Fig. 12 and summarized in Table 4.
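The silhouette method used above can be sketched as follows: fit k-means for a range of candidate k values and keep the k that maximizes the Average Silhouette. The blob data and the candidate range 2–6 are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(2)
# Three well-separated blobs in a 2-D PC space (stand-in for real PC scores).
pcs = np.vstack([rng.normal(c, 0.3, (40, 2)) for c in (0.0, 3.0, 6.0)])

# Silhouette method: pick the candidate k with the highest AS.
as_by_k = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pcs)
    as_by_k[k] = silhouette_score(pcs, labels)
best_k = max(as_by_k, key=as_by_k.get)
print("chosen k:", best_k)
```

In practice the procedure is repeated (here, 10 times in the paper) and the chosen k values are averaged, which smooths out the sensitivity of k-means to its random initialization.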

Next, fixing k to the ground truth, we perform the clustering procedure. The clustering evaluation metrics, averaged over 10 repetitions, are presented in Table 3.

The results indicate that none of the three approaches performs uniformly better than the others. Some interesting comments follow. For the synthetic data sets and 18Pairs, tsfeatures and NetF perform better than catch22 on all evaluation criteria. The clusters for the ECG5000, ElectricDevices and UWaveGestureLibraryAll data sets produced by the three approaches fare equally well when assessed by ARI, NMI and AS. The same is true for the M3 data and Cricket_X data sets, with slightly lower results. The NetF approach seems to produce better clusters for CinC_ECG_torso according to the three criteria, tsfeatures seems to produce better clusters for FordA, and catch22 for FaceAll and InsectWingbeat according to ARI and NMI.

Analyzing the overall results, Tables 3 and 10, we can state that the tsfeatures and NetF approaches present the best ARI and NMI evaluation metrics, while tsfeatures achieves by far the best results in AS. If we consider the UEA & UCR repository classification of the data sets, we note the following: the NetF approach presents good results for time series data of the type Image (BeetleFly, FaceFour, MixedShapesRegularTrain, OSULeaf and Symbols), ECG (CinC_ECG_torso and TwoLeadECG) and Sensor (Wafer); tsfeatures performs best for the types Simulated (BME, UMD and TwoPatterns), ECG (NonInvasiveFetalECGTho), Image (ShapesAll) and Sensor (SonyAIBORobotSurface and Trace); finally, the catch22 approach presents the best results for the Spectro (Coffee), Device (HouseTwenty) and Simulated (ShapeletSim) types. In summary, NetF and tsfeatures perform better on data with similar characteristics, while catch22 seems to be more appropriate to capture other characteristics.

Table 3 Clustering evaluation metrics obtained for the three approaches NetF, tsfeatures and catch22

Regarding the Production in Brazil data set, Table 4 shows more detail on the clustering results, adding the automatically computed number of clusters, k. We note that the 4 clusters obtained with NetF correspond to 4 types of goods: eggs; energy; gasoline and cattle; hypermarkets, textile, furniture, vehicles and food. Attribution plots of the clusters obtained by the three approaches are represented in Fig. 8. Note that both tsfeatures and catch22 put egg and textile production in the same cluster, and that tsfeatures cannot distinguish energy. Notice also how the NetF approach produced the clustering with the highest AS and hence the highest intra-cluster similarity. To illustrate the relevance of the results, Fig. 9 depicts a representative time series for each cluster.

Table 4 Clustering evaluation metrics for the different clustering analysis on Production in Brazil data set based on NetF, tsfeatures and catch22 approaches
Fig. 8

Attribution of the Production in Brazil time series to the different clusters, according to each of the feature approaches. The different productions are represented on the horizontal axis and by a unique color. The time series are represented by colored points according to their production type. The vertical axis represents the cluster number to which a time series is assigned

Fig. 9

Representative Production in Brazil time series for each cluster (indicated in the subtitle) obtained using the proposed approach, NetF

5 Conclusions

In this paper we introduce NetF, a novel set of 15 time series features, and we explore its ability to characterize time series data. Our methodology relies on mapping the time series into complex networks using three different mapping methods: natural and horizontal visibility and quantile graphs (based on transition probabilities). We then extract five topological measures for each mapped network, concatenating them into a single time series feature vector, and we describe in detail how we can do this in practice.
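As an illustration of the mapping step, the following is a minimal, unweighted implementation of the natural visibility criterion: two observations are linked if the straight line between them passes strictly above every intermediate observation. The paper's weighted variants, the horizontal visibility graph and the quantile graph are not reproduced here, and the function name and example series are ours:

```python
import numpy as np
import networkx as nx

def natural_visibility_graph(y):
    """Unweighted natural visibility graph of a time series y (O(n^3) sketch)."""
    n = len(y)
    g = nx.Graph()
    g.add_nodes_from(range(n))
    for a in range(n):
        for b in range(a + 1, n):
            # (a, b) are linked if every intermediate point lies strictly
            # below the straight line joining (a, y[a]) and (b, y[b]).
            if all(y[c] < y[b] + (y[a] - y[b]) * (b - c) / (b - a)
                   for c in range(a + 1, b)):
                g.add_edge(a, b)
    return g

rng = np.random.default_rng(3)
y = np.sin(np.linspace(0, 6 * np.pi, 200)) + 0.1 * rng.normal(size=200)
g = natural_visibility_graph(y)

# Two of the five NetF-style topological measures (selection is illustrative):
k_bar = np.mean([d for _, d in g.degree()])  # average degree
C = nx.average_clustering(g)                 # clustering coefficient
print(f"average degree {k_bar:.2f}, clustering {C:.3f}")
```

Since consecutive observations are always mutually visible, the resulting graph is connected by construction, so path-based measures such as the average path length are always well defined.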

To better understand the potential of our approach, we first perform an empirical evaluation on a synthetic data set of 3300 networks, grouped into 11 different time series models. Analysing the weighted visibility (natural and horizontal) and quantile graph feature space provided by NetF, we were able to identify sets of features that distinguish non-stationary from stationary time series, counting from real-valued time series, periodic from non-periodic time series, state from non-state time series, and heteroskedastic time series. The non-stationary time series have high values of average path length and low values of clustering coefficient in their QGs, and the opposite happens for the stationary time series. Counting series have the lowest values of average weighted degree, the highest number of communities in their QGs and the lowest clustering coefficient in WHVGs, while the opposite happens for non-counting time series. For state time series, the average weighted degree in their weighted VGs is the lowest and the number of communities is high; the opposite happens for the non-state time series. Heteroskedastic time series are identified by high average weighted degree values in their WHVGs, compared to the other DGPs.
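For the quantile graph mapping whose features are discussed above, a hedged sketch is the following: observations are assigned to sample-quantile bins and edge weights are the empirical transition probabilities between consecutive bins. The choice of q = 10 bins and the function name are illustrative assumptions:

```python
import numpy as np

def quantile_graph(y, q=10):
    """Weighted, directed quantile-graph adjacency matrix.

    Node i is the i-th sample-quantile bin; entry W[i, j] is the empirical
    probability of a transition from bin i at time t to bin j at time t+1.
    """
    edges = np.quantile(y, np.linspace(0, 1, q + 1))
    # Assign each observation to a quantile bin (0 .. q-1).
    bins = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, q - 1)
    W = np.zeros((q, q))
    for i, j in zip(bins[:-1], bins[1:]):
        W[i, j] += 1.0
    row = W.sum(axis=1, keepdims=True)
    # Normalize rows to transition probabilities (empty rows stay zero).
    return np.divide(W, row, out=np.zeros_like(W), where=row > 0)

rng = np.random.default_rng(4)
W = quantile_graph(rng.normal(size=500), q=10)
print(W.shape)
```

Because each occupied row is a transition distribution, topological measures on this weighted digraph reflect how the series moves between quantile levels, which is why trending (non-stationary) series show long paths and low clustering in their QGs.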

To further showcase the applicability of NetF, we use its feature set for clustering both the previously mentioned synthetic data and a large set of benchmark empirical time series data sets. The results for the data sets in which ground truth is available indicate that NetF yields the highest mean ARI (0.287) compared to the alternative time series feature sets, namely tsfeatures and catch22, with means of 0.267 and 0.228, respectively. For the NMI metric the results are similar (0.395, 0.397 and 0.358, respectively) and for AS the highest mean was found for tsfeatures, 0.332 versus approximately 0.3 for the others. However, the higher values for AS must be viewed in light of the low values of ARI and NMI, which indicate an imperfect formation of the clusters. For the production data in Brazil, for which no ground truth is available, NetF produces clusters which group production series with different characteristics, namely, time series of counts, series with a marked upward trend, series in the same range of values, and series with a seasonal component.

The results show that NetF is capable of capturing information regarding specific properties of time series data. NetF is also capable of grouping time series from different domains, such as ECG, image and sensor data, as well as identifying different characteristics of the time series using different mapping concepts, which stand out in different topological features. The general characteristics of the data, namely the size of the data set, the length of the time series and the number of clusters, do not seem to influence the results obtained. Also, NetF does not require typical time series preprocessing tasks, such as decomposing, differencing or whitening. Moreover, our methodology is applicable to any time series, regardless of the nature of the data.

The mappings and topological network measures considered are global, but it is important to clarify that they do not constitute a “universal” solution. In particular, we found that the weighted versions of the visibility graph mappings used here produce better results than their unweighted versions, as seen in previous work (Silva 2018). In fact, formulating a set of general features capable of fully characterizing a time series without knowing both the time series properties and the intended analysis is a difficult and challenging task (Kang et al. 2020).

For related future work, we intend to extend NetF with new sets of topological measures, adding local and intermediate-scale features, as well as to explore other mapping methods (such as proximity graphs). We also intend to extend our approach to the multivariate case, since obtaining useful features for multidimensional time series analysis is still an open problem.