Optimization of mixture models on time series networks encoded by visibility graphs: an analysis of the US electricity market

We propose a fully unsupervised network-based methodology for estimating Gaussian Mixture Models on financial time series by maximum likelihood using the Expectation-Maximization algorithm. Visibility graph-structured information of the observed data is used to initialize the algorithm. The proposed methodology is applied to the US wholesale electricity market. We demonstrate that encoding time series through Visibility Graphs allows us to capture well both the behavior of the time series and the nonlinear interactions between observations. The results reveal that the proposed methodology outperforms more established approaches.


Introduction
Models based on complex networks have been extensively introduced in the literature to provide new insights into a wide range of natural and social phenomena (Vespignani 2018; Xie et al. 2021; Laengle et al. 2021). In particular, a plethora of network science-based methods have been published over the past decade to analyze the dynamic behavior of time series with the aim of understanding their fine and granular structure (Newman 2003, 2010). Indeed, converting a time series into a network increases the quality of the analysis, leading to the identification of non-trivial topological characteristics (Yang and Yang 2008; da Fontoura Costa et al. 2005) and providing fresh perspectives and ideas for investigating the probabilistic structure of time series (Zou et al. 2019; Silva et al. 2021). In this regard, network-based methods can be very useful in calibrating realistic models of time series dynamics on market data, such as Gaussian Mixture Models (GMMs) (Mari and Baldassari 2022). The estimation of GMMs can be performed by maximum likelihood using the Expectation-Maximization (EM) algorithm (Dempster et al. 1977). However, the existence of several locally optimal solutions requires the choice of appropriate initialization values of the parameters to obtain accurate results (Hipp and Bauer 2006; Shireman et al. 2017). Clustering methods are typically used in the literature to solve the EM initialization problem. Techniques based on K-means (Steinley and Brusco 2011) and Random algorithms (Biernacki et al. 2003) have recently garnered interest. Both of these approaches require the exogenous setting of the number of mixture components and, due to the large number of local solutions, the estimation technique must be re-initialized several times before a solution can be found. No method can determine how many initializations are needed to fully explore the likelihood function (Shireman et al. 2017).
Recently, a fully unsupervised framework based on mapping the time series into a complex network (or graph) has been proposed in the literature to determine the ideal number of mixture model components and the vector of initial parameters, with the aim of improving the efficiency of the EM algorithm (Mari and Baldassari 2022). By using a Graph Approximation framework, hereinafter GA-framework, the suggested methodology effectively addresses the local-optima issue and provides reliable GMM parameter estimation.
Graph approximation techniques suffer from a loss of information from the observed time series that can affect the goodness of the estimation procedure. To overcome this difficulty, we propose in this paper a novel approach based on a different mapping scheme that can significantly reduce information loss and improve the estimation procedure. It is based on graph embedding techniques (Perozzi et al. 2014), and we will therefore call it the Graph Embedding framework, hereinafter GE-framework. The relevance of the GE-framework over the GA-framework is due to the embedding techniques, which enable us to work directly on the full, unreduced graph. In fact, graph embedding techniques convert the graph into a multidimensional Euclidean space using a data-driven approach and can significantly reduce information loss while maintaining the structural features of the graph (Cai et al. 2021). The time series encoding is performed using Visibility Graphs (Silva et al. 2021). These encoding techniques are based on the traditional visibility algorithms of computational geometry (Ghosh 2007) and allow univariate time series to be mapped into complex networks, thus focusing on the underlying structural features of the time series (Lacasa et al. 2008). We found them to be particularly suitable for identifying relevant features of the observed temporal dynamics. The proper combination of unsupervised techniques used to perform the initialization of the EM algorithm in the GE-framework is the main innovative aspect of the proposed methodology.
The performance of the GA and GE frameworks will be compared through GMM estimation on time series of financial log-returns, which are calculated as the difference in log-prices between two subsequent observations. In fact, log-return time series display better statistical behavior than market price time series, which are generally non-stationary (Voit 2013). In particular, we tested both the GA and GE frameworks in the US wholesale electricity market by analyzing the behavior of daily log-returns observed at Palo Verde (Southwest area), PJM (Northeast region), SP15 (Southern California) and Nepool (New England).
Due to its more conservative pipeline (less information loss in the Graph Compression phase) and its superior data-driven, unsupervised Graph Machine Learning approaches, the GE-framework appears to be a powerful and flexible tool for analyzing the dynamics of time series. Although the GA-framework is able to outperform the K-means and Random-based initialization methods (Mari and Baldassari 2022), in this paper we will demonstrate that the GE-framework produces even better results. In fact, we will show that the best-fitting model in the GE-framework is several times (from 2.14 for SP15 to 250.89 for PJM) more likely to be the best model than the best-fitting model in the GA-framework, and that, compared with the K-means approach, the results are far better in each market under consideration.
The rest of the present paper is structured as follows. The GE-framework is detailed in Sect. 2. Section 3 focuses on the empirical analysis. Some concluding remarks are provided in Sect. 4. The code needed to reproduce our results can be accessed on GitHub at https://bit.ly/3Hz1Dky.

The GE-framework
This section illustrates the Graph Machine Learning approach proposed in this paper to estimate GMMs by maximum likelihood, outlining the methods involved in the GE-framework in detail and highlighting the novelty of our study.

An overview of the methodology
Figure 1 shows a unified diagram of the entire methodology.
Three sections can be identified in the workflow: Input, Distribution Estimation and Output. The input consists of a preprocessed time series of log-prices. Very often, in fact, time series feature irregular sampling (lack of daily data points) as well as trend and seasonality. For these reasons, preprocessing, which consists of a gap-filling procedure and a seasonal-trend removal, is carried out first. Then, the preprocessed log-price time series enters the initialization blocks (represented in red in Fig. 1). Through a three-step sequence, namely Graph Encoding, Graph Compression and Graph Partitioning, the number of GMM components and the vector of initialization parameters are determined in a completely unsupervised manner. First, the graph associated with the preprocessed log-price time series is created in the graph encoding step; then, the complexity of the graph is appropriately reduced by compressing the graph in the graph compression step; finally, the log-price communities and their associated log-return communities are discovered in the graph partitioning step. The EM algorithm can thus be initialized in a completely unsupervised way. In fact, the number of GMM components is set equal to the number of discovered communities, and the initializing parameters of the EM algorithm can be computed by log-return community membership (Mari and Baldassari 2022). The GMM parameters are estimated in the output section.
The main innovative aspect of the proposed approach consists in the proper combination of unsupervised techniques used to perform the initialization of the EM algorithm. First, the graph encoding step is performed using Visibility Graphs (Silva et al. 2021). Second, the graph compression is performed by using graph embedding techniques belonging to the four classes of the Karate Club framework (Rozemberczki et al. 2020). We used 12 different embedding techniques to identify the one that best fits each market. Third, the community detection task is performed in the Graph Partitioning step directly on the network embedding, using a clustering approach based on Topological Data Analysis (TDA) (Skrlj et al. 2020). Embedding techniques and TDA methods improve the accuracy and efficiency of feature extraction (Cai et al. 2021).

Visibility graphs
Visibility graph mappings have been proposed in the literature as transformations from time series to complex networks based on the conventional visibility techniques of computational geometry. In the empirical analysis, we will use Natural Visibility algorithms (Lacasa et al. 2008) and Horizontal Visibility algorithms (Luque et al. 2009). These algorithms are easy to use, computationally fast and parameter-free.
The concept behind the Natural Visibility Graph (NVG) is that each observation in the time series is associated with a node in the graph and represented as a vertical bar with a height equal to the numerical value of the observation. The nodes in the graph are serially ordered, since each one corresponds to a time stamp t of the time series. Two nodes are connected if a line of visibility exists between the corresponding data bars that is not interrupted by the bar of any node between the two. Stated mathematically, two nodes, v_i and v_j, are connected (have visibility) if any additional observation x_k with i < k < j satisfies the following NVG criterion,

x_k < x_j + (x_i - x_j) (j - k) / (j - i).   (1)

The NVG algorithm is simple to use and has a quadratic computational complexity O(n^2), making it rather slow for very long time series. However, a more efficient algorithm based on the divide-and-conquer technique (Lan et al. 2015) can be used and, in this case, the computational complexity is reduced to O(n log n). Since visibility graphs are always undirected, and each node v_i can see at least its two neighbors, namely v_{i-1} and v_{i+1}, they are always connected. NVGs take on many of the structural characteristics of time series and, for this reason, seem particularly suited to our study.
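As an illustration, the visibility criterion above can be implemented directly as a naive O(n^2) sketch (the function name and toy series are ours, not taken from the paper's code):

```python
def natural_visibility_graph(x):
    """Naive O(n^2) NVG construction: nodes i and j are linked when every
    intermediate bar x_k (i < k < j) stays strictly below the line of
    sight, i.e., x_k < x_j + (x_i - x_j) * (j - k) / (j - i)."""
    n = len(x)
    edges = set()
    for i in range(n - 1):
        for j in range(i + 1, n):
            if all(x[k] < x[j] + (x[i] - x[j]) * (j - k) / (j - i)
                   for k in range(i + 1, j)):
                edges.add((i, j))
    return edges

series = [3.0, 1.0, 2.0, 4.0]
print(sorted(natural_visibility_graph(series)))  # all 6 pairs are mutually visible here
```

Note that consecutive nodes are always linked, which is why the resulting graph is always connected.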
A simpler technique, known as the Horizontal Visibility Graph (HVG), has been presented in the literature to reduce the computational complexity associated with NVGs. The difference between the construction of NVGs and HVGs is that in the latter case the visibility lines are only horizontal. Stated mathematically, two nodes v_i and v_j are connected if the following condition is met,

x_k < min(x_i, x_j),   (2)

for all x_k such that i < k < j. In terms of computational complexity, the building of HVGs has a computational complexity of O(n log n) (Yela et al. 2020). For a specific time series, the HVG is always a subgraph of the NVG, but the converse is not true. As a result, some quantitative information is lost in HVGs when compared to NVGs.
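The horizontal criterion admits an equally compact sketch (again our own illustration); on the same toy series it yields a strict subset of the NVG edges:

```python
def horizontal_visibility_graph(x):
    """HVG construction: nodes i and j are linked when every intermediate
    value lies strictly below both endpoints, x_k < min(x_i, x_j)."""
    n = len(x)
    edges = set()
    for i in range(n - 1):
        for j in range(i + 1, n):
            if all(x[k] < min(x[i], x[j]) for k in range(i + 1, j)):
                edges.add((i, j))
    return edges

series = [3.0, 1.0, 2.0, 4.0]
print(sorted(horizontal_visibility_graph(series)))
```

Here the pair (1, 3) is not linked, since the intermediate value 2.0 exceeds min(1.0, 4.0), whereas it is visible in the NVG sense.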

Graph embedding
Network embedding methods have made significant contributions to the use of Machine Learning (ML) in network science (Cui et al. 2018). Based on unsupervised feature extraction techniques for graph data, these methods yield information that can be used as input for link prediction, node and graph classification, and community detection. Therefore, they seem to be particularly suitable for our purposes. In particular, node embeddings map graph vertices to vectors in a Euclidean space, and the use of the Euclidean format, instead of the native graph, facilitates the use of conventional ML tools. In this paper we will adopt graph embedding methods belonging to the four classes of the Karate Club framework (Rozemberczki et al. 2020). To ensure generality and completeness, we will use one method chosen from each class: (i) Diff2Vec as a neighborhood preserving embedding; (ii) GraphWave as a structural embedding; (iii) Attributed Social Network Embedding (ASNE) as an attributed embedding; (iv) Network Embedding Update (NEU) as a meta-embedding method. While neighborhood preserving embeddings preserve the closeness of graph nodes, structural embeddings preserve the structural roles of nodes in the embedding space, and attributed embeddings maintain the neighborhood, the structure and the generic attribute similarity of nodes. Meta-embeddings are designed to produce embeddings with a higher representation quality.
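None of the four Karate Club methods is reproduced here, but the core idea of node embedding, mapping vertices to Euclidean vectors so that structurally related nodes land close together, can be sketched with a minimal Laplacian-eigenmap construction (illustrative only; the toy graph and function are our own, not part of the proposed pipeline):

```python
import numpy as np

def laplacian_embedding(adj, dim=2):
    """Minimal Laplacian-eigenmap-style node embedding: nodes are mapped
    to the eigenvectors of the graph Laplacian with the smallest nonzero
    eigenvalues, so adjacent nodes land close together in R^dim."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj
    vals, vecs = np.linalg.eigh(lap)   # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]          # skip the constant eigenvector

# toy graph: two triangles joined by a single bridge edge
adj = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
emb = laplacian_embedding(adj, dim=2)
print(emb.shape)  # (6, 2)
```

The first coordinate (the Fiedler vector) separates the two triangles by sign, which is exactly the kind of community structure the clustering step later exploits.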

ToMATo clustering
Since the graph embedding step generates a high-dimensional Euclidean representation of the graph, we chose an approach based on TDA, an emerging field of research whose goal is to provide mathematical and algorithmic tools for understanding the topological and geometric structure of data, and which is particularly suitable for high-dimensional data (Zomorodian and Carlsson 2005). The community discovery process is performed directly on the graph embedding using the Topological Mode Analysis Tool (ToMATo) (Skrlj et al. 2020), an unsupervised TDA tool that allows us to identify clusters that are stable under small perturbations of the input (Chazal et al. 2013).
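The flavor of ToMATo can be conveyed with a simplified sketch (our own illustration, not the tomaster implementation): estimate a kNN density on the point cloud, hill-climb each point to its densest neighbor, and merge modes whose density gap at the meeting point falls below a persistence threshold:

```python
import numpy as np

def tomato_sketch(points, k=5, tau=1.0):
    """Simplified ToMATo-style clustering: kNN density estimate,
    density-ordered hill climbing, and persistence-based mode merging."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    knn = np.argsort(dist, axis=1)[:, 1:k + 1]            # k nearest neighbors
    density = 1.0 / (dist[np.arange(n)[:, None], knn].mean(axis=1) + 1e-12)
    parent = np.arange(n)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in np.argsort(-density):                        # densest first
        higher = [j for j in knn[i] if density[j] > density[i]]
        if not higher:
            continue                                      # i starts a new mode
        roots = {find(j) for j in higher}
        target = max(roots, key=lambda r: density[r])
        parent[i] = target
        for r in roots:                                   # persistence merge
            if r != target and density[r] - density[i] < tau:
                parent[r] = target
    return np.array([find(i) for i in range(n)])

rng = np.random.default_rng(0)
cloud = np.vstack([rng.normal(0.0, 0.1, (30, 2)),
                   rng.normal(3.0, 0.1, (30, 2))])
labels = tomato_sketch(cloud, k=5, tau=1e9)   # large tau merges all shallow modes
print(len(set(labels[:30]) & set(labels[30:])))  # 0: the two blobs never mix
```

Since hill climbing only follows kNN links, well-separated groups can never be merged, which is the stability property exploited in the Graph Partitioning step.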

The GMM estimation method
Consider a possibly preprocessed log-price time series {x_t}, t = 1, ..., N, and its log-return transform {r_t}, t = 1, ..., N-1, where

r_t = x_{t+1} - x_t.   (3)

Let us suppose that the values r_i, i = 1, 2, ..., N-1, are extracted from a sequence of independent and identically distributed (iid) random variables with a probability density p(x). We assume that p(x) is a finite mixture distribution with C components, written in complete-data form as

p(r_i, z_i | Θ) = ∏_{c=1}^{C} [π_c p(r_i | θ_c)]^{z_ic}.   (4)

In Eq. (4), z_i = {z_i1, z_i2, ..., z_iC}, with i = 1, 2, ..., N-1, is a vector of C unobservable latent binary random variables that are mutually exclusive and exhaustive, i.e., one and only one of the z_ic is equal to 1, while the others are equal to 0. The vector z_i plays the role of an indicator random variable representing the identity of the mixture component responsible for the generation of the outcome r_i; p(x | θ_c) denotes the density distribution of mixture component c with parameters θ_c; finally, π_1, π_2, ..., π_C are the mixture weights, i.e., positive numbers such that

∑_{c=1}^{C} π_c = 1,   (5)

with π_c representing the probability that the value r_i was generated by component c. The complete set of mixture model parameters is denoted by Θ = {π_1, ..., π_C; θ_1, ..., θ_C}. In the case of univariate Gaussian mixture models, the components of the model are described by univariate Gaussian densities,

p(x | θ_c) = (1 / sqrt(2π σ_c^2)) exp(-(x - μ_c)^2 / (2σ_c^2)),   (6)

with parameters θ_c = {μ_c, σ_c^2} denoting, respectively, the mean and the variance of the single component distribution.
Mixture models can be efficiently estimated by maximum likelihood using the EM algorithm (Dempster et al. 1977). EM is an iterative procedure that begins with an initial estimate of the parameters, Θ = Θ_0, and then iteratively updates Θ until convergence is achieved (Smyth 2017).
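A minimal numpy sketch of these EM updates for a univariate GMM (our own illustration, not the paper's code) may look as follows:

```python
import numpy as np

def em_gmm_1d(r, pi, mu, var, n_iter=200, tol=1e-8):
    """EM for a univariate GMM starting from a given initialization
    (pi, mu, var); returns the updated parameters and log-likelihood."""
    r = np.asarray(r, float)
    loglik_old = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities w_ic = P(z_ic = 1 | r_i, theta)
        dens = (np.exp(-0.5 * (r[:, None] - mu) ** 2 / var)
                / np.sqrt(2 * np.pi * var)) * pi
        tot = dens.sum(axis=1, keepdims=True)
        w = dens / tot
        loglik = np.log(tot).sum()
        # M-step: weighted mixture weights, means and variances
        n_c = w.sum(axis=0)
        pi = n_c / len(r)
        mu = (w * r[:, None]).sum(axis=0) / n_c
        var = (w * (r[:, None] - mu) ** 2).sum(axis=0) / n_c
        if loglik - loglik_old < tol:
            break
        loglik_old = loglik
    return pi, mu, var, loglik

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])
pi, mu, var, ll = em_gmm_1d(data, np.array([0.5, 0.5]),
                            np.array([-1.0, 1.0]), np.array([1.0, 1.0]))
print(np.sort(mu))  # close to the true component means -2 and 2
```

The quality of the starting point (pi, mu, var) is exactly what the initialization blocks of the proposed pipeline are designed to provide.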
The initial parameter vector, Θ_0, and the number of mixture components, C, are determined by performing the various processes involved in the three initialization blocks depicted in Fig. 1. The specifications of the methods used in the three steps, namely Graph Encoding, Graph Compression and Graph Partitioning, are depicted in Fig. 2 for both the GA and GE frameworks.
In the GA-framework (Mari and Baldassari 2022), the preprocessed log-price time series is encoded into a Quantile Graph. The graph encoding is carried out through the application of a Markov Transition Field (MTF) used as the adjacency matrix (Campanharo et al. 2011). Then, the graph undergoes a compression phase through a coarsening process. The purpose of graph coarsening is to find a smaller graph that well approximates the original one. In our case, the coarsening is performed by reducing the size of the MTF matrix through averaging over its elements. The log-return community detection in the graph partitioning task is then performed by applying the Louvain method to the coarsened graph (Blondel et al. 2008). A comparative analysis of community detection algorithms reveals that, taking both accuracy and computational time into account, the Louvain algorithm outperforms other viable alternatives (Lancichinetti and Fortunato 2009). In general, communities can be classified as overlapping or non-overlapping, depending on whether a node can belong to several communities or only one. We adopted the latter option because the Louvain method for community detection is a non-overlapping method, which means that each node in the network can only belong to one community. In the Louvain algorithm, nodes are iteratively assigned to communities based on a modularity measure that indicates the quality of the community structure in a network. The algorithm aims to maximize the modularity by moving nodes between communities until no further improvement can be achieved.
We recall that in the GE-framework the preprocessed log-price time series is encoded into a visibility graph. Then, the graph undergoes a compression phase through an embedding process. Graph embedding is an unsupervised Graph Machine Learning technique that computes a Euclidean representation vector for each node in a graph. Community detection is performed through clustering techniques based on TDA applied to the embedded multi-dimensional Euclidean space. For comparative purposes, the non-overlapping communities option was also used in the GE-framework.
In both frameworks, the partitioning step serves as a feature extraction procedure for the distribution estimation, automatically generating the number of mixture components, C, and the initialization parameter vector, Θ_0. In fact, since each log-return value, r_i = x_{i+1} - x_i, can be uniquely associated with the value x_i, the detected communities can also be viewed as log-return communities. The number of GMM components, C, is set equal to the number of detected communities, and the initialization parameters contained in the vector Θ_0 are computed through community membership by using the following set of equations,

π_c = N_c / (N - 1),   (7)

μ_c = (1 / N_c) ∑_{i=1}^{N-1} w_ic r_i,   (8)

σ_c^2 = (1 / N_c) ∑_{i=1}^{N-1} w_ic (r_i - μ_c)^2,   (9)

where w_ic = 1 if the observation r_i belongs to cluster c, w_ic = 0 otherwise, and N_c = ∑_{i=1}^{N-1} w_ic represents the effective number of data points assigned to component c. We notice that both the mean and the variance are computed similarly to the standard empirical average and variance, but with weights w_ic.
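Under a hard community assignment (w_ic in {0, 1}), these initialization equations reduce to per-community sample statistics; a minimal sketch (function name and toy data are ours):

```python
import numpy as np

def init_from_communities(r, labels):
    """EM starting point from community membership: for each detected
    community c, pi_c is its share of the log-returns, and mu_c, var_c
    are the within-community mean and variance (hard weights w_ic)."""
    r = np.asarray(r, float)
    comms = np.unique(labels)
    pi = np.empty(len(comms))
    mu = np.empty(len(comms))
    var = np.empty(len(comms))
    for c, comm in enumerate(comms):
        members = r[labels == comm]
        pi[c] = len(members) / len(r)       # N_c / (N - 1)
        mu[c] = members.mean()
        var[c] = ((members - mu[c]) ** 2).mean()
    return pi, mu, var

r = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
labels = np.array([0, 0, 0, 1, 1, 1])
pi, mu, var = init_from_communities(r, labels)
print(pi.tolist(), mu.tolist())  # [0.5, 0.5] [2.0, 11.0]
```

The number of components C is simply the number of distinct labels, so no parameter of the mixture has to be set by hand.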
Once initialized, the EM algorithm first computes the parameter values that maximize the log-likelihood and determines the new membership weights accordingly, thus refining the initial communities. The iterative procedure continues until convergence is achieved.

The experiment
In this section, we perform an experiment to evaluate the performance of the GA and GE frameworks on US wholesale electricity prices. Our data collection consists of daily prices observed during the time interval from January 1, 2017 to December 31, 2021. The time series data can be freely downloaded from www.eia.gov/electricity/wholesale. The observed time series are illustrated in Fig. 3.
Volatility and extreme unpredictability characterize market price movements, which are frequently accompanied by sharp spikes and jumps generated by changes in the supply-demand balance. All these time series show irregular sampling (as a result of weekends, holidays and other missing data due to specific market reasons) as well as trend and seasonality. Preprocessing of the data was conducted first to fill in gaps in the time series and remove any trend and seasonality over time.

Data preprocessing
Let us denote by {y_t^ob} the time series of daily prices defined as a function of the incomplete observed raw time base {t^ob}, and by {x_t^ob} its natural logarithm transformation, i.e., x_t^ob = log y_t^ob. The time series {x_t^ob} is sampled irregularly (not evenly throughout the time interval), since weekends, holidays, and other sporadic missing days are not included. The imputation of missing data (gap filling) can convey important knowledge (Owen 2007), thus counteracting the phenomenon known as informative missingness. In the presence of missing data, different time periods may have different information content. In fact, even when the market is closed, new information that can influence price dynamics may emerge (French 1980; Mantegna and Stanley 1999).
The first step, therefore, involved the gap filling of the time series. To do this, we employed a complete daily grid {t} and an imputation technique called missForest (Stekhoven and Bühlmann 2011), a ML algorithm for data imputation that is completely agnostic about the data distribution. MissForest is used to compute a value for each missing point. In this way, following the same procedure proposed in (Mari and Baldassari 2021), we extended the raw time series {x_t^ob} to a complete daily time base {t},

{x_t^f} = F({x_t^ob}),   (10)

where F is the application that transforms {t^ob} into {t} and computes the missing values using the missForest algorithm to fill the daily-complete grid, mapping {x_t^ob} to {x_t^f}. In the second step, we looked for temporal trend and seasonality, hereinafter trend, which must be eliminated in order to reveal the stochastic process driving market dynamics. The Locally Weighted Estimated Scatterplot Smoothing (LOWESS) technique was used to perform this task (Cleveland 1979; Cleveland and Devlin 1988). LOWESS is a versatile method for removing the trend by fitting basic polynomial models to restricted portions of the data. The key benefit of LOWESS over other methods is that it does not need the specification of a global function or the assumption that the data must conform to a certain distribution shape (Dagum and Bianconcini 2016). Figure 4 shows the detrended log-price time series, {x_t}, and the trend, {x_t^tr}, which is superimposed for each market under investigation.
Once detected, the trend can be eliminated from the filled time series, thus obtaining

x_t = x_t^f - x_t^tr,   (11)

where we assumed, without loss of generality, that the complete time grid {t} is represented by the first N natural numbers, {1, 2, ..., N}. Then, log-returns are computed as the difference between two successive daily log-prices, i.e., r_t = x_{t+1} - x_t. Figure 5 depicts the log-return time series, {r_t}, for each market under investigation. Table 1 reports the descriptive statistics of log-returns. In all four investigated markets, empirical log-return distributions exhibit heavy tails, as indicated by the high values of the kurtosis. The occurrence of jumps and spikes in electricity prices significantly increases the value of the fourth central moment of the empirical log-return distributions. From this perspective, it is crucial for a given probabilistic model to capture the first four central moments of the empirical distribution of log-returns, especially in financial applications, to account for extreme events (Geman 2005; Geman and Roncoroni 2006).
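The two preprocessing steps can be sketched end to end in a few lines; here linear interpolation stands in for missForest and a centered moving average stands in for the LOWESS trend, purely as illustrative substitutes:

```python
import numpy as np

def preprocess(t_obs, x_obs, n_days, window=3):
    """Preprocessing sketch on a complete daily grid 0..n_days-1.
    Linear interpolation is a simple stand-in for missForest gap filling,
    and a centered moving average stands in for the LOWESS trend."""
    grid = np.arange(n_days)
    x_filled = np.interp(grid, t_obs, x_obs)      # gap filling
    pad = window // 2
    padded = np.pad(x_filled, pad, mode="edge")
    trend = np.convolve(padded, np.ones(window) / window, mode="valid")
    return x_filled - trend, trend                # detrended series, trend

# toy log-price series with missing days 2, 5 and 8
t_obs = np.array([0, 1, 3, 4, 6, 7, 9])
x_obs = np.log(np.array([30.0, 32.0, 35.0, 34.0, 33.0, 36.0, 38.0]))
x_detrended, trend = preprocess(t_obs, x_obs, n_days=10)
print(x_detrended.shape)  # (10,)
```

By construction the detrended series plus the trend recovers the gap-filled series exactly, mirroring Eq. (11).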

Networks
In this section, we present visualizations of the time series networks created in this analysis. It is very interesting to compare the two types of networks, NVGs and HVGs, described in Sect. 2. In particular, we use two distinct kinds of representations, namely the geometric representation and the Kamada-Kawai representation (Kamada and Kawai 1989), for both the NVG and HVG methods. In this regard, we note that graphs have an infinite number of possible representations, but the informative value of a graph depends on the ease with which it can be read (Di Battista et al. 1994). Figures 6 and 7 show NVGs and HVGs, respectively, in the geometric representation for the time series observed in the markets under investigation. In this geometric representation, the nodes are depicted at the same positions they occupy in the time series, and links are drawn according to the corresponding visibility criterion. Figures 8 and 9 illustrate NVGs and HVGs, respectively, in the Kamada-Kawai representation. The Kamada-Kawai representation provides a force-directed graph layout algorithm that positions the nodes of a graph in two-dimensional space, based on an optimization criterion that minimizes the total energy of the system (Kamada and Kawai 1989). It offers an effective and intuitive way to visualize complex graphs, and can help in the interpretation and understanding of the graph structure and the relationships between nodes. The method takes into account the relative distances between nodes and the edges that connect them, resulting in intuitive and visually appealing graph layouts that better highlight the initial communities that are identified.

Distribution estimation
In this section we discuss the EM-based maximum likelihood estimation of GMMs on market log-return time series. The initialization of the EM algorithm will be performed using both the GA and GE frameworks. Figure 10 shows a magnified version of the Distribution Estimation section and summarizes the calibration procedure. As discussed in the previous section, the initialization blocks produce both the number of model components, C, and the initial parameter set, Θ_0. Indeed, the number of components is set equal to the number of detected communities, and the initialization parameters are computed through community membership according to Eqs. (7)-(9). Then, the EM algorithm is used to estimate the GMM parameter set, Θ.
In the GA-framework, to generate the MTF adjacency matrix we need to quantize the time series in order to define the dynamic states. In our analysis, both Quantile binning and Normal binning will be used (Mari and Baldassari 2022). We perform the model estimation for each couple of values (Q, S) defined on a suitable grid, where Q is the number of bins, i.e., the dynamic states, and S is the dimension of the coarsened MTF adjacency matrix. The number of bins, Q, is made to vary from 2 to 100 in increments of 2 for both the Quantile and Normal binning strategies; the parameter S is made to vary from 5 to 400 in increments of 5. In the GE-framework, the parameters of the various embedding methods employed in this study are the default parameters of the Karate Club framework. Moreover, the ToMATo clustering technique needs three parameters to be set, namely the neighborhood information of the point cloud, the density estimator, and the merging parameter. We used the ToMATo cluster function contained in the Python library tomaster, which needs only the neighborhood information as input and automatically calculates the other two parameters. Since ToMATo relies heavily on neighborhood information, a popular choice to model a neighborhood is the K-nearest neighbors algorithm. As well documented in the literature, this ML algorithm performs well, recovering the correct clusters under an appropriate choice of the parameter K (Chazal et al. 2013). For the GE-framework, the model estimation is therefore performed by exploring a suitable one-dimensional grid for the parameter K. In this grid, the parameter K varies from 2 to 100 in increments of 1. During the exploration of the grid, the embedding method always operates on the complete graph without using an approximate form, as opposed to the coarsening method, which makes use of a reduced adjacency matrix of size S. This is one of the most important differences between the GA and GE frameworks. Table 2 reports the dimension of the Euclidean space for each embedding method. This parameter was chosen based on the default value provided by the library. Moreover, Table 2 indicates the two embedding methods, namely ASNE and NEU_ASNE, that use the log-return time series as an additional attribute. As we will see in the next section, this feature can greatly improve the accuracy of the estimation procedure.

Results
This section outlines and discusses the empirical findings of the experiment. More specifically, we will compare the outcomes of six embedding methods for the GE-framework (see Table 2) and one coarsening method for the GA-framework. Since the GA-framework includes two different binning strategies (Quantile and Normal), and the GE-framework likewise includes two different encoding strategies (Natural and Horizontal), the comparison will involve 14 outcomes corresponding to 14 different initialization techniques. For comparative purposes, the estimation result obtained through a more traditional K-means-based initialization technique is also considered.
There exist information criteria for assessing the relative quality of statistical models, such as the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC) (Kuha 2004). Both the BIC and AIC criteria attempt to solve the model selection problem by introducing a penalty term for the number of parameters to account for the trade-off between the quality of the fit and the simplicity of the model. The penalty discourages overfitting, because increasing the number of parameters in the model always improves the quality of the fit. AIC and BIC are expressed in terms of the log-likelihood in the following way,

AIC = 2κ - 2 log L,

BIC = κ log(N) - 2 log L,

where L is the likelihood, κ is the number of free parameters and N is the sample size. According to Eqs. (4)-(6), the number of free parameters is given by κ = 3C - 1. Lower AIC and BIC values indicate preferable statistical models. In most cases, the penalty term is larger in the BIC than in the AIC criterion, because the penalty coefficient is 2 for AIC and log(N) for BIC. Therefore, BIC generally results in a smaller choice for C than AIC (Steele and Raftery 2010). For these reasons, we will apply the BIC criterion.
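Both criteria are one-liners; for a three-component univariate GMM, κ = 3·3 - 1 = 8 (the log-likelihood and sample size below are made up for illustration):

```python
import numpy as np

def aic_bic(loglik, n_components, n_obs):
    """AIC and BIC for a univariate GMM with C components and
    kappa = 3C - 1 free parameters (C means, C variances, C - 1 weights)."""
    kappa = 3 * n_components - 1
    return 2 * kappa - 2 * loglik, kappa * np.log(n_obs) - 2 * loglik

aic, bic = aic_bic(loglik=-1500.0, n_components=3, n_obs=1825)
print(round(aic, 1), round(bic, 1))  # 3016.0 3060.1
```

With five years of daily data (N on the order of 1825), log(N) ≈ 7.5 > 2, so BIC penalizes extra components more heavily than AIC, as stated above.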
As a first result, we emphasize that in all four markets considered, the most suitable model according to the BIC criterion is a three-component model in both the GA and GE frameworks. Tables 3, 4, 5, and 6 illustrate this result. Models are ordered by ascending BIC value, so that the first row of every table reports the best model according to the BIC criterion (the model with the minimum BIC value). For each model, the kurtosis absolute percentage error, ε_4, is also reported in order to measure the model's ability to capture the heavy-tail phenomenon as well. It is defined by

ε_4 = |κ_m - κ_o| / κ_o,

where κ_m denotes the model kurtosis and κ_o the empirical kurtosis. We note that in GMMs κ_m can be computed exactly. In addition, each table shows the strategy adopted, which may be quantile (q) or normal (n) binning for the GA-framework and natural (N) or horizontal (H) visibility for the GE-framework, and the best-fit model, along with the corresponding grid parameters.
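Since all moments of a Gaussian mixture are available in closed form, κ_m can indeed be computed exactly from the fitted parameters; a sketch (the helper function is our own):

```python
import numpy as np

def gmm_kurtosis(pi, mu, var):
    """Exact kurtosis of a univariate Gaussian mixture: with d_c = mu_c - m
    (m the mixture mean), the central moments are
    m2 = sum_c pi_c (d_c^2 + var_c) and
    m4 = sum_c pi_c (d_c^4 + 6 d_c^2 var_c + 3 var_c^2)."""
    m = np.sum(pi * mu)
    d = mu - m
    m2 = np.sum(pi * (d ** 2 + var))
    m4 = np.sum(pi * (d ** 4 + 6 * d ** 2 * var + 3 * var ** 2))
    return m4 / m2 ** 2

# a single Gaussian has kurtosis 3 ...
print(gmm_kurtosis(np.array([1.0]), np.array([0.0]), np.array([2.0])))  # 3.0
# ... while a zero-mean scale mixture is leptokurtic (heavy-tailed)
print(gmm_kurtosis(np.array([0.9, 0.1]), np.array([0.0, 0.0]),
                   np.array([0.5, 4.0])) > 3.0)  # True
```

The second example shows why a GMM can match the heavy tails of electricity log-returns: mixing variances inflates the fourth moment above the Gaussian benchmark of 3.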
The GE-framework outperforms the GA-framework in all the markets under investigation. Lower BIC values indicate better learning features. In each market, there is always an embedding method that produces better results than the GA-framework. In particular, we note that the ASNE and Diff2Vec methods and their meta-embeddings represent the best choice in all four power markets. The Diff2Vec method is the best choice in the Palo Verde and SP15 markets. The ASNE approach, which makes use of the extra information carried by the log-return time series as an additional attribute, is the best choice in the Nepool market. Finally, we remark that the NEU approach performs very well, matching several rows in Table 8. In particular, its combination with the ASNE embedding method represents the best choice in the PJM market. Due to less information loss in the Graph Compression phase, superior data-driven and unsupervised Graph Machine Learning approaches and reduced computational complexity, the GE-framework seems to be a flexible and powerful tool for understanding the complex, highly nonlinear behavior of electricity market prices. Moreover, it requires less CPU time. In fact, the wall-clock time on the four markets amounted to about 120 h for the GA-framework and about 30 min for the GE-framework. This difference is primarily attributed to the optimization methods used in each framework. The GA-framework involves a two-parameter grid optimization process, whereas the GE-framework employs a simpler one-parameter optimization process. Additionally, the Louvain algorithm in the GA-framework requires a longer processing time compared to the GE-framework, which is designed to be more streamlined (a VM from Colab Pro was used, with a 2-core Intel(R) Xeon(R) CPU at 2.20 GHz and 13 GB of RAM). However, it is important to note that the GA-framework ranks in the top 5 of the 14 methods discussed in this paper in all four markets, despite using only one compression technique (graph coarsening).
For comparison purposes, the estimation results obtained using a more traditional K-means-based initialization technique are also shown in Tables 3-6. As expected, our methodology outperforms this more conventional approach: for each market, the BIC value obtained through this initialization technique is greater than the minimum BIC value obtained in both the GA and GE frameworks. The BIC weights obtained by comparing the best performing models of the GE-framework, the GA-framework and the K-means approach are shown in Table 7. The BIC weights are computed according to

w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), with Δ_i = BIC_i − BIC_min,

where BIC_min is the minimum BIC value (Wagenmakers and Farrell 2004). The weight w_i can be interpreted as the probability that the i-th model is the best model in terms of Kullback-Leibler discrepancy (Anderson and Burnham 2004). The weight ratio w_i/w_j is calculated between the two models with the highest probability. From Table 7 it can be inferred that the best-fitting model in the GE-framework is several times (from 2.14 for SP15 to 250.89 for PJM) more likely to be the best model than the best-fitting model in the GA-framework. The weight ratios between the best GE-framework model and the best K-means model are much higher for each market under consideration. The same line of reasoning shows that the GA-framework also far outperforms the K-means approach. Finally, Table 8 shows the models of both the GA and GE frameworks with three Gaussian components that minimize the value of the kurtosis absolute percentage error, ε_4.
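The BIC-weight computation is straightforward to reproduce. A minimal sketch follows; the BIC values used here are hypothetical placeholders, not values from the tables:

```python
import numpy as np

def bic_weights(bic_values):
    """BIC weights: w_i = exp(-delta_i/2) / sum_j exp(-delta_j/2),
    with delta_i = BIC_i - min(BIC) (Wagenmakers and Farrell 2004)."""
    bic = np.asarray(bic_values, dtype=float)
    delta = bic - bic.min()          # subtracting the minimum avoids underflow
    w = np.exp(-delta / 2.0)
    return w / w.sum()

# Hypothetical BIC values for three competing models (e.g. GE, GA, K-means).
bics = [1000.0, 1004.0, 1012.0]
w = bic_weights(bics)
print(w)                 # the minimum-BIC model receives the largest weight
print(w[0] / w[1])       # weight ratio between the two most probable models
```

Note that the weight ratio between the two best models depends only on their BIC difference: here Δ = 4, so the ratio is exp(4/2) = e² ≈ 7.39, regardless of the third model.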
Overall, the methodology proposed for both the GE and GA frameworks, in addition to showing a high degree of learning as highlighted by the BIC values, allows for a remarkable replication of the first four moments of the empirical distribution.

Concluding remarks
Based on Graph Machine Learning techniques, we provided a new methodology for estimating Gaussian mixture models on financial time series by maximum likelihood using the EM algorithm. We conducted an experiment on US electricity market price time series by employing two different schemes of analysis, namely the GA-framework and the GE-framework. The graph encoding is performed by using quantile graphs with a Markov Transition Field as the adjacency matrix in the GA-framework, and visibility graphs in the GE-framework. The electricity market price structure is a good example of a system that defies straightforward modeling. To take advantage of the graph-structured information contained in electricity market data, we used complex network techniques that link time series and graphs. In the literature, the GA-framework has been shown to outperform more traditional initialization approaches such as K-means and random techniques (Mari and Baldassari 2022). In this paper, we demonstrated that the GE-framework provides even better results. In fact, we showed that the best-fitting model in the GE-framework is several times (from 2.14 for SP15 to 250.89 for PJM) more likely to be the best model than the best-fitting model in the GA-framework. Compared with the GA-framework, the GE-framework, which uses data-driven and unsupervised graph embedding methods along with TDA-based clustering, takes a more conservative approach and therefore loses less information in the process. Moreover, it requires less CPU time.
The proposed approach allows for a high degree of generalization and flexibility and could enable the creation of a super framework adaptable, in principle, to any empirical situation. In fact, referring to Fig. 1, the super framework could accommodate the methods used in this study and admit new ones. The ability of its functional blocks to integrate a wide range of methods and techniques makes it well suited to a broad spectrum of empirical analyses. Within this super framework, both the GA-framework and the GE-framework can be considered as two different configurations of a very flexible overall scheme. This topic will benefit from future investigation. In particular, our future research will focus on two main issues: (i) combining embedding and coarsening procedures in the graph compression block and testing the accuracy of the results; (ii) using the initialization blocks as a self-supervised learning framework to discover communities and assign them to different dynamics.

Fig. 3 Time series observed in the time interval from January 1, 2017 to December 31, 2021. Data are expressed in nominal dollars per megawatt-hour

Fig. 10 The distribution estimation section

Table 8 Three-component models with minimum ε_4