Two different flavours of complexity in financial data

We discuss two elements that define the complexity of financial time series: one is the multiscaling property, which is linked to how the statistics of a single time-series change with the time horizon; the second is the structure of dependency between time-series, which accounts for the collective behaviour, i.e. the market structure. Financial time-series have statistical properties which change with the time horizon, and the quantification of this multiscaling property has been used successfully to distinguish among different degrees of market development, monitor the stability of firms and estimate risk. The study of the structure of dependency between time-series through information filtering graphs can reveal important insights into the market structure, highlighting risks, stress and portfolio management strategies. In this contribution we highlight achievements and major successes, and discuss major challenges and open problems in the study of these two elements of complexity, hoping to attract the interest of more researchers to this research area. Indeed, we believe that with the advent of the Big Data era, the need for such approaches, designed to deal with systems with many degrees of freedom, and for their further development has become more urgent.


e-mail: tiziana.di matteo@kcl.ac.uk

Introduction
In the last few decades the physicists' community has increasingly engaged with the study of complex systems [1]. These systems are normally composed of many interacting elements and are characterised by the emergence of collective properties that cannot be simply associated with the constituting elements themselves. The study of these systems requires new tools and new approaches because the traditional reductionist paradigm fails to capture such collective emergence [1]. There is no unique and commonly agreed definition of complex systems, but a common element in these systems is the presence of many parts that are interconnected, heterogeneous and often complex themselves. Human societies are paradigmatic examples of complex systems, and in this respect markets have been, so far, the most data-rich case study of such a complex structure of interactions between individuals, machines and the real world. Financial markets are systems where participants interact with each other using different strategies, at different time and volume scales and at different frequencies [2]; they are open systems where many subunits interact nonlinearly in the presence of feedback [3,4]. In markets all the elements contribute in different ways to the emergence of a price for each asset. These prices are the consequence of the complex interactions between all these elements, and such complexity is revealed in their behaviour, whose statistical properties change with the time horizon following non-trivial patterns. This is called multiscaling and it is the first of the two 'flavours' of complexity that we discuss in this paper. Multiscaling properties of financial data have been studied extensively since the pioneering works of Mandelbrot [5-10].
Econophysicists have contributed greatly to the empirical investigation of these scaling properties, the development of tools for their quantification and the modeling of their emergence [2,6,10-17].
A further element of complexity in markets is a consequence of the fact that prices do not evolve in isolation; on the contrary, there are common factors and mutual influences that make markets move collectively. The structure of the interactions that characterize these collective moves is the second 'flavour' of complexity that we discuss in this paper. In the last decade there has been great progress, mostly driven by physicists, in the use of network-theoretic tools to study complex systems. In markets these approaches have been mostly devoted to the study of the structure of dependency between financial assets. Such a structure is complex and changes in time, and it can be used to filter out relevant information from large redundant datasets [18,19], to characterize states of the market [20,21] and to predict future evolutions [22,23].
In our opinion, overall, one of the main contributions of physicists to Economics and Finance consists in the systematic study of the statistical properties of financial data, which started in the nineties when large amounts of financial data started to become easily available [2,6,12]. In particular, physicists introduced data-analytics and modeling tools originally developed in the field of statistical physics [12], which turned out to be effective and well-suited also for the characterization of financial markets.
As mentioned above, in this contribution we discuss two features of complexity: the multiscaling property of financial time-series and the dependence structure. The former regards the properties of a single time-series and has been argued to be linked to the presence of different traders operating on different time-scales in the markets [8,9]. The latter instead focuses on the properties of the market as a whole and is motivated by the fact that, given the huge amount of data available nowadays, we need appropriate tools to filter the relevant information [18-20].
The rest of the paper is organized as follows: in Section 2 we briefly discuss the concept of multiscaling and how it proves useful from both a modelling and a descriptive perspective for financial time-series; in Section 3 we discuss advances and successes in the study of markets via network filtering, with a focus on portfolio optimisation and Big Data extensions; in Section 4 we draw some conclusions and discuss further directions.

Multiscaling
The multiscaling behaviour of financial time-series is one of the acknowledged stylized facts in the literature (see [2,7-10,13-15,24]). Many works have been dedicated to its investigation and characterization [7,16,25-35], reporting strong evidence of its presence in financial markets. Let us take for example a price process p(t) and consider its log-returns at time-horizon τ:
$$r_\tau(t) = \log p(t+\tau) - \log p(t).$$
The process is said to be multiscaling if the moments of the absolute log-returns scale with τ as
$$\mathbb{E}\left[\,|r_\tau(t)|^q\,\right] \sim K(q)\,\tau^{qH(q)},$$
where K(q) is the q-th moment for τ = 1, H(q) is the so-called Generalized Hurst Exponent, which is a function of q, and the function qH(q) is concave [7,10,31]. Particular cases are uniscaling processes, where H(q) = H does not depend on q, such as the Brownian Motion, where H = 1/2, or the Fractional Brownian Motion, where 0 < H < 1. Several methods to estimate the scaling exponents have been proposed in the literature, including the Multifractal Detrended Fluctuation Analysis [32], Generalized Hurst Exponent Methods (GHE) [8-10] and Wavelet Transform Modulus Maxima [36]. Among them the GHE turned out to be one of the best behaved in estimating the Hurst exponent under heavy-tailed distributions [33]. As an example we report in [...]
Multiscaling models which also reproduce the two main stylized facts, namely power-law tails of the unconditional distribution of the returns and volatility clustering, have been proposed. Among them let us recall the Markov Switching Multifractal Model (MSM) [25,37-40] and the Multifractal Random Walk (MRW) with its extensions [41-43]. Beyond their descriptive properties, these models have also proved to have forecasting power. In particular the MSM gives better volatility predictions than GARCH-type models [40,44] and also better volume predictions than long memory models such as FIGARCH [45] or ARFIMA [46]. The MRW instead gives better volatility and VaR forecasts than the GARCH(1,1) and tGARCH(1,1) models [47,48].
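To make the scaling idea concrete, the following is a minimal sketch of a GHE-style estimate, assuming the scaling form E[|X(t+τ) − X(t)|^q] ∼ τ^{qH(q)} for the log-price X(t). It fits a single least-squares line over τ = 1, ..., tau_max, which is a simplification chosen for this illustration and not necessarily the procedure used in [8-10]. The function names, the default tau_max = 19 and the synthetic random walk are assumptions of this example.

```python
import math
import random


def generalized_hurst(log_prices, q=2.0, tau_max=19):
    """Estimate H(q) from the scaling of q-th absolute moments of increments.

    Fits log E[|X(t+tau) - X(t)|^q] against log(tau) for tau = 1..tau_max;
    the least-squares slope estimates q*H(q). Illustrative simplification.
    """
    n = len(log_prices)
    if n <= tau_max + 1:
        raise ValueError("series too short for the requested tau_max")
    xs, ys = [], []
    for tau in range(1, tau_max + 1):
        # Sample q-th absolute moment of the increments at horizon tau.
        total = 0.0
        for t in range(n - tau):
            total += abs(log_prices[t + tau] - log_prices[t]) ** q
        xs.append(math.log(tau))
        ys.append(math.log(total / (n - tau)))
    # Ordinary least-squares slope of ys on xs.
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return (num / den) / q


def gaussian_random_walk(n, seed=0):
    """Synthetic uniscaling test series (hypothetical data, not market data)."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        path.append(x)
    return path


walk = gaussian_random_walk(5000, seed=1)
for q in (1.0, 2.0, 3.0):
    print(q, generalized_hurst(walk, q=q))
```

For a uniscaling series such as this random walk the estimates should stay close to 1/2 for every q, whereas a multiscaling series would show H(q) varying with q.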
From an empirical point of view, Hurst exponent measures are useful tools to detect the degree of development of a market when computed on stock market indices, foreign exchange rates and fixed income instruments [8-10]. A recently proposed weighted version of the Hurst Exponent demonstrated that this tool can also be used to monitor the stability of firms, with possible applications in the risk management field [16]. From a more descriptive perspective it was also proposed as a tool which gives insights into the dynamics of the market microstructure [17]. Recently an ongoing debate concerning the source of the multiscaling behaviour of financial time-series has been resolved, identifying this source in their causal structure [14,34]. Moreover, this study made it possible to uncover sources of bias in the measurement of the scaling exponents, and a tool to take them into account has been proposed [35]. A different approach to investigate scaling and multiscaling of financial time series, which makes use of Empirical Mode Decomposition, has been recently introduced [49]. This methodology has the advantage of being robust against non-stationarity and it provides a time-dependent exponent [50].
This was an example of a tool (scaling analysis) widely used in Physics that, when applied outside its traditional domain, gives relevant and important insights into the behaviour of financial markets. We now turn our attention to the study of the structure of dependence between different assets.

Structure of dependence between time series
Log-returns of different assets display a high cross-dependence, even across industries and asset classes [12,51]. This is generally explained in terms of synchronizations among market participants, due to common flows of information and overlapping investment strategies [12]. The resulting structure of dependence is of interest in many areas of Finance, from Pricing to Risk Management [52]. The dynamical evolution of such structure is also relevant; in fact, from potential changes in correlation a new category of risk arises, namely the "Correlation risk" [52].
Estimating the structure of dependence from empirical data is challenging. The amount of statistical noise and redundancy makes the use of raw correlation matrices unreliable for applications such as portfolio optimisation [53]. In order to access less noisy information on the dependency structure, correlation-based information filtering networks have been introduced [4,12,20,54]. They map correlation matrices into sparse graphs that retain only a subset of entries according to some filtering criterion. Different filtering networks make use of different criteria. For instance, the Minimum Spanning Tree (MST) requires a tree-like topology (i.e. no loops) [20], whereas Embedded Graphs (EG) require the network to be embedded on a surface with fixed genus g [19,55,56]. When the condition is g = 0, the resulting network is planar and we call it a Planar Maximally Filtered Graph (PMFG) [18,19]. In terms of computational complexity, the algorithms that build the MST and the PMFG are respectively O(N² log N) [57] and O(N³) [19], where N is the correlation matrix size; recently a new algorithm for planar filtering has been proposed [58], able to build a chordal maximum planar graph (called Triangulated Maximally Filtered Graph, TMFG) with a computational complexity of O(N²), making possible a much higher scalability and the application to Big Data [58]. Interestingly, there is a deep connection between network filtering and hierarchical clustering techniques: linkage methods have been shown to be related to the MST [59], whereas recently the Directed Bubble Hierarchical Tree (DBHT) method has been introduced [60,61] as the natural clustering associated with the PMFG.
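As an illustration of the simplest of these filters, the sketch below builds an MST from a correlation matrix with Kruskal's algorithm. It assumes the commonly used correlation distance d_ij = sqrt(2(1 − ρ_ij)), which the text above does not specify, and the 4 × 4 matrix is made-up toy data. Sorting all N(N − 1)/2 pairs dominates the cost, which is consistent with the O(N² log N) figure quoted above.

```python
import math


def minimum_spanning_tree(corr):
    """Return MST edges (i, j, distance) built from a correlation matrix.

    Kruskal's algorithm with union-find on d_ij = sqrt(2 * (1 - rho_ij)),
    so the most strongly correlated pairs are considered first.
    """
    n = len(corr)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            edges.append((math.sqrt(2.0 * (1.0 - corr[i][j])), i, j))
    edges.sort()

    parent = list(range(n))

    def find(x):
        # Find the component root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge creates no loop
            parent[ri] = rj
            tree.append((i, j, d))
            if len(tree) == n - 1:
                break
    return tree


# Toy correlation matrix (hypothetical values): two tightly linked pairs.
corr = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
mst_edges = minimum_spanning_tree(corr)
print([(i, j) for i, j, _ in mst_edges])
```

The tree keeps N − 1 of the N(N − 1)/2 correlations; planar filters such as the PMFG and TMFG retain more links (3N − 6) at the price of a more involved construction that is not sketched here.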
Since the seminal work by Mantegna [20], correlation-based networks have provided interesting insights for risk management and portfolio optimisation. It has been observed that the structure of such networks contains significant economic information, related to the industrial sectors classification, but it also conveys important independent information [20,21]. Such structure is quite robust against variations in the time horizon of log-returns [62], at least when the average signal is removed from the original time series [63].
In terms of temporal evolution, several results have been obtained as well. In [64], it has been observed that stocks belonging to the same industrial sector have similar topological relevance in the network (a concept quantified by the so-called centrality); moreover, this differentiation is robust over time. In particular, stocks belonging to Energy, Utilities and Health Care (Forbes classification) tend to be located in the peripheral region of the network, whereas Basic Materials, Capital Goods and Finance stocks are located more in the central region. Correlation-based networks based on partial correlation reveal a topology in which the role of the Financial sector is even more prominent [65].
Despite this overall robustness, the dependence structure is known to be non-stationary [66]. Network filtering is a particularly powerful tool for measuring such non-stationarity. In [67] the network topology appears to evolve at two different time scales: a slow one, of the order of years, and a fast one, of a few months. The faster dynamics seems to be associated with events and news impacting financial markets. In fact, in [68] deep structural changes are shown to have occurred in the network during the 1987 Black Monday. Similar phenomena have been observed for correlations on Foreign Exchange data [69]. As for the slower dynamics, it is reported in [54] that the Financial sector lost centrality over the first decade of the 2000s. In [70] the authors show that the network structure had lost persistence in the years preceding the 2008 financial crisis; a clustering analysis based on the network topology also reveals a high level of non-stationarity during the crisis. This is shown in Figure 2, where the amount of economic information (left panel) and the persistence of the dependence structure (right panel) are measured through the Adjusted Rand Index [71], an index which quantifies the similarity between different stock partitions.
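For readers unfamiliar with the Adjusted Rand Index, the sketch below computes it from two label lists using the standard pair-counting formula. The function name and the toy labels are assumptions of this example; it is not the code behind Figure 2.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same items.

    1.0 means identical partitions (up to relabelling); values near 0 are
    what two unrelated random partitions would give on average.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items")
    # Pairs of items placed together in both partitions, in A, and in B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    max_index = 0.5 * (together_a + together_b)
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (together_both - expected) / (max_index - expected)


# Hypothetical example: sector labels versus clusters found on a network.
sectors = ["Energy", "Energy", "Finance", "Finance"]
clusters = [0, 0, 1, 2]
print(adjusted_rand_index(sectors, clusters))
```

Comparing a clustering with a sector classification gives a measure of economic information, while comparing clusterings obtained in two different time windows gives a measure of persistence.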
Network filtering procedures are valuable tools for applications to portfolio optimisation too. In [53] it has been reported that Markowitz optimisation on network-filtered correlation matrices is more reliable than the same optimisation performed on unfiltered matrices. In [72] it is shown that peripheral nodes in PMFGs are good candidates to select a well-diversified portfolio. Similarly, in [73] it has been found that stocks selected by the Markowitz method tend to be the "leaves" of the MST.
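A minimal sketch of the "leaves" idea follows: given the edge list of a filtered network, it returns the nodes with degree one. Degree is used here only as the simplest proxy for peripherality; the centrality measures used in [72] are not reproduced, and the ticker names and edges are hypothetical.

```python
from collections import defaultdict


def leaf_nodes(edges):
    """Return the nodes with exactly one link, sorted by name."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, k in degree.items() if k == 1)


# Hypothetical MST over five tickers: B acts as a hub, A, C and E are leaves.
tree_edges = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")]
candidates = leaf_nodes(tree_edges)
print(candidates)
```

The returned names would be the candidate assets for a diversified selection under this crude proxy; any actual portfolio construction would still require an allocation step.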
A recent approach called LoGo [23] uses information filtering networks (MST and TMFG in particular [58]) to construct probabilistic sparse models for financial systems. LoGo is computationally very efficient and has performance comparable to or better than state-of-the-art sparse modeling tools. By means of LoGo one can make use of information filtering networks for forecasting, stress testing and risk allocation [23].
Despite this remarkable amount of empirical findings, one aspect is still open and has been mostly overlooked: the modeling of the temporal evolution of correlation-based networks. We know that the network topology is continuously changing, but we still do not know whether patterns and regularities characterise such changes. The attempt to model network evolution is already the focus of many areas of Network Theory [74]. The first works exploring these aspects show promising results: correlation-based networks display patterns which can be fruitfully exploited for forecasting market risk changes [22,75]. We expand more on this aspect in the next section.

Challenges and future directions
In this paper we discussed two features of complexity of the financial markets, namely the multiscaling properties, which are associated with the behaviour of each single asset and the complex dependency structure, which is associated with the behaviour of a set of assets. Let us here discuss which, in our opinion, are the next challenges which should be faced in the future.
As far as multiscaling analysis is concerned, we regard it as important to develop further the first promising steps taken in the direction of linking the Hurst exponents with risk and forecasting, with the aim of providing better suited instruments for forecasting and for financial decisions.
As far as the study of collective dynamics and the related dependency structure is concerned, in our opinion the dynamics of dependency and causality filtered networks is the most promising and interesting direction to investigate in the future. Modeling the correlation dynamics is already the goal of several traditional econometrics tools, such as Multivariate GARCH [76] and Stochastic volatility models [77]. However, such tools are not able to cope with more than a few assets, due to the rise in the number of parameters. Conversely, approaches based on information filtering networks [23] can provide valuable insight into the dynamics of thousands of assets at once. This would be of great interest for Risk Management, systemic risk analyses and regulators. Moreover, it would pave the way for a new generation of forecasting tools, conceived to deal with entire markets and different asset classes rather than with a few assets.
Before concluding, we want to share our hope that this paper will help to move a step forward in the process of further linking the Economics and Physics communities. We believe that the best research outcomes can only come from a deep interplay between the two fields, which should put together their respective expertise in order to reach common goals.