Abstract
The availability of empirical data that capture the structure and behaviour of complex networked systems has increased greatly in recent years; however, a versatile computational toolbox for unveiling a complex system’s nodal and interaction dynamics from data remains elusive. Here we develop a two-phase approach for the autonomous inference of complex network dynamics and demonstrate its effectiveness by inferring neuronal, genetic, social and coupled oscillator dynamics on various synthetic and real networks. Importantly, the approach is robust to incompleteness and noise, including low resolution, observational and dynamical noise, missing and spurious links, and dynamical heterogeneity. We apply the two-phase approach to infer the early spreading dynamics of influenza A (H1N1) on the worldwide airline network, and the inferred dynamical equation can also capture the spread of severe acute respiratory syndrome and coronavirus disease 2019. These findings together offer an avenue to discover the hidden microscopic mechanisms of a broad array of real networked systems.
Main
From two-photon calcium imaging of neuronal activities1,2 and high-throughput genetic experiments3,4 to digital recordings of human mobility5,6,7, our ability to observe the dynamic behaviour of nodes in complex biological, social and technological systems has advanced spectacularly in the past years. The collected observations, often in the form of time-series data, allow us to extract the dynamic patterns of a system’s individual nodes. To gain meaningful insights into the system, however, such a reductionist approach of tracking all the individual nodes is insufficient. Indeed, complex system behaviour emerges not just from the single nodes but rather from the dynamic interactions between the nodes6,8,9,10,11,12,13,14,15,16,17,18. This requires us to infer complex network dynamics, that is, to retrieve both self-nodal dynamics and interaction dynamics from the accumulating data of network topological structure and nodes’ activities.
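The balance of self versus interaction dynamics is most naturally captured by a general equation that tracks the activities of all the nodes via9
\[\dot{\mathbf{x}}_i(t) = \mathbf{F}(\mathbf{x}_i(t)) + \sum_{j=1}^{n} A_{ij}\,\mathbf{G}(\mathbf{x}_i(t),\mathbf{x}_j(t)), \qquad (1)\]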
where \(\mathbf{x}_i(t) \equiv (x_{i,1}(t),\ldots ,x_{i,d}(t))^{\mathrm{T}}\) is node i’s d-dimensional activity, representing, for example, the membrane potential of a neuron in a brain network9,12, the proportion of infected people in a country or region5,6,7, or the state of a component in an oscillator network19. These activities are driven by the self-regulation function \(\mathbf{F}(\mathbf{x}_i) \equiv (F_1(\mathbf{x}_i),\ldots ,F_d(\mathbf{x}_i))^{\mathrm{T}}\) (designed to describe the dynamics of all the nodes in isolation) and the pairwise function \(\mathbf{G}(\mathbf{x}_i(t),\mathbf{x}_j(t)) \equiv (G_1(\mathbf{x}_i,\mathbf{x}_j),\ldots ,G_d(\mathbf{x}_i,\mathbf{x}_j))^{\mathrm{T}}\) (which captures the dynamic mechanisms of interaction between the nodes). Finally, the network \(A_{ij}\), an n × n adjacency matrix, denotes the influence or flow from node j to i, where n is the number of nodes in the system. As shown in another study, with appropriate choices of nonlinear functions F and G, equation (1) is able to describe a broad range of complex systems9. However, for most real systems, the functions F and G are unknown. Hence, a pressing lacuna in the study of complex systems is a versatile computational toolbox for automatically inferring equation (1) from the observed data of network topology Aij and nodes’ activities xi(t).
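To make the structure of equation (1) concrete, the following minimal sketch evaluates its right-hand side for scalar node activities (d = 1) and advances the system by one explicit Euler step. The function names, the three-node chain and the Kuramoto-type choice of F and G are illustrative assumptions, not the models or code used in this work.

```python
import math

def network_rhs(x, A, F, G):
    """Evaluate F(x_i) + sum_j A[i][j] * G(x_i, x_j) for every node i (d = 1)."""
    n = len(x)
    return [
        F(x[i]) + sum(A[i][j] * G(x[i], x[j]) for j in range(n))
        for i in range(n)
    ]

def euler_step(x, A, F, G, dt):
    """Advance all node activities by one explicit Euler step of size dt."""
    return [xi + dt * dxi for xi, dxi in zip(x, network_rhs(x, A, F, G))]

# Hypothetical example: Kuramoto-type phase oscillators with a shared frequency.
OMEGA = 0.5

def self_dynamics(xi):
    return OMEGA

def coupling(xi, xj):
    return math.sin(xj - xi)

# A[i][j] = 1 means node j influences node i (a three-node chain).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

x = [0.0, math.pi / 2, 0.0]
dx = network_rhs(x, A, self_dynamics, coupling)
x_next = euler_step(x, A, self_dynamics, coupling, dt=0.01)
```

In the inference setting, only the trajectories and A are observed; F and G are the unknowns to be recovered.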
Complex biological, social or technological systems lack the fundamental physical rules that govern particle systems; therefore, we do not have a priori knowledge of their internal microscopic mechanisms20. The goal is thus not only to identify the model’s parameters but also to retrieve the forms of F and G and infer the explicit model itself. Despite the recent important progress in developing methods to infer the governing equations of single- or few-body dynamics21,22,23,24,25,26,27, the task of inferring network dynamics poses particular challenges. For example, F and G are usually of different types; hence, one cannot obtain their compact forms when only using orthogonal basis functions22,23,28,29. Nodes’ activities data are noisy and the mappings of network topologies are usually incomplete30,31. Collective behaviour, such as synchronization and consensus19, can conceal the specific forms of microscopic mechanisms in interaction dynamics. To overcome these challenges, we propose here a two-phase inference approach. Our analysis indicates that the two-phase strategy allows us to achieve efficient and, most importantly, highly accurate inference, even in the face of unfavourable scenarios, such as noisy or low-resolution data or an only partially mapped topology (Fig. 1a).
Results
Overview of the two-phase inference approach
Lacking a priori knowledge of the structures of F and G, a natural approach is to pre-construct two extensive libraries LF and LG that contain a variety of elementary functions. The combinations of these elementary functions can potentially generate the true network dynamics. In this work, the libraries contain not only orthogonal basis functions but also polynomial, trigonometric, exponential, fractional, rescaling, sigmoid and other activation functions frequently used in various domains (Supplementary Tables 1 and 2). Large libraries are helpful for finding a compact and optimal model to capture network dynamics but they also make the inference problem more difficult: owing to the lack of orthogonality, the elementary functions can be similar to each other and thus less discriminative.
By introducing the time-series data xi(t) (where i = 1, 2,…, n) into LF and LG, we obtain two time-varying matrices ΘF(t) ≡ LF(xi(t)) and ΘG(t) ≡ LG(xi(t), xj(t)) that encode the patterns of nodes’ activities imposed by the elementary functions in LF and LG (Fig. 1b). Then, the inference problem can be recast to the selection of appropriate patterns in ΘF(t) and ΘG(t) that best match the evolution of observed system state \(\dot{{{{\bf{x}}}}}(t)\), that is, to inferring the sparse coefficients ξF and ξG that best solve
where \(\widetilde{A}\equiv A\otimes {I}_{d}\), \({\widetilde{{{\varTheta }}}}_{F}\equiv {{{\varTheta }}}_{F}\otimes {I}_{d}\) and \({\widetilde{{{\varTheta }}}}_{G}\equiv {{{\varTheta }}}_{G}\otimes {I}_{d}\), where the symbol ⊗ denotes the Kronecker product, and Id is the d-dimensional identity matrix. Here we consider the general setting where each node state is d dimensional and the network is directed and heterogeneous. Consequently, the problem of inferring complex network dynamics is high dimensional and irreducible. Indeed, the number of elementary functions in LF and LG is approximately 25, 80 or 140 when the node activity itself has one, two or three dimensions, respectively, in the simulation validations below (Supplementary Tables 1 and 2).
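As a small illustration of the Kronecker lifting \(\widetilde{A}\equiv A\otimes {I}_{d}\), the sketch below expands a toy two-node adjacency matrix for d = 2. The helper names and the example matrix are assumptions made for illustration; in practice a numerical library routine would normally be used instead.

```python
def kron(P, Q):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [p * q for p in p_row for q in q_row]
        for p_row in P
        for q_row in Q
    ]

def identity(d):
    """d x d identity matrix as a list of rows."""
    return [[1 if r == c else 0 for c in range(d)] for r in range(d)]

# Toy directed, weighted network with two nodes (hypothetical values).
A = [[0, 2],
     [3, 0]]

# Each entry A[i][j] becomes a d x d block A[i][j] * I_d, so every
# component of node j's state is routed to the same component of node i.
A_tilde = kron(A, identity(2))
```

Here `A_tilde` is a 4 × 4 matrix, which shows how the size of the regression problem grows with the state dimension d.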
Our approach is a two-phase procedure consisting of global regression and local fine-tuning. In phase I, we approximate the derivatives \(\dot{{{{\bf{x}}}}}(t)\) (Methods) and calculate the matrices \({\widetilde{{{\varTheta }}}}_{F}(t)\) and \({\widetilde{{{\varTheta }}}}_{G}(t)\) and then normalize each of their columns (Fig. 1b). These normalized data are used to identify, through regression, the leading elementary functions that are most probably constituents of true F and G (Fig. 1c and Methods). Phase I is able to narrow down the model space, but the dynamical equation inferred by such regression alone lacks generative power (Fig. 1d). Next, in phase II, we perform fine-tuning with the original values of \(\dot{{{{\bf{x}}}}}(t)\), \({\widetilde{{{\varTheta }}}}_{F}(t)\) and \({\widetilde{{{\varTheta }}}}_{G}(t)\), that is, without normalization. We use topological samplings (Methods) and the weighted Akaike’s information criterion (wAIC; Methods) to sequentially remove the elementary functions with the smallest inferred coefficients (Fig. 1e). The final sets of elementary functions and their coefficients \({\hat{{{{\bf{\xi }}}}}}_{F}\) and \({\hat{{{{\bf{\xi }}}}}}_{G}\) compose \(\hat{{{{\bf{F}}}}}\) and \(\hat{{{{\bf{G}}}}}\), leading to the inferred dynamics of complex networks (Fig. 1f).
Inferring complex network dynamics
To validate the effectiveness of our approach, we apply it to infer five network dynamics, including the Hindmarsh–Rose32 (HR, d = 3) and FitzHugh–Nagumo32 (FHN, d = 2) neuronal systems, social balance dynamics33 (SB, d = 1), Kuramoto dynamics34 (d = 1) and coupled heterogeneous Rössler oscillators35 (d = 3); here d is the dimension of each node activity. To obtain the nodes’ activities data, we simulate these dynamics (Supplementary Table 4) on a variety of topologies, including Erdős–Rényi (ER) and scale-free (SF) synthetic networks and five empirical networks—cellular-level brain networks of Caenorhabditis elegans and Drosophila, Advogato social network, and power grids of Northern Europe and United States. The time series of node activities and each network topology are the input data to our approach. The five specific equations governing these dynamics are the ground truths that we aim to infer. These dynamical models and networks are widely used in various domains and exhibit different properties (Supplementary Sections II and III), which accounted for the diversity of our tests.
Figure 2 illustrates the procedure of inferring FHN neuronal network dynamics. Through global regression, phase I identifies the ten most relevant elementary functions for each dimension of FHN (Fig. 2b); then, by local fine-tuning, phase II autonomously learns the compact and optimal form of the dynamical equation as well as the most appropriate coefficient for each of the necessary elementary functions (Fig. 2c). The form of the inferred equation in Fig. 2c perfectly matches the ground truth in Fig. 2a, and the learnt coefficients are also highly accurate. Indeed, the relative errors \({{\varDelta }}=(\xi -\hat{\xi })/\xi\), where ξ and \(\hat{\xi }\) are the true and learnt coefficients, respectively, satisfy |Δ| < 3% (Fig. 2d). The dynamical equation inferred by our approach exhibits generative power, being able to generate nodes’ activities and trajectories that agree well with the observation data (Fig. 2e,f).
Our approach also successfully infers the equations governing the other four network dynamics. Regarding the accuracy of learnt coefficients, the relative errors |Δ| are less than 3% for the HR (Fig. 3a) and edge (Fig. 3c) dynamics on both synthetic and empirical networks. In Kuramoto dynamics and coupled heterogeneous Rössler oscillators, the self-dynamics are non-identical, that is, each node’s dynamics has its own form (Supplementary Section III). Hence, we aim to infer an effective form of equation (1) that minimizes the inconsistency between the inferred and true nodes’ activities. Even for these more challenging cases, the two-phase approach still succeeds with relative coefficient errors |Δ| < 5% or |Δ| < 20% (Fig. 3e,g). Both activities and trajectories generated by the effective equations exhibit high agreement with the true averaging dynamics (Fig. 3f,h,i).
Inferrability of network dynamics
Whether a network dynamics is inferrable depends on several factors. Here we explore three key factors, namely, synchronized dynamics, dynamical heterogeneity and deficient libraries.
Synchronized dynamics: if a network is completely synchronized, that is, all its nodes behave in the same manner19,34,35, distinguishing the activities of a node and its neighbours becomes impossible, and the microscopic interacting mechanism G(xi, xj) between the nodes will be cloaked and undiscoverable. In other words, the more synchronized a network, the more difficult it is to infer its dynamics. Here we tune the coupling strength between the nodes to change the degree of network synchronization (that is, order parameter 〈R〉; Supplementary Section IV), and test the capability of our two-phase approach in inferring partially synchronized network dynamics. As shown in Fig. 4a, although the inference inaccuracy increases when the system becomes more synchronized, our approach can still infer the true FHN equation even when the network is highly synchronized (〈R〉 ≈ 0.7). The inference inaccuracy is quantified by a symmetric mean absolute percentage error (sMAPE; Methods). The more accurate the inference result, the closer the sMAPE value is to zero.
Dynamical heterogeneity: equation (1) assumes that nodes have the same form F of self-dynamics; yet this is not always true. For instance, although the self-dynamics of the Kuramoto model is simply one elementary function ω representing the natural frequency of a node, different nodes can have different values of ω. For such non-identical self-dynamics, it is difficult—if not impossible—to infer a specific form Fi(xi) for each node i due to an n-fold increase in the dimensionality of potential model space (n is the network size). Therefore, we aim to infer an effective equation that best captures the averaging dynamics (Fig. 3e,g). Here we further explore the extent of dynamical heterogeneity that our approach can tolerate. To do so, we assign each node a value of ω randomly drawn from a normal distribution \({{{\mathcal{N}}}}(0,\sigma )\) and increase the standard deviation σ. The inference inaccuracy indeed increases when σ becomes larger, and the two-phase approach can tolerate dynamical heterogeneity σ ≤ 0.5 (Fig. 4b).
Deficient libraries: although two rather comprehensive libraries of elementary functions are built, it is still possible that some elementary functions of the true unknown dynamics are missing. Another possibility is that the compact form of true dynamics cannot be composed by these elementary functions. For these cases, our two-phase approach will infer an alternative equation to capture the system behaviours. We test such capability in gene regulation and HR neuronal dynamics whose true coupling functions are intentionally removed from LG. As shown in Fig. 4c, the trajectories generated by the inferred and true equations are close to each other, and the discrepancy is small for all the nodes (Methods and Supplementary Section IVB).
Inferring from incomplete and noisy data
The incompleteness of mapped network topology and noises of observed nodes’ activities are inevitable in real data30,31. Hence, here we validate the robustness of our two-phase approach against low resolution, dynamical and observational noises, and spurious and missing links, as well as through comparisons with previous methods23,36,37.
Low resolution: experimental and digital recording technologies often have limited measurement frequencies, inducing low resolution of the observed time series. To validate our approach’s robustness against low resolution, we numerically simulate the five nonlinear network dynamics in Figs. 2 and 3 with a step size of 0.01, and then regularly downsample the activity data. We calculate the failure ratios in inferring the form of true equations (Supplementary Fig. 14a) as well as the inference inaccuracies (Fig. 5a). The results show that the two-phase strategy requires only 5% to 50% of the data for the inference.
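Regular downsampling of a simulated time series can be sketched as follows. This is a minimal illustration, assuming that "keeping a proportion p of the data" means retaining every (1/p)-th sample; the function name and the placeholder series are hypothetical.

```python
def downsample(series, keep_fraction):
    """Keep roughly `keep_fraction` of the samples at regular intervals.

    Returns the thinned series and the stride, so that the effective
    time step becomes stride * (original step size).
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must lie in (0, 1]")
    stride = max(1, round(1 / keep_fraction))
    return series[::stride], stride

# Placeholder series standing in for activities simulated with step size 0.01.
fine = list(range(100))
coarse, stride = downsample(fine, keep_fraction=0.05)
effective_step = 0.01 * stride
```

With a 5% proportion the effective time step grows from 0.01 to 0.2, which is the resolution at which the derivatives must then be approximated.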
Observational and dynamical noises: observational noises are induced by the measuring process and dynamical noises represent the intrinsic stochasticity in dynamics. To produce the former, we add Gaussian noises to the nodes’ activity data and quantify the intensity of observational noise with the signal-to-noise ratio (Supplementary Section VA). To imitate the latter, we add a stochastic term of Gaussian white noise with intensity η into the true dynamical equations and generate the nodes’ activities data by the numerical simulations of these stochastic differential equations (Supplementary Section VA). We test the impact of these two types of noise on the performance of the two-phase inference approach, without any denoising pre-process. As shown in Fig. 5b and Supplementary Fig. 14b, the approach can tolerate dynamical noise with η ≤ 0.15, meaning that it successfully reconstructs the hidden equations when the stochastic intensity is not higher than 15% of the average amplitude of true deterministic dynamics. Moreover, the approach can tolerate 30 dB observational noise (Fig. 5c and Supplementary Fig. 14c).
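The way observational noise of a prescribed intensity can be produced is sketched below. The sketch assumes the common definition SNR (dB) = 10 log10(signal power / noise power), with power taken as the mean square; the precise definition used here is given in Supplementary Section VA and may differ. All names and the sine-wave signal are illustrative.

```python
import math
import random

def add_observational_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian noise so that the expected SNR equals snr_db."""
    signal_power = sum(v * v for v in signal) / len(signal)
    noise_std = math.sqrt(signal_power / 10 ** (snr_db / 10))
    return [v + rng.gauss(0.0, noise_std) for v in signal]

def empirical_snr_db(clean, noisy):
    """Measure the SNR actually realized by one noisy copy of the signal."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((y - c) ** 2 for c, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)

rng = random.Random(0)  # fixed seed for reproducibility
clean = [math.sin(0.01 * k) for k in range(20000)]
noisy = add_observational_noise(clean, snr_db=30.0, rng=rng)
measured = empirical_snr_db(clean, noisy)
```

Under this definition, 30 dB corresponds to a noise power that is one-thousandth of the signal power.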
Spurious and missing links: spurious and missing links in real data induce an incomplete network topology Aij, which further leads to an inaccurate interaction matrix ΘG. To test the impact of these erroneous links, we randomly add or remove a fraction of links from the true network topology that was used to simulate the nodes’ activities. Owing to the topological sampling in phase II, our approach is able to tolerate 25% spurious and 30% missing links (Fig. 5d,e and Supplementary Fig. 14d,e).
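The perturbation of a network topology by missing and spurious links can be sketched as follows for a binary directed network. The sketch assumes that both fractions are measured relative to the number of true links; the function name and the ring network are hypothetical.

```python
import random

def perturb_links(A, frac_missing, frac_spurious, rng):
    """Return a copy of binary adjacency matrix A with links removed and added."""
    n = len(A)
    links = [(i, j) for i in range(n) for j in range(n) if i != j and A[i][j]]
    non_links = [(i, j) for i in range(n) for j in range(n)
                 if i != j and not A[i][j]]
    n_remove = round(frac_missing * len(links))
    n_add = min(round(frac_spurious * len(links)), len(non_links))
    B = [row[:] for row in A]            # leave the true topology untouched
    for i, j in rng.sample(links, n_remove):
        B[i][j] = 0                      # missing link
    for i, j in rng.sample(non_links, n_add):
        B[i][j] = 1                      # spurious link
    return B

# Directed ring with 10 nodes and 10 links (toy example).
n = 10
A = [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]
observed = perturb_links(A, frac_missing=0.3, frac_spurious=0.2,
                         rng=random.Random(1))
```

The activities are still generated on the true network `A`, whereas the inference only sees `observed`.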
Comparison with previous methods: the two most illuminating and effective methods for dynamics inference are Sparse Identification of Nonlinear Dynamics (SINDy)23 and Algorithm for Revealing Network Interactions (ARNI)37. Note that ARNI originally aimed at inferring network topology but can be transferred to infer network dynamics by minor modification (Supplementary Section VC). Here we compare our approach with SINDy and ARNI from different aspects, including the amount of required data (Fig. 5f), robustness against observational noise (Fig. 5g), correlated dynamical noise (Fig. 5h and Supplementary Section VA), missing links (Fig. 5i) and different network sizes (Fig. 5j). Although ARNI needs fewer data points if network topologies are complete and nodal activities do not have any noise (Fig. 5f), the two-phase approach outperforms both SINDy and ARNI in inferring complex network dynamics from incomplete and noisy data (Fig. 5g–j). We also perform comparisons with SINDy’s variant36 regarding partially synchronized or heterogeneous dynamics (Supplementary Figs. 13 and 17). These results indicate that our approach can better handle high-dimensional networked systems and better cope with incompleteness and noises in data.
Ablation studies: besides the two-phase strategy, our approach involves three important components: normalization in the first phase but not in the second, which addresses the issue raised by highly skewed observations at different nodes; topological sampling, which imitates the feature of observed incomplete topologies; and optimal selection by wAIC, which determines the most appropriate complexity of the inferred dynamics. Ablation studies demonstrate that the two-phase strategy and these three components are all essential. Specifically, we ablate each phase or component and then assess the performance of the degenerated approaches. As shown in Fig. 5k,l and Supplementary Section VB, the inference inaccuracy (sMAPE) indeed increases if the phases or components are individually ablated.
Inference of empirical systems
To demonstrate the approach’s ability to handle empirical systems, we apply it to infer the spreading dynamics of the infectious disease influenza A (H1N1). The network underlying this diffusion system is the worldwide airline network, which captures human mobility between different countries or regions and plays a dominant role in global disease spreading5,6. Each entry Aij of the weighted network’s adjacency matrix A represents the traffic volume from node j to i, where each node denotes a country or region. The total number of daily passengers is approximately \(\Phi = 8.9\times 10^{6}\); taking into account the population Pi of each node i, the adjacency matrix is modified to
The entries of matrix \(\hat{A}\) are of the order of \(10^{-3}\) to \(10^{-2}\). The nodal activities xi(t) are extracted from the daily reports of infected cases in each country or region. Here we consider the nodes whose accumulated H1N1 cases are more than 100 and focus on the early spreading dynamics, that is, within the 45 days since the first case was reported in each node: this captures the system behaviour before government control.
Based on these empirical data, our approach successfully infers a concise effective dynamical equation
where a = 0.074 and b = 7.130 (Supplementary Section VI and Supplementary Fig. 18). It is interesting that our approach infers a sigmoid (nonlinear) form, rather than the linear form of epidemic models, to better capture the interaction dynamics. This might be caused by the fact that people usually consciously travel less if their countries/regions or the destinations have a higher infection risk. Although equation (4) describes the dynamics of all the nodes with the same parameters a and b, we also extend it by taking into account dynamical heterogeneity in the nodes, that is, to obtain ai and bi from each node i’s activity data (Fig. 6b–e and Supplementary Fig. 18).
Because empirical systems lack ground truths, we verify the inferred equation (4) by testing its generalizability to the spread of severe acute respiratory syndrome (SARS) and coronavirus disease 2019 (COVID-19). Based on the daily reported numbers within the first 45 days in each node, we find that equation (4) is also able to capture the early spread of SARS and COVID-19 on the worldwide airline network. Indeed, as shown in Fig. 6f–i and Supplementary Figs. 19 and 20, the evolution of the cumulative numbers of SARS cases (for nodes whose eventual infected cases are more than 100) and COVID-19 cases (for nodes whose eventual infected cases are more than 2,000) agrees well with the activities generated by equation (4) with heterogeneous parameters ai and bi.
Discussion
Many real networks have been mapped so far, but there are still complex systems whose network structure information is totally missing. For the latter, a possible scheme is inferring their topological structure, especially directed or causal networks30,37,38,39,40,41, from nodes’ activities data first and then applying our approach to infer the system dynamics. It is worth noting that inferring the network structure from nodes’ activities data is also challenging, especially when the number of nodes is large42,43, because the number of parameters needing to be estimated is about \(n^{2}\) (where n is the network size). Therefore, how to simultaneously infer both structure and dynamics of large, complex systems is still an outstanding problem.
Our work also raises several questions worthy of future pursuit. First, stochasticity in the dynamics of some real complex systems might be stronger than that considered in this work. Such highly stochastic systems are better described by stochastic differential equations29,44,45,46. Second, our approach does not account for discrete or Boolean dynamics, or systems that contain thresholding terms or exhibit irregular dynamics with instability properties47. Third, when the nodal activity is multidimensional, experimental access might be limited to a sub-dimension of the activity vector. The Koopman operator and time-delay embedding techniques are helpful for capturing the dynamical properties of sub-dimension observable systems48. Yet, the problem remains unsolved for complex networked systems. Finally, the nodes in a complex system can have higher-order—beyond pairwise—couplings, and such higher-order interactions may impact the dynamics of networked systems49,50. Hence, it is an interesting direction to extend the approach to inferring higher-order network dynamics.
Methods
Two-phase inference approach
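The left-hand side of equation (1) represents the time-varying derivative of each node’s activity, which can be numerically obtained from xi(t) through the five-point approximation51
\[\dot{\mathbf{x}}_i(t) \approx \frac{\mathbf{x}_i(t-2\delta t) - 8\,\mathbf{x}_i(t-\delta t) + 8\,\mathbf{x}_i(t+\delta t) - \mathbf{x}_i(t+2\delta t)}{12\,\delta t}, \qquad (5)\]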
where δt is the time step. Hence, the specific goal is to infer both exact structure and corresponding coefficients of the self-dynamics function F(xi(t)) and the interaction dynamics function G(xi(t), xj(t)).
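The five-point approximation can be sketched in a few lines for a single scalar time series. The sketch uses the standard central five-point stencil; the function name and the cubic test signal are illustrative choices.

```python
def five_point_derivative(x, dt):
    """Central five-point estimate of dx/dt at the interior samples.

    Returns one value for each index 2 .. len(x) - 3; the two samples at
    either end are skipped because the stencil needs two neighbours per side.
    """
    return [
        (x[k - 2] - 8 * x[k - 1] + 8 * x[k + 1] - x[k + 2]) / (12 * dt)
        for k in range(2, len(x) - 2)
    ]

dt = 0.01
times = [k * dt for k in range(10)]
signal = [t ** 3 for t in times]          # exact derivative is 3 t^2
estimate = five_point_derivative(signal, dt)
exact = [3 * t ** 2 for t in times[2:-2]]
```

The stencil is exact for polynomials up to degree four, so `estimate` and `exact` agree here up to rounding error; for noisy data the division by δt amplifies the noise, which is one reason why low resolution and observational noise make inference harder.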
Because we lack a priori knowledge of the forms of F and G, we construct two comprehensive libraries, namely, LF and LG, for self- and interaction dynamics, respectively, including polynomial, trigonometric, exponential, fractional, rescaling and various activation functions (Supplementary Tables 1 and 2). By introducing the observed time series of nodes’ activities to the elementary functions in LF and LG, we obtain two matrices ΘF(t) = LF(xi(t)) and ΘG(t) = LG(xi(t), xj(t)) that describe the corresponding behaviours of these elementary functions (Supplementary Fig. 1). To infer the compact forms that best match equation (2), we propose a two-phase approach.
Phase I, global regression: the purpose of this phase is to assess the relevance of each elementary function in LF and LG to the true, yet unknown, network dynamics. Given the observations of xi(t) for all i at time t, we approximate the derivatives \(\dot{{{{\bf{x}}}}}(t)\) and calculate the matrices \({\widetilde{{{\varTheta }}}}_{F}(t)\) and \({\widetilde{{{\varTheta }}}}_{G}(t)\). These values are highly skewed and can span several orders of magnitude (Supplementary Fig. 3) due to the skewness of node degrees and nonlinearity of system dynamics, which could induce an overestimation of the importance for inherently low-value constituents. To eliminate this severe effect, it is crucial to normalize each column in \(\dot{{{{\bf{x}}}}}(t)\), \({\widetilde{{{\varTheta }}}}_{F}(t)\) and \({\widetilde{{{\varTheta }}}}_{G}(t)\). Then, the inference problem described by equation (2) is further recast to an optimization formula:
where λ > 0 is a hyper-parameter that regulates the sparsity of coefficient vectors ξF and ξG. We employ the least absolute shrinkage and selection operator (LASSO) regression method to solve equation (6) and perform a fivefold cross-validation to obtain the most appropriate value of λ (Supplementary Section IB). The resultant ξF and ξG capture the relevance of each elementary function in LF and LG, enabling the identification of leading elementary functions that are most probably the constituents of the true F and G (Fig. 1c). Consequently, phase I is able to narrow down the model space. However, the dynamical equation inferred by such regression alone lacks generative power. For instance, as shown in Fig. 1d, the trajectory generated by an inferred dynamical equation of phase I deviates from that of the true network dynamics.
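The idea behind phase I, namely sparse regression on column-normalized library matrices, can be illustrated with a dependency-free sketch. Several simplifications are assumptions of this sketch rather than features of the method: only a self-dynamics library for a single scalar variable is used, the columns are scaled to unit Euclidean norm, the derivative vector is left unscaled, the objective is written as 0.5‖y − Θξ‖² + λ‖ξ‖₁, λ is fixed by hand instead of by cross-validation, and a plain coordinate-descent solver stands in for whatever LASSO implementation is actually used.

```python
import math

def soft_threshold(value, threshold):
    """Shrink `value` towards zero by `threshold` (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def normalise_columns(theta):
    """Scale each column to unit Euclidean norm; return the matrix and the norms."""
    n_cols = len(theta[0])
    norms = [math.sqrt(sum(row[c] ** 2 for row in theta)) or 1.0
             for c in range(n_cols)]
    return [[row[c] / norms[c] for c in range(n_cols)] for row in theta], norms

def lasso(theta, y, lam, n_sweeps=200):
    """Coordinate descent for 0.5 * ||y - theta xi||^2 + lam * ||xi||_1.

    Assumes every column of `theta` has unit Euclidean norm.
    """
    n_rows, n_cols = len(theta), len(theta[0])
    xi = [0.0] * n_cols
    residual = list(y)
    for _ in range(n_sweeps):
        for c in range(n_cols):
            col = [theta[r][c] for r in range(n_rows)]
            # Correlation with the residual once column c's own fit is added back.
            rho = sum(col[r] * residual[r] for r in range(n_rows)) + xi[c]
            new_value = soft_threshold(rho, lam)
            change = new_value - xi[c]
            if change:
                residual = [residual[r] - change * col[r] for r in range(n_rows)]
                xi[c] = new_value
    return xi

# Toy data (hypothetical): derivative samples generated by 1.5 x - 0.8 x^2.
xs = [k / 20 for k in range(-20, 21)]
library = ["1", "x", "x^2", "x^3"]
theta = [[1.0, x, x ** 2, x ** 3] for x in xs]
dxdt = [1.5 * x - 0.8 * x ** 2 for x in xs]

theta_unit, norms = normalise_columns(theta)
xi_scaled = lasso(theta_unit, dxdt, lam=0.01)
coefficients = [value / norm for value, norm in zip(xi_scaled, norms)]
```

In this noise-free toy case the regression already singles out `x` and `x^2`; the slight shrinkage of their coefficients caused by the penalty is the kind of bias that a subsequent fit without normalization or penalty can remove.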
Phase II, local fine-tuning: to reconstruct generative and concise expressions for F and G, we next perform fine-tuning in the reduced model space (Supplementary Section IB). In contrast to phase I, we now use the original values of \(\dot{{{{\bf{x}}}}}(t)\), \({\widetilde{{{\varTheta }}}}_{F}(t)\) and \({\widetilde{{{\varTheta }}}}_{G}(t)\), that is, without normalization, to further identify the necessary elementary functions and learn their precise coefficients. Since spurious or missing links in the observed network topology have an adverse effect on learning, we perform topological sampling (discussed later) that imitates the feature of observed—usually incomplete—topologies. Another issue is to determine the minimal number of elementary functions required for reconstructing F and G. To do so, we sequentially remove the elementary functions with the smallest inferred coefficients and calculate, using a weighted version of Akaike’s information criterion (wAIC; discussed later), the information inconsistency between the observed nodes’ activities and the remaining set of elementary functions. This process stops when removing a certain elementary function consistently increases the value of wAIC. As shown in Fig. 1e, each curve in a plot at the left column represents the information inconsistency versus model complexity for one topological sample. We find that, indeed, the joint operation with wAIC and topological sampling is helpful for inference from noisy and incomplete data (Fig. 5k,l).
The final sets of elementary functions and their coefficients \({\hat{{{{\bf{\xi }}}}}}_{F}\) and \({\hat{{{{\bf{\xi }}}}}}_{G}\) compose the forms \(\hat{{{{\bf{F}}}}}\) and \(\hat{{{{\bf{G}}}}}\), leading to the successful inference of network dynamics described by equation (1). Indeed, as demonstrated in Fig. 1f, the trajectory generated by the inferred dynamical equation agrees well with the numerical simulations of the true network dynamics. It is worth noting that the ground truth, that is, the form of the true equation, remains unknown during the whole procedure and is only used to assess the accuracy of the final inferred results; hence, our approach works in an autonomous, unsupervised way.
wAIC
The original Akaike’s information criterion (AIC)52 is a frequently used method to balance the fitting and complexity of a model with respect to the observed data, defined as \(\mathrm{AIC} = n\log (\mathrm{MSE}) + 2p\), where n is the number of observations, MSE is the mean squared error of the regression result of the model and p is the number of variables. By using AIC, one aims to select an optimal model that best fits the observations with the fewest variables from the model candidates. However, we find that the original AIC does not work well in the inference problem we aim to solve in the present work. Hence, we introduce a weighted version of AIC (namely, wAIC) as
which balances the fitting accuracy and model complexity. Here w is the inferred coefficient of a term from phase I. A term with a larger w inferred by phase I is more likely to be able to capture the properties of the underlying unknown dynamics. Thus, multiplying w or 1/w with AIC amplifies the impact of removing this term from the equation. The smaller the wAIC, the more consistent is the composition of the elementary functions with the observed data and less important is this removed term.
To be specific, to evaluate the relevance of term i, we remove this term from the equation inferred by phase I and calculate the value of wAICi of the new, shorter equation (Supplementary Fig. 2). We repeat this process to obtain the wAIC for each term. Then, we sort these terms based on their wAIC values, and remove terms one by one with wAIC values from small to large. This operation gives a shorter equation at each step, and we calculate the AIC values of these shortened equations. The optimal equation is determined at the turning point where the curve starts to consistently increase (Fig. 1e, purple stars).
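The final selection step can be sketched with the original AIC, AIC = n log(MSE) + 2p, evaluated along a sequence of successively shortened equations. The weighting that defines wAIC is not reproduced here, and the rule used below for locating the turning point (the start of the final, uninterrupted rise of the curve) is one possible reading of "starts to consistently increase". The function names and the MSE values are hypothetical.

```python
import math

def aic(n_obs, mse, n_terms):
    """Akaike's information criterion, AIC = n log(MSE) + 2p."""
    return n_obs * math.log(mse) + 2 * n_terms

def turning_point(aic_curve):
    """Index at which the curve begins its final, uninterrupted rise.

    aic_curve[k] is the AIC of the equation after k terms have been removed.
    """
    best = len(aic_curve) - 1
    for k in range(len(aic_curve) - 2, -1, -1):
        if aic_curve[k] < aic_curve[k + 1]:
            best = k
        else:
            break
    return best

# Hypothetical fit errors after removing 0, 1, 2, 3 and 4 terms from a
# six-term equation: the first two removals cost nothing, later ones do.
n_obs = 1000
mse_after_removal = [1e-4, 1e-4, 1e-4, 5e-2, 3e-1]
curve = [aic(n_obs, mse, 6 - k) for k, mse in enumerate(mse_after_removal)]
n_removed = turning_point(curve)
n_kept = 6 - n_removed
```

In this toy curve the criterion keeps four terms: removing the two redundant terms lowers the AIC, whereas removing a third sharply raises it.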
Topological sampling
We perform topological sampling in phase II as follows. We randomly choose S nodes from all n nodes, and obtain the activities of these S nodes’ partial neighbours. Introducing the sampled ego structures and nodes’ activities into libraries LF and LG allows us to construct the self- and interaction matrices \({\widetilde{{{\varTheta }}}}_{F}\) and \({\widetilde{{{\varTheta }}}}_{G}\), respectively, as well as to further distil the elementary functions and their coefficients. We repeat the process to obtain K sets of samples and average the coefficients of the elementary functions inferred from the K sample sets. In the present work, we set S = 10 and K = 20.
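The sampling procedure can be sketched as follows. The values S = 10 and K = 20 are taken from the text; the rule for choosing "partial neighbours" (each neighbour is kept independently with probability `keep_prob`), the function names and the ring network are assumptions made for illustration.

```python
import random

def sample_ego_structures(A, n_seeds, keep_prob, rng):
    """Pick n_seeds nodes and a random subset of each node's in-neighbours."""
    n = len(A)
    seeds = rng.sample(range(n), n_seeds)
    return {
        i: [j for j in range(n) if A[i][j] and rng.random() < keep_prob]
        for i in seeds
    }

def average_coefficients(coefficient_sets):
    """Average the coefficient vectors inferred from the individual samples."""
    n_sets = len(coefficient_sets)
    return [sum(column) / n_sets for column in zip(*coefficient_sets)]

# Undirected ring with 30 nodes (toy example).
n = 30
A = [[1 if (i - j) % n in (1, n - 1) else 0 for j in range(n)] for i in range(n)]

S, K = 10, 20
rng = random.Random(42)
samples = [sample_ego_structures(A, S, keep_prob=0.5, rng=rng) for _ in range(K)]

# Each sample would be fitted separately; here two made-up coefficient
# vectors stand in for the fitted results.
averaged = average_coefficients([[1.0, 2.0], [3.0, 4.0]])
```

Averaging over many partial views reduces the influence of any single erroneous link on the final coefficients.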
sMAPE
The inference inaccuracy is quantified by sMAPE53:
\[\mathrm{sMAPE} = \frac{1}{m}\sum_{i=1}^{m}\frac{|I_i - R_i|}{|I_i| + |R_i|},\]
where m is the cardinality of the set that contains both inferred and true elementary functions and Ii and Ri are the inferred and true coefficients, respectively. The range of sMAPE is [0, 1]. The more accurate the inferred equation, the lower is the value of sMAPE. Note that if an inferred elementary function does not exist in the true equation or a true elementary function is not successfully inferred, the value of sMAPE increases. Therefore, sMAPE captures not only the errors of inferred coefficients but also the incorrectness of the inferred equation form.
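A minimal sketch of this metric is given below, assuming the standard bounded form of sMAPE, that is, the mean of |Ii − Ri| / (|Ii| + |Ri|) over the union of inferred and true elementary functions, with a missing function treated as having coefficient zero. The dictionary representation and the example coefficients are hypothetical.

```python
def smape(inferred, true):
    """sMAPE between two {elementary function: coefficient} dictionaries."""
    terms = set(inferred) | set(true)
    if not terms:
        return 0.0
    total = 0.0
    for term in terms:
        i_val = inferred.get(term, 0.0)   # 0 if the term was not inferred
        r_val = true.get(term, 0.0)       # 0 if the term is not in the truth
        denominator = abs(i_val) + abs(r_val)
        if denominator > 0:
            total += abs(i_val - r_val) / denominator
    return total / len(terms)

truth = {"x": 1.0}
perfect = smape({"x": 1.0}, truth)
extra_term = smape({"x": 1.0, "x^2": 0.3}, truth)
slightly_off = smape({"x": 0.9}, truth)
```

A spurious or missing term contributes the maximal value of 1 to the sum regardless of its magnitude, which is how the metric penalizes an incorrect equation form in addition to coefficient errors.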
NED
To evaluate the discrepancy between the inferred and true dynamics, we used the metric of normalized Euclidean distance (NED) that represents the distance between the two trajectories generated by the inferred and true dynamical equations. That is,
Here xi is the true trajectory and \({\hat{x}}_{i}\) is the trajectory generated by the inferred equation; t0 and T are the beginning and ending times, respectively; and Dmax is the longest Euclidean distance between a pair of points of the true trajectory.
Data availability
Source data are provided with this paper. The empirical network data include the C. elegans connectome54,55,56, the mushroom-body region of Drosophila57, the Northern Europe power grid58, the US power grid59 and the Advogato social network60 retrieved from https://networkrepository.com/, and worldwide airline network data retrieved from OpenFlights (https://openflights.org/data.html). The empirical data of epidemic spreading include the daily reported numbers of H1N1 and SARS cases available at Kaggle (https://www.kaggle.com/lnunes/a-brief-comparative-study-of-epidemics/data) and the daily reported numbers of COVID-19 cases61.
Code availability
All source code is publicly available in the Code Ocean capsule62.
Change history
29 April 2022
A Correction to this paper has been published: https://doi.org/10.1038/s43588-022-00255-8
References
Grewe, B. F., Langer, D., Kasper, H., Kampa, B. M. & Helmchen, F. High-speed in vivo calcium imaging reveals neuronal network activity with near-millisecond precision. Nat. Methods 7, 399–405 (2010).
Stetter, O., Battaglia, D., Soriano, J. & Geisel, T. Model-free reconstruction of excitatory neuronal connectivity from calcium imaging signals. PLoS Comput. Biol. 8, e1002653 (2012).
Reuter, J. A., Spacek, D. V. & Snyder, M. P. High-throughput sequencing technologies. Mol. Cell. 58, 586–597 (2015).
Levy, S. E. & Myers, R. M. Advancements in next-generation sequencing. Annu. Rev. Genom. Hum. Genet. 17, 95–115 (2016).
Colizza, V., Barrat, A., Barthélemy, M. & Vespignani, A. The role of the airline transportation network in the prediction and predictability of global epidemics. Proc. Natl Acad. Sci. USA 103, 2015–2020 (2006).
Brockmann, D. & Helbing, D. The hidden geometry of complex, network-driven contagion phenomena. Science 342, 1337–1342 (2013).
Chang, S. et al. Mobility network models of COVID-19 explain inequities and inform reopening. Nature 589, 82–87 (2021).
Newman, M., Barabási, A.-L. & Watts, D. J. The Structure and Dynamics of Networks (Princeton Univ. Press, 2011).
Barzel, B. & Barabási, A.-L. Universality in network dynamics. Nat. Phys. 9, 673–681 (2013).
Harush, U. & Barzel, B. Dynamic patterns of information flow in complex networks. Nat. Commun. 8, 2181 (2017).
Stankovski, T., Pereira, T., McClintock, P. V. & Stefanovska, A. Coupling functions: universal insights into dynamical interaction mechanisms. Rev. Mod. Phys. 89, 045001 (2017).
Breakspear, M. Dynamic models of large-scale brain activity. Nat. Neurosci. 20, 340–352 (2017).
Santolini, M. & Barabási, A.-L. Predicting perturbation patterns from the topology of biological networks. Proc. Natl Acad. Sci. USA 115, E6375–E6383 (2018).
Buldyrev, S. V., Parshani, R., Paul, G., Stanley, H. E. & Havlin, S. Catastrophic cascade of failures in interdependent networks. Nature 464, 1025–1028 (2010).
Yang, Y., Nishikawa, T. & Motter, A. E. Small vulnerable sets determine large network cascades in power grids. Science 358, eaan3184 (2017).
Pastor-Satorras, R., Castellano, C., Van Mieghem, P. & Vespignani, A. Epidemic processes in complex networks. Rev. Mod. Phys. 87, 925 (2015).
Castellano, C., Fortunato, S. & Loreto, V. Statistical physics of social dynamics. Rev. Mod. Phys. 81, 591–646 (2009).
Becker, J., Brackbill, D. & Centola, D. Network dynamics of social influence in the wisdom of crowds. Proc. Natl Acad. Sci. USA 114, E5070–E5076 (2017).
Arenas, A., Díaz-Guilera, A., Kurths, J., Moreno, Y. & Zhou, C. Synchronization in complex networks. Phys. Rep. 469, 93–153 (2008).
Barzel, B., Liu, Y.-Y. & Barabási, A.-L. Constructing minimal models for complex system dynamics. Nat. Commun. 6, 7186 (2015).
Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324, 81–85 (2009).
Wang, W.-X., Yang, R., Lai, Y.-C., Kovanis, V. & Grebogi, C. Predicting catastrophes in nonlinear dynamical systems by compressive sensing. Phys. Rev. Lett. 106, 154101 (2011).
Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl Acad. Sci. USA 113, 3932–3937 (2016).
Rudy, S. H., Brunton, S. L., Proctor, J. L. & Kutz, J. N. Data-driven discovery of partial differential equations. Sci. Adv. 3, e1602614 (2017).
Udrescu, S.-M. & Tegmark, M. AI Feynman: a physics-inspired method for symbolic regression. Sci. Adv. 6, eaay2631 (2020).
Raissi, M. & Karniadakis, G. E. Hidden physics models: machine learning of nonlinear partial differential equations. J. Comput. Phys. 357, 125–141 (2018).
Iten, R., Metger, T., Wilming, H., Del Rio, L. & Renner, R. Discovering physical concepts with neural networks. Phys. Rev. Lett. 124, 010508 (2020).
Frishman, A. & Ronceray, P. Learning force fields from stochastic trajectories. Phys. Rev. X 10, 021009 (2020).
Brückner, D. B., Ronceray, P. & Broedersz, C. P. Inferring the dynamics of underdamped stochastic systems. Phys. Rev. Lett. 125, 058103 (2020).
Shandilya, S. G. & Timme, M. Inferring network topology from complex dynamics. New J. Phys. 13, 013004 (2011).
Newman, M. E. J. Network structure from rich but noisy data. Nat. Phys. 14, 542–545 (2018).
Rabinovich, M. I., Varona, P., Selverston, A. I. & Abarbanel, H. D. Dynamical principles in neuroscience. Rev. Mod. Phys. 78, 1213 (2006).
Marvel, S. A., Kleinberg, J., Kleinberg, R. D. & Strogatz, S. H. Continuous-time model of structural balance. Proc. Natl Acad. Sci. USA 108, 1771–1776 (2011).
Strogatz, S. H. Exploring complex networks. Nature 410, 268–276 (2001).
Barahona, M. & Pecora, L. M. Synchronization in small-world systems. Phys. Rev. Lett. 89, 054101 (2002).
Mangan, N. M., Kutz, J. N., Brunton, S. L. & Proctor, J. L. Model selection for dynamical systems via sparse regression and information criteria. Proc. Math. Phys. Eng. Sci. 473, 20170009 (2017).
Casadiego, J., Nitzan, M., Hallerberg, S. & Timme, M. Model-free inference of direct network interactions from nonlinear collective dynamics. Nat. Commun. 8, 2192 (2017).
Runge, J., Nowack, P., Kretschmer, M., Flaxman, S. & Sejdinovic, D. Detecting and quantifying causal associations in large nonlinear time series datasets. Sci. Adv. 5, eaau4996 (2019).
Sugihara, G. et al. Detecting causality in complex ecosystems. Science 338, 496–500 (2012).
Sun, J., Taylor, D. & Bollt, E. M. Causal network inference by optimal causation entropy. SIAM J. Appl. Dyn. Syst. 14, 73–106 (2015).
Kralemann, B., Pikovsky, A. & Rosenblum, M. Reconstructing effective phase connectivity of oscillator networks from observations. New J. Phys. 16, 085013 (2014).
Frässle, S. et al. Regression DCM for fMRI. NeuroImage 155, 406–421 (2017).
Gilson, M., Moreno-Bote, R., Ponce-Alvarez, A., Ritter, P. & Deco, G. Estimation of directed effective connectivity from fMRI functional connectivity hints at asymmetries of cortical connectome. PLoS Comput. Biol. 12, e1004762 (2016).
Deco, G., Rolls, E. T. & Romo, R. Stochastic dynamics as a principle of brain function. Prog. Neurobiol. 88, 1–16 (2009).
Genkin, M., Hughes, O. & Engel, T. A. Learning non-stationary Langevin dynamics from stochastic observations of latent trajectories. Nat. Commun. 12, 5986 (2021).
Zhao, H. Inferring the dynamics of ‘black-box’ systems using a learning machine. Sci. China Phys. Mech. Astron. 64, 270511 (2021).
Jahnke, S., Memmesheimer, R.-M. & Timme, M. Stable irregular dynamics in complex neural networks. Phys. Rev. Lett. 100, 048102 (2008).
Champion, K. P., Brunton, S. L. & Kutz, J. N. Discovery of nonlinear multiscale systems: sampling strategies and embeddings. SIAM J. Appl. Dyn. Syst. 18, 312–333 (2019).
Battiston, F. et al. The physics of higher-order interactions in complex systems. Nat. Phys. 17, 1093–1098 (2021).
Lambiotte, R., Rosvall, M. & Scholtes, I. From networks to optimal higher-order models of complex systems. Nat. Phys. 15, 313–320 (2019).
Sauer, T. Numerical solution of stochastic differential equations in finance. in Handbook of Computational Finance 529–550 (Springer, 2012).
Akaike, H. A new look at the statistical model identification. IEEE Trans. Automat. Contr. 19, 716–723 (1974).
Flores, B. E. A pragmatic view of accuracy measurement in forecasting. Omega 14, 93–98 (1986).
White, J. G., Southgate, E., Thomson, J. N. & Brenner, S. The structure of the nervous system of the nematode Caenorhabditis elegans. Philos. Trans. R. Soc. Lond. B Biol. Sci. 314, 1–340 (1986).
Varshney, L. R., Chen, B. L., Paniagua, E., Hall, D. H. & Chklovskii, D. B. Structural properties of the Caenorhabditis elegans neuronal network. PLoS Comput. Biol. 7, e1001066 (2011).
Yan, G. et al. Network control principles predict neuron function in the Caenorhabditis elegans connectome. Nature 550, 519–523 (2017).
Scheffer, L. K. et al. A connectome and analysis of the adult Drosophila central brain. eLife 9, e57443 (2020).
Menck, P. J., Heitzig, J., Kurths, J. & Schellnhuber, H. J. How dead ends undermine power grid stability. Nat. Commun. 5, 3969 (2014).
Kunegis, J. KONECT: the Koblenz network collection. In Proc. 22nd International Conference on World Wide Web 1343–1350 (ACM, 2013).
Rossi, R. & Ahmed, N. The network data repository with interactive graph analytics and visualization. In Twenty-Ninth AAAI Conference on Artificial Intelligence (2015).
Dong, E., Du, H. & Gardner, L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 20, 533–534 (2020).
Gao, T.-T. & Yan, G. A two-phase approach for inferring complex network dynamics. Code Ocean https://doi.org/10.24433/CO.4774495.v1 (2022).
Acknowledgements
T.-T.G. and G.Y. are supported by the National Key Research and Development Program of China (grant no. 2021ZD0204500), National Natural Science Foundation of China (grant nos. 12161141016 and 11875043), Shanghai Municipal Science and Technology Major Project (grant no. 2021SHZDZX0100), Shanghai Municipal Commission of Science and Technology Project (grant nos. 18ZR1442000 and 19511132101) and Fundamental Research Funds for the Central Universities. We are also grateful for the helpful discussion with B. Barzel, J. Moore, X. Ru and T. Li.
Author information
Contributions
G.Y. conceived the research. G.Y. and T.-T.G. designed the research. T.-T.G. performed the research. T.-T.G. and G.Y. analysed the results. G.Y. and T.-T.G. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Computational Science thanks Matthieu Gilson and the other, anonymous reviewer(s) for their contribution to the peer review of this work. Handling editor: Jie Pan, in collaboration with the Nature Computational Science team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–20, Sections I–VI and Tables 1–5.
Source data
Source Data Fig. 1
True and inferred trajectories data.
Source Data Fig. 2
Unprocessed inferred results, time-series data and trajectories data.
Source Data Fig. 3
Unprocessed inferred results, time-series data and trajectories data.
Source Data Fig. 4
Statistical source data and trajectories data.
Source Data Fig. 5
Statistical source data.
Source Data Fig. 6
Raw empirical data and time-series data.
About this article
Cite this article
Gao, TT., Yan, G. Autonomous inference of complex network dynamics from incomplete and noisy data. Nat Comput Sci 2, 160–168 (2022). https://doi.org/10.1038/s43588-022-00217-0
DOI: https://doi.org/10.1038/s43588-022-00217-0
- Springer Nature America, Inc.