Main

From two-photon calcium imaging of neuronal activities1,2 and high-throughput genetic experiments3,4 to digital recordings of human mobility5,6,7, our ability to observe the dynamic behaviour of nodes in complex biological, social and technological systems has advanced spectacularly in recent years. The collected observations, often in the form of time-series data, allow us to extract the dynamic patterns of a system’s individual nodes. However, such a reductionist approach of tracking all the individual nodes is insufficient for gaining meaningful insights into the system. Indeed, complex system behaviour emerges not from the single nodes alone but rather from the dynamic interactions between the nodes6,8,9,10,11,12,13,14,15,16,17,18. This requires us to infer complex network dynamics, that is, to retrieve both self-nodal dynamics and interaction dynamics from the accumulating data of network topological structure and nodes’ activities.

The balance of self versus interaction dynamics is most naturally captured by a general equation that tracks the activities of all the nodes via9

$$\frac{\,{{\mbox{d}}}\,{{{{\bf{x}}}}}_{i}(t)}{\,{{\mbox{d}}}\,t}={{{\bf{F}}}}\left({{{{\bf{x}}}}}_{i}(t)\right)+\mathop{\sum }\limits_{j=1}^{n}{A}_{ij}{{{\bf{G}}}}({{{{\bf{x}}}}}_{i}(t),{{{{\bf{x}}}}}_{j}(t)),$$
(1)

where xi(t) ≡ (xi,1(t),…, xi,d(t))T is node i’s d-dimensional activity, representing, for example, the membrane potential of a neuron in a brain network9,12, the proportion of infected people in a country or region5,6,7, or the state of a component in an oscillator network19. These activities are driven by the self-regulation function F(xi) ≡ (F1(xi),…, Fd(xi))T (designed to describe the dynamics of all the nodes in isolation) and the pairwise function G(xi(t), xj(t)) ≡ (G1(xi, xj),…, Gd(xi, xj))T (which captures the dynamic mechanisms of interaction between the nodes). Finally, A is the n × n adjacency matrix of the network, whose entry Aij denotes the influence or flow from node j to node i; n is the number of nodes in the system. As shown in another study, with appropriate choices of nonlinear functions F and G, equation (1) is able to describe a broad range of complex systems9. However, for most real systems, the functions F and G are unknown. Hence, a pressing lacuna in the study of complex systems is a versatile computational toolbox for automatically inferring equation (1) from the observed data of network topology Aij and nodes’ activities xi(t).
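
For concreteness, the following minimal sketch integrates equation (1) numerically for an illustrative one-dimensional choice of F and G on a random directed network; the functions, network and parameters used here are placeholders for illustration, not the models studied in this work.

```python
# Minimal sketch of simulating equation (1); F, G and the network are illustrative placeholders.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
n = 20                                            # number of nodes
A = (rng.random((n, n)) < 0.2).astype(float)      # random directed adjacency matrix A_ij
np.fill_diagonal(A, 0.0)

def F(x):                                         # illustrative self-dynamics: linear decay
    return -x

def G(xi, xj):                                    # illustrative diffusive coupling
    return xj - xi

def rhs(t, x):
    # dx_i/dt = F(x_i) + sum_j A_ij G(x_i, x_j), here with d = 1
    interaction = np.array([np.sum(A[i] * G(x[i], x)) for i in range(n)])
    return F(x) + interaction

sol = solve_ivp(rhs, (0.0, 10.0), rng.random(n), t_eval=np.linspace(0.0, 10.0, 200))
x_t = sol.y.T                                     # observed activities x_i(t), shape (time, nodes)
```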

Complex biological, social or technological systems lack the fundamental physical rules that govern particle systems, so we do not have a priori knowledge of their internal microscopic mechanisms20. The goal is therefore not only to identify the model’s parameters but also to retrieve the forms of F and G and infer the explicit model itself. Despite the recent important progress in developing methods to infer the governing equations of single- or few-body dynamics21,22,23,24,25,26,27, the task of inferring network dynamics poses particular challenges. For example, F and G are usually of different types; hence, one cannot obtain their compact forms using only orthogonal basis functions22,23,28,29. Nodes’ activities data are noisy and the mappings of network topologies are usually incomplete30,31. Collective behaviour, such as synchronization and consensus19, can conceal the specific forms of microscopic mechanisms in interaction dynamics. To overcome these challenges, we propose here a two-phase inference approach. Our analysis indicates that the two-phase strategy allows us to achieve efficient and—most importantly—highly accurate inference, even in the face of unfavourable scenarios, such as noisy or low-resolution data or an only partially mapped topology (Fig. 1a).

Fig. 1: Overview of the two-phase inference approach.

a, Observation data of network topology Aij, including spurious and missing links, and low-resolution and noisy data of nodal activities xi(t). b, Mapping the normalized observation data into two matrices ΘF and ΘG that represent the time-varying patterns of elementary functions. c, Phase I that narrows down the model space by identifying several leading elementary functions through global regression for each dimension of \({\dot{{{{\bf{x}}}}}}_{i}(t)\). d, Comparison of trajectories generated by the true network dynamics and the dynamical equation inferred by phase I alone. e, Phase II that performs local fine-tuning, by using topological sampling and wAIC, to further determine the optimal number (indicated by purple stars) of elementary functions for \(\hat{{{{\bf{F}}}}}({{{{\bf{x}}}}}_{i}(t))\) and \(\hat{{{{\bf{G}}}}}({{{{\bf{x}}}}}_{i}(t),{{{{\bf{x}}}}}_{j}(t))\). f, Comparison of trajectories generated by the true and inferred dynamical equations. The example illustrated in c–f is HR neuronal dynamics on a directed Barabási–Albert (BA) network with size n = 100 and average degree 〈k〉 = 5.


Results

Overview of the two-phase inference approach

Lacking a priori knowledge of the structures of F and G, a natural approach is to pre-construct two extensive libraries LF and LG that contain a variety of elementary functions, combinations of which can potentially generate the true network dynamics. In this work, the libraries contain not only orthogonal basis functions but also polynomial, trigonometric, exponential, fractional, rescaling, sigmoid and other activation functions frequently used in various domains (Supplementary Tables 1 and 2). Large libraries are helpful for finding a compact and optimal model to capture network dynamics, but they also make the inference problem more difficult; owing to the lack of orthogonality, the elementary functions can be similar to each other and thus less discriminative.

By introducing the time-series data xi(t) (where i = 1, 2,…, n) into LF and LG, we obtain two time-varying matrices ΘF(t) ≡ LF(xi(t)) and ΘG(t) ≡ LG(xi(t), xj(t)) that encode the patterns of nodes’ activities imposed by the elementary functions in LF and LG (Fig. 1b). The inference problem can then be recast as selecting the patterns in ΘF(t) and ΘG(t) that best match the evolution of the observed system state \(\dot{{{{\bf{x}}}}}(t)\), that is, as inferring the sparse coefficients ξF and ξG that best solve

$$\dot{{{{\bf{x}}}}}(t)={\widetilde{{{\varTheta }}}}_{F}(t){{{{\bf{\xi }}}}}_{F}+\widetilde{A}{\widetilde{{{\varTheta }}}}_{G}(t){{{{\bf{\xi }}}}}_{G},$$
(2)

where \(\widetilde{A}\equiv A\otimes {I}_{d}\), \({\widetilde{{{\varTheta }}}}_{F}\equiv {{{\varTheta }}}_{F}\otimes {I}_{d}\) and \({\widetilde{{{\varTheta }}}}_{G}\equiv {{{\varTheta }}}_{G}\otimes {I}_{d}\); the symbol ⊗ denotes the Kronecker product and Id is the d-dimensional identity matrix. Here we consider the general setting where each node state is d dimensional and the network is directed and heterogeneous. Consequently, the problem of inferring complex network dynamics is high dimensional and irreducible. Indeed, the number of elementary functions in LF and LG is approximately 25, 80 or 140 when the node activity itself has one, two or three dimensions, respectively, in the simulation validations below (Supplementary Tables 1 and 2).
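
As a minimal sketch of how the linear system in equation (2) can be assembled for the d = 1 case (where the Kronecker products with Id become trivial), the code below builds ΘF and the network-aggregated interaction block AΘG from a handful of elementary functions; the tiny libraries shown are illustrative stand-ins for the full LF and LG.

```python
# Assembling the design matrices of equation (2) for d = 1; the libraries are illustrative stand-ins.
import numpy as np

L_F = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2, np.sin]
L_G = [lambda xi, xj: xj - xi,
       lambda xi, xj: np.sin(xj - xi),
       lambda xi, xj: 1.0 / (1.0 + np.exp(-(xj - xi)))]

def design_matrices(x_t, A):
    """x_t: activities of shape (T, n); A: n x n adjacency matrix.
    Returns Theta_F and the aggregated interaction block A.Theta_G,
    both stacked over all nodes and time points (rows = T*n observations)."""
    T, n = x_t.shape
    theta_f = np.column_stack([f(x_t).reshape(-1) for f in L_F])          # (T*n, |L_F|)
    cols_g = []
    for g in L_G:
        pair = g(x_t[:, :, None], x_t[:, None, :])                        # g(x_i, x_j), shape (T, n, n)
        cols_g.append(np.einsum('ij,tij->ti', A, pair).reshape(-1))       # sum_j A_ij g(x_i, x_j)
    theta_ag = np.column_stack(cols_g)                                    # (T*n, |L_G|)
    return theta_f, theta_ag
```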

Our approach is a two-phase procedure consisting of global regression and local fine-tuning. In phase I, we approximate the derivatives \(\dot{{{{\bf{x}}}}}(t)\) (Methods) and calculate the matrices \({\widetilde{{{\varTheta }}}}_{F}(t)\) and \({\widetilde{{{\varTheta }}}}_{G}(t)\) and then normalize each of their columns (Fig. 1b). These normalized data are used to identify, through regression, the leading elementary functions that are most probably constituents of the true F and G (Fig. 1c and Methods). Phase I is able to narrow down the model space, but the dynamical equation inferred by such regression alone lacks generative power (Fig. 1d). Next, in phase II, we perform fine-tuning with the original values of \(\dot{{{{\bf{x}}}}}(t)\), \({\widetilde{{{\varTheta }}}}_{F}(t)\) and \({\widetilde{{{\varTheta }}}}_{G}(t)\), that is, without normalization. We use topological sampling (Methods) and the weighted Akaike’s information criterion (wAIC; Methods) to sequentially remove the elementary functions with the smallest inferred coefficients (Fig. 1e). The final sets of elementary functions and their coefficients \({\hat{{{{\bf{\xi }}}}}}_{F}\) and \({\hat{{{{\bf{\xi }}}}}}_{G}\) compose \(\hat{{{{\bf{F}}}}}\) and \(\hat{{{{\bf{G}}}}}\), leading to the inferred dynamics of complex networks (Fig. 1f).

Inferring complex network dynamics

To validate the effectiveness of our approach, we apply it to infer five network dynamics, including the Hindmarsh–Rose32 (HR, d = 3) and FitzHugh–Nagumo32 (FHN, d = 2) neuronal systems, social balance dynamics33 (SB, d = 1), Kuramoto dynamics34 (d = 1) and coupled heterogeneous Rössler oscillators35 (d = 3); here d is the dimension of each node activity. To obtain the nodes’ activities data, we simulate these dynamics (Supplementary Table 4) on a variety of topologies, including Erdős–Rényi (ER) and scale-free (SF) synthetic networks and five empirical networks—cellular-level brain networks of Caenorhabditis elegans and Drosophila, the Advogato social network, and the power grids of Northern Europe and the United States. The time series of node activities and each network topology are the input data to our approach. The five specific equations governing these dynamics are the ground truths that we aim to infer. These dynamical models and networks are widely used in various domains and exhibit different properties (Supplementary Sections II and III), which accounts for the diversity of our tests.

Figure 2 illustrates the procedure of inferring FHN neuronal network dynamics. Through global regression, phase I identifies the ten most relevant elementary functions for each dimension of FHN (Fig. 2b); then, by local fine-tuning, phase II autonomously learns the compact and optimal form of the dynamical equation as well as the most appropriate coefficient for each of the necessary elementary functions (Fig. 2c). The form of the inferred equation in Fig. 2c perfectly matches the ground truth in Fig. 2a, and the learnt coefficients are also highly accurate. Indeed, the relative errors \({{\varDelta }}=(\xi -\hat{\xi })/\xi\), where ξ and \(\hat{\xi }\) are the true and learnt coefficients, respectively, are smaller than 3% (Fig. 2d). The dynamical equation inferred by our approach exhibits generative power, being able to generate nodes’ activities and trajectories that agree well with the observation data (Fig. 2e,f).

Fig. 2: Inferring FHN neuronal network dynamics on synthetic and real topologies.

a, True FHN dynamics used to simulate nodes’ activities data on various topologies. Fd and Gd are self- and interaction dynamics of the dth dimension, respectively; xi,d is the dth dimension’s state of node i, and \({x}_{i,d}^{p}\) is the polynomial with order p. b, Ten leading elementary functions identified by phase I for each dimension. c, Necessary elementary functions and their coefficients further inferred through phase II on two synthetic networks (directed ER and undirected SF) and one empirical network (Drosophila mushroom body), where gFHN denotes the term \(({x}_{j}-{x}_{i})/{k}_{i}^{\,{{\mathrm{in}}}\,}\). d, Relative errors Δ of the inferred elementary functions and their coefficients. Note that the elementary functions ruled out from ΘF and ΘG by our approach (whose coefficients are inferred as zero) are not shown. e,f, Nodes’ activities (e) and trajectories (f) generated by the true and inferred equations.


Our approach also successfully infers the equations governing the other four network dynamics. Regarding the accuracy of the learnt coefficients, the relative errors |Δ| are less than 3% for the HR (Fig. 3a) and edge (Fig. 3c) dynamics on both synthetic and empirical networks. In Kuramoto dynamics and coupled heterogeneous Rössler oscillators, the self-dynamics are non-identical, that is, each node’s dynamics has its own form (Supplementary Section III). Hence, we aim to infer an effective form of equation (1) that minimizes the inconsistency between the inferred and true nodes’ activities. Even for these more challenging cases, the two-phase approach still succeeds, with relative coefficient errors |Δ| < 5% or |Δ| < 20% (Fig. 3e,g). Both the activities and trajectories generated by the effective equations agree closely with the true averaging dynamics (Fig. 3f,h,i).

Fig. 3: Inference accuracy for the other four typical nonlinear network dynamics.

a,b, Similar to Fig. 2 but for inferring HR neuronal dynamics, where the interaction dynamics G(xi, xj) are composed of \({g}_{1}^{\,{{\mathrm{HR}}}\,}\equiv 1/(1+{\mathrm{e}}^{10({x}_{j}-1)})\) and \({g}_{2}^{\,{{\mathrm{HR}}}\,}\equiv {x}_{i}/(1+{\mathrm{e}}^{10({x}_{j}-1)})\). c,d, Relative errors (c) and six edges’ activities (d) of the inferred edge dynamics of social balance. e–i, Relative errors of the inferred effective equations for network dynamics of the Kuramoto model and coupled Rössler oscillators. In both cases, the self-dynamics are heterogeneous, that is, the intrinsic frequency of each node is not identical but follows a normal distribution \({{{\mathcal{N}}}}(1,\sigma )\) with σ = 0.1. The grey curves represent the activity of individual nodes and the black curves represent the averaging activity of systems. Symbols gkura and gross denote the terms sin(xj − xi) and (xj − xi), respectively. The details of these dynamics and empirical networks are shown in Supplementary Tables 3 and 4.


Inferrability of network dynamics

Whether a network dynamics is inferrable depends on several factors. Here we explore three key factors, namely, synchronized dynamics, dynamical heterogeneity and deficient libraries.

Synchronized dynamics: if a network is completely synchronized, that is, all its nodes behave in the same manner19,34,35, distinguishing the activities of a node and its neighbours becomes impossible, and the microscopic interaction mechanism G(xi, xj) between the nodes will be cloaked and undiscoverable. In other words, the more synchronized a network, the more difficult it is to infer its dynamics. Here we tune the coupling strength between the nodes to change the degree of network synchronization (that is, the order parameter 〈R〉; Supplementary Section IV), and test the capability of our two-phase approach in inferring partially synchronized network dynamics. As shown in Fig. 4a, although the inference inaccuracy increases when the system becomes more synchronized, our approach can still infer the true FHN equation even when the network is highly synchronized (〈R〉 ≈ 0.7). The inference inaccuracy is quantified by a symmetric mean absolute percentage error (sMAPE; Methods). The more accurate the inference result, the closer the sMAPE value is to zero.
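
The order parameter 〈R〉 is a standard synchronization measure; for phase variables it can be computed as sketched below (the exact definition used in the paper is given in its Supplementary Section IV).

```python
# Time-averaged Kuramoto-type order parameter <R> for phase time series.
import numpy as np

def order_parameter(theta_t):
    """theta_t: phases of shape (T, n). Returns <R> in [0, 1];
    values near 1 indicate strong synchronization."""
    R_t = np.abs(np.mean(np.exp(1j * theta_t), axis=1))   # R(t) for every time point
    return R_t.mean()
```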

Fig. 4: Inferrability of network dynamics.

a, Inference inaccuracy represented by sMAPE and synchronization represented by order parameter 〈R〉 versus coupling strength between the nodes. b, Inaccuracy of inferred effective equation for Kuramoto network dynamics where the natural frequency ω of each node follows a normal distribution \({{{\mathcal{N}}}}(1,\sigma )\). Larger σ indicates higher dynamical heterogeneity. c, NED (Methods) when some true elementary functions were deliberately removed from libraries LF and LG. The box–whisker plots are visualized with the Tukey method (the box represents the interquartile range (IQR) and the line in the box indicates the median, with whiskers that extend 1.5 times the IQR from the box edges; the outliers are also shown) and the sample size is 100. The networks are SF with size n = 100 and average degree 〈k〉 = 5.0. The simulation details are shown in Supplementary Table 4.


Dynamical heterogeneity: equation (1) assumes that nodes have the same form F of self-dynamics; yet this is not always true. For instance, although the self-dynamics of the Kuramoto model is simply one elementary function ω representing the natural frequency of a node, different nodes can have different values of ω. For such non-identical self-dynamics, it is difficult—if not impossible—to infer a specific form Fi(xi) for each node i due to an n-fold increase in the dimensionality of potential model space (n is the network size). Therefore, we aim to infer an effective equation that best captures the averaging dynamics (Fig. 3e,g). Here we further explore the extent of dynamical heterogeneity that our approach can tolerate. To do so, we assign each node a value of ω randomly drawn from a normal distribution \({{{\mathcal{N}}}}(0,\sigma )\) and increase the standard deviation σ. The inference inaccuracy indeed increases when σ becomes larger, and the two-phase approach can tolerate dynamical heterogeneity σ ≤ 0.5 (Fig. 4b).

Deficient libraries: although two rather comprehensive libraries of elementary functions are built, it is still possible that some elementary functions of the true unknown dynamics are missing. Another possibility is that the compact form of the true dynamics cannot be composed from these elementary functions. In these cases, our two-phase approach infers an alternative equation to capture the system behaviours. We test this capability on gene regulation and HR neuronal dynamics, whose true coupling functions are intentionally removed from LG. As shown in Fig. 4c, the trajectories generated by the inferred and true equations are close to each other, and the discrepancy is small for all the nodes (Methods and Supplementary Section IVB).

Inferring from incomplete and noisy data

Incompletely mapped network topologies and noise in the observed nodes’ activities are inevitable in real data30,31. Hence, here we validate the robustness of our two-phase approach against low resolution, dynamical and observational noise, and spurious and missing links, as well as through comparisons with previous methods23,36,37.

Low resolution: experimental and digital recording technologies often have limited measurement frequencies, inducing low resolution of the observed time series. To validate our approach’s robustness against low resolution, we numerically simulate the five nonlinear network dynamics in Figs. 2 and 3 with a step size of 0.01, and then regularly downsample the activity data. We calculate the failure ratios in inferring the form of the true equations (Supplementary Fig. 14a) as well as the inference inaccuracies (Fig. 5a). The results show that the two-phase strategy requires only 5% to 50% of the data for the inference.

Fig. 5: Inference robustness against incompleteness and noises.

a–e, Inference inaccuracies (sMAPE) when the nodes’ activities data are low resolution (a), have dynamical noises (Gaussian white noise with intensity η (b)) or observational noises (intensity quantified by signal-to-noise ratio (c)), or when the topology data have spurious (d) and missing (e) links. f–j, Comparisons of inference inaccuracies between SINDy, ARNI and our approach for inferring HR neuronal network dynamics, with varying numbers of time points (f), observational noise (g), correlated dynamical noise (h), missing links (i) and different network sizes (j). Simulation details are shown in Supplementary Table 4. k,l, Comparison results of ablation studies: HR (k) and Rössler (l). The box–whisker plots are visualized with the Tukey method (the box represents the IQR and the line in the box indicates the median, with whiskers that extend 1.5 times the IQR from the box edges; the outliers are also shown) and the sample size is 20. Five ablation studies were performed: removing topological sampling (①), using the original AIC instead of wAIC (②), removing phase II (③), removing phase I (④) or removing normalization of ΘF and ΘG (⑤). Statistical significance is obtained through multiple Mann–Whitney tests. Three or four asterisks indicate a p value of <10−3 or <10−4, respectively, and n.s. means not significant.


Observational and dynamical noises: observational noises are induced by the measuring process and dynamical noises represent the intrinsic stochasticity in dynamics. To produce the former, we add Gaussian noises to the nodes’ activity data and quantify the intensity of observational noise with the signal-to-noise ratio (Supplementary Section VA). To imitate the latter, we add a stochastic term of Gaussian white noise with intensity η into the true dynamical equations and generate the nodes’ activities data by the numerical simulations of these stochastic differential equations (Supplementary Section VA). We test the impact of these two types of noise on the performance of the two-phase inference approach, without any denoising pre-process. As shown in Fig. 5b and Supplementary Fig. 14b, the approach can tolerate dynamical noise with η ≤ 0.15, meaning that it successfully reconstructs the hidden equations when the stochastic intensity is not higher than 15% of the average amplitude of true deterministic dynamics. Moreover, the approach can tolerate 30 dB observational noise (Fig. 5c and Supplementary Fig. 14c).
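
One common way to impose a prescribed signal-to-noise ratio on the activity data is sketched below; the paper’s exact noise protocols are described in Supplementary Section VA, so this is an illustrative convention rather than the authors’ code.

```python
# Adding zero-mean Gaussian observational noise at a target SNR (in decibels).
import numpy as np

def add_observational_noise(x_t, snr_db, rng=np.random.default_rng(0)):
    """x_t: clean activities; snr_db: desired signal-to-noise ratio in dB."""
    signal_power = np.mean(x_t ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return x_t + rng.normal(0.0, np.sqrt(noise_power), size=x_t.shape)
```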

Spurious and missing links: spurious and missing links in real data induce an incomplete network topology Aij, which further leads to an inaccurate interaction matrix ΘG. To test the impact of these erroneous links, we randomly add or remove a fraction of links from the true network topology that was used to simulate the nodes’ activities. Owing to the topological sampling in phase II, our approach is able to tolerate 25% spurious and 30% missing links (Fig. 5d,e and Supplementary Fig. 14d,e).
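
A simple way to imitate such topological errors, shown here purely as an illustrative sketch rather than the exact procedure used in the paper, is to randomly add and delete a given fraction of links:

```python
# Randomly adding spurious links and removing true links from an adjacency matrix.
import numpy as np

def perturb_topology(A, frac_spurious=0.0, frac_missing=0.0,
                     rng=np.random.default_rng(0)):
    """Return a copy of A with the given fractions of spurious and missing links."""
    A = A.copy()
    n = A.shape[0]
    existing = np.argwhere(A > 0)
    absent = np.argwhere((A == 0) & ~np.eye(n, dtype=bool))
    n_add = int(frac_spurious * len(existing))
    n_del = int(frac_missing * len(existing))
    for i, j in absent[rng.choice(len(absent), size=n_add, replace=False)]:
        A[i, j] = 1.0                                   # spurious link
    for i, j in existing[rng.choice(len(existing), size=n_del, replace=False)]:
        A[i, j] = 0.0                                   # missing link
    return A
```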

Comparison with previous methods: the two most illuminating and effective methods for dynamics inference are Sparse Identification of Nonlinear Dynamics (SINDy)23 and the Algorithm for Revealing Network Interactions (ARNI)37. Note that ARNI was originally aimed at inferring network topology but can be adapted to infer network dynamics with minor modifications (Supplementary Section VC). Here we compare our approach with SINDy and ARNI from different aspects, including the amount of required data (Fig. 5f), robustness against observational noise (Fig. 5g), correlated dynamical noise (Fig. 5h and Supplementary Section VA), missing links (Fig. 5i) and different network sizes (Fig. 5j). Although ARNI needs fewer data points if network topologies are complete and nodal activities do not have any noise (Fig. 5f), the two-phase approach outperforms both SINDy and ARNI in inferring complex network dynamics from incomplete and noisy data (Fig. 5g–j). We also perform comparisons with SINDy’s variant36 regarding partially synchronized or heterogeneous dynamics (Supplementary Figs. 13 and 17). These results indicate that our approach can better handle high-dimensional networked systems and better cope with incompleteness and noises in data.

Ablation studies: besides the two-phase strategy itself, our approach involves three important components: normalization in the first phase but not in the second, which addresses the highly skewed observations across nodes; topological sampling, which imitates the typically incomplete nature of observed topologies; and model selection by wAIC, which determines the most appropriate complexity of the inferred dynamics. The essentiality of the two-phase strategy and the three abovementioned components is demonstrated by ablation studies. Specifically, we ablate each phase or component and then assess the performance of the degenerated approaches. As shown in Fig. 5k,l and Supplementary Section VB, the inference inaccuracy (sMAPE) indeed increases if the phases or components are individually ablated.

Inference of empirical systems

To demonstrate the approach’s ability to handle empirical systems, we apply it to infer the spreading dynamics of the infectious disease influenza A (H1N1). The network underlying this diffusion system is the worldwide airline network, which captures human mobility between different countries or regions and plays a dominant role in global disease spreading5,6. Each entry Aij of the weighted network’s adjacency matrix A represents the traffic volume from node j to i, where each node denotes a country or region. The total number of daily passengers is approximately Φ = 8.9 × 106; taking into account the population Pi of each node i, the adjacency matrix is modified to

$${\hat{A}}_{ij}=\frac{{{\varPhi }}}{\mathop{\sum }\nolimits_{i = 1}^{n}{P}_{i}}{A}_{ij}.$$
(3)

The order of magnitude of the entries in matrix \(\hat{A}\) is around 10−2 to 10−3. The nodal activities xi(t) are extracted from the daily reports of infected cases in each country or region. Here we consider the nodes whose accumulated H1N1 cases are more than 100 and focus on the early spreading dynamics, that is, within the 45 days after the first case was reported in each node; this captures the system behaviour before government control.

Based on these empirical data, our approach successfully infers a concise effective dynamical equation

$$\frac{\,{{\mathrm{d}}}{x}_{i}}{{{\mathrm{d}}}\,t}=a{x}_{i}+b\mathop{\sum }\limits_{j=1}^{N}{\hat{A}}_{ij}\frac{1}{1+{\mathrm{e}}^{-({x}_{j}-{x}_{i})}},$$
(4)

where a = 0.074 and b = 7.130 (Supplementary Section VI and Supplementary Fig. 18). It is interesting that our approach infers a sigmoid (nonlinear) form, rather than the linear form of epidemic models, to better capture the interaction dynamics. This might reflect the fact that people usually consciously travel less if their countries/regions or the destinations have a higher infection risk. Although equation (4) describes the dynamics of all the nodes with the same parameters a and b, we also extend it by taking into account dynamical heterogeneity in the nodes, that is, by obtaining ai and bi from each node i’s activity data (Fig. 6b–e and Supplementary Fig. 18).
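
A minimal sketch of integrating the inferred equation (4) with the rescaled adjacency matrix of equation (3) is given below. The coefficients a, b and Φ are those reported above, whereas the airline matrix and the populations are random placeholders, because the empirical data are not reproduced here.

```python
# Integrating the inferred spreading equation (4) on a placeholder airline network.
import numpy as np
from scipy.integrate import solve_ivp

a, b, Phi = 0.074, 7.130, 8.9e6                 # values reported in the text
rng = np.random.default_rng(0)
n = 50
A_raw = rng.random((n, n)) * 1e3                # placeholder passenger volumes A_ij
np.fill_diagonal(A_raw, 0.0)
P = rng.uniform(1e6, 1e8, size=n)               # placeholder populations P_i

A_hat = Phi / P.sum() * A_raw                   # rescaled adjacency matrix, equation (3)

def rhs(t, x):
    S = 1.0 / (1.0 + np.exp(-(x[None, :] - x[:, None])))   # S[i, j] = sigmoid(x_j - x_i)
    return a * x + b * np.sum(A_hat * S, axis=1)            # equation (4)

sol = solve_ivp(rhs, (1.0, 45.0), np.ones(n), t_eval=np.arange(1, 46))
cases_t = sol.y.T                               # modelled cumulative cases over the first 45 days
```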

Fig. 6: Inference of early spreading dynamics from empirical data.

a, Worldwide airline network (partial) used for the inference. Each node represents a country or region, and the line thickness represents the amount of passenger flow. The formula shown is the dynamical equation inferred by the two-phase approach. b–e, Comparisons between the empirical cumulative number of H1N1 cases for different nodes (dashed lines) and the cumulative number generated by the inferred equation (solid lines). For better visualization, the comparisons are displayed in four plots (from b to e) and all the dates (when the first case is reported for each node) are shifted to the first day (t = 1). f–i, Comparisons between the empirical and inferred cumulative numbers in different nodes for SARS (f and g) and COVID-19 (h and i).


Because empirical systems lack ground truths, we verify the inferred equation (4) by testing its generalizability to the spread of severe acute respiratory syndrome (SARS) and coronavirus disease 2019 (COVID-19). Based on the daily reported numbers within the first 45 days in each node, we find that equation (4) is also able to capture the early spread of SARS and COVID-19 on the worldwide airline network. Indeed, as shown in Fig. 6f–i and Supplementary Figs. 19 and 20, the evolution of the cumulative numbers of SARS cases (for nodes whose eventual infected cases are more than 100) and COVID-19 cases (for nodes whose eventual infected cases are more than 2,000) agrees well with the activities generated by equation (4) with heterogeneous parameters ai and bi.

Discussion

Many real networks have been mapped so far, but there are still complex systems whose network structure is entirely unknown. For the latter, a possible scheme is to first infer their topological structure, especially directed or causal networks30,37,38,39,40,41, from nodes’ activities data and then apply our approach to infer the system dynamics. It is worth noting that inferring the network structure from nodes’ activities data is also challenging, especially when the number of nodes is large42,43, because the number of parameters to be estimated is about n2 (where n is the network size). Therefore, how to simultaneously infer both the structure and dynamics of large, complex systems is still an outstanding problem.

Our work also raises several questions worthy of future pursuit. First, stochasticity in the dynamics of some real complex systems might be stronger than that considered in this work. Such highly stochastic systems are better described by stochastic differential equations29,44,45,46. Second, our approach does not account for discrete or Boolean dynamics, or systems that contain thresholding terms or exhibit irregular dynamics with instability properties47. Third, when the nodal activity is multidimensional, experimental access might be limited to a subset of the dimensions of the activity vector. The Koopman operator and time-delay embedding techniques are helpful for capturing the dynamical properties of such partially observed systems48. Yet, the problem remains unsolved for complex networked systems. Finally, the nodes in a complex system can have higher-order—beyond pairwise—couplings, and such higher-order interactions may impact the dynamics of networked systems49,50. Hence, it is an interesting direction to extend the approach to inferring higher-order network dynamics.

Methods

Two-phase inference approach

The left-hand side of equation (1) represents the time-varying derivative of each node’s activity, which can be numerically obtained from xi(t) through the five-point approximation51

$$\dot{x}_t\approx \frac{{x}_{t-2\delta t}-8{x}_{t-\delta t}+8{x}_{t+\delta t}-{x}_{t+2\delta t}}{12\delta t},$$
(5)

where δt is the time step. Hence, the specific goal is to infer both the exact structure and the corresponding coefficients of the self-dynamics function F(xi(t)) and the interaction dynamics function G(xi(t), xj(t)).
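
Applied along the time axis of the sampled activities, the five-point approximation in equation (5) can be implemented directly as follows.

```python
# Five-point central-difference estimate of the time derivative, equation (5).
import numpy as np

def five_point_derivative(x_t, dt):
    """x_t: array with time along the first axis; dt: sampling step.
    Returns dx/dt at the interior time points (the first and last two are dropped)."""
    return (x_t[:-4] - 8 * x_t[1:-3] + 8 * x_t[3:-1] - x_t[4:]) / (12 * dt)
```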

Because we lack a priori knowledge of the forms of F and G, we construct two comprehensive libraries, namely, LF and LG, for self- and interaction dynamics, respectively, including polynomial, trigonometric, exponential, fractional, rescaling and various activation functions (Supplementary Tables 1 and 2). By introducing the observed time series of nodes’ activities to the elementary functions in LF and LG, we obtain two matrices ΘF(t) = LF(xi(t)) and ΘG(t) = LG(xi(t), xj(t)) that describe the corresponding behaviours of these elementary functions (Supplementary Fig. 1). To infer the compact forms that best match equation (2), we propose a two-phase approach.

Phase I, global regression: the purpose of this phase is to assess the relevance of each elementary function in LF and LG to the true, yet unknown, network dynamics. Given the observations of xi(t) for all i at time t, we approximate the derivatives \(\dot{{{{\bf{x}}}}}(t)\) and calculate the matrices \({\widetilde{{{\varTheta }}}}_{F}(t)\) and \({\widetilde{{{\varTheta }}}}_{G}(t)\). These values are highly skewed and can span several orders of magnitude (Supplementary Fig. 3), owing to the skewness of node degrees and the nonlinearity of the system dynamics, which could induce an overestimation of the importance of inherently low-value constituents. To eliminate this severe effect, it is crucial to normalize each column in \(\dot{{{{\bf{x}}}}}(t)\), \({\widetilde{{{\varTheta }}}}_{F}(t)\) and \({\widetilde{{{\varTheta }}}}_{G}(t)\). The inference problem described by equation (2) is then recast as the optimization problem:

$$\mathop{{\mathrm{arg}}\,\, {\mathrm{min}}}\limits_{{{{{\bf{\xi }}}}}_{F},{{{{\bf{\xi }}}}}_{G}}\int\nolimits_{0}^{T}\left(\parallel {\widetilde{{{\varTheta }}}}_{F}(t){{{{\bf{\xi}}}}}_{F}+\widetilde{A}{\widetilde{{{\varTheta }}}}_{G}(t){{{{\bf{\xi }}}}}_{G}-\dot{{{{\bf{x}}}}}(t){\parallel }^{2}\right)\,{{\mathrm{d}}}t+\lambda (\left\Vert {{{{\bf{\xi }}}}}_{F}\right\Vert +\left\Vert {{{{\bf{\xi }}}}}_{G}\right\Vert ),$$
(6)

where λ > 0 is a hyper-parameter that regulates the sparsity of the coefficient vectors ξF and ξG. We employ least absolute shrinkage and selection operator (LASSO) regression to solve equation (6) and perform fivefold cross-validation to obtain the most appropriate value of λ (Supplementary Section IB). The resultant ξF and ξG capture the relevance of each elementary function in LF and LG, enabling the identification of the leading elementary functions that are most probably constituents of the true F and G (Fig. 1c). Consequently, phase I is able to narrow down the model space. However, the dynamical equation inferred by such regression alone lacks generative power. For instance, as shown in Fig. 1d, the trajectory generated by a dynamical equation inferred by phase I alone deviates from that of the true network dynamics.
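
As one concrete realization of this phase, the sketch below column-normalizes the stacked design matrix and uses scikit-learn’s LassoCV with fivefold cross-validation; this is an implementation choice for illustration, not necessarily the authors’ own code, and the number of retained leading terms is an assumed parameter.

```python
# Phase I sketch: normalized LASSO regression with fivefold cross-validation.
import numpy as np
from sklearn.linear_model import LassoCV

def phase_one(theta_f, theta_ag, dxdt, n_keep=10):
    """theta_f, theta_ag: design blocks of equation (2); dxdt: flattened derivatives.
    Returns the indices of the leading candidate elementary functions."""
    Theta = np.hstack([theta_f, theta_ag])
    Theta_n = Theta / np.linalg.norm(Theta, axis=0, keepdims=True)   # per-column normalization
    y_n = dxdt / np.linalg.norm(dxdt)
    model = LassoCV(cv=5, fit_intercept=False).fit(Theta_n, y_n)
    return np.argsort(-np.abs(model.coef_))[:n_keep]                 # leading terms by |coefficient|
```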

Phase II, local fine-tuning: to reconstruct generative and concise expressions for F and G, we next perform fine-tuning in the reduced model space (Supplementary Section IB). In contrast to phase I, we now use the original values of \(\dot{{{{\bf{x}}}}}(t)\), \({\widetilde{{{\varTheta }}}}_{F}(t)\) and \({\widetilde{{{\varTheta }}}}_{G}(t)\), that is, without normalization, to further identify the necessary elementary functions and learn their precise coefficients. Since spurious or missing links in the observed network topology have an adverse effect on learning, we perform topological sampling (discussed later), which imitates the typically incomplete nature of observed topologies. Another issue is to determine the minimal number of elementary functions required for reconstructing F and G. To do so, we sequentially remove the elementary functions with the smallest inferred coefficients and calculate, using a weighted version of Akaike’s information criterion (wAIC; discussed later), the information inconsistency between the observed nodes’ activities and the remaining set of elementary functions. This process stops when removing a certain elementary function consistently increases the value of wAIC. As shown in Fig. 1e, each curve in the left-column plots represents the information inconsistency versus model complexity for one topological sample. We find that, indeed, the joint operation of wAIC and topological sampling is helpful for inference from noisy and incomplete data (Fig. 5k,l).

The final sets of elementary functions and their coefficients \({\hat{{{{\bf{\xi }}}}}}_{F}\) and \({\hat{{{{\bf{\xi }}}}}}_{G}\) compose the forms \(\hat{{{{\bf{F}}}}}\) and \(\hat{{{{\bf{G}}}}}\), leading to the successful inference of network dynamics described by equation (1). Indeed, as demonstrated in Fig. 1f, the trajectory generated by the inferred dynamical equation agrees well with the numerical simulations of the true network dynamics. It is worth noting that the ground truth, that is, the form of the true equation, remains unknown during the whole procedure and is only used to assess the accuracy of the final inferred results; hence, our approach works in an autonomous, unsupervised way.

wAIC

The original Akaike’s information criterion (AIC)52 is a frequently used method to balance the fit and complexity of a model with respect to the observed data, defined as AIC = n log MSE + 2p, where n is the number of observations, MSE is the mean squared error of the model’s regression result and p is the number of variables. By using AIC, one aims to select, from the candidate models, an optimal model that best fits the observations with the fewest variables. However, we find that the original AIC does not work well in the inference problem we aim to solve in the present work. Hence, we introduce a weighted version of AIC (namely, wAIC) as

$${{\mathrm{wAIC}}}=\left\{\begin{array}{ll}w\,(n\log {{\mathrm{MSE}}}+2p),&(n\log {{\mathrm{MSE}}}+2p)\ge 0,\\ (n\log {{\mathrm{MSE}}}+2p)/w,&(n\log {{\mathrm{MSE}}}+2p) < 0,\end{array}\right.$$
(7)

which balances fitting accuracy and model complexity. Here w is the coefficient of a term as inferred in phase I. A term with a larger w inferred by phase I is more likely to capture the properties of the underlying unknown dynamics; thus, multiplying the AIC by w or 1/w amplifies the impact of removing this term from the equation. The smaller the wAIC, the more consistent the remaining composition of elementary functions is with the observed data, and the less important the removed term.

To be specific, to evaluate the relevance of term i, we remove this term from the equation inferred by phase I and calculate the value wAICi of the new, shorter equation (Supplementary Fig. 2). We repeat this process to obtain the wAIC for each term. We then sort these terms by their wAIC values and remove them one by one, from the smallest wAIC to the largest. This operation gives a shorter equation at each step, and we calculate the AIC values of these shortened equations. The optimal equation is determined at the turning point where the curve starts to increase consistently (Fig. 1e, purple stars).
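
A compact sketch of this wAIC-guided elimination is given below; ordinary least squares stands in for the fitting step, and the global minimum of the AIC curve along the removal sequence is used as a simple proxy for the turning point.

```python
# Phase II sketch: rank terms by wAIC, then remove them one by one and keep the best model.
import numpy as np

def waic(n_obs, mse, p, w):
    """Weighted AIC of equation (7); w is the phase-I coefficient magnitude of the removed term."""
    base = n_obs * np.log(mse) + 2 * p
    return w * base if base >= 0 else base / w

def phase_two_select(Theta, y, w_phase1):
    """Theta: design matrix restricted to the phase-I candidates; y: derivatives (not normalized);
    w_phase1: phase-I coefficient magnitudes. Returns the indices of the retained terms."""
    n_obs, m = Theta.shape

    def mse_of(cols):
        coef, *_ = np.linalg.lstsq(Theta[:, cols], y, rcond=None)
        return np.mean((Theta[:, cols] @ coef - y) ** 2)

    def aic_of(cols):
        return n_obs * np.log(mse_of(cols)) + 2 * len(cols)

    # wAIC of the shorter equation obtained by removing each term in turn
    scores = [waic(n_obs, mse_of([j for j in range(m) if j != i]), m - 1, abs(w_phase1[i]))
              for i in range(m)]
    order = np.argsort(scores)                        # remove small-wAIC terms first
    remaining = list(range(m))
    best_cols, best_aic = list(remaining), aic_of(remaining)
    for i in order[:-1]:                              # always keep at least one term
        remaining.remove(i)
        current = aic_of(remaining)
        if current < best_aic:
            best_cols, best_aic = list(remaining), current
    return best_cols
```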

Topological sampling

We perform topological sampling in phase II as follows. We randomly choose S nodes from all n nodes and obtain the activities of part of these S nodes’ neighbours. Introducing the sampled ego structures and nodes’ activities into the libraries LF and LG allows us to construct the self- and interaction matrices \({\widetilde{{{\varTheta }}}}_{F}\) and \({\widetilde{{{\varTheta }}}}_{G}\), respectively, as well as to further distil the elementary functions and their coefficients. We repeat the process to obtain K sets of samples and average the coefficients of the elementary functions inferred from the K sample sets. In the present work, we set S = 10 and K = 20.
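
A minimal sketch of this sampling scheme is shown below; the fraction of neighbours kept for each ego node is an assumption made for illustration, since the text specifies only S = 10 and K = 20.

```python
# Topological sampling sketch: K samples, each keeping S ego nodes and part of their neighbourhoods.
import numpy as np

def topological_samples(A, S=10, K=20, keep_frac=0.8, rng=np.random.default_rng(0)):
    """A: n x n adjacency matrix. Returns a list of (ego_nodes, sampled_adjacency) pairs."""
    n = A.shape[0]
    samples = []
    for _ in range(K):
        egos = rng.choice(n, size=S, replace=False)
        sub = np.zeros_like(A)
        for i in egos:
            nbrs = np.flatnonzero(A[i])
            if len(nbrs):
                keep = rng.choice(nbrs, size=max(1, int(keep_frac * len(nbrs))), replace=False)
                sub[i, keep] = A[i, keep]             # keep only part of node i's neighbours
        samples.append((egos, sub))
    return samples
```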

sMAPE

The inference inaccuracy is quantified by sMAPE53:

$$\,{{\mathrm{sMAPE}}}\,=\frac{1}{m}\mathop{\sum }\limits_{i=1}^{m}\frac{| {I}_{i}-{R}_{i}| }{(| {I}_{i}| +| {R}_{i}| )},$$
(8)

where m is the cardinality of the set that contains both the inferred and true elementary functions, and Ii and Ri are the inferred and true coefficients, respectively. The range of sMAPE is [0, 1]. The more accurate the inferred equation, the lower the value of sMAPE. Note that if an inferred elementary function does not exist in the true equation, or a true elementary function is not successfully inferred, the value of sMAPE increases. Therefore, sMAPE captures not only the errors of the inferred coefficients but also the incorrectness of the inferred equation form.
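
In code, equation (8) amounts to the following; the inferred and true coefficient vectors are assumed to be aligned over the union set of elementary functions, with zeros for terms absent on one side.

```python
# Symmetric mean absolute percentage error of equation (8).
import numpy as np

def smape(inferred, true):
    """inferred, true: coefficient vectors over the union of inferred and true terms."""
    inferred, true = np.asarray(inferred, float), np.asarray(true, float)
    return np.mean(np.abs(inferred - true) / (np.abs(inferred) + np.abs(true)))
```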

NED

To evaluate the discrepancy between the inferred and true dynamics, we use the normalized Euclidean distance (NED), which measures the distance between the two trajectories generated by the inferred and true dynamical equations. That is,

$$\,{{\mathrm{NED}}}\,({x}_{i},{\hat{x}}_{i})=\frac{1}{{D}_{\max }(T-{t}_{0})}\mathop{\sum }\limits_{t={t}_{0}}^{T}\sqrt{{({x}_{i}(t)-{\hat{x}}_{i}(t))}^{2}+{({\dot{x}}_{i}(t)-{\dot{\hat{x}}}_{i}(t))}^{2}}.$$
(9)

Here xi is the true trajectory and \({\hat{x}}_{i}\) is the trajectory generated by the inferred equation; t0 and T are the beginning and ending times, respectively; and Dmax is the longest Euclidean distance between a pair of points of the true trajectory.
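
A direct implementation of equation (9) could look as follows; here the sum is normalized by the number of sampled time points, taken as a stand-in for T − t0.

```python
# Normalized Euclidean distance between true and inferred trajectories, equation (9).
import numpy as np

def ned(x_true, dx_true, x_hat, dx_hat):
    """All inputs are 1-D arrays over the same time points t0..T for one node."""
    dist = np.sqrt((x_true - x_hat) ** 2 + (dx_true - dx_hat) ** 2)
    pts = np.column_stack([x_true, dx_true])                                    # true-trajectory points
    d_max = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)).max()   # D_max
    return dist.sum() / (d_max * len(x_true))
```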