1 Introduction

Over the years, numerous prediction approaches, methods for feature extraction [10], and large-scale computations have been developed in the effort to understand and predict the evolution of dynamical systems. Chaotic systems, which are deterministic dynamical systems [3], display seemingly stochastic behavior [9] over time and often defy prediction. Physical principles are employed as models to characterize the temporal and spatial relationships among their components. Several complex-system models have been developed to comprehend naturally occurring phenomena, their erratic behavior, and the way in which small modifications in initial conditions can lead to seemingly random changes in the solutions. In recent decades, driven by advances in processing power, computational methods, and the abundance of data, these techniques have converged toward contemporary data-driven approaches. Physics-based machine learning, the study of dynamics using deep neural networks [32], and representation learning for continual system dynamics and biochemical factors have all benefited from this integration.

The butterfly effect [8], observed in deterministic non-linear systems [1], refers to the delicate dependence on initial conditions whereby a slight change at one stage can have a significant impact on later stages. Henri Poincaré [28], the French mathematician and theoretical physicist, was the first to recognize that meteorology is influenced by small factors that can have large effects. Norbert Wiener [38], the renowned American mathematician and philosopher, also contributed to this idea. Lorenz [35] gave the unpredictability of the Earth's atmosphere and oceans a quantifiable foundation and linked it to the behavior of large classes of chaotic systems exhibiting dynamic behavior and deterministic chaos. Figure 1 depicts a graphical plot of Lorenz's strange attractor [19], which is associated with the butterfly effect. Equations 1, 2, and 3 represent the dynamics of the Lorenz attractor.

$$\begin{aligned}{} & {} \frac{dx}{dt}=\sigma \left( y-x\right) \end{aligned}$$
(1)
$$\begin{aligned}{} & {} \quad \frac{dy}{dt}=x\left( \rho -z\right) -y \end{aligned}$$
(2)
$$\begin{aligned}{} & {} \quad \frac{dz}{dt}=xy-\beta z \end{aligned}$$
(3)
Fig. 1

Lorenz's strange attractor plotted for constants of \(\rho =28\), \(\sigma =10\), and \(\beta =\frac{8}{3}\). The butterfly effect, also known as hypersensitive dependence on initial values, is a feature of nonlinear dynamics that causes trajectories started from relatively similar initial states on the attractor to drift indefinitely far apart from one another over time
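This sensitivity can be reproduced numerically. The following minimal sketch, assuming Python with NumPy and SciPy, integrates Eqs. 1, 2, and 3 with the constants of Fig. 1 and compares two trajectories whose initial states differ by \(10^{-8}\); the initial state and the offset are illustrative choices rather than values taken from this work.

```python
# Minimal sketch: integrate Eqs. 1-3 with scipy and show sensitive dependence
# on initial conditions by comparing two trajectories that start 1e-8 apart.
# Parameter values follow Fig. 1 (rho = 28, sigma = 10, beta = 8/3); the
# initial state (1, 1, 1) and the offset are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0

def lorenz(t, state):
    x, y, z = state
    return [SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z]

t_eval = np.linspace(0.0, 40.0, 8000)
sol_a = solve_ivp(lorenz, (0.0, 40.0), [1.0, 1.0, 1.0], t_eval=t_eval)
sol_b = solve_ivp(lorenz, (0.0, 40.0), [1.0 + 1e-8, 1.0, 1.0], t_eval=t_eval)

# The separation between the two trajectories grows by many orders of magnitude.
separation = np.abs(sol_a.y[0] - sol_b.y[0])
print(f"initial gap: {separation[0]:.1e}, final gap: {separation[-1]:.1e}")
```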

Chaotic systems, such as natural disasters and meteorological patterns, are extremely susceptible to perturbations. While the weather can be predicted in the immediate future, unforeseen events make long-term forecasting difficult. Dynamical systems, which express the spatio-temporal interdependence between a system's components, are used to characterize these nonlinear systems. Physics-driven techniques can anticipate nonlinear systems in the short term even when distortion or disturbance is prevalent, as is often the case in chaotic systems, but they may fail to foresee critical events. On the other hand, direct numerical simulation may not be suitable for modeling certain nonlinearities because of the intricate nature of the dynamics, leading to reduced forecast accuracy and increased ambiguity in the interpretation of the physical methods. Data-driven methodologies that rely on deep neural networks to predict nonlinear systems require massive volumes of data; moreover, in unpredictable systems, drawing conclusions and capturing long-term dependencies are challenging, and such models often fall short. Several studies have focused on modeling nonlinear systems using deep learning while incorporating prior knowledge, especially when data were scarce. Physical quantities may hold promise in simulating real-world occurrences and solving the enduring problems of modeling nonlinear systems. Physics-informed machine learning (PIML) [13] is a novel approach that integrates prior scientific knowledge into data-driven computations.

When dealing with chaotic systems involving extreme occurrences, substantial amounts of time series data are needed to provide accurate and reliable predictions. Integrating physical expertise into data-driven machine learning algorithms allows a better grasp of complex phenomena than relying on the data alone. Physics-informed neural networks (PINNs) [5] have been proposed as part of contemporary research. By adding a residual term to the error function, a PINN can produce an accurate solution of the underlying partial differential equations (PDEs) when supplied with a set of collocation points together with the initial and boundary conditions and material parameters. PINNs can also be employed to tackle inverse problems, in which the PDE's coefficients, initial, or boundary conditions are reconstructed and the governing PDE's solution is only partially known. Additionally, spatio-temporal estimation problems have been modeled using Physics-Guided Neural Networks (PGNN), in which LSTM networks construct the simulation from the fluctuations of temperature at the water surface [6]. These models enforce the consistency of the density profiles by explicitly encoding the dynamics at a chosen computational granularity inside a PGNN framework for limited-data regimes.

Computational Intelligence has long been utilized for predicting extreme events in time series data, provided that sufficient historical data is available. In the work titled "Modeling Extreme Events in Time Series Prediction," Ding, Zhang, Pan, Yang, and He explored the central theme of enhancing the ability of deep learning models to handle extreme events in time series prediction [7]. The authors identified a weakness in conventional deep learning methods, rooted in the use of quadratic loss functions. To overcome this limitation, they drew inspiration from the Extreme Value Theory and devised a new form of loss called Extreme Value Loss (EVL) to effectively detect future occurrences of extreme events. Additionally, they proposed the incorporation of a Memory Network to memorize extreme events in historical records. By combining EVL with an adapted memory network module, they successfully achieved an end-to-end framework for time series prediction with extreme events.

In another work titled “An Extreme-Adaptive Time Series Prediction Model Based on Probability-Enhanced LSTM Neural Networks,” Li, Xu, and Anastasiu introduced a novel probability-enhanced neural network model called NEC+ for forecasting time series with extreme events [18]. They highlighted the issue of existing methods ignoring the negative influence of imbalanced data or severe events on model training. Moreover, these methods were often evaluated on a small number of well-behaved time series, which did not demonstrate their ability to generalize. To address these challenges, the proposed NEC+ model simultaneously learned extreme and normal prediction functions and employed selective backpropagation to choose among them. The model’s performance was evaluated on a challenging 3-day ahead hourly water level prediction task for nine reservoirs in California.

Furthermore, in the work “Enhancing Time Series Predictors With Generalized Extreme Value Loss,” Zhang, Ding, Pan, and Yang introduced a unified loss form called Generalized Extreme Value Loss (GEVL) [40]. This loss function bridged the misalignment between the tail parts of the estimation and the ground-truth by applying transformations to either the observed events or the estimator. The authors presented three heavy-tailed kernels, namely shifted Gaussian, Gumbel, and Fréchet kernels, and derived the corresponding GEVLs, each offering a different trade-off between modeling effectiveness and computational resources. Comprehensive experiments were conducted on a diverse set of time series predictors and real-world datasets, validating that their novel loss form substantially enhanced representative time series predictors in modeling extreme events.

These works have provided valuable insights and directions for shaping research in the domain of Time Series Prediction of Extreme Events. The integration of Computational Intelligence with advanced loss functions and memory networks has shown promise in improving the handling of extreme events in time series data. These approaches offer potential applications in various downstream tasks and demonstrate the significance of addressing extreme events in time series prediction.

Despite these advancements, existing approaches still fall short when simulating nonlinear systems. PINNs are limited to mimicking data and require inputs for collocation points, boundary conditions, and initial conditions, making them applicable mainly to synthetic data and not directly suitable for real-world chaotic modeling problems.

In this article, we propose a fundamentally coherent and broadly applicable deep learning framework for modeling nonlinear systems that seamlessly integrates real-world data with information modeled directly from the kinetics and their mathematical formulations. To this end, we introduce a family of Attractor-Inspired Deep Learning (AiDL) models. We demonstrate the effectiveness of our approach on extreme events from various domains, namely the mechanics of catastrophic weather governing the El Niño sea-surface temperature cycles [11], San Juan Dengue virus transmission [23], and Bjørnøya daily downpour [31]. Section 2 discusses attractors and their dynamics, while Sect. 3 lays out the AiDL proposition. To sum up, the key takeaways from this research are:

  1. Proposition of the family of Attractor-Inspired Deep Learning (AiDL) models.

  2. Practical application of AiDL to chaotic extreme events.

  3. Comparison of AiDL with traditional time series forecasting algorithms.

2 Attractors

An attractor, in the mathematics of non-linear dynamics [2], is a set of states towards which a system tends to evolve for a wide range of initial conditions. When the system's state is sufficiently close to the attractor, it remains in that vicinity even when slightly perturbed. Generally, one or more differential equations [17] characterize a chaotic system. These equations describe the behavior of a particular non-linear system over any given short time frame. To predict the system's behavior over a longer period, it is often necessary to integrate the equations, which can be done analytically or iteratively, frequently using numerical methods. In the physical realm, chaotic systems typically involve dissipation because, without a driving source, motion would cease. As the initial transient response dies out, the system settles into its typical behavior when the dissipative and driving forces balance each other. The attractor, also known as the attracting set, is the portion of the chaotic system's state space that corresponds to this typical behavior. The concept of an attractor is related to those of limit sets [4] and invariant sets [14]. An invariant set is mapped onto itself by the dynamics. Attractors may contain invariant sets, and a limit set consists of points that some trajectory approaches arbitrarily closely as time approaches infinity. However, not every limit set is an attractor [22]: some trajectories may approach a limit set, but points perturbed slightly off the limit set may be pushed away and never return to its proximity.

As mentioned earlier, AiDL (Attractor-Inspired Deep Learning) is a family of deep learning paradigms. To illustrate this, we consider two members of this family, namely the Rössler Attractor (Sect. 2.1) and the Rabinovich–Fabrikant Attractor (Sect. 2.2). Let us explore each of them.

2.1 Rössler Attractor

The Rössler attractor [16] lies at the core of the Rössler system, a set of three non-linear ordinary differential equations defining a continuous-time dynamical system that exhibits chaotic behavior associated with the attractor's fractal characteristics. The Rössler system's fundamental properties necessitate non-linear techniques such as Poincaré maps [37] and bifurcation diagrams [20], while some system-specific characteristics can be inferred using linear techniques such as eigenvector analysis. The Rössler attractor was designed to emulate the behavior of the Lorenz attractor while being simpler to examine qualitatively, as described in the classic Rössler study. Around an unstable fixed point, an orbit inside the attractor spirals outward close to the x, y plane. As the trajectory spirals further, a secondary equilibrium point [39] influences it, causing the z-coordinate to rise and the orbit to twist. Although each variable oscillates within a bounded range of values, the fluctuations appear disordered in the temporal domain.

This attractor bears some resemblance [34] to the Lorenz attractor, although it is simpler, having just one lobe. The dynamics of the Rössler attractor are represented by Eqs. 4, 5, and 6.

$$\begin{aligned}{} & {} \frac{dx}{dt} = -y - z \end{aligned}$$
(4)
$$\begin{aligned}{} & {} \quad \frac{dy}{dt} = x + ay \end{aligned}$$
(5)
$$\begin{aligned}{} & {} \quad \frac{dz}{dt} = b + z(x - c) \end{aligned}$$
(6)

Setting \(z=0\) leads to a mathematically elegant form of the attractor. In the x-y plane, the dynamics of the Rössler attractor are represented by Eqs. 7 and 8.

$$\begin{aligned}{} & {} \frac{dx}{dt} = -y \end{aligned}$$
(7)
$$\begin{aligned}{} & {} \quad \frac{dy}{dt} = x + ay \end{aligned}$$
(8)

Fig. 2 depicts a graphical representation of the Rössler Attractor in the x-y plane.

Fig. 2

Rössler's strange attractor plotted for the constant \(a=0.2\). It closely resembles a stereogram

2.2 Rabinovich–Fabrikant Attractor

The Rabinovich–Fabrikant equations [29] are a set of three coupled ordinary differential equations that can exhibit chaotic behavior depending on the values of the model parameters. Equations 9, 10, and 11 describe the dynamics of the Rabinovich-Fabrikant Attractor.

$$\begin{aligned}{} & {} \frac{dx}{dt} = y(z - 1 + x^2) + \gamma x \end{aligned}$$
(9)
$$\begin{aligned}{} & {} \quad \frac{dy}{dt} = x(3x + 1 - x^2) + \gamma y \end{aligned}$$
(10)
$$\begin{aligned}{} & {} \frac{dz}{dt} = -2z(\alpha + xy) \end{aligned}$$
(11)

Figure 3 presents a visual representation of the Rabinovich–Fabrikant Attractor.

Fig. 3

Rabinovich-Fabrikant Attractor plotted for constants \(\alpha =0.05\) and \(\gamma =0.1\). The attractor exhibits a peculiar shape resembling a Gramophone

In this figure, we can observe the attractor’s behavior for specific parameter values of \(\alpha =0.05\) and \(\gamma =0.1\), resulting in a structure that closely resembles a Gramophone.
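As a brief illustration, the sketch below, assuming Python with SciPy, integrates Eqs. 9, 10, and 11 exactly as printed, with the Fig. 3 constants \(\alpha =0.05\) and \(\gamma =0.1\); the initial state and integration horizon are illustrative assumptions and may need adjustment to keep the trajectory on the attractor.

```python
# Minimal sketch: numerical integration of Eqs. 9-11 as printed, using the
# Fig. 3 constants alpha = 0.05 and gamma = 0.1.  The initial state and the
# time horizon are illustrative assumptions, not values taken from this work.
import numpy as np
from scipy.integrate import solve_ivp

ALPHA, GAMMA = 0.05, 0.1

def rabinovich_fabrikant(t, state):
    x, y, z = state
    dx = y * (z - 1.0 + x ** 2) + GAMMA * x
    dy = x * (3.0 * x + 1.0 - x ** 2) + GAMMA * y   # as written in Eq. 10
    dz = -2.0 * z * (ALPHA + x * y)
    return [dx, dy, dz]

t_eval = np.linspace(0.0, 50.0, 10000)
sol = solve_ivp(rabinovich_fabrikant, (0.0, 50.0), [0.1, -0.1, 0.1],
                t_eval=t_eval, rtol=1e-9, atol=1e-9)
x_series = sol.y[0]  # coordinate that can later serve as a modeled time series
```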

3 Attractor Inspired Deep Learning (AiDL)

We propose two species within this genus, one following the Rössler attractor [16] and the other the Rabinovich–Fabrikant attractor [29]. In our study, we consider a known-parameter attractor system with exogenous harmonic perturbations based on its dynamical equations. To generate our artificial dataset, we utilize the Runge–Kutta method [30] to simulate data from the chaotic differential equations. Using the simulated time-series data, we construct a Recurrent Neural Network (RNN) [21] while incorporating the physical law into the structure as a regularization term. The network is able to capture intricate patterns from the distributions of the expected values and the historical data. In the traditional Physics-Informed Neural Network (PINN) paradigm, the temporal component is treated as an input, and the solution of the non-linear equations is the output. The governing relations are enforced by differentiating the densely connected feed-forward neural network and computing the temporal derivatives using auto-differentiation. Estimating the time derivatives is challenging here because time-series data consist of discrete observations and lack a strong mathematical model governing the covariates over time. To address this issue, we work with discrete components of the time-series data. Here, we compute \(\frac{dx}{dt}\) as in Equation 12.

$$\begin{aligned} \frac{dx}{dt}=\frac{x\left( t+\delta t\right) -x\left( t\right) }{\delta t} \end{aligned}$$
(12)

For real-world time-series data, time is represented sequentially in chronological order. In our study, we choose \(\delta t=7\), corresponding to the number of days in a week. As the activation function used in the MLP is differentiable, the conventional PINN paradigm backpropagates the temporal terms to minimize the network loss. The network predicts the values of x(t), \(\frac{dx}{dt}\), and \(\frac{d^2x}{dt^2}\).
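As a concrete illustration of this data-generation step, the sketch below, assuming Python with NumPy, integrates the reduced (z = 0) Rössler system of Eqs. 7 and 8 with a classical fourth-order Runge-Kutta scheme and then forms the discrete derivatives of Eq. 12; the step size, horizon, initial state, and the use of the reduced system are illustrative assumptions, and the generic dt stands in for whichever sampling interval (e.g. \(\delta t=7\)) applies to a given series.

```python
# Sketch of the synthetic-data stage: RK4 integration of the reduced (z = 0)
# Rossler system (Eqs. 7-8) followed by the finite differences of Eq. 12.
# a = 0.2 follows Fig. 2; step size, horizon and initial state are assumptions.
import numpy as np

A = 0.2

def rossler_xy(state):
    x, y = state
    return np.array([-y, x + A * y])

def rk4_step(state, h):
    k1 = rossler_xy(state)
    k2 = rossler_xy(state + 0.5 * h * k1)
    k3 = rossler_xy(state + 0.5 * h * k2)
    k4 = rossler_xy(state + h * k3)
    return state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def simulate(n_steps=5000, h=0.01, state0=(1.0, 0.0)):
    states = np.empty((n_steps, 2))
    state = np.array(state0, dtype=float)
    for i in range(n_steps):
        states[i] = state
        state = rk4_step(state, h)
    return states

x = simulate()[:, 0]

# Discrete first and second derivatives as in Eq. 12, with a generic step dt.
def discrete_derivatives(series, dt=1.0):
    d1 = (series[1:] - series[:-1]) / dt          # dx/dt
    d2 = (d1[1:] - d1[:-1]) / dt                  # d2x/dt2
    return series[:-2], d1[:-1], d2               # aligned triples

x0, dx, d2x = discrete_derivatives(x, dt=0.01)    # dt matches the RK4 step here
```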

For Rössler Attractor Inspired Deep Learning, the Loss Function is as in Eq. 13.

$$\begin{aligned} {\mathfrak {L}}_{physical}=RMSE\left( {\mathfrak {Y}}_{predicted},\ {\mathfrak {Y}}_{real}\right) \ \end{aligned}$$
(13)

where,

$$\begin{aligned} {\mathfrak {Y}}_{predicted}=\left( \frac{d^2y_{pred}}{dt^2}+y_{pred}-a\frac{dy_{pred}}{dt}\right) \\ {\mathfrak {Y}}_{real}=\left( \frac{d^2y_{real}}{dt^2}+y_{real}-a\frac{dy_{real}}{dt}\right) \end{aligned}$$

and RMSE stands for Root Mean Squared Error.
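A minimal sketch of Eq. 13, assuming the predicted and real series together with their discrete derivatives (Eq. 12) are available as NumPy arrays, is given below; the constant a = 0.2 follows Fig. 2.

```python
# Sketch of the Rossler-inspired physics loss of Eq. 13.  The arrays hold the
# series values and their discrete derivatives (Eq. 12); a is the Rossler constant.
import numpy as np

def rossler_residual(y, dy, d2y, a=0.2):
    # Residual of the reduced Rossler dynamics: d2y/dt2 + y - a * dy/dt
    return d2y + y - a * dy

def rmse(u, v):
    return float(np.sqrt(np.mean((u - v) ** 2)))

def physical_loss_rossler(y_pred, dy_pred, d2y_pred,
                          y_real, dy_real, d2y_real, a=0.2):
    return rmse(rossler_residual(y_pred, dy_pred, d2y_pred, a),
                rossler_residual(y_real, dy_real, d2y_real, a))
```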

For Rabinovich–Fabrikant Attractor Inspired Deep Learning, the Loss Function is as in Eq. 14.

$$\begin{aligned} {\mathfrak {L}}_{physical} = RMSE\left( {\mathfrak {Y}}_{predicted},\ {\mathfrak {Y}}_{real}\right) \ \end{aligned}$$
(14)

where,

$$\begin{aligned} {\mathfrak {Y}}_{\mathfrak {predicted}}=\frac{d^2y_{pred}}{dt^2} -\left( \frac{dy_{pred}}{dt}\right) \left( 2yy_{pred}+\gamma \right) +\left( {\mathfrak {y}}_{pred}\right) \end{aligned}$$

with,

$$\begin{aligned} {\mathfrak {y}}_{pred}={y_{pred}}^5-3y_{pred}^4-2y_{pred}^3 -{y_{pred}}^2\left( \gamma y-3\right) +y_{pred}+\gamma y \\ {\mathfrak {Y}}_{\mathfrak {real}}=\frac{d^2y_{real}}{dt^2}-\left( \frac{dy_{real}}{dt}\right) \left( 2yy_{real} +\gamma \right) +\left( {\mathfrak {y}}_{real}\right) \end{aligned}$$

with,

$$\begin{aligned} {\mathfrak {y}}_{real}={y_{real}}^5-3y_{real}^4-2y_{real}^3 -{y_{real}}^2\left( \gamma y-3\right) +y_{real}+\gamma y \end{aligned}$$

RMSE stands for Root Mean Squared Error, and

$$\begin{aligned} \left( \frac{\frac{dx}{dt}-\gamma x}{x^2-1}\right) =y \end{aligned}$$
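The loss of Eq. 14 can be sketched analogously, under the assumption that the bare y appearing in \({\mathfrak {Y}}_{predicted}\) and \({\mathfrak {Y}}_{real}\) is the auxiliary variable defined by the relation above, evaluated from the series itself and its first derivative; the small eps guard against division by zero is an added implementation assumption.

```python
# Sketch of the Rabinovich-Fabrikant-inspired physics loss of Eq. 14, assuming
# the bare y in the residual is the auxiliary variable of the relation above,
# computed from the series s and its first derivative ds.
import numpy as np

GAMMA = 0.1

def rf_residual(s, ds, d2s, gamma=GAMMA, eps=1e-8):
    # Auxiliary variable y = (ds/dt - gamma * s) / (s**2 - 1); eps is an
    # added assumption to avoid division by zero near s**2 = 1.
    y_aux = (ds - gamma * s) / (s ** 2 - 1.0 + eps)
    poly = (s ** 5 - 3.0 * s ** 4 - 2.0 * s ** 3
            - s ** 2 * (gamma * y_aux - 3.0) + s + gamma * y_aux)
    return d2s - ds * (2.0 * s * y_aux + gamma) + poly

def rmse(u, v):
    return float(np.sqrt(np.mean((u - v) ** 2)))

def physical_loss_rf(s_pred, ds_pred, d2s_pred, s_real, ds_real, d2s_real):
    return rmse(rf_residual(s_pred, ds_pred, d2s_pred),
                rf_residual(s_real, ds_real, d2s_real))
```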

Here, using the AiDL, we compute \(x\left( t+1\right)\), \(\frac{d}{dt}\left( x\left( t+1\right) \right)\), and \(\frac{d^2}{dt^2}\left( x\left( t+1\right) \right)\), which are the essentials for the next stage. Computation of these may incur an additional loss, generically denoted \({\mathfrak {L}}_{data}\) and delineated in Eq. 15.

$$\begin{aligned} {\mathfrak {L}}_{data}=RMSE\left( {\mathfrak {X}}_{predicted},\ {\mathfrak {X}}_{real}\right) \end{aligned}$$
(15)

where,

$$\begin{aligned} {\mathfrak {X}}_{predicted}=x_{pred}\left( t+1\right) , \left( x_{pred}\left( t+1\right) \right) ^\prime ,\left( x_{pred}\left( t+1\right) \right) ^{\prime \prime }\\ {\mathfrak {X}}_{real}=x_{real}\left( t+1\right) , \left( x_{real}\left( t+1\right) \right) ^\prime ,\left( x_{real}\left( t+1\right) \right) ^{\prime \prime } \end{aligned}$$

Algorithm 1 presents the pseudocode for AiDL, and Fig. 4 presents a pictorial representation of the AiDL model architecture. In the context of the pseudocode in Algorithm 1, for the Pre-Training and Training Phases the loss functions were combined with the data losses as \({\mathfrak {L}}={\mathfrak {L}}_{data}+{\mathcal {C}}_1\times {\mathfrak {L}}_{physical}\) and \({\mathfrak {L}}={\mathfrak {L}}_{data}+{\mathcal {C}}_2\times {\mathfrak {L}}_{physical}\), respectively, where \({\mathcal {C}}_1\) and \({\mathcal {C}}_2\) are hyperparameters. For Attractor-Inspired Deep Learning, we set both hyperparameters to 1.2, a value chosen by trial and error.
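A minimal sketch of this combination, assuming the helper functions of the earlier sketches and one reading of Eq. 15 in which the RMSE is taken over the concatenated forecast triple, is the following.

```python
# Sketch of the combined losses used in the two phases, with C1 = C2 = 1.2
# as quoted above.  physical_loss_rossler / physical_loss_rf are the earlier
# sketches; the data loss follows one reading of Eq. 15.
import numpy as np

C1, C2 = 1.2, 1.2   # pre-training and training weights for the physics loss

def data_loss(x_pred_tuple, x_real_tuple):
    # Eq. 15: RMSE over the forecast triple (x(t+1), x'(t+1), x''(t+1)).
    pred = np.concatenate([np.atleast_1d(v) for v in x_pred_tuple])
    real = np.concatenate([np.atleast_1d(v) for v in x_real_tuple])
    return float(np.sqrt(np.mean((pred - real) ** 2)))

def total_loss(l_data, l_physical, pretraining=True):
    weight = C1 if pretraining else C2
    return l_data + weight * l_physical
```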

Algorithm 1: Attractor-Inspired Deep Learning (AiDL)

Fig. 4

Attractor-Inspired Deep Learning (AiDL) Architecture: The AiDL model comprises two phases, "Pre-Training" and "Training". During the pre-training phase, neurons are modeled by taking input from the chaotic dynamics of attractors. The training of neurons in this phase is accomplished using a loss function, \({\mathfrak {L}}_1\), through back propagation [15], considering the forecasted derivatives \(x\left( t+1\right)\), \(x^\prime \left( t+1\right)\), and \(x^{\prime \prime }\left( t+1\right)\). Once the pre-training phase is completed, the weights and biases of the neural network are transferred to the Training Phase, where the model is exposed to real-world extreme events. In this phase too, the neurons are trained based on the loss function \({\mathfrak {L}}_2\). Finally, a step-ahead forecast is obtained using the forecasted derivatives of \(x_e\left( t\right)\). In each phase, the training is performed by splitting the available data in an 80:20 ratio

4 Results

To validate the AiDL (Attractor-Inspired Deep Learning) model, we implement a primitive species of the genus, the Rössler Attractor-inspired Deep Learning Model. This model is applied to three benchmark datasets: El Niño sea-surface temperature, San Juan Dengue virus transmission, and Bjørnøya daily downpour.

The AiDL model consists of two phases, the Pre-Training Phase and the Real-World Extreme Events Phase. In the Pre-Training Phase, we initialize the model with the dynamics of a chaotic attractor, specifically the Rössler attractor in our experiments. From the Rössler attractor, we obtain a sequence of time-dependent values, denoted x(t). To generate the first and second derivatives of x(t), we use the finite-difference scheme of Eq. 12. These derivatives are then fed to the neurons. The data are split according to an 80-20 rule for validation, and a loss function is applied. The main objective of this step is to generate values of the weights and biases that capture the chaoticity of the attractor. In Phase 2, the Real-World Extreme Events Phase, historical data are again split in an 80-20 ratio to retrain the neurons. Using transfer learning, we continue training from the weights and biases obtained in the Pre-Training Phase. Next, we generate the higher derivatives of the time-dependent values of the extreme event, denoted \(x_e(t)\). The 3-tuple \(x_e(t+1)\), \(x_e^\prime \left( t+1\right)\), and \(x_e^{\prime \prime }\left( t+1\right)\) constitutes the forecasted values.
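A compressed sketch of this two-phase procedure, assuming a PyTorch implementation with a small fully connected regressor that maps the derivative triple at time t to the triple at t + 1, is given below; the architecture, optimizer, learning rate, and epoch counts are illustrative assumptions rather than the exact settings used in this work.

```python
# Compressed sketch of the two-phase AiDL procedure in PyTorch.  The network
# maps (x(t), x'(t), x''(t)) to (x(t+1), x'(t+1), x''(t+1)); architecture,
# optimizer and epoch counts are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

def make_supervised(x0, dx, d2x):
    # Stack the derivative triples and pair each step with the next one.
    feats = np.stack([x0, dx, d2x], axis=1).astype(np.float32)
    return torch.from_numpy(feats[:-1]), torch.from_numpy(feats[1:])

def split_80_20(inputs, targets):
    cut = int(0.8 * len(inputs))
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])

model = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 64),
                      nn.Tanh(), nn.Linear(64, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def run_phase(inputs, targets, physics_weight, epochs=200, a=0.2):
    for _ in range(epochs):
        optimizer.zero_grad()
        pred = model(inputs)
        l_data = torch.sqrt(torch.mean((pred - targets) ** 2))
        # Rossler residual d2y/dt2 + y - a * dy/dt for predicted and real triples
        res_pred = pred[:, 2] + pred[:, 0] - a * pred[:, 1]
        res_real = targets[:, 2] + targets[:, 0] - a * targets[:, 1]
        l_phys = torch.sqrt(torch.mean((res_pred - res_real) ** 2))
        (l_data + physics_weight * l_phys).backward()
        optimizer.step()

# Phase 1: pre-train on the simulated Rossler triples (x0, dx, d2x from the
# earlier data-generation sketch); Phase 2: retrain on the real extreme-event
# series x_e(t) processed the same way, so the weights carry over implicitly.
# (train_sim, _) = split_80_20(*make_supervised(x0, dx, d2x))
# run_phase(*train_sim, physics_weight=1.2)
# (train_real, _) = split_80_20(*make_supervised(xe0, dxe, d2xe))  # hypothetical arrays
# run_phase(*train_real, physics_weight=1.2)
```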

To evaluate the efficiency of the model, we compare it with prevalent techniques: Long Short-Term Memory (LSTM) [33], Bi-Directional LSTM [12], Convolutional Neural Networks [27], N-BEATS [26], Prophet [36], Reservoir Computing (RCN) [24], and Single Exponential Smoothing (SES) [25]. We employ three widely used metrics to assess our models, where lower values indicate better performance on the forecasting task (a sketch of their computation follows the list). The metrics used in this study are as follows,

  1. Physical inconsistency

  2. Mean absolute error

  3. Root mean squared error
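A brief sketch of these metrics, assuming NumPy arrays of forecasts and observations, follows; MAE and RMSE are standard, while the physical inconsistency is read here as the mean magnitude of the Rössler residual of the forecasts, an assumption about the exact formula rather than a definition taken from this work.

```python
# Sketch of the three evaluation metrics.  MAE and RMSE are standard; the
# "physical inconsistency" is read here as the mean magnitude of the Rossler
# residual of the forecasts, which is an assumption about the exact formula.
import numpy as np

def mae(pred, real):
    return float(np.mean(np.abs(pred - real)))

def rmse(pred, real):
    return float(np.sqrt(np.mean((pred - real) ** 2)))

def physical_inconsistency(y_pred, dy_pred, d2y_pred, a=0.2):
    return float(np.mean(np.abs(d2y_pred + y_pred - a * dy_pred)))
```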

Coming to the extreme event data, sea surface temperature (SST) variations [11] in the Pacific Ocean exhibit regular patterns known as the El Niño-Southern Oscillation (ENSO). The dataset, spanning January 3, 1990, to April 21, 2021, consists of 1634 weekly recordings of SST in El Niño zone 1. Additionally, the San Juan dataset [23] provides a univariate time-series collection showing the fluctuations in dengue infection cases over the years. Dengue fever, spread by mosquitoes, exhibits sporadic outbreaks owing to the unpredictability of transmission. This dataset contains a total of 1197 observations recorded between week 17 of 1990 and week 16 of 2013, and we focus solely on the time series with instances reported within this period. Finally, the dataset concerning catastrophic weather was compiled from precipitation observations for the Bjørnøya region by the Norwegian Meteorological Service Center. The analysis includes 15,320 daily samples [31] covering the period from June 16, 1980, to June 16, 2022. The observed results corresponding to these extreme data are summarized in Table 1 and compared graphically in Fig. 5.

Fig. 5

Comparative analysis of the Rössler Attractor-inspired Deep Learning Model with the Long Short-Term Memory Model and the Convolutional Neural Network

Table 1 Results for single-step-ahead forecasts. Each measure was computed over 5 independent runs, and the mean is tabulated

5 Discussions

The intersection of Attractor-Inspired Deep Learning (AiDL) with time series forecasting of extreme events has yielded commendable results. Our forecasting model, driven by the Rössler Attractor-inspired Deep Learning approach, outperformed both Convolutional Neural Networks (CNN) and Vanilla Long Short-Term Memory (LSTM) in capturing outliers and peak events (refer to Fig. 5). The superiority of AiDL becomes apparent through its two-stage training process, Pre-Training and Training Phases. Graphical representations of predictions in Fig. 5 vividly demonstrate that Physics-Informed Deep Learners, such as AiDL, possess a distinct advantage over traditional time series forecasting paradigms like N-BEATS, Prophet, SES, and others. This advantage is primarily attributed to the inherent pre-training by the Chaoticity of Dynamical Systems, which plays a crucial role in modeling the dynamic nature of extreme events. In contrast, traditional models lack this crucial pre-training, limiting their ability to capture the complexities of extreme events accurately. For the purpose of contrasting results, we specifically considered CNN and LSTM models in Fig. 5, as they closely align with our proposed Rössler Attractor-Inspired Deep Learning Model. The superior performance of AiDL over these traditional models demonstrates its effectiveness and superiority in handling extreme event predictions.

Overall, the results obtained from AiDL’s integration of actual statistics and mathematical models, combined with its two-stage training process, highlight its immense potential for robust time series forecasting. By effectively capturing extreme events and outperforming conventional methodologies, AiDL stands as a promising approach for various real-world applications. The success of AiDL prompts further investigation and exploration of its application to other complex domains, where the combination of physics-based insights with data-driven deep learning can unlock new frontiers in predictive modeling.

6 Conclusion

In conclusion, the proposed deep learning framework, Attractor-Inspired Deep Learning (AiDL), seamlessly integrates actual statistics and mathematical models of system kinetics. AiDL successfully bridges the gap between physics-based models and data-driven methods, providing a powerful solution for modeling nonlinear systems and predicting extreme events. Empirical results using real-world data from catastrophic weather mechanics, El Niño cycles, and disease transmission showcase AiDL’s effectiveness in enhancing modeling accuracy during extreme events. AiDL’s adaptability allows for multi-step predictions, offering valuable insights for decision-making and risk management in chaotic and stochastic systems. The integration of experience-based information with deep learning methodologies holds great promise for addressing challenges in diverse domains, making AiDL a valuable asset for researchers and practitioners alike. We provide the source codes and datasets at https://github.com/Anurag-Dutta/AiDL, encouraging further research and facilitating replication to advance time series prediction and understanding of extreme events in complex systems.