1 Introduction

Time series is a statistical term for a collection of non-static data points with temporal coherence between them [10]. Time series forecasting involves predicting future data points in time by statistical or machine-intelligent means. Traditionally, time series prediction draws its intuition from statistical inference methodologies, for example, the autoregressive integrated moving average (ARIMA) [23], simple exponential smoothing (SES) [14], etc. In recent times, machine intelligence has become prominent and is used in almost every domain of science and technology, be it engineering [29], medicine [28], or even the legislature [31]. Machine learning is therefore also utilized for time series forecasting, for example, long short-term memory (LSTM) [15], reservoir computing networks (RCN) [32], etc.

One of the most prominent uses of time series forecasting was evident during the COVID-19 pandemic, when researchers across the statistical and computational domains came forward to provide a defense against the virus. This resulted in the development of novel epidemic forecasting models, such as epidemiologically-informed neural networks (EINNs) [30], epidemiological priors informed deep neural networks (Epi-DNNs) [25], etc. The post-COVID-19 period also saw a boom in machine intelligence, be it large language models (LLMs) [8], natural language processing (NLP) [16], or generative adversarial networks (GANs) [6].

Time series forecasting is mostly applied to numeric data points; in this research, however, our target is visual data points (or multimedia). Multimedia is a stream of binary digits [17] with added semantics. For this research, we consider the most important component of multimedia: images. Our objective is to predict the next frame from a set of previous frames used as a lag. Recent works on time series forecasting focus on two important aspects: the length of the forecast, and the computational time of the forecast.

Initially, LSTMs were used for forecasting, but LSTM units were difficult to scale up. Later, with the innovation of attention-based strategies [34], Transformers arrived with a stack-based architecture that is easy to scale, yet they were still not computationally efficient. Then came the lightweight stack-block type architectures [36]: neural basis expansion analysis for interpretable time series forecasting (NBEATS) [27], neural basis expansion analysis with exogenous variables (NBEATSx) [26], neural hierarchical interpolation for time series forecasting (NHiTS) [7], etc. Alongside these, simpler models re-emerged, hybridized with loss functions inspired by physical differentials, such as the works of Elabid et al. [13] and Dutta et al. [12], performing almost on par with the stack-block type architectures. While the former focused on a unit-step-ahead forecast, the latter extended the paradigm to a multi-step-ahead forecast. Besides, the latest work by Cerqueira et al. [5] contributed extensively to the existing literature by proposing a novel methodology, forecasted trajectory neighbors (FTN), which can be integrated with existing multi-step-ahead forecasting algorithms and yields substantial improvements in their performance. Drawing from the positives of these paradigms, we propose the Newtonian physics informed neural network (NwPiNN), a physics informed neural network built on the laws of Newtonian (or classical) physics, giving it the ability to forecast while also modelling the perturbations in the forecasting lag.

The reason behind the choice of a PINN for the forecast is that the forecast of pixel gray values (the main objective of this research) depends not only on the base pixel gray values throughout the lag but also on the velocity and acceleration possessed by the pixels. If it were simply the base gray values, lightweight models like NBEATS or NHiTS would suffice; but since the forecast involves the perturbations of the velocity and acceleration of each pixel's values, a physics informed neural network [4] is needed to model its parameters accordingly.

The Newtonian laws of motion are an established set of conceptualizations that succeed in explaining the dynamics of many (non-chaotic) observations we witness around us. Since the targeted input for this proposal is a sequence of frames containing visual data, and since the proposed NwPiNN is a physics-informed neural network, we adopt the Newtonian laws as the physics behind the PINN. Looking deeper into the input characteristics, the frames are simply sequences of pixels with physical coherence between them, in terms of the derivatives of displacement with respect to time. For example, if we break down a one-second video into, say, 24 frames, we can notice the coherence between consecutive frames [19]. The relationship between pixels can then be well visualized: some pixels will keep moving by a constant shift (displacement), while some will move with a constant scale (velocity, or higher-order differentials of displacement), which can be well modelled by the PINN.

In recent times, much work has been done on the spatio-temporal forecast of image sequences. For example, the work by Meyer, Langer et al. [21] on the forecast of fresh concrete properties concludes that their research could be improved by utilizing more up-to-date forecasting models, such as those based on Transformers. Another work, by Arslan et al. [1], on forecasting the molecular profiles of multi-omic biomarkers from hematoxylin and eosin stained slide images, suggests exploring the association between tumour morphology and model predictions, a correlation that is hoped to yield improved accuracy.

2 Problem Statement

Given a sequence of n frames of \(q \times p\) visual data, as n data points, \(\mathcal {V}(n)=\{v_1, v_2, \ldots , v_n\}\), we need to generate the spatio-temporal future data points, \(v_{n+\lambda }, \ \forall \ \lambda > 0\), such that,

$$\begin{aligned} v_i=\begin{pmatrix} \gamma _{1,1,i} & \cdots & \gamma _{1,p,i} \\ \vdots & \ddots & \vdots \\ \gamma _{q,1,i} & \cdots & \gamma _{q,p,i} \end{pmatrix} \end{aligned}$$
(1)

where each \(0\le \gamma _{x,y,i}\le 2^8-1\) in (1) represents the gray value of the pixel in the \(x{\text {th}}\) row and \(y{\text {th}}\) column of the \(i{\text {th}}\) data point (or frame). For this research, we consider \(\lambda =1\). The data stream is therefore cuboidal. Now, an LSTM expects three-dimensional input of the form (batch, seq_length, input_feature); to make the input data (\(n \times p \times q\)) compatible, samples of seq_length n from each of the \(p \times q\) pixels were considered in iterations, in batches of 64, each with its respective input_feature, as sketched below.
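As an illustration, the following minimal sketch (assuming NumPy; array names and sizes are ours, not from the original implementation) shows how a cuboidal stream of n frames of \(q \times p\) pixels can be rearranged into per-pixel sequences of shape (batch, seq_length, input_feature):

```python
import numpy as np

# Hypothetical cuboidal data stream: n frames, each q x p (gray values in [0, 255]).
n, q, p = 200, 64, 64
frames = np.random.randint(0, 256, size=(n, q, p)).astype(np.float32)

# Rearrange so that every pixel contributes one univariate temporal sequence:
# (n, q, p) -> (q * p, n, 1), i.e. q*p samples, seq_length n, 1 input feature.
pixel_sequences = frames.reshape(n, q * p).T[..., np.newaxis]

# Iterate over the q*p pixel sequences in batches of 64, as described above.
batch_size = 64
for start in range(0, q * p, batch_size):
    batch = pixel_sequences[start:start + batch_size]  # (batch, n, 1)
    # ... feed `batch` to the LSTM-based model here ...
```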

3 Preliminaries

3.1 Long Short-Term Memory

Long short-term memory is an artificial neural network (ANN) [11] used in machine learning and machine intelligence. Unlike a typical feed-forward neural network [3], an LSTM features recurrent (feedback) connections, so it can process entire data streams as well as individual data points. Because of this feature, LSTM recurrent networks are well suited to handling and forecasting sequential information. A typical LSTM unit comprises a cell, an input gate \(\left( I_t\right)\), an output gate \(\left( O_t\right)\), and a forget gate \(\left( F_t\right)\). The three gates determine how data flows into and out of the unit, and the cell retains information over arbitrary time intervals. By assigning a number between zero and one to a prior state relative to the current input, the forget gate decides which information from the prior state to discard. Since there may be lags of unknown duration between significant events in a time series, LSTM recurrent networks are well suited to classifying, processing, and predicting outcomes from time-series data. LSTMs were devised to address the vanishing gradient problem that can arise when training conventional RNNs. Their advantage over plain RNNs, hidden Markov models [9], and other sequence learning approaches in many applications is their relative insensitivity to gap length. NwPiNN is also built on long short-term memory. A recurrent neural network [20] with LSTM blocks can be trained in a supervised fashion on a collection of training sequences, using an optimizer such as gradient descent combined with backpropagation through time, which computes the gradients needed to adjust each weight of the LSTM network in proportion to the derivative of the error at the output layer. A minimal sketch of a single LSTM cell step follows.
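To make the gate mechanics concrete, here is a minimal, self-contained NumPy sketch of one LSTM cell step (an illustrative re-implementation, not the authors' code; weight shapes and names are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step: W, U, b hold stacked parameters for the forget (F_t),
    input (I_t), and output (O_t) gates and the cell candidate."""
    d = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b       # stacked pre-activations, shape (4d,)
    F_t = sigmoid(z[0:d])              # forget gate: what to discard (0..1)
    I_t = sigmoid(z[d:2*d])            # input gate: what to write
    O_t = sigmoid(z[2*d:3*d])          # output gate: what to expose
    g_t = np.tanh(z[3*d:4*d])          # candidate cell content
    c_t = F_t * c_prev + I_t * g_t     # cell retains information over time
    h_t = O_t * np.tanh(c_t)           # hidden state passed to the next step
    return h_t, c_t

# Example usage with random parameters (illustrative dimensions).
m, d = 1, 8
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(4*d, m)), rng.normal(size=(4*d, d)), np.zeros(4*d)
h, c = np.zeros(d), np.zeros(d)
for x_t in np.array([[0.1], [0.4], [0.9]]):   # a short univariate sequence
    h, c = lstm_step(x_t, h, c, W, U, b)
```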

3.2 Newtonian Laws of Kinematics

The movement of macroscopic objects, celestial objects, and galaxies is described by the physical theory of classical mechanics. Sir Isaac Newton is regarded as the father of classical physics (or Newtonian physics) because of his immense contributions to the field of classical dynamics [24]; be it kinematics, gravitation, or optics, the Newtonian laws cover them all. For this research, we restrict ourselves to his contribution to kinematics. Kinematics is a sub-branch of classical physics that describes the motion of particles without considering the forces that cause it.

Fig. 1 Velocity of any particle, \({\textbf {v}}\), is computed from the gradient of the trajectory followed by the radial vector, \({\textbf {r}}\)

Fig. 2 Acceleration of any particle, \({\textbf {a}}\), is computed as the rate of change of velocity, \({\textbf {v}}\)

To begin with, we have the position vector, \({\textbf {r}}\) of any particle,

$$\begin{aligned} {\textbf {r}} = x\ \hat{{\textbf {i}}} + y\ \hat{{\textbf {j}}} + z\ \hat{{\textbf {k}}} \end{aligned}$$
(2)

where \(x\), \(y\), and \(z\) are the coordinates, and \(\hat{{\textbf {i}}},\ \hat{{\textbf {j}}}\), and \(\hat{{\textbf {k}}}\) are the unit vectors along the respective axes.

The rate of change of the position vector (2) with time (refer to Fig. 1) is the velocity, \({\textbf {v}}\),

$$\begin{aligned} {\textbf {v}} = \frac{\text {d}{\textbf {r}}}{\text {d}t} = \left( \frac{\text {d}x}{\text {d}t}\right) \hat{{\textbf {i}}} + \left( \frac{\text {d}y}{\text {d}t}\right) \hat{{\textbf {j}}} + \left( \frac{\text {d}z}{\text {d}t}\right) \hat{{\textbf {k}}} \end{aligned}$$
(3)

Further, the rate of change of the velocity vector (3) with time (refer to Fig. 2) is the acceleration, \({\textbf {a}}\),

$$\begin{aligned} {\textbf {a}} = \frac{\text {d}^2{\textbf {r}}}{\text {d}t^2} = \left( \frac{\text {d}^2x}{\text {d}t^2}\right) \ \hat{{\textbf {i}}} + \left( \frac{\text {d}^2y}{\text {d}t^2}\right) \ \hat{{\textbf {j}}} + \left( \frac{\text {d}^2z}{\text {d}t^2}\right) \ \hat{{\textbf {k}}} \end{aligned}$$
(4)

The kinematic equations relating (2), (3), and (4), termed the equations of motion, are as follows:

  1. $$\begin{aligned} |{\textbf {v}}_{t}| = |{\textbf {v}}_{0}| + |{\textbf {a}}| \cdot t \end{aligned}$$
     (5)
  2. $$\begin{aligned} |{\textbf {r}}| = |{\textbf {v}}_{0}| \cdot t + \frac{1}{2} \cdot |{\textbf {a}}| \cdot t^2 \end{aligned}$$
     (6)
  3. $$\begin{aligned} |{\textbf {v}}_{t}|^2 - |{\textbf {v}}_{0}|^2 = 2 \cdot |{\textbf {a}}| \cdot |{\textbf {r}}| \end{aligned}$$
     (7)

Equations (5), (6), and (7) play a pivotal role in the Newtonian physics informed neural network (NwPiNN); a quick numerical check of these relations is sketched below.
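As a sanity check (an illustrative sketch, not part of the model), the following snippet simulates constant-acceleration motion and verifies Eqs. (5)-(7) numerically; all values are arbitrary:

```python
import numpy as np

# Constant-acceleration motion along one axis (illustrative values).
v0, a, t = 3.0, 1.5, 2.0           # initial speed, acceleration, elapsed time

v_t = v0 + a * t                   # Eq. (5): v_t = v_0 + a*t
r = v0 * t + 0.5 * a * t**2        # Eq. (6): r = v_0*t + a*t^2/2

# Eq. (7) relates all three quantities; both sides should agree.
lhs = v_t**2 - v0**2
rhs = 2.0 * a * r
assert np.isclose(lhs, rhs)        # 27.0 on both sides for these values
print(v_t, r, lhs, rhs)            # 6.0 9.0 27.0 27.0
```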

Fig. 3 Architectural view of the Newtonian physics informed neural network (NwPiNN). The model builds on the critical modelling capabilities of physics informed neural networks to obtain the zeroth, first, and second derivatives, \(x_{\text {event}}\), \(v_{\text {event}}\), and \(a_{\text {event}}\), respectively. An LSTM unit is used as a back-end for NwPiNN, followed by a loss function motivated by the laws of Newtonian physics for both the pre-training and training phases. The trained neuronal parameters (weights and biases) after \(\epsilon _\text {max}\) epochs of the pre-training phase are transferred to the training phase. This procedure is followed for all the \(q \times p\) pixels of the lag frames

4 Newtonian Physics Informed Neural Network (NwPiNN)

This section lays the foundation of the Newtonian physics informed neural network (NwPiNN), building upon the single-step-ahead prediction framework of knowledge based deep learning (KDL) [13]. Unlike many other physics informed neural networks, the proposal is free from bulky chunks of parameters. The model is composed of two conjugated components, motivated by transfer learning [33]. In the first segment, the recurrent neural network (RNN) is pre-trained by modelling the weights and biases on an artificial dataset based on the dynamics of Newtonian mechanics, simulated using the Runge–Kutta method, so that the network learns the exogenous harmonic perturbations in the neurons. This training (technically, pre-training) incorporates a physical loss function based on Newtonian dynamics, supported by an adaptive moment estimation optimizer; because of this, the RNN successfully captures disturbances in the historical sequence across the temporal domain. In the conventional physics-informed neural network (PINN) paradigm, the temporal component is taken as the input and the solution of the non-linear equations is the output; the control parameter is obtained by differentiating the fine-grained feed-forward neural network and computing the temporal relationships through auto-differentiation. Time-series data are discrete events without a robust mathematical model, and controlling the variables over time is difficult, so we construct discrete time-series data components to overcome this challenge. The first-order differentials, \(\frac{\text {d}\phi }{\text {d}\psi }\), are computed using the first principle of derivatives [35] as follows,

$$\begin{aligned} \frac{\text {d}\phi }{\text {d}\psi }=\frac{\phi \left( \psi +\delta \psi \right) -\phi \left( \psi \right) }{\delta \psi } \end{aligned}$$
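In discrete time, this reduces to finite differences over consecutive samples. A minimal sketch (assuming NumPy and a unit time step; the helper name and values are illustrative):

```python
import numpy as np

def discrete_derivatives(x, dt=1.0):
    """Approximate first and second derivatives of a discrete series x(t)
    by forward differences, following the first-principle definition above."""
    dx = (x[1:] - x[:-1]) / dt     # first derivative, length n-1
    d2x = (dx[1:] - dx[:-1]) / dt  # second derivative, length n-2
    return dx, d2x

# Example: positions of one pixel across frames (illustrative values).
x = np.array([0.0, 1.0, 4.0, 9.0, 16.0])   # x(t) = t^2 sampled at dt = 1
dx, d2x = discrete_derivatives(x)
print(dx)    # [1. 3. 5. 7.]  ~ velocity
print(d2x)   # [2. 2. 2.]     ~ constant acceleration of 2
```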

Since the transfer function employed in the multi-layered perceptron (MLP) [2] is differentiable, the traditional physics-informed neural network paradigm backpropagates the temporal implications to adjust the network loss. The network thereby forecasts (or computes) the values of the zeroth, first, and second order differentials of the discrete series, x(t).

The loss function that initiates the backpropagation (for both the pre-training and training phases) of NwPiNN is given in (8).

$$\begin{aligned} \mathfrak {L}_{\text {phy}}=\text {RMSE}\left( \mathfrak {Y}_{\text {pred}},\ \mathfrak {Y}_{\text {real}}\right) \end{aligned}$$
(8)

where,

$$\begin{aligned} \mathfrak {Y}_{\text {pred}} = 2 \cdot y_{\text {pred}} \cdot \frac{\text {d}^2y_{\text {pred}}}{\text {d}t^2} - \left( \frac{\text {d}y_{\text {pred}}}{\text {d}t}\right) ^2 \end{aligned}$$

and

$$\begin{aligned} \mathfrak {Y}_{\text {real}} = 2 \cdot y_{\text {real}} \cdot \frac{\text {d}^2y_{\text {real}}}{\text {d}t^2} - \left( \frac{\text {d}y_{\text {real}}}{\text {d}t}\right) ^2 \end{aligned}$$
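To make (8) concrete, here is a minimal NumPy sketch of the physical loss, using forward differences as in the first-principle definition above (an illustration under our assumptions, not the authors' implementation):

```python
import numpy as np

def physics_residual(y, dt=1.0):
    """Compute 2*y*y'' - (y')^2 on a discrete series y via forward differences."""
    dy = (y[1:] - y[:-1]) / dt        # first derivative, length n-1
    d2y = (dy[1:] - dy[:-1]) / dt     # second derivative, length n-2
    return 2.0 * y[:-2] * d2y - dy[:-1] ** 2

def physics_loss(y_pred, y_real, dt=1.0):
    """L_phy = RMSE(Y_pred, Y_real), as in Eq. (8)."""
    diff = physics_residual(y_pred, dt) - physics_residual(y_real, dt)
    return np.sqrt(np.mean(diff ** 2))
```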

Following the transfer of the neuronal parameters learned by the RNN to the second segment, we compute \(x\left( t+1\right)\), the next data point in the discrete temporal sequence; this computation may incur an additional data loss, represented as \(\mathfrak {L}_{\text {data}}\), as in (9).

$$\begin{aligned} \mathfrak {L}_{\text {data}}=\text {RMSE}\left( \mathfrak {X}_{\text {pred}},\ \mathfrak {X}_{\text {real}}\right) \end{aligned}$$
(9)

where, \(\mathfrak {X}_{\text {pred}}=\frac{\text {d}^0\left( x_{\text {pred}}(t+1)\right) }{\text {d}t^0}, \frac{\text {d}^1\left( x_{\text {pred}}(t+1)\right) }{\text {d}t^1}, \frac{\text {d}^2\left( x_{\text {pred}}(t+1)\right) }{\text {d}t^2}\), and \(\mathfrak {X}_{\text {real}}=\frac{\text {d}^0\left( x_{\text {real}}(t+1)\right) }{\text {d}t^0}, \frac{\text {d}^1\left( x_{\text {real}}(t+1)\right) }{\text {d}t^1}, \frac{\text {d}^2\left( x_{\text {real}}(t+1)\right) }{\text {d}t^2}\).

Algorithm 1 presents the pseudocode for NwPiNN, and Fig. 3 presents a pictorial view of the NwPiNN architecture. In the context of the pseudocode in Algorithm 1, for the pre-training and training phases, the physical loss is combined with the data loss as \(\mathfrak {L}=\mathfrak {L}_{\text {data}}+\mathcal {C}_1\times \mathfrak {L}_{\text {phy}}\) and \(\mathfrak {L}=\mathfrak {L}_{\text {data}}+\mathcal {C}_2\times \mathfrak {L}_{\text {phy}}\), respectively, where \(\mathcal {C}_1\) and \(\mathcal {C}_2\) are hyperparameters; a minimal sketch of this combined objective follows. For the Newtonian physics informed neural network (NwPiNN), we set both hyperparameters to 0.2, a value chosen through trial and error. Since in this research we work with spatio-temporal data, the NwPiNN model is applied to each of the \(q \times p\) pixels of the lag frames in turn.
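A minimal sketch of the combined objective (illustrative, reusing the `physics_loss` helper from the sketch above; the data-loss reduction over the three derivatives is our reading of Eq. (9)):

```python
import numpy as np

C1 = C2 = 0.2   # hyperparameters, chosen through trial and error

def data_loss(x_pred_derivs, x_real_derivs):
    """L_data = RMSE over the zeroth, first, and second derivatives at t+1 (Eq. 9)."""
    diff = np.asarray(x_pred_derivs) - np.asarray(x_real_derivs)
    return np.sqrt(np.mean(diff ** 2))

def total_loss(x_pred_derivs, x_real_derivs, y_pred, y_real, c):
    """Combined objective L = L_data + c * L_phy,
    with c = C1 in the pre-training phase and c = C2 in the training phase."""
    return data_loss(x_pred_derivs, x_real_derivs) + c * physics_loss(y_pred, y_real)
```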

Algorithm 1 Newtonian physics informed neural network (NwPiNN).

Fig. 4 Bouncy ball synthetic dataset

5 Experimental Setup

This section discusses the data and metrics for the experiment and finally delineates the results.

5.1 Datasets

To test the efficiency of our model, we run it on two synthetic datasets:

  1. Repeated projectile on an elastic surface: a 200-frame, code-generated synthetic dataset with three degrees of freedom, the initial velocity (\(v_{0}\)), the angle of projection (\(\Theta\)), and the coefficient of restitution (\(e_{\text {res}}\)). Varying any (or all) of these parameters has a visible influence on the number of repetitions (\(n_{\text {rep}}\)) and the maximum height (\(y_{\text {max}_i}\)) of the \(n_{\text {rep}}\) repeated projectiles, \(\forall \ 1 \le i \le n_{\text {rep}}\).

  2. Bouncy ball on a hard pitch: again a 200-frame, code-generated synthetic dataset with two degrees of freedom, the initial height (\(y_{\text {max}_1}\)) and the coefficient of restitution (\(e_{\text {res}}\)). Varying one (or both) of these parameters has a visible influence on the number of bounces (\(n_{\text {bounce}}\)) and the maximum height (\(y_{\text {max}_i}\)) of the \(n_{\text {bounce}}\) bounces, \(\forall \ 2 \le i \le n_{\text {bounce}}\).

Figures 5 and 4 give a pictorial representation of the synthetic datasets; a minimal generation sketch for the bouncy ball dataset follows.
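The following is an illustrative sketch of such a generator for the bouncy ball dataset (our reconstruction under stated assumptions, not the released code; gravity, time step, frame size, and ball radius are assumed values):

```python
import numpy as np

def bouncy_ball_frames(y0=180.0, e_res=0.8, n_frames=200, size=200,
                       g=9.8, dt=0.05, radius=4):
    """Simulate a ball dropped from height y0 bouncing on a hard pitch,
    rendered as n_frames gray-value frames of shape (size, size)."""
    frames = np.zeros((n_frames, size, size), dtype=np.uint8)
    y, v = y0, 0.0
    for t in range(n_frames):
        v -= g * dt                      # free fall under gravity
        y += v * dt
        if y <= radius:                  # impact with the pitch
            y = radius
            v = -e_res * v               # restitution damps each bounce
        row = size - 1 - int(round(y))   # image row index grows downward
        r0, r1 = max(row - radius, 0), min(row + radius, size - 1)
        # ball drawn as a bright block at the centre column
        frames[t, r0:r1 + 1, size // 2 - radius:size // 2 + radius] = 255
    return frames

frames = bouncy_ball_frames()            # (200, 200, 200) cuboidal stream
```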

Fig. 5 Projectile synthetic dataset

5.2 Implementation

Newtonian physics informed neural network (NwPiNN) is implemented on a general MIPS 5-stage architecture, with an Intel Core i7-1065G7 microprocessor, an NVIDIA GeForce MX330 graphics accelerator, and an Intel(R) Iris(R) Plus Graphics accelerator.

5.3 Metrics

For the evaluation of the Newtonian physics informed neural network (NwPiNN), we use the following metrics (a minimal implementation sketch follows the list):

  1. Root mean squared error, which calculates the square root of the average squared difference between the estimated values \(\left( \gamma _{i,j,n+1}^{\text {pred}} \right)\) and the actual gray values \(\left( \gamma _{i,j,n+1}^{\text {real}} \right)\) over the \(p \times q\) pixels. Mathematically,

     $$\begin{aligned} \text {RMSE}=\sqrt{\frac{ \sum _{i=1}^{p}\sum _{j=1}^{q}\left( \gamma _{i,j,n+1}^{\text {pred}}-\gamma _{i,j,n+1}^{\text {real}}\right) ^2}{p \times q}} \end{aligned}$$

  2. Mean absolute error, which calculates the average absolute difference between the estimated values \(\left( \gamma _{i,j,n+1}^{\text {pred}} \right)\) and the actual gray values \(\left( \gamma _{i,j,n+1}^{\text {real}} \right)\) over the \(p \times q\) pixels. Mathematically,

     $$\begin{aligned} \text {MAE}=\frac{ \sum _{i=1}^{p}\sum _{j=1}^{q}|\gamma _{i,j,n+1}^{\text {pred}}-\gamma _{i,j,n+1}^{\text {real}}|}{p \times q} \end{aligned}$$
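A minimal NumPy sketch of both metrics over a predicted and an actual frame (function and array names are illustrative):

```python
import numpy as np

def rmse(frame_pred, frame_real):
    """Root mean squared error over all p x q pixel gray values."""
    diff = frame_pred.astype(np.float64) - frame_real.astype(np.float64)
    return np.sqrt(np.mean(diff ** 2))

def mae(frame_pred, frame_real):
    """Mean absolute error over all p x q pixel gray values."""
    diff = frame_pred.astype(np.float64) - frame_real.astype(np.float64)
    return np.mean(np.abs(diff))
```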

5.4 Results

The NwPiNN module is implemented on the synthetic datasets discussed in Sect. 5.1. Table 1 gives a tabulated view of the RMSE and MAE between the predicted visual data and the actual visual data. The results are compared with existing models, namely ConvLSTM [18] and CNN-LSTM [22], in Table 2. For each tabulated entry, we performed five runs of each configuration; the mean values are reported, with the standard deviation from the mean in brackets.

Table 1 Tabulated results for the implementation of the NwPiNN module on the repeated projectile and bouncy ball datasets with variable initial parameters
Table 2 Comparative analysis of the proposed NwPiNN model with the pre-existing ConvLSTM and CNN-LSTM models

6 Discussion

This section is an introspection on the merits and demerits of the proposition, NwPiNN, from a reader's point of view. While the need of the hour is machine intelligence, data engineering is an equally important task: as per expert opinion, the fourth industrial revolution will be driven by artificial intelligence, and data will play a pivotal role in it. Concerning the proposal laid out in this research, the PINN induced by the laws of Newtonian physics has proved to work efficiently for the prediction of image sequences (in layman's terms). On a broader note, though the task performed is time-series forecasting, standalone models like LSTMs or Transformers cannot take up the variation in every derivative of pixel displacement; they can forecast the zeroth-order differentials of the base pixel displacement, but they cannot keep up with the velocity, acceleration, and higher differentials, and for that, a physics informed neural network is needed that can model them based on physical laws. For this purpose, NwPiNN comes into action by combining the modelling capabilities of LSTM with the laws of classical physics. But since everything good comes at a cost, the major challenge (or limitation) of this model is its dependence on computational units (or the semiconductor industry). As a testing phase, we implemented the model on a general MIPS 5-stage architecture, with an Intel Core i7-1065G7 microprocessor, an NVIDIA GeForce MX330 graphics accelerator, and an Intel(R) Iris(R) Plus Graphics accelerator. Still, on a large scale, when dealing with very large datasets of higher dimensionality, the computational requirements will skyrocket. Besides, synthetic datasets were used for testing, so the model's performance may fluctuate in practice, though a promising performance, similar to the results described in Sect. 5.4, can be anticipated.

7 Conclusion

Nothing in this entire universe is complete, and the same holds for our proposition, the Newtonian physics informed neural network (NwPiNN). For a wide variety of cases, the proposed model performs well in comparison with pre-existing models for the time series forecast of visual data, such as ConvLSTM and CNN-LSTM. Due to computational constraints, we considered visual data of \(200 \times 200\) pixels, but NwPiNN can be applied to an even broader spectrum of visuals, given sufficient computational power with GPU units.

A future direction for this research could be optimizing the algorithm into a computationally less expensive model, or ensembling it with lightweight models like NBEATS, NHiTS, or Transformers. In a nutshell, the model has applications throughout the ever-evolving domain of Sci-ML (scientific machine learning). To promote further study and replication, and to advance time series prediction in complex systems, we have made the source code and datasets available at https://github.com/Anurag-Dutta/nwpinn.