After preprocessing the dataset (data collection, analysis, cleaning, wrangling, and selection of independent parameters), we build machine learning-based linear regression (LR) and polynomial regression (PR) models according to the flow diagram presented in Fig. 1. The following four accuracy metrics are used to evaluate the models: R-squared error, adjusted R-squared error, root mean square error (RMSE), and mean absolute error (MAE).
The following four independent parameters from the dataset are used as inputs to LR and PR, and the number of deaths (‘Deaths’) is predicted as the dependent parameter:
x1 = ‘Day’, x2 = ‘People_Tested’, x3 = ‘Active’ and x4 = ‘Confirmed’.
The models are trained on 80% of the input dataset, using linear and polynomial regression on one, two, three, and four parameters drawn from the feature set ‘x1’, ‘x2’, ‘x3’, and ‘x4’. At each stage, the models are evaluated with the four metrics, and the intercept and coefficients are obtained.
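A minimal sketch of this pipeline, assuming scikit-learn and synthetic stand-in data (the actual dataset and column values are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Synthetic stand-in: rows are observations, columns are the four
# independent parameters x1='Day', x2='People_Tested', x3='Active', x4='Confirmed'.
rng = np.random.default_rng(0)
X = rng.random((200, 4)) * [300, 1e6, 1e5, 1e5]
y = 44099.58 + 998.08 * X[:, 0] + rng.normal(0, 500, 200)  # synthetic 'Deaths'

# 80/20 train/test split as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# The four evaluation metrics used in the text.
r2 = r2_score(y_test, y_pred)
n, p = X_test.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
mae = mean_absolute_error(y_test, y_pred)

print(model.intercept_, model.coef_)  # intercept and coefficients
print(r2, adj_r2, rmse, mae)
```

The same loop is repeated for each feature subset and for each polynomial degree; only the feature matrix passed to `fit` changes.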
The input dataset is from a ‘p’ dimensional real space and the output is also from a real space. The data comes from some joint distribution unknown a priori.
$$x\in {R}^{p},\quad y\in R,$$
We try to learn a function f(x) on a training sample dataset (x1, y1), (x2, y2), …, (xN, yN) and validate on the test dataset.
$$f\left(x\right): {R}^{p}\to R,$$
$$ \hat{y} =f\left(x\right)= {\beta }_{0}+ {\beta }_{1}{x}_{1}+{\beta }_{2}{x}_{2}+ \cdots +{\beta }_{p}{x}_{p},$$
where ‘x’ comprises (x1, x2, …, xp); each component corresponds to an attribute that describes the data.
$$f\left(x\right)={\beta }_{0}+ \sum_{j=1}^{p}{\beta }_{j}{x}_{j}.$$
It can also be written as:
$$f\left(x\right)= \sum_{j=0}^{p}{\beta }_{j}{x}_{j},$$
where x0 = 1, so that the intercept β0 is absorbed into the sum.
It is represented in vector form as:
$$f\left(x\right)={x}^{\mathrm{T}}\upbeta ,$$
$$f\left(x\right)=\left[\begin{array}{cccc}1& {x}_{11}& \cdots & {x}_{1p}\\ \vdots & \vdots & \ddots & \vdots \\ 1& {x}_{N1}& \cdots & {x}_{Np}\end{array}\right]\left[\begin{array}{c}{\beta }_{0}\\ \vdots \\ {\beta }_{p}\end{array}\right].$$
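The vector form above leads directly to the ordinary least-squares fit of β; a sketch with NumPy, using a prepended column of ones for the intercept and small synthetic data with known coefficients:

```python
import numpy as np

# Small synthetic sample: N=50 observations, p=2 features.
rng = np.random.default_rng(1)
X = rng.random((50, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]  # known coefficients, no noise

# Design matrix with a leading column of ones so beta[0] is the intercept.
X_design = np.hstack([np.ones((50, 1)), X])

# Least-squares fit: solves min ||X_design @ beta - y||^2.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta)  # approximately [3.0, 2.0, -1.0]
```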
Polynomial regression is used where the dataset to be fitted shows a curvilinear pattern. A polynomial-features class adds polynomial terms to a simple linear regression model: an object of the class transforms the matrix of features into a new matrix of features containing additional independent parameters such as x² that represent the polynomial terms. In other words, the transformation converts a parameter ‘x’ into a new matrix that contains additional independent parameters raised to powers 2, 3, 4, etc.
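The class described is presumably scikit-learn's `PolynomialFeatures`; a sketch of the transformation, expanding a single parameter ‘x’ into powers up to degree 3:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[2.0], [3.0]])  # single independent parameter 'x'

# Transform x into a new feature matrix [1, x, x^2, x^3] for degree 3.
poly = PolynomialFeatures(degree=3)
x_poly = poly.fit_transform(x)
print(x_poly)
# [[ 1.  2.  4.  8.]
#  [ 1.  3.  9. 27.]]
```

A plain `LinearRegression` fitted on `x_poly` then yields the polynomial model, since the model is still linear in the coefficients.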
For a degree = ‘n’ polynomial with one independent parameter ‘x’, the general form of the equation is as:
$$ \hat{y} ={\beta }_{0}+{\beta }_{1}x+{\beta }_{2}{x}^{2}+ \cdots + {\beta }_{n}{x}^{n}.$$
For a degree two polynomial with two independent parameters ‘x1’ and ‘x2’, the equation predicting the number of deaths is given as:
$$ \hat{y} ={\beta }_{0}+{\beta }_{1}{x}_{1}+{\beta }_{2}{x}_{2}+{\beta }_{3}{{x}_{1}}^{2}+{\beta }_{4}{x}_{1}{x}_{2}+{\beta }_{5}{{x}_{2}}^{2}$$
where β0 is the intercept and β1, β2, β3, β4 and β5 are coefficients. This can be generalized for a polynomial of degree three and higher.
The following equations were obtained for LR and PR with one independent parameter x1:
$$ \hat{y} =44099.58+998.08{x}_{1},$$
$$ \hat{y} =32469.98+1495.94{x}_{1} -3.59{{x}_{1}}^{2},$$
$$ \hat{y} =20166.54+2568.63{x}_{1}-22.745{{x}_{1}}^{2}+ 9.15e-2{{x}_{1}}^{3},$$
$$ \hat{y} =18295.42+2.84782e+3{x}_{1}-31.84{{x}_{1}}^{2}+ 0.19{{x}_{1}}^{3}- 3.61742e-4{{x}_{1}}^{4},$$
$$ \hat{y} =20261.29+2.4821e+3{x}_{1}-9.928{{x}_{1}}^{2}- 2.2848e-1{{x}_{1}}^{3}+2.9977e-3{{x}_{1}}^{4}- 9.6088e-6{{x}_{1}}^{5},$$
$$ \hat{y} =19393.42+2.639869e+3{x}_{1}-2.8680e+1{{x}_{1}}^{2}+3.091e-1{{x}_{1}}^{3}-4.12e-3{{x}_{1}}^{4}+3.528e-5{{x}_{1}}^{5}-1.08e-7{{x}_{1}}^{6},$$
$$ \hat{y} =18929.89+2.870903e+3{x}_{1}-4.767232e+1{{x}_{1}}^{2}+1.064{{x}_{1}}^{3}-1.89e-2{{x}_{1}}^{4}+1.88e-4{{x}_{1}}^{5}-9.026e-7{{x}_{1}}^{6}+ 1.63e-9{{x}_{1}}^{7},$$
$$ \hat{y} =26718.82+2.462e+1{x}_{1}+2.3669e+2{{x}_{1}}^{2}-1.161e+1{{x}_{1}}^{3}+2.8158e-1{{x}_{1}}^{4}-3.87e-3{{x}_{1}}^{5}+3.0469e-5{{x}_{1}}^{6}- 1.27362e-7{{x}_{1}}^{7}+2.189e-10{{x}_{1}}^{8}.$$
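The single-parameter models above can be evaluated directly; for example, the linear and degree-two fits, implemented with the coefficients reported in the equations (the day value 100 is purely illustrative):

```python
def lr_deaths(x1):
    """Linear model from the text: y = 44099.58 + 998.08*x1."""
    return 44099.58 + 998.08 * x1

def pr2_deaths(x1):
    """Degree-2 model from the text: y = 32469.98 + 1495.94*x1 - 3.59*x1^2."""
    return 32469.98 + 1495.94 * x1 - 3.59 * x1 ** 2

# Predicted deaths on day 100 under each model.
print(lr_deaths(100))   # 143907.58
print(pr2_deaths(100))  # 146163.98
```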
The following equations were obtained for LR and PR with two independent parameters x1 and x2:
$$ \hat{y} =34728.076+1.7459768e+3{x}_{1 }-1.4014e-3{x}_{2},$$
$$ \hat{y} =22801.69+2.7482e+3{x}_{1 }-1.51e-3{x}_{2}-2.14e+1{x}_{1 }{x}_{2}+2.311\mathrm{e}-5 {{x}_{1}}^{2}+1.0e-11{{x}_{2}}^{2},$$
$$ \hat{y} =36439.88+1.83e-9{x}_{1 }+ 1.25e-9{x}_{2}-2.52\mathrm{e}-13 {{x}_{1}}^{2}-5.4\mathrm{e}-8 {x}_{1 }{x}_{2}+ 8.14\mathrm{e}-10 {{x}_{2}}^{2}-3.43\mathrm{e}-11 {{x}_{1}}^{3} -5.37\mathrm{e}-6 {{x}_{1}}^{2}{x}_{2}+7.0\mathrm{e}-12 {{x}_{2}}^{2}{x}_{1}-6.28\mathrm{e}-18 {{x}_{2}}^{3}.$$
The coefficients of the equations above with one and two parameters, as well as those with three and four parameters, were generated with a machine learning library. However, the polynomial equations with three and four parameters become very long and are therefore not reproduced here.