Introduction

Because conventional energy resources are limited and harm the environment, renewable sources such as solar energy are the most attractive alternative, given their abundant availability in most parts of the world. The solar energy reaching the Earth is sufficient to meet all human energy requirements: about 173,000 terawatt hours (TWh) of energy strikes the Earth in one hour, whereas world electricity consumption for all purposes (industrial, household, vehicles, cooling, heating, etc.) in 2022 was 25,500 TWh [1]. However, because the output of photovoltaic (PV) solar generation is unpredictable, it remains unattractive to some consumers. Predictive analysis of the power system is therefore required to meet the demand for quicker and more accurate assessments that support the operation and control of contemporary power systems.

According to the International Renewable Energy Agency (IRENA), global renewable power capacity at the end of 2021 was 3064 GW, as shown in Fig. 1. With a capacity of 1230 GW, hydropower made up the largest share of the global total, but in the last few years solar has accounted for the major portion of the expansion. Most of the remaining capacity was split almost equally between solar and wind power, with capacities of 849 GW and 825 GW, respectively [2]. In India, according to the Ministry of New and Renewable Energy (MNRE), installed solar power capacity reached around 61.97 GW as of November 2022. Additionally, 23.14 GW of capacity is in various phases of bidding, while 60.66 GW is in various stages of implementation [3]. India ranked 4th in solar PV deployment worldwide as of the end of 2022. In this work, we use meteorological data to estimate the power output of PV panels using machine-learning algorithms. Before machine-learning methods became common, researchers used statistical tools for such estimation.

Fig. 1 Global Renewable Energy Status (Source: International Renewable Energy Agency (IRENA))

The first linear model, presented by Angstrom, assumes a completely clear sky and describes the linear relationship between global solar irradiation (H) and sunshine duration (S). Prescott (1948) replaced the perfectly clear sky reference with the extraterrestrial (fully transparent sky) radiation, which yields the Angstrom-Prescott relation given in Eq. 1 [4].

$$\frac{{\text{H}}}{{{\text{H}}}_{{\text{ext}}}} =\mathrm{ a }+\mathrm{ b}\frac{{\text{S}}}{ {{\text{S}}}_{{\text{ext}}}}$$
(1)

Here, Hext is the monthly average daily extraterrestrial radiation, Sext is the maximum monthly average daily sunshine duration (day length), and a and b are empirical regression coefficients. To forecast solar radiation, data covering various horizons (the length of the real solar-related meteorological record) have been used as input. Weekly predictions are considered long-term forecasts, and minute-by-minute forecasts of solar power are considered short-term forecasts [5, 6]. Statistical models, both linear and non-linear, such as persistence, Autoregressive Moving Average (ARMA), and Autoregressive Integrated Moving Average (ARIMA), are employed to capture the stochastic nature of solar energy. These techniques have several drawbacks, including their computational cost and their inability to adapt to time-varying time-series systems. The persistence model predicts the future value of the radiation from its present value, but it is inaccurate for horizons beyond 1 h and under sudden weather changes. ARMA combines autoregressive (AR) and moving average (MA) components and uses data generated or obtained sequentially, known as time-series data, to estimate future values. The main disadvantages of ARMA are its high computational complexity, the difficulty of ensuring convergence, and its requirement for a stationary time series. ARIMA was developed to deal with nonstationary time-series data. It is the most accurate of these models, but it is unstable with respect to both fluctuating observations and changes in model specification. Because ARIMA parameters are defined manually, finding the most accurate fit is a time-consuming process [7].
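As an illustration of Eq. 1, the following minimal sketch evaluates the Angstrom-Prescott relation for a single month. The function name and the default coefficients a = 0.25 and b = 0.50 are assumptions chosen for illustration only; they are not values fitted in this study, and in practice a and b are obtained by regression on site data.

```python
def angstrom_prescott(h_ext, s, s_ext, a=0.25, b=0.50):
    """Estimate global irradiation H from Eq. 1: H / H_ext = a + b * (S / S_ext).

    a and b are illustrative defaults (assumption), not coefficients from this study.
    """
    if s_ext <= 0:
        raise ValueError("s_ext must be positive")
    return h_ext * (a + b * (s / s_ext))


# Hypothetical month: H_ext = 10.0 (any energy unit), 9 sunshine hours, 12 h day length.
h_est = angstrom_prescott(h_ext=10.0, s=9.0, s_ext=12.0)
print(h_est)  # 10.0 * (0.25 + 0.50 * 0.75) = 6.25
```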

The difficulty and computation time can be reduced by adopting Artificial Neural Network (ANN) techniques. ANN-based techniques have several advantages over traditional ones, including higher accuracy, simplicity in updating, ease of maintenance, and the ability to handle incomplete inputs [8]. Neural networks can carry out a wide range of tasks, including nonlinear estimation, grouping, pattern recognition, and optimization. Different ANN techniques have been used for solar energy estimation [9,10,11]. Mellit et al. [12] used a multilayer perceptron model to estimate global solar irradiation 24 h in advance, with daily mean air temperature and daily mean irradiance as the input parameters. S. K. Aggarwal et al. [13] used a Feedforward Neural Network (FFNN) to predict the available solar energy as part of the American Meteorological Society's (AMS 2013–2014) prediction competition. This method performs better than Least Squares Regression (LSR) on data from numerical weather forecasts. Silva A. et al. [14] used ANNs to predict hourly PV power with the Perceptron type with Multiple Layers (PML) and Radial Base Functions (RBF), and found that the PML method is more accurate than the RBF method. Vakili et al. [15] estimated the daily global solar radiation at the surface in Tehran using the PML technique with meteorological data (maximum and minimum daily temperature, relative humidity, and wind speed), including suspended particulate matter (PM10 and PM2.5) in the atmosphere, and reported a MAPE of 1.5% and an absolute fraction of variance of 99%. The main disadvantages of ANNs are their dependence on large and complete datasets, their lack of interpretability, and their limitation to short-term forecasting.

In an effort to produce accurate forecasts, ML and DL methods have been deployed in recent research [16,17,18,19]. However, they require large amounts of data for training, validation, and testing. The accuracy of ML algorithms depends on the amount of training data and the chosen model parameters [20]. Researchers have suggested various ML approaches to predict solar power, wind power, irradiance, load, and power demand [21,22,23,24,25]. Meng F. et al. [26] suggested a hybrid model that combines the deep Wavelet Transform Package (WTP), Generative Adversarial Networks (GAN), and the Dragonfly Algorithm (DA) for solar energy prediction. This approach first breaks down the irradiance data into its constituent harmonics and then trains a deep GAN-based model. To identify the ideal settings, the generator network is adjusted using an adaptive modified DA technique. The proposed method gives better results than other ML techniques in terms of statistical measures such as MAPE and RMSE. Gupta R. et al. [27] used Facebook Prophet and XGBoost to perform Time Series Forecasting (TSF) of solar energy production and determined that the XGBoost model gives more precise estimates and a better fit; the MAPE of XGBoost and Facebook Prophet was 10.9% and 21.8%, respectively. Mutavhatsindi et al. [28] used the convex combination method and Quantile Regression Averaging (QRA) to compare predictions from ML models, and found that QRA is better than statistical and traditional ML models. Zhao et al. [29] proposed fault detection and classification of PV modules using Graph-based Semi-supervised Learning (GBSSL) with only 1% of the total data set together with unlabeled data, while Alaraj et al. [30] proposed an ensemble tree approach to ML using meteorological and geographical data from the Kingdom of Saudi Arabia. Besides solar power forecasting, ML and DL techniques, as depicted in Fig. 2, are also used for Maximum Power Point Tracking (MPPT) and for battery life, load, failure, and tariff prediction [31,32,33,34].

Fig. 2 Some popular Machine Learning Algorithms

A Convolutional Neural Network (CNN) is a type of Deep Neural Network (DNN) that is mainly used for computer vision and image identification; however, it can also solve problems with sequential data, such as time-series data. Alam et al. [35] suggested several neural network types for short-term PV power forecasting, namely CNN, CNN-Long Short-Term Memory (CNN-LSTM), and multi-headed CNN, alongside traditional methods such as ARMA and MLR. Heo J. et al. implemented a multi-channel CNN model on meteorological and geographical datasets to forecast monthly PV solar power. This method utilizes geographical and meteorological features of PV sites from raster image datasets. A MAPE of 8.639% was observed for the applied technique, which is better than conventional methods such as multiple linear regression (16.187%) and ANN (15.991%) [36]. Cheng et al. [37] suggested power prediction based on intra-hour satellite measurements using a spatio-temporal Graph Neural Network (GNN) and concluded that it is more accurate than statistical and CNN-based models. CNN models have also been applied to infrared (IR) thermographic images and the solar panel's temperature to detect malfunctioning solar PV modules [38, 39].

Long Short-Term Memory (LSTM) networks are another type of DNN that is extensively used by researchers to estimate solar power because of their strong capability to handle sequential data [40]. Obiora C. et al. [41] proposed a ConvLSTM model for irradiance forecasting to mitigate the effects of solar PV power fluctuations in Johannesburg. The reported nRMSE was 1.51% (for a ten-year dataset). A similar study was reported in China, where a hybrid LSTM-CNN model is used to estimate solar power and to extract the spatio-temporal features of PV data [42]. Gao M. et al. [43] suggested an LSTM-based model for day-ahead power forecasting using Numerical Weather Prediction (NWP) data, while a frequency-domain decomposition and LSTM model was proposed by Wang L. et al. [44]. Sarmas E. et al. [45] developed a stacked LSTM model with three Transfer Learning (TL) strategies to forecast solar PV plant production accurately using a limited data set. Bui L. et al. [46] demonstrated a similar technique to predict the output of a large solar PV power plant in Vietnam under curtailment conditions. Djaafari A. et al. [47] proposed a hybrid model in which the Balance-Dynamic Sine–Cosine Algorithm is combined with an LSTM predictor for accurate estimation of Direct Normal Irradiation (DNI), with a relative RMSE of less than 2.07% and all correlation coefficients greater than 0.99.

The objective of this paper is to estimate solar power using ML algorithms with meteorological data as input. From the literature, we observed that solar irradiance is one of the most important meteorological parameters; however, it is not easily available at every solar site. To overcome this problem, we considered two approaches: the first uses solar irradiance as one of the input parameters, and the second omits it. In addition to the hourly forecast, we have also estimated the solar power generation for different seasons (summer, winter, monsoon, autumn, and spring). The results are compared with standard techniques and are found to be in good agreement in terms of various statistical measures such as RMSE, MAPE, MAE, and the coefficient of determination R2. The paper is organized as follows:

  • Section "Data Description and Methodology" describes the data and the solar PV site used in this work. It also explains the methodology, the tools used for the estimation, and the pre-processing techniques applied.

  • Section "Results and Discussion" presents the results obtained from the machine-learning algorithms and compares them with similar existing work in the same field.

  • Section "Conclusion" concludes the paper and outlines future directions.

Data Description and Methodology

Data Description

Data were obtained from an amorphous silicon solar power plant located in Dhar, Madhya Pradesh, India, shown in Fig. 3(a). The total power generation capacity of this plant is 79.95 kW, as shown in Fig. 3(b). Three years of data were collected from this site, covering 1096 days from January 1, 2020, to December 31, 2022. We collected 14,248 observations from the site, recorded between 6:00 and 18:00. The data are available on the cloud portal of AVAADA Energy Company (service provider: Intello Tech AMC Pvt. Ltd., website: https://portal.intellotechsolutions.co.in), as shown in Fig. 4. The parameters recorded at the plant location are year, month, time (hours), temperature (temp), and irradiance. The parameters wind speed, surface pressure, and humidity, which are not available at the plant location, were collected from the NASA website (https://power.larc.nasa.gov) and modified according to the requirements of the model. The data obtained from the site are in raw form, so pre-processing is required before they can be used for estimation. The major pre-processing tasks are removing ambiguous values, removing duplicate values, retaining only sunshine-hour data, dividing the dataset into five seasons, and filling in missing values. The other important part is converting the hourly data set into a daily data set by averaging. The daily data set was converted into a monthly data set by a similar approach. Solar power in each season was estimated by considering the months that belong to that season.
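The stated record length and observation count can be cross-checked with a short calculation. The sketch below assumes one reading per hour from 06:00 to 18:00 inclusive (13 readings per day), an assumption that reproduces the figures quoted above.

```python
from datetime import date

start, end = date(2020, 1, 1), date(2022, 12, 31)
n_days = (end - start).days + 1        # inclusive of both end dates; 2020 is a leap year
hours_per_day = len(range(6, 19))      # 06:00, 07:00, ..., 18:00 (assumed hourly sampling)
n_obs = n_days * hours_per_day

print(n_days, hours_per_day, n_obs)    # 1096 13 14248
```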

Fig. 3 (a) Graphical representation of the Dhar district, Madhya Pradesh, India; (b) Solar site view

Fig. 4 Online cloud portal view of AVAADA energy company

  • Software & Equipment: The analysis was carried out in Python 3.8.8 on notebook server version 6.3.0. Processing was done on an Nvidia GeForce RTX 3090 24 GB GPU with 128 GB of RAM (Corsair Vengeance RGB PRO DDR4 3200 MHz) and a 3.2 GHz Intel i9 processor (12 cores, 24 threads). Python is an open-source tool widely used by researchers, and its machine-learning libraries were used for the estimation of solar power.

Methodology

The methodology estimates solar power using different machine-learning and deep-learning techniques. Figure 5 illustrates the complete process flow for solar power estimation. The estimation work is divided into the following steps:

Fig. 5 Flowchart showing the methodology in forecasting solar power using AI techniques

  • Step-1: Data collection and pre-processing

To avoid modelling periods of darkness at night and diminished brightness, the original dataset was restricted to the timeframe 6:00–18:00. This restriction also reduced the computational time of data processing. The data obtained from the site are in raw form, so pre-processing is required before they can be used for estimation. The major pre-processing tasks are the removal of duplicate values, interpolation of missing values, normalisation of the data, conversion of hourly data to day-wise and month-wise data by averaging, and removal of ambiguous values.

  • Removal of duplicates and interpolation of missing values: The meteorological data collected from the plant contained some missing and duplicated values. As part of pre-processing, the duplicate values were first eliminated, and the missing values were then estimated using linear interpolation.

  • Normalization of data: The input parameters differ widely in their ranges, which makes the data challenging to model. The wide range of the data is evident from the Min (minimum) and Max (maximum) values in Table 3. To bring the data into a consistent range between 0 and 1, it was normalised using the min–max technique defined in Eq. (2):

    $$X' = \frac{X - X_{Min}}{X_{Max} - X_{Min}}$$
    (2)

    where X′ is the normalised value, X is the input meteorological value, and XMin and XMax are the minimum and maximum values of the data.

  • Conversion of data from hourly to daily and monthly format: The Data Acquisition Card (DAC) captures meteorological variables and power generation data at an hourly frequency. The proposed study requires data in both daily and monthly formats. To obtain these, the available data were averaged and converted into daily, monthly, and eventually seasonal formats.
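The pre-processing operations of Step-1 can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the authors' code: the column names `temp` and `power`, the helper names `preprocess` and `min_max`, and the tiny synthetic frame are all hypothetical.

```python
import numpy as np
import pandas as pd


def preprocess(df):
    """Drop duplicate timestamps, fill gaps by linear interpolation, keep 06:00-18:00."""
    df = df[~df.index.duplicated(keep="first")].sort_index()
    df = df.interpolate(method="linear")
    return df[(df.index.hour >= 6) & (df.index.hour <= 18)]


def min_max(df):
    """Eq. (2): X' = (X - X_min) / (X_max - X_min), applied column by column."""
    return (df - df.min()) / (df.max() - df.min())


# Synthetic hourly records (not plant data) with one duplicate row and two gaps.
idx = pd.to_datetime([
    "2020-01-01 10:00", "2020-01-01 11:00", "2020-01-01 11:00",
    "2020-01-01 12:00", "2020-01-01 13:00",
])
raw = pd.DataFrame(
    {"temp": [20.0, np.nan, np.nan, 24.0, 26.0],
     "power": [10.0, 20.0, 20.0, np.nan, 40.0]},
    index=idx,
)

clean = preprocess(raw)
scaled = min_max(clean)
daily = clean.resample("D").mean()     # hourly -> daily averages
monthly = daily.resample("MS").mean()  # daily -> monthly averages (month-start labels)
```

Interpolating before normalising keeps the minimum and maximum used in Eq. (2) consistent with the values the models eventually see; whether the original study used this ordering is an assumption.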

  • Step-2: Data sorting and splitting

    Our dataset contains 24 h of data for solar power generation and meteorological parameters. Data sorting is required to remove unnecessary values from the dataset. During data sorting, we removed the values from 18:00–6:00, for which solar radiation is either very low or not available. This also reduces the computational time of data processing. After sorting, the next step is to split the data into training, testing, and validation sets. The dataset contains 14,248 observations, of which about 80% (11,398 observations) are used for training, 10% for testing, and the remaining 10% for validation, i.e., 1425 observations in each of the latter two sets.
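The 80/10/10 split described above can be sketched with NumPy as shown below. The function name, the random shuffling, and the seed are assumptions; the text does not state whether the original split was random or chronological, and a chronological split may be preferable for forecasting.

```python
import numpy as np


def split_indices(n, train_frac=0.8, seed=0):
    """Return shuffled index arrays for training, testing, and validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(n * train_frac)      # floor of 80%
    n_test = (n - n_train) // 2        # remaining 20% halved
    train = order[:n_train]
    test = order[n_train:n_train + n_test]
    val = order[n_train + n_test:]
    return train, test, val


train_idx, test_idx, val_idx = split_indices(14_248)
print(len(train_idx), len(test_idx), len(val_idx))  # 11398 1425 1425
```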

  • Step-3: Machine Learning (ML) & Deep-Learning Algorithm Implementation

    Modern methods for very short-term (VST, 1 min to 1 h) and short-term (ST, 1 h to 1 day) forecasting include ML algorithms and meta-heuristic optimisation methods inspired by the DNN model. In recent years, ML methods have outperformed conventional empirical methods for solar power forecasting [37]. The literature contains numerous examples of ML methods used by researchers for estimating solar power [48,49,50,51]. We therefore implemented a Deep-Learning Feedforward Neural Network (DL-FFNN) and the following machine-learning techniques in the present study: Linear Regression (LR), Ridge, Random Forest (RF), Decision Tree (DT), Gradient Boosting Classifier (GBC), Least Absolute Shrinkage and Selection Operator (Lasso), Adaptive Boost Classifier (ADC), Support Vector Regression (SVR), K-Nearest Neighbour (KNN), and Elastic Net (EN). The DL-FFNN model has 10 layers with 512 neurons each and uses the Rectified Linear Unit (ReLU) activation function and the Adam optimizer. The ML and DL algorithms used in the present study are available in Python libraries and were implemented in Python on notebook server 6.3.0, using the hardware listed under Software & Equipment.
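To make two of the listed regressors concrete, the sketch below fits LR and Ridge in closed form with NumPy. It is an illustration only: the study used library implementations, and the function names, the penalty value `alpha=1.0`, and the noise-free synthetic data are assumptions.

```python
import numpy as np


def fit_linear(X, y, alpha=0.0):
    """Least squares with an optional L2 penalty on the coefficients.

    alpha = 0 gives ordinary linear regression; alpha > 0 gives ridge regression.
    The intercept is handled by centring and is not penalised.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def predict(X, coef, intercept):
    return X @ coef + intercept


# Synthetic, noise-free data (not plant data): y = 2*x0 - 1*x1 + 0.5*x2 + 3
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(200, 3))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + 0.5 * X_demo[:, 2] + 3.0

coef_lr, b_lr = fit_linear(X_demo, y_demo)                  # recovers the true coefficients
coef_ridge, b_ridge = fit_linear(X_demo, y_demo, alpha=1.0)  # coefficients shrunk towards zero
```

The only difference between the two fits is the `alpha * I` term, which is why LR and Ridge tend to give nearly identical results when the penalty is small relative to the data.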

  • Step-4: Feature Engineering: Estimating power for seasonal variation

    Feature engineering involves extracting relevant features from the meteorological data to estimate solar power production in various seasons. Because one objective of the proposed work is to determine how power varies across seasons, feature engineering provides the input parameters required for seasonal power estimation. We use one-hot encoding to encode the categorical variable season. Another feature-engineering step is to keep only the data between 06:00 and 18:00, removing the dark hours. A further step is the extraction of cyclic patterns in hours and months, which are used for estimating daily and monthly power generation. The Python libraries pandas, numpy, sklearn, seaborn, and matplotlib.pyplot were used for coding.
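The encoding steps of Step-4 can be sketched as follows. The exact formulas behind the cyclic features are not given in the text, so the sine transforms below (a half-wave over the 06:00–18:00 window for the hour and a full annual cycle for the month) are assumptions, as are the function and column names.

```python
import numpy as np
import pandas as pd


def add_cyclic_and_season_features(df):
    """Add sine-encoded hour/month columns and one-hot season columns."""
    out = df.copy()
    # Assumed form: 0 at 06:00 and 18:00, 1 at 12:00.
    out["sine_hr"] = np.sin(np.pi * (out["hour"] - 6) / 12)
    # Assumed form: one full cycle per year.
    out["sine_mon"] = np.sin(2 * np.pi * out["month"] / 12)
    season_dummies = pd.get_dummies(out["season"], prefix="season", dtype=int)
    return pd.concat([out.drop(columns="season"), season_dummies], axis=1)


# Hypothetical rows, for illustration only.
demo = pd.DataFrame({
    "hour": [6, 12, 18],
    "month": [1, 7, 4],
    "season": ["winter", "monsoon", "summer"],
})
feats = add_cyclic_and_season_features(demo)
```

A sine transform of the hour peaks near solar noon, which is one plausible reason such a feature would track power output more closely than the raw hour value.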

    In order to achieve precise predictions in deep learning, researchers have experimented with numerous layers containing different numbers of neurons, using various optimisation techniques. It has been determined that a DL-FFNN approach with 10 layers, each containing 512 neurons, utilizing the rectified linear unit (ReLU) activation function and the Adam optimizer, yields favourable results.
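The shape of the DL-FFNN described above can be sketched as a plain NumPy forward pass. This is not the trained model: it assumes that the "10 layers" are hidden layers followed by a single linear output neuron, that there are six inputs, and that weights use He initialisation. Training with the Adam optimizer is omitted.

```python
import numpy as np


def init_ffnn(n_inputs, n_hidden_layers=10, width=512, seed=0):
    """Create (weights, bias) pairs for each layer, including a 1-neuron output layer."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs] + [width] * n_hidden_layers + [1]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))  # He init (assumption)
        params.append((W, np.zeros(fan_out)))
    return params


def forward(params, X):
    """ReLU on every hidden layer, linear output for regression."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W_out, b_out = params[-1]
    return (h @ W_out + b_out).ravel()


params = init_ffnn(n_inputs=6)                      # six inputs is an assumption
n_params = sum(W.size + b.size for W, b in params)  # total trainable parameters
y_hat = forward(params, np.zeros((4, 6)))           # batch of 4 all-zero samples
```

With roughly 2.4 million parameters under these assumptions, the network is far larger than the linear models, which is one reason the text stresses the need for large training sets.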

  • Step-5: Accuracy Evaluation

    To evaluate the accuracy of the machine learning models used in this study, we applied the statistical metrics listed in Table 1. Table 1 includes four error metrics: MAE, MAPE, MSE, and RMSE. For an accurate power forecast, the values of these error measures should be as small as possible.

Table 1 Validation Metrics with formulae and description

where N is the number of observations and i denotes the ith observation. SSRegression and SSTotal are the sum of squared regression errors and the total sum of squares, respectively, which can be evaluated as

\(SS_{\text{Regression}} = \sum \left(y_i - y_{\text{Regression}}\right)^{2}\)

the sum of squared differences between each data point and the corresponding regression value, and

\(SS_{\text{Total}} = \sum \left(y_i - \overline{y}\right)^{2}\)

the sum of squared differences between each data point and the mean value.

Table 2 elaborates on the interpretation of the coefficient of determination, R2. A value of 1 indicates that the model estimate is exact, i.e., there is no error. If the value is zero or less, the model performs no better than simply predicting the mean of the data. In practice, the value of R2 lies between 0 and 1.
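The evaluation metrics can be computed as in the sketch below. It uses the standard definitions, which are assumed to match Table 1, with R2 = 1 − SSRegression/SSTotal; the function name and the four sample values are hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, MAPE (%), MSE, RMSE, and R2 for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    ss_regression = np.sum(err ** 2)                   # squared residuals
    ss_total = np.sum((y_true - y_true.mean()) ** 2)   # squared deviations from the mean
    return {
        "MAE": np.mean(np.abs(err)),
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),  # undefined if y_true contains zeros
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": 1.0 - ss_regression / ss_total,
    }


# Hypothetical actual and predicted power values.
m = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
# MAE = 2.5, MAPE = 11.875 %, MSE = 6.5, RMSE = sqrt(6.5), R2 = 1 - 26/500 = 0.948
```

Because RMSE is the square root of MSE, the two values reported for a model should always be consistent with each other, which is a useful sanity check on tabulated results.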

Table 2 Performance relation to R2 value

  • Overfitting and Underfitting: Overfitting occurs when an ML model attempts to fit all (or more than necessary) of the data points in the dataset. As a result, the model starts to capture faulty values and noise from the dataset, which reduces its efficiency and accuracy. Cross-validation, training with extra data, and removing features are among the techniques used to reduce overfitting. Underfitting occurs when an ML model is not able to identify the underlying trend in the data. It happens when a model is unable to learn properly from the training dataset, resulting in reduced accuracy and inaccurate predictions.
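Cross-validation, mentioned above as a remedy for overfitting, can be sketched with a simple k-fold index generator. The function name, k = 5, and the shuffling seed are assumptions; the text does not specify the scheme that was used.

```python
import numpy as np


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs so that every sample is validated exactly once."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(n), k)
    for i in range(k):
        val = parts[i]
        train = np.concatenate([parts[j] for j in range(k) if j != i])
        yield train, val


folds = list(k_fold_indices(103, k=5))
```

A large gap between the training error and the average validation error across folds suggests overfitting, while high error on both suggests underfitting.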

Results and Discussion

This section provides a statistical description of the numeric variables as well as a correlation analysis between different factors in the dataset. It also explains the performance metrics for the various machine-learning approaches. Subsequently, a thorough evaluation is conducted by comparing the current study with previous research, using performance indicators such as the R2 score, RMSE, and others.

Statistical Information

Table 3 describes the range of the input and output parameters used in the present work. The meteorological parameters considered as input are hours, humidity, temperature, irradiance, wind speed, and pressure, while solar power is used as the output. Table 3 gives the minimum, maximum, and mean values of each parameter. Before being used as input for the machine learning models, the data were scaled to account for the varying ranges of the parameters. The mean irradiance is approximately 382.97 Wh/m2, and the mean temperature is 27.68 °C, which indicates that the site is suitable for solar power generation.

Table 3 Range for meteorological variable

Correlation Coefficient

The correlation method is used to determine the relationship between the different input variables and the output. The results obtained with the Pearson correlation coefficient are shown in Fig. 6. Correlation coefficients indicate the degree of correlation between two variables. Equation 3 gives the mathematical expression for Pearson's correlation coefficient.

$$\mathrm{Pearson\; correlation\; coefficient} = r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}$$
(3)

where x is the independent variable, y is the dependent variable, n is the sample size, and Σ denotes the sum over all values. Solar power is comparatively strongly related to ambient temperature and irradiance, with correlation coefficients of 0.36 and 0.59, respectively. In contrast, wind speed, humidity, year, and hours exhibit a rather weak correlation with output power.
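Pearson's coefficient in its computational form, with the square root taken over the product of both bracketed terms in the denominator, can be sketched as follows. The function name and the five sample points are hypothetical; the result is cross-checked against NumPy's built-in `np.corrcoef`.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient from raw sums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    den = np.sqrt((n * np.sum(x ** 2) - np.sum(x) ** 2)
                  * (n * np.sum(y ** 2) - np.sum(y) ** 2))
    return num / den


x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
r_demo = pearson_r(x_demo, y_demo)           # 30 / sqrt(50 * 30), about 0.7746
r_check = np.corrcoef(x_demo, y_demo)[0, 1]  # library value for comparison
```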

Fig. 6 Correlation analysis including encoded features for District Dhar, MP India

The next step is to extract the features required for seasonal power estimation from the available input parameters. Four new variables, sine_hr (hour), sine_mon (month), season monsoon, and season summer, were therefore created as new features, and they enhanced the performance of the machine learning models used for seasonal power estimation. The new feature sine_hr is highly correlated with power, with a coefficient of 0.83, as shown in Fig. 7.

Fig. 7 Correlation analysis for the Dhar District (MP-India) site, including encoded features

Metrics Results for Various ML Models

Prediction with Solar Irradiance as Input

Table 4 presents the results of the various algorithms deployed for solar power estimation. Among all ML techniques, the LR approach has the highest accuracy. The performance of the ML models was evaluated with the R2 score, MAE, MAPE, RMSE, and variance. The R2 scores of LR, Ridge, RF, and DT are approximately 0.9999, which means that the estimates closely match the actual values. However, the EN model had the lowest R2 score of 0.870249. The MAE, MSE, and RMSE metrics quantify the error in estimation. The LR model exhibited the lowest values of MAE, MSE, and RMSE, i.e., 0.0091, 0.0146, and 0.121, respectively, whereas the EN model had the highest values of 4.6947, 35.3035, and 5.941, respectively.

Table 4 Performance Parameters of various Machine Learning with and without Irradiance

We have compared the results of the proposed technique with similar work in the same field. Table 5 compares the results of the current approach with similar studies performed at different locations using machine-learning and deep-learning models. The results are in good agreement with those reported by other researchers.

Table 5 Comparison of the proposed work with existing work

Figure 8 illustrates the actual and forecasted PV power produced by the different ML algorithms used in the present work. The estimates obtained using the DL-FFNN, LR, Ridge, DT, GBC, and SVR methods are more accurate than the predictions made by the KNN and EN models.

Fig. 8 Month-wise actual and predicted power by various ML models

Figure 9 shows the predicted versus actual power on the testing data for all the ML algorithms. The LR, Ridge, RF, DT, GBC, and Lasso algorithms (a–f) produce a straight-line graph. The ADC, SVR, KNN, and EN algorithms (g–j), on the other hand, produce graphs in which the points are scattered between predicted and test values across the whole range. The LR graph is the most linear, reflecting its highest accuracy (highest R2 score), whereas the EN graph is the least linear, reflecting its lowest accuracy (lowest R2 score).

Fig. 9 Comparison Chart of testing data with predicted data

Figure 10 shows the performance metrics for all the ML algorithms considered. The graphs show that the R2 score and error measures such as RMSE and MAE are inversely related: as the errors increase, the R2 score decreases. The LR and Ridge algorithms show almost identical results, with the lowest errors and the highest R2 scores, which makes them the most accurate models. The arrow indicates that the error metrics increase from the LR to the EN approach.

Fig. 10 Relationship of performance metrics parameters with different ML Algorithms

Prediction without Solar Irradiance

One of the key objectives of this research is to determine solar power without using solar irradiance as an input parameter. The high cost of measuring equipment makes it difficult to obtain solar irradiance data at every plant location. Therefore, we estimated the solar power with all machine-learning algorithms without including solar irradiance as an input parameter. Table 4 shows a decline in the performance of the machine-learning models in terms of the statistical measures. However, the deviation in the R2 score is minimal. For the LR model, the R2 score (coefficient of determination) is 0.99995 when solar irradiance is used as input and 0.99994 without it. The root mean square error (RMSE) is 0.121 with solar irradiance and 0.125 without it. Figure 11 compares the R2 score with irradiance and the R2 score without irradiance for the different machine learning models. It is evident from the graph that the performance of the model decreases significantly when compared to other machine learning models, with a score of 0.95964 when solar irradiance is considered and 0.93932 when it is not. Therefore, it can be concluded that the proposed techniques can estimate solar power without solar irradiance when irradiance data are not available at the plant location.
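The with/without comparison can be illustrated on synthetic data by fitting the same linear model on two feature sets. Everything in this sketch is an assumption made for illustration: the data-generating formulas, the feature names, and the helper `r2_of_linear_fit`. The numbers it produces say nothing about the plant data.

```python
import numpy as np


def r2_of_linear_fit(X, y):
    """In-sample R2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)


# Synthetic stand-ins for the real variables (not plant data).
rng = np.random.default_rng(0)
n = 500
hour = rng.integers(6, 19, size=n)
sine_hr = np.sin(np.pi * (hour - 6) / 12)
irradiance = 900.0 * sine_hr + rng.normal(0.0, 50.0, n)
temp = 20.0 + 10.0 * sine_hr + rng.normal(0.0, 2.0, n)
power = 0.05 * irradiance + 0.1 * temp + rng.normal(0.0, 1.0, n)

X_with = np.column_stack([sine_hr, temp, irradiance])
X_without = np.column_stack([sine_hr, temp])

r2_with = r2_of_linear_fit(X_with, power)
r2_without = r2_of_linear_fit(X_without, power)
```

When the remaining features are strongly correlated with irradiance, as the hour-based feature is here, dropping irradiance costs relatively little accuracy, which is consistent with the small deviation reported above.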

Fig. 11 Relationship between R2 score (with irradiance) and R2 score (without irradiance)

Prediction for Seasonal Variation

Another goal of the present research is to forecast the power output for various seasons. The plant location in Dhar district, Madhya Pradesh, experiences five distinct seasons: winter (December and January), spring (February and March), summer (April, May, and June), monsoon (July and August), and autumn (September, October, and November). Since each season spans different months of the year, the data must be organised month-wise in order to estimate seasonal power. Statistical averaging techniques were used to compile the monthly data. Figure 12 displays the monthly forecast for the different months of the year, and Fig. 13 shows the seasonal power variation. After the monthly forecast is complete, the next stage is to evaluate the seasonal power by employing feature engineering and averaging. Our investigation shows that the spring season, with an average power generation of 17.38 kW per day, is the most favourable period. Conversely, the monsoon season, with an average power generation of 11.26 kW per day, is the least favourable. These monthly and seasonal estimates help in planning ahead, so that resources can be scheduled in advance for periods when solar generation is poor.
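The month-to-season grouping described above can be sketched as a simple lookup followed by averaging. The season definitions follow the text; the function name and the placeholder monthly values are hypothetical and are not plant data.

```python
SEASON_BY_MONTH = {
    12: "winter", 1: "winter",
    2: "spring", 3: "spring",
    4: "summer", 5: "summer", 6: "summer",
    7: "monsoon", 8: "monsoon",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_means(monthly_power):
    """Average monthly values (dict: month number -> value) within each season."""
    buckets = {}
    for month, value in monthly_power.items():
        buckets.setdefault(SEASON_BY_MONTH[month], []).append(value)
    return {season: sum(vals) / len(vals) for season, vals in buckets.items()}


# Placeholder values: month m is given the value m, purely to show the grouping.
demo_monthly = {m: float(m) for m in range(1, 13)}
demo_seasonal = seasonal_means(demo_monthly)
```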

Fig. 12 Month-wise solar power forecasting

Fig. 13 Seasonal power forecast

Conclusion

This paper focuses on the prediction of solar power generation in Dhar District, Madhya Pradesh, India. Various meteorological variables are used to estimate output power, both with and without irradiance. Multiple machine learning techniques were employed to model data covering a period of 36 months. The correlation analysis revealed that temperature and irradiance are the two main meteorological variables that significantly influence the prediction of solar power. With solar irradiance as an input variable, the LR model has an R2 score of approximately 0.99995, with RMSE and MAE values of 0.121 and 0.0091, respectively. Similarly, the Ridge model has R2 score, RMSE, and MAE values of 0.99994, 0.122, and 0.0097, respectively. The EN model has the lowest performance among all models, with an R2 score of 0.87024, an RMSE of 5.941, and an MAE of 4.6947. In DL, the FFNN model with 10 layers of 512 neurons each, the Rectified Linear Unit (ReLU) activation function, and the Adam optimizer has an R2 score of 0.9987, with RMSE and MAE values of 0.0542 and 0.0462, respectively. Solar PV power generation is now a measurable part of overall power generation from multiple sources. This forecast helps in managing surplus or insufficient power supply across different months or seasons. By employing precise forecasting methods and effectively balancing supply and demand, issues such as power outages or excessive power generation can be prevented. Additional meteorological factors, such as cloud ceiling, precipitation, altitude, sunshine duration, visibility, and air quality index (AQI), can be considered to determine their relationship with the expected power. To improve the model's accuracy further, these algorithms can also be combined using ensemble and stacking methods.