1 Introduction

Today, fossil fuels such as petroleum, natural gas and coal meet most of the world’s energy demand. However, they release harmful gases when burned, and their limited supplies cannot be replaced once they are exhausted. Renewable energy sources play a vital part in reducing harmful CO2 emissions and in limiting dependence on fossil fuel energy. Effective deployment of renewable energy supports energy security, climate change mitigation and economic benefits. Solar power is one of the most favorable options for renewable energy production because of its low environmental impact. Mohandes et al. [1] stated that precise and continuous data on the prevailing global solar radiation are a prerequisite for the effective deployment and use of solar resources. Rigollier et al. [2] state that photovoltaic systems, solar thermal power plants and other conventional solar energy technologies need accurate information on solar radiation and its distribution over the earth’s surface. Lohmann et al. [3] stated that the increasing interest in large-scale solar energy projects calls for consistent data on solar energy potential and its constancy. Khatib and Elmenreich [4] stated that solar radiation data deliver qualitative details on the amount of solar energy incident on a particular area of the earth’s surface during a certain period.

Numerous studies have evaluated the effectiveness of solar regression models based on meteorological data. Besharat et al. [5] state that meteorological data-based models are the most commonly studied and most widely used models around the world. Page (1964) modified the model developed by Angstrom (Angstrom 1924) by using linear equations and developed a model relating solar radiation to relative sunshine hours [6]. Wong and Chow [7] estimated mean daily and hourly global, normal and diffuse radiation by using various radiation models. Tymvios et al. [8] made a comparative analysis of artificial neural network methodologies and Angstrom’s regression models for assessing global solar radiation. Gasser et al. [9] developed novel temperature-based models for estimating the solar energy potential of Egypt. Hassan [9] analyzed independent models for estimating solar radiation using meteorological parameters. Shahrukh Anis et al. [10] made comparative studies of different models for estimating the monthly average solar energy potential in India.

The Kingdom of Saudi Arabia burns about a quarter of the oil it produces to generate electricity, and its domestic consumption has been rising at an alarming 7 percent a year (Jeffrey Ball 2015). Al-Ajlan [11] stated that the domestic household sector in Saudi Arabia is a main consumer of power, with a share of 70%, since all commercial and residential buildings in the kingdom need protection from the hot climate. They also stated that the latest information shows a notable yearly increase of 14.8% in household electricity use. Khan [12] stated that Saudi Arabia consumes three times more than the global average. He also stated that power consumption has been increasing at a compound rate of 6% over the last 5 years. However, the country has very high direct normal irradiation (DNI) of about 2200 kWh/m2/year, which makes solar power generation promising [13]. Pazheri [14] states that effective deployment of renewables integrated with smart grid electric systems can decrease oil use for electric power and reliance on fossil fuels. Baras [15] stated that, in order to exploit the high DNI and the abundant renewable resources in the country, the kingdom is planning substantial deployment of both photovoltaic and concentrated solar power plants. The kingdom recently announced that it is preparing to develop 9.5 GW of renewable energy under its Vision 2030 program. This is achievable for the kingdom given the ongoing rapid development of the solar industry. However, it faces several challenges, such as obtaining accurate solar energy data, the availability of advanced technology suited to its industrial and technical capability, implementation of conventional power cycles and issues related to the development of indigenous technology.

Pazheri [14] stated that the kingdom has made an extensive strategic plan to expand solar power generation so that it meets a substantial share of the country’s future power requirement. The existing projects in the kingdom have been driven by solar radiation data predicted mostly from remote sensing satellite information. Accurate and reliable solar energy estimation is therefore vital for solar energy production. The rising cost of advanced solar radiation measurements motivates alternative ways of predicting the solar energy potential, and solar radiation models are one such alternative for effective estimation and analysis of solar radiation. Numerous studies are under way, and various power projects are currently scheduled within the kingdom, in order to reach its solar capacity targets in the coming years. Munawwar and Ghedira [16] evaluated renewable energy and solar energy development in the Gulf region. The King Abdullah City for Atomic and Renewable Energy (K.A.CARE) has developed a solar energy monitoring network at various centers throughout the kingdom for measuring solar energy potential. Zell et al. [17] analyzed the preliminary solar radiation data and reported on the accuracy and performance of the monitoring network. Saleem and Ali [18] analyzed the current status and feasible future applications of renewables in the Kingdom of Saudi Arabia. They also explained the practical and commercial aspects of the major renewable sources in the oil-rich country. Sarah and Khalid [19] analyzed the solar energy potential of the residential sector in the kingdom. Salam and Khan [12] described how one of the leading non-renewable energy nations is responding to the worldwide challenge of matching energy supply with the energy resources available for electric power generation.

Various solar energy models have been analyzed for different places in Saudi Arabia. Mohandes et al. [1] applied radial basis functions to estimate the monthly average daily global solar radiation on plane surfaces. They also used a multilayer perceptron network and standard regression models to compare the performance of the techniques. Alawaji [13] studied and assessed the solar energy R&D projects accomplished by the ERI over the past 20 years. He also discussed numerous results obtained from specific projects carried out in the Kingdom of Saudi Arabia. El-Sebaii et al. [20] predicted and analyzed the solar energy in the city of Jeddah. They used meteorological parameters such as cloud cover, air temperature, sunshine duration and moisture level to predict the solar energy. Fariba Besharat et al. [5] analyzed different empirical methods for assessing the global solar radiation in Saudi Arabia. They stated that empirical models using meteorological parameters, such as sunshine hours, temperature and cloud cover, are the most frequently and extensively used models for predicting global solar radiation and its components in any region of the world.

The Kingdom of Saudi Arabia has high solar energy potential and has made an extensive strategic plan to expand solar power generation so that it meets a substantial share of the country’s future power requirements. However, the existing projects in the kingdom have been driven by solar radiation data predicted mostly from remote sensing satellite information. In addition, the solar radiation monitoring network started only recently and has limited datasets. The Jubail Industrial City (study area) in the Eastern Province of Saudi Arabia is one of the largest industrial cities and is undergoing rapid industrial development. Site-specific regression models are therefore developed, and a preliminary assessment of the solar energy potential of the city is carried out. Because the solar energy estimation program in the study area started recently and only limited datasets are available, the present study uses simple regression methods.

In this study, three prominent meteorological parameters, namely ambient temperature, sunshine hours and relative humidity, are used to build more accurate empirical models of the solar energy potential of the Jubail Industrial City. The city hosts one of the largest industrial hubs in the world and is considered a ‘smart city’ in the Eastern Province of Saudi Arabia. During the past few years, the city has developed rapidly as a result of industrial expansion, with several industrial, commercial, residential and other infrastructure projects carried out. The region around the city is environmentally important and has a diverse ecology. Frequent monitoring of air and water contamination is therefore also important for the industrial city. Effective use of green energy helps to mitigate the impacts of rapid industrialization on the environment and ecosystem of the Al-Jubail Industrial City.

The solar measurement program in the kingdom was developed recently, and it has limited datasets of solar energy parameters. In addition, most of the solar energy models developed so far are based on air temperature, since temperature is the most commonly measured meteorological parameter. However, they are less accurate than sunshine-based models. Al-Mostafa et al. [21] stated that most sunshine-based models provide more accurate results than models based on other meteorological parameters. Khorasanizadeh and Mohammadi [22] also showed that regression models that are functions of sunshine hours, air temperature and relative humidity provide more accurate results. The present study develops simple empirical regression models for estimating the global horizontal irradiance (GHI) in the largest industrial city of Saudi Arabia by using the meteorological datasets available for the study area.

2 Data and methods

2.1 Solar radiation data

The quantity of solar energy received at the surface of the earth depends in a complex way on several meteorological factors and is influenced by geophysical parameters and the geography of the terrain. Meteorological variables such as the clearness index, relative humidity, air temperature, sunshine duration and wind speed are commonly used to estimate solar energy at many places in the world. Several solar energy models using weather data such as humidity [23], snow conditions [24], ambient temperature [25], cloudiness [26] and sunshine duration [27, 28] were established in the early days of solar energy quantification at various places around the world. Recently, several regression models have been proposed to determine the global solar radiation on a horizontal surface as a function of sunshine duration [10, 29,30,31,32]. Hofierka and Suri [33] found that incoming solar energy is proportionate to the ambient air temperature, whereas an increase in atmospheric moisture decreases the solar energy falling on a horizontal surface. Solar energy modeling with multiple meteorological parameters is therefore expected to provide more accurate estimates.

The postprocessed solar radiation data are obtained from the Renewable Resource Atlas (2020) of the King Abdullah City for Atomic and Renewable Energy (K.A.CARE). The Renewable Resource Monitoring and Mapping (RRMM) program of K.A.CARE focuses on the measurement, analysis and utilization of the renewable energy resources in the kingdom. The RRMM program oversees the operation, standardization and maintenance of the entire network, which consists of approximately 46 stations throughout the Kingdom of Saudi Arabia. Each station includes redundant radiation sensors for quality assurance. Solar radiation data are collected at 1-min resolution at the stations. K.A.CARE uses both automatic and manual processes for daily data quality review. Short gaps caused by data anomalies are filled by interpolation. The uncertainty of each data value is calculated to reflect the nominal uncertainty of the monitoring equipment, and the base nominal uncertainty is about 5%. The quality-checked and verified data are used to build the empirical regression models for the city. The study area has limited solar energy datasets; therefore, the recent 5-year dataset is used. It contains several parameters such as horizontal, diffuse and normal solar radiation, air temperature, relative humidity, barometric pressure, and wind speed and direction.

2.2 Methodology

The linear model consists of one dependent variable, global horizontal irradiance, Hg (in Wh/m2), and three independent variables: ambient temperature, Ta (in °C), relative humidity, Rh (%), and sunshine duration, S (h). Figures 1, 2 and 3 show the monthly variation of the three independent parameters. Figures 4, 5 and 6 show the scatter plots of solar radiation versus the model parameters ambient temperature, sunshine hours and relative humidity. The scatter plots show a strong empirical relationship between each variable and the global horizontal irradiance of the city. The candidate mathematical relations between solar energy and the independent variables were polynomial (linear, quadratic, cubic and quartic), logarithmic, exponential and power models. The plots also show the coefficient of determination and indicate a positive relationship between solar energy and the meteorological parameters. Finally, three regression models were developed and analyzed. The IBM SPSS® statistics software and Microsoft Excel spreadsheets are used to process the data. The regression constants of the solar energy parameters are estimated, and the scatter plots of the measured values are produced, by using the SPSS® statistics software.
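The regression constants in this study were estimated with SPSS. As an illustration of what such a fit does for the linear case, the following is a minimal sketch of ordinary least squares solved through the normal equations in plain Python. It is a generic stand-in, not the SPSS procedure; the function names and the sample rows are hypothetical, and the targets are synthetic values generated from made-up coefficients rather than measured GHI.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_linear(rows, targets):
    """Least-squares fit of Hg = b0 + b1*Rh + b2*Ta + b3*S.

    rows    -- list of (Rh, Ta, S) tuples
    targets -- list of Hg values, one per row
    Returns [b0, b1, b2, b3].
    """
    design = [[1.0, *row] for row in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical (Rh %, Ta degC, S h) rows; targets are synthetic, built from
# made-up coefficients so that the fit can be checked against a known answer.
sample_rows = [(40, 20, 7), (55, 25, 10), (60, 35, 8),
               (30, 38, 11), (70, 18, 9), (45, 30, 6)]
sample_targets = [100 + 2 * rh - 3 * ta + 50 * s for rh, ta, s in sample_rows]
coefficients = fit_linear(sample_rows, sample_targets)
```

With noise-free synthetic targets the fit recovers the generating coefficients up to rounding; with measured data the same call returns the least-squares estimates.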

Fig. 1 Monthly variation of air temperature

Fig. 2 Monthly variation of sunshine hours

Fig. 3 Monthly variation of relative humidity

Fig. 4 Scatter plot of GHI with air temperature

Fig. 5 Scatter plot of GHI with sunshine hours

Fig. 6 Scatter plot of GHI with relative humidity

2.3 Proposed empirical models

2.3.1 Linear model

Linear models are the simplest and most convenient models; they describe a continuous response variable as a function of one or more predictor variables. They help to understand and predict the behavior of complex systems or data. The proposed linear model has one dependent variable, the monthly average daily total solar radiation (Hg), and three independent variables, namely relative humidity (Rh), ambient temperature (Ta) and sunshine duration (S), as shown in Eq. (1).

$$H_{\text{g}} = 3438.875 + 40.285 R_{\text{h}} - 41.59 T_{\text{a}} + 363.89 S$$
(1)
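As a worked illustration, Eq. (1) can be evaluated directly. The sketch below transcribes the published coefficients and assumes the units given in Sect. 2.2 (Rh in %, Ta in °C, S in h, Hg in Wh/m2); the function name and the sample inputs are hypothetical and are not taken from the study's dataset.

```python
def linear_ghi(rh, ta, s):
    """Eq. (1): estimate Hg (Wh/m2) from relative humidity (%),
    ambient temperature (deg C) and sunshine duration (h)."""
    return 3438.875 + 40.285 * rh - 41.59 * ta + 363.89 * s


# Hypothetical monthly means, chosen only to show the arithmetic:
# 3438.875 + 40.285*50 - 41.59*30 + 363.89*9
estimate = linear_ghi(rh=50.0, ta=30.0, s=9.0)
print(round(estimate, 3))
```

For these inputs the three terms contribute 2014.25, -1247.7 and 3275.01 Wh/m2 on top of the intercept, giving about 7480.4 Wh/m2.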

The correlation coefficients were calculated to determine the association between the dependent and independent solar energy parameters. Moreover, the estimated values are sufficiently close to the measured values, so the models developed in this category show good estimation capability.

2.3.2 Quadratic model

A quadratic model can provide a better correlation than the simple linear model. Akinoglu and Ecevit [34] state that the quadratic model is better suited to estimating monthly mean global solar radiation when bright sunshine duration data are available. In order to obtain a solution that is more appropriate for the complex equations of solar energy, several iterations were executed, and the following optimal solution (Eq. 2) was derived after 23 iterations.

$$\begin{aligned} H_{\text{g}} & = 20.9 R_{\text{h}}^{2} - 1.5 T_{\text{a}}^{2} - 2.3 S^{2} - 3.4 R_{\text{h}} \cdot T_{\text{a}} + 2.4 S \cdot T_{\text{a}} - 55.1 R_{\text{h}} \cdot S \\ & \quad - 366.8 R_{\text{h}} + 210.5 T_{\text{a}} + 1936 S - 7206 \\ \end{aligned}$$
(2)
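Eq. (2) has ten terms: three squares, three pairwise products, three linear terms and a constant. The sketch below transcribes it term by term so that the coefficients are easy to check against the equation. The function name is hypothetical. The scaling of the inputs is not restated next to Eq. (2), and the published coefficients are rounded, so any output should be checked against the measured data before it is relied on.

```python
def quadratic_ghi(rh, ta, s):
    """Eq. (2): full second-order model in Rh, Ta and S (coefficients as published)."""
    squares = 20.9 * rh**2 - 1.5 * ta**2 - 2.3 * s**2
    cross_terms = -3.4 * rh * ta + 2.4 * s * ta - 55.1 * rh * s
    linear_terms = -366.8 * rh + 210.5 * ta + 1936 * s
    return squares + cross_terms + linear_terms - 7206
```

Grouping the terms this way mirrors the layout of the equation and makes it straightforward to see which predictors interact.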

2.3.3 Logarithmic model

A logarithmic model expresses the response as a function of the logarithm of each predictor, so that equal proportional changes in a predictor produce equal changes in the response. It can also be seen as the inverse of an exponential model. Ampratwum et al. [35] stated that the quadratic and linear–logarithmic models are the preferred models when sunshine duration is used as a predictor. The natural logarithmic model (Eq. 3) was also fitted, and the optimal solution was obtained after 15 iterations.

$$H_{\text{g}} = 586.172 \ln \left( {R_{\text{h}} } \right) - 2521 \ln (T_{\text{a}} ) + 3877 \ln \left( S \right) + 5196$$
(3)
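Because Eq. (3) takes the natural logarithm of every predictor, it is only defined for strictly positive inputs. The sketch below transcribes the equation and adds an explicit guard for that case; the function name and the guard are illustrative additions, not part of the original study.

```python
import math


def logarithmic_ghi(rh, ta, s):
    """Eq. (3): natural-logarithm model in Rh, Ta and S (coefficients as published)."""
    if min(rh, ta, s) <= 0:
        # ln(x) is undefined for x <= 0, e.g. a month with Ta at or below 0 deg C.
        raise ValueError("all inputs must be strictly positive")
    return (586.172 * math.log(rh)
            - 2521 * math.log(ta)
            + 3877 * math.log(s)
            + 5196)
```

When all three inputs equal 1, every logarithm vanishes and the function returns the constant term, 5196, which is a quick check that the coefficients were entered correctly.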

3 Results and discussion

3.1 Solar energy and parameter analysis

In this study, three meteorological parameters, namely ambient temperature, sunshine hours and relative humidity, collected during 2014 and 2015 are used to develop the regression models (linear, quadratic and logarithmic). The developed models are tested by using the solar energy data obtained during 2016. The relationships between the meteorological parameters and solar energy are expressed as basic mathematical functions. The three regression models quantify the influence of ambient temperature, sunshine duration and relative humidity on the solar energy potential of the industrial city, and all three indicate that these parameters affect the incident solar energy. Simple correlation coefficients describe the relationship between the dependent variable and the independent variables.
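The fit-then-test protocol described above amounts to splitting the records by year. A minimal sketch follows, assuming each record is a dictionary with a `year` key; the record layout, the function name and the sample values are hypothetical.

```python
def split_by_year(records, fit_years=(2014, 2015), test_years=(2016,)):
    """Separate records into a fitting set and a held-out test set by year."""
    fit = [r for r in records if r["year"] in fit_years]
    test = [r for r in records if r["year"] in test_years]
    return fit, test


# Hypothetical monthly records, not values from the K.A.CARE dataset.
records = [
    {"year": 2014, "month": 1, "rh": 60.0, "ta": 16.0, "s": 7.5},
    {"year": 2015, "month": 7, "rh": 35.0, "ta": 37.0, "s": 10.5},
    {"year": 2016, "month": 1, "rh": 62.0, "ta": 15.0, "s": 7.0},
]
fit_set, test_set = split_by_year(records)
```

Keeping the 2016 records out of the fit means the error statistics are computed on data the models have not seen.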

The monthly variations of ambient temperature, sunshine duration and relative humidity are shown in Figs. 1, 2 and 3. The variations of the measured GHI and the GHI predicted by the three models are shown in Figs. 7, 8 and 9. The scatter plots of GHI for the three models are shown in Figs. 10, 11 and 12. The results show that the global solar energy is directly related to the ambient temperature and sunshine duration, whereas an increasing moisture level in the atmosphere lowers the incoming solar energy. During the summer months, GHI values are higher because of high temperature and long sunshine duration, and GHI values are low in the winter months. The relative humidity, in contrast, is inversely related to the solar energy potential of the study area. Mas’ud et al. [36] also stated that a positive correlation coefficient for mean temperature indicates that a rise in mean temperature leads to more global solar radiation, whereas a negative correlation coefficient shows that a rise in relative humidity leads to lower solar radiation. Shrestha et al. [37] also observed that solar energy and humidity follow a Gaussian function, whereas air temperature follows a sine function.

Fig. 7 Measured and predicted GHI by using linear model

Fig. 8 Measured and predicted GHI by using quadratic model

Fig. 9 Measured and predicted GHI by using logarithmic model

Fig. 10 Scatter plot of GHI in linear model

Fig. 11 Scatter plot of GHI in quadratic model

Fig. 12 Scatter plot of GHI in logarithmic model

The study also implies that the solar energy potential decreases as relative humidity increases. Moisture and liquid water in the hot air increase the scattering of solar radiation and thereby reduce the incoming solar energy. Nicholas et al. [38] stated that an increase in average relative humidity gives rise to a decrease in solar radiation and vice versa, which describes an inverse relationship between average relative humidity and solar radiation intensity. Ettah and Nwabueze [39] stated that the capacity of air to hold water vapor depends mainly on its temperature, so that hotter air can hold more water vapor than cooler air.

3.2 Statistical error estimation and model comparison

In order to determine the predictive accuracy and performance of the proposed empirical models, they are subjected to statistical error estimation. The error estimates are used to rank the models statistically and to identify the best predictive model. Several standard statistical measures are used to assess the accuracy and validity of the models. These include the correlation coefficient (r), coefficient of determination (R2), mean bias error (MBE), root-mean-square error (RMSE), mean absolute percentage error (MAPE) and mean absolute bias error (MABE).

The correlation coefficient (r) and the coefficient of determination (R2) show how well the developed models fit the data; they are calculated by using Eqs. (4) and (8). A correlation coefficient (r) close to 1 indicates a well-fitting model. The highest values of r and R2 indicate the best suited empirical model, and the lowest values indicate the most poorly suited one.

$${\text{Correlation}}\,{\text{ coefficient}}\,\left( r \right) = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {H_{\text{gm}}^{i} - H_{\text{gp}}^{i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {H_{\text{gm}}^{i} - \overline{{H_{\text{gm}} }} } \right)^{2} }}$$
(4)
$${\text{Mean}}\,{\text{ of }}\,{\text{the}}\,{\text{ GHI}} = \overline{{H_{\text{gm}} }} = \frac{1}{n}\mathop \sum \limits_{i = 1 }^{n} H_{\text{gm}}^{i}$$
(5)
$${\text{Total}}\,{\text{ Sum }}\,{\text{of }}\,{\text{Squares}}\, \, \left( {\text{SST}} \right) = \mathop \sum \limits_{i = 1}^{n} \left( {\overline{{H_{\text{gm}} }} - H_{\text{gm}}^{i} } \right)^{2}$$
(6)
$${\text{Sum}}\,{\text{ of}}\,{\text{ Squares}}\,{\text{ of }}\,{\text{Regression}}\, \, \left( {\text{SSR}} \right) = \mathop \sum \limits_{i = 1}^{n} \left( {\overline{{H_{\text{gm}} }} - H_{\text{gp}}^{i} } \right)^{2}$$
(7)
$${\text{Coefficient }}\,{\text{of }}\,{\text{determination }}\left( {R^{ 2} } \right) = \frac{{\text{SSR}}}{{\text{SST}}}$$
(8)

The residuals are measured as the vertical distance between a data point and the regression line of the fitted empirical model. A residual is the difference between the observed response and the fitted response at each predictor value, and the residuals provide an estimate of the random errors. Randomly scattered residuals therefore suggest that the proposed regression model fits well, whereas a regular pattern in the residuals indicates a poorly fitting model.

$${\text{Root}}\,{\text{ Mean }}\,{\text{Square }}\,{\text{Error}}\, \, \left( {\text{RMSE}} \right) = \sqrt {\frac{1}{n}\left[ {\mathop \sum \limits_{i = 1}^{n} \left( {H_{\text{gm}}^{i} - H_{\text{gp}}^{i} } \right)^{2} } \right]}$$
(9)
$${\text{Mean }}\,{\text{Bias }}\,{\text{Error}}\, \, \left( {\text{MBE}} \right) = \frac{1}{n}\mathop \sum \limits_{i = 1 }^{n} \left( {H_{\text{gm}}^{i} - H_{\text{gp}}^{i} } \right)$$
(10)
$${\text{Mean}}\,{\text{Absolute}}\,{\text{Bias}}\,{\text{Error}}\,\left( {\text{MABE}} \right) = \frac{1}{n}\mathop \sum \limits_{i = 1 }^{n} \left( {\left| {H_{\text{gm}}^{i} - H_{\text{gp}}^{i} } \right|} \right)$$
(11)
$${\text{Mean}}\,{\text{ Absolute}}\,{\text{ Percentage}}\,{\text{ Error}}\, \, \left( {\text{MAPE}} \right) = \frac{1}{n} \mathop \sum \limits_{i = 1 }^{n} \left( {\left| {\frac{{H_{\text{gm}}^{i} - H_{\text{gp}}^{i} }}{{H_{\text{gm}}^{i} }}} \right| \times 100 \% } \right)$$
(12)

where \(H_{\text{gm}}^{i}\) is the ith measured value, \(H_{\text{gp}}^{i}\) is the ith predicted value, and n is the total number of observations.
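The indicators in Eqs. (4) to (12) can be computed together from paired lists of measured and predicted GHI. The following is a minimal sketch that implements each equation exactly as written above; the function name, dictionary keys and sample values are hypothetical, and the inputs are assumed to be non-empty, not all identical, and free of zero measured values.

```python
import math


def error_metrics(measured, predicted):
    """Statistical indicators of Eqs. (4)-(12) for paired GHI values."""
    n = len(measured)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_m = sum(measured) / n                                  # Eq. (5)
    residuals = [m - p for m, p in zip(measured, predicted)]    # measured - predicted
    sse = sum(e * e for e in residuals)
    sst = sum((mean_m - m) ** 2 for m in measured)              # Eq. (6)
    ssr = sum((mean_m - p) ** 2 for p in predicted)             # Eq. (7)
    return {
        "r": 1 - sse / sst,                                     # Eq. (4) as written
        "R2": ssr / sst,                                        # Eq. (8)
        "RMSE": math.sqrt(sse / n),                             # Eq. (9)
        "MBE": sum(residuals) / n,                              # Eq. (10)
        "MABE": sum(abs(e) for e in residuals) / n,             # Eq. (11)
        "MAPE": 100 * sum(abs(e / m)
                          for e, m in zip(residuals, measured)) / n,  # Eq. (12), in %
    }


# Hypothetical values in Wh/m2, chosen only to make the arithmetic easy to follow.
stats = error_metrics([4000.0, 5000.0, 6000.0], [4100.0, 5000.0, 5900.0])
```

In this example the residuals are -100, 0 and 100, so the MBE is zero even though the MABE is about 66.7 Wh/m2: errors of opposite sign cancel in the MBE but not in the absolute measures.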

All the regression models showed a high degree of precision in the estimation of global solar radiation. Although the regression models were generated from only 2 years of data, they are accurate and predict similar values. Table 1 shows the statistical parameters of the three developed models. The analysis showed that the residuals of the developed models are distributed randomly about zero, which indicates a good fit. The correlation coefficient of the models is nearly 1 (which is desirable), revealing the strong relationship of the meteorological parameters with solar energy. The coefficient of determination (R2) is relatively high (0.925) for the quadratic model. In addition, the MBE and MAPE errors are relatively low for the quadratic model, which also indicates that this model is the most suited for the city. The present study shows large RMSE and low MBE values. The RMSE gives information on the short-term performance of the regression models, whereas the MBE gives information on the long-term performance. With the sign convention of Eq. (10), in which predicted values are subtracted from measured values, a positive MBE indicates underestimation, while a negative MBE indicates overestimation. The MABE measures the goodness of fit of the regression models. Almorox et al. [40] stated that low RMSE values indicate the best suited solar energy models. However, Okundamiya and Nzeako [41] stated that high RMSE values combined with low MBE values lead to errors in the statistical indicators and are not adequate for assessing the performance of regression models.

Table 1 Statistical parameters of different models

4 Conclusions

In this study, three regression models have been developed based on prominent meteorological parameters, namely ambient temperature, sunshine hours and relative humidity, to estimate the solar energy potential of the industrial city. The three regression models quantify the influence of meteorological parameters on the solar power potential of this region, and all three indicate that these parameters affect the incident solar energy. Simple correlation coefficients describe the relationship between the dependent variable and the independent variables. The quadratic model provided a better correlation than the simple linear model. The correlation coefficient and coefficient of determination of the quadratic model are 0.939 and 0.925, respectively, which are the highest among the three models. The mean absolute percentage error (MAPE) of the quadratic model is only 6.17%, and the other statistical errors are also comparatively low for the quadratic model. The present study also found that the global solar energy is directly related to the ambient temperature and sunshine duration, whereas an increasing moisture level in the atmosphere lowers the incoming solar energy. The statistical analysis shows that the developed regression models achieve a high degree of precision in the estimation of global solar radiation in the city.