1 Introduction

Solar radiation is the energy radiated by the sun and is a very important source of energy on Earth. It consists of short and long waves spanning the infrared, visible, and ultraviolet ranges of the electromagnetic spectrum (Salby 2012). Solar energy is the most readily available energy source on Earth. It affects the Earth’s climate through temperature, evaporation, and precipitation. A minor change in solar radiation has a notable effect on the climate (Huang et al. 2021). Radiant energy also supports metabolism in an ecosystem, starting with photosynthesis in plants and moving on to primary and later consumers through consumption and respiration (Raven et al. 2011). Over the last decades, solar radiation has also been used to supply electricity and thermal energy, for example through photovoltaic panels, which are increasingly developed and used in many nations (Gul et al. 2016; Sathe and Dhoble 2017). Understanding solar irradiance characteristics is therefore beneficial for planning and engineering appropriate energy-conversion equipment and infrastructure, as well as for developing an efficient energy management plan for each area.

Many researchers use machine learning models to estimate and forecast solar radiation globally and in different areas (see, e.g. Hissou et al. 2023; Alizamir et al. 2020; Guermoui et al. 2020; Fan et al. 2018). Etxegarai et al. (2022) and Pang et al. (2020) used machine learning models to forecast solar irradiation in the USA. Forecasting solar irradiation is important, but location, angle of incidence, season, time of day, and cloud cover all affect the intensity of solar radiation at the Earth’s surface (Matuszko 2012). Solar radiation is mainly absorbed by clouds, aerosols, vapors, and other components. Clouds are the most important agents that absorb solar energy before it reaches the Earth’s surface, and they can have a significant impact on the amount of radiation reaching the ground (Tzoumanikas et al. 2016; Kejna et al. 2021; Fountoulakis et al. 2021). Clouds can absorb approximately 30 W/m² of the solar radiation arriving from the top of the atmosphere (Cess et al. 1995, 1996).

Beyond forecasting, many studies model the spatial and temporal trends of ground solar radiation (see, e.g. Liang and Xia 2005; Hammer and Beyer 2012; Cheung et al. 2015; McNeil et al. 2019; Vindel et al. 2020; Jin et al. 2022). McNeil et al. (2019) used a natural cubic spline model to investigate solar radiation patterns in southern Thailand. Patterns and trends in solar radiation in China were analyzed using linear regression (Liang and Xia 2005; Jin et al. 2022). Cheung et al. (2015) examined the spatial and temporal solar absorption patterns of Australian clouds.

Modelling solar radiation absorption by clouds helps in understanding the quantity or intensity of solar radiation reaching the Earth. There is therefore a need for models that explore recent trends and patterns, which will be helpful in planning and designing policies for the sustainable use of energy. Moreover, investigating solar radiation absorption requires an appropriate statistical methodology for exploring these patterns and trends. The studies in the literature used black-box methods for forecasting but did not propose a methodology for exploring patterns and trends. Also, very few recent studies have modelled solar radiation absorption by clouds, especially in the US. Therefore, this study aimed to propose a methodology for examining seasonal patterns and trends of solar absorption by clouds in the United States using recent data.

2 Materials and methods

2.1 Data source and pre-processing

The hourly solar radiation data for the years 1998 through 2020 were downloaded from the National Solar Radiation Database (NSRDB) website (US Energy Department 2022). The stations in Alaska and Hawaii, as well as stations with insufficient data, were excluded to reduce the effects of imbalanced geography. As a result, data from 133 stations collected between 1998 and 2020 were included in this study. The stations were chosen with a balanced distribution to cover the entire country, and the location of each is shown in Fig. 1. Twelve stations, selected as a sample based on their differences in latitude, are shown as blue dots and are used to illustrate the exploratory methodology employed in this study; the remaining stations are shown as red dots.

Fig. 1 Locations of 133 NSRDB hourly radiation data stations in the United States from 1998 to 2020

Each station supplies hourly solar irradiation observed on the ground at the station and in extraterrestrial space. There are 201,624 hourly records for each station covering the entire 1998–2020 period. The hourly data were aggregated into daily values to reduce the influence of differing hours of daylight at different latitudes. To keep the same number of days in each year, observations on February 29 were omitted. Thus, 8,395 daily observations from each station over the 23-year period were used for further analysis.
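The aggregation step can be illustrated with a minimal Python sketch. It is not the authors' code (the study used R): the function name `daily_totals` and the constant synthetic series are assumptions made purely for illustration, and daily totals are taken to be sums of the hourly values.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_totals(hourly):
    """Sum hourly irradiation values into daily totals, skipping February 29.

    `hourly` is an iterable of (datetime, value) pairs; the result maps each
    calendar date to the sum of its hourly values.
    """
    totals = defaultdict(float)
    for stamp, value in hourly:
        if stamp.month == 2 and stamp.day == 29:
            continue  # drop leap days so every year has 365 days
        totals[stamp.date()] += value
    return dict(totals)


# Synthetic stand-in for one station: a constant 1.0 for every hour of 1998-2020.
start, end = datetime(1998, 1, 1), datetime(2021, 1, 1)
n_hours = int((end - start) / timedelta(hours=1))
hourly = [(start + timedelta(hours=h), 1.0) for h in range(n_hours)]

daily = daily_totals(hourly)
print(n_hours, len(daily))
```

With six leap years in the window, the hourly count is 8,401 days × 24 hours, and dropping the six leap days leaves 23 × 365 daily values, which matches the record counts quoted above.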

2.2 Statistical analysis

Solar radiation outside the Earth’s atmosphere can be obtained precisely using satellite measurements or theoretical calculations. The extraterrestrial irradiation (\({R}_{E}\)) for each station was calculated by the formula in Klein (1977).

$$\eqalign{{R_E} = & \left( {{{24} \over \pi }} \right)K\left( {1 + 0.033\cos \left( {{{2\pi d} \over {365}}} \right)} \right) \cr & \times \left( {\cos \varphi \cos \delta \sin \omega + \omega \sin \varphi \sin \delta } \right) \cr}$$
(1)

In this formula, \(\pi\) is the ratio of a circle’s circumference to its diameter (approximately 3.14), \(K\) is the solar constant, \(d\) is the day of the year, \(\varphi\) is the latitude angle, \(\delta\) is the solar declination, and \(\omega\) is the sunset hour angle. The computed \({R}_{E}\) is then used to determine the maximum possible solar intensity on the ground. It is widely known that around 20–30% of solar energy is absorbed and diffused by the Earth’s upper atmosphere (Stine and Harrigan 1985). We therefore reduced the extraterrestrial radiation by 25% to account for this loss, and this ratio was used to fit the data. On a clear day, the solar energy at the Earth’s surface can thus be represented as 0.75\({R}_{E}\). However, clouds frequently absorb solar energy before it reaches the ground. We therefore determine the daily percentage of solar radiation absorbed by clouds \(\left({R}_{C}\right)\) using the following formula:

$${R}_{C}=\left[1-\frac{{R}_{G}}{0.75{R}_{E}}\right]\times 100\%$$
(2)

where \({R}_{G}\) is the observed solar radiation at the station. The data were reduced from 365 to 73 periods per year by averaging the \({R}_{C}\) values over 5-day intervals at each station. As suggested by Cheung et al. (2015) and McNeil et al. (2019), this helps to reduce serial correlation and the severity of right skewness among the observations before statistical modelling. It also helps to handle missing observations within a 5-day span. Following Cheung et al. (2015), these 5-day averages are called period solar radiation absorption data in the sequel. Thus, the data for each station used in further analysis consist of 1,679 records (23 years × 73 periods). For data analysis, the normality assumption was assessed by plotting residuals against normal quantiles, and a transformation was applied to make the residuals approximately normal where needed. The serial correlation of residuals was examined using the residual autocorrelation function (ACF) plot (Venables and Ripley 2002).
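Equations (1) and (2) and the 5-day averaging can be sketched in Python as follows. Several ingredients are assumptions that the text does not specify: the value 1367 W/m² for the solar constant \(K\), Cooper's approximation for the declination \(\delta\), the standard relation \(\cos \omega = -\tan \varphi \tan \delta\) for the sunset hour angle, and all function names. The ground values are made up for the example.

```python
import math

SOLAR_CONSTANT = 1367.0  # W/m^2; assumed value for K, not stated in the text


def declination(day):
    """Solar declination in radians (Cooper's approximation, an assumption here)."""
    return math.radians(23.45) * math.sin(2 * math.pi * (284 + day) / 365)


def extraterrestrial(day, lat_deg, k=SOLAR_CONSTANT):
    """Daily extraterrestrial irradiation R_E from Eq. (1), in Wh/m^2 for K in W/m^2."""
    phi = math.radians(lat_deg)
    delta = declination(day)
    # Sunset hour angle (assumed relation); the clamp covers polar day and night.
    omega = math.acos(max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta))))
    orbit = 1 + 0.033 * math.cos(2 * math.pi * day / 365)
    geometry = (math.cos(phi) * math.cos(delta) * math.sin(omega)
                + omega * math.sin(phi) * math.sin(delta))
    return (24 / math.pi) * k * orbit * geometry


def cloud_absorption(r_g, r_e, clear_sky_fraction=0.75):
    """Percentage of solar radiation absorbed by clouds, Eq. (2)."""
    return (1 - r_g / (clear_sky_fraction * r_e)) * 100


def period_means(values, width=5):
    """Average consecutive blocks of `width` values (365 days -> 73 periods)."""
    return [sum(values[i:i + width]) / width
            for i in range(0, len(values) - width + 1, width)]


latitude = 40.0  # hypothetical station latitude in degrees north
r_e = [extraterrestrial(d, latitude) for d in range(1, 366)]
r_g = [0.6 * value for value in r_e]  # made-up ground observations
r_c = [cloud_absorption(g, e) for g, e in zip(r_g, r_e)]
periods = period_means(r_c)
print(len(periods), round(periods[0], 2))
```

In this toy case the ground value is always 60% of \({R}_{E}\), so every period has an absorption of 20%, since 0.6/0.75 = 0.8.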

A linear regression model was fitted to the 5-day averages of solar energy absorbed by clouds at each of the 133 stations, with year \(i\), period \(j\), and a lag term as predictors. The model for each station is as follows:

$${y}_{t}^{*}=\alpha +\sum _{i=1}^{23}{\beta }_{i}{x}_{1i}+\sum _{j=1}^{73}{\beta }_{j}{x}_{2j}+{y}_{t-1}^{*}+{\epsilon }_{t}$$
(3)

where \({y}_{t}^{*}\) is the solar radiation absorption at time \(t\) (transformed where needed), \(i=1, 2, 3, \dots , 23\) indexes the years, with \({\beta }_{i}\) as the coefficient and \({x}_{1i}\) as the dummy variable for the i-th year, and \(j=1, 2, 3, \dots , 73\) indexes the periods, with \({\beta }_{j}\) as the coefficient and \({x}_{2j}\) as the dummy variable for the j-th period. \({y}_{t-1}^{*}\) is the lag-1 solar radiation absorption and \({\epsilon }_{t}\) is the error term. This model was used to determine the trend of solar radiation absorption at each of the 133 stations across the whole country.
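A regression of this form can be sketched in dependency-free Python. This is an illustration rather than the authors' R code. The assumptions are: the first year and first period are dropped as reference levels so that the dummies are not collinear with the intercept, the lag term is given a free coefficient, and the series is a small noise-free synthetic one (6 years of 10 periods with made-up effects) so that the recovered lag coefficient can be checked.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def design_row(year, period, lag, n_years, n_periods):
    """Intercept, year dummies, period dummies (level 0 of each dropped), lag-1 value."""
    row = [1.0]
    row += [1.0 if year == i else 0.0 for i in range(1, n_years)]
    row += [1.0 if period == j else 0.0 for j in range(1, n_periods)]
    row.append(lag)
    return row


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Noise-free synthetic series with hypothetical year and period effects.
n_years, n_periods, true_lag = 6, 10, 0.4
year_effect = [0.0, 1.5, -1.0, 2.0, 0.5, -2.0]
period_effect = [5.0 * math.cos(2 * math.pi * j / n_periods) for j in range(n_periods)]

series, previous = [], 30.0
for year in range(n_years):
    for period in range(n_periods):
        previous = 20.0 + year_effect[year] + period_effect[period] + true_lag * previous
        series.append((year, period, previous))

# The first observation has no lagged value, so rows start at t = 1.
rows = [design_row(yr, per, series[t - 1][2], n_years, n_periods)
        for t, (yr, per, _) in enumerate(series) if t >= 1]
y = [value for _, _, value in series[1:]]
coef = fit_ols(rows, y)
print(round(coef[-1], 6))
```

The last entry of `coef` is the estimated lag-1 coefficient. With the full 23 years and 73 periods the same construction applies, although a numerical library would be a more practical choice at that size.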

Then, factor analysis was used to group the stations into larger regions in order to reduce spatial correlation. The oblique rotation method “Promax,” which can maximally separate the loadings, was used in this study. The factor analysis model with \(n\) factors, denoted by \({f}_{1},{f}_{2},\dots ,{f}_{n}\), takes the form:

$${y}_{l}={\mu }_{l}+\sum _{k=1}^{n}{\gamma }_{k}{f}_{kl}+{\epsilon }_{l}$$
(4)

for \(l\) = 1, 2, …, 133 and \(k\) = 1, 2, …, \(n\), where \({y}_{l}\) is the solar radiation absorption observed at station \(l\), \({\mu }_{l}\) is the average across the 133 stations, \({\gamma }_{k}\) is the k-th factor loading, and \({f}_{kl}\) is the k-th common factor.

The average solar radiation absorption for each factor was calculated using the number of factors selected in the factor analysis. A multivariate regression model was then used to examine the average change in solar radiation absorption for each factor:

$${y}_{k}^{{\prime }}={\alpha }_{k}+{\beta }_{k}{f}_{k}+{\epsilon }_{k}$$
(5)

where \({\alpha }_{k}\) and \({\beta }_{k}\) are the intercept and regression coefficient for the k-th factor, and \({\epsilon }_{k}\) is the error term. After fitting the multivariate regression model, the normality of the residuals was checked to assess whether the model was appropriate. Confidence interval plots were constructed to show the overall change in solar radiation absorption for each factor. The R language was used to perform all statistical analyses and produce all graphs (Core Team 2022).

3 Results and discussion

We fitted the linear regression model and analysed the error term to understand its distribution and serial correlation. Figure 2 shows Q-Q plots of the residuals from the linear regression model of solar radiation absorption with year, period, and lag-1 as predictors. The left panel indicates that the residuals for stations below 40°N did not satisfy the normality assumption. Therefore, we transformed solar radiation absorption at these stations by taking the square root and multiplying by 10. The Q-Q plots of the residuals from the transformed data at the sites below 40°N show an approximately normal distribution (right panel). The effect of autocorrelation and the number of lag terms were investigated using ACF plots for the twelve sample stations (Fig. 3, left panel). The plots indicate that only the first lag is significant, so lag 1 was included in the linear regression model. The ACF plots in the right panel of Fig. 3 show that the residuals from all stations lie within the limits, and hence show no autocorrelation, after applying the square root transformation and including lag-1 in the model.
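The two diagnostics described here, the scaled square-root transformation and the residual ACF check, can be sketched in plain Python. The function names are illustrative, the transformation assumes non-negative absorption percentages, and the limits ±1.96/√n are the usual large-sample approximation for a 95% band, which is an assumption about how the plotted limits were drawn.

```python
import math


def transform(absorption_pct):
    """Square-root transformation multiplied by 10 (assumes non-negative inputs)."""
    return [10 * math.sqrt(v) for v in absorption_pct]


def acf(residuals, max_lag):
    """Sample autocorrelation of `residuals` for lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    denom = sum(c * c for c in centred)
    return [sum(centred[t] * centred[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def significant_lags(residuals, max_lag):
    """Lags whose autocorrelation lies outside the approximate 95% limits."""
    limit = 1.96 / math.sqrt(len(residuals))
    return [k for k, r in enumerate(acf(residuals, max_lag), start=1)
            if abs(r) > limit]


# Toy residuals that alternate in sign, so every lag is strongly correlated.
toy = [1.0, -1.0] * 50
print(transform([0.0, 4.0, 25.0]))
print(acf(toy, 2))
print(significant_lags(toy, 2))
```

For residuals from a well-specified model, `significant_lags` would be expected to return an empty list or only an occasional lag, which corresponds to all bars lying within the limits in an ACF plot.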

Fig. 2 Residual quantile-quantile (Q-Q) plots for twelve sample stations. Left: data without transformation. Right: square root transformation for stations below 40°N and no transformation for stations above 40°N

Fig. 3 Residual autocorrelation (ACF) plots for the twelve sample stations from the model without transformation (on the left) and for stations south of 40°N with a square root transformation (on the right)

Figure 4 shows the solar radiation absorption data with the fitted regression line. The y-axis shows the percentage of solar radiation absorbed at the 12 example stations and the x-axis shows the years 1998 to 2020. The fitted values from the linear model are represented by the red line. The findings revealed that patterns of solar radiation absorption vary with location and season. A seasonal pattern is visible at every station in every year. Factor analysis was then used to group the stations by geographical area. It identified seven major factors that dominate solar radiation absorption. As a result, the stations were grouped geographically into seven regions (Fig. 5), each with a distinct dominant factor. The seven factors correspond to the following geographic areas: North-West (Factor 1, includes 21 stations), North-East (Factor 2, includes 23 stations), Central (Factor 3, includes 19 stations), South-East (Factor 4, includes 17 stations), South-West (Factor 5, includes 20 stations), South (Factor 6, includes 16 stations), and North (Factor 7, includes 13 stations).

Fig. 4 Solar radiation absorption of the original data (blue dots) and fitted values from linear regression model (red line) during 1998 to 2020

Each station was assigned to a factor if its loading on that factor was greater than or equal to 0.333. Figure 5 shows the seven geographical groups of stations obtained from the factor analysis. The size of the circle at each location is proportional to its factor loading, so a larger circle indicates a higher loading. Two stations in Florida and two stations in Wyoming (hollow circles in Fig. 5), with factor loadings less than 0.333, were not classified into any regional factor.
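The assignment rule can be written as a short Python sketch. The station names and loadings are hypothetical, and assigning a station to the factor with its largest loading when several loadings reach the cut-off is an assumption, since the text does not say how such ties between factors are resolved.

```python
THRESHOLD = 0.333  # loading cut-off used in the text


def assign_factor(loadings, threshold=THRESHOLD):
    """Return the 1-based index of the factor with the largest loading,
    or None when no loading reaches the threshold (station left unclassified)."""
    best = max(range(len(loadings)), key=lambda k: loadings[k])
    return best + 1 if loadings[best] >= threshold else None


# Hypothetical loadings of three stations on three factors.
loadings_by_station = {
    "station_a": [0.81, 0.10, 0.05],
    "station_b": [0.20, 0.333, 0.31],
    "station_c": [0.30, 0.25, 0.28],
}
regions = {name: assign_factor(values)
           for name, values in loadings_by_station.items()}
print(regions)
```

Here `station_c` has no loading of at least 0.333 and is therefore left unclassified, like the four stations drawn as hollow circles in Fig. 5.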

Fig. 5 Seven geographic categories by factor analysis

The solar radiation absorption of the stations in each factor was averaged before fitting the linear model to analyze the trend, as shown in Fig. 6. The largest difference between maximum and minimum solar radiation absorption was recorded in Factor 1, while the smallest was observed in Factor 5. Factors 1, 2, and 7, which are located in the North-West, North-East, and North, respectively, have average solar radiation absorption of more than 30%. In Factors 3, 4, and 6, which are in the Central, South-East, and South, respectively, the average solar radiation absorption was between 21 and 26%. Factor 5, located in the South-West, has the lowest solar radiation absorption, 13% on average.

Fig. 6 The solar radiation absorption data (blue dots) and trends using linear regression (red line) during 1998 to 2020 for seven factors based on factor analysis

Figure 7 depicts the seasonal patterns of solar radiation absorption in each factor by month (left) and year (right). The findings revealed that from November to February, all regions of the United States have higher solar radiation absorption than during any other season, as these are the winter months. The variation in solar radiation absorption is noteworthy in the North-West, North-East, and North. Solar radiation absorption is lower from June to September, which may be due to the summer season, except in the South-East and South-West, where absorption percentages rise slightly in these months. The lowest solar radiation absorption is observed in the South-West. Based on the maximum, minimum, and average solar radiation absorption percentages, three distinct groups were observed: the first group includes the North-West, North-East, and North; the second group includes the Central, South-East, and South; and the third group includes the South-West.

Figure 8 shows the average trend in the percentage of solar radiation absorbed, with 95% confidence intervals (CI), from the multivariate regression model. Results revealed that the average increase in solar radiation absorption was 0.015%. The solar radiation absorption in the North-West, Central, and South of the United States significantly decreased, whereas it significantly increased in the North-East, North, and South-East. The highest increase in solar radiation absorption was 0.12%, in the North of the United States.

Fig. 7 Predicted annual average percentage of solar radiation absorption by month (left) and year (right) with a 95% confidence interval (line, shaded)

Fig. 8 Average increase in the percentage of solar radiation absorbed from 1998 to 2020, with confidence intervals for each station

The daily solar radiation absorption from 133 stations in the United States was analyzed by averaging the data into 5-day intervals from 1998 to 2020. Three models were used to examine seasonal patterns and trends in 5-day solar radiation absorption. First, a linear model was fitted for each of the stations. Second, the stations were grouped into geographic regions using factor analysis to reduce spatial correlation. Third, the multivariate regression model was used to assess the average changes in solar radiation absorption over 5-day intervals for each factor. The results showed that the average increase in solar radiation absorption was 0.015%. Seasonal patterns of solar radiation absorption were observed in all regions, with the peak occurring in the winter from November to February and the lowest values in the summer from June to September. The solar radiation absorption in the United States significantly decreased in the North-West, Central, and South, whereas it significantly increased in the North-East, South-East, and North.

The methods used in our study revealed clear seasonal patterns and trends in solar radiation absorption and provided a strong fit to the data. Furthermore, the suggested methodology reduced the problems of autocorrelation and spatial correlation, resulting in more reliable results. Solar radiation absorption has been used as a proxy to determine the trend of cloud cover in various studies (Boers et al. 2017; Pfeifroth et al. 2018; Freychet et al. 2019; Cherian and Quaas 2020; Dong et al. 2023).

In this study, the absorption of solar radiation was high in the winter (November to February) and low in the summer (June to September). This reflects that the solar radiation reaching the Earth’s surface decreased in winter and increased in summer. The finding is in accordance with Sun and Groisman (2004), who reported that the average cloud cover was highest in the winter (December–March) and lowest in the summer (June–September). Dong et al. (2023) also reported that surface solar radiation had a strong seasonality, with the largest changes in summer. This suggests that more clouds are present in the winter, allowing less solar radiation to reach the Earth’s surface, whereas fewer clouds are present in the summer, allowing more solar energy to reach the surface.

According to the findings of this study, solar radiation absorption increased in the North, North-East, and South-East regions, mostly in the eastern part of the United States, while decreasing in the North-West, Central, and South regions, mostly in the middle and western United States. This indicates that solar radiation at the surface had a decreasing trend in the North, North-East, and South-East, and an increasing trend in the North-West, Central, and South. Our finding is in line with the study by Dong et al. (2023), which reported that the decrease in cloud cover in North America resulted in an increase in surface solar radiation. The finding is also consistent with the geographic features of the eastern section of the nation, including the Gulf Coast, which has humid summers and mild winters. Additionally, hurricanes, thunderstorms, and tornadoes are common in this area (Kunkel et al. 2013).

There are a few limitations to this study. In particular, climatological and environmental factors that could influence solar radiation on Earth, such as temperature, rainfall, humidity, cloud level, and particulate matter or aerosol optical thickness, were not considered as predictors. Instead, we relied on the length of the series, which avoids the cost of collecting additional variables while still giving reliable results. The methodology can be applied in other areas, with different transformations and different lags depending on the complexity of the data.

4 Conclusion

The seasonal patterns and trends of solar radiation absorption were clearly identified, revealing a peak in winter and a low in summer. Increasing trends in solar radiation absorption were found in the North-East, South-East, and North, and declining trends in the North-West, Central, and South regions of the United States. The patterns and trends of solar radiation by location and time are helpful for climate scientists in making policies regarding climate change. They are also useful for managing renewable energy sources and planning more appropriate policies in the future.