1 Background

Atmospheric radiation in the wavelength region of 4–100 μm that is emitted by atmospheric constituents (energised largely by absorbed solar radiation) down to the earth’s surface is known as downward longwave radiation (DLR) [50, 83, 123]. DLR is part of the energy budget balance of the earth’s surface, and it is relevant to evapotranspiration, plant water demand, the greenhouse effect, global warming detection, solar thermal and photovoltaic applications, meteorology, climate modelling, ecology, hydrology, etc. [16, 57, 95, 100, 110, 124, 126].

Because the atmosphere radiates approximately like a grey body, DLR’s intensity depends on its interaction with the atmosphere. Longwave radiation from the sun occasionally passes through the atmospheric window to the ground without being absorbed, and the atmosphere also converts absorbed global solar radiation into longwave radiation. Hence, the magnitude of DLR is greater during the day than at night. While the outgoing longwave radiation moves up, DLR moves down in the atmosphere, and their difference is known as the net longwave radiation. Atmospheric constituents that interact with DLR include O2, CO2, O3, H2O, CO, aerosols, etc. DLR is also influenced by geographic parameters like land cover, latitude, elevation and longitude [23, 27, 30, 83, 109, 123].

DLR surpasses the upward longwave radiation in the tropics, such that the tropical net longwave radiation is biased toward DLR, unlike in the polar and mid-latitudinal regions. The relationship between the equatorial region and the sun is distinctive because the sun’s influence is about the same all year round, owing to its directly incident rays and small angular variations. Hence, equatorial solar radiation is high, and the equatorial atmosphere, soil and water bodies get quite hot [46, 63]. Furthermore, the sky at the equatorial zone is mostly cloudy. While the tropics act like the heat source of the globe, the polar regions behave like its heat sink. The earth achieves energy balance through heat flux exchange via air and ocean current movements between the tropics and the other regions [47, 92, 98].

Pyrgeometers, the instruments for direct measurement of DLR, require calibration in specialised laboratories every 6 months to 2 years of use. Sometimes, they respond to a different spectrum in the field from the one in the laboratory. The instruments are still being improved upon to eradicate errors triggered by sensor sensitivity to the heat retained by the thermopile sensor plate and dome. Additionally, pyrgeometers are costlier than their counterpart, the pyranometers used for measuring global solar radiation. Though there has been an increase in meteorological stations, ground measurements of DLR remain few, and these issues with pyrgeometers account for the scarcity of its data [55, 67, 96, 97, 114].

Sometimes, models are deployed to alleviate the paucity of data. There are two major types of models: traditional methods and expert systems. The traditional method, otherwise known as regression, involves using the least-squares routine (or another parametric approach like autoregressive representation or robust fitting) to obtain the empirical coefficient(s) of the mathematical expression(s) that govern a process [8, 21, 88]. On the other hand, expert systems deploy crude computational approximations of nervous system signals for learning, memorising, recalling, decision-making and other functions, to solve problems. Though expert systems can resolve challenges that have proper mathematical expressions, their peculiar strength lies in solving problems with no known or competent mathematical procedures. Expert systems are used in different fields for optimisation, classification, forecasting and other tasks. Artificial neural networks (or neural networks), support vector regression, adaptive neuro-fuzzy inference systems and hybrids are common expert systems for modelling [22, 41, 44, 58, 91].

Several researchers have used regression to model DLR. Angstrom [9] was the first to establish an expression for cloudless DLR in relation to screen-level atmospheric air temperature and water vapour pressure. Similar models have been developed by other researchers ([26, 28, 59, 122]; Table 1). Additionally, relative humidity and cloud information can be used for modelling DLR [20, 39, 45].

Table 1 Some of the uniquely different cloudless DLR regression models considered in this study

There are two main types of traditional modelling techniques for DLR: empirical and analytical [56]. While the empirical technique is based on the data distribution, the analytical technique is based on the theory of the process concerned. However, the two techniques are not always independent of each other. The coefficients of most DLR models are location-dependent, and because the constants may require changes, testing them before adopting them at another site is ideal [14, 29, 64, 102, 107]. Moreover, a regression model may have different coefficients for day and night periods [65].

Previous studies conducted in Ilorin (8.50 °N, 4.55 °E), Nigeria, modelled DLR with sunshine, vapour pressure, clearness index, photosynthetically active radiation and global solar radiation, using data from September 1992 to August 1994 [115, 116, 118]. Among the clearness ratio, relative humidity and water vapour pressure, vapour pressure was the best predictor of all skies’ DLR, and Brunt [17] was shown to be the best cloudless model for the site [115]. However, as in Abramowitz et al. [2], the new models in Udo [115] did not express DLR as a function of the effective emissivity of the atmosphere, though useful meteorological parameters like water vapour pressure and relative humidity were deployed. The clear-sky clearness index threshold was also shifted from 0.60 to 0.62 in Udo [115].

Artificial intelligence (AI) is a technological field where machines imitate human intelligence in areas such as problem-solving, reasoning, creating, learning, perceiving, acting, communicating, walking and playing games. Not all AI can recall or learn from experience; some function based on already provided knowledge. AI for data science can be segmented into machine learning and deep learning. Machine learning refers to a computerised system that learns and improves based on experience. Its major advantage is its ability to detect hidden relationships and patterns in data. Deep learning refers to computerised systems that process information from one stage (input) to another (output) and may include in-between stages (hidden layer(s)), though it is also experience-based. In data science, terms like expert systems and soft computing are often used to refer to broad AI techniques associated with the workings of neurons. The common learning frameworks for expert systems are supervised and unsupervised. In supervised learning, the system learns the hidden information in the data from the provided inputs and outputs. In unsupervised learning, however, the outputs are not given, but the directives that enable the system to produce the desired output are spelt out. Soft computing techniques are broad and include case-based reasoning, neural networks, rule-based systems, genetic algorithms, swarm intelligence, etc. [22, 44, 121].

Expert systems like neural networks (NN) are reputed to perform better than traditional models [41, 72]. To this end, NN have been used to model global solar radiation to meet the data needs of thermal and photovoltaic applications [32, 37, 77, 103, 119]. Compared with traditional methods, NN often deploy more atmospheric parameters as inputs [75, 93]. Thus, the yardstick for comparing the two modelling techniques is not level, since a sufficient number of input parameters can improve certain regression models [6]. NN have also been used for computing DLR in climate models and for retrieving data from satellite images [24, 25, 61, 76, 79], but rarely for estimating ground-based DLR from air temperature and water vapour pressure. Besides, Soares et al. [106] found DLR to be an effective predictor in the NN estimation of diffuse solar radiation in a city in Brazil. Additionally, NN have been used to correct for the dome heating of a pyrgeometer [86].

Seemingly, the state of the art for estimating global solar radiation is the use of expert systems such as neural networks, support vector regression (SVR) and the adaptive neuro-fuzzy inference system (ANFIS), including hybrids [4, 68, 73, 91]. Depending on the applied technique, any of them can perform better than the others. For instance, SVR with a polynomial or radial basis function kernel was preferred over ANFIS and its hybrids in two separate studies [85, 94], once the model parameters of the SVRs were identified, selected and tuned. In contrast to Olatomiwa et al. [85], the statistical indicators for both the training and testing phases of ANFIS and its hybrids are all close in Halabi et al. [43].

This study aims at modelling cloudless DLR in Ilorin, Nigeria, from air temperature and water vapour pressure by means of regression and soft computing techniques. A larger data set spanning about 5 years is used. Furthermore, the exact clearness-index threshold of 0.60 is used to select clear skies. To the best of our knowledge, this is the first study to compare a newly developed model and other regression models with the three expert systems of neural networks, support vector regression and the adaptive neuro-fuzzy inference system in the modelling of cloudless DLR.

1.1 Methods

1.1.1 Data

In this study, ground-based cloudless DLR was modelled for the city of Ilorin (8.50 °N, 4.55 °E), Nigeria, using screen-level water vapour pressure and air temperature. Ilorin is one of the 52 Baseline Surface Radiation Network (BSRN) stations that measure atmospheric parameters like DLR [60, 70]. It should be noted that BSRN stations maintain high standards for measuring data, and the information on the instrumentation, geography and seasons of Ilorin has been previously reported [71, 114,115,116,117,118].

The periods considered in this work were from September 1992 to August 1994 and from July 1995 to March 1998. Air temperature and water vapour pressure data from September 1992 to August 1994 were obtained from the Nigerian Meteorological Agency (NIMET) [115] because they were not measured at the BSRN station at the University of Ilorin, Nigeria. However, air temperature was measured at the station from July 1995 to March 1998, and along with DLR, those readings were obtained from BSRN at https://www.pangaea.de/PHP/BSRN_Status.php. All water vapour pressure data were sourced from NIMET. While NIMET data were measured at 3-hour intervals, data from BSRN were taken every 2 or 3 min; all were reduced to daily values.

Additionally, the BSRN station at the University of Ilorin operated from 1992 to 2005 and could not be sustained for lack of funds. Ilorin is in the central region of Nigeria, and the city has three recognised seasons: dry, rainy and Harmattan. Measurement of DLR at the Physics Department, University of Ilorin, was done with an Eppley PIR 20468F3 pyrgeometer calibrated by the World Radiation Centre, Davos, with calibration number 38002. The pyrgeometer was initially calibrated in 1980 at the Eppley Laboratory, Newport, USA [117]. Furthermore, data obtained from NIMET were measured at the airport situated about 12 km from the site of the radiation measurement in the university. According to NIMET, water vapour pressure was measured with a barometer, and the values were reduced to mean sea level. Furthermore, a dry-bulb thermometer was used to measure temperature.

1.1.2 Regression modelling

To eliminate the contribution of cloud to DLR, the criterion adopted for clear skies, as dictated by the clearness index KT, is given as:

$$ {K}_T=\frac{H}{H_o}\ge 0.60 $$
(1)

where H is the daily global solar irradiation (J/m2), and Ho is the daily extraterrestrial solar irradiation intercepted by a plane parallel to the earth’s surface (J/m2) [11].
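As an illustration, a minimal MATLAB sketch of this clear-sky sieve is given below. This is not the authors’ script; H, Ho, e, T and DLR are assumed to be equal-length vectors of daily values for the same dates:

```matlab
% H, Ho: daily global and extraterrestrial irradiation (J/m^2); e, T, DLR:
% daily vapour pressure, air temperature and measured DLR (illustrative names).
KT        = H ./ Ho;            % daily clearness index
isClear   = KT >= 0.60;         % clear-sky criterion of Eq. (1)
e_clear   = e(isClear);
T_clear   = T(isClear);
DLR_clear = DLR(isClear);
```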

Data sieved through Eq. (1) were further divided into a training (or experimental) set and a testing set because some modelling methods, like the expert systems, have the tendency to overfit. The suitability of some uniquely different clear skies models (Eqs. (2)–(13) in Table 1) was evaluated, and a new model was developed using the experimental data. Only models whose predictors are water vapour pressure and ambient temperature were considered; those with predictors like relative humidity, dew point temperature, total column water vapour, a periodic correction factor or a clearness indicator, as well as cloudy and all skies models, were excluded from this study [2, 20, 31].

The processes for obtaining the new cloudless model are outlined below:

From the general expression for DLR, given as:

$$ \mathrm{DLR}=F{\varepsilon}_m\sigma {T}^4 $$
(14)

where F is the cloud cover factor (equal to 1 for clear skies), T is the air temperature, σ is the Stefan–Boltzmann constant and εm is the effective emissivity of the atmosphere [51, 102]. For clear skies, where F = 1, Eq. (14) reduces to:

$$ \mathrm{DLR}={\varepsilon}_m\sigma {T}^4 $$
(15)

The expression for the effective atmospheric emissivity becomes:

$$ {\varepsilon}_m=\frac{\mathrm{DLR}}{\sigma {T}^4} $$
(16)

To model the radiation using the empirical technique, the effective emissivity and an independent factor were considered in the form y = mx + c, in which y is the dependent variable (effective emissivity of the atmosphere), x is the independent variable, while m and c are regression coefficients. So a fitting relationship between the effective emissivity of the atmosphere and the independent variable \( \left(\frac{e^j}{T^k}\right) \) (where e is the water vapour pressure, and j and k are constants) was sought in developing the new model for the radiation. The new model was also tested on data from Spain to ascertain its adequacy.
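The fitting step can be sketched as follows, under the assumption that the exponents j and k were chosen by scanning trial values and keeping the pair with the highest correlation; the variable names and trial ranges are illustrative only and continue from the clear-sky sieve above:

```matlab
% Regress the emissivity of Eq. (16) linearly on e^j/T^k for trial (j, k)
% pairs and retain the pair with the highest correlation.
sigmaSB = 5.670e-8;                          % Stefan-Boltzmann constant (W m^-2 K^-4)
em = DLR_clear ./ (sigmaSB .* T_clear.^4);   % effective emissivity, Eq. (16)
bestR = -Inf;
for j = [0.5 1 2]                            % trial powers of e
    for k = 1:20                             % trial inverse powers of T
        x = (e_clear.^j) ./ (T_clear.^k);
        p = polyfit(x, em, 1);               % em ~ m*x + c
        R = corrcoef(x, em);
        if R(1,2) > bestR
            bestR = R(1,2);
            best  = struct('j', j, 'k', k, 'm', p(1), 'c', p(2));
        end
    end
end
```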

1.1.3 Neural networks

While Von Neumann machines perform tasks via complex sequential processors linking commands from one stage to another, neural networks (NN) perform tasks via the parallel connectivity of simple processors. Thus, NN are not only faster but also tolerant of errors, since a malfunction in one unit does not halt the whole system as in Von Neumann computers. Neural networks require a small amount of power to operate and improve by learning from experience. Because NN are practically black boxes, there is no need to investigate the inherent relationship between one variable and another (which could be relatively difficult) when modelling with them [40, 52].

In this section, for a neuron with n inputs xi, the output y(x) is:

$$ y(x)=F\left({\sum}_{i=1}^n{w}_i{x}_i+b\right) $$
(17)

where b is the bias or threshold term, i indexes the input units, xi is an input value, wi is the weight on xi and F is the nonlinear activation function, which can typically be a sigmoid, hard limit or linear function.
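For concreteness, Eq. (17) for a single neuron with a sigmoid activation can be evaluated as follows; the weights, inputs and bias are made-up numbers:

```matlab
w = [0.4; -0.2];                 % hypothetical weights
x = [18.5; 299.1];               % hypothetical inputs (e.g. e in hPa, T in K)
b = 0.1;                         % hypothetical bias
F = @(s) 1 ./ (1 + exp(-s));     % sigmoid activation function
y = F(w' * x + b);               % Eq. (17): activation of weighted sum plus bias
```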

Diverse training rules, specific topologies, learning rules and other factors, such as the number of neurons, the error function, the momentum factor, the iteration rate, and layer and performance parameters, combine in the algorithms to enable neural networks to function as they do. Different NN schemes are suitable for various applications; however, the feedforward neural network scheme was used in this study [66, 101]. In this scheme, where information flows through weighted connections only in the forward direction, there are three stages: the input, hidden layer(s) and output (Fig. 1). Each stage has layers that comprise a certain number of artificial neurons or units, and every layer is individually connected to the previous layer.

Fig. 1

The structure of the ANN system used in this study

The soft computing models in this study were implemented in MATLAB 2016a. The NN model consists of two input variables (water vapour pressure and air temperature), 10 neurons in the hidden layer and one output variable (DLR). Furthermore, the NN model was realised by separating 15% of the experimental data for testing, another 15% for validating and the remaining 70% for training. Finally, the viability of every model, including NN, was further examined with the data of the testing set.
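A minimal sketch of this configuration, assuming the fitnet interface of the Neural Network Toolbox in MATLAB 2016a and illustrative variable names, could read:

```matlab
% inputs: 2-by-n matrix (row 1: water vapour pressure, row 2: air
% temperature); targets: 1-by-n vector of measured DLR (illustrative names).
net = fitnet(10);                          % one hidden layer with 10 neurons
net.divideParam.trainRatio = 0.70;         % 70% of the experimental data for training
net.divideParam.valRatio   = 0.15;         % 15% for validating
net.divideParam.testRatio  = 0.15;         % 15% for testing
[net, tr] = train(net, inputs, targets);   % backpropagation training
DLRhat = net(inputs);                      % modelled cloudless DLR
```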

1.1.4 Support vector regression

The study of the support vector machine (SVM) began in the early 1960s with Vapnik and his group, and it is also associated with the Vapnik–Chervonenkis (VC) theory [12, 105, 120]. There are two categories of support vector machine: support vector classification (SVC), which is used to solve classification problems, and support vector regression (SVR), which is used in tackling estimation issues. A support vector machine is primarily a classifier and regression method that performs the required task of constructing hyperplanes in a multidimensional space with different class labels [105, 120]. Though there are slight differences between the SVC and SVR problem formulations, the algorithms of SVR provide the best-fit solution for the regression function of either linear or nonlinear hyperplanes. A typical structure for SVR is given in Fig. 2, where the biasing is applied before the final stage.

Fig. 2

Architecture of the SVR system

In this section, to construct the optimal hyperplanes, an iterative training algorithm, which minimises an error function, is used. Assume there is a given set of training data \( \left\{\left({x}_1,{y}_1\right),\dots, \left({x}_n,{y}_n\right)\right\}\subset X\times \mathbb{R} \), where X is the space of the input patterns, e.g. X = ℝd. The goal is to find a function that deviates from the targets yi by at most ε for all the training data and, at the same time, is as flat as possible. If the error function is given as:

$$ \frac{1}{2}{w}^Tw+C{\sum}_{i=1}^N{\xi}_i+C{\sum}_{i=1}^N{\xi}_i^{\ast } $$
(18)

where wTw is the squared norm of the weight vector, ξi and \( {\xi}_i^{\ast } \) are the slack variables, C is a regularisation constant and N is the number of samples.

Then, we minimise subject to:

$$ {\displaystyle \begin{array}{c}{w}^T\phi \left({x}_i\right)+b-{y}_i\le \varepsilon +{\xi}_i\\ {}{y}_i-{w}^T\phi \left({x}_i\right)-b\le \varepsilon +{\xi}_i^{\ast}\\ {}{\xi}_i,{\xi}_i^{\ast}\ge 0,\kern0.75em i=1,\dots, N\end{array}} $$
(19)

where b is the bias, ε is the insensitive loss parameter (the permitted deviation) and ϕ is the nonlinear map that transforms the input space into the feature space.

In a situation where the optimisation problem in the expression above is not feasible, a parameter ν is introduced to overcome over-fitting, such that the new error function becomes:

$$ \frac{1}{2}{w}^Tw+C\left(\upsilon \varepsilon +\frac{1}{N}{\sum}_{i=1}^N\left({\xi}_i+{\xi}_i^{\ast}\right)\right) $$
(20)

which we also minimise subject to:

$$ {\displaystyle \begin{array}{c}\left({w}^T\phi \left({x}_i\right)+b\right)-{y}_i\le \varepsilon +{\xi}_i\\ {}{y}_i-\left({w}^T\phi \left({x}_i\right)+b\right)\le \varepsilon +{\xi}_i^{\ast}\\ {}{\xi}_i,{\xi}_i^{\ast}\ge 0,\kern0.75em i=1,\dots, N,\kern0.75em \varepsilon \ge 0\end{array}} $$
(21)

In regression problems, SVM separates unseen data by constructing hyperplanes after the support vectors have been identified. However, the procedures for meeting the set target ultimately create fairly large separations between points in the feature space. If the inner product in a feature space can be computed directly as a function of the original input points, a nonlinear learning machine can be built without expanding the error function into the large feature space. This is accomplished by computing a kernel function, denoted by K. For regression problems, there are several methods that engage nonlinear kernels and then proceed to the minimisation stages.

There are several kernel functions [105], and one of them is the Laplacian radial basis function or a radial basis function (rbf), which is given as:

$$ K\left(x,{x}_i\right)=\exp \left(-\frac{{\left\Vert x-{x}_i\right\Vert}^2}{\sigma^2}\right) $$
(22)

For such kernel functions, x and xi are vectors in the input space, i.e. attribute vectors computed from the training instances, and σ is the parameter that adjusts the slope of the rbf. Another kernel function is the polynomial, which is denoted by:

$$ K\left(x,y\right)={\left({x}^Ty+c\right)}^d $$
(23)

where x and y are the input vectors, c is an optional constant obtained during training, d represents the degree of the polynomial kernel function and T indicates the transpose in the generalised dot product.

Apart from the radial basis function and polynomial, there are two other kernel functions, namely linear and Gaussian, which are also supported in MATLAB. The linear kernel function is given by the product of the two vectors x and y as:

$$ K\left(x,y\right)={x}^Ty+c $$
(24)

Lastly, the Gaussian kernel function, which is a type of radial basis function kernel, is given as:

$$ K\left(x,y\right)=\exp \left(-\frac{{\left\Vert x-y\ \right\Vert}^2}{2{\sigma}^2}\right) $$
(25)

In this study, the four kernel functions described were applied using the MATLAB computing platform. While the standardisation and kernel scale options were set to true and auto, respectively, in the SVR system, the other parameters were left at their default settings. Additionally, the ideal integer degree of the polynomial kernel function for the estimation of cloudless DLR was investigated.
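A sketch of this set-up is given below, assuming the fitrsvm interface of the Statistics and Machine Learning Toolbox (note that in that interface ‘gaussian’ and ‘rbf’ are two names for the same kernel, so only the Gaussian and polynomial cases are shown); X and y are illustrative names for the predictor matrix and target vector:

```matlab
% X: n-by-2 predictor matrix [water vapour pressure, air temperature];
% y: n-by-1 vector of measured cloudless DLR.
mdlGauss = fitrsvm(X, y, 'KernelFunction', 'gaussian', ...
                   'Standardize', true, 'KernelScale', 'auto');

% Sweep of the polynomial kernel degree (the sixth was found ideal here):
for d = 1:8
    mdlPoly = fitrsvm(X, y, 'KernelFunction', 'polynomial', ...
                      'PolynomialOrder', d, 'Standardize', true);
    DLRhat  = predict(mdlPoly, X);      % estimates for later comparison
end
```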

1.1.5 Adaptive neuro-fuzzy inference system (ANFIS)

The adaptive neuro-fuzzy inference system (ANFIS) is a hybrid system that integrates the features of both fuzzy logic (FL) and NN systems [53, 54, 62]. A fuzzy logic system can handle both linguistic and numerical knowledge, and it is generally considered a nonlinear mapping of an input (feature) data vector to a scalar output. FL is vast because there are numerous possibilities that lead to the formation of various mappings.

The structure of FL consists of a knowledge-based system that contains the membership function definitions and the essential ‘if-then’ rules. These rules can be extracted from numerical data or might be provided in advance. Crisp numbers are mapped into fuzzy sets by the fuzzifier; this is required to activate the rules, which are based on linguistic variables and are associated with the fuzzy sets. Fuzzy sets help represent symbolic knowledge in a more understandable, humane or natural form and can hold uncertainties at numerous levels. To manipulate a system based on fuzzy rules, the derivation of the essential if-then fuzzy rules, the division of the universes of discourse and the definition of the membership functions are required.

Linguistic variables are incorporated in fuzzy rule-based systems to insert rationale, with the help of a series of logical if-then rules that connect antecedent(s) and consequent(s). An antecedent is a fuzzy clause with a membership between 0 and 1. Fuzzy rules can connect multiple antecedents through operators, such that all the constituent parts are considered simultaneously and resolved into a single number. On the other hand, consequents may also comprise multiple parts, which can later be aggregated into the single output of a fuzzy set.

Accurate rule-based systems offer a number of advantages. For example, they can see the fine line of difference between accurate and over-general rules: the latter will always have lower accuracy, considering that the payoff is supposed to vary with the inputs covered by the rule at large. Indeed, an accuracy-based approach can certainly lead to the evolution of optimally general rules. In addition, it maintains both a consistently correct and a consistently incorrect rule simultaneously, allowing the learning of a complete ‘covering map’ to proceed. However, with many rules, antecedents and consequents, the whole mapping process in FL is relatively slow and requires a boost.

The computational slowness of FL is compensated for by neural networks; on the other hand, the reasoning ability of FL complements the fast backpropagation learning schemes of NN [4, 53, 62, 68]. More information on the fundamentals and applications of ANFIS abounds in specialised publications [1, 19].

In this section, assuming the fuzzy inference system has two crisp inputs, x and y, and an output z, to be fuzzified with the fuzzy sets A1, A2, B1 and B2, the first-order Takagi-Sugeno system of if-then rules can be given as follows:

$$ {\displaystyle \begin{array}{c}\mathrm{Rule}\ 1:\mathrm{if}\ x\ \mathrm{is}\ {A}_1\ \mathrm{and}\ y\ \mathrm{is}\ {B}_1,\mathrm{then}\ {f}_1={p}_1x+{q}_1y+{r}_1\\ {}\mathrm{Rule}\ 2:\mathrm{if}\ x\ \mathrm{is}\ {A}_2\ \mathrm{and}\ y\ \mathrm{is}\ {B}_2,\mathrm{then}\ {f}_2={p}_2x+{q}_2y+{r}_2\end{array}} $$
(26)

where f1 and f2 are the respective outputs associated with the fuzzy sets, and p1, p2, q1, q2, r1 and r2 are design parameters that are determined during training.

ANFIS modelling is in five layers (Fig. 3). The first layer is the fuzzification unit, where the crisp inputs x and y are fed to each node i, which is associated with the respective linguistic marker Ai or Bi−2, such that the node function O1,i gives the membership grade of the given input as:

$$ {\displaystyle \begin{array}{cc}{O}_{1,i}={\mu}_{A_i}(x),& for\ i=1,2\\ {}{O}_{1,i}={\mu}_{B_{i-2}}(y),& for\ i=3,4\end{array}} $$
(27)
Fig. 3

ANFIS structure consisting of the five stages

The linguistic markers Ai and Bi can be designated as high (H1), medium (M1) and low (L1), and the corresponding H2, M2 and L2, respectively. Each membership function is given by the Gaussian membership function as:

$$ {\mu}_{\mathrm{Gaussian}}(x)=\exp \left(-\frac{1}{2}{\left(\frac{x-c}{\sigma}\right)}^2\right) $$
(28)

The parameters c and σ are adaptive; they determine the centre and the slope (width) of the membership function, respectively.

The second layer is the membership unit, which determines the firing strength of a rule by applying the product to the membership functions that are the incoming signals from the first layer. The function wi is the firing strength of rule i:

$$ {w}_i={\mu}_{A_i}(x)\times {\mu}_{B_{i-2}}(y), for\ i=1,2 $$
(29)

The next layer, the third, is where all nodes are fixed; it calculates the ratio of the firing strength of the rule of the ith node to the sum of the firing strengths of all rules. The third layer is otherwise regarded as the normalised layer:

$$ {O}_{3,i}={\overline{w}}_i=\frac{w_i}{w_1+{w}_2},\kern0.5em for\ i=1,2 $$
(30)

The fourth layer, also known as the de-fuzzification layer, computes the product:

$$ {O}_{4,i}={\overline{w}}_i{f}_i={\overline{w}}_i\left({p}_ix+{q}_iy+{r}_i\right),\kern0.5em for\ i=1,2 $$
(31)

The fifth layer is the output layer that takes the sum of the outputs from the previous layer as:

$$ {O}_{5,i}=\sum {\overline{w}}_i{f}_i={f}_{\mathrm{out}} $$
(32)

By default, grid partitioning of the inputs to the output was used, with three linguistic labels of high, medium and low for each input. Hence, 9 (i.e. 3 × 3) rules were created in the training phase and embedded into the backpropagation neural network scheme of ANFIS.
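A compact sketch of this set-up, assuming the genfis1/anfis interfaces of the Fuzzy Logic Toolbox in MATLAB 2016a and illustrative variable names, could read:

```matlab
% trnData / tstData: matrices whose columns are [vapour pressure, air
% temperature, measured DLR] (illustrative names).
inFIS  = genfis1(trnData, 3, 'gaussmf');    % 3 Gaussian MFs per input (low/medium/high)
outFIS = anfis(trnData, inFIS, 100);        % train for 100 epochs
DLRhat = evalfis(tstData(:, 1:2), outFIS);  % estimates for the testing set
```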

1.2 Statistical indicators

The statistical measures used in this study for examining the appropriateness of the models are the correlation coefficient, r, and the t test, ts. They are expressed in the equations below:

$$ r=\frac{n\left(\sum {V}_c{V}_m\right)-\left(\sum {V}_c\right)\left(\sum {V}_m\right)}{\sqrt{\left[n\sum {V_c}^2-{\left(\sum {V}_c\right)}^2\right]\left[n\sum {V_m}^2-{\left(\sum {V}_m\right)}^2\right]}} $$
(33)

and

$$ {t}_s=\sqrt{\frac{\left(n-1\right){\mathrm{MBE}}^2}{\left({\mathrm{RMSE}}^2-{\mathrm{MBE}}^2\right)}} $$
(34)

Furthermore,

$$ \mathrm{MBE}=\frac{\sum \left({V}_C-{V}_m\right)}{n} $$
(35)

and,

$$ \mathrm{RMSE}=\sqrt{\frac{\sum {\left({V}_c-\kern0.5em {V}_m\right)}^2}{n}} $$
(36)

where n is the total number of observations, MBE is the mean bias error, Vc is the calculated value from the model, Vm is the measured value, and RMSE is the root-mean-square error.

It was desired that the magnitude of the correlation coefficient, r, whether positive or negative, should be within the highest range of 0.90–1.00 and that statistical significance of the model estimates considered was attained. A given model’s estimates are statistically significant if its t test is lower than the critical t (or tα/2) at a selected (1 − α)% confidence level (or α level of significance) and (n − 1) degrees of freedom. The value of the critical t at an α level of significance and the corresponding degrees of freedom in two tails is obtainable from statistical tables [13, 108]. In this study, α was chosen as 0.05 (5%), and from the statistical table, the two-tailed critical t values for more than 30 observations range from 2.02 to 1.96.
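The four indicators of Eqs. (33)–(36) can be computed with a short function such as the sketch below, where Vc and Vm are assumed to be vectors of calculated and measured DLR; a model’s estimates would then be accepted as statistically significant if the returned ts is below the critical value of about 1.96–2.02:

```matlab
% indicators.m -- a sketch of Eqs. (33)-(36); Vc and Vm are vectors of
% calculated and measured cloudless DLR, respectively.
function [r, MBE, RMSE, ts] = indicators(Vc, Vm)
n    = numel(Vm);
R    = corrcoef(Vc, Vm);
r    = R(1, 2);                                   % correlation coefficient, Eq. (33)
MBE  = sum(Vc - Vm) / n;                          % mean bias error, Eq. (35)
RMSE = sqrt(sum((Vc - Vm).^2) / n);               % root-mean-square error, Eq. (36)
ts   = sqrt((n - 1) * MBE^2 / (RMSE^2 - MBE^2));  % t test, Eq. (34)
end
```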

2 Results

Only a few days were cloudless at Ilorin (Fig. 4) during the period considered in this study, owing to the heavy cloudiness associated with equatorial regions. Out of 1735 days from the two separate periods of measurement, only 205 days were clear, representing a meagre 12% (i.e. \( \frac{205}{1735}\times 100 \)). Cloudiness seemed inversely related to rainfall (Figs. 4 and 5). For example, August, with the lowest number of clear days, precedes September, the month with the highest rainfall. Furthermore, the monthly distribution of clear days is high during the first part of the year, when rainfall is low, and low during the second part of the year, when rainfall is high (Figs. 4 and 5). There also seemed to be a remarkable connection between the beginning of a season and cloudiness. For instance, April had the highest number of clear days, yet April may be regarded as the beginning of the rainy season. Similarly, November, the beginning of the dry season, also had a reasonably high number of clear days (Fig. 5). Therefore, there appeared to be a ‘season-change’ effect on clear days as well.

Fig. 4

Monthly distribution of clear days in Ilorin, Nigeria

Fig. 5

Monthly amount of rainfall distribution in Ilorin, Nigeria

From Table 2, the ideal relationship between the effective emissivity of the atmosphere and both the atmospheric air temperature and the water vapour pressure is \( {\varepsilon}_m\propto \frac{e}{T} \), which is similar to the \( {\varepsilon}_m\propto {\left(\frac{e}{T}\right)}^{\frac{1}{7}} \) of Brutsaert [18]. After several attempts, the best fit for the effective emissivity of the atmosphere was found to be \( {\varepsilon}_m\propto \left(\frac{e}{T^{13}}\right) \), and the newly developed clear skies model was:

$$ \mathrm{DLR}=\left(1.014\left(\frac{1.0\times {10}^{30}\times e}{T^{13}}\right)+0.699\right)\sigma {T}^4 $$
(37)
Table 2 Correlation between the emissivity of the atmosphere and both water vapour pressure, e, and temperature, T, from September 1992 to August 1994
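For reference, Eq. (37) can be evaluated directly as in the sketch below, with e and T in the units used in this study and sigmaSB the Stefan–Boltzmann constant:

```matlab
% Eq. (37) as an anonymous MATLAB function; e is the water vapour pressure
% and T the air temperature (K).
sigmaSB = 5.670e-8;                                           % W m^-2 K^-4
DLR37 = @(e, T) (1.014*(1.0e30 .* e ./ T.^13) + 0.699) .* sigmaSB .* T.^4;
```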

The results of estimating cloudless DLR with both the regression and soft computing methods are in Tables 3, 4 and 5. Generally, the performance of all models at the training phase (which also has the greater number of observations) was better than at the testing phase (Tables 3, 4 and 5). The correlation coefficient was often low for any model whose effective emissivity of the atmosphere can be represented by a constant or is not linked to water vapour pressure (Eqs. (4), (5), (6) and (12) in Table 1). Seemingly, models with both the water vapour pressure and air temperature variables were better than those with only one variable, though no model correlated up to 90%. Despite the poor correlations of Swinbank [111], that regression model was always statistically significant, an indication that the high power of temperature somewhat replicates the ideal situation. The new model was the most impressive regression model in Table 3 because, in addition to the highest correlation coefficients, its estimates were also statistically significant.

Table 3 Statistical results for regression models’ estimations of cloudless DLR and the statistically significant models are in italics
Table 4 Statistical results for soft computing in the estimations of cloudless DLR and the statistically significant models are in italics
Table 5 Statistical results for each degree of polynomial kernel function of SVR in estimation of cloudless DLR and the statistically significant models are in italics

Furthermore, the investigation of the most suitable degree of the polynomial kernel function (poly) for estimating cloudless DLR in Table 5 reveals that the sixth degree was the best. Beyond it, the performance declined, such that beyond the eighth degree, the results became undesirable (Table 5). In contrast, Table 4 reveals that at the default settings, the Gaussian kernel function SVR was better than the other three kernel functions.

Perhaps the performance of the new model could have been better if it had been globalised, given its level of success in estimating cloudless DLR in a temperate climate. The new regression model was applied to the data in Jimenez et al. [56], and the statistical results at Barcelona, Spain, were r = 0.9264, MBE = 16.9133, RMSE = 22.5964 and (for n = 40) ts = 7.1386. At the 95% confidence level, the critical t value from the statistical table is approximately 2.02, and because the calculated t test is not lower than the critical value, it is accepted that the new model does not estimate cloudless DLR in Spain with statistical significance. Despite attaining the highest correlation level, the new model overestimated clear skies DLR at Barcelona with large error values. Furthermore, although the new model did not reach a correlation of 90% at the location where it was derived, it did at Barcelona, which indicates its wide applicability (though the coefficients may have to be localised for a better fit).

A comparison between the new regression model and the soft computing models (the last model in Table 3 and all of Table 4) reveals that the former is as good as the latter. Moreover, most SVR models have deviation errors (MBE) greater than our regression model, though ANFIS has the lowest MBE values of any model. The implication of the low errors of ANFIS is that it is well aligned with the training data and yet able to impress on data not originally exposed to it. Furthermore, comparing the training phase with the testing phase, there is a markedly higher variation in the performance of the expert systems than of the regression models. For instance, if the correlation coefficient of one regression model, say Eq. (A), is higher than that of another, say Eq. (B), at training, the same order holds at testing (Table 3), but that is not always the case with the expert systems (Tables 4 and 5). Notably, the correlation coefficient is higher for NN than ANFIS during the training phase, but the reverse is the case during the testing phase (Table 4).

3 Discussion

Apart from the heavy cloudiness of the wet season, which is associated with a high concentration of water vapour, another factor that influences the clearness of the sky is the Harmattan period, which occurs between November and February. Heavy dust-laden aerosols and poor visibility accompany the hot Harmattan spell [3, 10]. Thus, the number of cloudless days is also low during the Harmattan season (December and January), as at some periods of the rainy season (Figs. 4 and 5). Because of the distinctive seasonal clouds in this region, estimating DLR and climate modelling are difficult [74].

Given the influence of the change of season on the clarity of the atmosphere, the strength and direction of the winds that induce the change from one season to another, in turn, affect clear days at the site. The wind is strongest in April and weakest in November, and those 2 months have the highest numbers of clear days. While the moist south-westerly wind dominates during the rainy season, the dry north-easterly wind controls the dry season, and the onset of the dominance of each wind, in April and November, corresponds with low atmospheric cloud cover [83, 84, 87].

The formation of matter, from elementary particles to light elements like hydrogen, helium and lithium, and by extension the origin of atmospheric gases like H2O and CO2 that impact on DLR, can be traced to the beginning of the universe. According to the Big Bang Theory, the initial temperature of the universe was infinite nearly 14 billion years ago. Then the universe was a small, extremely dense entity regarded as the ylem or cosmic egg. The ylem implied a single instance when time and space coexisted (a singularity). Time began with the explosion of the cosmic egg, leading to different eras. Temperatures were approximately 10³² K at the Planck Era (0 s ≤ time ≤ 10⁻⁴³ s), 10²⁷ K at the Grand Unification Era (10⁻⁴³ s ≤ time ≤ 10⁻³⁶ s), 10²⁷ K at the Electroweak Era (10⁻³⁶ s ≤ time ≤ 10⁻³³ s), 10¹⁵–10¹² K at the Particle Era (10⁻¹² s ≤ time ≤ 10⁻⁴ s), 10⁹–10⁷ K at the Era of Nucleosynthesis (3 min ≤ time ≤ 20 min), 10⁵ K at the Era of Nuclei, 3000 K at the Era of Atoms (beginning about 380,000 years after the Big Bang) and 30 K during the Era of Galaxies (beginning about 500 million years after the Big Bang). This present era is regarded as the Era of the Solar System, and its temperature is around 3 K.

Furthermore, the Particle Era can be divided into, first, the period of formation of quarks (10⁻¹² s), when the temperature was about 10¹⁵ K; secondly, the period of formation of heavy particles, when it was roughly 10¹⁴ K; and lastly, the period of light particle formation, when the temperature was approximately 10¹² K. Though there are other theories on the beginning of the universe (e.g. the steady state theory), the Big Bang Theory predicts the universal existence of background radiation at 3 K, and its verification has put contradictory theories to rest [38, 112, 113, 128]. Accordingly, the good fit of DLR to the high inverse power of temperature could be linked to the Particle Era in the Big Bang Theory, when the temperature of the universe was around 10¹² to 10¹⁵ K. That could imply that gases and other particles continue to exist by remaining in inverse equilibrium with their formation temperatures.

Perhaps the mechanism of DLR should be understood from the angle of the interaction of electromagnetic radiation with matter, which produces diverse phenomena such as the photoelectric effect, Compton scattering and pair annihilation [15, 36]. Like the dielectric heating technology that heats food in a microwave oven, the interaction between moving gases and solar radiation in the air possibly results in high subatomic temperatures of atmospheric gases [5, 90]. However, it is not known whether dielectric heating of non-rotating particles in the atmosphere can lead to a temperature of this magnitude.

Since most of the models for DLR are location-dependent, it is not surprising that the others do not fit Ilorin well, just as ours does not fit well elsewhere. There are many variations in the composition of the global atmosphere, and its nature, whether dominated by water vapour or other constituents like aerosols, to a large extent influences DLR [125]. Some researchers, like AL-Lami et al. [7], Duarte et al. [34] and Lhomme et al. [64], developed their models by imitating Brutsaert [18] (Eq. (7) in Table 1), in which the effective emissivity of the atmosphere, εm, takes the form:

$$ {\varepsilon}_m=A{\left(\frac{e}{T}\right)}^B $$
(38)

where A and B are constants. Analytically and empirically, depending on the unit of water vapour pressure, the values of B range between 1/6 and 1/10.

The derivation and explanation in Brutsaert [18] do not tally with ours because the powers of air temperature are significantly distinct, though both models divide water vapour pressure by air temperature in the expressions for the effective emissivity of the atmosphere. To the best of our knowledge, our study is the first to propose the present formula (Eq. (38)), where the inverse of temperature is raised to the power 13. The robustness of our model signifies that there may be gaps in our previous understanding of DLR processes in the atmosphere, particularly in the equatorial region. With regard to using screen-level water vapour pressure and ambient temperature as predictors in clear skies DLR models, to the best of our knowledge again, and ignoring the studies where older models were improved upon, the latest developed concepts until now were those of Guest [42] and Iziomon et al. [51] (Eqs. (10) and (11) in Table 1). Some authors found that previously developed clear skies models were adequate; hence, there was no need to advance other concepts, while in some cases, the new models developed are similar to existing models [33, 104]. For instance, Niemelä et al. [78] came up with two comparable cloudless models for two boundary conditions of water vapour pressure; nonetheless, the models are similar to Eq. (3) in Table 1 [35] because each model’s effective emissivity of the atmosphere is directly proportional to water vapour pressure, with varying coefficients but the same power of e. It is noteworthy that, unlike the clear skies DLR models, there are recent cloudy and all skies models for the radiation [2, 104, 127].

Efforts in the past [81, 82] did not reveal a concept about DLR as significant as that of the present work. Contrary to the black box technique of the expert systems, regression models reveal the relationship between the predictor(s) and the estimated quantity. Although not shown, the correlation between the effective emissivity of the atmosphere, εm, and \( \left(\frac{e}{T^{13}}\right) \) was slightly higher than those of εm with \( \left(\frac{e^2}{T^{11}}\right) \) and \( \left(\frac{\sqrt{e}}{T^{15}}\right) \). Since the power of temperature depends inversely on that of water vapour pressure, it is also likely that the absence of other gases in the model influences the power of the air temperature.

Ignoring other influences, H2O, CO2 and O3 are the major gases that control the thermal radiation of the atmosphere [49], and unfortunately only the state of H2O, the most important gas, is accounted for in most analytical and empirical regression expressions for DLR. Theoretical inferences are meaningful when backed by experiments; likewise, observational results should have theoretical bases, and both views support the position that data on atmospheric water vapour and air temperature are ideal for modelling DLR. Hence, water vapour and air temperature profiles in the atmosphere have been studied from both theoretical and experimental viewpoints. Researchers have adopted different patterns, like a square root law or a log law, in interpreting the link between the water vapour component and air temperature [17, 18, 31]. It has also been assumed that air temperature raised to a certain power is appropriate for representing the behaviour of water vapour [111]. However, a suitable DLR model for every condition is difficult to come by. The problem lies in the inability to adequately characterise the nature of water vapour in DLR’s mechanism. At times, the situation could be compounded, as data are rare to come by if factors other than water vapour pressure and air temperature are included in the models [23]. Based on this study, the relationship of air temperature with water vapour pressure, in the absence of other gases or other factors like cloud cover information, needs to be reconsidered when modelling cloudless DLR. Therefore, the theoretical frame may have to be redefined. The model developed in this study can be applied to other equatorial regions, though in other climates, the coefficients would have to be localised. Preferably, using data from diverse climates around the globe could produce a universal empirical expression that possibly fits all locations. Furthermore, the extension of our model to both cloudy and all skies conditions should be considered in future studies.

Global solar radiation has been measured in many locations across Nigeria; however, DLR has only been measured in Ilorin. Consequently, in contrast to global solar radiation, for which several regression models have been developed over many locations, DLR has only a handful of models due to the lack of data [80, 104]. Though there are cloudless DLR models other than Eqs. (2)–(13), as earlier mentioned, most of the newer models are similar to the old ones, and in some cases researchers are localising the coefficients to fit locations of interest [26, 64, 65, 104]. Since the target of this study was to use water vapour pressure and temperature as predictors, and also due to the lack of data on other variables, clear skies models that deploy the total water vapour column and dew point temperature could not be tested [104].

As mentioned earlier, some researchers had concluded that expert systems perform better than regression models even when both techniques were not subjected to similar conditions. However, in this study, we have shown that this is not necessarily so when they are uniformly compared. Though SVR has been highly recommended [85, 91, 94], in this study the model is associated with comparatively high errors, unlike ANFIS with its low error terms and consistently good correlations. The stability of ANFIS could be due to its peculiar combination of the schemes of NN with fuzzy logic. It should be noted that the relevant studies cited here modelled SVR with open-source R packages; thus, the high SVR errors in this study are possibly due to the change of software or to unaltered system parameters [85, 91, 94]. Nonetheless, the sixth-degree polynomial kernel function should be used in the default settings for modelling with SVR. Regression models are relatively easy to implement, whereas soft computing is for those with the expertise, and to date, the feedforward neural network scheme in MATLAB produces different results even when similar training methods are repeated (likely because of the random initialisation of weights and the random division of data). Such uncertainty calls into question the stability of some expert system techniques.

4 Conclusions

In this study, the ability of regression models and the expert systems of NN, SVR and ANFIS to estimate cloudless DLR at an equatorial location, Ilorin (8.50 °N, 4.55 °E), Nigeria, was investigated. Data from September 1992 to August 1994 and from July 1995 to March 1998 were considered; the clear days were divided into experimental and testing groups, and water vapour pressure and air temperature were the predictors used. It was found that cloudiness is connected to changes of season and that the new regression model was superior to the 13 others developed elsewhere. Possibly, the missing effect of other DLR gases is reflected in the form of the new model, and the analytical relationship between water vapour pressure and air temperature in atmospheric DLR should be reconsidered in line with our model.

However, like other cloudless DLR models, our new empirical model was limited when tested at another location. Hence, we recommend that, in future, DLR should be modelled with inputs from locations all over the globe, in addition to the extension of our model to cloudy and all skies conditions. Though the soft computing models were generally better than the regression models, the new regression model nonetheless outperformed NN on the test data. Furthermore, in the SVR estimations of cloudless DLR, the sixth-degree polynomial kernel function was superior to the Gaussian kernel function at the default settings. Overall, ANFIS is recommended for modelling cloudless DLR owing to its peculiarly low mean bias errors, which could be tied to its combination of enough human logic with the fast processing techniques of neural networks. However, due to the relative complexity of modelling with expert systems, the new regression model is a viable option.