1 Introduction

Recently, the field of functional data analysis (FDA) has grown rapidly across numerous areas of application, such as environmental science, medicine, engineering and biomaterials, where it is used to improve planning methods and increase the efficiency of services or products [1,2,3]. The FDA method is regarded as one of the most advanced techniques because it exploits all available data in the form of practical measurements and curves [4,5,6], a development made possible by modern technologies and software for the mathematical computation required to analyze data with the FDA [7,8,9,10]. The quantitative structure of the FDA approach can therefore be viewed as a sound methodology with a distinctive perspective compared with traditional statistical analyses [11,12,13,14]. Many contributions cover a wide range of statistical problems involving the FDA, such as mathematical foundations, covariance operator estimation, functional depth, functional autoregressive processes, linear regression [15,16], semiparametric regression, nonparametric regression, spatial functional statistics, robust functional data analysis and sparsity in FDA [17,18,19].

Ramsay and Silverman [20] offered an excellent overview of the foundations of FDA and extended classical statistical methods, such as principal component analysis, regression analysis, linear models and confidence intervals, to the functional setting. In a related book, Ferraty and Vieu [21] studied several parametric and nonparametric methods that provide useful tools for the FDA, while Kokoszka and Reimherr [22] gave a detailed account of the current quantitative structure of the FDA. To improve a credit card payment system, Laukaitis [23] studied an autoregressive model for the cash flow index and transaction intensity using the FDA, making it possible to predict and then solve the continuous stochastic problem over a full period of time. In another related study, Hyndman and Shang [24] introduced new tools for visualizing large collections of smooth curves using the FDA. These tools included a functional bagplot and a functional boxplot based on the first two robust principal component scores, Tukey's data depth and highest density regions. According to their results, the proposed tools detect outliers quickly and accurately while simultaneously providing a graphical representation. In another work, Ikeda et al. [25] studied how noisy observation data could be fitted by smooth, continuous functions that capture the main features of plankton variability, without explicit distributional assumptions, by using the FDA. According to their findings, the FDA approach meets the requirements of stability and smoothness, without sudden changes in the curves, and is not influenced by asymmetrical measurements or missing observations. For gas emission data, in which each observation is a vector of component gas concentration values, Torres et al. [26] proposed a solution to the problem of finding extreme values in urban gas emissions using functional data analysis; based on their results, they detected extreme carbon emission values for a period of 2 months after the study. In another related work, Shang [27] proposed several methods for visualizing functional time series, implemented in an R add-on package, and used them to analyze fertility data and sea surface temperature data. Based on his findings, these techniques have clear advantages in identifying the characteristics of a functional time series and provide useful tools for visualizing trends. Recently, Suhaila and Yusop [28] used FDA techniques such as functional descriptive statistics and functional analysis of variance to describe spatial and temporal rainfall variations. The functional results provided detailed information about the differences and variabilities in rainfall profiles. The FDA approaches were thus able to extract additional insights contained in the curves and functions that are not available from classical statistical methods.

Given the advantages and facilities offered by the FDA methodology in recent years, it is clear that this methodology is an effective technique for handling diverse data and advanced analyses compared with traditional approaches. The FDA approach is therefore applied in the current work to rainfall data from the Taiz Region, to assist decision-makers in planning the management of rainwater conservation and to understand the temporal changes during the last two decades. The basic procedure and the associated statistical techniques are presented in the methods section within the theoretical context of FDA methodologies. The results and discussion section presents the findings of the case study on the rainfall measurements in Taiz City. Conclusions and a summary are given at the end of this research.

2 Methods and materials

This section presents the mathematical background of the FDA methodology applied in the current work. Equations (1)–(12) are employed to complete the research steps. The major step after obtaining the data is to convert the discrete data into functional data; the next step is to smooth the functional data; and the last two steps obtain the functional results of the rainfall data, as shown in Fig. 1.

Fig. 1 Summary of the most important steps used in this research

2.1 Constructing basis functions and Fourier system

Observations or measurements are usually recorded at discrete points in time, such as days, months or years. In the first step of functional data analysis, the observed discrete data are converted into functional data objects. Functional data objects are defined by specifying a set of basis functions and a number of coefficients that describe a linear combination of these basis functions. Suppose that the values of the discrete observations \(y_i(t_j)\) are fitted using the standard model [3, 20]:

$$y_i(t_j) = x_i(t_j) + \epsilon_{ij}, \quad i = 1, \ldots, n;\ j = 1, \ldots, T$$
(1)

where \(\epsilon_{ij}\) are measurement errors and the functions \(x_i(t)\) are linear combinations of basis functions \(\phi_k(t)\). This representation captures most of the variation contained in the functional data.

A basis function system is a set of known functions \(\phi_k\) that are mathematically independent of each other and have the property that any function can be approximated well by a weighted sum, or linear combination, of a sufficiently large number \(K\) of them.

A function is constructed from a set of functional building blocks \(\phi_k\), \(k = 1, \ldots, K\), called a basis function expansion, which are combined linearly. A function \(x(t)\) defined in this way is expressed mathematically as [3, 28]:

$$x(t) = \sum_{k=1}^{K} c_k \phi_k(t) = \mathbf{c}'\boldsymbol{\phi}(t)$$
(2)

where the parameters \(c_1, c_2, \ldots, c_K\) are the coefficients of the expansion. In the matrix expression \(\mathbf{c}'\boldsymbol{\phi}\), \(\mathbf{c}\) denotes the vector of the \(K\) coefficients \(c_k\) and \(\boldsymbol{\phi}\) denotes the functional vector of length \(K\) containing the basis functions \(\phi_k\).

The Fourier basis is the system normally used for periodic functions; it is computationally fast and flexible. The first Fourier basis function is the constant function; the remaining functions are pairs of sines and cosines whose periods are integer fractions of the base period. The number of basis functions is therefore always odd. The information required to define a Fourier basis system is the number of basis functions K and the period T. The Fourier series can be written in terms of sine and cosine functions as [3, 28]:

$$\hat{x}(t) = c_0 + c_1 \sin(\omega t) + c_2 \cos(\omega t) + c_3 \sin(2\omega t) + c_4 \cos(2\omega t) + \cdots$$
(3)

Equation (3) is defined by the basis \(\phi_0(t) = 1\), \(\phi_{2r-1}(t) = \sin(r\omega t)\) and \(\phi_{2r}(t) = \cos(r\omega t)\). This basis is periodic, and the parameter \(\omega\) determines the period \(2\pi/\omega\).
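
As an illustration, the following R sketch shows how such a Fourier basis system can be set up with the fda package used later in this paper; the evaluation grid and the coefficient vector are illustrative choices for demonstration, not values from the study.

```r
library(fda)

# Fourier basis on [0, 12] with a 12-month period; the fda package forces
# nbasis to be odd (constant + sine/cosine pairs), matching the text
monthbasis <- create.fourier.basis(rangeval = c(0, 12), nbasis = 13, period = 12)

tgrid  <- seq(0.5, 11.5, by = 1)         # hypothetical mid-month evaluation points
phimat <- eval.basis(tgrid, monthbasis)  # 12 x 13 matrix of basis values phi_k(t_j)

# Eq. (2): a curve is the linear combination x(t) = c' phi(t)
ck    <- rnorm(monthbasis$nbasis)        # illustrative coefficients c_k
xvals <- phimat %*% ck                   # x(t_j) evaluated on the grid
```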

2.2 Penalized smoothing

The roughness penalty is specified by constructing a functional parameter object consisting of a basis, a smoothing parameter and a penalized derivative. The following equation, called the penalized sum of squared errors (PENSSE), is used to estimate the coefficient vector so as to minimize the error of the fitted curves [3, 28].

$$\mathrm{PENSSE}_{\lambda}(x) = \sum_{j} \left[ y_j - x(t_j) \right]^2 + \lambda \int \left[ D^4 x(t) \right]^2 \, \mathrm{d}t$$
(4)

where \(x(t) = \mathbf{c}'\boldsymbol{\phi}(t)\). In the second term, the integrated squared fourth derivative is called the roughness penalty; it measures the overall curvature of the second derivative of \(x\). The parameter λ is a smoothing parameter that controls the trade-off between fit to the data and smoothness: as λ increases, roughness is penalized more heavily and \(x(t)\) becomes smoother, while as λ decreases, the penalty is relaxed and \(x(t)\) fits the data more closely. The corresponding expression for the vector of fitted values is [20, 28]:

$$S_{\phi}\, y = \Phi \left( \Phi' W \Phi + \lambda R \right)^{-1} \Phi' W y$$
(5)

where the \(n \times K\) matrix \(\Phi\) contains the values of the \(K\) basis functions at the \(n\) sampling points, \(W\) is a weight matrix that accounts for possible covariance structure among the residuals, and \(y\) is the vector of discrete data to be smoothed. \(S_{\phi}\) is called the projection operator corresponding to the basis system \(\phi\), and \(R\) is known as the penalty matrix.

The generalized cross-validation (GCV) measure is designed to choose the best value of the smoothing parameter λ. The criterion is given by [20, 28]:

$$\mathrm{GCV}_{\lambda} = \frac{n^{-1} \sum_{i} \left[ y_i - x(t_i) \right]^2}{\left[ n^{-1} \, \mathrm{trace}\left( I - S_{\phi,\lambda} \right) \right]^2}$$
(6)

The degrees of freedom of the smooth fit are controlled by \(\lambda\) and measured by \(df(\lambda) = \mathrm{trace}(S_{\phi,\lambda})\), where the trace is the sum of the diagonal elements of \(S_{\phi,\lambda}\). The best choice of λ is the one that minimizes the GCV value.
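
A minimal R sketch of this smoothing-parameter search, using simulated data rather than the paper's rainfall records, might look as follows; it applies the fourth-derivative penalty of Eq. (4) and selects λ by minimizing the GCV criterion of Eq. (6).

```r
library(fda)

# Hypothetical noisy observations y_j at mid-month points t_j
tj <- seq(0.5, 11.5, by = 1)
yj <- 50 + 30 * sin(2 * pi * tj / 12) + rnorm(length(tj), sd = 5)

basis  <- create.fourier.basis(c(0, 12), nbasis = 13, period = 12)
loglam <- seq(-4, 4, by = 0.5)           # candidate values of log10(lambda)
gcvs   <- numeric(length(loglam))

for (k in seq_along(loglam)) {
  # Eq. (4): penalize the integrated squared fourth derivative with weight lambda
  fdparobj <- fdPar(basis, Lfdobj = int2Lfd(4), lambda = 10^loglam[k])
  gcvs[k]  <- sum(smooth.basis(tj, yj, fdparobj)$gcv)  # Eq. (6)
}

best <- 10^loglam[which.min(gcvs)]       # lambda minimizing GCV
fit  <- smooth.basis(tj, yj, fdPar(basis, int2Lfd(4), best))
fit$df                                   # df(lambda) = trace(S)
```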

2.3 Functional statistics and location parameters

2.3.1 Functions of mean and standard deviation

The traditional summary statistics for multivariate data carry over directly to functional data. The mean function of a set of curves is given by [3, 20, 28]:

$$\mu(t) = \frac{1}{N} \sum_{i=1}^{N} x_i(t)$$
(7)

Equation (8) defines the variance function \(\mathrm{VAR}_X\), while Eq. (9) defines the standard deviation function (SD) [3, 20, 28]:

$$\mathrm{VAR}_X(t) = \frac{1}{N} \sum_{i=1}^{N} \left[ x_i(t) - \bar{x}(t) \right]^2$$
(8)
$$\mathrm{SD}_X(t) = \sqrt{\mathrm{VAR}_X(t)}$$
(9)
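
In R, these location and scale functions are available directly in the fda package; the sketch below assumes a hypothetical fd object `rainfd` holding the N smoothed annual curves.

```r
library(fda)

# rainfd: assumed fd object containing the N smoothed curves x_i(t)
meanfd <- mean.fd(rainfd)  # Eq. (7): pointwise mean function mu(t)
sdfd   <- sd.fd(rainfd)    # Eq. (9): pointwise standard deviation function

plot(rainfd, col = "grey", xlab = "Month", ylab = "Rainfall")
lines(meanfd, lwd = 2)                      # overlay the mean curve
lines(sdfd, lwd = 2, lty = 2, col = "red")  # overlay the SD curve
```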

2.3.2 Functions of covariance and correlation

The functional covariance summarizes the dependence of measurements across different argument values and is estimated for all \(s\) and \(t\) by [3, 20, 28]:

$$\sigma(s,t) = \frac{1}{N} \sum_{i=1}^{N} \left( x_i(s) - \bar{x}(s) \right) \left( x_i(t) - \bar{x}(t) \right)$$
(10)

The associated correlation function is

$$\rho(s,t) = \frac{\sum_{i=1}^{N} \left( x_i(s) - \bar{x}(s) \right) \left( x_i(t) - \bar{x}(t) \right)}{\sqrt{\sum_{i=1}^{N} \left( x_i(s) - \bar{x}(s) \right)^2} \; \sqrt{\sum_{i=1}^{N} \left( x_i(t) - \bar{x}(t) \right)^2}}$$
(11)

These are the functional analogs of the variance–covariance and correlation matrices of multivariate data analysis. The variability of the functional data is visualized by plotting the covariance and correlation surfaces as functions of s and t, together with the corresponding displays such as heat maps, fields plots and contour maps.
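
A hedged sketch of these computations in R, again assuming the hypothetical `rainfd` object from above, uses var.fd and cor.fd from the fda package to evaluate Eqs. (10) and (11) on a grid.

```r
library(fda)

grid <- seq(0, 12, length.out = 53)        # evaluation grid for s and t

covbifd <- var.fd(rainfd)                  # Eq. (10) as a bivariate fd object
covmat  <- eval.bifd(grid, grid, covbifd)  # sigma(s, t) on the grid
cormat  <- cor.fd(grid, rainfd)            # Eq. (11) evaluated directly

contour(grid, grid, covmat, xlab = "s (month)", ylab = "t (month)")
```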

2.4 The singular value decomposition technique

Singular value decomposition (SVD) is a useful method for visualizing patterns in functional and multivariate data. The visualization approach used here was proposed by Zhang et al. [29], who employed the idea of projection pursuit by revealing low-dimensional projections that expose interesting features of a high-dimensional point cloud. The SVD technique decomposes high-dimensional smoothed multivariate data into singular rows, singular columns and singular values, ordered according to the amount of explained variance. The SVD of \(f(x_i)\) is defined as [27, 29]:

$$f(x_i) = s_1 u_1 v_1^{T} + s_2 u_2 v_2^{T} + \cdots + s_K u_K v_K^{T}$$
(12)

where the singular columns \(u_1, \ldots, u_K\) form \(K\) orthonormal basis functions for the column space spanned by \(\{c_j;\ j = 1, \ldots, n\}\), with \(n\) the number of curves, and the singular rows \(v_1, \ldots, v_K\) form \(K\) orthonormal basis functions for the row space spanned by \(\{r_i;\ i = 1, \ldots, p\}\), with \(p\) the number of covariates. The vectors \(u_k\) are called singular columns, the vectors \(v_k\) are called singular rows, and the scalars \(s_1, \ldots, s_K\) are called singular values. The matrix \(s_k u_k v_k^{T}\), \(k = 1, \ldots, K\), is the \(k\)th SVD component.
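
Equation (12) corresponds to the ordinary matrix SVD applied to the smoothed data matrix. A minimal sketch in base R, using a placeholder months-by-years matrix rather than the paper's data, is:

```r
# Placeholder 12 x 21 matrix (rows = months, columns = years)
rainmat <- matrix(rnorm(12 * 21, mean = 49, sd = 20), nrow = 12, ncol = 21)

dec <- svd(rainmat)   # dec$u: singular columns, dec$v: singular rows,
                      # dec$d: singular values s_1 >= s_2 >= ...

# Eq. (12): rank-K reconstruction from the first K SVD components
K <- 3
approxK <- dec$u[, 1:K] %*% diag(dec$d[1:K]) %*% t(dec$v[, 1:K])

# Proportion of variance explained by each component
round(dec$d^2 / sum(dec$d^2), 3)
```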

2.5 A case study

Study area Taiz City is one of the largest cities in Yemen, located in the country's southwest as shown in Fig. 2 [30]. Its average height above sea level is 1311 m; the elevation of the city is about 1200 m, at latitude 13° 42′ N and longitude 44° 55′ E. The mean annual rainfall and temperature are 588 mm and 24 °C, respectively, while the mean monthly evaporation is 140 mm [31].

Fig. 2 Map of the Taiz governorate [30]

Data source Average monthly precipitation over Taiz City, Yemen (13.5901° N, 44.2639° E), for the period January 1998 to December 2018 was obtained from Tropical Rainfall Measuring Mission (TRMM) satellite data. The dataset contains N = 21 discrete measurements \(y_i(t_j)\), \(t_j \in [1, 12]\), \(i = 1, \ldots, 21\); the ith discrete observation \(y_i(t_j)\), \(j = 1, \ldots, 12\), denotes the rainfall records for the ith year. The discrete satellite data of average monthly precipitation \(y_i(t_j)\) are converted into smoothed curves \(y_i(t)\), continuous temporal functions with a base period of 12 months. This smoothing is obtained with the predefined basis system.
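
For reproducibility, the conversion described here can be sketched in R as follows; the vector `trmm` stands in for the satellite series (a placeholder, since the actual TRMM records are not reproduced here).

```r
library(fda)

# Placeholder for the 252 average monthly values, January 1998-December 2018
trmm    <- rnorm(252, mean = 49, sd = 25)
rainmat <- matrix(trmm, nrow = 12, ncol = 21)   # rows: months, columns: years
colnames(rainmat) <- 1998:2018

monthbasis <- create.fourier.basis(c(0, 12), nbasis = 13, period = 12)
rainfd <- Data2fd(argvals = seq(0.5, 11.5, by = 1), y = rainmat,
                  basisobj = monthbasis)        # discrete records -> smooth curves
```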

Used software In the current study, the R software with the "fda", "rainbow", "fields" and "plot3D" packages was used to process the functional data [20, 27, 32, 33].

3 Results and discussion

This section is divided into four subsections. The construction of the functional data is presented in the first subsection, and penalized smoothing is demonstrated in the second. Functional statistics with visualization are presented in the third subsection, and the fourth displays the results of the singular value decomposition visualization.

3.1 Constructing functional data

The first step is to transform the discrete rainfall records into continuous functions or curves. The rainfall data exhibit seasonal fluctuations and periodicity throughout the annual cycle, so Fourier basis functions are preferred. The choice of K = 12 can be justified by the need to capture the precipitation variation within each month. In R (the fda package), the discrete precipitation data were converted into functional data objects using Fourier bases. All of the functional data observations of average monthly precipitation are shown in a single plot in Fig. 3, which suggests that the data are periodic. This periodicity justifies the use of Fourier basis functions, mainly to capture the peaks and cycles. Some of the fitted curves are clearly too rough, especially for years with a much higher level of noise. Hence, as described in the next section, penalized smoothing with the generalized cross-validation (GCV) criterion is used to improve the functional results.

Fig. 3 Representation of all the functional data observations for average monthly rainfall using Fourier basis functions

3.2 Penalized smoothing for functional precipitation data

In the previous section, functional data objects were constructed using the Fourier basis expansion. The goal of penalized smoothing is to remove the contribution of errors and noise and obtain the best estimate of the curves \(x(t)\). The generalized cross-validation (GCV) criterion is applied to choose the level of smoothing, using a harmonic acceleration operator together with a saturated basis capable of interpolating all the functional precipitation data. The linear differential operator (LDO) is generated with \(\omega^2 = (2\pi/12)^2\), and the smoothing parameter is selected by minimizing the GCV estimate. It is important to choose a smoothing parameter that improves the quality of smoothing, reflects the structure of the data well and captures a high percentage of the explained variance. The best smoothing parameter is 0.05, which minimizes the GCV, with a corresponding degrees of freedom of 4. The relationships of lambda with the GCV and with the degrees of freedom are shown in Fig. 4. The smoothed precipitation curves for all years are displayed together in Fig. 5. The penalized GCV smoothing indicates that the resulting curves reflect the structure of the functional data well; all subsequent functional results are based on them.
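
A sketch of this step in R, reusing the hypothetical `rainmat` and `monthbasis` objects from Sect. 2.5, defines the harmonic acceleration operator and evaluates the GCV over a grid of λ values; the grid bounds are illustrative assumptions.

```r
library(fda)

# Harmonic acceleration operator L = omega^2 * D + D^3 with omega = 2*pi/12,
# a natural choice for curves with a 12-month period
harmLfd <- vec2Lfd(c(0, (2 * pi / 12)^2, 0), rangeval = c(0, 12))

tj     <- seq(0.5, 11.5, by = 1)
loglam <- seq(-3, 2, by = 0.25)                  # illustrative search grid
gcvs   <- sapply(loglam, function(ll) {
  sum(smooth.basis(tj, rainmat, fdPar(monthbasis, harmLfd, 10^ll))$gcv)
})

# The paper reports lambda = 0.05 (about 4 degrees of freedom) as the minimizer
fit <- smooth.basis(tj, rainmat, fdPar(monthbasis, harmLfd, 0.05))
c(df = fit$df, gcv = sum(fit$gcv))
```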

Fig. 4 Top panel: relationship between the GCV criterion and the smoothing level lambda. Bottom panel: relationship between the degrees of freedom and the smoothing level lambda

Fig. 5 Plot of the smoothed functional data for average monthly rainfall using penalized smoothing with GCV; the smoothing parameter is 0.05

3.3 Functional statistics and visualization

3.3.1 Mean and standard deviation functions

The mean and standard deviation of the functional data for average monthly precipitation are presented in Fig. 6a; the solid curve is the mean function of average monthly precipitation derived from the penalized smoothing, and the standard deviation curve is plotted as a red dashed line. The confidence interval curves for the mean curve are presented in Fig. 6b; the blue dashed lines represent 95% pointwise confidence intervals on the mean curve, based on the penalized smoothing estimates plotted in Fig. 5 and the standard deviation function plotted in Fig. 6a. The mean function provides quick insight into the centering of the functional data and identifies the central curve of the precipitation data. Figure 6a shows that the maximum rainfall occurs in the summer season, especially in August and September; a second peak occurs in the spring season, especially in May and June, followed by a recession at the beginning and end of the year. The standard deviation indicates the typical variability of the curves at any time point t, but it gives no information on how the values of the curves at point t relate to those at point s; more in-depth analyses follow in the next subsections.
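
The paper does not spell out the exact formula behind Fig. 6b; a common construction of 95% pointwise limits for the mean curve, sketched below under that assumption, adds and subtracts 1.96 standard errors (SD divided by √N) at each time point.

```r
library(fda)

grid  <- seq(0, 12, length.out = 101)
muhat <- eval.fd(grid, mean.fd(rainfd))   # mean curve values
sdhat <- eval.fd(grid, sd.fd(rainfd))     # SD curve values
N     <- 21

upper <- muhat + 1.96 * sdhat / sqrt(N)   # assumed 95% pointwise upper limit
lower <- muhat - 1.96 * sdhat / sqrt(N)   # assumed 95% pointwise lower limit

plot(grid, muhat, type = "l", lwd = 2, xlab = "Month", ylab = "Rainfall (mm)")
lines(grid, upper, lty = 2, col = "blue")
lines(grid, lower, lty = 2, col = "blue")
```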

Fig. 6 a Plot of the mean and standard deviation curves, b plot of the confidence interval curves of the mean function

3.3.2 Functional covariance and correlation with visualization

The bivariate temporal variance–covariance surface is shown as a contour map with a fields plot in Fig. 7 and as a perspective plot with a heat map in Fig. 8. These displays provide detailed information about the covariance structure across pairs of time points. We observe that the main part of the variability occurs in two periods of the year and is insignificant elsewhere: the highest variability occurs in August and September and also in May and June, periods that correspond approximately to the highest rainfall. In the fields plot accompanying the contour map, black indicates a very high covariance across years and red a slightly high covariance. In the heat map accompanying the perspective plot, red indicates a strongly high covariance across years and blue a low covariance. The bivariate temporal correlation surface, visualized as a contour map with a fields plot in Fig. 9, shows strong correlations in average monthly precipitation across all years.
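
The four displays in Figs. 7 and 8 can be reproduced in outline with base R graphics and the fields package; the color palettes and viewing angles below are illustrative, not the paper's exact settings.

```r
library(fda)
library(fields)   # image.plot: image with a color legend ("fields plot")

grid   <- seq(0, 12, length.out = 53)
covmat <- eval.bifd(grid, grid, var.fd(rainfd))

par(mfrow = c(2, 2))
contour(grid, grid, covmat, xlab = "s", ylab = "t")       # contour map (Fig. 7)
image.plot(grid, grid, covmat, xlab = "s", ylab = "t")    # fields plot (Fig. 7)
persp(grid, grid, covmat, theta = 35, phi = 25,
      xlab = "s", ylab = "t", zlab = "cov")               # perspective (Fig. 8)
image(grid, grid, covmat, col = heat.colors(64),
      xlab = "s", ylab = "t")                             # heat map (Fig. 8)
```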

Fig. 7 Contour map (left) and fields plot (right) for the covariance surface estimated from the rainfall data across all years

Fig. 8 Perspective plot (left) and heat map (right) for the covariance surface estimated from the rainfall data across all years

Fig. 9 Contour map (left) and fields plot (right) for the correlation surface estimated from the rainfall data across all years

3.4 Singular value decomposition visualization

The main feature of singular value decomposition is that it displays the row and column information of a two-way matrix simultaneously. Here it is intended to detect local variations and interactions in the two-way matrix whose rows are months and whose columns are years, and to connect the corresponding curves to the matrix.

Figure 10 displays the singular value decomposition plot of average monthly precipitation over the years 1998–2018. The SVD1 component clearly captures the seasonal pattern, whereas the SVD2 and SVD3 components display the contrasts in average monthly precipitation among different seasons and months. The residuals panel of Fig. 10 shows that the residuals are centered on zero as the number of SVD components increases. Since the SVD visualization indicates that the rainfall patterns oscillate, one may wish to investigate which of the intersections between months and years differ from each other, or which of them explains a greater share of the variation in rainfall amounts. Hence, the singular value decomposition image was used.
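
Plots of this kind can be generated with the rainbow package cited by the paper; the sketch below assumes its fds and SVDplot functions with the months-by-years matrix from earlier, and the argument names follow the package documentation as we understand it.

```r
library(rainbow)

# Wrap the months-by-years matrix as a functional time series object
rainfts <- fds(x = 1:12, y = rainmat, xname = "Month", yname = "Rainfall (mm)")

SVDplot(rainfts, order = 3, plot.type = "fts")    # component curves (cf. Fig. 10)
SVDplot(rainfts, order = 3, plot.type = "image")  # SVD image (cf. Fig. 11)
```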

Fig. 10 Singular value decomposition plot for the average monthly rainfall of Taiz City over the years 1998–2018

Figure 11 displays the singular value decomposition image for the average monthly precipitation of Taiz City between months and over the years 1998–2018. This plot captures the variability between months and years through the gradation of color, with black representing high variability; the local variations and interactions between years and months are clearly highlighted. The rainfall data are thus clearly visualized by the singular value decomposition technique, and the variations are displayed according to rainfall amounts. The current findings are consistent with the study by Shang [27], who applied the singular value decomposition technique to a sea surface temperature dataset; these results therefore provide a basis for researchers to conduct future studies.

Fig. 11 Singular value decomposition image for the average monthly rainfall between months and over the years 1998–2018

In comparison with previous, similar studies, this study adds visualization tools such as fields plots, heat maps and images, as summarized in Table 1. These visualization tools made it easier to understand the features of the rainfall shapes and to determine the temporal variation. Moreover, they revealed the patterns associated with average monthly rainfall in the Taiz Region over the past two decades.

Table 1 Comparison of the methods used in the current study with the previous studies

4 Conclusions

This research analyzes and visualizes the average monthly rainfall of Taiz City over the last two decades. FDA approaches, with an emphasis on smoothing and visualization, were adapted and applied to the rainfall measurements as an important step in a full FDA. The methodology included penalized smoothing with the generalized cross-validation criterion; functional statistics such as location parameters, variance–covariance surfaces and correlation functions with visualization; and the singular value decomposition technique. Based on the results, the following main conclusions can be drawn from this work:

  1. The entire set of rainfall observations was treated by FDA techniques as functional data represented by curves, which describe the actual phenomena better and support many graphical displays of the rainfall data.

  2. The penalized GCV smoothing made it easy to choose the best smoothing parameter for determining the average monthly rainfall functional data; as a result, the noise was reduced and errors were eliminated.

  3. More information about the rainfall and the distribution of precipitation rates in the Taiz area was provided by the continuous temporal trends in the location parameters and the scale covariates.

  4. The singular value decomposition technique provided a deeper understanding of the variations in the precipitation data over the Taiz Region during specific periods and clearly showed the intersections between years and months.

  5. The functional findings showed that rainfall rates differ significantly. In general, the variability was very high in summer, high in spring, moderate in autumn and low or absent in winter; May and September recorded the highest rainfall rates.

  6. Finally, based on the findings, future research is recommended on other functional concepts such as functional principal component analysis, functional clustering, functional classification and functional regression, and on using these techniques for modeling and forecasting precipitation together with other associated variables.