Abstract
Analyzing Multivariate Functional Data (MFD) presents growing challenges in climate change modeling owing to issues such as coarse resolution, model complexity, and big data processing. In this regard, we introduce a Multivariate Functional Model-Based Clustering (MFMBC) method to analyze Multivariate Functional Rainfall and Temperature (MFRT) data. The data span four decades (January 1980 to April 2022) and cover 37 locations in Yemen. The main objective is to identify the underlying spatial–temporal dynamic structure of the MFRT data and to model the association between rainfall and temperature. The proposed MFMBC method consists of three key phases: projecting MFRT data variation through Multivariate Functional Principal Component Analysis (MFPCA), identifying the optimal number of clusters with the Bayesian Information Criterion (BIC), and optimizing model parameters using the Expectation–Maximization (EM) algorithm. According to the findings, three optimal clusters of MFRT data profiles were identified and labeled as severe, moderate, and high temperatures, which correspond to heavy, moderate, and light rainfall patterns. Cluster 1 had a negative nexus, characterized by slight changes and low peaks in rainfall together with large changes and high peaks in temperature. Cluster 2 exhibited a natural nexus with a mild pattern in both rainfall and temperature. Cluster 3 had a positive nexus, displaying significant variations with large peaks in both rainfall and temperature. Overall, these results help in assessing the complex interaction between rainfall and temperature over the spatial–temporal domain and offer valuable insights for policy-makers addressing climate-related challenges.
Data availability
All relevant data are accessible from the corresponding author upon request.
Change history
14 May 2024
A Correction to this paper has been published: https://doi.org/10.1007/s00477-024-02733-z
Funding
Haiqiang Ma's research is supported by National Natural Science Foundation (NNSF) of China (No. 12161042).
Author information
Contributions
MH: Conceptualization, Methodology, Formal analysis, Investigation, Data curation, Software, Writing–original draft. HM: Supervision, Conceptualization, Resources, Investigation, Methodology. AA: Data curation, Validation, Writing–review & editing. HA: Visualization, Writing–review & editing. AT: Software, Writing–review & editing. FA: Writing–review & editing.
Ethics declarations
Conflict of interest
The authors declare there is no conflict.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised to correct the names of the corresponding author, Mohanned Abduljabbar Hael, and of co-author Hamas A. Al-kuhali.
Appendices
Appendix 1 The spatial distribution of the selected sites in Yemen
Appendix 2 The cities’ names and their geographical characteristics
No | Station name | Latitude (°N) | Longitude (°E) | Elevation (m) |
---|---|---|---|---|
1 | Sanaa | 15.35 | 44.2 | 2257 |
2 | Manaha | 15.0742 | 43.7416 | 2233 |
3 | Taiz | 13.5789 | 44.0219 | 1369 |
4 | Mocha | 13.3167 | 43.250 | 2 |
5 | Dhubab | 12.9431 | 43.4102 | 2 |
6 | Turbah | 13.2127 | 44.1241 | 1772 |
7 | Ḩudaydah | 14.8022 | 42.9511 | 14 |
8 | Zabid | 14.200 | 43.3167 | 108 |
9 | Mukalla | 14.5333 | 49.1333 | 21 |
10 | Tarīm | 16.050 | 49.000 | 610 |
11 | Sayun | 15.943 | 48.7873 | 652 |
12 | Addis | 14.8833 | 49.8667 | 108 |
13 | Aden | 12.800 | 45.0333 | 38 |
14 | Ibb | 13.9667 | 44.1667 | 1893 |
15 | Yarim | 14.2978 | 44.3778 | 2638 |
16 | Dhamar | 14.550 | 44.4017 | 2427 |
17 | Zinjibar | 13.1283 | 45.3803 | 13 |
18 | Aḩwar | 13.5202 | 46.7137 | 34 |
19 | Şadah | 16.9358 | 43.7644 | 1875 |
20 | Ḩajjah | 15.695 | 43.5975 | 1733 |
21 | Midi | 16.321 | 42.813 | 8 |
22 | Rada | 14.4295 | 44.8341 | 2120 |
23 | Bayḑa’ | 13.979 | 45.574 | 2018 |
24 | Ataq | 14.550 | 46.800 | 1158 |
25 | Rawḑah | 14.480 | 47.270 | 793 |
26 | Bayḩan | 14.8007 | 45.7189 | 1129 |
27 | Laḩij | 13.050 | 44.8833 | 132 |
28 | Ghayz̧ah | 16.2394 | 52.1638 | 14 |
29 | Sayḩut | 15.2105 | 51.2454 | 4 |
30 | Marib | 15.4228 | 45.3375 | 1089 |
31 | Khamir | 15.990 | 43.950 | 2449 |
32 | Amrān | 15.6594 | 43.9439 | 2264 |
33 | Ḩadibu | 12.6519 | 54.0239 | 7 |
34 | Maḩwīt | 15.4694 | 43.5453 | 2026 |
35 | Ḩazm | 16.1641 | 44.7769 | 1114 |
36 | Jabīn | 14.704 | 43.599 | 2034 |
37 | Ḑali | 13.6957 | 44.7314 | 1519 |
Appendix 3 More technical details about Sect. 3 (Theoretical framework)
In the multivariate functional framework, the coefficient matrix \(\mathcal{C}\) and the matrix of basis functions \(\boldsymbol{\psi}(t)\) have the following block structure, where variable \(j = 1, \ldots, p\) is expanded on \(R_{j}\) basis functions:
$$\begin{gathered}
\mathcal{C} = \begin{pmatrix}
c_{11}^{1} \cdots c_{1R_{1}}^{1} & c_{11}^{2} \cdots c_{1R_{2}}^{2} & \cdots & c_{11}^{p} \cdots c_{1R_{p}}^{p} \\
c_{21}^{1} \cdots c_{2R_{1}}^{1} & c_{21}^{2} \cdots c_{2R_{2}}^{2} & \cdots & c_{21}^{p} \cdots c_{2R_{p}}^{p} \\
\vdots & \vdots & & \vdots \\
c_{N1}^{1} \cdots c_{NR_{1}}^{1} & c_{N1}^{2} \cdots c_{NR_{2}}^{2} & \cdots & c_{N1}^{p} \cdots c_{NR_{p}}^{p}
\end{pmatrix} \\
\boldsymbol{\psi}(t) = \begin{pmatrix}
\psi_{1}^{1}(t) \cdots \psi_{R_{1}}^{1}(t) & 0 \cdots 0 & \cdots & 0 \cdots 0 \\
0 \cdots 0 & \psi_{1}^{2}(t) \cdots \psi_{R_{2}}^{2}(t) & \cdots & 0 \cdots 0 \\
\vdots & \vdots & & \vdots \\
0 \cdots 0 & 0 \cdots 0 & \cdots & \psi_{1}^{p}(t) \cdots \psi_{R_{p}}^{p}(t)
\end{pmatrix}
\end{gathered} \tag{11}$$
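The block structure of Eq. (11) can be illustrated with a small sketch. It assumes the usual basis expansion in which the p component curves of observation i at time t are obtained by multiplying \(\boldsymbol{\psi}(t)\) by the i-th row of \(\mathcal{C}\). The function names `basis_matrix` and `evaluate`, the monomial bases, and the coefficient values are hypothetical and chosen only to keep the example short; they are not the bases or estimates used in the study.

```python
def basis_matrix(t, bases):
    """Return psi(t) as a p x R nested list with the block layout of Eq. (11).

    `bases` holds one list of basis functions per variable; row j is zero
    everywhere except in the columns that belong to variable j.
    """
    total = sum(len(funcs) for funcs in bases)  # R = R_1 + ... + R_p
    rows, offset = [], 0
    for funcs in bases:
        row = [0.0] * total
        for r, func in enumerate(funcs):
            row[offset + r] = func(t)
        rows.append(row)
        offset += len(funcs)
    return rows


def evaluate(coefs, t, bases):
    """Evaluate the p component curves at time t from one row of C."""
    return [sum(c * v for c, v in zip(coefs, row)) for row in basis_matrix(t, bases)]


# Hypothetical bases: variable 1 uses (1, t), variable 2 uses (1, t, t^2).
bases = [
    [lambda t: 1.0, lambda t: t],
    [lambda t: 1.0, lambda t: t, lambda t: t * t],
]
c_i = [1.0, 2.0, 0.5, 0.0, 3.0]  # one row of C, length R = 2 + 3
values = evaluate(c_i, 2.0, bases)
print(values)
```

Because of the zero blocks, each variable only ever uses its own segment of the coefficient row, which is what allows a different number of basis functions per variable.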
Consequently, using the covariance estimator \(\hat{\boldsymbol{v}}(s,t)\) in Eq. (5) and the principal component \(\boldsymbol{\xi}_{l}\) in Eq. (6), the eigen-problem in Eq. (4) can be reformulated as:
$$\begin{aligned}
\int_{1}^{508} \hat{\boldsymbol{v}}(s,t)\, \boldsymbol{\xi}_{l}(t)\, dt &= \eta_{l}\, \boldsymbol{\xi}_{l}(s) \\
\int_{1}^{508} \frac{1}{N-1}\, \boldsymbol{\psi}(s)\, \mathcal{C}^{\prime} \mathcal{C}\, \boldsymbol{\psi}^{\prime}(t)\, \boldsymbol{\xi}_{l}(t)\, dt &= \eta_{l}\, \boldsymbol{\psi}(s)\, \boldsymbol{b}_{l}^{\prime} \\
\Leftrightarrow\; \frac{1}{N-1}\, \boldsymbol{\psi}(s)\, \mathcal{C}^{\prime} \mathcal{C} \underbrace{\int_{1}^{508} \boldsymbol{\psi}^{\prime}(t)\, \boldsymbol{\psi}(t)\, dt}_{\boldsymbol{W}}\, \boldsymbol{b}_{l}^{\prime} &= \eta_{l}\, \boldsymbol{\psi}(s)\, \boldsymbol{b}_{l}^{\prime} \\
\Leftrightarrow\; \frac{1}{N-1}\, \boldsymbol{\psi}(s)\, \mathcal{C}^{\prime} \mathcal{C}\, \boldsymbol{W}\, \boldsymbol{b}_{l}^{\prime} &= \eta_{l}\, \boldsymbol{\psi}(s)\, \boldsymbol{b}_{l}^{\prime},
\end{aligned} \tag{12}$$
where \(\boldsymbol{W} = \int_{1}^{508} \boldsymbol{\psi}^{\prime}(t)\, \boldsymbol{\psi}(t)\, dt\) is an \(R \times R\) matrix (with \(R = \sum_{j=1}^{p} R_{j}\)) containing the inner products of the pre-defined basis functions.
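One further step may help to see how Eq. (12) is solved in practice. Because Eq. (12) must hold for every \(s\), and assuming the basis functions are linearly independent, the factor \(\boldsymbol{\psi}(s)\) can be dropped, which leaves a finite-dimensional eigenproblem for the coefficient vector. The symmetrization shown here is a standard device in basis-expansion functional PCA rather than something stated in this appendix. It assumes that \(\boldsymbol{W}\) is symmetric positive definite, and \(\boldsymbol{u}_{l}\) is an auxiliary vector introduced only for this illustration.

$$\begin{aligned}
\frac{1}{N-1}\, \mathcal{C}^{\prime} \mathcal{C}\, \boldsymbol{W}\, \boldsymbol{b}_{l}^{\prime} &= \eta_{l}\, \boldsymbol{b}_{l}^{\prime} \\
\text{with } \boldsymbol{u}_{l} = \boldsymbol{W}^{1/2} \boldsymbol{b}_{l}^{\prime}: \quad \frac{1}{N-1}\, \boldsymbol{W}^{1/2}\, \mathcal{C}^{\prime} \mathcal{C}\, \boldsymbol{W}^{1/2}\, \boldsymbol{u}_{l} &= \eta_{l}\, \boldsymbol{u}_{l}, \qquad \boldsymbol{b}_{l}^{\prime} = \boldsymbol{W}^{-1/2} \boldsymbol{u}_{l}
\end{aligned}$$

Under these assumptions, the matrix on the left of the second line is symmetric, so its eigenvalues \(\eta_{l}\) are real and non-negative and any standard symmetric eigen-solver can be used; the coefficients \(\boldsymbol{b}_{l}^{\prime}\) are then recovered by the back-transformation.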
The MFPCA scores follow a Gaussian distribution \(\delta_{i}^{k} \sim \mathcal{N}\left( \mu_{k}, \boldsymbol{\Delta}_{k} \right)\) with mean \(\mu_{k} \in \mathbb{R}\) and covariance matrix \(\boldsymbol{\Delta}_{k}\), which is structured as:
$$\boldsymbol{\Delta}_{k} = \left( \begin{array}{cc}
\begin{matrix} a_{k1} & & 0 \\ & \ddots & \\ 0 & & a_{kd_{k}} \end{matrix} & \mathbf{0} \\
\mathbf{0} & \begin{matrix} b_{k} & & 0 \\ & \ddots & \\ 0 & & b_{k} \end{matrix}
\end{array} \right)
\begin{array}{l}
\left. \vphantom{\begin{matrix} a \\ \ddots \\ a \end{matrix}} \right\} d_{k} \\
\left. \vphantom{\begin{matrix} b \\ \ddots \\ b \end{matrix}} \right\} (R - d_{k})
\end{array} \tag{13}$$
The covariance structure \(\boldsymbol{\Delta}_{k}\) models the variances of the first \(d_{k}\) principal components individually through \(a_{k1}, \ldots, a_{kd_{k}}\), while the remaining \(R - d_{k}\) components are treated as noise and share the single variance parameter \(b_{k}\).
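The parsimony of this structure can be made concrete with a minimal sketch. It assumes the scores are already expressed in the cluster's principal axes, so that \(\boldsymbol{\Delta}_{k}\) is diagonal, and it treats the mean as a vector of length R. The function names `build_delta` and `log_density` and all numeric values are hypothetical and serve only as an illustration.

```python
import math


def build_delta(a, b, R):
    """Return Delta_k as an R x R nested list: diag(a_1, ..., a_d, b, ..., b)."""
    d = len(a)
    if d > R:
        raise ValueError("d_k cannot exceed R")
    diag = list(a) + [b] * (R - d)
    return [[diag[i] if i == j else 0.0 for j in range(R)] for i in range(R)]


def log_density(score, mu, a, b):
    """Gaussian log-density of one score vector under N(mu, Delta_k).

    Only d_k + 1 variance parameters are needed, because every noise
    direction shares the same variance b.
    """
    R = len(score)
    diag = list(a) + [b] * (R - len(a))
    quad = sum((x - m) ** 2 / v for x, m, v in zip(score, mu, diag))
    logdet = sum(math.log(v) for v in diag)
    return -0.5 * (R * math.log(2 * math.pi) + logdet + quad)


# Hypothetical cluster with d_k = 2 signal directions in R = 5 dimensions.
delta = build_delta(a=[4.0, 2.0], b=0.5, R=5)
for row in delta:
    print(row)
```

In an EM fit, a density of this form would be evaluated for each observation and cluster in the E-step; the sketch leaves out the mixing proportions and the estimation of the parameters.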
Appendix 4 Outlier assessment of principal component (PC) scores
To assess potential outliers in the rainfall and temperature datasets, we present supplementary results based on the Bivariate Bag-Plot (BBP) method. This method projects the scores of the first and second principal components and plots them in a two-dimensional graph, as depicted in the figure below. The BBP results illustrate the 99% probability coverage for the rainfall and temperature data. Within the BBP framework, the dark and light gray regions represent the inner and outer bag regions, respectively. The inner bag is the smallest depth region encompassing at least 50% of the observations, while the outer bag is the convex hull of the region obtained by inflating the inner bag by a factor α = 2.58, which covers 99% of the estimated probability values. Any points located outside the fence (outer bag) are considered outliers. No severe outliers were identified in our BBP findings.
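The inflation step can be sketched as follows. The example assumes that the inner bag has already been computed and is available as a convex polygon with counter-clockwise vertices around a known depth median; finding the bag itself requires a halfspace-depth computation that is not shown. The function names `inflate` and `inside_convex`, the square bag, and the score values are hypothetical.

```python
def inflate(bag, center, factor=2.58):
    """Scale the bag polygon away from the center by the given factor."""
    cx, cy = center
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in bag]


def inside_convex(point, polygon):
    """True if the point lies inside or on a convex polygon.

    The polygon vertices must be listed counter-clockwise, so the point
    must never fall to the right of any edge.
    """
    px, py = point
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross < 0:
            return False
    return True


# Hypothetical inner bag: a square around a depth median at the origin.
bag = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
fence = inflate(bag, (0.0, 0.0))

# Hypothetical (PC1, PC2) score pairs.
scores = [(0.5, 0.2), (2.0, -1.5), (3.0, 0.0)]
outliers = [p for p in scores if not inside_convex(p, fence)]
print(outliers)
```

In this toy configuration only the third point falls beyond the inflated region and would be flagged.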
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hael, M.A., Ma, H., Al-Sakkaf, A.S. et al. Dynamic clustering of spatial–temporal rainfall and temperature data over multi-sites in Yemen using multivariate functional approach. Stoch Environ Res Risk Assess (2024). https://doi.org/10.1007/s00477-024-02700-8
DOI: https://doi.org/10.1007/s00477-024-02700-8