Dynamic clustering of spatial–temporal rainfall and temperature data over multi-sites in Yemen using multivariate functional approach

  • ORIGINAL PAPER
  • Published:
Stochastic Environmental Research and Risk Assessment

A Correction to this article was published on 14 May 2024

This article has been updated

Abstract

Analyzing Multivariate Functional Data (MFD) presents growing challenges in climate change modeling because of issues such as coarse resolution, model complexity, and big-data processing. In this regard, we introduce a Multivariate Functional Model-Based Clustering (MFMBC) method to analyze Multivariate Functional Rainfall and Temperature (MFRT) data. The data span four decades (Jan. 1980–Apr. 2022) and cover 37 locations in Yemen. The main objective is to identify the underlying spatial–temporal dynamic structure of the MFRT data and to model the interrelationship between rainfall and temperature. The proposed MFMBC method consists of three key phases: projecting MFRT data variation through Multivariate Functional Principal Component Analysis (MFPCA), identifying the optimal number of clusters with the Bayesian Information Criterion (BIC), and optimizing model parameters using the Expectation–Maximization (EM) algorithm. According to the findings, three optimal clusters of MFRT data profiles were identified and labeled as severe, moderate, and high temperatures, which correspond to heavy, moderate, and light rainfall patterns. Cluster 1 had a negative nexus, characterized by slight changes and low-peak rainfall together with large changes and high-peak temperatures. Cluster 2 exhibited a natural nexus with a mild pattern in both rainfall and temperature. Cluster 3 had a positive nexus, displaying significant variations with large-volume peaks in both rainfall and temperature. Overall, these results help in assessing the complex interaction between rainfall and temperature over the spatial–temporal domain and offer valuable insights for policy-makers addressing climate-related challenges.
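The second and third phases (choosing the number of clusters by BIC and fitting the mixture by EM) can be illustrated with a minimal sketch. It is not the MFMBC implementation: it assumes the principal component scores are already available, replaces the paper's covariance model with a plain diagonal-covariance Gaussian mixture, and uses synthetic two-dimensional scores. The function names, the farthest-point initialization, the variance regularizer `reg`, and the lower-is-better BIC convention are all assumptions made for illustration.

```python
import numpy as np

def log_joint(X, weights, means, variances):
    """Log of (mixing weight x Gaussian density) for every point and component."""
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2            # shape (n, k, d)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)   # shape (k,)
    return np.log(weights) - 0.5 * log_norm - 0.5 * np.sum(diff2 / variances, axis=2)

def farthest_point_init(X, k):
    """Deterministic start: begin at X[0], then add the point farthest from
    the centres chosen so far (an illustrative choice, not from the paper)."""
    centres = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d2)])
    return np.array(centres)

def fit_diag_gmm(X, k, n_iter=100, reg=1e-2):
    """EM for a k-component diagonal-covariance Gaussian mixture.
    Returns the final log-likelihood and the hard cluster labels."""
    n, d = X.shape
    means = farthest_point_init(X, k)
    variances = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior membership probabilities
        lj = log_joint(X, weights, means, variances)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
        # M-step: update weights, means, and per-dimension variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = (resp.T @ X ** 2) / nk[:, None] - means ** 2 + reg
    lj = log_joint(X, weights, means, variances)
    return np.logaddexp.reduce(lj, axis=1).sum(), lj.argmax(axis=1)

def bic(loglik, k, n, d):
    """BIC in the lower-is-better convention: -2 logL + (#parameters) log n."""
    n_params = (k - 1) + 2 * k * d   # weights + means + diagonal variances
    return -2.0 * loglik + n_params * np.log(n)

# Synthetic stand-in for PC scores: three well-separated groups of 40 sites.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
scores = np.vstack([c + rng.normal(size=(40, 2)) for c in centres])

bic_by_k = {}
for k in range(1, 6):
    loglik, _ = fit_diag_gmm(scores, k)
    bic_by_k[k] = bic(loglik, k, *scores.shape)
best_k = min(bic_by_k, key=bic_by_k.get)
print(bic_by_k, best_k)
```

Some model-based clustering software reports BIC with the opposite sign, in which case the largest value is preferred; only the convention changes, not the selected model.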

Data availability

All relevant data are accessible from the corresponding author upon request.

Funding

Haiqiang Ma's research is supported by National Natural Science Foundation (NNSF) of China (No. 12161042).

Author information

Contributions

MH: Conceptualization, Methodology, Formal analysis, Investigation, Data curation, Software, Writing–original draft. HM: Supervision, Conceptualization, Resources, Investigation, Methodology. AA: Data curation, Validation, Writing–review & editing. HA: Visualization, Writing–review & editing. AT: Software, Writing–review & editing. FA: Writing–review & editing.

Corresponding author

Correspondence to Mohanned Abduljabbar Hael.

Ethics declarations

Conflict of interest

The authors declare there is no conflict.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised to correct the names of the corresponding author, Mohanned Abduljabbar Hael, and of the co-author Hamas A. Al-kuhali.

Appendices

Appendix 1 The spatial distribution of the selected sites in Yemen

figure a

Appendix 2 The cities’ names and their geographical characteristics

| No. | Station name | Latitude (°N) | Longitude (°E) | Elevation (m) |
|---|---|---|---|---|
| 1 | Sanaa | 15.35 | 44.2 | 2257 |
| 2 | Manaha | 15.0742 | 43.7416 | 2233 |
| 3 | Taiz | 13.5789 | 44.0219 | 1369 |
| 4 | Mocha | 13.3167 | 43.250 | 2 |
| 5 | Dhubab | 12.9431 | 43.4102 | 2 |
| 6 | Turbah | 13.2127 | 44.1241 | 1772 |
| 7 | Ḩudaydah | 14.8022 | 42.9511 | 14 |
| 8 | Zabid | 14.200 | 43.3167 | 108 |
| 9 | Mukalla | 14.5333 | 49.1333 | 21 |
| 10 | Tarīm | 16.050 | 49.000 | 610 |
| 11 | Sayun | 15.943 | 48.7873 | 652 |
| 12 | Addis | 14.8833 | 49.8667 | 108 |
| 13 | Aden | 12.800 | 45.0333 | 38 |
| 14 | Ibb | 13.9667 | 44.1667 | 1893 |
| 15 | Yarim | 14.2978 | 44.3778 | 2638 |
| 16 | Dhamar | 14.550 | 44.4017 | 2427 |
| 17 | Zinjibar | 13.1283 | 45.3803 | 13 |
| 18 | Aḩwar | 13.5202 | 46.7137 | 34 |
| 19 | Şadah | 16.9358 | 43.7644 | 1875 |
| 20 | Ḩajjah | 15.695 | 43.5975 | 1733 |
| 21 | Midi | 16.321 | 42.813 | 8 |
| 22 | Rada | 14.4295 | 44.8341 | 2120 |
| 23 | Bayḑa’ | 13.979 | 45.574 | 2018 |
| 24 | Ataq | 14.550 | 46.800 | 1158 |
| 25 | Rawḑah | 14.480 | 47.270 | 793 |
| 26 | Bayḩan | 14.8007 | 45.7189 | 1129 |
| 27 | Laḩij | 13.050 | 44.8833 | 132 |
| 28 | Ghayz̧ah | 16.2394 | 52.1638 | 14 |
| 29 | Sayḩut | 15.2105 | 51.2454 | 4 |
| 30 | Marib | 15.4228 | 45.3375 | 1089 |
| 31 | Khamir | 15.990 | 43.950 | 2449 |
| 32 | Amrān | 15.6594 | 43.9439 | 2264 |
| 33 | Ḩadibu | 12.6519 | 54.0239 | 7 |
| 34 | Maḩwīt | 15.4694 | 43.5453 | 2026 |
| 35 | Ḩazm | 16.1641 | 44.7769 | 1114 |
| 36 | Jabīn | 14.704 | 43.599 | 2034 |
| 37 | Ḑali | 13.6957 | 44.7314 | 1519 |

Appendix 3 More technical details about Sect. 3 (Theoretical framework)

  • For \(p\) functional variables, with \(R_{j}\) basis functions for the \(j\)-th variable, the general structure of the basis functions \({\varvec{\psi}}\left( t \right)\) and their coefficients \({\mathbf{\mathcal{C}}}\) in the multivariate functional framework can be expressed as:

    $$\begin{gathered} {\mathbf{\mathcal{C}}} = \left( {\begin{array}{cccc} {c_{11}^{1} \ldots c_{{1R_{1} }}^{1} } & {c_{11}^{2} \ldots c_{{1R_{2} }}^{2} } & \cdots & {c_{11}^{p} \ldots c_{{1R_{p} }}^{p} } \\ {c_{21}^{1} \ldots c_{{2R_{1} }}^{1} } & {c_{21}^{2} \ldots c_{{2R_{2} }}^{2} } & \cdots & {c_{21}^{p} \ldots c_{{2R_{p} }}^{p} } \\ \vdots & \vdots & \vdots & \vdots \\ {c_{N1}^{1} \ldots c_{{NR_{1} }}^{1} } & {c_{N1}^{2} \ldots c_{{NR_{2} }}^{2} } & \cdots & {c_{N1}^{p} \ldots c_{{NR_{p} }}^{p} } \\ \end{array} } \right) \\ {\varvec{\psi}}\left( t \right) = \left( {\begin{array}{cccc} {\psi_{1}^{1} \left( t \right) \ldots \psi_{{R_{1} }}^{1} \left( t \right)} & {0 \ldots 0} & \cdots & {0 \ldots 0} \\ {0 \ldots 0} & {\psi_{1}^{2} \left( t \right) \ldots \psi_{{R_{2} }}^{2} \left( t \right)} & \cdots & {0 \ldots 0} \\ \vdots & \vdots & \vdots & \vdots \\ {0 \ldots 0} & {0 \ldots 0} & \cdots & {\psi_{1}^{p} \left( t \right) \ldots \psi_{{R_{p} }}^{p} \left( t \right)} \\ \end{array} } \right) \\ \end{gathered}$$
    (11)
  • Consequently, using the covariance estimator \(\hat{\varvec{v}}\left( {s,t} \right)\) in Eq. (5) and the principal component \({\varvec{\xi}}_{l}\) in Eq. (6), the eigen-problem in Eq. (4) can be reformulated as:

    $$\int_{1}^{508} \hat{\varvec{v}}\left( {s,t} \right) {\varvec{\xi}}_{l} \left( t \right) dt = \eta_{l} {\varvec{\xi}}_{l} \left( s \right)$$
    $$\begin{gathered} \int_{1}^{508} \frac{1}{N - 1}{\varvec{\psi}}\left( s \right) {\mathbf{\mathcal{C}}}^{\prime } {\mathbf{\mathcal{C}}}{\varvec{\psi}}^{\prime } \left( t \right){\varvec{\xi}}_{l} \left( t \right)dt = \eta_{l} {\varvec{\psi}}\left( s \right){\varvec{b}}_{l}^{\prime } \\ \Leftrightarrow \frac{1}{N - 1}{\varvec{\psi}}\left( s \right){\mathbf{\mathcal{C}}}^{\prime } {\mathbf{\mathcal{C}}}\underbrace {{\int_{1}^{508} {\varvec{\psi}}^{\prime } \left( t \right){\varvec{\psi}}\left( t \right)dt}}_{{\varvec{W}}}{\varvec{b}}_{l}^{\prime } = \eta_{l} {\varvec{\psi}}\left( s \right){\varvec{b}}_{l}^{\prime } \\ \Leftrightarrow \frac{1}{N - 1}{\varvec{\psi}}\left( s \right) {\mathbf{\mathcal{C}}}^{\prime } {\mathbf{\mathcal{C}}} {\varvec{W}} {\varvec{b}}_{l}^{\prime } = \eta_{l} {\varvec{\psi}}\left( s \right) {\varvec{b}}_{l}^{\prime } , \\ \end{gathered}$$
    (12)

    where \({\varvec{W}} = \int_{1}^{508} {\varvec{\psi}}^{\prime } \left( t \right){\varvec{\psi}}\left( t \right)dt\) is an \(R \times R\) matrix (with \(R = \sum_{j = 1}^{p} R_{j}\)) containing the inner products of the pre-defined basis functions.

  • The MFPCA scores follow a Gaussian distribution \(\delta_{i}^{k} \sim {\mathbf{\mathcal{N}}}\left( {\mu_{k} ,{{\varvec{\Delta}}}_{k} } \right)\) with mean \(\mu_{k} \in \mathbb{R}\) and covariance matrix \({{\varvec{\Delta}}}_{k}\), which is structured as:

    $${{\varvec{\Delta}}}_{k} = \left( {\begin{array}{cc} {\left[ {\begin{array}{ccc} {a_{k1} } & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & {a_{{kd_{k} }} } \\ \end{array} } \right]} & {\mathbf{0}} \\ {\mathbf{0}} & {\left[ {\begin{array}{ccc} {b_{k} } & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & {b_{k} } \\ \end{array} } \right]} \\ \end{array} } \right) \begin{array}{l} {\left. {\begin{array}{c} {} \\ {} \\ {} \\ \end{array} } \right\} d_{k} } \\ {\left. {\begin{array}{c} {} \\ {} \\ {} \\ \end{array} } \right\} R - d_{k} } \\ \end{array}$$
    (13)

The covariance structure \({{\varvec{\Delta}}}_{k}\) lets us model the variance of the first \(d_{k}\) principal components accurately through the parameters \(a_{k1} , \ldots ,a_{{kd_{k} }}\), while the remaining \(R - d_{k}\) components are treated as noise and modeled through the single parameter \(b_{k}\).
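The two computational steps in this appendix, the basis-coordinate eigen-problem of Eq. (12) and the covariance structure of Eq. (13), can be sketched numerically as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the coefficient matrix is random and column-centred, each of the two variables uses a toy three-function basis (constant, sine, cosine) on the grid 1 to 508, \({\varvec{W}}\) is approximated by the trapezoidal rule, and \(b_{k}\) is set to the mean of the trailing eigenvalues purely for demonstration. The function names are hypothetical. The non-symmetric problem \(\frac{1}{N - 1}{\mathbf{\mathcal{C}}}^{\prime } {\mathbf{\mathcal{C}}}{\varvec{W}}{\varvec{b}} = \eta {\varvec{b}}\) is symmetrized with the Cholesky factor of \({\varvec{W}}\), a standard device in functional PCA.

```python
import numpy as np

def trapezoid_gram(basis, t):
    """Approximate W = integral of psi'(t) psi(t) dt with the trapezoidal rule.
    basis has shape (T, R): column r holds basis function r evaluated on t."""
    w = np.zeros_like(t)
    dt = np.diff(t)
    w[:-1] += dt / 2.0
    w[1:] += dt / 2.0
    return (basis * w[:, None]).T @ basis

def solve_basis_eigenproblem(C, W):
    """Solve (1/(N-1)) C'C W b = eta b for all eigenpairs.
    With W = L L' and u = L' b the problem becomes the symmetric one
    (1/(N-1)) L' C'C L u = eta u. C is assumed to be column-centred."""
    N = C.shape[0]
    V = C.T @ C / (N - 1)
    L = np.linalg.cholesky(W)
    eta, U = np.linalg.eigh(L.T @ V @ L)
    order = np.argsort(eta)[::-1]            # largest eigenvalue first
    eta, U = eta[order], U[:, order]
    B = np.linalg.solve(L.T, U)              # columns are b_l, with b_l' W b_l = 1
    return eta, B

def build_delta(a_k, b_k, R):
    """Diagonal covariance of Eq. (13): d_k leading variances a_k, then
    (R - d_k) copies of the common noise variance b_k."""
    a_k = np.asarray(a_k, dtype=float)
    return np.diag(np.concatenate([a_k, np.full(R - a_k.size, b_k)]))

# Toy setup (all values are assumptions for illustration only).
t = np.linspace(1.0, 508.0, 508)
phase = 2.0 * np.pi * (t - 1.0) / 507.0
basis_j = np.column_stack([np.ones_like(t), np.sin(phase), np.cos(phase)])
W_j = trapezoid_gram(basis_j, t)
W = np.kron(np.eye(2), W_j)                  # block-diagonal W for p = 2 variables

rng = np.random.default_rng(1)
C = rng.normal(size=(37, 6))                 # N = 37 sites, R = 3 + 3 coefficients
C -= C.mean(axis=0)

eta, B = solve_basis_eigenproblem(C, W)
V = C.T @ C / (C.shape[0] - 1)
Delta = build_delta(eta[:2], eta[2:].mean(), R=6)   # d_k = 2, illustrative b_k
print(np.round(eta, 3))
print(np.round(np.diag(Delta), 3))
```

Because \({\varvec{\psi}}\left( t \right)\) is block-diagonal across variables, \({\varvec{W}}\) is block-diagonal as well, which is why the sketch assembles it with a Kronecker product of the single-variable Gram matrix.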

Appendix 4 Outlier assessment of principal component (PC) scores

To assess potential outliers in the rainfall and temperature datasets, we present supplementary results based on the Bivariate Bag-Plot (BBP) method. This method plots the scores of the first and second principal components in a two-dimensional graph, as depicted in the figure below. The BBP results illustrate the 99% probability coverage for the rainfall and temperature data. The dark and light gray regions represent the inner and outer bags, respectively. The inner bag is the smallest depth region containing at least 50% of the observations, while the outer bag (the fence) is the convex hull of the region obtained by inflating the inner bag by a factor α = 2.58, which covers 99% of the estimated probability values. Points located outside the fence are considered outliers. Based on our BBP findings, no severe outliers were identified.

figure b
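The fence rule described in this appendix can be approximated in a few lines. The sketch below is a simplified stand-in for a full bagplot, not the procedure used to produce the figure: halfspace depth is approximated by projecting onto a finite set of directions, the bag is taken as the convex hull of the deepest 50% of points without interpolation between depth regions, the deepest point stands in for the depth median, and the hull test uses support functions over the same finite set of directions. The function names and the synthetic scores are assumptions.

```python
import numpy as np

def approx_halfspace_depth(points, n_dir=360):
    """Approximate Tukey (halfspace) depth of each 2-D point by projecting
    onto n_dir evenly spaced directions and counting points on each side."""
    angles = np.linspace(0.0, np.pi, n_dir, endpoint=False)
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])
    proj = points @ dirs.T                              # shape (n, n_dir)
    ranks = proj.argsort(axis=0).argsort(axis=0)        # rank 0..n-1 per direction
    n = len(points)
    below = ranks + 1                                   # points at or below own projection
    above = n - ranks                                   # points at or above own projection
    return np.minimum(below, above).min(axis=1) / n

def bagplot_outliers(points, factor=2.58, n_dir=360):
    """Flag points outside the fence: the hull of the deepest 50% of points,
    inflated by `factor` about the deepest point."""
    depth = approx_halfspace_depth(points, n_dir)
    centre = points[depth.argmax()]
    n_bag = int(np.ceil(len(points) / 2))
    bag = points[np.argsort(-depth)[:n_bag]]
    angles = np.linspace(0.0, 2.0 * np.pi, n_dir, endpoint=False)
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])
    # Support function of the bag's hull, measured from the centre.
    support = ((bag - centre) @ dirs.T).max(axis=0)
    # Outside the inflated hull if some direction exceeds factor * support.
    proj = (points - centre) @ dirs.T
    return np.any(proj > factor * support + 1e-12, axis=1)

# Synthetic first and second PC scores with one planted extreme point.
rng = np.random.default_rng(2)
pc_scores = np.vstack([rng.normal(size=(100, 2)), [[30.0, 30.0]]])
flags = bagplot_outliers(pc_scores)
print(np.flatnonzero(flags))
```

With only a finite number of directions, the tested region is a slight outer approximation of the inflated hull, so the sketch tends to flag marginally fewer points than an exact hull test would.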

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hael, M.A., Ma, H., Al-Sakkaf, A.S. et al. Dynamic clustering of spatial–temporal rainfall and temperature data over multi-sites in Yemen using multivariate functional approach. Stoch Environ Res Risk Assess (2024). https://doi.org/10.1007/s00477-024-02700-8

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s00477-024-02700-8
