Inhomogeneity detection in the rainfall series for the Mae Klong River Basin, Thailand

An accurate and complete rainfall record is a prerequisite for climate studies. The purpose of this study was to evaluate the homogeneity of the rainfall series for the Mae Klong River Basin in Thailand. Monthly rainfall data of eight stations in the basin for the period 1971–2015 were used. Double mass curve analysis was used to check the consistency of the rainfall data, whereas absolute homogeneity was assessed using the Pettitt test, the standard normal homogeneity test, the Buishand test, and the von Neumann test at a 5% significance level. The results of these tests were qualitatively classified as ‘useful’, ‘doubtful’, or ‘suspect’ according to the number of tests rejecting the null hypothesis. For the monthly time series, the rainfall data of 75% of the stations were classified as ‘useful’, while the data of two stations were classified as ‘doubtful’ (Stn130221) and ‘suspect’ (Stn376401). On an annual scale, the data of seven out of eight stations were classified as ‘useful’, while the data of one station (Stn376401) were classified as ‘suspect’. The double mass curve analysis technique was used to adjust the inhomogeneous data. The results of this study can help provide reliable rainfall data for climate studies in the basin.


Introduction
The reliability of hydro-meteorological studies depends on the accuracy and availability of long-term climate data. With the occurrence of extreme events such as droughts, floods, and temperature extremes around the world, climate change studies have received increasing attention. A homogeneous climate series is affected only by natural variations in weather and climate. Non-climatic factors, such as instrument replacement, changes in the position of the instrument, and changes in observational procedures, can seriously affect the accuracy of results. A homogeneous climate record is thus a prerequisite for climate change studies (Caloiero et al. 2020).
Two types of approaches have been used for checking the reliability of hydro-meteorological data series. The first approach compares the data of the station under consideration with those of the surrounding stations and is known as relative homogeneity testing. An example of this approach is the double mass curve analysis (DMCA) method (Alexandersson 1986). In the second approach, statistical methods are applied to the data series of a single station to detect inhomogeneities; these are termed absolute homogeneity tests. Commonly used statistical tests include the standard normal homogeneity test (SNHT) (Alexandersson 1986), the Pettitt test (Pettitt 1979), the Buishand range test (Buishand 1982), and the von Neumann ratio test (von Neumann 1941). Since relative homogeneity tests take into account the spatial pattern of neighboring stations, they are considered more reliable. However, if the stations are widely spaced and differ significantly in topography, an acceptably good correlation may not exist between them. In such cases, statistical (absolute) methods are preferred (Ahmed et al. 2020).
Both relative and absolute methods of assessing inhomogeneities in hydro-meteorological data have been used by many researchers. Arikan and Kahya (2019) conducted a homogeneity analysis of monthly precipitation records for 160 meteorological stations in Turkey for the period 1974-2014 using absolute homogeneity tests. Results indicated that 44 stations were inhomogeneous, and double mass curve analysis (DMCA) was used to correct the inhomogeneity in their precipitation records. Ahmed et al. (2020) studied inhomogeneity detection in the precipitation series for Balochistan Province in Pakistan. Absolute homogeneity tests were applied at a 5% significance level to 14 climate stations for the period 1961-2009. Overall, the greater portion of the time series passed the SNHT but was found to be inhomogeneous by the von Neumann test. Ay (2020) studied trends and homogeneity in temperature and rainfall series in the western Black Sea region of Turkey, using the Runs test for homogeneity analysis; the monthly mean temperature series were inhomogeneous at all stations. Kocsis et al. (2020) studied homogeneity and trends of precipitation time series in Keszthely, Western Hungary, employing the Pettitt test for inhomogeneity detection in the annual, monthly, and seasonal precipitation records. Their results highlighted the importance of homogeneity testing, which can help prevent the detection of false trends in time series data. Suhaila et al. (2008) analyzed the homogeneity of daily rainfall series for 50 stations in peninsular Malaysia for the period 1975-2004. Absolute homogeneity tests were applied because of the sparse distribution of climate stations, and 10% of the stations were found to have inhomogeneous data. Mair and Fares (2010) used double mass analysis to investigate the homogeneity of long-term rain gauge records in Mākaha Valley, O'ahu, Hawai'i. Pirnia et al. (2019) explored changes in streamflow due to climate and human activities in the Tajan River Basin in Iran using the Mann-Kendall test and the double mass curve method. Gao et al. (2017) derived and used a double mass curve for hydrologic evaluation in the middle reaches of the Yellow River in China, determining the slope break with the Pettitt test.
The Mae Klong River Basin is one of the 25 main river basins in Thailand. It has abundant water resources, which are critical to Thailand's economy. Administratively, the basin is composed of 25 districts belonging to eight provinces. Three provinces, namely Ratchaburi, Samut Songkhram, and Kanchanaburi, account for around 70% of the basin area. The Electricity Generating Authority of Thailand (EGAT) has been given authority to manage the systems of the three major dams and to operate the water supply from the reservoirs. The Royal Irrigation Department (RID) is in charge of allocating water in the main irrigation system and meeting downstream water demands at Mae Klong Dam. The main aim of this study was to assess the presence of inhomogeneities in the rainfall series of the Mae Klong River Basin using relative and absolute homogeneity methods, in order to provide reliable data for hydro-meteorological modeling of water resources in the basin.

Study area
The Mae Klong River Basin lies between 13°8′ and 16°23′ N latitude and between 98°11′ and 100°13′ E longitude in western Thailand, as shown in Fig. 1. The total area of the basin is 30,167 km². There are two main storage dams and two diversion dams. The two storage dams, Srinagarind (SNR) and Vajiralongkorn (VJK), are located on the Khwae Yai and Khwae Noi Rivers, respectively. The two diversion dams, Tha Thung Na (TN) and Mae Klong (MK), are located on the Khwae Yai and Mae Klong Rivers, respectively. Two small tributaries also join the main rivers: the Lam Taphoen River discharges into the Khwae Yai River, and the Lampachi River drains into the Khwae Noi River. Water within the basin is supplied for domestic and industrial demands, hydropower, and salinity control. Outside the basin, water is supplied to the Bangkok Metropolitan Waterworks Authority (MWA) and, in the dry season, to the neighboring Tha Chin Basin. The most intensive water demand comes from the Greater Mae Klong Irrigation Project (GMKIP), located in the lower region of the basin (Khalil et al. 2018).

Data
The monthly rainfall data of eight stations (Stn130013, 130042, 130053, 130211, 130221, 130571, 376401, and 470161) were obtained from the Royal Irrigation Department (RID) and the Thai Meteorological Department (TMD) for the period 1971-2015. Rainfall data were obtained for 18 stations in the basin, but because of the large amount of missing data, only eight stations were considered in this study. The spatial distribution of the rainfall stations is shown in Fig. 2a. The percentage of missing rainfall data varied from 0.19% (Stn130053) to 9.81% (Stn376401), as given in Table 1. As rainfall data were missing for some consecutive months and some full years, the gaps were filled using two approaches. First, missing data were filled based on strong correlation with nearby stations (Xia et al. 1999; Garcia et al. 2006). Second, the long-term mean was used to fill missing data at the corresponding stations. The mean annual rainfall of the stations was interpolated using the inverse distance weighting (IDW) technique, and the resulting spatial distribution of mean annual rainfall in the basin is shown in Fig. 2b.
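The two gap-filling approaches can be sketched as follows. This is a minimal illustration, not the procedure actually used for the Mae Klong data. It assumes that each series is a list of monthly totals starting in January with `None` marking missing months, that the correlation-based step uses a simple linear regression on the single best-correlated neighbour, and that the long-term mean is taken over the same calendar month and serves as a fallback wherever the regression cannot be applied. The names `fill_missing`, `pearson`, and the threshold `r_min=0.8` are hypothetical.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)


def fill_missing(target, neighbours, r_min=0.8):
    """Return a copy of the monthly series `target` with None gaps filled.

    Step 1 (assumption): linear regression on the best-correlated neighbour,
    used only if |r| >= r_min (a hypothetical threshold).
    Step 2 (assumption): any remaining gap gets the long-term mean of the
    observed values in the same calendar month.
    """
    best_pairs, best_nb, best_r = None, None, 0.0
    for nb in neighbours:
        # Months where both the neighbour and the target are observed.
        pairs = [(x, y) for x, y in zip(nb, target) if x is not None and y is not None]
        if len(pairs) < 3:
            continue
        r = pearson([x for x, _ in pairs], [y for _, y in pairs])
        if abs(r) > abs(best_r):
            best_pairs, best_nb, best_r = pairs, nb, r

    filled = list(target)
    if best_nb is not None and abs(best_r) >= r_min:
        xs = [x for x, _ in best_pairs]
        ys = [y for _, y in best_pairs]
        mx, my = mean(xs), mean(ys)
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
        intercept = my - slope * mx
        for i, v in enumerate(filled):
            if v is None and best_nb[i] is not None:
                filled[i] = max(0.0, intercept + slope * best_nb[i])  # rainfall cannot be negative

    # Fallback: long-term mean of the same calendar month (index % 12).
    for i, v in enumerate(filled):
        if v is None:
            same_month = [target[j] for j in range(i % 12, len(target), 12) if target[j] is not None]
            filled[i] = mean(same_month)
    return filled
```

In practice, the correlation threshold and the choice of a single neighbour rather than several are design decisions that would need to be justified for the station network at hand.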

Double mass curve analysis (DMCA)
DMCA is a graphical technique used to check the consistency of hydro-meteorological data. Inconsistency in rainfall data can be attributed to many causes, such as replacement of a rain gauge, relocation of the instrument, rainfall observation errors, or calamities such as forest fires or landslides. Inconsistency can be corrected using the DMCA technique. The double mass curve is plotted as the cumulative rainfall of the station whose consistency is to be checked against the cumulative average rainfall of the remaining stations in the area under consideration. If the rainfall data are consistent, the double mass curve should follow a straight line. The technique is based on the hypothesis that data recorded from the same parent population are consistent; a break in the slope of the line therefore indicates that conditions have changed at one location but not at the others. The inconsistent rainfall data for a station can be adjusted as follows (Arikan and Kahya 2019):

$$R_a = \frac{b}{a}\, R_o \qquad (1)$$

where $R_a$ is the adjusted rainfall, $R_o$ is the observed rainfall, $b$ is the slope of the graph to which the records are adjusted, and $a$ is the slope of the graph at the time $R_o$ was observed.
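The construction of the double mass curve and the adjustment in Eq. (1) can be sketched as below. This is an illustrative sketch under stated assumptions: the break position `break_idx` is supplied by the user (for example, read off the plot or taken from a change-point test), the slope of each segment is taken as its chord slope (the station total divided by the base total over that segment), and by default the records before the break are scaled to the slope after it. The function names are hypothetical.

```python
from itertools import accumulate


def double_mass(station, others):
    """Return (x, y) for a double mass plot.

    x: cumulative mean rainfall of the other stations; y: cumulative station rainfall.
    All series are aligned lists of totals for the same periods.
    """
    base = [sum(vals) / len(vals) for vals in zip(*others)]
    return list(accumulate(base)), list(accumulate(station))


def segment_slope(station, base):
    """Chord slope of the double mass curve over one segment."""
    return sum(station) / sum(base)


def adjust_dmca(station, others, break_idx, adjust_before=True):
    """Apply R_a = (b / a) * R_o to the records on one side of a slope break.

    break_idx (hypothetical input) is the index of the first value after the break.
    """
    base = [sum(vals) / len(vals) for vals in zip(*others)]
    slope_before = segment_slope(station[:break_idx], base[:break_idx])
    slope_after = segment_slope(station[break_idx:], base[break_idx:])
    if adjust_before:   # scale early records to the recent slope
        a, b, lo, hi = slope_before, slope_after, 0, break_idx
    else:               # scale recent records to the early slope
        a, b, lo, hi = slope_after, slope_before, break_idx, len(station)
    return [r * b / a if lo <= i < hi else r for i, r in enumerate(station)]
```

For a station that recorded half of the regional rainfall before a break and all of it afterwards, the early values are doubled, which straightens the double mass curve. Which segment should be treated as the reference depends on which period is judged reliable.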

Rainfall data homogeneity tests
Homogeneity tests are used to assess the effect of non-climatic factors, such as changes in instruments, observation practices, station relocations, and station environments, on climate time series data (Toreti et al. 2011). The absolute homogeneity of the rainfall data for the eight stations was assessed with four commonly used tests: the Pettitt test, the standard normal homogeneity test (SNHT), the Buishand test, and the von Neumann test. These tests were conducted at a 5% significance level using XLSTAT software version 2020.5.1 (Addinsoft 2021). Let $X_i$ $(i = 1, 2, \ldots, T)$ denote a series of $T$ variables for which we observe $x_i$ $(i = 1, 2, \ldots, T)$ at $T$ successive times. Let $\hat{\mu}$ be the mean of the $T$ observed values and let $\hat{\sigma}$ be the biased estimator of their standard deviation (computed by dividing by $T$) (XLSTAT_help 2020).

Pettitt test
The Pettitt test (Pettitt 1979) is a nonparametric test and is therefore independent of the distribution of the data. It is based on the rank-based Mann-Whitney test and is used to detect a shift in a time series. The null hypothesis is Ho: the $T$ variables follow one or more distributions that have the same location parameter. For a two-tailed test, the alternative hypothesis is Ha: there exists a time $t$ from which the variables change their location parameter. The Pettitt statistic is computed as follows. Let

$$D_{ij} = \operatorname{sgn}(x_i - x_j).$$

We then define

$$U_{t,T} = \sum_{i=1}^{t} \sum_{j=t+1}^{T} D_{ij}.$$

The Pettitt statistic for the two-tailed alternative hypothesis is

$$K_T = \max_{1 \le t < T} \lvert U_{t,T} \rvert.$$

The $p$ value can be approximated by

$$p \approx 2 \exp\left(\frac{-6 K_T^2}{T^3 + T^2}\right).$$

When $p$ is smaller than the significance level $\alpha$, the null hypothesis is rejected. The time $t$ at which $K_T$ is attained is the change point.
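The Pettitt statistic and its approximate p value can be computed as in the following sketch. It is not XLSTAT's implementation. Instead of the double sum, it uses the equivalent recursion $U_{t,T} = U_{t-1,T} + \sum_{j=1}^{T} \operatorname{sgn}(x_t - x_j)$, which reduces the cost from cubic to quadratic in $T$. The function name `pettitt` is hypothetical.

```python
import math


def pettitt(x):
    """Two-tailed Pettitt test.

    Returns (K_T, change_point, p), where change_point is the 1-based t that
    maximises |U_{t,T}| (the last observation before the shift) and p is the
    approximation p = 2 exp(-6 K_T^2 / (T^3 + T^2)), capped at 1.
    """
    T = len(x)
    sgn = lambda v: (v > 0) - (v < 0)
    u, k_t, change_point = 0, 0, None
    for t in range(T - 1):  # 0-based index of the last value in the first segment
        # U_{t,T} = U_{t-1,T} + sum_j sgn(x_t - x_j)
        u += sum(sgn(x[t] - xj) for xj in x)
        if abs(u) > k_t:
            k_t, change_point = abs(u), t + 1
    p = min(1.0, 2.0 * math.exp(-6.0 * k_t ** 2 / (T ** 3 + T ** 2)))
    return k_t, change_point, p
```

For a series of ten values of 1 followed by ten values of 5, this gives $K_T = 100$ at $t = 10$ with $p$ well below 0.05, so the null hypothesis would be rejected.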

Standard Normal Homogeneity Test (SNHT)
The SNHT was developed by Alexandersson (1986) to detect a change in a series of rainfall data. The test is applied to a series of ratios that compare the observations of a measuring station with the average of several stations. The ratios are then standardized, so that the series $X_i$ here corresponds to the standardized ratios. The null and alternative hypotheses are: Ho: the $T$ variables $X_i$ follow an $N(0, 1)$ distribution. Ha: between times 1 and $\nu$, the variables follow an $N(\mu_1, 1)$ distribution, and between $\nu + 1$ and $T$, they follow an $N(\mu_2, 1)$ distribution.
The Alexandersson statistic is defined by

$$T_0 = \max_{1 \le \nu < T} \left\{ \nu\, \bar{z}_1^2 + (T - \nu)\, \bar{z}_2^2 \right\}, \qquad \bar{z}_1 = \frac{1}{\nu} \sum_{i=1}^{\nu} X_i, \qquad \bar{z}_2 = \frac{1}{T - \nu} \sum_{i=\nu+1}^{T} X_i.$$

The statistic $T_0$ derives from a comparison of the likelihoods of the two alternative models. The model corresponding to Ha implies that $\mu_1$ and $\mu_2$ are estimated (by $\bar{z}_1$ and $\bar{z}_2$) while determining the parameter $\nu$ that maximizes the likelihood.

Buishand test
The Buishand test (Buishand 1982) can be used on variables following any type of distribution, but its properties have been studied particularly for the normal case. Buishand focused on the two-tailed test. For the Q statistic, the null and alternative hypotheses are:
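Ho: The $T$ variables follow one or more distributions that have the same mean.
Two-tailed test: Ha: There exists a time from which the variables change their mean.
The Buishand Q statistic is computed from the rescaled partial sums of deviations from the mean:

$$S_0^{**} = 0, \qquad S_k^{**} = \frac{1}{\hat{\sigma}} \sum_{i=1}^{k} (X_i - \hat{\mu}), \qquad k = 1, \ldots, T,$$

$$Q = \max_{0 \le k \le T} \lvert S_k^{**} \rvert.$$

For the R statistic (R stands for range), the null and alternative hypotheses are: Ho: The $T$ variables follow one or more distributions that have the same mean.
Two-sided test: Ha: The $T$ variables are not homogeneous with respect to their mean.
The Buishand R statistic is computed as

$$R = \max_{0 \le k \le T} S_k^{**} - \min_{0 \le k \le T} S_k^{**}.$$

Von Neumann test
The von Neumann ratio is defined by

$$N = \frac{\sum_{i=1}^{T-1} (X_i - X_{i+1})^2}{\sum_{i=1}^{T} (X_i - \hat{\mu})^2},$$

where the expectation of $N$ is 2 when the $X_i$ have the same mean; values of $N$ well below 2 indicate a break in the mean. This test does not allow detection of the time at which the change occurs.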
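The SNHT, Buishand, and von Neumann statistics defined above can be computed as in the sketch below. It assumes the absolute (single-station) setting, in which the station's own series is standardized with $\hat{\mu}$ and the biased $\hat{\sigma}$ rather than being converted to ratios against a reference series. The function names are hypothetical, and critical values or p values are not computed here.

```python
import math


def _standardize(x):
    """(x - mean) / biased standard deviation (divide by T)."""
    T = len(x)
    mu = sum(x) / T
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / T)
    return [(v - mu) / sigma for v in x]


def snht_t0(x):
    """SNHT: T0 = max over nu of nu*z1^2 + (T - nu)*z2^2, and the maximising nu."""
    z = _standardize(x)
    T = len(z)
    total = sum(z)
    best, nu_best, s = -1.0, None, 0.0
    for nu in range(1, T):
        s += z[nu - 1]                # sum of the first nu standardized values
        z1 = s / nu                   # mean before the candidate break
        z2 = (total - s) / (T - nu)   # mean after the candidate break
        t0 = nu * z1 ** 2 + (T - nu) * z2 ** 2
        if t0 > best:
            best, nu_best = t0, nu
    return best, nu_best


def buishand_q_r(x):
    """Buishand Q = max|S_k**| and R = max S_k** - min S_k**, with S_0** = 0."""
    partial, s = [0.0], 0.0
    for v in _standardize(x):         # (X_i - mu) / sigma, so partial sums are S_k**
        s += v
        partial.append(s)
    return max(abs(p) for p in partial), max(partial) - min(partial)


def von_neumann_n(x):
    """von Neumann ratio N; about 2 for a homogeneous series, lower after a break."""
    mu = sum(x) / len(x)
    num = sum((a - b) ** 2 for a, b in zip(x, x[1:]))
    den = sum((v - mu) ** 2 for v in x)
    return num / den
```

For a series with a single step change in the middle, $T_0$ peaks at the step, the partial sums $S_k^{**}$ reach their extreme there, and $N$ falls far below 2.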
For the four tests, XLSTAT evaluates the p value and an interval around the p value by using a Monte Carlo method.
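The exact Monte Carlo scheme used by XLSTAT is not described here. The sketch below therefore shows one common alternative, a permutation test, and should not be read as XLSTAT's procedure. Under Ho the order of the observations carries no information, so shuffling the series yields draws from the null distribution of any of the statistics above. The interval is a normal approximation to the binomial Monte Carlo error. The function name `monte_carlo_p` and its parameters are hypothetical, and the von Neumann ratio is redefined so that the block runs on its own.

```python
import math
import random


def monte_carlo_p(x, statistic, n_sim=999, lower_tail=False, seed=None):
    """Permutation (Monte Carlo) p value and an approximate 99% interval around it.

    Use lower_tail=True for the von Neumann ratio, where small values indicate
    inhomogeneity; the other statistics are upper-tailed.
    """
    rng = random.Random(seed)
    observed = statistic(x)
    sample = list(x)
    extreme = 0
    for _ in range(n_sim):
        rng.shuffle(sample)
        value = statistic(sample)
        if (value <= observed) if lower_tail else (value >= observed):
            extreme += 1
    p = (extreme + 1) / (n_sim + 1)  # add-one correction keeps p strictly positive
    half = 2.576 * math.sqrt(p * (1 - p) / n_sim)
    return p, (max(0.0, p - half), min(1.0, p + half))


def von_neumann_n(x):
    """von Neumann ratio N (lower-tailed)."""
    mu = sum(x) / len(x)
    num = sum((a - b) ** 2 for a, b in zip(x, x[1:]))
    den = sum((v - mu) ** 2 for v in x)
    return num / den
```

Usage: `monte_carlo_p(series, von_neumann_n, lower_tail=True, seed=1)`. Increasing `n_sim` narrows the interval around the estimated p value.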

Qualitative Interpretation of the homogeneity tests
The results of the homogeneity tests were interpreted following the approach suggested by Wijngaard et al. (2003). Based on the results of the four tests for inhomogeneity assessment, the data series of rainfall stations were divided into three classes (Wijngaard et al. 2003).
Class U ('useful'): zero or one test rejects the null hypothesis at the 5% significance level.
Class D ('doubtful'): two tests reject the null hypothesis at the 5% significance level.
Class S ('suspect'): three or four tests reject the null hypothesis at the 5% significance level.
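The three-class rule above amounts to counting rejections, as in this minimal sketch. The function name `wijngaard_class` is hypothetical, and its input is assumed to be the four p values of one series.

```python
def wijngaard_class(p_values, alpha=0.05):
    """Classify a series from the p values of the four homogeneity tests.

    U ('useful'): 0 or 1 rejection; D ('doubtful'): 2; S ('suspect'): 3 or 4.
    """
    rejections = sum(p < alpha for p in p_values)
    if rejections <= 1:
        return "U"
    if rejections == 2:
        return "D"
    return "S"
```

For example, p values of (0.30, 0.01, 0.50, 0.20) give one rejection and therefore class U.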

Consistency check of Rainfall data
The double mass curve analysis (DMCA) technique was used to check the consistency of the rainfall series for the Mae Klong River Basin before the statistical tests for absolute homogeneity were applied. The double mass plots for the eight stations are shown in Fig. 3. For six stations (Stn130013, 130053, 130211, 130221, 130571, and 470161), the double mass curves follow a straight line, indicating consistent rainfall data at these stations.

Inhomogeneity detection in each calendar month
The monthly time series of the eight stations were grouped by calendar month (January to December), which yielded 45 values (one per year, 1971-2015) for each month at each station. The four tests (the Pettitt test, SNHT, Buishand test, and von Neumann test) were applied to each month for each station at a 5% significance level.
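The regrouping described above, together with the annual totals used later for the annual-scale analysis, can be sketched as follows. The sketch assumes a gap-free monthly series that starts in January and covers whole years; the function names are hypothetical.

```python
def by_calendar_month(monthly):
    """Split a January-first monthly series into 12 series, one per calendar month."""
    if len(monthly) % 12:
        raise ValueError("series must cover whole years")
    return {m + 1: monthly[m::12] for m in range(12)}


def annual_totals(monthly):
    """Sum each consecutive block of 12 months into an annual total."""
    return [sum(monthly[i:i + 12]) for i in range(0, len(monthly), 12)]
```

For 1971-2015 (540 months), `by_calendar_month` returns 12 series of 45 values each, and each of these can be passed to the tests separately.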

Pettitt Test
The p values for the Pettitt test are given in Table 2. Any p value less than the significance level of α = 0.05 indicates inhomogeneity in the corresponding time series. For five stations (Stn130013, 130042, 130211, 130221, and 470161), the null hypothesis was not rejected in any month. Three stations (Stn130053, 130571, and 376401) were found to be inhomogeneous, with p values less than 0.05 in at least one month. The data for Stn130053 were inhomogeneous in April. Stn130571 had p values below the threshold in 2 months (April and December). The data of Stn376401 were inhomogeneous in 4 months (March, April, July, and September). The results thus varied between months; in April, the rainfall data of three stations were found to be inhomogeneous.

… Table 4. The data for three stations (Stn130013, 130053, and 470161) were found to be homogeneous in all months. In April and in July, the test was failed at three stations each. Stn376401 showed inhomogeneous data in 4 months (March, April, July, and September), with p values below the threshold of 0.05.

Von Neumann test
Table 5. The p values obtained from the von Neumann test for each calendar month and station. Bold numbers represent p < 0.05, indicating inhomogeneity in the corresponding time series at a 5% significance level.

The results of the four tests for each calendar month were classified following the approach of Wijngaard et al. (2003) and are shown in Table 6. The rainfall data for five stations can be classified as class U (useful) in all months. The data at Stn130571 can be classified as class S (suspect) in April. The data of Stn130221 were classified as class D (doubtful) in October. The data at Stn376401 were classified as class D in April and as class S in 3 months (March, July, and September).

Inhomogeneity detection in annual rainfall series
To detect inhomogeneity on an annual scale, the monthly data were aggregated into total annual rainfall for all stations over the period 1971-2015. The results of the four tests are given in Table 7. The data of seven stations can be classified as class U (useful), as these stations passed at least three of the four tests. Stn376401 failed all four tests and was classified as class S (suspect). This result is consistent with the double mass curve analysis (Fig. 3), which indicated a clear break in slope in 2000.

Inhomogeneity detection in monthly rainfall series during 1971-2015
The absolute homogeneity tests were applied to the monthly rainfall series of all stations for the period 1971-2015, and the results are given in Table 8. The data of six stations were classified as class U (useful), as they satisfied the null hypothesis of homogeneous data in three out of four tests. The monthly data for Stn130221 were classified as class D (doubtful), as they failed two of the four tests with p values below the threshold of 0.05. Stn376401 was classified as class S (suspect), as the null hypothesis of homogeneous data was rejected by three tests. The von Neumann test rejected the null hypothesis at all stations.

The DMCA technique was applied to adjust the rainfall data of Stn376401. As shown in the DMCA plot in Fig. 4, there is a clear break in the slope of the line in 2000. This break is also identified by the SNHT and the Buishand range test (March 2000), whereas the Pettitt test indicated a change point in March 1999. The slope of … Table 9. It can be seen that the adjusted rainfall series passed three tests at a 5% significance level, with p values above the threshold of 0.05. The graphical representation of the SNHT before and after homogenization, along with the double mass curve, is shown in Fig. 4. Since the data passed three tests, they can be classified as 'useful'. Overall, the data for the eight stations can be considered useful for hydro-meteorological studies in the Mae Klong River Basin.

Conclusions
This study examined the homogeneity of the rainfall series for each calendar month, the monthly time series, and the annual rainfall data of eight stations in the Mae Klong River Basin for the period 1971-2015. Both relative and absolute homogeneity tests were used. The relative homogeneity of the rainfall data was assessed through double mass curve analysis (DMCA), whereas absolute homogeneity was investigated with four commonly used statistical methods (the Pettitt test, SNHT, Buishand test, and von Neumann test) at a 5% significance level. For each calendar month, the rainfall data of five stations (Stn130013, 130042, 130053, 130211, and 470161) were found to be 'useful' in all months. The rainfall data of Stn130221 were found to be 'doubtful' in October, and the data of Stn130571 were found to be 'suspect' in April. The data of Stn376401 were classified as 'doubtful' in April and 'suspect' in March, July, and September. For annual rainfall, the data of all stations were classified as 'useful' except those of Stn376401, which were classified as 'suspect'. For the monthly time series, the data of Stn130221 were classified as 'doubtful' and those of Stn376401 as 'suspect', while the rainfall data of the remaining stations were classified as 'useful'. The double mass curve analysis method was used to adjust the monthly rainfall data of Stn376401. After adjustment, the data of Stn376401 satisfied the null hypothesis of homogeneous data in three out of four homogeneity tests and were classified as 'useful'. Overall, the rainfall time series can be considered useful for hydro-meteorological studies of water resources in the basin.