Markov Chain Analysis of Rainfall over East Asia: Unusual Frequency, Persistence, and Entropy in the Summer 2020

Record-breaking rainfall occurred over East Asia during the summer of 2020. However, how the summer of 2020 differed from other years remains to be quantified. To this end, this study employs Markov chain analysis to quantify summer rainfall variability over East Asia using three Markov descriptors for heavy precipitation events of 10 mm day−1 or more: frequency, persistence, and entropy (i.e., irregularity). The heavy rainfall during the summer of 2020 is found to be attributable to an anomalously high frequency of rainfall over central China and Japan and greater rainfall persistence over eastern China and Korea. Empirical orthogonal functions (EOFs) are used to analyze interannual variation in the descriptors using a few primary modes. For the summer of 2020, the first and second modes for frequency account for the enhanced frequency over central China, and this is linked to sea surface temperature anomalies over the North Pacific, the equatorial eastern Pacific, and the tropical Indian Ocean. For persistence, the first mode dominates the anomalous rainfall persistence observed during the summer of 2020. Similar but weaker behavior is also seen in the modes for entropy.


Introduction
Summer rainfall over East Asia (20°N-50°N, 100°E-150°E) is strongly linked with the East Asian summer monsoon (EASM; Wang and LinHo 2002; Wang et al. 2007) and is responsible for most of the annual mean rainfall over eastern and southern China, Korea, and Japan (Endo and Kitoh 2016). In previous studies, summer rainfall over East Asia has been considered a subsystem of the EASM that can be explained by the movement of a stationary front, which brings Meiyu in China, Changma in Korea, and Baiu in Japan. Generally, Meiyu develops between the tropical monsoon and continental air masses, while Baiu is influenced by the tropical monsoon, the North Pacific high, and the Okhotsk Sea air mass. In contrast, Changma is affected by all of these air masses and exhibits more complex characteristics (Seo et al. 2011, 2015). The onset and retreat of these rainbands lead to variability in the rainfall over East Asia. The EASM is known to be associated with various climate patterns, such as the El Niño-Southern Oscillation (ENSO), global teleconnections, the Arctic Oscillation, the western North Pacific summer monsoon, and sea surface temperature (SST) variability. As reported by many previous studies, the EASM has a negative relationship with the western North Pacific high on interannual timescales (Hsu and Lin 2007; Li and Wang 2005; Oh and Ha 2016). Lau and Weng (2002) and Wang et al. (2000) found a close relationship between the western North Pacific high and ENSO. Rainfall over East Asia is enhanced by the combined effects of ENSO and the Pacific Decadal Oscillation through increased moisture flux transport (Lee et al. 2019). SST variability over the Kuroshio Extension region also plays an important role in maintaining the high-pressure anomaly (Ham et al. 2019).
Recently, a number of observational studies have reported that summer rainfall over East Asia has intensified (Kim et al. 2005; Zhou et al. 2009; In et al. 2014; Park et al. 2020). In South Korea, summer rainfall has increased over recent decades (Ho et al. 2003; Choi 2004; Kwon et al. 2017; Choi et al. 2008). Several studies have also reported a global increase in the frequency and intensity of extreme events (IPCC 2012; Donat et al. 2013; Alexander 2016). Anthropogenic activity has been reported to affect the global water cycle, leading to heavier precipitation (IPCC 2013). Because human activity is particularly intensive in East Asia, its effects are expected to lead to significant changes in temperature and precipitation in this region. However, the impact of climate change on the EASM is still subject to debate. Some previous studies have shown that summer rainfall over East Asia will increase under global warming (Yun et al. 2008; Chen and Sun 2013; Seo et al. 2013; Wang et al. 2018; Li et al. 2019; Park et al. 2020; Tung et al. 2020), whereas other studies have reported the opposite (Zhu et al. 2012; Burke and Stott 2017; Zveryaev and Aleksandrova 2004).

Communicated by: Yoo-Geun Ham
The present study is motivated by the heavy rainfall over East Asia during the summer of 2020, which caused extensive floods in central and southern China and Japan. The southern areas of Korea also recorded a month's worth of rainfall in a single day. More specifically, the June-August (JJA) mean precipitation rate in 2020 locally exceeded 13 mm day−1 over this region (Fig. 1a). After the climatological mean is removed, positive anomalies of over 5 mm day−1 are clearly seen over most of East Asia, with some negative anomalies over southern China (Fig. 1b). Studies that examine only the mean and anomaly of precipitation may offer limited understanding of summer precipitation outliers in a particular year, because the intensity, location, onset, and retreat of stationary fronts change every year. Complex changes in the fronts over East Asia that form Meiyu, Changma, and Baiu should thus be investigated in terms of multiple factors, such as the frequency, persistence, and regularity of precipitation events, rather than the amount alone.
In this study, we conduct Markov chain analysis to investigate the frequency, persistence, and entropy (i.e., irregularity) of summer rainfall over East Asia. This method was originally applied to ecological communities (Hill et al. 2004) and was first adapted for meteorology by Mieruch et al. (2010). More recently, Kim et al. (2017) examined cold extreme temperature events over the Korean Peninsula using Markov chain analysis. Although extreme weather is difficult to classify in a simple manner, Markov chain analysis has proven useful for characterizing such events. We therefore investigate the characteristics of the 2020 summer rainfall and its association with other meteorological factors through Markov chain analysis. In addition, the primary modes of the Markov descriptors and their correlations with SSTs are examined.
The remainder of this paper is structured as follows. In Section 2, we describe the data used in this study and the method of Markov chain analysis. The characteristics of the summer rainfall descriptors are examined in Section 3. In Section 4, the relationship between the primary modes of the Markov descriptors and global SSTs is examined. A summary and conclusion are provided in Section 5.
Fig. 1 Spatial distributions of (a) total and (b) anomaly of JJA-mean precipitation in 2020, where the anomaly is obtained by subtracting the JJA mean for the period of 1979-2020. The unit is mm day−1.

Data and Method

We employ a Markov chain, a stochastic process in which the probability of the present state depends only on the previous state (Norris 1998). A typical example is a simple forecast of whether it will be rainy or sunny today based on yesterday's weather. To build a simple weather model using a Markov chain, two matrices are required: the empirical distribution π and the empirical transition matrix P, defined in Eqs. (1) and (2), respectively (Hill et al. 2004; Mieruch et al. 2010). Here, i and j represent weather conditions (e.g., a rainy or sunny day), $n_j$ is the number of occurrences of condition j, and $n_{ij}$ is the number of transitions from condition j at time t to condition i at time t + 1. In other words, π indicates how often rainy (or sunny) days occur over the entire analysis period, while P gives the probability of a rainy (or sunny) day on the following day given the current state.

Based on π and P, three climate descriptors can be derived for a weather system: frequency, persistence, and entropy (Kim et al. 2017). The frequency of a specific weather condition j (Eq. (3)) equals the number of j groups (runs of consecutive j days), not the number of individual j days. For example, if 10 rainy days occur in succession, the frequency of rain is 1, not 10. The persistence of j (Eq. (4)) indicates the average number of successive j-to-j transitions within a group (i.e., the run length minus one), since frequency(j) equals the number of j groups. For example, for 10 successive rainy days, the persistence is nine days. Lastly, the entropy of j (Eq. (5)) represents the irregularity of j and ranges from 0 to 1. Using the relation $P_{ij} + P_{jj} = 1$ for a simple two-state model (Eq. (2)), the entropy can be interpreted as a parabolic function of $P_{jj}$.
Note that entropy(j) reaches its minimum when $P_{jj} = 0$ (each occurrence of j lasts only one day) and when $P_{jj} = 1$ (j persists indefinitely once it occurs). In contrast, the maximum entropy occurs when $P_{jj} = 0.5$, which represents the least predictable persistence of state j (refer to Fig. 2 in Kim et al. 2017).
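The definitions and worked examples above can be checked with a short sketch for a single daily series. The expressions used here are assumed forms chosen to be consistent with the text, not reproductions of the paper's equations: $\pi_j = n_j/n$, $P_{jj} = n_{jj}/n_j$, frequency $= n_j(1 - P_{jj})$ (the number of wet spells), persistence $= P_{jj}/(1 - P_{jj})$ (the mean spell length minus one), and, for the entropy, the two-state Shannon entropy in bits, which is 0 at $P_{jj} = 0$ and $P_{jj} = 1$ and reaches 1 at $P_{jj} = 0.5$. The entropy form used by Kim et al. (2017) may differ, and the function name `markov_descriptors` is hypothetical.

```python
import math
from typing import Dict, Sequence


def markov_descriptors(wet: Sequence[bool]) -> Dict[str, float]:
    """Markov descriptors of the wet state j for one daily time series.

    Assumed forms (hypothetical reconstruction, not the paper's code):
      pi_j        = n_j / n
      P_jj        = n_jj / n_j
      frequency   = n_j * (1 - P_jj) = n_j - n_jj   (number of wet spells)
      persistence = P_jj / (1 - P_jj) = n_jj / (n_j - n_jj)
      entropy     = -sum_i P_ij * log2(P_ij) over i in {j, not j}
    """
    n = len(wet)
    n_j = sum(1 for day in wet if day)
    # Count wet-to-wet transitions between consecutive days.
    n_jj = sum(1 for today, tomorrow in zip(wet, wet[1:]) if today and tomorrow)
    if n_j == 0:
        nan = float("nan")
        return {"pi": 0.0, "P_jj": nan, "frequency": 0.0,
                "persistence": nan, "entropy": nan}
    p_jj = n_jj / n_j
    p_ij = 1.0 - p_jj  # two-state relation P_ij + P_jj = 1
    entropy = -sum(p * math.log2(p) for p in (p_jj, p_ij) if p > 0.0)
    # Each spell of length L contributes L wet days and L - 1 wet-to-wet transitions.
    spells = n_j - n_jj
    return {
        "pi": n_j / n,
        "P_jj": p_jj,
        "frequency": float(spells),
        "persistence": n_jj / spells,
        "entropy": entropy,
    }


# Ten consecutive rainy days embedded in dry days, as in the text's example.
series = [False] * 5 + [True] * 10 + [False] * 5
d = markov_descriptors(series)
print(d["frequency"], d["persistence"])  # 1.0 9.0
```

Frequency and persistence are computed from the integer counts rather than from $P_{jj}$ to avoid floating-point round-off; the two routes are algebraically identical. For this series $P_{jj} = 0.9$ and the assumed entropy is about 0.47, while a series such as `[True, True, False]` has $P_{jj} = 0.5$ and gives the maximum value of 1.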
Applying Eqs. (1)-(5) to a real weather system, we calculate the three climate descriptors for rainfall events over East Asia (20°N-50°N, 100°E-150°E) for the 1979-2020 period. Here, weather condition j is defined as rainfall of ≥10 mm day−1, which corresponds to approximately the 80th percentile of daily precipitation within East Asia over these 42 years. We first investigate the spatial patterns of the climatological fields of the three descriptors and their anomalous fields for the summer of 2020. We then decompose the descriptors into their first three empirical orthogonal function (EOF) modes to identify the modes contributing to the 2020 summer rainfall.
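For gridded data, the same counting can be vectorized over years and grid points. The sketch below is a hedged illustration, not the study's processing code. It assumes the daily precipitation is stored as a NumPy array of shape (year, day, lat, lon) holding the JJA days of each year, counts day-to-day transitions only within each season, and takes frequency $= n_j - n_{jj}$ (the number of wet spells) and persistence $= n_{jj}/(n_j - n_{jj})$ (the mean spell length minus one). `gridded_descriptors` and `percentile_rank` are hypothetical helpers; the latter shows how the ≥10 mm day−1 threshold can be related to a percentile of all grid-point days.

```python
import numpy as np

THRESHOLD = 10.0  # mm/day, the wet-state threshold used in the text


def gridded_descriptors(precip, threshold=THRESHOLD):
    """Per-year, per-grid-point frequency and persistence of wet days.

    precip: daily JJA precipitation in mm/day, shape (year, day, lat, lon).
    Transitions are counted only within each season, never across years.
    """
    wet = np.asarray(precip) >= threshold
    n_j = wet.sum(axis=1)                                # wet days per season
    n_jj = (wet[:, 1:] & wet[:, :-1]).sum(axis=1)        # wet-to-wet transitions
    frequency = n_j - n_jj                               # number of wet spells
    persistence = np.full(frequency.shape, np.nan)       # undefined with no wet days
    np.divide(n_jj, frequency, out=persistence, where=frequency > 0)
    return frequency, persistence


def percentile_rank(precip, threshold=THRESHOLD):
    """Percentage of all grid-point days with precipitation below the threshold."""
    return 100.0 * float(np.mean(np.asarray(precip) < threshold))


# Synthetic stand-in for 42 JJA seasons (92 days) on a small 4 x 5 grid.
rng = np.random.default_rng(0)
precip = rng.gamma(shape=0.5, scale=10.0, size=(42, 92, 4, 5))
freq, pers = gridded_descriptors(precip)
print(freq.shape)  # (42, 4, 5)
print(round(percentile_rank(precip), 1))
```

Climatologies and anomalies of the descriptors then follow from averaging over the year axis and subtracting that mean from an individual year.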

Characteristics of Summer Rainfall Descriptors Using Markov Analysis
Markov chain analysis is used to decompose daily JJA rainfall over China, Korea, and Japan into frequency, persistence, and entropy. Figure 2 shows the spatial distributions of the mean and standard deviation of the three Markov descriptors for JJA during 1979-2020. Frequent rainfall mainly occurs over China at around 20°-36°N, on the Korean Peninsula, and in Japan, as a stationary rainband sweeps through during the summer (Fig. 2a). In particular, inland areas of southern China, such as Guangxi, Hunan, and Guangdong, experience more frequent rainfall events than other regions. On the other hand, high rainfall persistence occurs in the southeastern coastal areas of China, Taiwan, and southern Japan (Fig. 2b). The mean entropy tends to be high in regions with high rainfall frequency, because a larger number of events may lead to greater discontinuity within a given time range (Fig. 2c). However, the maximum entropy (~0.9) is observed over the typical Changma (Meiyu or Baiu) region, from the middle and south of eastern China to southern Japan. Based on Eq. (5), the probability of successive rainfall events in these regions is close to 0.5, which represents low predictability of rain on the day following a rainy day. The variability of the three Markov descriptors is shown in the lower panels of Fig. 2. Frequency and persistence vary greatly within the main precipitation zone, whereas entropy has large variability north of 35°N, where its mean value is relatively low.
To assess how the summer rainfall over East Asia in 2020 differed from that in other years, the anomalies of the Markov climate descriptors are investigated (Fig. 3). Interestingly, rainfall events are more frequent mainly north of the Yangtze River in China and in southern Russia (Fig. 3a), while the coastal regions of southeastern China have less frequent rainfall events. The persistence of daily precipitation is high over the middle of eastern China and South Korea (Fig. 3b), and the entropy is higher at high latitudes (Fig. 3c). Here, we note that the entropy is obtained from a parabolic function of the probability of successive rainfall events (e.g., Fig. 2c in Kim et al. 2017). This function is steepest where that probability approaches 0 or 1, that is, where the entropy is near its minimum. Thus, abrupt changes in entropy over northern China are to be expected where the entropy is very low (Figs. 2c and 3c). Therefore, for the rainfall pattern of 2020 (Fig. 1b), the enhanced rainfall over central China and Japan is primarily due to the higher frequency of rainfall events of ≥10 mm day−1, while that over eastern China and Korea is due to the higher persistence.
To diagnose the vertical static stability of the atmosphere, two thermodynamic parameters, the equivalent potential temperature ($\theta_e$) and the saturated equivalent potential temperature ($\theta_e^*$), are analyzed. The atmosphere is conditionally unstable if $\partial\theta_e^*/\partial z$ is negative in the lower atmosphere, meaning that an air parcel forced upward to saturation becomes positively buoyant. The vertical profiles of these parameters are averaged over central and southern China (100°E-120°E, 20°N-35°N) and over the Korean Peninsula and Japan (120°E-140°E, 30°N-40°N) (Fig. 4). Figure 4a shows that the atmosphere is climatologically conditionally unstable during 1979-2020. In addition, the water vapor content in 2020 is higher than the 1979-2020 mean, further enhancing the instability. Upward motion above 700 hPa is stronger in both regions in 2020 than in the 1979-2020 average. Particularly strong ascent can be seen in the lower atmosphere of central and southern China in 2020.
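Conditional instability can be diagnosed directly from a temperature profile. The sketch below is a hedged illustration under stated assumptions: it uses the common approximation $\theta_e^* \approx \theta \exp(L_v r_s / (c_p T))$ with Bolton's (1980) saturation vapor pressure and a constant latent heat, which is not necessarily how $\theta_e^*$ was computed for Fig. 4, and the function names are hypothetical. Levels are ordered from the surface upward, so a decrease of $\theta_e^*$ from one level to the next corresponds to $\partial\theta_e^*/\partial z < 0$.

```python
import numpy as np

RD_OVER_CP = 0.2857  # R_d / c_p for dry air
CP = 1004.0          # specific heat of dry air at constant pressure, J kg^-1 K^-1
LV = 2.5e6           # latent heat of vaporization, J kg^-1 (held constant here)
EPS = 0.622          # R_d / R_v


def saturated_theta_e(p_hpa, t_k):
    """Approximate saturated equivalent potential temperature in K (assumed form)."""
    p = np.asarray(p_hpa, dtype=float)
    t = np.asarray(t_k, dtype=float)
    tc = t - 273.15
    es = 6.112 * np.exp(17.67 * tc / (tc + 243.5))  # Bolton (1980), hPa
    rs = EPS * es / (p - es)                         # saturation mixing ratio, kg/kg
    theta = t * (1000.0 / p) ** RD_OVER_CP           # potential temperature, K
    return theta * np.exp(LV * rs / (CP * t))


def conditionally_unstable_layers(p_hpa, t_k):
    """Flag each layer where theta_e* decreases with height (d theta_e*/dz < 0)."""
    p = np.asarray(p_hpa, dtype=float)
    if np.any(np.diff(p) >= 0):
        raise ValueError("levels must run from the surface upward (decreasing pressure)")
    return np.diff(saturated_theta_e(p, t_k)) < 0.0


# Illustrative warm, moist profile (values chosen for demonstration only).
levels = [1000.0, 850.0, 700.0, 500.0]
temps = [300.0, 292.0, 283.0, 267.0]
print(saturated_theta_e(levels, temps).round(1))
print(conditionally_unstable_layers(levels, temps))
```

In practice the profiles would be area averages over the two boxes in the text; a library such as MetPy provides more accurate $\theta_e$ formulations than this approximation.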
To further clarify the characteristics of the summer precipitation, EOF analysis is conducted for the three Markov descriptors over 105°E-135°E, 25°N-45°N. The EOF1 for frequency is positive over northern East Asia, indicating a high occurrence of rainfall of ≥10 mm day−1 there, and negative over southern East Asia (Fig. 5a). The EOF1 explains 14.1% of the variance in frequency. The EOF2 displays a monopole pattern elongated in the west-east direction at 30°N (Fig. 5b). The EOF3 has a tripole pattern, indicating an out-of-phase relationship between the rainfall over central and eastern China and Japan and that over northeastern and southeastern China (Fig. 5c). The first principal component (PC1; Fig. 5d, black) and the second (PC2; Fig. 5d, red) both increase in 2020, indicating higher rainfall frequency during the summer of 2020. In contrast, PC3 (Fig. 5d, blue) is close to 0 in 2020.
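The EOF decomposition can be sketched with a singular value decomposition of the anomaly matrix (time × grid points). This is a minimal illustration under assumptions not stated in the text: anomalies are taken about the time mean of the analysis period, each grid point is weighted by the square root of the cosine of latitude, there are no missing values, and `eof_analysis` is a hypothetical name. The squared singular values give the fraction of variance explained by each mode (14.1% for the frequency EOF1 in the text), and the sign of each EOF-PC pair is arbitrary.

```python
import numpy as np


def eof_analysis(field, lat, n_modes=3):
    """Leading EOFs, PCs and explained-variance fractions of a (time, lat, lon) field."""
    field = np.asarray(field, dtype=float)
    nt, ny, nx = field.shape
    anom = field - field.mean(axis=0)                  # remove the time mean
    w = np.sqrt(np.cos(np.deg2rad(lat)))[:, None]      # (lat, 1) area weights
    x = (anom * w).reshape(nt, ny * nx)                # time x space matrix
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    explained = s**2 / np.sum(s**2)                    # variance fraction per mode
    pcs = u[:, :n_modes] * s[:n_modes]                 # principal components (time, mode)
    eofs = vt[:n_modes].reshape(n_modes, ny, nx) / w   # undo the weighting for plotting
    return eofs, pcs, explained[:n_modes]


# Synthetic check: one spatial pattern modulated in time, plus weak noise.
rng = np.random.default_rng(1)
lat = np.linspace(25.0, 45.0, 9)
pattern = rng.standard_normal((9, 13))
pc_true = rng.standard_normal(42)
field = pc_true[:, None, None] * pattern + 0.01 * rng.standard_normal((42, 9, 13))
eofs, pcs, frac = eof_analysis(field, lat)
print(eofs.shape, pcs.shape)  # (3, 9, 13) (42, 3)
```

Applied to a descriptor, `field` would hold one map per JJA season (for example, the 42 yearly frequency maps), and `pcs[:, k]` would be the PC time series of mode k + 1.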
In terms of persistence, the EOF1 indicates long runs of consecutive days with ≥10 mm of daily precipitation over central and southern China and the Korean Peninsula (Fig. 6a), although this mode explains a smaller percentage of the variance (9.3%) than the EOF1 for frequency does. The EOF2 for persistence exhibits a dipole pattern, with large maxima over central and southern China (Fig. 6b), while the EOF3 displays a tripole pattern. Interestingly, PC1 exhibits a substantially large value in JJA 2020, which explains the increased persistence of precipitation over East Asia during this period.
The leading EOF modes of frequency and persistence differ even within the same domain. One possible reason is monsoon rainfall variation on intraseasonal timescales. The EASM includes Changma (Meiyu or Baiu) and tropical cyclone regimes, whose intraseasonal timing depends on location (Park et al. 2020). The frontal rainband is typically active in early summer, whereas tropical cyclones mainly affect East Asia in late summer (Chen et al. 2004). As a result, the climatological mean frequency of precipitation for August (not shown) dominates the seasonal mean frequency (Fig. 2a), while the persistence for June and July is representative of the climatological summer mean persistence.

Figure 7 presents the results for precipitation entropy. Unlike for the other descriptors, the variability in entropy is particularly large north of 30°N, although the EOF1 of frequency (Fig. 5a) and that of entropy (Fig. 7a) exhibit some similarity. The high variability at high latitudes is consistent with the results displayed in Fig. 2c. Consistent with the enhanced persistence over central and southern China and the Korean Peninsula in 2020 (Fig. 6a, d), high irregularity in JJA 2020 is concentrated in other areas, such as the northeast (Fig. 7d). However, because the EOF1 explains only 8.3% of the variance, the modal decomposition is not clear-cut for this descriptor.

Relationship with Atmospheric Circulation and SSTs
It has been shown that one of the crucial components of East Asian summer precipitation is the western Pacific subtropical high, which is characterized by an anomalous anticyclone over the western North Pacific in the lower troposphere (He and Zhou 2014). In particular, the location and intensity of the summer rainband are closely associated with the extension or contraction of the western Pacific subtropical high (Zhu et al. 2011; Du et al. 2017; Liu et al. 2019). Previous studies have suggested that the maintenance of the western North Pacific anticyclone results from SST forcing and air-sea interaction in adjacent and remote regions (Lee et al. 2006; Kim et al. 2009). It is therefore necessary to investigate the frequency and persistence of precipitation in relation to atmospheric circulation and SST. To this end, a linear regression analysis of geopotential height and SST onto the PC time series of the Markov descriptors (Figs. 5d, 6d, and 7d) is conducted. Long-term trends in SST are removed so that year-to-year variations can be investigated.
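The regression step can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes that the linear trend is removed from the SST (or geopotential height) field at each grid point by least squares, that each PC is standardized before regression, and that land or missing values have already been masked out. With a standardized PC, the regression coefficient at each grid point is the field anomaly associated with a one-standard-deviation change in the PC. The function names are hypothetical.

```python
import numpy as np


def linear_detrend(x, axis=0):
    """Remove a least-squares linear trend along `axis` (no missing values allowed)."""
    x = np.moveaxis(np.asarray(x, dtype=float), axis, 0)
    t = np.arange(x.shape[0], dtype=float)
    flat = x.reshape(x.shape[0], -1)
    slope, intercept = np.polyfit(t, flat, 1)          # one linear fit per grid point
    resid = flat - (np.outer(t, slope) + intercept)
    return np.moveaxis(resid.reshape(x.shape), 0, axis)


def regress_on_pc(field, pc):
    """Regression map of a detrended (time, ...) field onto a standardized PC."""
    anom = linear_detrend(field, axis=0)               # residuals have zero time mean
    pc = np.asarray(pc, dtype=float)
    z = (pc - pc.mean()) / pc.std()                    # unit-variance PC
    return np.tensordot(z, anom, axes=(0, 0)) / len(z)


# Synthetic check: SST = 2 x (PC pattern) + a linear warming trend.
rng = np.random.default_rng(2)
years = 42
z = linear_detrend(rng.standard_normal(years))
z = (z - z.mean()) / z.std()
pattern = rng.standard_normal((5, 6))
trend = 0.05 * np.arange(years, dtype=float)[:, None, None]
sst = 2.0 * z[:, None, None] * pattern + trend
reg = regress_on_pc(sst, z)
```

Because the trend is removed before regression, `reg` recovers twice the imposed pattern rather than mixing in the warming trend.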
The EOF1 for frequency is primarily associated with Pacific SSTs, while the EOF2 and EOF3 are also influenced by SSTs in the Indian and Atlantic Oceans (left column of Fig. 8). Given that the EOF1 represents a north-south dipole in precipitation occurrence over East Asia (Fig. 5a), it can be inferred that warming of the western Pacific and the North Pacific contributes to more frequent precipitation within the region (Fig. 8a). These SST anomalies are accompanied by a positive 500-hPa geopotential height anomaly near 30°-35°N, which can enhance moisture flux convergence in northern East Asia (Fig. 9a). A La Niña-like SST anomaly is also observed over the equatorial eastern Pacific. These SST anomalies are accompanied by positive and negative height anomalies near 30°N, 120°E and 50°N, 110°E, respectively. For the EOF2, both warming of the Indian Ocean and cooling of the Barents and Kara Seas are linked to the dipole frequency pattern (Fig. 8b). In addition, locally cool SSTs near the Korean Peninsula are present, which may contribute to reducing the frequency of precipitation over northern China. The EOF2 is accompanied by dipole height anomalies (Fig. 9b), which lead to convergence and hence upward motion near 30°N (not shown). For the EOF3, tripole SST anomalies can be seen from the East China Sea to the Sea of Okhotsk, although the association is weaker than those for the other two EOFs (Fig. 8c).
A local connection with SST is also observed for the EOF1 for persistence, while the EOF2 and EOF3 are associated with SSTs in more remote areas (middle column of Fig. 8). For example, warm anomalies in the western subtropical Pacific (Fig. 8d) contribute to prolonged precipitation over southern China (Fig. 6a), presumably through westward wind anomalies associated with the highs near 25°N and 50°N (Fig. 9d). The SSTs for the EOF2 (Fig. 6b) are characterized by …, while the EOF3 (Fig. 6c) is associated with La Niña-like SSTs in the equatorial Pacific and cooling in the Indian Ocean (Fig. 8f).
Because the PC1 for frequency (black line in Fig. 5d) and the three PCs for persistence (Fig. 6d) are particularly high in JJA 2020, we examine whether the 2020 SST anomalies resemble the regressed SST patterns (Fig. 8). In JJA 2020, warming is observed in the Indian and western Pacific Oceans, along with weak cooling in the East China Sea (Fig. 10). When the linear trend in SST is removed, the cooling becomes more pronounced and extends to the Sea of Okhotsk (not shown). The similarity between the observed and regressed SSTs is measured using pattern correlations (Table 1), computed for the entire globe and for the domain excluding the Southern Hemisphere extratropics (20°S-90°N). The high pattern correlations indicate that the warm SST anomaly over the North Pacific and the cool SST anomaly in the equatorial eastern Pacific played a key role in the increased frequency of JJA precipitation in 2020. For example, the pattern correlation for the PC1 for frequency is 0.39 both for the entire globe and for the 20°S-90°N domain. The moderate correlation with the SST regressed onto the PC2 for persistence (Fig. 8e) is also of note, suggesting a role for the Indian Ocean and western Pacific. This analysis is consistent with recent studies concluding that an extremely strong Northwest Pacific anticyclone anomaly, accompanied by both La Niña-like SST forcing in the tropics and intensified Indian Ocean warming, was responsible for the extreme rainfall in summer 2020 (Fang et al. 2021; Pan et al. 2021; Tang et al. 2021). However, further analysis is required to quantify the relative importance of the observed SST anomalies.
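The pattern correlations in Table 1 compare two maps on the same grid. A minimal sketch is shown below, assuming a centered correlation weighted by the cosine of latitude and computed only over points where both maps are defined; the text does not specify the weighting or centering, and `pattern_correlation` is a hypothetical name. Setting `lat_min=-20` restricts the calculation to the 20°S-90°N domain.

```python
import numpy as np


def pattern_correlation(a, b, lat, lat_min=-90.0, lat_max=90.0):
    """Cosine-latitude-weighted, centered pattern correlation of two (lat, lon) maps."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    lat = np.asarray(lat, dtype=float)
    in_band = ((lat >= lat_min) & (lat <= lat_max))[:, None]
    mask = in_band & np.isfinite(a) & np.isfinite(b)    # e.g. drop land (NaN) points
    w = np.broadcast_to(np.cos(np.deg2rad(lat))[:, None], a.shape)[mask]
    x, y = a[mask], b[mask]
    x = x - np.average(x, weights=w)                    # remove weighted spatial means
    y = y - np.average(y, weights=w)
    return np.sum(w * x * y) / np.sqrt(np.sum(w * x**2) * np.sum(w * y**2))


# Synthetic check: two maps that agree north of 20S but have opposite sign south of it.
rng = np.random.default_rng(3)
lat = np.linspace(-80.0, 80.0, 9)
a = rng.standard_normal((9, 12))
b = a.copy()
b[lat < -20.0] *= -1.0
print(pattern_correlation(a, b, lat, lat_min=-20.0))
```

In the study's setting, `a` would be an SST map regressed onto a PC and `b` the 2020 JJA SST anomaly on the same grid; the synthetic maps above only illustrate the effect of the domain restriction.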

Summary and Discussion
Using daily precipitation observations for the 1979-2020 period and Markov chain analysis, we examined the characteristics of summertime precipitation over East Asia, including the anomalous rainfall during JJA 2020. Three Markov climate descriptors for daily precipitation of 10 mm day−1 or more (frequency, persistence, and entropy) explained the changes and variability in precipitation over East Asia. In JJA 2020, China, Korea, and Japan experienced anomalously heavy rainfall, with record-high precipitation rates of over 13 mm day−1. The anomalous rainfall over central China and Japan can be attributed to substantial increases in rainfall frequency. In contrast, in Korea and some central regions of China near the coast, frequency was less important; rather, higher persistence accounted for the anomalous precipitation in JJA 2020.
To further investigate the large-scale variability of the Markov descriptors, we conducted EOF analysis separately for frequency, persistence, and entropy. The first three modes over East Asia together explained 15%-30% of the variance, depending on the descriptor. In JJA 2020, large values of the PC1 time series were observed for frequency and entropy, while the other two PCs were close to 0. This illustrates an advantage of EOF analysis, which reduces the effective number of degrees of freedom. For persistence, all three PCs were large, indicating that all three primary modes play a role in the enhanced rainfall over eastern China and Korea.
To further understand the relationship between the EOFs and global SSTs, linear regression analysis was performed using the PCs for the Markov climate descriptors. We found that the warm SSTs over the North Pacific and cold SSTs over the equatorial eastern Pacific were highly correlated with the higher rainfall frequency and entropy during the summer of 2020. In addition, warming in the western Pacific may have contributed to the higher persistence over the same period.
Although Markov chain analysis allows multiple components of summer rainfall to be investigated, caution should be taken in reducing complex precipitation events to two simple states (i.e., daily precipitation above or below 10 mm day−1). Nevertheless, we found that our results remained robust when precipitation events were defined using thresholds of 5 or 7 mm day−1. Even the anomalous Markov descriptors for JJA 2020 do not change substantially between a threshold of 10 mm day−1 and one at the 80th percentile, despite some notable differences over very arid regions, such as Mongolia and northwest China (not shown). Another advantage of Markov chain analysis is that it yields objective climate descriptors that quantify frequency, persistence, and entropy. Finally, our results show that some PC time series exhibit upward or downward trends in frequency and persistence over recent decades. It will be interesting to examine the causes of these trends and whether they will continue in the near future.

Table 1 Pattern correlations between the regressed SST patterns (Fig. 8) and the 2020 SST anomaly (Fig. 10)