1 Introduction

Useful climate prediction on sub-seasonal to seasonal (S2S) time scales (usually 10–90 days) can be a great asset to government and business policymakers, but it remains a worldwide challenge for meteorologists (Zhang et al. 2013; White et al. 2017; Zhou et al. 2019). On S2S time scales, the forecast period is too long for the atmosphere to retain memory of its initial state (Lorenz 1975) and too short for the atmosphere to be sufficiently influenced by slowly evolving components of the Earth's climate system (for example, sea surface temperature; von Neumann 1955). Studying factors that have longer climate memory than the atmosphere but evolve faster than the ocean is therefore key to addressing the S2S forecast problem. For example, the Madden–Julian Oscillation, an atmospheric variability on 30–60 days, is regarded as one of those factors (Madden and Julian 1994; Zhang 2005). On land, surface soil moisture (SM) is a variable whose climate memory lies between synoptic and seasonal time scales (Dirmeyer et al. 2009; Guo et al. 2011). SM is an essential factor that can directly or indirectly affect atmospheric variables, including surface air temperature, boundary-layer stability, and precipitation (Zhang and Dong 2010; Huang and Margulis 2013; Liu et al. 2017a). Hence, exploring the sub-seasonal variation of SM (SSV-SM) is imperative for understanding S2S variations of the atmosphere.

SM is an important factor in S2S climate forecasting because it can significantly affect the atmosphere on sub-seasonal time scales. Many studies have pointed out that SM can significantly influence numerical S2S forecasts through improved land-surface processes and initial conditions (Seo et al. 2019). S2S forecasts of near-surface variables (e.g., precipitation and surface air temperature) can be significantly improved when accurate SM is included in the model initial conditions (Koster et al. 2004, 2011; Boisserie and Cocke 2012; Hirsch et al. 2014). This improvement probably also relates to sub-seasonal features of SM, for example, SM memory and SSV-SM. SSV-SM is found to have significant impacts on the atmosphere. In Europe, SSV-SM can reduce the occurrence of extreme hot events (Jaeger and Seneviratne 2011). Using a simple land–atmosphere coupled model, Bellon (2011) pointed out that SM can dissipate the variability of the monsoon system on sub-seasonal time scales. In West Africa, a significant interaction is found between SM and the West African monsoon on sub-seasonal time scales, and the monsoon circulation can be adjusted by SM through surface energy fluxes (Taylor 2008). In addition, the 15-day westward-propagating mode of the West African monsoon is enhanced and organized by SSV-SM (Lavender et al. 2010). For the Indian monsoon, SSV-SM significantly affects the formation of the sub-seasonal oscillation in the monsoon circulation (Saha et al. 2012). Overall, SSV-SM is very important for the sub-seasonal variation of the atmosphere, especially over monsoon regions.

SM varies on different time scales, from synoptic to decadal. During a rainfall event, the surface soil retains the rainwater that falls on land, but most of it quickly infiltrates into deeper soil within the first several days (McColl et al. 2017a). SM also exhibits a strong seasonal cycle; for example, Duerinck et al. (2016) found that SM in Illinois is high (low) during January–April (May–August). On the inter-annual time scale, variations of precipitation can drive significant inter-annual variations of SM (Liu et al. 2017b). On the decadal time scale, SM over East Asia shows drying in the early 1960s, wetting during 1979–1993, and a resumption of drying in 1994 (Cheng et al. 2015). So far, less attention has been paid to the features of SSV-SM, which remain unclear.

The temporal variation and spatial distribution of SM are mainly related to precipitation, because precipitation is the dominant source of SM (Kato et al. 2006; Qian et al. 2006). However, this does not mean that SM variations are the same as precipitation variations. SM is also governed by loss terms according to the land water balance equation (Katul et al. 2007; McColl et al. 2017b), and the speed of soil water loss determines SM memory. Because of SM memory, variations of SM may differ from those of precipitation on sub-seasonal time scales. Therefore, it is of interest to investigate what role precipitation plays in the formation of SSV-SM. Furthermore, the monsoon is the prevailing climate over eastern China (Wang and Ho 2002; Ding and Chan 2005), and thus SSV-SM could be an important factor affecting the atmosphere over eastern China. Determining the characteristics of SSV-SM can potentially contribute to understanding the sub-seasonal variability of the atmosphere over eastern China.

The time–frequency domain is an important aspect of SM and the focus of many studies. For example, Katul et al. (2007) explored the spectral features of SM over a planted loblolly pine region in the southeastern US to establish a model that simulates the time signal of SM. Spectral signals have further been used to compare station and satellite data (Su et al. 2015; Moler et al. 2018; Neuhauser et al. 2019), to explore the effects of land surface processes on SM variation (Nakai et al. 2014), and to reveal the dynamical formation of SM memory (Ghannam et al. 2016). Spectral analysis is also a useful tool for checking whether there is periodic variation in SM. Recently, Liu et al. (2017a) discovered periodic variations of SM from 0.125 to 12 months over the Great Plains of the US, and attributed them to the coherence between precipitation and SM. However, a comparison of the results from all those studies shows that the spectral features of SM vary across regions and time scales, and less attention has been paid to the sub-seasonal features of SM over eastern China. Therefore, in the present study we explore the spectral and spatial features of SSV-SM and the roles played by precipitation in SSV-SM over eastern China.

The structure of this paper is as follows. Section 2 introduces the data and methods. Temporal and spatial features of SSV-SM are shown in Sect. 3. Section 4 provides discussions on effects of precipitation and SM memory on SSV-SM. Summary and further discussions are presented in Sect. 5.

2 Data and methods

2.1 Data

The station-observed volumetric water content of soil at 0–10 cm (in g cm−3) was collected by the Jiangsu and Zhejiang Meteorological Bureaus in eastern China using the Frequency Domain Reflection method. The SM data are from the Liuhe (LH) and Chunan (CA) stations in Jiangsu and Zhejiang provinces, respectively. The locations of LH (118.85° E, 32.37° N) and CA (119.03° E, 29.61° N) are shown by the green triangles in Fig. 1a, b, respectively. The hourly station SM available to us covers August 15, 2013 to August 27, 2017 at the LH station and August 15, 2013 to December 31, 2018 at the CA station. During those periods, the hourly SM at the two stations is generally observed consistently with no outliers, except for missing values in some hours; no gap in the data is longer than 24 h. These missing values are filled by quadratic-spline interpolation, and the SM unit is converted to m3 m−3 using a water density of 1000 kg m−3.

Fig. 1

Correlation coefficients between the SM in SMAP and at the LH station (a), as well as those between the SM in SMAP and at the CA station (b). The location of the station is marked by the green triangle on each panel. The shaded areas are significant at the 5% level based on the Student’s t test. The correlation is conducted during March 31, 2015 to August 27, 2017 (to December 31, 2018) for LH (CA) station. The two blue curves show the Huang (top curve) and Yangtze (bottom curve) rivers, respectively

Besides station data, SM remotely sensed by the Soil Moisture Active Passive satellite mission (SMAP; https://smap.jpl.nasa.gov) is also used. Although SMAP carries two instruments, the radar stopped functioning a few months after launch, and only the radiometer measures SM. In spite of this, SMAP products still provide SM (in m3 m−3) with both high accuracy and high resolution. The SM data in this study are the level-4 product, which is surface soil moisture (0–5 cm) at a 3-h time interval on a 9 × 9 km horizontal grid. The level-4 product results from assimilating SMAP L-band brightness temperature into a land surface model (Reichle et al. 2018). This product extends from March 31, 2015 to the present, but the data used in this study are from March 31, 2015 to December 31, 2018. The common period between the SMAP SM and the station SM at LH (CA) is from March 31, 2015 to August 27, 2017 (December 31, 2018).

In this study, the SM reanalyses (in m3 m−3) are the US National Centers for Environmental Prediction/Department of Energy Reanalysis II (NCEP/DOE RII; https://esrl.noaa.gov/psd/data), the NCEP Climate Forecast System Reanalysis (CFSR; https://rda.ucar.edu/pub/cfsr.html), and the ERA-Interim reanalysis (ERA; http://apps.ecmwf.int/datasets). Hereafter, NCEP/DOE RII is denoted as NCEP for short. NCEP and CFSR SM are on 192 (longitude) × 94 (latitude) Gaussian grids, and the horizontal resolution is about 1.875° × 1.9°. The horizontal resolution of the ERA SM is 1.5° × 1.5°. NCEP and CFSR (ERA) SM are at a depth of 0–10 cm (0–7 cm). All three reanalyses have a 6-h time interval during 2013–2018, which covers the time periods of both the station and SMAP data.

In addition to the reanalyses, SM from the Global Land Data Assimilation System (GLDAS) is also employed (https://ldas.gsfc.nasa.gov/gldas/). GLDAS ingests various satellite and station observations (Rodell et al. 2004). The GLDAS product is generated by four land surface models: the Common Land Model version 2.0 (CLM; Dai et al. 2003), the Mosaic model (Koster and Suarez 1996), the Noah model versions 2.7.1 and 3.3 (Chen et al. 1996; Koren et al. 1999), and the Variable Infiltration Capacity model (VIC; Liang et al. 1994). There are three versions of the GLDAS product: GLDAS 1, 2.0, and 2.1. GLDAS 1 contains outputs from four models, namely CLM, Mosaic, VIC, and Noah 2.7.1 (Noah2). GLDAS 2.0 and 2.1 use only Noah 3.3 (Noah3) but with different forcing data (Rui and Beaudoing 2018). All the GLDAS data in this study cover 2013–2018; GLDAS 2.0 is not used because it only covers 1948–2010. Therefore, there are five sets of GLDAS data in this study: CLM, Mosaic, VIC, Noah2, and Noah3. The GLDAS data are in kg m−2 and are converted to m3 m−3 using a water density of 1000 kg m−3 and the depth of each model's surface layer. The surface layers in those models are 0–10 cm deep, except in CLM (0–9.1 cm) and Mosaic (0–2 cm). The 3-h GLDAS data have a 1° × 1° horizontal resolution.
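The unit conversion described above amounts to dividing the water mass per unit area in the surface layer by the product of water density and layer thickness. The following Python sketch illustrates this under that reading; the function name and the dictionary of layer depths are hypothetical helpers introduced here, with depths taken from the values quoted in the text.

```python
WATER_DENSITY = 1000.0  # kg m-3, the water density stated in the text

# Surface-layer thickness (m) for each GLDAS model, as quoted in the text.
LAYER_DEPTH_M = {"CLM": 0.091, "Mosaic": 0.02, "Noah2": 0.10,
                 "Noah3": 0.10, "VIC": 0.10}


def mass_to_volumetric(sm_kg_m2, layer_depth_m, rho_w=WATER_DENSITY):
    """Convert layer soil water (kg m-2) to volumetric content (m3 m-3)."""
    if layer_depth_m <= 0:
        raise ValueError("layer depth must be positive")
    return sm_kg_m2 / (rho_w * layer_depth_m)


# Example: 30 kg m-2 of water held in a 0-10 cm layer.
theta = mass_to_volumetric(30.0, LAYER_DEPTH_M["Noah3"])
```

Because the layer depths differ between models, the same water mass corresponds to different volumetric values, which is one reason the datasets are compared through anomalies and correlations rather than absolute values.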

Finally, the station (hourly), SMAP (3 h), reanalysis (6 h), and GLDAS (3 h) SM are averaged into daily means in units of m3 m−3 for analysis. In addition, the daily precipitation at the LH and CA stations is obtained from the China Meteorological Administration. The level-3 product of the Tropical Rainfall Measuring Mission (TRMM) 3B42RT precipitation is downloaded from ftp://trmmopen.gsfc.nasa.gov/pub; it is on a 0.25° × 0.25° horizontal grid at a 3-h time interval. The 3-h TRMM precipitation is also averaged into daily means for analysis. The precipitation data are used to explore the formation of the SM spectra.
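The daily averaging step can be illustrated with a short Python sketch. It assumes that samples carry a timestamp and that missing samples are marked with `None` and simply skipped; the function name `daily_mean` is hypothetical, and the station data in this study were gap-filled before averaging, so the skipping behaviour is an assumption of the sketch only.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_mean(times, values):
    """Average sub-daily samples into daily means, skipping None values."""
    buckets = defaultdict(list)
    for t, v in zip(times, values):
        if v is not None:
            buckets[t.date()].append(v)
    return {day: sum(vs) / len(vs) for day, vs in sorted(buckets.items())}


# Two days of 3-hourly values (m3 m-3), with one missing sample on day two.
start = datetime(2015, 3, 31)
times = [start + timedelta(hours=3 * i) for i in range(16)]
values = [0.2] * 8 + [0.3] * 7 + [None]
daily = daily_mean(times, values)
```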

2.2 Spectral and wavelet analysis

Spectral analysis is a method for investigating a time series in the frequency domain. In other words, the oscillations of a time series over various periods can be identified through its spectrum. The method is based on the Fourier transform, which can be computed with the fast Fourier transform algorithm, and Markov red noise is used to test the significance of the power spectra (Wilks 2006). Given two time series, cross-spectral analysis estimates the relationship between them in the frequency domain. The cross spectrum is obtained as the product of the Fourier series of one dataset and the complex conjugate of the Fourier series of the other dataset (Panofsky and Brier 1958; Thompson 1979), where the Fourier series is obtained through the Fourier transform. The real part of the cross spectrum is the cospectrum and the imaginary part is the quadrature spectrum; the phase and the coherence are then obtained from these two parts. Besides spectral analysis, wavelet analysis is another useful tool for analyzing time series in the frequency domain, and it further provides a power spectrum that varies in both frequency and time. The wavelet transform depends on the wavelet basis, and the Morlet wavelet is a common basis in the atmospheric sciences (Torrence and Compo 1998). The details of the spectral and wavelet analyses are well documented in the abovementioned books and references, and the corresponding functions are available in many statistical software packages.
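As an illustration of the power spectrum and its red-noise test, the following Python sketch computes a raw periodogram with a direct discrete Fourier transform and compares it with a first-order autoregressive (Markov) background. It is a minimal sketch under stated assumptions, not the processing used in this study: the function names are hypothetical, the spectrum is unsmoothed (so each estimate has two degrees of freedom), and the background is scaled to the mean observed power.

```python
import cmath
import math


def periodogram(x):
    """Raw periodogram of a series (mean removed) via a direct DFT.

    The direct sum is O(n^2); an FFT would normally be used for long series.
    """
    n = len(x)
    mean = sum(x) / n
    xa = [v - mean for v in x]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coef = sum(xa[t] * cmath.exp(-2j * math.pi * k * t / n)
                   for t in range(n))
        freqs.append(k / n)              # cycles per time step
        power.append(abs(coef) ** 2 / n)
    return freqs, power


def lag1_autocorrelation(x):
    """Lag-1 autocorrelation, used as the AR(1) coefficient."""
    mean = sum(x) / len(x)
    xa = [v - mean for v in x]
    return sum(a * b for a, b in zip(xa[:-1], xa[1:])) / sum(a * a for a in xa)


def red_noise_threshold(x, freqs, power, alpha=0.05):
    """Significance threshold at level alpha from an AR(1) background."""
    r1 = lag1_autocorrelation(x)
    shape = [(1 - r1 ** 2) / (1 + r1 ** 2 - 2 * r1 * math.cos(2 * math.pi * f))
             for f in freqs]
    # Scale the AR(1) shape so that its mean equals the mean observed power.
    scale = (sum(power) / len(power)) / (sum(shape) / len(shape))
    # Raw estimates follow a chi-square distribution with 2 dof (except at
    # the Nyquist frequency); its upper-alpha quantile over dof is -ln(alpha).
    factor = -math.log(alpha)
    return [s * scale * factor for s in shape]


# Synthetic series with a 20-step oscillation and a weaker 8-step one.
series = [math.sin(2 * math.pi * t / 20) + 0.2 * math.cos(2 * math.pi * t / 8)
          for t in range(200)]
freqs, power = periodogram(series)
threshold = red_noise_threshold(series, freqs, power)
peak = max(range(len(power)), key=power.__getitem__)
peak_period = 1.0 / freqs[peak]
```

In practice the spectrum is usually smoothed or averaged over segments, which raises the degrees of freedom and lowers the threshold factor accordingly.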

3 Temporal and spatial features of SSV-SM

3.1 Representative SM over eastern China

In order to check whether the SM variations at LH and CA can represent the SM variations over eastern China, Fig. 1 shows the correlation coefficients between the station SM at LH (CA) and the SMAP SM at each grid point over eastern China during their common period. In Fig. 1, the shaded areas are significant at the 5% level. In Fig. 1a for the LH station, the significantly positive correlation coefficients are generally found between 30° N and 35° N and greater than 0.5 to the east of 110° E. The significantly negative correlation coefficients are found over southwestern China, and generally from − 0.4 to − 0.1. In Fig. 1b for the CA station, the significantly positive coefficients are generally between 23° N and 30° N and greater than 0.5 to the east of 110° E. The significantly negative relationship is to the west of 105° E. The coefficients are from − 0.6 to − 0.1. Hence, the variations of SM at the LH and CA stations together can generally represent the variations of SMAP SM over eastern China.

Figure 2 further shows the time series of daily SM at the LH (CA) station. In addition, Fig. 2 provides the time series of SMAP SM near the LH (CA) station, which is calculated by averaging the data at the four grid points nearest to the station. It should be noted that the SMAP grid size is 9 km × 9 km, whereas the ground station data represent a single geographical point. Because of the differences in measurement and spatial scale, there is a bias between the station and SMAP SM. To minimize the impact of this bias on the analysis, the time mean of each dataset is removed. The time mean of each time series is calculated over the common period between the SMAP and station SM. The time variations of the station and SMAP SM anomalies are very similar (Fig. 2a, b); for example, the amplitudes of the SMAP SM anomalies are quite close to those of the station SM anomalies. Figure 2c, d shows scatter plots of the station SM against the SMAP SM, where the x-axis (y-axis) represents the station (SMAP) SM. The SM in Fig. 2c, d is standardized by removing the mean and then dividing by the standard deviation. The means and standard deviations are calculated over the common period between the SMAP and station SM. The red line is the regression line and indicates the correlation between the two datasets. The correlation coefficient between the station and SMAP SM is 0.71 (0.67) at LH (CA) and is significant at the 5% level. In both Fig. 2c and d, the fit between the SMAP and station SM is better when the station SM anomalies are positive than when they are negative. When the station SM anomalies are negative, the SMAP SM anomalies are generally of lower magnitude than those of the station SM. In other words, the SMAP product has smaller SM anomalies than the station observations under dry conditions over eastern China. This may relate to differences in measurement techniques between SMAP and the stations.
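The standardization and correlation used for the scatter plots can be sketched in Python as follows. The function names and the sample values are hypothetical and serve only as illustration. The t statistic assumes independent pairs, which daily SM does not strictly satisfy because of its memory, so an effective sample size may be more appropriate in practice.

```python
import math


def standardize(x):
    """Remove the mean and divide by the sample standard deviation."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return [(v - mean) / sd for v in x]


def correlation(x, y):
    """Pearson correlation; also the regression slope of standardized data."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def t_statistic(r, n):
    """Student's t for a correlation r from n independent pairs (n - 2 dof)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Illustrative values only (m3 m-3), not observations from this study.
station = [0.20, 0.25, 0.30, 0.35, 0.28]
satellite = [0.22, 0.24, 0.27, 0.30, 0.26]
r = correlation(station, satellite)
```

For standardized series the slope of the least-squares regression line equals the correlation coefficient, which is why the regression line in a standardized scatter plot conveys the correlation directly.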

Fig. 2

Panels a and b are the daily time series of SM that depart from their mean values at the LH and CA stations, respectively. The black line is the station observations collected by the Frequency Domain Reflection. The blue line in each panel of a and b shows the time series of the SM of the SMAP level-4 product, which are obtained by averaging the SMAP SM on the four grid points with the nearest distances to the location of the station. The SM at the LH (CA) station is during August 15, 2013 to August 27, 2017 (to December 31, 2018). The SM in SMAP is during March 31, 2015 to December 31, 2018. The mean of each time series is removed, and thus the anomalies (in m3 m−3) are obtained. The anomalies are divided by their standard deviations, and shown in Panels (c, d). Panels c, d are scatter plots for the station SM (x-axis) against the SMAP SM (y-axis) at the LH and CA stations, respectively. The red line in each panel of c and d is the regression line. The means and standard deviations are calculated during the common period between SMAP and station SM

To further cross-verify the representativeness of the station data, the correlation maps between the station and reanalysis (GLDAS) SM during their common periods are shown in Fig. 3 (Fig. 4). In Fig. 3, the panels on the top (bottom) row present the correlation coefficients between the SM at LH (CA) and that of the reanalysis. The columns are for ERA (Fig. 3a, b), CFSR (Fig. 3c, d), and NCEP (Fig. 3e, f). In Fig. 3, the correlation patterns for both the LH and CA stations are similar to those shown in Fig. 1, but the correlation values are much smaller. For ERA (Fig. 3a, b), the significantly positive correlation coefficients are from about 0.1 to 0.5, and the large values (≥ 0.5) only occur around the stations. For CFSR (Fig. 3c, d), the significantly positive coefficients are from 0.1 to 0.6, and the large values (≥ 0.5) are also at the grid points around the stations. For NCEP, the coefficients are from about 0.1 to 0.3 and smaller than those for ERA and CFSR. The correlation between the reanalysis and station SM is smaller than that between the SMAP and station SM.

Fig. 3

Same as Fig. 1, but for the reanalysis of ERA (a, b), CFSR (c, d), and NCEP (e, f). The top (bottom) panels are the correlations for the LH (CA) station. The correlations are conducted during August 15, 2013 to August 27, 2017 (to December 31, 2018) for the LH (CA) station

Fig. 4

Same as Figs. 1 and 3, but for the GLDAS SM, including the SM from the models of CLM (a, b), Mosaic (c, d), Noah2 (e, f), Noah3 (g, h), and VIC (i, j). The panels a, c, e, g, and i (b, d, f, h, and j) are the correlations for the LH (CA) station. The correlations are conducted during August 15, 2013 to August 27, 2017 (to December 31, 2018) for the LH (CA) station

The relationship between the GLDAS and station SM is shown in Fig. 4. The correlation patterns are also similar to those for SMAP (Fig. 1) and reanalysis (Fig. 3). The largest values of positive correlation are about 0.8 for CLM (Fig. 4a, b), 0.6 for Mosaic (Fig. 4c, d), 0.6 for Noah2 (Fig. 4e, f), 0.5 for Noah3 (Fig. 4g, h), and 0.8 for VIC (Fig. 4i, j). The areas with the large correlation coefficients (≥ 0.5) in CLM (Fig. 4a, b) and VIC (Fig. 4i, j) are greater than those in the rest of the models but are smaller than those in SMAP. In Mosaic, Noah2, and Noah3, the large values (≥ 0.5) are generally around the stations. The correlation coefficients are larger for Mosaic (Fig. 4c, d) and Noah2 (Fig. 4e, f) than those for Noah3 (Fig. 4g, h). It is noted that the correlation coefficients between the reanalysis (GLDAS) and station SM during the same time periods as the SMAP SM are also computed, and the same results are obtained (figures not shown).

In general, there are significantly positive correlations between the SM at the LH/CA station and that from various sources over eastern China. The relationship between the SMAP and station SM is much closer than those between the reanalysis/GLDAS and station SM. Apart from SMAP, the GLDAS SM correlates better with the station SM than the reanalysis SM does, especially for CLM and VIC. When the SMAP SM is bin-averaged to the resolutions of the GLDAS and reanalysis SM, it still has the largest correlation with the station SM. This indicates that the spatial scales of those datasets have little influence on the results. The results in the present study are also not sensitive to the choice of time period within 2013–2018, and thus we use as much of the 2013–2018 data as possible. In summary, the variations of SM at the LH and CA stations can generally represent the SM variations over eastern China.

3.2 Spectral analysis of SM over eastern China

The spectral analysis of SM at the LH (CA) station is conducted from August 15, 2013 to August 27, 2017 (to December 31, 2018) and shown in Fig. 5a, b. To remove synoptic variations, the daily mean SM is averaged into pentad (5-day) means. In addition, the power spectra of SM are normalized by multiplying them by frequency. Markov red noise is applied to assess the significance of the spectra. The spectra above the red line in Fig. 5 are significant at the 5% level and filled with red color. Only the spectra for periods of 10–90 days are presented. In Fig. 5a, SM has a significant spectral peak between 22 and 28 days at the LH station. At the CA station (Fig. 5b), a significant spectral peak is found around 18–20 days. Therefore, the SM variations at the LH and CA stations over eastern China have significant spectra on sub-seasonal time scales.
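The pentad averaging and frequency weighting described above can be sketched in Python as follows. The function names are hypothetical, and dropping a trailing incomplete pentad is an assumption of this sketch rather than a documented choice of the study.

```python
def pentad_mean(daily):
    """Non-overlapping 5-day means; a trailing incomplete pentad is dropped."""
    n_pentads = len(daily) // 5
    return [sum(daily[5 * i:5 * i + 5]) / 5 for i in range(n_pentads)]


def frequency_weighted(freqs, power):
    """Multiply each spectral estimate by its frequency, giving f * P(f)."""
    return [f * p for f, p in zip(freqs, power)]


def period_in_days(freq_per_pentad):
    """Convert a frequency in cycles per pentad to a period in days."""
    return 5.0 / freq_per_pentad


pentads = pentad_mean(list(range(12)))   # days 10 and 11 are dropped
shortest_period = period_in_days(0.5)    # Nyquist frequency of pentad data
```

With pentad data the Nyquist frequency is 0.5 cycles per pentad, so the shortest resolvable period is 10 days, which matches the lower bound of the 10–90-day range shown.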

Fig. 5

Spectral analysis of the observed SM at the LH (a) and CA (b) stations. Before the spectral analysis, the SM is averaged into pentad mean to remove synoptic variations. The x-axis is the period of spectra, and the y-axis is the power spectra. The power spectra are normalized by multiplying the frequency of the spectra. The red line in each panel indicates the Markov red noise at the 5% significance level, and the significant spectra are filled with red color. The spectral analysis is conducted during August 15, 2013 to August 27, 2017 (to December 31, 2018) for the LH (CA) station

Using the same method as that in Fig. 5, the power spectra of SM over eastern China are calculated at each grid point for SMAP during March 2015–December 2018 and for the reanalysis and GLDAS during 2013–2018. The periods with the maximum peak of significant spectra within 10–90 days are then obtained and shown in Fig. 6. Over eastern China, the periods are generally within 10–40 days in SMAP (Fig. 6a), 10–45 days in ERA (Fig. 6b), 10–50 days in CFSR (Fig. 6c), 10–25 days in NCEP (Fig. 6d), 10–40 days in CLM (Fig. 6e), 10–30 days in Mosaic (Fig. 6f), 10–50 days in Noah2 (Fig. 6g), 10–50 days in Noah3 (Fig. 6h), and 10–50 days in VIC (Fig. 6i). In addition, the distributions of those periods are irregular, and the patterns differ among the datasets. Nevertheless, there are significant periodic variations of SM over eastern China on the 10–30-day period, even though the SM is at different depths and from different data sources.

Fig. 6

Periods at which the maximum significant spectral peaks are found within 10–90 days. The same method as that used in Fig. 5 is applied to obtain the spectra at each grid point for the SM data, including SMAP (a), ERA (b), CFSR (c), NCEP (d), CLM (e), Mosaic (f), Noah2 (g), Noah3 (h), and VIC (i). Only periods with maximum spectral peaks that are significant at the 5% level are shown. The spectral analysis is conducted during 2013–2018, except for SMAP, for which it is conducted during March 31, 2015 to December 31, 2018

Figure 7 further presents the power spectra of the Morlet wavelet analysis for the SM at the LH and CA stations. In Fig. 7a, b, the wavelet analysis is performed using the pentad SM during August 15, 2013 to August 27, 2017 (to December 31, 2018) for the LH (CA) station. The background red noise is used to test the significance of the wavelet spectra, and the shaded areas in Fig. 7 are significant at the 5% level. In Fig. 7, the interval between two minor ticks of the x-axis is a month. In Fig. 7a for the LH station, the significant power spectra over the period of 10–30 days are mainly found during March–July of 2014, March–October of 2015, April and August–October of 2016, and March–August of 2017. The significant spectra over the period of 30–60 days generally occur during August–December of 2015, May–September of 2016, and January–August of 2017. The spectra over the period of 60–90 days are significant from August to November of 2016. In Fig. 7b for the CA station, the significant spectra over the period of 10–30 days mainly occur from August 2013 to November 2015, during March–November of 2016, during May–November of 2017, and during July–August of 2018. The significant spectra over the period of 30–60 days are mainly found during April–November of 2014, February–December of 2015, April–August of 2017, and in July of 2018. There are significant spectra over the period of 60–90 days from September 2014 to August 2015, in February and July of 2016, and from April to November of 2017.

Fig. 7

Morlet wavelet analysis for the pentad SM data at the LH (a) and CA (b) stations. The contours are power spectra, and the shading areas are significant at the 5% level based on the background red noise. The y-axis is the period of the power spectra. The wavelet analysis is conducted during August 15, 2013 to August 27, 2017 (to December 31, 2018) for the LH (CA) station

Overall, the significant spectra over the period of 30–90 days do not occur every year but can be found in any season, which may relate to the inter-annual climate background. Most of the significant power spectra over the period of 10–30 days are found from spring to autumn in every year, and fewer are found in winter. This is probably associated with the rainy season over eastern China. This result agrees with the findings of Liu et al. (2017a), who investigated the wavelet spectra of SM over the southern Great Plains of the US. They found that the significant wavelet spectra of SM on 0.125–2 months over the Great Plains occur mainly during the local rainy season (spring to early summer).

3.3 Spatial distribution of SSV-SM

The previous sub-sections mainly explore the time variation of SM. In this sub-section, the spatial distributions of SSV-SM in the various datasets are presented. According to the wavelet analysis (Fig. 7), the significant spectra of SM over the 10–30-day period mainly occur during the rainy season. Hence, the SM during May–September, which largely coincides with the rainy season over eastern China, is analyzed. Empirical orthogonal function (EOF) analysis is performed on SM over eastern China that has been band-pass filtered with cutoffs of 10 and 90 days (Lanczos filter; Duchon 1979). The EOF method can generally present the typical patterns of SM over eastern China. It decomposes the data into principal components (PCs, also called time series) and orthogonal spatial patterns (empirical orthogonal functions, denoted as EOFs). Figure 8 shows the first three EOF patterns of SMAP (Fig. 8a1-3), ERA (Fig. 8b1-3), CFSR (Fig. 8c1-3), NCEP (Fig. 8d1-3), CLM (Fig. 8e1-3), Mosaic (Fig. 8f1-3), Noah2 (Fig. 8g1-3), Noah3 (Fig. 8h1-3), and VIC (Fig. 8i1-3). Additionally, the EOF patterns of the band-pass filtered TRMM precipitation are shown in Fig. 8j1-3. In Fig. 8, the EOFs are computed from the daily, band-pass filtered data during May–September of 2015–2018.
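The decomposition into a spatial pattern, its time series, and the explained variance can be illustrated with a minimal Python sketch. It extracts only the leading EOF by power iteration on the spatial covariance matrix, which is a stand-in chosen here to keep the example dependency-free; an eigenvalue or singular value decomposition from a numerical library would normally be used to obtain several modes. The function name and the synthetic dipole data are hypothetical, and area weighting and band-pass filtering are omitted.

```python
import math


def leading_eof(anom, n_iter=200):
    """Leading EOF of anomalies (rows = time steps, columns = grid points).

    Returns (eof, pc, explained_fraction). Time means must already be removed.
    """
    nt, ns = len(anom), len(anom[0])
    # Spatial covariance matrix C = A^T A / (nt - 1).
    cov = [[sum(row[i] * row[j] for row in anom) / (nt - 1)
            for j in range(ns)] for i in range(ns)]
    # Power iteration; the uneven start vector avoids being exactly
    # orthogonal to simple dipole patterns.
    v = [1.0 / (i + 1) for i in range(ns)]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(ns)) for i in range(ns)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(ns))
                     for i in range(ns))
    total_variance = sum(cov[i][i] for i in range(ns))
    pc = [sum(row[i] * v[i] for i in range(ns)) for row in anom]
    return v, pc, eigenvalue / total_variance


# Synthetic anomalies: a strong seesaw between points 0 and 1, plus a weak
# independent signal at point 2.
anom = [[2 * math.sin(2 * math.pi * t / 10),
         -2 * math.sin(2 * math.pi * t / 10),
         0.5 * math.cos(2 * math.pi * t / 10)] for t in range(20)]
eof1, pc1, fraction1 = leading_eof(anom)
```

In this synthetic case the leading pattern has opposite signs at the first two points, analogous to a seesaw pattern, and the explained fraction is the leading eigenvalue divided by the total variance.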

Fig. 8

First three EOFs of the band-pass filtered SM and precipitation with the cut off of 10–90 days. The SM data include SMAP (a1–3), ERA (b1–3), CFSR (c1–3), NCEP (d1–3), CLM (e1–3), Mosaic (f1–3), Noah2 (g1–3), Noah3 (h1–3), and VIC (i1–3). The precipitation is the TRMM data (j1–3). The variances explained by each EOF are presented on each panel. All the EOFs are conducted for the daily data during May–September of 2015–2018

The first EOF of the SMAP SM explains about 16.5% of the total variance and shows a seesaw pattern (Fig. 8a1), with positive (negative) anomalies between 20° N and 30° N (30° N and 40° N). The second EOF explains about 15% of the total variance, and there are positive anomalies along the Yangtze River Valley around 30° N. The third EOF explains about 9.8% of the total variance, with two positive anomalous centers over southern and northern China and weakly negative anomalies around the Yangtze River Valley. The first three EOFs together explain about 41.3% of the total variance. For the reanalyses (Fig. 8b–d), the patterns of the three EOFs are very similar to those for SMAP, but the first (second) EOF pattern for the reanalyses corresponds to the second (first) one for SMAP. The positive anomalies in the first EOF for the reanalyses are further south than those in the second EOF for SMAP. For the GLDAS SM (Fig. 8e–i), the patterns of the first three EOFs are also very similar to those of the SMAP SM, and the order of the three EOFs is the same as that of the SMAP EOFs. However, in Noah3 (Fig. 8h1-2), the first two EOFs are further south than those in SMAP, similar to the patterns in the reanalyses. For the TRMM precipitation (Fig. 8j1-3), the EOF patterns are similar to those for SMAP, the reanalyses, and GLDAS, but the first three EOFs together explain only about 24.7% of the total variance, which is less than for the SMAP SM. In addition, the negative anomalies around the Yangtze River Valley in EOF3 (Fig. 8j3) are weaker than those for SMAP, the reanalyses, and GLDAS.

Table 1 shows the correlation coefficients between the time series (PCs) of the first three EOFs (Fig. 8) for SMAP and the other data during May–September of 2015–2018. The correlation is calculated between each pair of PCs (for example, the PC1 of SMAP is correlated with the PC1 of TRMM), except that for the reanalyses the correlation is calculated between the PC1 (PC2) of SMAP and the PC2 (PC1) of the reanalysis. All the correlation coefficients are significant at the 5% level according to the Student’s t test. Those correlations can generally represent the similarity among the EOF patterns shown in Fig. 8. The correlation between SMAP and TRMM is around 0.4–0.5. The correlation coefficients are the largest between SMAP and GLDAS (around 0.9), except for Noah3, for which the correlation is about 0.6–0.7 and even smaller than those for the reanalyses. The correlation between SMAP and the reanalyses (about 0.3–0.8) is smaller than that between SMAP and GLDAS. Generally, the spatial distribution of SM over eastern China on sub-seasonal time scales is close to that of the TRMM precipitation. Compared with the reanalyses, the SM patterns in GLDAS are more similar to those in SMAP, except in Noah3. However, the first three EOFs of the TRMM precipitation explain only about 25% of the total variance, which is much less than those of SMAP (about 40%). Moreover, the correlation coefficients between the TRMM precipitation and SMAP SM are only about 0.4–0.5. This means that, besides precipitation, the spatial distributions of SSV-SM over eastern China are also affected by other factors.

Table 1 Correlation coefficients between the PCs of SMAP and the other data, including the TRMM precipitation, ERA, NCEP, CLM, Mosaic, Noah2, Noah3, and VIC SM

4 Discussions on effects of precipitation and SM memory

4.1 Coherence between SM and precipitation

Strong coherence between precipitation and SM over the southern Great Plains of the US was reported by a previous study (Liu et al. 2017b). Thus, to check whether SSV-SM on 10–30 days is mainly caused by the variation of precipitation, the spectra of precipitation at the LH and CA stations are first presented in Fig. 9. The spectral analysis in Fig. 9 uses the same method as that in Fig. 5. In Fig. 9, no significant spectral peak is found at either the LH or the CA station during 10–90 days, except that the spectra at the CA station (Fig. 9b) are significant around the period of 10 days. However, those significant spectra of precipitation (Fig. 9) do not correspond to the significant spectra of SM in Fig. 5.

Fig. 9 Same as Fig. 5 but for the station-observed precipitation. The spectral analysis covers August 15, 2013 to August 27, 2017 (to December 31, 2018) for the LH (CA) station
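
The method behind Fig. 5 is not restated in this section, so the following is only a generic sketch of how spectral peaks in a daily series can be tested against a red-noise background. It assumes a Welch periodogram with non-overlapping segments and an AR(1) reference spectrum; the function name `red_noise_peaks` and all parameter values are hypothetical.

```python
import numpy as np
from scipy import signal, stats


def red_noise_peaks(x, dt=1.0, nperseg=256, alpha=0.05):
    """Welch power spectrum with an AR(1) red-noise significance threshold.

    Returns periods (in units of dt), power, and the threshold that power
    must exceed to be significant at level alpha.
    """
    x = np.asarray(x, dtype=float)
    freqs, pxx = signal.welch(x, fs=1.0 / dt, nperseg=nperseg,
                              noverlap=0, detrend="linear")
    freqs, pxx = freqs[1:], pxx[1:]  # drop the zero frequency
    # Lag-1 autocorrelation defines the AR(1) background shape
    xc = x - x.mean()
    r1 = np.corrcoef(xc[:-1], xc[1:])[0, 1]
    red = (1.0 - r1**2) / (1.0 - 2.0 * r1 * np.cos(2.0 * np.pi * freqs * dt) + r1**2)
    red *= pxx.sum() / red.sum()  # same total power as the data
    # About 2 degrees of freedom per non-overlapping segment
    dof = 2 * (x.size // nperseg)
    threshold = red * stats.chi2.ppf(1.0 - alpha, dof) / dof
    return 1.0 / freqs, pxx, threshold
```

Periods at which `power > threshold` would be flagged as significant under these assumptions; a different windowing or background model would shift the threshold.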

The cross spectra between the SM and precipitation at those two stations are further examined (Fig. 10), together with the cospectrum and phase of the cross spectra. The cospectrum indicates the extent to which the oscillations of the two time series have the same (positive) or opposite (negative) signs at zero lag. The phase represents the lead (positive) or lag (negative) of precipitation relative to SM. For the LH station (Fig. 10a), significant cross spectra are found over the periods of 12, 15, 18, 28–30, and 40–45 days, but none of these periods corresponds to the significant SM periods of 22–28 days (Fig. 5). In the cospectrum (Fig. 10b), the precipitation and SM have the same (opposite) signs on 15, 18, 28–30, and 40–45 days (12 days). In the phase (Fig. 10c), the precipitation leads (lags) the SM on 18, 28–30, and 40–45 days (12–15 days). In other words, positive (negative) precipitation anomalies lead positive (negative) anomalies of SM over the periods of 18, 28–30, and 40–45 days. Positive (negative) precipitation anomalies lag positive (negative) anomalies of SM on 15 days, and lag negative (positive) anomalies of SM on 12 days.

Fig. 10 Cross spectra between the SM and precipitation at the LH (a–c) and CA (d–f) stations. Panels a and d show the cross spectra, in which the significant cross spectra are above the reference line and filled with red color. The reference line/significance test is based on the coherence square (Julian 1975; Thompson 1979). Panels b and e show the cospectrum, and panels c and f show the phase of the cross spectra. The positive cospectrum (phase) is filled with yellow color and the negative one is filled with blue color. The cross spectra are computed over August 15, 2013 to August 27, 2017 (to December 31, 2018) for the LH (CA) station
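
The cospectrum, phase, and squared coherence described above can be sketched as follows. This is an illustrative implementation built on SciPy, not the procedure used for Fig. 10; the function name `cross_spectrum`, the segment length, and the use of non-overlapping segments are assumptions. The phase is defined so that positive values mean precipitation leads SM, matching the convention in the text.

```python
import numpy as np
from scipy import signal


def cross_spectrum(precip, sm, dt=1.0, nperseg=256, alpha=0.05):
    """Cospectrum, phase (rad, positive = precip leads SM), squared coherence,
    and the critical squared coherence at significance level alpha."""
    precip = np.asarray(precip, dtype=float)
    sm = np.asarray(sm, dtype=float)
    kwargs = dict(fs=1.0 / dt, nperseg=nperseg, noverlap=0, detrend="linear")
    # SciPy forms conj(FFT(precip)) * FFT(sm), averaged over segments
    freqs, pxy = signal.csd(precip, sm, **kwargs)
    _, coh2 = signal.coherence(precip, sm, **kwargs)
    cospectrum = pxy.real  # in-phase (zero-lag) covariance by frequency
    # A lag of SM behind precip gives a negative angle in pxy, so flip the sign
    phase = -np.angle(pxy)
    # Critical squared coherence for K independent segments
    k = precip.size // nperseg
    crit = 1.0 - alpha ** (1.0 / (k - 1))
    return freqs, cospectrum, phase, coh2, crit
```

A frequency would be treated as significant here when `coh2` exceeds `crit`; the sign of `cospectrum` and `phase` at that frequency then gives the same-sign/opposite-sign and lead/lag interpretation used in the discussion.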

For the CA station, the significant cross spectra are over the periods of 15, 20, 32–35, and 45–58 days (Fig. 10d). Only the 20-day period falls within the 18–20-day band (Fig. 5) over which the SM spectra are significant. In the cospectrum, the precipitation and SM have the same signs over all of those periods (Fig. 10e). In the phase (Fig. 10f), the precipitation leads (lags) the SM over the periods of 15, 20, and 45–58 days (32–35 days). This means that positive (negative) precipitation anomalies lead positive (negative) anomalies of SM over the periods of 15, 20, and 45–58 days, and lag positive (negative) anomalies of SM on 32–35 days. In general, the cross spectra for the LH and CA stations reflect the complicated interactions between SM and precipitation on different sub-seasonal time scales. Moreover, only the SM variation over the period of 20 days at CA is associated with precipitation, because the significant coherence shows that precipitation leads SM and their anomalies share the same sign over this period. However, the significant spectra of SM over the 10–30-day period generally have little coherence with the spectra of the precipitation, which differs from the findings over the Great Plains of the US (Liu et al. 2017a). This indicates that the spectral features of SM vary among regions.

4.2 A short discussion on SM memory

In the previous sub-section, it is found that the significant spectra of SM over the 10–30-day period are not consistent with the spectra of precipitation. When a simplified SM is created from precipitation with a simplified memory (i.e., e-folding decay), significant spectra can be found on S2S time scales, but there is still little coherence between the simplified SM and the real SM (figures not shown). According to the land water budget equation (Katul et al. 2007; McColl et al. 2017b), the SM variation is determined by both precipitation and loss terms. The loss rate of soil water determines SM memory, which may smooth out synoptic variations and cause significant periods of SM on 10–30 days. This could also contribute to the formation of the spatial distribution of SSV-SM over eastern China. According to Katul et al. (2007), SM memory is determined by many factors, e.g., evapotranspiration, runoff, and drainage, which can vary with time and thus make the effects of SM memory on SSV-SM much more complicated. Owing to the lack of evapotranspiration and drainage data for the surface soil at the two stations, it is difficult to calculate the actual SM memory and to estimate its effects on SSV-SM in the present study, but this could be a focus of future studies.
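
The idea of a simplified SM with e-folding memory can be illustrated with a first-order exponential filter applied to precipitation. The exact construction used in this study is not given, so the recursion below, the function name `simplified_sm`, and the parameter `tau_days` are assumptions for illustration only.

```python
import numpy as np


def simplified_sm(precip, tau_days, dt=1.0):
    """Simplified soil moisture from precipitation with e-folding memory.

    Each step, the stored water decays by exp(-dt / tau_days) and the new
    precipitation is added. Losses are lumped into the single time scale
    tau_days (a hypothetical, constant memory).
    """
    precip = np.asarray(precip, dtype=float)
    decay = np.exp(-dt / tau_days)
    sm = np.empty_like(precip)
    state = 0.0
    for i, p in enumerate(precip):
        state = decay * state + p
        sm[i] = state
    return sm
```

With a constant `tau_days`, a single rain pulse decays to 1/e of its initial value after `tau_days` days; a time-varying loss rate, as discussed above, would require replacing the constant `decay` with a series.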

5 Summary and discussions

In this study, the SM at the LH and CA stations is used to explore SSV-SM over eastern China. Comparison with the SMAP, reanalysis, and GLDAS data shows that the variations of SM at the two stations can generally represent the variations of SM over eastern China. Over eastern China, the anomalies of the SMAP product and the station observations are quite alike under wet conditions, but the SMAP product has smaller SM anomalies than the station observations under dry conditions. The relationship between the station and GLDAS SM is weaker than that between the station and SMAP SM, but stronger than that between the station and reanalysis SM. Among the GLDAS data, the CLM and VIC data have better correlations with the station SM than the rest of the data. Among the reanalyses, the ERA and CFSR SM have better correlations with the station SM than the NCEP SM. Therefore, SMAP SM can capture the time variation of SM well, and is a better substitute than GLDAS or reanalysis for station SM over eastern China. This finding is consistent with that of Zhu et al. (2019a), who evaluated the SMAP SM over the Huai River basin, which is a small part of eastern China.

The spectral analysis shows that the SM variations at the LH and CA stations have significant spectral peaks over the periods of 22–28 and 18–20 days, respectively. Using the SM of SMAP, reanalysis, and GLDAS, it is further found that the SM variations at most of the model grid points over eastern China have significant spectral peaks over the period of 10–30 days. Although the distributions of the periods with significant spectra differ among the data, the SM in most parts of eastern China has significant periodic variations over the period of 10–30 days in all the data. The Morlet wavelet analysis also shows significant spectra over the 10–30-day period during every year, which mainly occur during the rainy season over eastern China, and thus the variations of SM over the period of 10–30 days are associated with the rainfall background. The spectra over the other periods (30–90 days) are also significant but do not occur every year. The variation on 30–90 days may relate to the inter-annual background of climate or to signals from deep soil. A similar result was found by Liu et al. (2017a) over the Great Plains of the US, where the significant periodic variation of SM occurs on periods of 0.125–2 months in the rainy season. However, the periodic variation of SM on sub-seasonal scales is not identified by other studies that also conducted spectral analysis on SM from hours to multiple years (e.g., Katul et al. 2007; Nakai et al. 2014). This means that the occurrence of significant periodic variations of SM differs among regions. The distributions of the spectral peaks are inconsistent among the different data, especially for the GLDAS and reanalysis data, which suggests that improving SM spectra could be an objective of land model development. Thus, SM spectra could be another metric to examine when calibrating a land model. Improving the sub-seasonal aspects of SM in a land model could improve the numerical S2S forecast of SM over eastern China (Zhu et al. 2019b).

The periodic variation of SM over the Great Plains of the US is attributed to the coherence between precipitation and SM (Liu et al. 2017a). However, the S2S variations in precipitation do not seem to fully explain the S2S variations in SM over eastern China. The spectral analysis is conducted on the precipitation at the LH and CA stations. No significant spectrum is found on 10–90 days, except around 10 days at the CA station. Moreover, using the TRMM precipitation, it is found that the significant spectral peaks of the precipitation over eastern China are mainly over the period of 10–15 days (figure not shown), which is much shorter than those of the SMAP SM. In addition, the cross spectra between the SM and precipitation at the LH and CA stations are examined to check the effects of precipitation on the formation of SM spectra. There is little significant coherence between SM and precipitation over the periods with significant SM spectra, except for the period of 20 days at CA. For the significant cross spectra over the other periods, the cospectrum and phase of the cross spectra can each be either positive or negative, and all combinations of the two exist, for example a positive cospectrum with a positive phase, a negative cospectrum with a positive phase, and so on. This indicates that the interaction between precipitation and SM is complicated on different sub-seasonal time scales. Based on these results, the spectra of precipitation do not appear to be the main cause of the SM spectra on sub-seasonal time scales over eastern China.

For the spatial distribution, the first three EOFs of the band-pass filtered SM data of SMAP, reanalysis, and GLDAS present the spatial distribution of SM on 10–90 days over eastern China. The first three EOFs of those data explain about 40% of the total variance. In SMAP and GLDAS, the first EOF shows a seesaw pattern over eastern China, with a positive (negative) anomalous center over southern (northern) China. The second EOF presents an anomalous center along the Yangtze River Valley. In the third EOF, there are two positive centers of anomalous SM over southern and northern China, but weak negative anomalies around the Yangtze River Valley. In the reanalysis, the first (second) EOF corresponds to the second (first) EOF of the SMAP SM, but the patterns are generally the same. The first three EOFs of the TRMM precipitation are also analyzed, and the patterns are similar to those of SM. The first three EOFs of the TRMM precipitation explain about 25% of the total variance. The correlation coefficients between the PCs for those data are further calculated to examine the relationships among each pair of the EOF patterns. The correlation between the PCs of SMAP and GLDAS is around 0.9, except for Noah3. The correlation between the PCs of SMAP and reanalysis is around 0.8, which is greater than that for Noah3. The correlations between the PCs of the SMAP SM and TRMM rainfall are about 0.4–0.5. This indicates that the spatial patterns of SM over eastern China on sub-seasonal time scales are significantly correlated with the precipitation distribution on sub-seasonal time scales. However, precipitation cannot explain all the spatial distributions of SSV-SM. This means that other factors also contribute to the formation of the spatial distribution of SM over eastern China on sub-seasonal time scales, for example SM memory, which may smooth out synoptic variations and cause significant periods of SM on sub-seasonal time scales.
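
For readers unfamiliar with the EOF decomposition summarized above, a minimal sketch based on the singular value decomposition is given below. It assumes the input is an already band-pass filtered anomaly matrix of shape (time, grid point) and omits area weighting; the function name `eof_analysis` is hypothetical and this is not the code used in the study.

```python
import numpy as np


def eof_analysis(anom, n_modes=3):
    """Leading EOFs, PCs, and explained-variance fractions via SVD.

    anom: 2-D array (time, space) of anomalies. Area weighting (for example
    by the square root of the cosine of latitude) is omitted in this sketch.
    """
    anom = np.asarray(anom, dtype=float)
    anom = anom - anom.mean(axis=0)  # remove the time mean at each grid point
    u, s, vt = np.linalg.svd(anom, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)   # fraction of total variance per mode
    eofs = vt[:n_modes]              # spatial patterns, one per row
    pcs = u[:, :n_modes] * s[:n_modes]  # time series of each mode
    return eofs, pcs, var_frac[:n_modes]
```

Summing the returned fractions for the first three modes gives the kind of cumulative explained variance quoted above (about 40% for SM and about 25% for precipitation in this study).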

In the present study, the data generally cover 2013–2018. A recent study found that the inter-annual climate background may affect SSV-SM (Wang et al. 2019). Thus, as SM observations over eastern China accumulate, the sub-seasonal features of SM may prove to differ under different climate backgrounds. In summary, for the temporal aspect, SSV-SM is not directly obtained from the sub-seasonal variation of precipitation; for the spatial distribution, SSV-SM can be an important pattern as a forcing for the atmosphere over eastern China. This study provides information that may be useful for improving S2S forecasting in eastern China.