1 Introduction

Climate change monitoring and simulation rely on reliable and extensive historical meteorological observations (Hanesiak John and Wang 2005; Jones et al. 2008; Li et al. 2010; Si et al. 2012, 2014). However, most long-time climate records do not adequately represent climate change characteristics because of changes in instrumentation, observation methods, station relocation, and other non-climate factors; these changes result in a distortion in climate change monitoring (Aguilar et al. 2003; Jiang et al. 2010). Therefore, many studies have investigated climate data homogenization globally, as early as the beginning of the 1980s (Quayle Robert et al. 1991; Chenoweth 1992; Peterson Thomas 2003; Wijngaard Janet et al. 2003; Della-Marta Paul and Wanner 2006; Haimberger et al. 2012). These studies found that it is more accurate and easier to analyze meteorological elements based on time series where significant breakpoints have been removed (Winkler Julie et al. 1981; Alexandersson 1986). In recent years, homogenized historical temperature and precipitation observations in China have been developed by Chinese meteorologists; these represent a breakthrough in homogenization of surface observations (Li et al. 2009, 2012; Xu et al. 2013), but homogenizing other meteorological elements remains an open question. The lack of data has restricted data assimilation, prediction, and forecasting for climate change; accordingly, it is important to increase the number of homogenization products available for current meteorological data processing.

Wind speed statistics, as a basic descriptor of regional or local climate change, have provided significant information in global climate change, transportation, urban construction, city planning, utilization of wind energy resources, and other fields (Xu et al. 2006; Yiannis and George 2014; Fang 2014; Soukissian and Karathanasi 2016). Therefore, it is necessary to homogenize climate data before assessing interannual and multidecadal variabilities in wind speed (Pryor et al. 2009; Fu et al. 2011; Chen et al. 2013; William and Martha 2014). Wind speed is a random variable easily influenced by the underlying surface, especially for monthly or daily observations, and time series tend to fluctuate due to local climate effects. In addition, wind speed data do not follow a normal distribution, which limits the application of many homogenization methods. Only a few studies have addressed wind speed homogenization in China and worldwide, and they have focused on the detection and correction of mean annual wind speed time series, which are less affected by seasonal variability (Liu 2000; Cao et al. 2010; Xin et al. 2012). Recently, the observed daily or monthly near-surface wind speed series over Canada, Beijing, the Netherlands, Sweden, Spain, and Portugal have been homogenized by Wan et al. (2010), Li et al. (2011), Cusack (2013), Minola et al. (2016), and Azorin-Molina et al. (2014, 2016), respectively. The corresponding climatology, wind speed trends, and derivative climatic factors (e.g., windstorms or extremes) were also assessed. Wan et al. (2010) analyzed near-surface wind speeds recorded at 117 stations in Canada covering 1953–2006 using the penalized maximal t test (PMT). Forty-nine pressure triangles were also used to calculate geostrophic wind speed series for use as a reference series in the homogeneity test on the mean monthly surface wind speed series. However, Azorin-Molina et al. (2014) indicated that in areas of complex topography surrounded by ocean or sea surfaces, wind is not solely driven by surface pressure gradients, but may also be governed by the friction force of Earth’s surface; therefore, they used a reference series based on monthly wind speed data simulated using the Mesoscale Model-5. Moreover, in Cusack (2013), a reference wind is defined as the average near-surface wind from four other stations instead of the geostrophic wind. Wind speed data in Sweden were also homogenized by Minola et al. (2016) using geostrophic winds based on air pressure observations from three weather station sites as reference, similar to the work by Wan et al. (2010). Daily wind speed observations from 12 stations in the Greater Beijing Area for 1960–2008 were homogenized using the Multiple Analysis of Series for Homogenization method (MASH) by Li et al. (2011); they found that regional and local climate characteristics of wind speed in the Beijing area were objectively reproduced using the adjusted data. One drawback of MASH is that the data series studied must have the same start time, so series of different lengths are difficult to combine. Therefore, it is necessary and important to select a representative and more widely applicable method for reference series construction and homogenization of monthly or daily near-surface wind speed.

In Tianjin, since the 1970s, Velder wind pressure gauges at 70% of the observation stations have been successively replaced by EL anemometrographs, and around 2004, automatic observation gradually replaced traditional manual observation. The large number of measuring devices, based on different observation principles or physical methods, combined with possible imperfections in automatic instrumentation, could introduce discontinuities into the wind speed series and thus distort the statistics. At the same time, more than 90% of the observation stations have been relocated due to continuous local city planning and reconstruction. Moreover, with changes in the regulation of the ground service system, the tasks of some stations, such as Baodi and Xiqing, have changed; for instance, the number of observations has gone from four to three, or vice versa, so discontinuities in the time series are inevitable.

Therefore, in this study, the monthly mean near-surface wind speed at 13 observation stations in Tianjin for 1951–2014 was homogenized using RHtestsV4. Station metadata and local information were combined with different statistical methods to establish a reference series; this should be a valuable reference for homogenizing climate data at a smaller spatial scale. Furthermore, wind speed climatology and trends based on the adjusted data were assessed by comparison with other studies in China and worldwide. Errors were compared to those from similar data products derived from the National Meteorological Information Center (NMIC) in order to illustrate the quality of the adjusted time series in Tianjin.

2 Data sources and processing

The datasets used in this study were obtained from historical Chinese surface meteorological records and their metadata, compiled under a special fund for basic meteorological data from the National Meteorological Information Center (NMIC) (Ren Zhihua et al. 2012). Observation and transcription errors were corrected during processing through quality inspection and quality control. Errors from false or missing data owing to informatization were also corrected, which supports the accuracy and reliability of the research data used here. The metadata were used to describe the historical evolution of each meteorological station, as a reference for judging whether the detected shifts in the wind speed series are reasonable. In this study, monthly means of 2-min average wind speed at 13 observation stations in Tianjin, from station establishment to 2014, were selected (Fig. 1); missing data were filled using the average value of the entire sequence.
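The gap-filling rule mentioned above (missing values replaced by the mean of the whole sequence) can be illustrated with a minimal sketch. The function name and the use of `None` as the missing-value marker are assumptions made for this illustration, and the sample values are made up.

```python
from statistics import mean

def fill_missing_with_mean(series):
    """Replace missing entries (None, an assumed marker) with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in series]

# Made-up monthly mean wind speeds in m/s with one gap.
monthly = [3.1, None, 2.9, 3.0]
filled = fill_missing_with_mean(monthly)
```

Filling with the long-term mean keeps the series complete but damps variability at the filled positions, which is worth bearing in mind when many values are missing.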

Fig. 1 Geographic distribution of the 13 observation stations in Tianjin; dots and triangles indicate urban and rural stations obtained from Si et al. (2014), respectively

The observation stations in Tianjin are distributed at the same altitude and have no significant terrain difference. Based on metadata, there has been no significant height change in the wind speed observation instruments used at each station; therefore, we analyzed the basic data directly instead of first applying adjustments for non-standard anemometer height, as in Wan et al. (2010) or Cusack (2013). However, some inhomogeneities may have been created in the wind speed series at stations Tianjin (number 54527) and City (number 54517) because the two stations exchanged station numbers and observation duties before 1992. To avoid this possibility, we swapped the series of the two stations for the period from establishment to December 31, 1991, and then assigned them according to current site information. This data processing goes further than that of previous research (Liu 2000; Cao et al. 2010).

3 Methodology

To ensure credible analysis results, the method employed here for judging homogeneities of monthly mean near-surface wind speed in Tianjin combined information from station metadata and statistical test results. The statistical tests used a data homogenization software package called RHtestsV4, which is not limited by the time series length, unlike the MASH method used in Li et al. (2011). It runs in the R environment and provides a graphical user interface (GUI) for homogenizing the time series at each station together with its matching reference series. This software package can detect, and adjust for, multiple breakpoints (shifts) that potentially exist in a data series using the penalized maximal t test (PMT) (Wang et al. 2007) and penalized maximal F test (PMF) (Wang 2008a). Other benefits of this method are as follows: (1) it can be applied to cases with or without a reference series, with the tests embedded in a recursive testing algorithm; (2) the time series lag-1 autocorrelation (if any) is empirically accounted for; and (3) the problem of uneven distribution of false alarm rate and detection power is mitigated using empirical penalty functions (Wang 2008b). The method has performed well for surface air temperature and total solar radiation in Tianjin over the last 60 years (Si and Xie 2015a; Si and Xu 2015b), and has been widely used in other regions for homogenizing temperature, precipitation, wind speed, and relative humidity time series by previous researchers (Alexander Lisa et al. 2006; Wan et al. 2010; Aiguo et al. 2011; Kuglitsch Franz et al. 2012; Wang et al. 2014).

The breakpoints identified in the wind speed series were adjusted using quantile matching (QM) (Wang et al. 2010) in the RHtestsV4 software package. The objective of the QM adjustments is to modify the series so that the empirical distributions of all segments of the de-trended base series match. It can account for a seasonal discontinuity, and importantly, the annual cycle, lag-1 autocorrelation, and linear trend of the base series were estimated in tandem while accounting for all identified shifts.

3.1 Creation of the reference series

Climate time series on a monthly scale commonly exhibit much stronger autocorrelation and are easily affected by local weather disturbances and seasonal variability. Accordingly, to obtain the best results, the monthly candidate series and its reference series, originally in calendar order, were transformed into series of the same calendar month across years; this technique has not been used in previous studies on the homogenization of near-surface wind speed (Wan et al. 2010; Li et al. 2011; Cusack 2013; Minola et al. 2016; Azorin-Molina et al. 2014, 2016). The change signal in a climate time series may be a systematic error, but it may also be a periodic variation in local climate (Li 2016). Tianjin has urbanized rapidly, which has likely affected regional climate and environmental patterns through climate warming, air pollution, and other anthropogenic changes. Therefore, to preserve the objectivity of the climate data, the reference series in this study was created following the same considerations as previous homogenized data products for Tianjin (Si and Xie 2015a; Si and Xu 2015b). Specifically, the method uses observations from nearby stations in the Tianjin network (Fig. 1), instead of using the geostrophic wind speed series reflecting atmospheric circulation (Wan et al. 2010; Minola et al. 2016) or model simulation data not affected by human activities on the ground (Azorin-Molina et al. 2014).
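One reading of the rearrangement described above is that a monthly series in calendar order is split into 12 series, one per calendar month, each running across years. The sketch below illustrates that reading only; the function name and the assumption that the input starts in January are hypothetical and not taken from the RHtestsV4 package.

```python
def split_by_calendar_month(values, start_year):
    """Split a monthly series (calendar order, assumed to start in January of
    start_year) into one series per calendar month.

    Returns a dict {month: [(year, value), ...]} for months 1..12.
    """
    by_month = {m: [] for m in range(1, 13)}
    for k, v in enumerate(values):
        year = start_year + k // 12
        month = k % 12 + 1
        by_month[month].append((year, v))
    return by_month

# Two made-up years of monthly values: 0.0, 1.0, ..., 23.0.
values = [float(i) for i in range(24)]
january = split_by_calendar_month(values, 1951)[1]
```

Each per-month series can then be tested separately, so the annual cycle no longer appears inside the series being tested.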

To obtain reference series that are as homogeneous as possible, the reference series for wind speed at the 13 stations in Tianjin covering 1951–2014 were built using two methods: one uses the Standard Normal Homogeneity Test (SNHT) (Alexandersson 1986), and the other uses the Peterson-Easterling (P-E) technique (Peterson Thomas and Easterling David 1994). The detected series were formed from these reference series by means of a ratio method, namely, the ratio of the base series to the reference series. Then, the two detected series for each of the 13 stations were tested using SNHT. The final reference series were chosen by combining these test results with the station metadata and with the breakpoints identified, without a reference series, in the annual and monthly mean wind speed series at the same 13 stations for 1951–2014 using PMF (99% level); the final results are shown in Table 1.

Table 1 Creation of the homogeneous reference series at 13 stations in Tianjin

The reference series \( r_i \) based on SNHT is created mainly according to the length of the wind speed time series and the station locations in the network of 13 in situ Tianjin stations. Out of five nearby stations, the three with the highest correlation coefficients with the base series were selected as the reference stations. They have almost the same time series length as the base series and, as far as possible, their locations form a triangle with the base station at the center.

$$ {r}_i=\left[\sum_{j=1}^3{\rho}_j^2{X}_{j i}/\overline{X_j}\right]/\sum_{j=1}^3{\rho}_j^2 $$
(1)

where \( \rho_j \) is the correlation coefficient between the base station and reference station j, \( X_{ji} \) is the wind speed at reference station j at time i, and \( \overline{X_j} \) is the mean value of the wind speed time series at reference station j.
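As an illustration of Eq. (1), the following minimal sketch computes a reference series as the squared-correlation weighted mean of the mean-normalized neighbour series. The function names and the sample values are hypothetical, and the Pearson coefficient is assumed for \( \rho_j \) because the text does not specify the correlation measure.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient (assumed here for rho_j)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def snht_reference(base, neighbours):
    """Eq. (1): weighted mean of X_ji / mean(X_j) with weights rho_j squared."""
    weights = [pearson(base, x) ** 2 for x in neighbours]
    means = [mean(x) for x in neighbours]
    total = sum(weights)
    return [
        sum(w * x[i] / m for w, x, m in zip(weights, neighbours, means)) / total
        for i in range(len(base))
    ]

# Made-up wind speeds (m/s) for a base station and three reference stations.
base = [3.0, 3.2, 2.8, 3.1]
neighbours = [
    [2.0, 2.2, 1.8, 2.1],
    [4.0, 4.4, 3.6, 4.2],
    [3.0, 3.3, 2.9, 3.0],
]
reference = snht_reference(base, neighbours)
```

Because each neighbour series is divided by its own mean, the resulting reference series is dimensionless and averages to 1 over the period.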

The most important steps to create a reference series using the P-E technique are as follows: (1) turn the annual or monthly mean surface wind speed series from Tianjin’s 13 stations into \( dT/dt \) time series using the first difference method; (2) calculate the correlation coefficients between the \( dT/dt \) series of the base station and those of the other stations in the network of 13, and select the series with the highest significant correlations as the candidate reference \( dT/dt \) series; (3) test the ability of the candidate \( dT/dt \) series to predict the base station using a multivariate randomized block permutation (MRBP) test, to reduce the chance of accidental results in the correlation test; (4) combine the five highest positively correlated station series that have passed the MRBP test (probability values less than or equal to 0.01) for that year into a reference \( dT/dt \) value using a weighted mean, as in the SNHT-based method described above; (5) finally, convert the reference \( dT/dt \) series into a reference series \( r_i \).

$$ {\left(\frac{dT}{dt}\right)}_i={T}_{i+1}-{T}_i,\kern0.5em i=1,\kern0.5em 2,\kern0.5em \dots, 64 $$
(2)
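The first-difference step of Eq. (2), the weighted combination in step (4), and the conversion back in step (5) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the weights are taken as squared correlations by analogy with Eq. (1), the conversion back is done by cumulative summation from an assumed starting value, and the MRBP test of step (3) is not implemented.

```python
def first_difference(series):
    """Eq. (2): difference between each value and the previous one."""
    return [b - a for a, b in zip(series, series[1:])]

def weighted_reference_difference(diff_series, correlations):
    """Weighted mean of neighbour difference series, weights = correlation squared
    (an assumption, by analogy with Eq. 1)."""
    weights = [r ** 2 for r in correlations]
    total = sum(weights)
    n = len(diff_series[0])
    return [sum(w * d[i] for w, d in zip(weights, diff_series)) / total for i in range(n)]

def integrate(diffs, start=0.0):
    """Rebuild a series from its first differences; the start value is assumed."""
    out = [start]
    for d in diffs:
        out.append(out[-1] + d)
    return out

# Made-up annual mean wind speeds (m/s).
series = [3.0, 3.5, 3.2, 3.4]
diffs = first_difference(series)
rebuilt = integrate(diffs, start=series[0])
```

A series of n values yields n − 1 first differences, and a discontinuity at one station affects only a single value of its difference series, which is the usual motivation for building the reference from differences.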

The detected series \( Q_i \) are created using the ratio method based on these two types of reference series \( r_i \).

$$ {Q}_i=\frac{{Y}_i/\overline{Y}}{r_i} $$
(3)

where \( Y_i \) is the base series and \( \overline{Y} \) is the mean value of the base series.

Before homogenization, the detected series \( Q_i \), which fluctuates around 1, must be standardized to \( Z_i \) so that it approximates a standard normal distribution N(0, 1).

$$ {Z}_i=\left({Q}_i-\overline{Q}\right)/{\sigma}_Q $$
(4)

where \( \overline{Q} \) is the mean value of \( Q_i \) and \( \sigma_Q \) is the standard deviation of \( Q_i \).
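Equations (3) and (4) can be illustrated together with a short sketch. The function names and sample values are hypothetical, and the population standard deviation is assumed for \( \sigma_Q \) because the text does not say whether the sample or population form is used.

```python
from statistics import mean, pstdev

def ratio_series(base, reference):
    """Eq. (3): mean-normalized base series divided by the reference series."""
    ybar = mean(base)
    return [(y / ybar) / r for y, r in zip(base, reference)]

def standardize(q):
    """Eq. (4): subtract the mean and divide by the standard deviation
    (population form, an assumption)."""
    qbar, sigma = mean(q), pstdev(q)
    return [(v - qbar) / sigma for v in q]

# Made-up base series (m/s) and dimensionless reference series.
base = [3.0, 3.2, 2.8, 3.1]
reference = [1.0, 1.02, 0.98, 1.0]
q = ratio_series(base, reference)
z = standardize(q)
```

By construction the standardized series has zero mean and unit standard deviation, so a sustained departure of its segment means from zero points to a possible shift in the base series relative to its neighbours.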

3.2 Determining discontinuities in the wind speed series

Good metadata are among the most indispensable elements for homogenizing observed data (Aguilar et al. 2003). Here, breakpoints in the wind speed series were judged mainly against detailed station metadata, with their accuracy and objectivity assessed subjectively. This differs from the approach of Azorin-Molina et al. (2014), who had limited metadata and therefore fixed breakpoints using only a mathematical approach based on statistical significance. In this study, station metadata from NMIC and a large amount of historical archive information about each station obtained from the Tianjin Meteorological Archives were used, providing a basis for determining and correcting breakpoints in the wind speed series. The detected series \( Z_i \) from the 13 stations on an annual or monthly scale were tested using PMT (Wang et al. 2007), and potential breakpoint locations were fixed using station metadata. Combined with the test results obtained when creating the corresponding reference series, significant breakpoints that were clearly supported by station metadata were removed from the detected annual and monthly series through adjustment. Statistical significance was assessed at the 95% confidence level.
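To give a feel for how a mean shift is located in a detected series, the sketch below scans all candidate positions and returns the one with the largest two-sample t statistic. It is deliberately simplified and is not the PMT of Wang et al. (2007): the empirical penalty function and the treatment of lag-1 autocorrelation are omitted, no significance threshold is applied, and the function name and sample data are hypothetical.

```python
from statistics import mean

def max_t_changepoint(z, min_seg=2):
    """Return (k, t): the split index k (shift between z[k-1] and z[k]) that
    maximizes the pooled-variance two-sample t statistic, and that statistic."""
    n = len(z)
    best_k, best_t = None, 0.0
    for k in range(min_seg, n - min_seg + 1):
        left, right = z[:k], z[k:]
        m1, m2 = mean(left), mean(right)
        ss = sum((v - m1) ** 2 for v in left) + sum((v - m2) ** 2 for v in right)
        pooled_var = ss / (n - 2)
        if pooled_var == 0:
            continue  # degenerate split, t statistic undefined
        t = abs(m1 - m2) / (pooled_var * (1 / k + 1 / (n - k))) ** 0.5
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t

# Made-up series with a shift of about +1 after the fifth value.
z = [0.1, -0.2, 0.0, 0.2, -0.1, 1.1, 0.9, 1.0, 1.2, 0.8]
k, t = max_t_changepoint(z)
```

In the procedure described in the text, a statistically detected position would still be checked against station metadata before being accepted as a breakpoint.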

4 Homogenization of monthly mean wind speed series

4.1 Statistics on the number of breakpoints and their related reasons

The detection information shown in Table 2 indicates that 10 stations have significant breakpoints, 77% of the Tianjin total, suggesting that inhomogeneities in the wind speed series are common in Tianjin. The total number of breakpoints is 55; 60% of the inhomogeneous stations have four or more breakpoints, 30% have two breakpoints, and only one station has three breakpoints. Table 2 also indicates that relocation has the greatest effect on the shifts in the monthly mean surface wind speed series, followed by instrument change and then automation, accounting for 36, 33, and 29% of the total breakpoints, respectively. Similar main causes of discontinuities were found in the homogenization of Canadian near-surface wind speed by Wan et al. (2010). The six stations with four or more breakpoints accounted for 46 breakpoints, and the proportions of shifts due to relocation, instrument change, and automation at these stations were 43, 30, and 26%, respectively.
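The counts quoted in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text (Table 2 itself is not reproduced here); the variable names are illustrative.

```python
# Counts quoted in the text.
total_stations = 13
inhomogeneous_stations = 10
total_breakpoints = 55

# Share of stations with significant breakpoints, in percent.
share_inhomogeneous = round(100 * inhomogeneous_stations / total_stations)

# Station groups by number of breakpoints, as stated in the paragraph.
stations_four_or_more = round(0.60 * inhomogeneous_stations)  # 60% of 10 stations
stations_two = round(0.30 * inhomogeneous_stations)           # 30% of 10 stations
stations_three = 1
breakpoints_four_or_more = 46  # total held by the stations with four or more

# Rebuild the overall total from the three groups.
reconstructed_total = breakpoints_four_or_more + 2 * stations_two + 3 * stations_three
```

The three groups add up to the 10 inhomogeneous stations, and 46 + 6 + 3 reproduces the stated total of 55 breakpoints.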

Table 2 Statistical analysis of the number and related reasons for the shifts in monthly mean wind speed series

Figure 2 shows that the number of breakpoints in the monthly mean surface wind speed series for 1951–2014 was highest in 2002–2004; the metadata indicate that this was primarily due to automation and relocation. According to the metadata, the breakpoints in 1969–1970 and 1988–1989 were primarily caused by instrument change and relocation.

Fig. 2 Temporal distribution of the number of shifts in monthly mean surface wind speed series for 1951–2014

4.2 Causes of the inhomogeneities in the wind speed series

Table 3 provides specific causes for the shifts in the monthly mean wind speed series. Among the 10 inhomogeneous stations, instrument change was the most common cause, followed by automation. Although relocation caused the largest number of breakpoints (Table 2), it affected the fewest stations. Station metadata indicate that the breakpoints at stations Baodi (54525), Dongli (54526), Ninghe (54529), and Tanggu (54623) were all caused by instrument change, involving both type and height changes; the instrument type change from Velder wind pressure gauges to EL anemometrographs had the greatest impact, causing 69% of all breakpoints at the four stations. The effect of instrument height change was mainly at station 54529, where the height increased by 4.2 m; this effect accounted for 31% of all the breakpoints at the four stations.

Table 3 Homogeneity correction information for monthly mean surface wind speed series at 13 stations in Tianjin

Figure 3 shows the time series of annual means of the raw and adjusted monthly mean wind speed at four representative stations. The annual wind speed values at station 54525 (Fig. 3a) were abnormally large before 1970, as were those at stations 54526 and 54623 before their shifts in 1974 and 1995, respectively, whereas the values at station 54529 (Fig. 3b) were unusually small before the shift in 1997 due to the instrument height increase. However, the shifts in the wind speed series at each station caused by instrument change were almost entirely corrected by homogenization, which makes the series relatively flat and continuous (Fig. 3a, b).

Fig. 3 The time series of annual means of the raw (dashed curve) and adjusted (solid curve) monthly mean surface wind speed at (a) Baodi, (b) Ninghe, (c) Tianjin, and (d) Dagang stations

The shifts in the monthly mean surface wind speed series at Tianjin (54527) and Hangu (54530) stations were both due to the effects of automation, based on their metadata. These shifts occurred between 2002 and 2004; at station 54527 in particular, the monthly series from January to December all have breakpoints in 2003. The annual wind speed values based on the raw monthly series at station 54527 (Fig. 3c) were abnormally small before 2004 due to automation, whereas those at station 54530 were much larger prior to 2004. Similarly, homogenization largely corrected the shifts at the two stations caused by automation, making their wind speed data more reasonable.

Relocation significantly affected both Dagang (54645) and Beichen (54528) stations, with a much greater effect on the former. At station 54645, shifts due to relocation were found in 2003 in every monthly series except February, resulting in abnormally large wind speed values before 2003 (Fig. 3d). The metadata indicated that automation resulted in an instrument-type change, from EL anemometrographs to 15-1A-type EL wind speed sensors. Moreover, relocation affected two factors: the observation environment and instrument height (Wan et al. 2010). Table 3 also shows that the shifts in the wind speed series at the City (54517) and Jinghai (54619) stations were both due to instrument change combined with automation, but the shift amplitude was lower.

Accordingly, the predominant reason for inhomogeneities in the Tianjin monthly mean surface wind speed series was instrument change, including changes in type and height. The automation effect increased the impact of instrument change. However, homogenization seemed to correct the shifts in the time series caused by non-climate factors at each station, making the wind speed characteristics relatively reasonable. This is consistent with many previous studies on the homogenization of near-surface wind speed in China (Liu 2000; Cao et al. 2010).

4.3 Probability density distribution of all QM adjustments

Figure 4 shows the probability density distribution of all QM adjustments applied to the monthly mean surface wind speed series identified as inhomogeneous at the 10 Tianjin stations. The empirical estimate indicates that positive adjustments were more frequent than negative ones, accounting for 55.4 and 44.6% of all QM adjustments, respectively. According to the kernel density estimate, positive adjustments with probability density above 0.2 ranged from 0.2 to 1.2 m s−1 and made up 85% of the positive QM adjustments. The corresponding negative adjustments ranged from −0.1 to −1.2 m s−1 and made up 80% of the negative QM adjustments. Overall, the QM adjustments at the 10 inhomogeneous stations ranged from −3.4 to 3.2 m s−1, with mean and median values of 0.03 and 0.3 m s−1, respectively.

Fig. 4 Probability density distribution of all QM adjustments applied to monthly mean surface wind speed

5 Assessment of the quality of adjusted monthly wind speed data

In this study, the quality of homogenized wind speed data for 1951–2014 in Tianjin was evaluated by (1) comparing variance and trend changes of monthly mean surface wind speed before and after adjustment and (2) comparing the long-term trend changes with those of geostrophic wind speed based on daily air pressure observations. An error analysis was also carried out between our data and the China Homogenized Monthly Wind Speed Datasets (V. 1.0) released in December 2015 by NMIC.

5.1 Variance comparison before and after adjustment

Variance measures the average spread of the data around the sample mean. In general, amplitude differences between climate variables in different areas can also be judged using variance (Huang 2000). Figure 5 provides the variance distribution of the annual time series based on the raw and adjusted monthly mean surface wind speed. Figure 5a, b indicates that annual variances based on the adjusted monthly data were smaller than those calculated from the raw data, and the average variances for the entire Tianjin area were 0.343 and 0.416 m s−1, respectively. These characteristics were also reflected in the seasonal wind speed time series at each station (figure omitted). Figure 6 indicates that Dagang (54645) had the largest difference in annual variance between the raw and adjusted monthly wind speed data, at 0.75 m s−1; the difference was most pronounced in spring, with a decrease of about 1.18 m s−1. The metadata suggest that station 54645 has been more strongly influenced by growing urbanization since 2003, after its relocation from a rural to a suburban site, resulting in a smaller mean wind speed (Zhou and Shu 1994; Cui and Shi 2012; Wu et al. 2016). However, homogenization corrected the bias caused by relocation, making the wind speed change at station 54645 more reasonable. This implies that the homogeneity adjustment reduced the large dispersion in mean wind speed data caused by inhomogeneity factors, improving the stability of wind speed changes. In contrast, at station Tianjin (54527), annual and seasonal variances based on the adjusted monthly data were larger than those calculated using the raw data; the annual variance increased by 0.27 m s−1 compared with the raw data, and the increase was largest in spring, at 0.41 m s−1. 
According to the metadata, station 54527 was located in low-lying land surrounded by building obstacles before 2010, but this was largely a local environmental impact rather than an urbanization effect. The surrounding area was flatter and more open after 2010; thus, the observed mean wind speed increased. However, the reason given in Table 3 for the shifts in the wind speed series at station 54527 was automation (which may also be regarded as an instrument-type change); the effect of relocation seems to be concealed. The analysis results from Sect. 4.2 indicate that the relocation effect was also a type of instrument change. Therefore, the instability in wind speed changes after adjustment at station 54527 is reasonable. Moreover, Fig. 6 also shows a clear increase in variance in the winter wind speed series at Tanggu station (54623) after adjustment. Station 54623 is the only land observation station located near the coast in Tianjin; therefore, a winter variance change in the wind speed series due to weather systems is reasonable, because winter brings frequent cold air activity, followed by more sea gales than in other seasons. Similar findings were reported by William and Martha (2014), who studied the Beaufort/Chukchi Sea coastal region: coastal locations exhibited higher wind speeds than interior stations in all months of the year, with maximum monthly wind speeds occurring in the winter months. Minola et al. (2016) found similar results.
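The variance comparison discussed above can be sketched as follows: annual means are formed from the monthly series, and the variance of the annual series is compared before and after adjustment. The function names and sample data are hypothetical, and the population variance is an assumption because the text does not state which estimator was used.

```python
from statistics import mean, pvariance

def annual_means(monthly):
    """Average each complete block of 12 monthly values (calendar order assumed)."""
    complete = len(monthly) - len(monthly) % 12
    return [mean(monthly[i:i + 12]) for i in range(0, complete, 12)]

def variance_change(raw_monthly, adjusted_monthly):
    """Variance of the adjusted annual series minus that of the raw annual series."""
    return pvariance(annual_means(adjusted_monthly)) - pvariance(annual_means(raw_monthly))

# Made-up example: a raw series with a step between two years, and an
# adjusted series in which the step has been removed.
raw = [4.0] * 12 + [2.0] * 12
adjusted = [3.0] * 24
change = variance_change(raw, adjusted)
```

A negative change means that the adjustment reduced the dispersion of the annual series, as reported for most stations; a positive change corresponds to cases such as station 54527.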

Fig. 5 Variances in the annual time series based on the raw (a) and adjusted (b) monthly mean surface wind speed from 13 in situ stations in Tianjin; units: meters per second

Fig. 6 Differences in variance between raw and adjusted annual and seasonal mean wind speed at 10 stations

5.2 Spatial distribution of trends comparison before and after adjustment

Figure 7 shows the spatial distribution of trends in the annual and seasonal time series based on the raw and adjusted monthly mean wind speed. In the raw data, the Dagang (54645) area shows the decreasing trends in annual mean wind speed in Tianjin (Fig. 7a); after homogeneity adjustment, these shifted to the City area (54517) (Fig. 7b). This is also reflected in the trend distribution of the seasonal mean wind speed (Fig. 7d, f, h, j), which is consistent with the climate change characteristics in Tianjin. Station metadata indicate that City station (54517) was affected more prominently by urbanization due to its location in the center of Tianjin, and its decreasing trend in wind speed is clear compared to other stations (Hou et al. 2013; Dou et al. 2015). In addition, Ninghe (54529) also shows a decreasing trend in Fig. 7b, in contrast to the raw wind speed trend distribution (Fig. 7a). Station 54529 has been severely affected by urbanization in the last few years; the surrounding environment no longer conformed to the observation conditions required by the ground meteorological service, and station relocation has therefore been carried out. This demonstrates that recent urbanization was not confined to continuous large-scale construction in the city; on the contrary, the surrounding open rural and suburban areas were the first targets for urban construction. As a result, near-surface wind speed at stations in open rural environments decreased owing to building growth, an urbanization effect on these stations. This is also one reason why climate conditions at rural stations may deteriorate more than those at urban stations in the next few years (Si et al. 2014).

Fig. 7 Spatial distribution of trends in the time series of annual and seasonal means of raw (a, c, e, g, i) and adjusted (b, d, f, h, j) monthly mean wind speed for 13 in situ stations in Tianjin; a and b indicate annual, c and d indicate spring, e and f indicate summer, g and h indicate autumn, i and j indicate winter; significant at the 95% level, units: meters per second/10 a

The spatial distributions of seasonal mean wind speed trends were more consistent after homogenization than those of the raw trends (Fig. 7c, e, g, i). Trends in the northern and central regions were smaller, while in the southwest (city area) they had large decreasing values (Fig. 7d, f, h, j). The trend distributions were similar in spring and winter (Fig. 7d, j), with larger decreasing trends, and almost identical in summer and autumn (Fig. 7f, h), with smaller decreasing trends. This is consistent with monsoon circulation dominating climate in the Tianjin area.

5.3 Trend amplitudes before and after adjustment

As shown in Fig. 8, the wind speed series after adjustment at all 10 stations were consistent, each showing a decreasing trend significant at the 95% level. This result is consistent with current research on observed near-surface wind speed trends in China (Rong and Liang 2008; Cheng 2010; Zou et al. 2010; Fu et al. 2011; Guo et al. 2011; Chen et al. 2013; Zhang et al. 2014; Xiong 2015), and also similar to trends reported for other parts of the world (Pryor et al. 2009; Wan et al. 2010; William and Martha 2014). Based on the adjusted data, the trends in the regional average annual and seasonal (spring, summer, autumn, and winter) wind speed series were −0.277, −0.325, −0.228, −0.223, and −0.336 m s−1/10a, respectively (Fig. 8b), smaller in magnitude than the raw values of −0.335, −0.404, −0.271, −0.296, and −0.373 m s−1/10a (Fig. 8a). This feature was also reflected in the spatial distribution of wind speed variation, shown in Fig. 7. The statistics based on the adjusted data here were generally consistent with those from Li et al. (2011), who evaluated the regional trends in annual and seasonal wind speed series based on homogenized observations in Beijing and obtained −0.260, −0.300, −0.120, −0.220, and −0.390 m s−1/10a, respectively (Table 4). Both this study and the prior study indicate that the decreasing amplitudes of the wind speed trend in spring and winter were much larger than those in summer and autumn; this is also reflected in studies on annual and seasonal wind speed change over China (Xu et al. 2006; Jiang et al. 2010; Guo et al. 2011; Fu et al. 2011; Chen et al. 2013). Accordingly, based on the adjusted data, the characteristics of near-surface wind speed change in Tianjin are consistent with those of the regional background climate in China.
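Trends expressed in m s−1/10a, such as those quoted above, are commonly obtained as the slope of a linear fit scaled to a decade. The sketch below assumes an ordinary least-squares fit, which the text does not explicitly confirm; the function name and the synthetic series are hypothetical.

```python
def trend_per_decade(years, values):
    """Ordinary least-squares slope of values against years, times 10 (per decade)."""
    n = len(years)
    ybar = sum(years) / n
    vbar = sum(values) / n
    sxy = sum((y - ybar) * (v - vbar) for y, v in zip(years, values))
    sxx = sum((y - ybar) ** 2 for y in years)
    return 10 * sxy / sxx

# Synthetic annual mean wind speeds (m/s) declining by 0.03 m/s per year.
years = list(range(1951, 2015))
values = [4.0 - 0.03 * (y - 1951) for y in years]
trend = trend_per_decade(years, values)
```

For this synthetic series the fitted trend is −0.3 m s−1/10a, the same order of magnitude as the regional values reported in the text.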

Fig. 8

Time series trends in annual and seasonal means of the raw (a) and adjusted (b) monthly mean wind speed from 10 in situ stations in Tianjin, significant at the 95% level, units: m s−1/10a

Table 4 Mean wind speed trend magnitudes over China from previously published studies, units: m s−1/10a

However, the trends found here are larger than those from the previous studies of regional wind speed trends in China listed in Table 4, although they are of the same order of magnitude as the results of Li et al. (2011). This difference arises primarily because the largest trend amplitudes have been found in North China; Xu et al. (2006) pointed out that the decreasing trend in the annual wind speed series in North China for 1969–2000 was 0.290 m s−1/10a (Table 4), similar to the results here and in Li et al. (2011). The Tianjin region is located in North China and shares its regional characteristics while also reflecting local climate change. In terms of data processing, the wind speed data were homogenized systematically by Li et al. (2011) (marked in italics); in the studies of Xu et al. (2006), Guo et al. (2011), and Chen et al. (2013), the authors deleted stations with significant breakpoints detected by metadata or statistical tests (marked with an asterisk) but did not homogenize the data. The original data in the studies of Jiang et al. (2010) and Fu et al. (2011) underwent no quality control, possibly owing to their research needs. Therefore, the differences between the results obtained in this study and those obtained previously may have been caused by inhomogeneities in the data (Yan et al. 2016). Thus, the trends based on the adjusted wind speed data here better represent the Beijing, Tianjin, and Hebei region as a whole.

To further illustrate the significance of this study, we analyzed the changes in annual and seasonal mean wind speed series at different station types in Tianjin (Fig. 1). We used the urban-rural station classification for North China from Si et al. (2014) and the regional averaging method for urban or rural areas from Si et al. (2012). As shown in Table 5, the regional trends in annual and seasonal mean wind speed for urban and rural areas all displayed a reduction significant at the 95% level, both before and after adjustment, consistent with the entire Tianjin region (Fig. 8). Based on the adjusted data, the decreasing trends in wind speed for urban areas were all larger than those for rural areas, unlike in the raw data; this is reasonable considering the characteristics of climate change in the city (Zhou and Shu 1994; Cui and Shi 2012; Zhu et al. 2017). Here, the urban-rural differences in regional trends were likely due to the impact of urbanization; the urbanization-induced trend in average annual wind speed in the Tianjin area was −0.046 m s−1/10a, consistent with the influence of urbanization on wind speed over Beijing (Li et al. 2011; −0.050 m s−1/10a). The reduction in both was larger than that for all of China, which ranges from −0.020 m s−1/10a (Xu et al. 2006) to −0.010 m s−1/10a (Jiang et al. 2010), although this effect is not observed in the raw wind speed data from urban and rural areas.

Table 5 Annual and seasonal wind speed trends with and without “obvious UHI stations” based on raw and adjusted data in Tianjin

5.4 Comparison with trend changes in geostrophic wind speed

The calculation of geostrophic wind speeds followed Wang et al. (2009) and Wan et al. (2010). The pressure triangle for each station was formed following the procedure used to create reference series for SNHT, using daily air pressure observations from three nearby weather stations; the results are shown in Table 6. The data used to calculate geostrophic wind speed were obtained from the China Daily Basic Meteorological Elements Datasets (V. 3.0), also derived from the special fund for basic meteorological data by NMIC. The integrity rate for each element was generally above 99%, and validity rates were all close to 100%; quality and integrity were thus improved in comparison with similar data products previously released by NMIC, which ensures the reliability of the calculated geostrophic wind speeds.
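
To make the pressure-triangle idea concrete, the sketch below estimates a geostrophic wind speed from pressure at three stations by fitting a plane through the three pressure values and applying the geostrophic relation. It is a simplified illustration, not the procedure of Wang et al. (2009) or Wan et al. (2010): the local flat-plane projection, the constant air density of 1.25 kg m−3, and the function names are all assumptions introduced here.

```python
import math

EARTH_RADIUS_M = 6.371e6   # mean Earth radius (m)
OMEGA = 7.292e-5           # Earth's rotation rate (rad/s)
AIR_DENSITY = 1.25         # kg/m^3, assumed constant for illustration

def local_xy(lat, lon, lat0, lon0):
    """Project (lat, lon) in degrees onto a plane centred at (lat0, lon0), in metres."""
    x = EARTH_RADIUS_M * math.cos(math.radians(lat0)) * math.radians(lon - lon0)
    y = EARTH_RADIUS_M * math.radians(lat - lat0)
    return x, y

def geostrophic_speed(stations):
    """Geostrophic wind speed (m/s) from three (lat_deg, lon_deg, pressure_pa) tuples.

    Intended for mid-latitudes only: the Coriolis parameter vanishes at the equator.
    """
    lat0 = sum(s[0] for s in stations) / 3.0
    lon0 = sum(s[1] for s in stations) / 3.0
    (x1, y1), (x2, y2), (x3, y3) = [
        local_xy(s[0], s[1], lat0, lon0) for s in stations
    ]
    p1, p2, p3 = (s[2] for s in stations)

    # Solve the 2x2 system for the plane's pressure gradient (Cramer's rule).
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if det == 0:
        raise ValueError("stations are collinear; no triangle can be formed")
    dpdx = ((p2 - p1) * (y3 - y1) - (p3 - p1) * (y2 - y1)) / det
    dpdy = ((x2 - x1) * (p3 - p1) - (x3 - x1) * (p2 - p1)) / det

    # Geostrophic balance: ug = -(1/(rho f)) dp/dy, vg = (1/(rho f)) dp/dx.
    f = 2.0 * OMEGA * math.sin(math.radians(lat0))
    ug = -dpdy / (AIR_DENSITY * f)
    vg = dpdx / (AIR_DENSITY * f)
    return math.hypot(ug, vg)
```

Applied to daily pressure observations, a function of this kind would yield a daily geowind series from which monthly means and trends could then be derived.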

Table 6 Pressure triangles and their geostrophic wind speed trends for each observation station in Tianjin

*Significant at the 95% level

The trends in annual and seasonal geostrophic wind speed (geowind) at each station are shown in Table 6; they indicate a decreasing trend over the last 50 years (trends significant at the 95% level are marked with an asterisk). This is consistent with the long-term trends in the corresponding near-surface wind speed series (Fig. 8b), illustrating that wind speed weakened in Tianjin. This physical change may have been related to a weakened circulation system, as suggested by previous studies on near-surface wind speed in China (Xu et al. 2006; Jiang et al. 2010; Guo et al. 2011). The trend amplitudes in annual and seasonal geowind were basically the same at each station, and the reductions were all less pronounced than those in the corresponding near-surface wind speeds (Fig. 8b). However, they were of almost the same order of magnitude as the trends in upper-air wind speed over China (Zhang et al. 2009), and similar to the Kendall trends in the 95th and 99th standardized geowind percentile series of the four seasons in the Northeast Atlantic region (Wang et al. 2009). Given the large-scale circulation background, the weaker decreasing trends in the near-surface wind speed series in Tianjin obtained from the adjusted data are more reasonable (Fig. 8b). However, long-term trends in annual and seasonal mean wind speed based on homogenized data at the Tianjin (54527) and Ninghe (54529) stations showed a significant decrease in comparison with the raw data (Fig. 8b). This may have been affected not only by atmospheric circulation but also by human activities, such as urbanization (Xu et al. 2006; Guo et al. 2011; Azorin-Molina et al. 2014). Results from Sect. 5.3 indicate that the urbanization effect on weakening annual mean wind speed in Tianjin was about 16%, based on the adjusted data. Historical issues, such as changes in the observational environment, were also among the main factors causing discontinuities in the long time series. The wind speed series after adjustment at station 54527 reflected the surrounding environment, including urbanization. There was no detection of local environmental changes at station 54529 owing to the increasingly severe urbanization process. In summary, the real characteristics of regional and local climate change in Tianjin were better reflected by the homogenized wind speed series.
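
As a rough check on the urbanization contribution quoted above, the following Python sketch divides the urbanization-induced trend by the regional adjusted annual trend. Treating the contribution as this simple ratio is an assumption made here for illustration; the text does not spell out how the figure of about 16% was obtained, and the function name is hypothetical.

```python
def urbanization_share(urban_effect, regional_trend):
    """Fraction of a regional trend attributed to urbanization (simple ratio)."""
    if regional_trend == 0:
        raise ValueError("regional trend must be non-zero")
    return urban_effect / regional_trend

# Values quoted in Sect. 5.3 (m s-1 per 10a).
TIANJIN_URBAN_EFFECT = -0.046   # urbanization-induced annual trend
TIANJIN_ANNUAL_TREND = -0.277   # adjusted regional annual trend

share = urbanization_share(TIANJIN_URBAN_EFFECT, TIANJIN_ANNUAL_TREND)
print(f"urbanization share of the annual trend: {100 * share:.1f}%")
```

The ratio falls between 16% and 17%, close to the value of about 16% stated in the text.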

5.5 Comparison with similar data products

Wind speed datasets from NMIC were optimized using the geostrophic wind speed based on sea level pressure from Wan et al. (2010) as a reference series for Eastern China. For the western region, the reference series was constructed using single-station detection combined with adjacent-station optimization. The discontinuities in the wind speed series caused by station relocation and instrument change were effectively removed using the RHtest method combined with station metadata. As a result, the regional long-term climate characteristics in China are more accurately reflected by these surface wind speed data.

Two statistics, standard error (SE) and mean absolute error (MAE), were used here; the specific methods are described in Soukissian and Karathanasi (2016).
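
For readers who want to reproduce this kind of comparison, the sketch below computes the two statistics for a pair of wind speed series. MAE follows its usual definition. For SE, the sketch assumes a root-mean-square difference between the two series; this is an assumption made here, and the exact definition in Soukissian and Karathanasi (2016) may differ. The function names are hypothetical.

```python
import math

def _check(a, b):
    """Validate that two series are non-empty and of equal length."""
    if not a or len(a) != len(b):
        raise ValueError("series must be non-empty and of equal length")

def mean_absolute_error(a, b):
    """Mean of the absolute differences between two series (m/s)."""
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def standard_error(a, b):
    """Root-mean-square difference between two series (m/s).

    Assumed definition of SE for illustration only.
    """
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
```

Under this assumed definition the MAE can never exceed the SE for the same pair of series, because a root-mean-square weights large differences more heavily than a plain average does.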

The annual and seasonal SE values between the data adjusted here and those from NMIC were nearly equal to one another, as were the MAE values (Fig. 9). However, the error amplitudes for MAE (Fig. 9b) were generally smaller than those for SE (Fig. 9a), indicating that the average errors between the two kinds of wind speed data were much smaller. The annual and seasonal MAE values fell mainly between 0.2 and 0.4 m/s, with a probability density exceeding 1.3, and exceeding 1.7 for the autumn data. For the SE, however, the probability density was more than 1.0, and the error range was primarily between 0.3 and 0.6 m/s. In a comparison of the error amplitudes at each station (Table 7), 100% of observation stations had MAE values between 0 and 1.0 m/s for the two kinds of annual or monthly wind speed data, except for February (92.3%). Correspondingly, fewer observation stations had SE values in the same range, but the proportion was always above 84.6%. The stations with large errors were primarily 54517 and 54527, for which the average annual and monthly SE values were 1.1 and 1.0 m/s, respectively, and the MAE values were 0.7 and 0.8 m/s. This may have been principally related to the data pre-processing. For NMIC, a number of stations were exchanged before December 31, 1991. This differed from our method, outlined in Sect. 2, which may have caused different breakpoints between the two datasets in the homogenization process, and thus led to some differences in the data adjustment results.
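
The station proportions reported above can be computed as the share of stations whose error lies in a given range. The sketch below is illustrative: the function name and the sample error values are hypothetical, and the network size of 13 stations is an inference from the reported percentages (92.3% and 84.6% match 12 and 11 out of 13), not a figure stated in this section.

```python
def percent_in_range(errors, low=0.0, high=1.0):
    """Percentage of stations whose error lies within [low, high] (m/s)."""
    if not errors:
        raise ValueError("at least one station error is required")
    inside = sum(1 for e in errors if low <= e <= high)
    return 100.0 * inside / len(errors)

# Hypothetical SE values for 13 stations; two of them exceed 1.0 m/s.
sample_se = [0.3, 0.4, 0.5, 0.6, 0.4, 0.3, 0.5, 0.6, 0.4, 0.5, 0.3, 1.1, 1.2]

if __name__ == "__main__":
    print(f"{percent_in_range(sample_se):.1f}% of stations within 0-1.0 m/s")
```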

Fig. 9

Probability density function of standard error (a) and mean absolute error (b) between the data adjusted here and those from NMIC

Table 7 The proportions of errors from 0 to 1.0 m/s between the stations adjusted here and those from CMA; units: percentage

Detailed station metadata play an important role in judging breakpoints in time series for the homogenization of climate data (Aguilar et al. 2003). In this study, we pre-processed the data before investigating the historical evolution of each observation station, avoiding judgment errors caused by inhomogeneities in the original data. Moreover, in the adjustment process, to respect the firsthand observations, we corrected only breakpoints supported by reliable metadata. In the NMIC data, many breakpoints were removed in the absence of metadata, so the results in this study are relatively conservative. In general, however, 0–1.0 m/s is just a median-range breeze on the wind speed scale, and thus a small error, according to the Specification for Ground Meteorological Observation from CMA. Moreover, according to the NMIC data descriptions, station relocation and instrument change were also the main reasons for discontinuities in the China monthly mean surface wind speed series, as in our data. As Li (2016) noted for climate data homogeneity studies in China, different researchers (or teams) homogenizing the same climate time series may obtain somewhat different results, but these differences should remain within a certain limit. Consequently, the homogenized wind speed data for the Tianjin area were highly consistent with those from NMIC; thus, the results in this study should be reliable.

6 Summary and discussion

Using historical Chinese surface meteorological records provided through a special fund for basic meteorological data from the National Meteorological Information Center (NMIC), monthly mean 2-min near-surface wind speed data for the last 60 years in Tianjin were processed. The data were homogenized using RHtestV4, with reference series created using the Standard Normal Homogeneity Test (SNHT) and Peterson-Easterling (P-E) techniques, in combination with station metadata. The conclusions are as follows:

  1.

    The detection results suggest that inhomogeneities are a prevalent problem in the near-surface wind speed data in Tianjin, affecting 77% of all stations. Stations with four or more breakpoints were the most common, making up 60% of the inhomogeneous stations. Subjective analysis of station metadata indicates that relocation caused the largest number of shifts in the wind speed series, accounting for 36% of the total breakpoints, followed by instrument change and automation, accounting for 33 and 29% of the total breakpoints, respectively. However, analysis of the specific reasons for the inhomogeneity indicates that instrument change, including changes in instrument type and height, was the predominant cause of shifts in the Tianjin near-surface wind speed series, with type change being the major factor.

  2.

    QM adjustments for shifts at each station resulted in more positive than negative adjustments, 55.4 versus 44.6% of all QM adjustments. Positive adjustments between 0.2 and 1.2 m s−1 accounted for 85%, and negative adjustments between −0.1 and −1.2 m s−1 accounted for 80%. The mean and median of the QM adjustments for the monthly mean wind speed series at the 10 inhomogeneous stations were 0.03 and 0.3 m s−1, respectively.

  3.

    The variances in the monthly mean wind speed data before and after adjustment were compared. Homogenization reduced the large dispersion in the wind speed data caused by inhomogeneous factors, enhancing the stability of the wind speed data at each station. It also homogenized local environmental effects at some stations.

  4.

    A comparison of trends before and after adjustment suggests that the spatial distributions of wind speed trends based on the adjusted data agreed with the characteristics of urban climate change in the Tianjin region. The trends in the regional average annual and seasonal (spring, summer, autumn, winter) wind speed series based on the adjusted data were −0.277, −0.325, −0.228, −0.223, and −0.336 m s−1/10a, respectively. These were generally consistent with those reported for Beijing by Li et al. (2011). The trend amplitudes also showed that the weakening of wind speed in Tianjin was larger in spring and winter than in summer and autumn, similar to annual and seasonal wind speed variations over China (Xu et al. 2006; Jiang et al. 2010; Guo et al. 2011; Fu et al. 2011; Chen et al. 2013). In addition, a comparison with the trends in geostrophic wind speed illustrated that the wind speed series based on the homogenized data were more reasonable than those from the raw data, in terms of both trend direction and amplitude.

  5.

    Error analysis between our adjusted data and the data from NMIC indicates that the annual and seasonal MAE values were generally less than those of SE, illustrating that the average errors between the two types of wind speed data were smaller. All observation stations had MAE values between 0 and 1.0 m/s for annual or monthly wind speed, except in February (92.3% of stations). Correspondingly, the proportion of observation stations with SE values in the same range was always above 84.6%, so the adjusted wind speed data were consistent with those from NMIC, demonstrating the reliability of the method. An analysis of stations with large errors showed that the data pre-processing in this study, based on station metadata, was conducive to reliable homogenization.

In this paper, the provincial monthly mean wind speed data were homogenized using two methods, the penalized maximal t test (PMT) and the penalized maximal F test (PMF), and a reasonable correction result was achieved. However, the position of some breakpoints in the wind speed series could be misjudged because of differences in analysts' subjective experience or a lack of detailed station metadata during data processing, which would introduce bias into the series correction. In addition, the wind speed data did not have a normal probability distribution; the detected series were standardized before homogenization so that the values would approach a normal distribution. The properties of this distribution are not fully consistent with the statistical hypothesis of the homogeneity test. Consequently, the adjusted monthly mean surface wind speed data can only be described as “relatively homogeneous data,” rather than “absolutely homogeneous data.” Nevertheless, from the perspective of application, the quality of climate data in Tianjin was greatly improved by the homogenization in this study, which reduced some uncertainties for operational and scientific applications.