Local modeling of weighted mean temperature in Iran and its impact on GNSS meteorology

Weighted mean temperature (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{m}$$\end{document}Tm) is used to determine water vapor content, precipitable water vapor, and integrated water vapor (IWV) in GNSS. This parameter is highly correlated with climate conditions as well as the type of the region. The case study is performed in Iran which has diverse climate. ERA5 reanalysis datasets were used at a compact grid of 0.125 × 0.125 between 2007 and the end of 2019 to model the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{m}$$\end{document}Tm. The data obtained from 12 radiosonde stations along with an IGS station located in Tehran were employed in this research. Five models were examined for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{m}$$\end{document}Tm. Bevis model, linear grouping model (LGM), and linear nearest grid point model (LNGPM) were considered as \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{m}$$\end{document}Tm linear models, and harmonic model (HM) and GPT2w model were used as nonlinear models. In LGM method the study region was divided into smaller areas with different linear model coefficients using spatial grouping method. The local model in each radiosonde station was considered as a reference. According to the results, the accuracy of linear models (Bevis and LGM model) was between 3 and 8 K (radiosonde data as reference); also 7 out of 12 stations in the LGM had higher accuracy than the Bevis model (based on RMSE). The accuracy of the two GPT2w models and the harmonic model was higher than the previous two models, and it was between 2 and 4 K. The IWV values were obtained using zenith total delay observations of IGS station located in Tehran using 5 models and were compared with the IWV values of the radiosonde station. The accuracy of the values in three linear models, Bevis, LGM, and LNGPM, was, respectively, 0.2, 0.17, and 0.14 kg m−2, and in the two nonlinear models, GPT2w and HM, was 0.13 kg m−2.


Introduction
Troposphere is among the most important sources of error in space geodesy. The tropospheric delay strongly depends on climatic conditions, so this delay varies with changing climatic conditions (Böhm & Schuh 2013). This delay consists of two components: hydrostatic delay and nonhydrostatic delay. Hydrostatic delay depends on air pressure and temperature, while the second component depends on water vapor. Although the nonhydrostatic component constitutes only about 10% of the total tropospheric delay (Seeber 2008), it makes the largest error in calculating the tropospheric delay due to large variations in water vapor content as well as the strong dependence of this parameter on the location. Therefore, accurate modeling of parameters related to nonhydrostatic delay plays an important role in modeling this part of tropospheric delay. The weighted mean temperature ( T m ) is one of these parameters. This parameter is used as a conversion factor for converting ZTD to IWV and PWV (Davis et al. 1985). IWV and PWV parameters play an important role in weather forecasting and atmospheric conditioning. Nowadays, ZTD can be achieved with high accuracy using satellite methods. Therefore, IWV and PWV parameters can be calculated with higher accuracy by a suitable model for T m that is compatible with the conditions of the region. The first step in determining this parameter is to use meteorological data. Radiosonde data as well as 1 3 reanalysis datasets are considered important data sources that have been widely used for modeling and estimating tropospheric errors over the past decade (Böhm et al. 2006;Böhm et al. 2015;Nafisi et al. 2012;Zhou et al. 2020). Reanalysis datasets have recently received more attention due to their arrangement in a regular grid and having a global scale. In addition, radiosonde data are also used to evaluation and control of the estimations due to their high accuracy (Chen et al. 2017;Liu et al. 2018). Reanalysis datasets are obtained from different combinations of data sources, and different organizations are in charge of providing these data. The European Center for Medium-Range Weather Forecasts (ECMWF) is one of the centers offering a variety of meteorological data products (https:// www. ecmwf. int/ en/ forec asts/ datas ets/ browse-reana lysis-datas ets). Many researchers have tried to model the T m and provide a model for identifying the behavior of this parameter using the above-mentioned data sources. The first and simplest model is the one proposed by Bevis, which converts surface temperature ( T s ) to T m by the linear relationship ( T m = 0.72T s + 72 ) (Bevis et al. 1992). This is a simple, but very practical model for determining the T m , because the T m can be achieved by using the T s , which can be easily measured with high accuracy. The important point is that the coefficients of this model may vary according to each region. Therefore, many researchers have calculated the coefficients of this model based on local meteorological data and obtained a local linear model. Liou obtained linear model coefficients for Taipei station as T m = 1.07T s − 31.5 to compare the PWV results using GPS, microwave radiometers, and radiosondes (Liou et al. 2001). Boutiouta and Lahcene also obtained these coefficients as T m = 0.96T s + 14.79 using radiosonde data in Algeria (Boutiouta & Lahcene 2013). Li obtained the coefficients for the linear model between T s and T m as T m = 0.65T s + 87.08 using the data of three radiosonde stations located in Hunan, China ). Chen calculated the same coefficients as T m = 0.61T s + 102.11 using data from a radiosonde station located in Guilin, China (Chen et al. 2017).
Based on what mentioned above, it is obvious that linear model coefficients vary in different regions and climates. Therefore, the meteorological data of the same region should be used to develop a local model. Using radiosonde stations for a small area is appropriate; however, the model obtained from a radiosonde station cannot be generalized to other areas. The study area in this research was Iran. This country has long been known for its diverse climate. Due to its mountains as well as its proximity to open waters and deserts, this country has all kinds of climatic conditions and differences among the four seasons. Therefore, as in previous research, a local linear model cannot be obtained for this region with merely radiosonde data. In this study, ERA5 reanalysis datasets were used at a compact grid of 0.125° × 0.125° and each grid point was considered as a station. Then, grouping and clustering methods were employed in ArcGIS software (ESRI) to perform modeling.
Nowadays, grouping and clustering methods are widely used in various sciences such as earth sciences, economics, medicine to identify and simplify the behavioral patterns of phenomena. Venkatramanan used grouping method in GIS environment to evaluate the quality and characteristics of groundwater (Venkatramanan et al. 2015). Also, Cudjoe (2020) conducted a study on spatial distribution of groundwater using spatial and statistical analyses and clustering techniques in South Africa (Cudjoe 2020). In this study, the linear model coefficients were obtained for each grid point. Then, the region was divided into smaller areas using spatial grouping methods and considering the two components of the linear model coefficient. The GPT2w model is another global model that can be used to obtain the T m for any desired point. This model is an empirical model obtained using ERA-Interim data (Böhm et al. 2015). In this model, only by having the location of a point, the meteorological parameters of temperature, pressure, water vapor pressure, as well as T m along with their annual and semi-annual amplitude can be obtained. Wang et al. compared this model with the Bevis model and the results of ERA-Interim data on radiosonde stations around the world (Wang et al. 2016). Another common model is used to study the T m , known as the harmonic model. This model includes an average T m with an amplitude of the annual cycle of T m . Schueler et al. (2001) developed this model using numerical models of the National Centers for Environmental Prediction (NCEP) on a global scale. In this study, Bevis models, linear grouping model (LGM), linear nearest grid point model (LNGPM), GPT2w, and harmonic model (HM) were used to investigate the T m in Iran. In the next step, the effect of the results on the IWV parameter was investigated using ZTD observations at IGS station in Tehran. In this research, in Sect. "T m , PWV, and IWV retrieval", the basic concepts of T m and its relationship with IWV and PWV are discussed. Section "Datasets and methodology" introduces the datasets used as well as the methodology. In Sect. "Results and discussion", the results of weighted mean temperature of each method are presented, and in Sect. "Conclusion", the conclusions of this research are presented. Appendices A and B also show how the T m is calculated from radiosonde data and ERA5 data.

T m , PWV, and IWV retrieval
According to Eq. 1, the zenith delay is divided into hydrostatic and nonhydrostatic components (Hopfield 1969): ZTD is the zenith total delay that can be obtained by GNSS methods with the accuracy of about ± 7 mm (1) ZTD = ZHD + ZWD. (Pacione & Vespe 2008), and ZHD is the zenith hydrostatic delay and can be calculated by standard models, e.g., the Saastamoinen model as presented in Eq. 2 within a millimeter accuracy (Böhm & Schuh 2013;Davis et al. 1985;Saastamoinen 1972): where p is the surface pressure, is the latitude of the station, and h el is the geodetic height of the station. On the other hand, it is very difficult to find an equation for modeling zenith nonhydrostatic delay, due to the variations in water vapor in the atmosphere, and high accuracy for this parameter cannot be achieved with surface measurements. Equation 3 is one of the zenith nonhydrostatic delay models (Askne & Nordius 1987): where e , T m , and denote water vapor pressure, weighted mean temperature, and water vapor correction factor, respectively, also k ′ 2 and k 3 are the atmospheric coefficients, g m is the mean gravitational acceleration, and R d is specific dry gas constant. As mentioned earlier, the accuracy of zenith nonhydrostatic delay models is very low, so this delay is unknown in Eq. 1 and it was calculated by known parameter ZHD and estimated ZTD. In GNSS methods, precipitable water vapor (PWV) is associated with ZWD by Eq. 4 (Bevis et al. 1994): is obtained by Eq. 5 as follows: where k 1 , k 2 , and k 3 denote atmospheric coefficients, R d and R are coefficients of dry and wet gases, and is water density in the liquid state. The integrated water vapor (IWV) parameter is also sometimes used to indicate the water vapor content in the atmosphere. IWV is defined as integral Eq. 6: where is water density. According to Eq. 7, PWV and IWV are numerically correlated with each other in the same water density (Bevis et al. 1992): In Eq. 5, according to the error propagation law Dividing both sides of Eq. 8 by PWV In Eq. 9, value of ≈ 6 * 10 −5 K −1 ; therefore, the relative error of determining PWV is approximately equal to the relative error of T m . In other words, achieving an accuracy of 1% for PWV requires an accuracy of 2.74 K for the T m (Bevis et al. 1994;Wang et al. 2005). Therefore, it is very important to have an accurate model for calculating the T m based on the conditions of the region and using data related to the same region. The T m can be calculated by Eq. 10 ( Davis et al. 1985): where e and T are water vapor pressure and temperature in the zenith direction, respectively. h 0 and h tp are the heights of the station (surface height or antenna height) and heights of troposphere, respectively. In this formula, the height of troposphere can be 13-30 km depending on the region (Hofmann-Wellenhof et al. 2008). However, in general, the height of the troposphere is considered such that the water vapor content can be ignored. In the following section, data sources and methodology are presented.

a. Radiosonde data
Radiosondes are a set of sensors and instruments that are sent by a balloon to reach the upper layers of the atmosphere to measure the atmospheric parameters along its path. The air temperature, dew point temperature, and speed and direction of wind are presented in standard pressure layers along with (7) IWV = .PWV.
height. These data are usually provided over a 12-h period and can be retrieved free of charge from the National Oceanic and Atmospheric Administration (NOAA) agency at: https:// ruc. noaa. gov/ raobs. The water vapor pressure can be calculated by Eq. 11 in each layer using the dew point temperature presented in each layer (Rózsa 2011): where e is the water vapor pressure and T d is the dew point temperature. When using radiosonde data, it is essential to pay attention to imprecise and sometimes erroneous observations. Therefore, prior to using raw observations, the procedure of eliminating erroneous observations should be done . See Appendix A for more information.
(11) e = 6.112 * exp 17.62T d 243.12 + T d , Figure 1 shows the distribution of 12 radiosonde stations in Iran. Radiosonde data from 2007 to the end of 2019 were downloaded. Table 1 also shows the names and locations of the radiosondes used in this study.

b. ERA5 reanalysis datasets
Nowadays, reanalysis datasets are considered important data sources for modeling meteorological parameters. Given that these data are presented at a regular grid with high spatial and temporal resolution, they are very popular for modeling. In preparing these data, various meteorological sources, including radiosondes, have been used, but over time, several versions of these data have been provided, each of which has an improvement over the previous version. ERA5 is also among the reanalysis dataset that has recently received great attention. Zhou investigated the performance of this dataset compared to ERA-Interim dataset and found that ZTD and SPD values obtained from ERA5 dataset had better results than ERA-Interim (Zhou et al. 2020). The ERA5 data used in the present study had a spatial resolution of 0.125° and a 6-h temporal resolution. These data were downloaded from the northern latitude [24°-41°] and eastern longitude [43°-64°] geographical regions. The observation period of these data was from 2007 to the end of 2019 and included 23,153 grid points. Three parameters of temperature, specific humidity, and geopotential number were used in 37 standard pressure layers to calculate the T m . Appendix B describes the steps for calculating the T m from the ERA5 data. Figure 1 illustrates the mean values of T m in this observation period. The mean values of this parameter in the region vary between 260 and 290 K. The highest values were associated with the south and southeast, and the lowest values were observed in the north and northwest.

c. IGS data
The international GNSS service (IGS) has provided free access to high-quality GNSS data since 1994. IGS accurately estimates and archives GNSS products including orbital, satellite and receiver clock, ionospheric and tropospheric information and then distributes them in IGS ftp sites. Atmospheric data include tropospheric ZTD information and north and east horizontal gradients as well as ionospheric TEC information. The ZTD data used in this section had a nominal accuracy of 4 mm (http:// www. igs. org/ produ cts) and provided daily with a temporal resolution of 300 s. There was only one IGS station in the study area, located in Tehran. The ZTD data of this station were used to convert the results obtained from T m modeling to IWV. Also, the results of the two grouping models as well as Bevis model were converted to IWV using ZTD and compared.

d. Grouping analysis
Grouping is among the methods used to simplify phenomena. In grouping, events that have common features are placed in the same group, so the groups have the most differences with each other. In spatial grouping, the grouped phenomena are location-dependent and have one or more characteristics, based on which the grouping is done. In this study, each grid point was regarded as a phenomenon in grouping and the linear model coefficients of each grid point were considered as characteristics used in grouping. The spatial grouping was performed on grid points using GIS environment, Esri ArcGIS software, and Python programming language. The groups of 2 to 15 components were statistically evaluated to determine the optimal number of groups in grouping. Calinski-Harabasz pseudo-F statistic (CH index) was calculated for each number of groups. CH index represents intra-group similarities and inter-group differences. This index can be calculated by Eq. 12 as follows: R is calculated as follows: For TSS and ESS, we have: where n is the number of grid points, n i is the number of grid points in the ith group, n c is the number of groups, and n is the number of variables used for grouping, which was equal to 2 in this study. V k ij is the value of the kth variable from the jth grid point in the ith group. V k is the mean value of the kth variable, and V k i is the mean value of the kth variable in the ith group. TSS is the sum of squared deviations from the total mean, and ESS is the sum of squared deviations from the mean in each group. In the following, the results of the mentioned datasets are presented.

e. GPT2w and harmonic model (HM)
The GPT2w model is a global model for calculating meteorological parameters including the weighted mean temperature developed by Böhm et al. (2015). In this model, the T m can be reached using the location. Many researchers have used this model along with other local models to obtain the T m in their target region (Sun et al. 2019;Wang et al. 2016). This model is made using ERA-Interim data. Due to the fact that ERA5 data are more up-to-date than ERA-Interim data, in this study, to compare the results, the results of this model are given along with other models. The harmonic model is another model for T m modeling which consists of two parts: the first part is an average annual T m and the second part is an amplitude to consider the annual period of T m . Schueler et al. used

a. Local model and accuracy of Bevis model
The T m values were obtained for each station using radiosonde station data (Eq. 9). The accuracy of the fitted model (local model) was compared with that of Bevis model in each station to evaluate the efficacy and accuracy (RMSE) of Bevis model. Figure 2 presents the T m values compared to surface temperature. The simple red line represents the best fitted line based on the least squares on OIII radiosonde station data located in Tehran. The blue circular dashed line shows the line obtained from Bevis model. As it can be seen, Bevis model clearly had a deviation from the local model. This difference increased with increasing surface temperature. Table 2 shows the coefficients of local linear and Bevis models with their accuracy. Bevis' model was about 2 K less accurate than the local model. The Bevis model accuracy was also obtained in other radiosonde stations. Figure 5 indicates the Bevis model (blue line with rhombic dots) accuracy ranging from 3.859 (OISS in Fig. 1) to 8.357 K (OIKK in Fig. 1) in radiosonde stations. Stations located in warmer areas such as the station located in Kerman (OIKK in Fig. 1) and Bandar Abbas station (OIKB in Fig. 1) had the lowest accuracy in Bevis model.

b. Linear grouping model (LGM) and linear near grid point model (LNGPM) with ERA5 data
The linear model coefficients were calculated for each grid point to model the weighted mean temperature using ERA5 data. The values of coefficient a 1 ( T m = a 1 .T s + a 0 ) in the region ranged from 0.406 to 1.16 and for a 0 from − 61.627 to 163.263 for 23,153 grid points. Table 3 indicates the statistical values of these two coefficients. Based on the results, the best model for the region can be T m = 0.737T s + 64.087 . The multivariate spatial grouping was used in ArcGIS software to locally model the T m . The linear model coefficients at each grid point were employed as grouping variables. The groups of 2 to 15 members were statistically evaluated to determine the optimal number of groups in grouping. The CH index was calculated in each group. Figure 3 shows the CH index values for each number of groups. As it can be seen, n c = 3 with the highest CH index values was known as the best number of groups. Table 4 also indicates the statistical values of these three groups. Figure 4 presents the coverage area of each group in the region. Accuracy of T m in LGM was evaluated using radiosonde data as a reference. Figure 5 shows the RMSE results of this method (red line     with square dots) along with 4 other methods. The results showed that 7 out of 12 stations ('OIMB', 'OIFM', 'OIKK', 'OITT', 'OIII', '9999(1)', 'OICC') had higher accuracy in grouping model than Bevis model. In order to evaluate the accuracy of the linear model obtained from the grid points, the linear model of the nearest grid point (LNGPM) to each radiosonde was considered. The RMSE of this model in the region was estimated at around 3.766 to 5.956 K. Figure 5 shows the RMSE results of this method (green line with circular dots) compared to the other 4 methods. Five stations that had lower accuracy in LGM than Bevis model also had lower accuracy in LNGPM model (except for 'OIKB' station), indicating that the accuracy of numerical data in grouping results is very important.

c. Harmonic model with ERA5 data (HM) and GPT2w
In this study, using ERA5 data, the parameters of HM were calculated by the least squares method. Table 5 shows the results of the parameters of this model for each of the radiosondes. RMSE results of this method were obtained by considering radiosonde data as a reference between 2 and 4 K which shows that the HM has a higher accuracy in the region than the linear model. Figure 5 shows the RMSE results of this model (gray line with triangle points) along with 4 other models. The results of GPT2w model for radiosonde stations were also calculated. The results of this model were very close to the HM, and except for one station ('9999(1)' station) it had a slightly lower accuracy than the HM. Figure 5

d. Evaluation by IGS station
In this section, the impact of T m modeling on IWV parameter is investigated. The IWV values of LGM, LNGPM, HM, GPT2w, and Bevis models were estimated using ZTD values obtained from IGS station located in Tehran. The IWV of the local model was also calculated using radiosonde data. Figure 6 shows the IWV calculation steps of the five models.
The IWV values derived from radiosonde data were used as a benchmark values to compare the accuracy of the other five models. Given that IWV values vary in different seasons, the accuracy of the five models was estimated in each season separately. Table 6 shows the RMSE values of IWV for each model. According to this table, the accuracy of GPT2w and HM models is very close to each other and is higher than linear models. The highest difference between these models is in summer, which is about 0.1 kg m −2 between the Bevis and GPT2w. In spring, the highest difference between the Bevis and HM models is about 0.1 kg m −2 .

Conclusion
T m is used in determining IWV and PWV in GNSS methods. This parameter has been modeled in different regions since the last decade. This study was conducted in Iran, which has a diverse climate. In order to find the best model, 5 models were used to calculate the T m . Three linear models and two nonlinear models were considered. Bevis models and linear grouping model (LGM) and linear nearest grid point linear model (LNGPM) as linear models and harmonic model (HM) and GPT2w model as nonlinear models were investigated. ERA5 data were used for modeling over a 13-year period between 2007 and end of 2019; radiosonde data from this period were also downloaded. These data were used as a reference to measure the accuracy of the obtained models. Accordingly, the accuracy of Bevis model in the study area was estimated to be 3.859-8.357 K. Also, the accuracy LNGPM ranged from Fig. 6 Steps of calculating IWV from 5 models and comparing them with the results of radiosonde data 3.766 to 5.956 K. In LGM method the region was divided into three groups using the grouping method. The results showed that 7 out of 12 stations in LGM had increased accuracy. It was found that the final results are related to the accuracy of grid points. HM and GPT2w models had very close accuracy and in most stations had higher accuracy than linear models. The T m values obtained from five models were converted to IWV using the ZTD observations of the IGS station in Tehran. According to these results, the accuracy of GPT2w and HM models was very close to each other and higher than the accuracy of linear models. By dividing the results in different seasons, it was found that the biggest difference between the models in summer was between Bevis and GPT2w models and in spring between Bevis and HM models was about 0.1 kg m −2 .
According to the results, the use of HM and GPT2w models leads to higher accuracy for T m as well as IWV than linear models for this region.

Data availability statement
All data are available in the public domain, and the source address is given in the paper.

Calculation of T m from radiosonde data
Radiosonde data provide temperature, dew point temperature, altitude and pressure levels. According to Eq. 9, the dew point temperature is converted to water vapor pressure. As mentioned, radiosonde data sometimes contain data with high error or large gaps. Therefore, these data must be filtered before using it. He et al. applied six filters in the use of radiosonde data, which are described below ). 1-The first layer with data should not be more than 20 m from the height of the station. 2-The difference between the two layers containing data should not be more than 10 km. 3-The difference between the two layers of successive pressure should not be more than 200 hPa. 4-The total number of layers should not be less than 20 layers. 5-The height of the highest layer should not be less than 20 km. 6-The highest height of moisture measurement should not be less than − 4 ( mean troposphere height and standard deviation). In order to calculate the integral of Eq. 8, Eq. 17 in numerical mode can be used: where i is the index of the layer and Δh i the thickness of the layer i . Also, e i and e i+1 are the water vapor pressure in , layers i and i + 1 ; T i and T i+1 the temperature are in layers i and i + 1 . N is also the total number of layers.

3
where e is the water vapor pressure in hp, p is the air pressure in hp, and q is the specific humidity, and ε is calculated from Eq. 23 and is equal to 0.622: According to the above, in each pressure layer, the values of temperature, water vapor pressure, and altitude are specified. Equation 17 can be used to calculate the integral of Eq. 8.