Use of RegCM gridded dataset for thunderstorm favorable conditions analysis over Poland—climatological approach

The paper analyzes data that can serve as an equivalent of thunderstorm observations where the meteorological station network has a low density (spatially discontinuous data) and the observational data have poor temporal homogeneity. For this reason, a Regional Climate Model (RegCM) dataset was tested. The Most Unstable Convective Available Potential Energy index value (MUCAPE) above the 200 J kg−1 threshold was selected as a predictor describing favorable conditions for the occurrence of thunderstorms. The quality of the dataset was examined through a comparison between model results and soundings from several aerological stations in Central Europe. Good, statistically significant (0.05 significance level) results were obtained through correlation analysis; the value of Pearson’s correlation coefficient was above 0.8 in every single case. Then, using methods associated with gridded climatology, data series for 44 weather stations were derived and an analysis of correlation between RegCM modeled data and in situ thunderstorm observations was conducted with coefficients in the range of 0.75–0.90. The possibility of employing the dataset in thunderstorm climatology analysis was checked via a few examples by mapping monthly, seasonal, and annual means. Moreover, long-term variability and trend analysis based on modeled MUCAPE data were tested. As a result, the RegCM modeled MUCAPE gridded dataset was proposed as an easily available, suitable, and valuable predictor for thunderstorm climatology analysis and mapping. Finally, some limitations are discussed and recommendations for further improvements are given.


Introduction
Thunderstorms are regarded as commonly occurring atmospheric phenomena over the area of Poland and Central Europe in the summer season. Because of their intensity, violent nature, as well as socio-economic consequences, thunderstorms are also regarded as an extreme phenomenon whose spatial and temporal variability pattern is an important element in analysis and risk assessment for a given geographic area.
In order to conduct a reliable climatologic analysis of thunderstorm occurrence probability as well as to investigate long-term variability, it is essential to obtain data which are temporally (at least a 30-year period) and spatially continuous. The homogeneity of the data series is also crucial in this process.
As is well known, thunderstorms are a phenomenon which is discontinuous both spatially and temporally. In spite of the development of meteorological radars and modern methods of lightning detection as well as satellite products, climatological long-term data for Poland are based on visual observations made at weather stations.
The strongly local nature of thunderstorms, their spatial variability, and the subjectivity of visual observations made at weather stations make traditional observational data difficult to interpolate. Moreover, due to the low density of the weather station network, reliable interpolation and map creation are nearly impossible.
This issue was raised by the IPCC (2001). Attention was called to the lack of sufficient information on the course of such small-scale phenomena needed to analyze trends in thunderstorm occurrence with respect to progressing climate change.
The report also stressed the growing need for climate models with proper spatial resolution to assess and analyze spatial patterns on the basis of homogenous and spatially continuous data (Goody et al. 2002; Trenberth et al. 2002).
Research in recent years on spatial and temporal patterns of thunderstorm occurrence shows that reanalyses of observational data can be used successfully (Bengtsson et al. 2004, 2007; Dee et al. 2011). Because of their homogenous character, reanalyses serve as a valuable background for the long-term analysis of the variability of atmospheric phenomena (Bengtsson et al. 2004, 2007; Dee et al. 2011). For this reason, reanalysis data were also used in the assessment of the variability of thunderstorm occurrence in the most recent IPCC report (2013). However, their spatial resolution is insufficient, and consequently, numerous attempts have been made to employ downscaling methods and various types of climate models (e.g., Cavazos and Hewitson 2005; Yoshimura and Kanamitsu 2008; Dettinger 2013).
Data derived from a model do not contain information on a specific thunderstorm occurrence; they may, however, be used for determining conditions that favor thunderstorm occurrences over a given area, which has been confirmed by numerous works in this field (e.g., Marsh et al. 2007, 2009; Trapp et al. 2007, 2010; Lombardo and Colle 2010; Allen et al. 2011; Gensini et al. 2014a). Of course, the results need to be approached critically, given the analytical errors that result from errors in observational data and from the modeling method used (Thorne and Vose 2010).
Nevertheless, modeled data, following validation, make it possible to create a temporally and spatially continuous dataset, so that the climatology of conditions that favor thunderstorms can be reconstructed on various scales: global (Brooks et al. 2003; Riemann-Campe et al. 2011), continental (Romero et al. 2007), or regional and national (Schneider and Dean 2008; Gensini and Ashley 2011; Holley et al. 2014).
The main aim of this paper is an evaluation of the possibility of the use of data obtained from the Regional Climate Model (RegCM) in order to determine conditions that favor thunderstorm occurrences in Poland in the period 1966-2010. Additionally, possible use in climatological analysis on various temporal scales including long-term variability analysis was investigated.
2 Data and methodology

Observations vs. modeled data
In the period 1966-2010, there were 44 weather stations (synoptic type) in Poland that possessed almost continuous records of thunderstorm observations. For a relatively large country (ca. 313,000 km²), this number of stations is definitely insufficient to create fully reliable climatologic maps with spatially continuous coverage. Additionally, the uneven distribution of the stations (especially in the south and east of the country; Fig. 1) results in the need for significant data extrapolation, which causes further errors.
Because of the abovementioned factors as well as the qualitative character of observational data (apart from duration of a phenomenon, it is difficult to determine its intensity through measurement), modeled data were analyzed instead. In addition, it was decided to conduct a climatologic analysis of conditions that favor convective phenomena occurrences, as we cannot derive data on the actual thunderstorm occurrence from a model.
To obtain spatially continuous data, data from the NCEP/NCAR reanalysis (United States National Centers for Environmental Prediction-National Center for Atmospheric Research) (Kalnay et al. 1996; Kistler et al. 2001) were used. Data with a temporal resolution of 6 h were used for the surface level (2 m above ground level) and for 17 isobaric levels in the atmosphere. Because of insufficient spatial resolution (2.5° grid; Fig. 1) on the regional scale, the data were used as inputs in the process of downscaling (Wilby and Wigley 1997) with the RegCM. The model, thanks to improved parameters of atmospheric physics and data on water resources as well as land cover, allows for the modeling of weather conditions for a domain on a regional scale (Giorgi and Anyah 2012). Thus, homogenous data for the surface layer (2 m) and 23 upper levels (with 50 mb as the uppermost) with improved temporal and spatial resolution (3 h and ca. 20 km, respectively) in a regular grid were obtained.

MUCAPE as a predictor for thunderstorm occurrence
To determine the conditions that favor thunderstorm occurrences in Poland, the CAPE parameter (Convective Available Potential Energy, J kg−1) was chosen, considering both forecasters' experience and numerous international studies on its use as a predictor of convective phenomena (e.g., Rasmussen and Blanchard 1998; Brooks et al. 2003; Craven and Brooks 2004; Manzato 2005; DeRubertis 2006; Brooks et al. 2007; Adams and Souza 2009; Kunz et al. 2009; Riemann-Campe et al. 2011) as well as studies on the long-term variability of convective conditions (Ye et al. 1998; Trapp et al. 2007; Brooks 2013).
CAPE provides information on atmospheric instability by determining the amount of potentially available convective energy (positive instability energy value in an analyzed vertical profile). Testing several CAPE index derivatives led to the selection of MUCAPE (Most Unstable CAPE, J kg−1) as the final predictor. MUCAPE is the instability energy computed for an air parcel at the most unstable isobaric level (the level at which the CAPE index has the highest value in the analyzed profile). It is used in thunderstorm detection as an index that is less sensitive to fluctuations in surface moisture and the boundary layer environment than CAPE, which considers buoyancy over the depth of the atmosphere (Lombardo and Colle 2010; Allen et al. 2011).
As a threshold value determining possible thunderstorm occurrence, MUCAPE >200 J kg−1 was selected. This choice was influenced by an analysis of thunderstorm occurrences in Poland as well as by studies that have addressed this issue. Even CAPE >100 J kg−1 is regarded as a threshold ensuring that the environment can be characterized as convective (Brooks et al. 2003; Kunz et al. 2009; Gensini et al. 2014b); in the USA, however, thunderstorms mainly appear when CAPE is about 500 J kg−1 (Rasmussen and Blanchard 1998). In Europe, the same type of comparison shows lower CAPE values during stormy weather, for example, 260 J kg−1 in Spain and 210-400 J kg−1 in Switzerland (Siedlecki 2009).
In the study conducted for The Netherlands, Haklander and Van Delden (2003) determined a value of MUCAPE ≥168 J kg−1 as an optimal threshold for thunderstorm probability forecasting. In the few studies carried out for thunderstorm conditions in Poland, a value of CAPE >300 J kg−1 has been used most often (Malinowska 2011). However, this value eliminates most cases of thunderstorms that occur outside of the main thunderstorm season. Brooks et al. (2003) confirm that lower values of CAPE can be accompanied by severe thunderstorms. The threshold value of 200 J kg−1 selected for this study appears to be optimal for climatologic analysis using an annual approach, especially because the intensity of the thunderstorm phenomenon is not the main subject of this paper, but rather the temporal and spatial potential of conditions that favor the occurrence of convection.
Data modeled via the RegCM were first used to calculate MUCAPE values and then were validated in relation to the occurrence of favorable (extreme) parameters. All grid points for a chosen domain were analyzed for each of the eight daily observational terms (every 3 h) for all days in the period 1966-2010. To compute the MUCAPE index, 23 sigma levels (levels of model calculations) were taken into account. For every given level, a CAPE index value was calculated using the equation proposed by Moncrieff and Miller (1976):
$$\mathrm{CAPE} = g \int_{z_{LFC}}^{z_{EL}} \frac{T_{vp} - T_{ve}}{T_{ve}} \, dz$$
where $g$ is the acceleration due to gravity, $z_{LFC}$ is the height of the level of free convection, $z_{EL}$ is the height of the equilibrium level (neutral buoyancy), $T_{vp}$ is the virtual temperature of the specific parcel, $T_{ve}$ is the virtual temperature of the environment, and $z$ is the height.
Next, the maximum for all 23 levels for a given point in a set time period was taken as MUCAPE.
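The CAPE integral and the maximum over levels described above can be sketched in a few lines. This is a minimal illustration, not the authors' processing code: the function names `cape` and `mucape` are hypothetical, the parcel virtual temperature profiles are assumed to be supplied by a separate parcel-lifting step, and the integration is simplified by clipping negative buoyancy to zero instead of locating the level of free convection and the equilibrium level explicitly.

```python
G = 9.81  # acceleration due to gravity, m s^-2


def cape(z, t_vp, t_ve):
    """Approximate CAPE (J kg^-1) with the trapezoidal rule.

    z    -- heights in m, in increasing order
    t_vp -- virtual temperature of the lifted parcel at each height, K
    t_ve -- virtual temperature of the environment at each height, K

    Simplifying assumption: negative buoyancy is set to zero, so only
    layers where the parcel is warmer than the environment contribute.
    """
    buoyancy = [max(0.0, G * (tp - te) / te) for tp, te in zip(t_vp, t_ve)]
    total = 0.0
    for i in range(1, len(z)):
        total += 0.5 * (buoyancy[i] + buoyancy[i - 1]) * (z[i] - z[i - 1])
    return total


def mucape(z, t_ve, parcel_profiles):
    """Maximum CAPE over parcels lifted from different starting levels.

    parcel_profiles -- one t_vp profile per candidate starting level
    """
    return max(cape(z, t_vp, t_ve) for t_vp in parcel_profiles)
```

With 23 model levels, `parcel_profiles` would hold 23 profiles, one per starting level, and a grid point would count as favorable at a given term when the returned value exceeds 200 J kg−1.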

Mapping methods and gridded climatology possibilities
Based on the set MUCAPE threshold, the number of cases that met the "appropriate" conditions was calculated for every point of a regular grid for each month of a given year in the period 1966-2010. Then, mean monthly, seasonal, and annual frequencies were computed. Next, they were interpolated with the use of radial basis functions to create a continuous raster dataset. In this way, more than 550 maps of the frequency of MUCAPE >200 J kg−1 occurrence over the territory of Poland were generated. By using spatial analysis tools provided by geographic information systems (GIS), it was possible to determine long-term variability of the analyzed index for any selected site (Fig. 2).
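The counting step for a single grid point might look like the sketch below. It assumes that a day counts as favorable when at least one of its 3-hourly MUCAPE values exceeds the threshold; the text does not spell out this rule, so it is an assumption, as are the function name `monthly_favorable_days` and the input layout.

```python
from collections import defaultdict
from datetime import datetime

THRESHOLD = 200.0  # J kg^-1, threshold used in the study


def monthly_favorable_days(records, threshold=THRESHOLD):
    """Count favorable days per (year, month) for one grid point.

    records -- iterable of (datetime, mucape_value) pairs
    Assumption: a day is favorable if any of its values exceeds the
    threshold. Months without favorable days are absent from the result.
    """
    days = defaultdict(set)
    for when, value in records:
        if value > threshold:
            days[(when.year, when.month)].add(when.day)
    return {key: len(day_set) for key, day_set in days.items()}


sample = [
    (datetime(1966, 6, 1, 12), 250.0),
    (datetime(1966, 6, 1, 15), 300.0),  # same day, counted once
    (datetime(1966, 6, 2, 12), 150.0),  # below threshold
    (datetime(1966, 6, 3, 0), 201.0),
    (datetime(1966, 7, 1, 0), 500.0),
]
counts = monthly_favorable_days(sample)
```

Dividing each monthly count by the number of days in that month and averaging over the 45 years would give the mean monthly frequency that is later interpolated.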
The solutions used in the study follow methodology associated with so-called gridded climatology (Tveito et al. 2000, 2005; Ustrnul 2001; Ustrnul and Czekierda 2006; Hahn and Warren 2007; Perčec Tadić 2010; Cecil et al. 2014) and offer new possibilities in the case of insufficient in situ data. These methods are helpful especially in long-term variability and trend analysis.

Modeled and reanalysis data for MUCAPE climatology analysis over Poland
The evaluation of the possible use of modeled data was preceded by RegCM data quality validation. The RegCM monthly number of days with MUCAPE >200 J kg−1 was calculated and correlated with the same parameter computed from upper air data (aerological data/soundings) derived from five aerological stations: three Polish (Wroclaw, Leba, and Legionowo) and two located nearby (Prague and Poprad) (Fig. 1). The Prague data series was particularly valuable because of the number of soundings conducted daily at the station (every 6 h at 00, 06, 12, and 18 UTC in contrast to two soundings obtained at 00 and 12 UTC at other stations), which is closest to the temporal resolution of the modeled data. All soundings were obtained from the University of Wyoming sounding archive (University of Wyoming 2014). Figure 3 shows the completeness of the data available at the Wyoming database.
Due to data gaps, three time intervals were selected for correlation analysis. The basic study period for all the stations analyzed was 1997-2010 (Table 1a). Eight years (2003-2010) with the most complete data were considered separately (Table 1b). Because of the best data availability, results for Prague and for Poprad (1978-2010) were tested for long time intervals (more than 30 years taking into account all gaps, Table 1c).
The data from the Polish weather stations were correlated additionally with the monthly number of thunderstorms observed at the nearest station.
The relationship between datasets was examined with the use of Pearson's correlation coefficient (PCC).
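For reference, Pearson's correlation coefficient between two equally long monthly series can be computed as in the following sketch. The function name `pearson` is hypothetical and the code only illustrates the standard formula; it is not the software used in the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)
```

Here `x` would hold the monthly number of days with MUCAPE >200 J kg−1 from RegCM and `y` the same quantity from soundings (or the monthly number of observed thunderstorms), restricted to months available in both series.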
All of the calculated PCC values indicate a strongly or very strongly positive relationship between MUCAPE calculated from RegCM data and soundings (PCC > 0.8, cf. RegCM vs. soundings in Table 1a, b and c). The lowest values were obtained for the Wroclaw station (PCC = 0.812) and for Leba (PCC = 0.836) over the period 1997-2010, which can be explained by a smaller number of soundings in comparison to other stations (Fig. 3). The analysis of the complete data set from 2003 to 2010 indicated a stronger relationship (0.888, Leba and 0.894, Wroclaw). In both cases, the strongest relationship between RegCM data and aerological data was observed for Poprad data, even though the number of soundings was, most of the time, smaller than that for Prague. Table 1a, b). For Polish stations, in both cases, the strongest relationship was observed for Legionowo. It should be mentioned that during the period with a homogenous number of available aerological soundings (Table 1b), the PCC values obtained for the relationship of observational data with the RegCM data and with soundings are nearly identical. Therefore, it can be argued that the derived RegCM data set is quite reliable and can be implemented in further analyses. Such a conclusion can be confirmed also by charts of monthly patterns with conditions that favor thunderstorm occurrences versus recorded observations of thunderstorms (Fig. 4). The RegCM data demonstrate temporal variability of convective phenomena quite well, and only in the case of Leba (Fig. 4a) are convective phenomena presented by these data as less frequent when compared to the aerological data. The cause of this difference may lie in the spatial resolution of the RegCM data (ca. 20 km, see section 2.1), which makes it impossible to present local climate characteristics. A larger number of thunderstorms were noted in Leba in comparison with the rest of the Polish coastline.
For this reason, the station in Leba, as the only coastal one, was included into a different thunderstorm region by Bielec-Bąkowska (2003 and. Taking into account the long-term correlation analysis for Prague and Poprad, a strong relationship between data sets and a slightly higher correlation coefficient for Poprad were also observed (Table 1c). However, by analyzing the monthly pattern of the number of days with MUCAPE >200 J kg −1 (Fig. 5), it can be observed that aerological data for Prague is quantitatively similar to the modeled data (Fig. 5a), which almost certainly was caused by more sounding data available per day (conducted every 6 h). In Poprad (706 m a.s.l.), located at the foot of the High Tatra Mountains, the environmental conditions for convection are more favorable. Therefore, a conclusion can be drawn that aerological data are being underestimated (Fig. 5b), whereas results obtained with the RegCM data represent the actual frequency of conditions that favor thunderstorm occurrences (as these could occur during the day in-between the aerological soundings carried out).

3.2 Use of MUCAPE RegCM dataset for spatial analysis and mapping: mean monthly, annual, and seasonal thunderstorm frequency
Using explanatory data analysis methods, RegCM modeled data were tested for their possible use in spatial analyses and the creation of climatologic maps as a predictor of thunderstorm occurrences in Poland. The use of the data as an explanatory variable was investigated for three temporal scales: monthly, seasonal (thunderstorm season from May to August), and annual.
Modeled data for the entire 1966-2010 period were used as an input into GIS zonal statistical methods, which made it possible to generate maps of average monthly probability of days with conditions that favor thunderstorm occurrence in every month of the year (Fig. 6). The output maps can be regarded as climate maps (45-year analysis) presenting temporal and spatial variability of convective conditions in Poland. The maps quite accurately represent actual winter season thunderstorm occurrences in Poland. During the winter season, the spatial pattern of thunderstorm occurrence is reversed: the most frequent observations are in the northwestern part of the country, with smaller frequency towards the southeast. The frequency is very small, usually 1-2 % and close to 0 % in December and January. During the warm half of the year, more frequent thunderstorm occurrence in the southeastern part of Poland is typical, with a maximum occurring between May and August (Bielec-Bąkowska 2003; Kolendowicz 2004).
Fig. 5 Long-term variability of the monthly number of days with MUCAPE >200 J kg−1 derived from the RegCM data and aerological soundings in Prague (a) and Poprad (b) (period of data accessibility from the University of Wyoming (2014))
In order to conduct a validation and confirm the possibility of using RegCM data as a variable that explains thunderstorm occurrences in Poland, the relationship between the modeled data and the mean monthly frequency of thunderstorm observations at these stations was investigated. Values of the monthly mean frequency of occurrence of days with MUCAPE >200 J kg−1 were attributed to the sites of the weather stations. To present the characteristics of the relationships between the data sets, scatterplots were used (Figs. 7 and 8). For data sets that did show a positive linear dependence, the value of the coefficient of determination (R²) was additionally computed. RegCM data showed a relatively strong dependence on observational data for the entire year and for the thunderstorm season, viewed both as a whole (Fig. 7) and when divided into separate months from May to August (Fig. 8). On a monthly scale, the best match between modeled data and thunderstorm observational data was found for June (R² > 0.8) and July (R² close to 0.7), which happens to be when thunderstorm activity is at its maximum and is characterized by a quite regular spatial pattern. A completely random distribution and no defined relationship are shown in scatterplots for the non-thunderstorm season, especially for the winter months.
Because of the low frequency of thunderstorm observations at that time, the results should not be treated as a proof of the poor value of modeled data, especially due to the fact that thunderstorm occurrence in Poland in this period can be described as a random event (Kolendowicz 2004). Considering all of the results, it may be acknowledged that the data set provided by RegCM can be a valuable explanatory variable in numerous spatial analyses.
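The coefficient of determination reported for the scatterplots can be illustrated with an ordinary least-squares fit of observed thunderstorm frequency against modeled frequency. The sketch below is only an illustration under that assumption; the function name `linear_fit_r2` is hypothetical and the text does not state which software was used.

```python
def linear_fit_r2(x, y):
    """Least-squares line y = intercept + slope * x and its R^2.

    x -- modeled frequency of days with MUCAPE above the threshold
    y -- observed thunderstorm frequency at the same stations
    Assumes x is not constant and y is not constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot
```

For a simple linear fit with an intercept, the returned R² equals the square of Pearson's correlation coefficient, so R² > 0.8 for June corresponds to a correlation of about 0.9 across stations.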

Use of the MUCAPE RegCM dataset for long-term spatial variability
In the last step of the research process, the possibility of using obtained modeled data for long-term variability spatial analysis was investigated.
The results of validation carried out with the use of aerological data (soundings) presented in section 2.1 confirmed that the derived gridded data can be successfully used to present MUCAPE index long-term variability. It can be suggested that the data can equally accurately show long-term variability of any other CAPE-derived index, which makes them exceptionally valuable material when the number of upper-air stations is small, as is the case in Poland.
To check the spatial relationship between long-term modeled and in situ data, correlation coefficients were calculated for 44 weather stations in Poland. As it was investigated in terms of long-term annual variability, a series consisting of 540 elements, monthly values for 45 years (45 × 12), were compared. PCC values for all 44 points were statistically significant at p = 0.001 level and varied from 0.75 in the Baltic Sea region to 0.90 in the southeastern part of the country. In the second step, PCC values were interpolated to obtain a spatially continuous map (Fig. 9).
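As a rough plausibility check of the reported significance, the usual t statistic for a correlation coefficient can be evaluated for the lowest reported value, r = 0.75, with n = 540 monthly pairs. The text does not state which test was applied, so the choice of this test is an assumption:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{0.75\,\sqrt{540-2}}{\sqrt{1-0.75^{2}}}
  \approx \frac{0.75 \times 23.19}{0.661}
  \approx 26.3
```

This is far above the two-sided critical value of roughly 3.3 for p = 0.001 at this sample size. The test assumes independent pairs; monthly series with a strong annual cycle are autocorrelated, so the effective sample size is smaller than 540 and the statistic should be read as an upper bound.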
An attempt was also made to check how PCC values varied spatially in particular periods of the year. Because of the revealed lack of a linear relationship and infrequent occurrence of thunderstorms in winter seasons (Fig. 8), the analysis was conducted for the period with the most active "thunderstorm months" (May, June, July, and August) and as a whole season (May to August) as well as for the year, taking into account all component data for the 45-year long period (Table 2). Monthly data from June were the most consistent for modeled and observational data (statistically significant results for all studied sites, PCC median = 0.54). The data for the rest of the studied months as well as for the thunderstorm season were also characterized by a statistically significant relationship for the majority of the weather stations (>80 %, PCC median about 0.40). Slightly worse (but sufficiently valuable still) results were obtained for the comparison of data series for the whole year, as PCC was statistically significant in 72.1 % of cases (PCC median = 0.38).
The research results clearly show that RegCM data on conditions favoring thunderstorms can provide a valuable input into various climatologic analyses. First, as continuous gridded data, they can become the basis for climate maps. Thanks to its geospatial format the dataset can be easily processed for very many iterations. Figure 10 shows the example of climate maps of the annual frequency of days with

Discussion
Knowledge of spatial and temporal patterns of extreme phenomena is a key factor in risk analysis on various levels. Thunderstorms are one of the most evident examples of such phenomena. Therefore, the analysis of thunderstorms' climatologic aspects at various spatial and temporal scales and the study of long-term variability are exceptionally important. Yet, multiple difficulties emerge during their analysis, as thunderstorms are an example of phenomena that occur locally (small-scale phenomena). Climate maps based only on the results of visual observation performed at weather stations usually contain errors that result from the interpolation of spatially discontinuous data. Therefore, there is a need for additional variables that explain the distribution of thunderstorm phenomena over time and in spatial terms. Today, the development of modeling techniques and easy access to data from reanalysis allow one to obtain spatially continuous information on indices that help characterize atmospheric instability for a vertical profile; such indices are usually available only for data from aerological soundings done at a few stations in the country. One of the available predictors is the MUCAPE (Most Unstable CAPE) instability index. In the study of the 45-year period (1966-2010), the index appears to be a very good predictor for both the spatial and temporal distribution of annual, seasonal, and, for some months, monthly occurrences of thunderstorms in Poland. Thanks to numerical modeling and GIS (geographic information systems) techniques, the acquired data can serve as the basis for complex analyses both on a national and regional scale in accordance with gridded climatology methods. The purpose of the conducted analysis was to test the relevance of data obtained using the Regional Climate Model in order to determine conditions that favor the occurrence of days with thunderstorms.
The possible use of gridded data (derived from the RegCM model) in mean annual, seasonal, and monthly analysis was also studied. Additionally, long-term variability and trend analysis were investigated. Above all, it can be stated that modeled data can be successfully used for expanding the base of information provided by very limited aerological data (only a few stations in Poland) and it can be a valuable source of data for studies on CAPE climatology for the territory of Poland.
Furthermore, most of the cases of the relationship between modeled and observational data that were studied show satisfactory Pearson's correlation coefficient values (statistically significant at least at p = 0.05). Only analyses for months other than those of the thunderstorm season did not show the usefulness of data derived from RegCM. This may have been caused by an exceptionally low frequency as well as randomness in thunderstorm occurrences in this time period in Poland, especially when the whole data set is taken into account. It may be argued that modeled data reflect the seasonal diversity of thunderstorm occurrences particularly well-with a minimum in December and maximum in June and July.
It may be concluded that the method used in this study, which consists of the selection of favorable convective ambient conditions for describing thunderstorms' spatial and temporal occurrence patterns, is a good extension of observational data, especially in research studies on a regional scale and spatial analyses based on gridded climatology. It can be particularly helpful for analysis in areas with few observational points or when such points do not represent the diversity of the whole area. Spatially continuous gridded data can also be used as environmental variables that explain the spatial distribution of thunderstorms in Poland, and this can serve as a basis for interpolation in regression kriging methods (Hengl 2007), especially on the scale of a year or the months in the thunderstorm season. However, it needs to be remembered that modeled data are averaged and are not always sufficient to fully reflect local diversity of the geographic environment (e.g., most currently used models still do not possess accurate information on relief and land cover).
The results suggest also the possibility of using modeled data to describe long-term variability and to conduct trend analysis of thunderstorm occurrence in Poland, which is crucial in recent climate change research.
In summary, the selection of the MUCAPE index as a measure of the potential number of days with thunderstorm occurrence turned out to be helpful and satisfactory.
As far as limitations of the analysis are concerned, the validation was conducted via the point-to-point approach. The use of buffer zones around weather stations would probably improve the results, as not all of the thunderstorms in the given area are observed at a weather station.
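One way such a buffer-zone comparison could be set up is sketched below: instead of the single nearest grid value, the maximum value among all grid points within a chosen radius of the station is used. Everything here is hypothetical, including the function names, the use of the maximum as the aggregate, and the radius, none of which are specified in the study.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def buffer_max(station, grid, radius_km):
    """Largest grid value within radius_km of the station, or None.

    station -- (lat, lon) of the weather station, degrees
    grid    -- iterable of (lat, lon, value) grid points
    """
    lat0, lon0 = station
    inside = [
        value
        for lat, lon, value in grid
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    ]
    return max(inside) if inside else None
```

With a grid spacing of ca. 20 km, a radius of a few tens of kilometres would include a handful of neighbouring grid points; the appropriate radius would have to be chosen and validated separately.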
Nevertheless, the paper serves as a baseline for further research and reveals possible improvements, including additional predictors of convective phenomena and improved climate models.