1 Introduction

The smart grid (SG) provides a highly environmentally friendly context for digital power customers, and its level of reliability is achieved by utilizing modern equipment.

There is a need for theoretical models based on common weather conditions that can be used for the prediction of daily or by-shift power distribution system interruptions as well as for interruption risk assessment based on immediate weather conditions. The reliability of power distribution systems is dependent on many variables such as load capacity and customer base, maintenance, and age and type of equipment. Prevailing weather conditions often are most responsible for degraded reliability; yet, this variable is often overlooked in reliability analysis.

Much of the focus of modeling the effects of weather on power distribution systems has remained on extreme weather conditions [1–3]. There is also a body of work that includes weather as a factor in the analysis of specific fault causes [4, 5]. However, models that use the combined effects of common weather conditions to predict the total number of daily or by-shift interruptions are presently unavailable.

Florida Power and Light (FPL), the largest utility in Florida, has been providing reliability data on interruptions for our research. In addition, weather data from the National Climatic Data Center (NCDC) is available free of charge to academic institutions and governmental bodies. This data is reported by 886 automated surface observation stations (ASOSs) located at airports around the country. With a simple model of the daily common weather data received from NCDC, the total daily number of interruptions can be consistently predicted, in a stochastic sense, with an \(R^{2}\) value as high as 50% in simulations using actual interruption data as the target value [6]. This indicates that weather has invisibly affected other cause codes.

Aside from the obvious culprits for interruptions, i.e. lightning and ground or line-to-line faults caused by vegetation and/or wind, the effects of common weather conditions on power reliability events have rarely been addressed and, even then, only broadly; that is, as a variable with few states such as fair, cold, windy, or raining. Tests performed on contaminated insulators have shown that the electrical characteristics of the insulators are altered when exposed to natural wetting, such as humidity or rain [7]. Corona effects are more pronounced at lower barometric pressure and can affect flashover rates [8]. Other weather or environmental phenomena may also contribute to power reliability events in ways that have not been considered.

Since the interruption data used to generate the models in [6] included all interruptions described by all cause codes for an entire day, and the weather data used was daily maximums or averages collected from point sources not usually central to the area being studied, this modeling was conducted with relatively inaccurate data. The fact that the results were consistently good indicates that there is a hidden weather component in many of the cause codes other than weather, and that those hidden components can be modeled more precisely by shortening the period over which the weather data is collected from daily to hourly and by improving the location of the point weather source.

Based on existing models for Florida, we expand a novel comprehensive framework to include weather variables not generally occurring in Florida. After translating the NCDC data and the FPL interruption data into usable formats, we use statistical methods and neural network theory to simulate the models and modify them as needed.

This paper is organized as follows. Section 2 provides the background and the need to develop this model, along with the data requirements. Section 3 presents the simulation of the interruption prediction model, first using a multivariable regression model and then a trained neural network for the best possible approximation. In Section 4, the application of the model is evaluated and the output results of the proposed interruption prediction scheme for several different geographical regions are compared with the actual interruptions. Section 5 concludes the paper and outlines future work.

2 Data analysis and processing

Incorporating relevant, existing models, this study will develop novel theoretical models of the effects of common weather conditions and apply them to the problem of predicting the daily or by-shift number of interruptions in power distribution systems, and to the development of real-time interruption risk assessment capabilities.

FPL has been providing reliability data of more than 4 million customers to the principal investigator of this effort and his team of researchers for the last three years. The reliability group at FPL is well-versed in the area of extreme weather research, particularly given the number of hurricanes they have endured. However, they have indicated to the researchers their belief that research into the effects of common weather conditions on interruptions is a unique and necessary field of study and have promised to continue to be an industry partner providing reliability data for this research. In addition, exhaustive literature searches have identified no studies that have used actual daily or hourly (short-term) common weather data to develop theoretical models that can be validated through simulation and experimentation.

Weather data from NCDC can be downloaded online and includes both daily summaries and hourly, or even half-hourly, reporting. Additionally, there are automated weather observation stations (AWOSs) and smaller weather observation sites that contribute data to the NCDC. FPL is installing its own weather stations at service centers centrally located in FPL’s various management areas, providing an additional source of weather data. As can be seen in Fig. 1, direct weather, such as lightning, has been determined to be the cause of 10% of the interruptions. However, the researchers have found that, when daily weather variables are considered in the modeling function, the daily number of interruptions can be stochastically predicted with an \(R^{2}\) value in the neighborhood of 50% in simulations using actual interruption data as the target value [9]. This suggests a weather component in most of the interruption cause codes, as illustrated in Fig. 1.

Fig. 1 Reported interruption causes

More precise modeling has two goals: finer time resolution of the weather data and a better-located point weather source. The first goal can be accomplished by the use of hourly (or even half-hourly) weather data, and the second can be achieved by reorganizing the interruption data reported by substations into datasets that are geographically centered on ASOSs. The addition of the privately owned FPL weather stations will also help centralize the interruption data around the point weather data source.
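As a rough illustration of moving from daily to by-shift resolution, hourly readings could be rolled up into shift-level summaries along the following lines. This is only a sketch: the field names (`time`, `temp`, `gust`, `rain`), the eight-hour shift length, and the function name are assumptions made for the example, not the format of the NCDC files or of the software used in this study.

```python
from collections import defaultdict
from datetime import datetime

SHIFT_HOURS = 8  # assumed shift length, giving three shifts per day


def summarize_by_shift(hourly_rows):
    """Group hourly readings into (date, shift index) buckets.

    Each row is a dict with hypothetical keys: "time" (ISO timestamp),
    "temp", "gust" and "rain". Returns mean temperature, maximum gust
    and total rainfall per bucket.
    """
    buckets = defaultdict(list)
    for row in hourly_rows:
        ts = datetime.fromisoformat(row["time"])
        buckets[(ts.date().isoformat(), ts.hour // SHIFT_HOURS)].append(row)

    summary = {}
    for key, rows in buckets.items():
        summary[key] = {
            "temp_mean": sum(r["temp"] for r in rows) / len(rows),
            "gust_max": max(r["gust"] for r in rows),
            "rain_total": sum(r["rain"] for r in rows),
        }
    return summary
```

The same grouping key could be widened to a full day or narrowed to a single hour without changing the rest of the function.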

The limitation of developing a model of the effects of common weather on reliability in Florida alone, as the researchers have been doing, is that Florida has a unique environment and climate; hence, the resulting models may not be broadly applicable. To counter this limitation, interruption data from other regions will be used to broaden the range of weather conditions being modeled.

The daily summary data that the researchers have been using to date often creates files with up to 40 columns and 14000 rows for computer analysis. The inclusion of hourly reporting and the use of interruption data from additional sources will increase tremendously the amount of data that requires archiving and correlation. Therefore, creation of a database that can manage that amount of information will be the first priority. Additionally, the weather data that is downloaded from NCDC is in an ASCII format that is not readily importable to the analysis software that will be used. Presently the researchers are employing custom written software to extract the daily weather information from the NCDC files and format it properly. To advance the project, additional software will be configured to handle the hourly NCDC data, the weather data provided by FPL’s weather stations, and any other required data that is not properly formatted.

The development of theoretical models will begin with models already developed for Florida and be expanded to include weather variables not generally occurring in Florida. Researchers will use statistical and neural network software to simulate the models, modifying them as needed. The inclusion of data from other regions will broaden the range of weather conditions for which the models can be validated.

In an effort to avoid duplication of previous efforts, the team intends to incorporate those existing models that are relevant to the project. For example, load prediction that involves the use of temperature and humidity to calculate the comfort zone, or heating and cooling degrees, is a mature technology that can be of use to this project, and these studies will not be repeated.

However, load flow prediction does not address power reliability directly, and the studies of flashovers due to ice buildup are geographically and causally specific. The models will be validated by producing significantly accurate predictions of the number or frequency of interruptions in simulations using actual weather and interruption data. The predictions will be probabilistic rather than deterministic and will provide a means of risk assessment rather than a fixed value for the number of interruptions that can be expected. This effort will provide a real capability to determine risk. The \(R^{2}\) of the predictions will be a statistic of interest for daily and by-shift predictions. Narrower periods would include hourly risk probability assessments.

This study will create a better understanding of the relationship between common weather conditions and the number of interruptions which, in turn, will facilitate a completely new spectrum of research on the reliability of power distribution systems. The predictor model provides the power industry with an opportunity to reduce the downtime of power interruptions by proper distribution of the service work force. These models also can be used for research into the relative reliability of a system under different weather conditions or at different times. Actual weather and interruption data can be used to train the predictor models, and then theoretical weather data can be entered into the trained models. The predictions can be used to rate the robustness of a system to common weather conditions. In addition, this process can be repeated at intervals before and after maintenance or reliability enhancement programs are implemented. This will enable researchers to determine whether these programs are producing the desired results.

The general data processing and interruption prediction process is shown in Fig. 2. The data received from the various sources are processed and tagged according to geographical location. The weather and fault files for the same location are then combined. The lightning strike (LS) data are collected separately and combined according to longitude and latitude. The interruption records are identified by their GPS location and merged as shown in the flowchart. Code has been written to automate the whole process of filtering, sorting, sifting, combining, and tagging.

Fig. 2 Flowchart of the proposed weather-based interruption prediction method
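The combining step of this process can be sketched as a join of daily weather records with interruption counts on a shared location and date key. The sketch below assumes each record has already been tagged with a management area; the field names (`ma`, `date`) and the function name are hypothetical and do not describe the authors' actual code.

```python
from collections import defaultdict


def merge_by_area_and_day(weather_rows, interruption_rows):
    """Attach a daily interruption count to every weather record.

    Both inputs are lists of dicts carrying hypothetical "ma" and "date"
    keys. Weather records with no matching interruptions get a count of 0.
    """
    counts = defaultdict(int)
    for row in interruption_rows:
        counts[(row["ma"], row["date"])] += 1

    merged = []
    for row in weather_rows:
        key = (row["ma"], row["date"])
        merged.append({**row, "interruptions": counts.get(key, 0)})
    return merged
```

Keeping days with zero interruptions in the merged set matters for the regression that follows, since dropping them would bias the fitted counts upward.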

In [10], efficient operation of a power system is defined in terms of the prediction of power demand: how much demand is required, and where and when, to maintain the efficiency of the power grid. Ambient temperature affects the variation in power demand. Peak and average temperatures vary widely between summer and winter days. The effect of ambient temperature moving away from the optimum temperature was modeled. Rather than using a fixed optimum temperature, heating degrees (HD) and cooling degrees (CD) are recalculated using local conditions to organize the data for each location or management area (MA). Raw weather data are processed following established power reliability and industrial practices. These factors and their calculations are included in the second level of Fig. 2 (‘organize data as per MA’) by organizing the data per local area, listed as MAs.

3 Simulation for interruption prediction

The proposed interruption prediction method is based on two factors: 1) historical weather conditions; 2) historical interruption data.

3.1 Interruption prediction based on weather parameters

This section analyzes the response of the distribution network to variable weather conditions. Four principal weather characteristics are considered in order to obtain an acceptable forecast of the distribution network response, expressed as the number of interruptions. The average temperature \(T\), the two-minute maximum sustained wind speed \(S_{\text{g}}\), the daily total rainfall \(R\), and the daily number of lightning strikes \(n_{\text{LS}}\) are the four influential weather factors considered in this study.

In the following four subsections, the equation for each of these factors is derived separately, and the effect of the other weather factors is temporarily neglected. After the effect of each factor has been evaluated, the equations are integrated to obtain the main equation modeling the relation between weather and reliability.

1) Temperature

To obtain the relation between temperature and the number of interruptions, a curve fit was performed, as shown in Fig. 3. The regression equation shown in this figure is

$$N_{\text{avg}}^{\text{tem}} = 78.79 - 2.076T + 0.01523T^{2}$$
(1)

where \(N_{\text{avg}}^{\text{tem}}\) represents the average number of interruptions as a function of temperature.

Fig. 3 Variation of mean \(N\) versus average temperature

Setting the first derivative of (1) equal to 0 gives the optimum temperature \(T_{\text{opt}}\). At this temperature, the minimum number of interruptions occurs.
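The derivative condition can be checked numerically. The short sketch below uses only the coefficients printed in (1); the variable and function names are chosen for the example. With these coefficients the minimum of the fitted parabola falls near 68 on the temperature axis, a value that is specific to the fitted dataset.

```python
# Coefficients of the quadratic fit in (1): N = a + b*T + c*T**2
a, b, c = 78.79, -2.076, 0.01523


def n_avg_tem(t):
    """Average number of interruptions predicted by (1) at temperature t."""
    return a + b * t + c * t * t


# dN/dT = b + 2*c*T = 0 gives the vertex of the parabola.
t_opt = -b / (2 * c)
n_min = n_avg_tem(t_opt)

print(f"T_opt is about {t_opt:.1f}; N at T_opt is about {n_min:.1f}")
```

Because the quadratic coefficient is positive, the stationary point is a minimum, which is why interruptions rise on both sides of the optimum.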

The relationship between temperature and the number of interruptions is expressed in terms of two parameters, the heating degree \(T_{\text{h}}\) and the cooling degree \(T_{\text{c}}\). Based on the ASOS data, \(T_{\text{opt}}\) is equal to 65°. However, it is important to recalculate the optimum temperature locally for each region.

$$N_{\text{avg}}^{\text{tem}} = N_{ 0}^{\text{tem}} + \alpha_{1} T_{\text{h}} + \alpha_{2} T_{\text{h}}^{2} + \alpha_{3} T_{\text{c}} + \alpha_{4} T_{\text{c}}^{2}$$
(2)

where \(N_{0}^{\text{tem}}\) is the constant term of the interruption calculation considering temperature; \(\alpha_{i}\) (\(i = 1,\;2,\;3,\;4\)) are the coefficients of the polynomial, which should be calculated by curve fitting to historical temperature data.
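A minimal sketch of how (2) might be evaluated for one day is given below. It assumes the usual degree-day convention, in which the heating degree is the shortfall below the optimum temperature and the cooling degree is the excess above it; the text does not spell this definition out, so it is an assumption here. The coefficient values are placeholders for illustration, not fitted results.

```python
def degree_split(t_avg, t_opt):
    """Return (T_h, T_c), the degrees below and above the optimum.

    Assumed convention: at most one of the two is non-zero on a given day.
    """
    t_h = max(t_opt - t_avg, 0.0)  # heating degrees
    t_c = max(t_avg - t_opt, 0.0)  # cooling degrees
    return t_h, t_c


def n_avg_tem(t_avg, t_opt, n0, alpha):
    """Evaluate (2) for one day; alpha = (alpha1, alpha2, alpha3, alpha4)."""
    t_h, t_c = degree_split(t_avg, t_opt)
    a1, a2, a3, a4 = alpha
    return n0 + a1 * t_h + a2 * t_h**2 + a3 * t_c + a4 * t_c**2


# Placeholder coefficients, for illustration only (not fitted values).
ALPHA = (0.10, 0.02, 0.05, 0.03)
cold_day = n_avg_tem(50.0, 65.0, 8.0, ALPHA)  # 15 heating degrees
warm_day = n_avg_tem(80.0, 65.0, 8.0, ALPHA)  # 15 cooling degrees
```

Splitting the temperature into two one-sided terms lets the fit use different curvature on the cold and warm sides of the optimum, which a single quadratic in \(T\) cannot do.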

2) Wind

Several variables combine to determine the severity of wind, including sustained wind speed, gust speed, and storm length; severity also depends on other factors such as climate [11]. Based on the cubic relation between the number of interruptions in the power distribution network and wind speed, the following relationship is obtained:

$$N_{\text{avg}}^{\text{wind}} = N_{0}^{\text{wind}} + \beta_{1} S_{\text{g}} + \beta_{2} S_{\text{g}}^{2} + \beta_{3} S_{\text{g}}^{3}$$
(3)

where \(N_{0}^{\text{wind}}\) is the constant value of the interruption calculation considering wind; \(\beta_{i}\) (\(i = 1,\;2,\;3\)) are the coefficients of the polynomial.

The regression analysis of the number of interruptions \(N\) versus \(S_{\text{g}}\) shows a considerable correlation, \(R^{2} = 99.3\%\), which validates the existence of a cubic relation between \(N\) and \(S_{\text{g}}\).
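The kind of check described here, fitting a cubic in wind speed and reading off \(R^{2}\), can be sketched with an ordinary least-squares fit. The example below solves the normal equations directly so that it needs no external libraries; the wind speeds and interruption counts are synthetic values made up for the illustration and are not data from this study.

```python
def fit_polynomial(xs, ys, degree):
    """Ordinary least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree], lowest power first.
    """
    n = degree + 1
    # Normal equations (X^T X) c = X^T y with X[i][j] = xs[i]**j.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((o - mean) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# Synthetic (made-up) mean interruption counts by wind-speed value.
speeds = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
mean_n = [10.2, 10.9, 12.1, 14.8, 19.5, 27.0, 38.4, 54.1]

beta = fit_polynomial(speeds, mean_n, degree=3)  # [N0, beta1, beta2, beta3]
fitted = [evaluate(beta, s) for s in speeds]
r2 = r_squared(mean_n, fitted)
```

For larger datasets or higher degrees, a library routine based on QR or SVD would be numerically safer than the normal equations used in this sketch.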

3) Rain

The correlation of humidity with the number of interruptions is not as strong as that of wind or temperature, but it cannot be neglected. For this purpose, (4) is developed to model the effect of rainfall on the number of interruptions.

$$N_{\text{avg}}^{\text{rain}} = N_{0}^{\text{rain}} + \gamma_{1} R_{1} + \gamma_{2} R_{2} + \gamma_{3} R_{3}$$
(4)

where \(R_{i}\) (\(i = 1,\;2,\;3\)) are the rainfall terms defined in (5); \(N_{0}^{\text{rain}}\) is the constant value of the interruption calculation considering rainfall; \(\gamma_{i}\) (\(i = 1,\;2,\;3\)) are the coefficients of the rainfall terms.

$$\left\{ \begin{aligned} R_{1} &= \left\{ {\begin{array}{*{20}l} {{\text{Rain}},} & {0'' \le t_{\text{Rain}} \le 1''} \\ {0,} & {\text{otherwise}} \\ \end{array} } \right. \hfill \\ R_{2} &= \left\{ {\begin{array}{*{20}l} {{\text{Rain}},} & {1'' \le t_{\text{Rain}} \le 2''} \\ {0,} & {\text{otherwise}} \\ \end{array} } \right. \hfill \\ R_{3}& = \left\{ {\begin{array}{*{20}l} {{\text{Rain}},} & {2'' \le t_{\text{Rain}}} \\ {0,} & {\text{otherwise}} \\ \end{array} } \right. \hfill \\ \end{aligned} \right.$$
(5)
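The piecewise definition in (5) amounts to placing the daily rainfall into one of three bins so that each bin can carry its own coefficient in (4). A minimal sketch follows. Since the ranges in (5) share their end points, the sketch assumes that a reading of exactly 1 or 2 inches goes to the higher bin; that convention, the function names, and the placeholder coefficients in the usage line are all assumptions made for the example.

```python
def rain_terms(rain_inches):
    """Split daily rainfall into the terms (R1, R2, R3) of (5).

    Exactly one term carries the rainfall amount; the others are zero.
    Assumption: readings of exactly 1 or 2 inches go to the higher bin.
    """
    if rain_inches < 0:
        raise ValueError("rainfall cannot be negative")
    if rain_inches < 1.0:
        return (rain_inches, 0.0, 0.0)
    if rain_inches < 2.0:
        return (0.0, rain_inches, 0.0)
    return (0.0, 0.0, rain_inches)


def n_avg_rain(rain_inches, n0, gamma):
    """Evaluate (4); gamma = (gamma1, gamma2, gamma3)."""
    return n0 + sum(g * r for g, r in zip(gamma, rain_terms(rain_inches)))


# Placeholder coefficients, for illustration only.
example = n_avg_rain(1.5, 2.0, (1.0, 2.0, 3.0))
```

Using separate coefficients per bin lets the fitted response to rainfall change slope between light, moderate, and heavy rain days.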

4) Lightning

Lightning is a large-scale electrical discharge between a cloud and the ground that occurs when the dielectric strength of the air is exceeded. This natural phenomenon may have detrimental effects on the distribution network. In Florida, lightning tends to occur in storm cells that may be localized and only pass over a sparsely populated area. The number of interruptions considering lightning strikes is calculated as shown in (6).

$$N_{\text{avg}}^{\text{light}} = N_{0}^{\text{light}} + \lambda n_{\text{LS}}$$
(6)

where \(n_{\text{LS}}\) is the number of lightning strikes; \(N_{0}^{\text{light}}\) is the constant value of the interruption calculation considering lightning strikes; \(\lambda\) is the coefficient of the lightning term.

The reliability and lightning strike data have been provided by one of the largest utilities in Florida. Additionally, historical weather data from NCDC were used to calculate the coefficients in (2)–(4) and (6). The weather data come from 886 ASOSs located in different geographical regions [12] and comprise thousands of lines of data. The reliability and lightning strike data can be joined with the weather data and used as the input to neural network analysis, and the wide span of this study distinguishes it [13].

A comprehensive model for forecasting the number of interruptions based on the weather condition is proposed.

$$N_{\text{avg}}^{\text{weather}} = N_{\text{avg}}^{\text{tem}} + N_{\text{avg}}^{\text{wind}} + N_{\text{avg}}^{\text{rain}} + N_{\text{avg}}^{\text{light}}$$
(7)

where \(N_{\text{avg}}^{\text{weather}}\) is the total number of interruptions caused by weather conditions; \(N_{\text{avg}}^{\text{tem}}\), \(N_{\text{avg}}^{\text{wind}}\), \(N_{\text{avg}}^{\text{rain}}\) and \(N_{\text{avg}}^{\text{light}}\) are the number of interruptions caused by temperature variation, wind, rain and lightning respectively. By simplifying (7) and combining all of the constant terms, the final model will be derived as (8). \(N_{0}^{\text{weather}}\) is the constant term in the total number of weather-based interruptions calculation and the other terms in this equation are already defined.

$$\begin{aligned} N_{\text{avg}}^{\text{weather}} = {} & N_{0}^{\text{weather}} + \alpha_{1} T_{\text{h}} + \alpha_{2} T_{\text{h}}^{2} + \alpha_{3} T_{\text{c}} \\ & + \alpha_{4} T_{\text{c}}^{2} + \beta_{1} S_{\text{g}} + \beta_{2} S_{\text{g}}^{2} + \beta_{3} S_{\text{g}}^{3} \\ & + \gamma_{1} R_{1} + \gamma_{2} R_{2} + \gamma_{3} R_{3} + \lambda n_{\text{LS}} \end{aligned}$$
(8)
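In regression terms, (8) is linear in its coefficients: each day contributes one row of eleven regressors, and \(N_{0}^{\text{weather}}\) is the intercept. The sketch below shows how such a row might be assembled and how a prediction follows from a coefficient vector. The function names are hypothetical, and two conventions are assumed rather than taken from the text: the degree-day definition of \(T_{\text{h}}\) and \(T_{\text{c}}\), and rainfall readings of exactly 1 or 2 inches going to the higher bin.

```python
def weather_features(t_avg, gust, rain, n_ls, t_opt):
    """Build the 11 regressors of (8) for one day, in equation order.

    Order: T_h, T_h^2, T_c, T_c^2, S_g, S_g^2, S_g^3, R1, R2, R3, n_LS.
    """
    t_h = max(t_opt - t_avg, 0.0)  # assumed degree-day convention
    t_c = max(t_avg - t_opt, 0.0)
    r1 = rain if rain < 1.0 else 0.0  # assumed half-open rainfall bins
    r2 = rain if 1.0 <= rain < 2.0 else 0.0
    r3 = rain if rain >= 2.0 else 0.0
    return [t_h, t_h**2, t_c, t_c**2,
            gust, gust**2, gust**3,
            r1, r2, r3, float(n_ls)]


def predict(features, n0, coeffs):
    """Evaluate (8): intercept plus the dot product of coefficients and regressors."""
    return n0 + sum(c * f for c, f in zip(coeffs, features))


# One example day: 70 degrees, gust speed 10, 1.5 inches of rain, 3 strikes.
row = weather_features(70.0, 10.0, 1.5, 3, t_opt=65.0)
```

Stacking one such row per day and per management area gives the design matrix for an ordinary least-squares fit of the coefficients.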

3.2 Regression analysis

Regression analysis methods have been applied to the study of power systems [14] in a wide range of areas, such as the derivation of an exact transient stability boundary of the power system [15], contingency severity assessment for power system voltage security studies [16], and applications of transient stability forecasting [17]. In [10], spatial electric load forecasting approaches are discussed comprehensively.

Regression analysis has been carried out for five management areas (MAs) in order to analyze the effect of each weather factor on the total number of interruptions. The \(R^{2}\) value of the regression indicates the percentage of the total variation explained by the prediction variables.

The database used for this study covers three years of data. The regression analysis was performed twice. First, the raw weather data were regressed against the number of interruptions, and \(R^{2}\) values were calculated for the different MAs; these values lie in \([36.9\% ,\;43.3\% ]\). Second, the regression of weather against the number of interruptions was repeated based on (8), giving values in \([45.2\% ,\;50.1\% ]\) across the MAs. As in the first analysis, the maximum value occurs for the fifth MA and the minimum for the third MA. Fig. 4 shows the \(R^{2}\) values for the five regions. As Fig. 4 illustrates, using (7) (which treats the effect of each weather parameter separately) leads to different regression values than using (8) (which considers all of the weather parameters together to predict interruptions). Nevertheless, both the separate and the integrated weather-based interruption prediction methods yield \(R^{2}\) values in an acceptable range.

Fig. 4 \(R^{2}\) calculated by two different methods by MA

There are many ways to enhance the \(R^{2}\) value. For instance, considering barometric pressure as another weather variable and utilizing historical interruption data leads to an average \(R^{2}\) value of 50%.

The number of interruptions has also been predicted by the neural network method using historical weather data from ten MAs [18, 19]. Neural networks have already been used for biomedical prediction purposes [20]. The results illustrate that as the geographic area increases and the number of interruptions is predicted for the total region, the accuracy of the prediction decreases. Weather data are the network input, and the output is the weather-based interruption prediction.

4 Evaluation of proposed interruption prediction method

In this section, the proposed method is evaluated by simulating a neural network model using combined data from more than 10 management areas (MAs). Although the study took time and recording data at many points became cumbersome, the results were very interesting. The predictor was able to predict the number of interruptions for this large-area analysis with a good level of accuracy. However, as the geographic area increases and the number of interruptions is predicted for the whole region, the accuracy of the system decreases.

Fig. 5 shows the predicted and actual numbers of interruptions for 14 different management areas.

Fig. 5 Interruption prediction vs actual number of interruptions for multiple MAs

As shown in Fig. 5, in most regions the predicted number of interruptions agrees acceptably with the number of interruptions that actually occurred.

The accuracy of the system improved as the number of rows of data increased and as the period over which the data were collected was shortened. Fig. 4 shows a one-week snapshot from 2008 of the actual and predicted values from six MAs. For the region of study, the MAs are grouped together to form a ‘service area’ (SA). One expected future use of the predictor (an ongoing effort) is to install these models at the SA level for long- and short-term planning of manpower, equipment, and spare parts.

5 Conclusions

Several models exist for extreme weather condition failure rates, and there are models for the baseline failure rates due to aging and other causes of equipment failure. Interruptions as a function of common weather conditions comprise a gap between those models, and this research will bridge that gap.

This paper introduced a comprehensive framework for interruption prediction considering the weather conditions. In the proposed method, the historical weather data from NCDC and the historical interruption data have been used in order to forecast the interruptions of the power system.

Appropriate implementation of this method saves time by predicting the number of interruptions. When the number of interruptions is forecast based on historical weather data, power system equipment failure rates, and aging of distribution network components, utilities can prevent a major share of these events by establishing preventive maintenance programs. Hence, the number of spontaneous interruptions will be reduced considerably, because they are no longer unexpected and the system operator is equipped to face such problems and solve them without delay. This awareness of the state of the power distribution network helps to achieve an acceptable level of reliability, and improving reliability is one of the main objectives of moving to the smart grid.

The proposed method can be implemented on the future power system, and it takes variable weather conditions into account. The capability to consider weather conditions in reliability calculations, in terms of interruption prediction, is one of the significant breakthroughs of this paper.