1 Introduction

Major blackouts are rare, and no two blackout scenarios are the same. Severe blackouts in modern power systems, often with catastrophic consequences, have usually been caused by the cascading development of an emergency situation, with system collapse as the final result. A cascade occurs when there is a sequential tripping of numerous transmission lines and generators in a widening geographic area [1], often with one event or state leading to another in a cause-and-effect manner [2].

Every real blackout draws attention to the issue of power system stability and, at the same time, provides an invaluable source of information for assessing stability technologies. This paper takes the 2011 Southwest America blackout and the 2012 India power blackouts as examples for analysis. On the afternoon of September 8, 2011, an 11-minute system disturbance occurred in the Pacific Southwest and affected parts of Arizona, Southern California, and Baja California, Mexico, leading to cascading outages and leaving approximately 2.7 million customers without power [3]. The entire San Diego area lost power, and it took 12 h to restore 100 % of its load. The India power blackouts of 30 and 31 July 2012 followed the same pattern of fast cascading evolution: the first resulted in the collapse of the Northern Regional Grid with a load loss of 36 GW, and the second in the collapse of the Northern, Eastern and North-Eastern regional grids with a load loss of 48 GW [4].

No single factor was responsible for these blackouts, and weak grid structures as well as a lack of coordination in system management and operation were key reasons why the cascading outages were not prevented. This paper, however, focuses on aspects that are closely related to stability technology for the prevention and control of cascading outages. Although each event has distinct features, the events also share some common characteristics. A comparative study can therefore reveal control measures that are effective for all of the events as well as those that are effective only for a particular event.

The post-event analysis of the blackout events above showed that in each case the system was not being operated in a secure N − 1 state. This did not, however, prevent the inquiry reports from giving detailed, in-depth cause analysis and recommendations [3, 4]. Drawing on these reports, this paper presents both advanced but mature stability technology applications and the fundamental measures that need to be strengthened, aiming to provide a reference for both industrial practice and academic research.

2 Event evolution features

A brief summary of the sequence of events in each case is given below to clarify the mechanisms of cascading evolution and to identify possible ways of preventing and controlling system collapse.

2.1 The 2011 Southwest America blackout [3]

A schematic structure of the Southwest power grid is shown in Fig. 1. It consists of the Imperial Irrigation District (IID), San Diego Gas & Electric (SDG&E), Arizona Public Service Co. (APS), Southern California Edison (SCE), Comisión Federal de Electricidad (CFE) and Western Area Power Administration—Lower Colorado (WALC) grids. Power flows into the area where the blackout occurred through three parallel transmission corridors. The first consists of a single 500 kV Hassayampa–North Gila (H–NG) line, a major path that transports power in an east–west direction. The second is Path 44, which comprises the five 230 kV lines in the northernmost part of the SDG&E system that connect SDG&E with the San Onofre Nuclear Generating Station (SONGS). The third, shown as the “S Corridor” in Fig. 1, consists of lower-voltage (230, 161 and 92 kV) facilities and feeds power to WALC and northern IID.

Fig. 1 Schematic structure of Southwest power grid in USA

The loss of the H–NG 500 kV line (taken as 0 s) initiated the event. Power flows instantaneously redistributed throughout the system, increasing flows through IID’s 92 kV and 161 kV systems to the north of the Southwest Powerlink and creating sizeable voltage deviations and equipment overloads. Significant overloading occurred on three of IID’s 230/92 kV transformers at the Coachella Valley (CV) and Ramon substations, as well as on Path 44. The flow redistributions, voltage deviations and resulting overloads had a ripple effect: transformers, transmission lines and generating units tripped offline, initiating automatic load shedding throughout the region within a relatively short time span.

For instance, IID’s CV transformer banks No. 2 and No. 1 were tripped by overload protection relays at 37.5 s and 38.2 s, respectively. In less than 5 min, IID’s Ramon 230/92 kV transformer tripped, leading to local voltage collapse followed by automatic under-voltage load shedding and multiple generator and line trips in IID’s northern 92 kV system. At about 8 min, WALC’s Gila 161/69 kV transformers tripped on time-overcurrent protection. At about 9 min, the Yucca 161/69 kV transformers 1 and 2 and the Pilot Knob–Yucca 161 kV line were tripped by overload protection, followed by the tripping of the YCA combined cycle plant on the Yuma 69 kV system, hastening the collapse of the Yuma load pocket. At 10 min, the El Centro–Pilot Knob 161 kV line was tripped by Zone 3 protection. This left the southern IID 92 kV system connected through a single transmission line from SDG&E, the S Line, forcing all of the remaining IID load to be supplied through the SDG&E system and pushing the aggregate current on Path 44 to 8,400 amps, well above the trip point of 8,000 amps. Three seconds later, the S Line remedial action scheme (RAS) at Imperial Valley Substation tripped two combined cycle generators at Central La Rosita in Mexico, driving the Path 44 flow to about 9,500 amps; another 4 s later, the S Line RAS tripped the S Line itself, creating an IID island and leading to load tripping mostly in its southern 92 kV system.
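The trips above occurred from tens of seconds to several minutes after the initiating event, which is consistent with overload and time-overcurrent protection, whose operating time shortens as the current rises further above pickup. A minimal sketch of such an inverse-time characteristic is given below. It assumes the IEEE C37.112 "very inverse" curve constants, and the pickup current and time dial are hypothetical, since the relay settings actually applied in the event are not given here.

```python
import math

# IEEE C37.112 inverse-time overcurrent characteristic:
#   t = TD * (A / (M**p - 1) + B),  M = measured current / pickup current.
# The "very inverse" constants are an assumption for illustration; the curves
# and settings used by the relays in the event are not known here.
VERY_INVERSE = {"A": 19.61, "B": 0.491, "p": 2.0}


def trip_time_s(current_a: float, pickup_a: float, time_dial: float,
                curve: dict = VERY_INVERSE) -> float:
    """Operating time in seconds, or math.inf if the current is at or below pickup."""
    m = current_a / pickup_a
    if m <= 1.0:
        return math.inf
    return time_dial * (curve["A"] / (m ** curve["p"] - 1.0) + curve["B"])


# Hypothetical 1,000 A pickup and time dial 1.0: a mild overload is cleared
# slowly, a heavy one within a couple of seconds.
for amps in (900, 1200, 2000, 4000):
    print(amps, f"{trip_time_s(amps, pickup_a=1000.0, time_dial=1.0):.2f} s")
```

The strongly nonlinear curve is the point of the example: a flow increase that looks modest in percentage terms can shorten the operating time from minutes to seconds.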

Seconds before the blackout, Path 44 was carrying all flows into the San Diego area as well as into parts of Arizona and Mexico. Less than 11 min after the H–NG line tripped, the excessive loading on Path 44 initiated an intertie separation scheme at SONGS, which separated SDG&E from Path 44, led to the loss of the SONGS nuclear units and eventually resulted in the complete blackout of the San Diego, CFE and Yuma grids.
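The flow redistribution that started this cascade can be illustrated, very roughly, with a lossless DC approximation in which parallel paths carrying a fixed transfer share it in proportion to their susceptances (1/X). The sketch below treats the three corridors as a simple parallel set, which the real network is not; the reactances and the 2,800 MW transfer are hypothetical and chosen only to show how losing the strongest path pushes its flow onto the remaining, weaker ones.

```python
# Lossless DC approximation: parallel paths carrying a fixed transfer share it
# in proportion to their susceptances (1/X). All numbers are hypothetical.

def split_transfer(transfer_mw: float, reactances_pu: dict[str, float]) -> dict[str, float]:
    """Divide a fixed transfer among parallel paths in proportion to 1/X."""
    total_b = sum(1.0 / x for x in reactances_pu.values())
    return {name: transfer_mw * (1.0 / x) / total_b for name, x in reactances_pu.items()}


# Hypothetical equivalent reactances (p.u.) of the three corridors and a
# hypothetical total transfer into the area (MW).
corridors = {"H-NG": 0.02, "Path 44": 0.04, "S Corridor": 0.08}
transfer_mw = 2800.0

before = split_transfer(transfer_mw, corridors)
after = split_transfer(transfer_mw, {k: x for k, x in corridors.items() if k != "H-NG"})

for name, mw in after.items():
    print(f"{name}: {before[name]:.0f} MW -> {mw:.0f} MW")
# Path 44: 800 MW -> 1867 MW
# S Corridor: 400 MW -> 933 MW
```

With these assumed values, both remaining corridors carry about 2.3 times their pre-event flow, which illustrates how a single line trip can overload several lower-voltage facilities at once.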

2.2 The 2012 India power blackouts [4]

A sketch of the India power blackout event evolution is shown in Fig. 2. The India power grid consists of five regional grids, namely the Northern (NR), Western (WR), Southern (SR), Eastern (ER) and North Eastern (NER) grids. The SR grid is connected to the ER and WR grids through asynchronous links, and the remaining four regional grids operate in synchronism.

Fig. 2 Sketch of India power blackout event evolution

Pre-disturbance system conditions on 30 and 31 July 2012 were similar. The system had been weakened by multiple outages of transmission lines across the WR–NR interface, so that the 400 kV Bina–Gwalior–Agra line was effectively the only main AC circuit left across the interface, and it was heavily loaded because some NR utilities were overdrawing power. Both blackouts were triggered by the same event: the tripping of the 400 kV Bina–Gwalior line by Zone 3 of its distance protection due to load encroachment (taken as 0 s).

The loss of the WR–NR interface immediately created an emergency state. There was a sudden, large power imbalance between the sending WR grid and the receiving NR grid, and NR loads had to be supplied through the WR–ER–NR route, which was not a set of purely AC transmission lines but a complex long-distance network of generators, lines, transformers, distribution feeders, customers and so on. As a result, fast relative motion between WR and NR generators led to loss of synchronism and out-of-step oscillation. The system was then split by line tripping near the oscillation center as line distance relays operated.
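The loss of synchronism described above can be sketched with the classical equivalent two-machine swing equation, in which the maximum transferable power Pmax = E1·E2/X drops when the direct tie is lost and the remaining path has a much larger reactance X. In this hypothetical example, if Pmax after the disturbance stays above the transfer Pm, a new equilibrium exists and the angle swings but settles; if it falls below Pm, there is no equilibrium and the angle runs away (pole slip). Inertia, damping and all power values are assumptions, not data from the India system.

```python
import math

# Classical equivalent two-machine model (WR group vs NR group), per unit:
#   (2H / omega_s) * d2(delta)/dt2 = Pm - Pmax * sin(delta) - D * d(delta)/dt
# where Pmax = E1 * E2 / X falls as the reactance X of the remaining transfer
# path grows. All parameter values are hypothetical.


def stays_in_step(pm: float, pmax_pre: float, pmax_post: float, h: float = 5.0,
                  d: float = 0.02, f0: float = 50.0, t_end: float = 5.0,
                  dt: float = 1e-3) -> bool:
    """Simulate the swing after Pmax drops from pmax_pre to pmax_post.

    Returns False as soon as the angle passes 180 degrees (pole slip).
    """
    omega_s = 2.0 * math.pi * f0
    delta = math.asin(pm / pmax_pre)  # pre-disturbance equilibrium angle (rad)
    speed = 0.0                       # relative speed deviation (rad/s)
    for _ in range(int(t_end / dt)):
        accel = (pm - pmax_post * math.sin(delta) - d * speed) * omega_s / (2.0 * h)
        speed += accel * dt           # semi-implicit Euler: update speed first,
        delta += speed * dt           # then the angle with the new speed
        if delta > math.pi:
            return False
    return True


print(stays_in_step(pm=0.8, pmax_pre=2.0, pmax_post=1.6))  # True: new equilibrium exists
print(stays_in_step(pm=0.8, pmax_pre=2.0, pmax_post=0.6))  # False: Pmax < Pm, machines pull apart
```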

In the July 30 blackout, the oscillation center was in the NR–ER interface, and within 4 s the corresponding tie lines tripped, isolating the NR system from the WR–ER–NER system with a power imbalance of 5,800 MW. The NR grid collapsed due to under-frequency and further power swings within the region. The WR–ER system survived without further splitting, owing to the tripping of a few generators in the region on high frequency.
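How quickly frequency falls in such an island can be estimated from the aggregate swing equation, df/dt = -ΔP·f0/(2HS), evaluated just after separation. The sketch below uses the 5,800 MW imbalance from the report, but the inertia constant H = 4 s and the 40,000 MVA of online generation are hypothetical, and governor and load response are ignored, so only the initial slope is meaningful.

```python
def initial_rocof_hz_per_s(imbalance_mw: float, h_s: float, rating_mva: float,
                           f0_hz: float = 50.0) -> float:
    """Initial rate of change of frequency of an island.

    Aggregate swing equation: df/dt = -imbalance * f0 / (2 * H * S), where
    imbalance > 0 is a generation deficit and imbalance < 0 a surplus.
    Governor and load-frequency response are ignored (initial slope only).
    """
    return -imbalance_mw * f0_hz / (2.0 * h_s * rating_mva)


# The 5,800 MW deficit is from the report; H = 4 s and 40,000 MVA of online
# generation are hypothetical values for the islanded NR system.
print(f"{initial_rocof_hz_per_s(5800.0, h_s=4.0, rating_mva=40000.0):.3f} Hz/s")  # -0.906 Hz/s
```

A surplus island, like the one that survived on high-frequency generator tripping, gives the same magnitude with a positive sign.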

In the July 31 blackout, the oscillation center was in the ER, near the ER–WR interface, and within 6 s a small part of the ER (Ranchi and Rourkela), along with the WR, was isolated from the rest of the NR–ER–NER grid with a power imbalance of 3,000 MW. The power deficit in the NR–ER–NER grid led to multiple trips of lines and generators, attributed to internal power swings, under-frequency and overvoltage at different locations. Within 1 min, the power swing across the NR–ER interface further separated the NR from the ER–NER system. Subsequently, all three grids (NR, ER and NER) collapsed. The WR system, however, survived owing to over-frequency generator tripping.

In both blackout events, the SR system, which was getting power from ER and WR, survived after automatic under-frequency load shedding (UFLS) and HVDC power ramping.
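How staged UFLS arrests a frequency decline in an island can be sketched with a single-bus aggregate model. In the hypothetical case below, a 2,000 MW deficit with three 5 % shedding stages is compared with the same deficit and no UFLS. The stage settings, inertia, load-frequency sensitivity and island size are assumptions; governor response, HVDC power ramping and relay time delays are not modelled, and the sketch does not represent the scheme actually installed in India.

```python
def simulate_island(gen_mw: float, load_mw: float, stages: list[tuple[float, float]],
                    h_s: float = 4.0, rating_mva: float = 40000.0, f0: float = 50.0,
                    k_load: float = 1.5, dt: float = 0.01, t_end: float = 20.0):
    """Aggregate single-bus frequency model of an island with staged UFLS.

    stages: (threshold_hz, fraction_of_initial_load) pairs; each stage sheds its
    block once, instantly, when frequency first drops below its threshold.
    Load falls by k_load percent per percent of frequency drop; governor
    response, HVDC ramping and relay time delays are not modelled.
    Returns (minimum frequency, final frequency, total MW shed).
    """
    f, load, shed = f0, load_mw, 0.0
    pending = sorted(stages, reverse=True)  # highest threshold first
    f_min = f
    for _ in range(int(t_end / dt)):
        p_load = load * (1.0 + k_load * (f - f0) / f0)
        f += f0 * (gen_mw - p_load) / (2.0 * h_s * rating_mva) * dt
        f_min = min(f_min, f)
        while pending and f < pending[0][0]:
            _, fraction = pending.pop(0)
            load -= fraction * load_mw
            shed += fraction * load_mw
    return f_min, f, shed


# Hypothetical 2,000 MW deficit and three 5 % stages.
ufls = [(49.2, 0.05), (49.0, 0.05), (48.8, 0.05)]
print(simulate_island(34000.0, 36000.0, ufls))
print(simulate_island(34000.0, 36000.0, []))
```

With these numbers only the first stage operates and frequency recovers toward about 49.8 Hz, whereas without shedding it keeps declining toward about 48.1 Hz.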

2.3 Comparison of the blackouts from a stability perspective

The cases above have several similarities. In all of them, the system was being operated in a state that was not N − 1 secure prior to the initial disturbance. The cascades were triggered by the tripping of a key transmission interface and developed as a result of power flow rerouting and unintended operation of protection or control devices. Preventive control measures would therefore have been effective in avoiding the cascading outages. The characteristics and corresponding control measures are compared in Table 1.

Table 1 Comparative analysis of characteristics and control

The most prominent difference is the speed at which the cascade evolved. The 2011 Southwest America blackout was a relatively slow process in which equipment overloading was the main stability problem, and there were opportunities to interrupt the cascade with the help of appropriate technology. In the 2012 India power blackouts, by contrast, the triggering event and subsequent events happened so fast, and the interrelated stability phenomena were so complex, that the only effective measures appear to have been either preventive control in the normal operating state or emergency control immediately after the first event.

3 Self-evident stability technology applications

It is evident from analysis of the event inquiry reports [3, 4] that applying certain mature technologies would be very effective for preventing and controlling cascading outages, as summarized below.

3.1 Offline dynamic simulation

A deep and thorough system study is the most fundamental measure for secure system operation, and offline dynamic simulation studies should be carried out in order to grasp the system characteristics under different grid conditions and anticipated scenarios.

While the inquiry report on the India blackouts calls for the formation of a task force to study grid security issues, the report on the Southwest America blackout lists quite a few detailed findings and recommendations in this respect. For instance, in system studies at various time scales, including long-term, seasonal, day-ahead and online analysis, many of the power companies involved did not represent external grids and lower-voltage grids in sufficient detail or accuracy in their simulation models. This resulted in a lack of understanding of the dynamic interactions among different regions and voltage levels, which in turn led to problems such as failure to share study results and real-time information with neighboring grids and inappropriate relay settings on lower-voltage facilities. In addition, the Western Electricity Coordinating Council (WECC) did not perform the required review and assessment of all North American Electric Reliability Corporation (NERC)-defined Special Protection Systems (SPSs) in its area. Neglecting the interaction between stability control systems and system dynamics in this way made it impossible to understand the system dynamics that appeared during the cascading process or to determine proper control measures.

Therefore, whether in offline studies, online analysis or real-time monitoring, all facilities that affect bulk power system reliability should be covered, and they should be selected not by geographic location, voltage level or functional classification but on the basis of a thorough understanding of system characteristics.

3.2 Online stability analysis and preventive control

Since the 2003 North America blackout [1], there has been extensive worldwide activity in the research, development and application of online stability analysis techniques [5–8], driven by the need to assess system security and make control decisions based on actual operating conditions.

Unfortunately, the India control centers did not have any analytical tool for periodically assessing the system security condition, and the state estimator results were not fully reliable [4]. Deployment of dynamic security assessment (DSA) is listed as one of the recommendations.

In the 2011 Southwest America blackout, the real-time tools of the affected grids were inadequate and not operational when needed to provide early-warning information and preventive control decisions. For instance, if IID’s DSA system had identified the impact of an H–NG line trip in advance and suggested preventive controls, the operators could have adjusted system conditions before the line tripped, avoiding the overload tripping of the CV transformers and interrupting the cascade at its initial stage. In addition, if IID’s DSA system had alerted the operators that day that the total flow through the two CV transformers was so high that only preventive control could prevent overload tripping of the second transformer after the first tripped, the operators could have dispatched additional generation to relieve the transformer loading and avoid losing both transformers. Moreover, if IID had modeled its neighboring network in its DSA system using APS’s complete topological model with real-time measurement data, instead of a highly simplified equivalent of pseudo-generators plus tie lines, the DSA could have assessed the impact of a sudden loss of H–NG before it actually happened, and IID could have taken proper preventive control actions before the H–NG line tripped.
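The N − 1 reasoning about the two CV transformers reduces to a simple check: after losing one unit of a parallel group, can the remaining units carry the flow? The hypothetical helper below computes the preventive relief (for example, by dispatching local generation) needed to make the group N − 1 secure. It conservatively assumes that the group’s total flow is unchanged after a trip, which ignores redistribution to other paths, and the ratings and emergency factor are illustrative, not IID’s actual values.

```python
def preventive_relief_mva(group_flow_mva: float, ratings_mva: list[float],
                          emergency_factor: float = 1.0) -> float:
    """Preventive reduction needed so a group of parallel transformers survives N-1.

    Assumes (conservatively) that the group's total flow is unchanged after the
    largest unit trips and is then carried by the remaining units, whose combined
    limit is their rating times emergency_factor. Returns 0.0 if already secure.
    """
    remaining_limit = (sum(ratings_mva) - max(ratings_mva)) * emergency_factor
    return max(0.0, group_flow_mva - remaining_limit)


# Hypothetical: two 150 MVA banks carrying 220 MVA in total, with a 120 %
# short-term emergency limit on the surviving bank.
print(preventive_relief_mva(220.0, [150.0, 150.0], emergency_factor=1.2))
```

A DSA system would obtain the post-contingency flow from a full network solution rather than this worst-case assumption, but the decision it supports is the same: relieve the group before the first unit trips.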

3.3 Real-time situational awareness

The inquiry committee for the 2012 India power blackouts recognized and emphasized the importance of network visualization capability, recommending strengthening the communication network, ensuring the reliability of data at load dispatch centers and deploying wide-area measurement systems (WAMS) [4].

In the 2011 Southwest America blackout, the affected power companies noticed the changes in power flow and voltage magnitude caused by the H–NG line trip, but many control centers were unable to determine in real time the causes and impacts of these changes because they lacked adequate situational awareness of neighboring systems. The result was an inability to act, or even a misjudgment of how the event would evolve.

In the post-event analysis of the September 8 blackout, phasor measurement unit (PMU) data proved extremely valuable for reconstructing the sequence of events and validating simulation results. However, PMUs played no part in observing system behavior in real time. For instance, tripping the H–NG line increased the phase angle difference between the voltages at its two terminals. Post-event simulation showed that this angle difference exceeded APS’s synchro-check relay setting of 60 degrees, and that re-dispatch of a significant amount of generation was required to reduce it to an acceptable value. Lacking real-time awareness of this angle difference, APS informed WECC and the California Independent System Operator (CAISO) that the line could be quickly reconnected. Had PMUs been used to monitor the voltage angle difference across the tripped line, the time required to reclose the H–NG line would not have been misjudged, and WECC probably would have taken action to reduce the loading on Path 44 to prevent its automatic tripping.
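Monitoring the angle across an open line is straightforward once synchronized phasors from both ends are available. The sketch below computes the wrapped angle difference and compares it with a 60 degree limit, the APS synchro-check setting mentioned above. The phasor values and function names are hypothetical, and a real synchro-check relay also checks voltage magnitudes and slip frequency, which are omitted here.

```python
import cmath
import math


def angle_difference_deg(v_from: complex, v_to: complex) -> float:
    """Phase angle of v_from relative to v_to in degrees, wrapped to [-180, 180)."""
    diff = math.degrees(cmath.phase(v_from) - cmath.phase(v_to))
    return (diff + 180.0) % 360.0 - 180.0


def angle_check_passes(v_from: complex, v_to: complex, max_angle_deg: float = 60.0) -> bool:
    """Angle part of a synchro-check only; magnitude and slip checks are omitted."""
    return abs(angle_difference_deg(v_from, v_to)) <= max_angle_deg


# Hypothetical PMU phasors (p.u.) at the two ends of the open line.
v_end_a = cmath.rect(1.0, math.radians(20.0))
v_end_b = cmath.rect(0.98, math.radians(-45.0))
print(round(angle_difference_deg(v_end_a, v_end_b), 1), angle_check_passes(v_end_a, v_end_b))
```

Displayed continuously, such a value tells operators not only that reclosing is blocked but also how much re-dispatch is still needed before it can succeed.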

During the fast cascade stage, inadequate situational awareness prevented timely emergency control. Neither operator monitoring nor the automatic alarm for Path 44 flow exceeding its limit was correlated with the SONGS separation scheme, so there was no real-time early warning of what this limit violation would trigger. With such a warning, the automatic operation of the SONGS separation scheme could have been prevented by earlier emergency control actions such as manual load shedding.

3.4 Automatic emergency control

It is interesting to note that the inquiry report on the 2012 India power blackouts [4] holds that, after the loss of about 5,000–6,000 MW of supply to the Northern Region, the grid could have been saved had the UFLS relays operated. The much more detailed and thorough report on the 2011 Southwest America blackout [3], however, shows that post-event simulation analysis could not explain why UFLS failed to prevent the frequency collapse of the SDG&E system.

In fact, all of the blackout events showed that, during the dynamic process, equipment along the rerouted power flow path could be tripped by protection relays before UFLS acted. Cascading developments such as successive islanding therefore cannot be interrupted by relying solely on distributed control measures such as UFLS. Moreover, the emergency control problem should be formulated as a blackout defense against the sudden loss of a key power transmission interface, which in all of the cases above was in effect a single transmission line. The primary control objective should therefore be to prevent the cross-grid rerouting of large power flows and the resulting uncontrolled equipment tripping and system islanding. Automatic generator tripping and load shedding triggered by the sudden loss of a transmission corridor is the most effective emergency control measure for preventing similar cascading outages. The economic rationality of deploying this type of emergency control can be analyzed from the standpoint of risk reduction.

Even at a later stage of the cascade, such control could also be effective in preventing the collapse of an islanded system and reducing the loss of load. For example, active load shedding triggered by a Path 44 tripping signal would have been such a measure.
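The risk-reduction argument for an interface-loss scheme can be made concrete as expected annual cost = initiating-event frequency × (probability of cascade × blackout consequence + cost of any controlled shedding). The sketch below compares the risk with and without such a scheme and subtracts its annualized cost. Every number is hypothetical and in arbitrary cost units; in practice the cascade probabilities would have to come from simulation studies.

```python
def annual_risk(initiating_per_year: float, p_cascade: float, blackout_cost: float,
                action_cost: float = 0.0) -> float:
    """Expected annual cost: initiating-event frequency x (cascade probability x
    blackout consequence + cost of the controlled action taken each time)."""
    return initiating_per_year * (p_cascade * blackout_cost + action_cost)


# Hypothetical values in arbitrary cost units per year.
freq = 0.2                    # sudden loss of the key interface, events/year
without_scheme = annual_risk(freq, p_cascade=0.3, blackout_cost=300.0)
with_scheme = annual_risk(freq, p_cascade=0.02, blackout_cost=300.0, action_cost=5.0)
scheme_cost_per_year = 2.0    # annualized investment plus maintenance

net_benefit = without_scheme - with_scheme - scheme_cost_per_year
print(f"risk reduction {without_scheme - with_scheme:.1f}, net benefit {net_benefit:.1f}")
```

The controlled shedding cost is included deliberately: a scheme that acts on every interface loss accepts a small, certain loss of load in exchange for a large reduction in the expected blackout cost.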

4 In-depth reflections on further R&D directions

From the standpoint of system operation, the technologies listed in the previous section are mature means of enhancing protection against cascading outages. However, in the face of the increasing scale, complexity and flexibility of modern power systems, several fundamental measures, including further R&D, are needed to ensure the effectiveness of these technological measures.

4.1 System fault identification criterion of protection and control devices

In the blackout events, it was common for Zone 3 distance protection relays to mistake the current and voltage conditions caused by load encroachment or system oscillations for a line fault. It is also still unclear whether the ineffectiveness of UFLS schemes was related to their device operation criteria. Both issues deserve further study.
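Load encroachment can be illustrated by computing the impedance a distance relay sees for a balanced load flow, Z = |V_LL|²/S*, and testing it against a mho circle. In the hypothetical case below there is no fault at all, yet heavy loading at depressed voltage brings the apparent impedance inside an assumed Zone 3 setting (250 Ω reach at 80° on a 400 kV line). The setting, loads and function names are illustrative; real relays commonly add load-encroachment blinders or other supervision, which is not modelled here.

```python
import cmath
import math


def apparent_impedance_ohm(v_ll_kv: float, p_mw: float, q_mvar: float) -> complex:
    """Impedance seen by a distance relay for a balanced load flow: |V_LL|^2 / conj(S)."""
    return v_ll_kv ** 2 / complex(p_mw, q_mvar).conjugate()


def inside_mho(z_ohm: complex, reach_ohm: float, mta_deg: float) -> bool:
    """Mho circle through the origin with diameter reach_ohm along angle mta_deg."""
    center = cmath.rect(reach_ohm / 2.0, math.radians(mta_deg))
    return abs(z_ohm - center) <= reach_ohm / 2.0


# Hypothetical Zone 3 setting on a 400 kV line: 250 ohm reach at 80 degrees.
REACH, MTA = 250.0, 80.0
normal = apparent_impedance_ohm(400.0, 800.0, 100.0)     # normal load, 1.0 p.u. voltage
stressed = apparent_impedance_ohm(360.0, 1500.0, 500.0)  # heavy load, 0.9 p.u. voltage
print(inside_mho(normal, REACH, MTA), inside_mho(stressed, REACH, MTA))  # False True
```

Both heavier flow and lower voltage shrink |Z|, which is why the stressed conditions after an initial outage are exactly when a remote-backup zone is most likely to misoperate.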

In addition, in modern power systems, AC system fault characteristics can be strongly affected by adjacent HVDC links or by renewable energy injection and its complex control systems. Traditional fault identification criteria are based on AC measurements and should be reviewed, so that protection and control devices neither operate unintentionally nor fail to operate because they cannot adapt to new system characteristics.

4.2 Analysis of mutual influence between control logic and system response

Even if protection and control devices correctly identify system fault scenarios and operate according to predesigned logic at the device or local grid level, the integrity of system stability should always be borne in mind. Device operation logic should therefore be treated as an integral part of the global characteristics of the system as a whole, so that different control actions can be coordinated with one another and with the system dynamics.

The blackout cases make clear that offline simulation studies failed to predict such complex system behavior and so much uncontrolled equipment tripping. As ever-stronger interconnection in modern power systems increases the cross-regional influence of protection and control operations, the interaction through system responses and the coordination among distributed, independently operated protection relays and regional emergency control systems must be emphasized. This prevents real-time system responses provoked by local control actions from going beyond the scope of offline studies and predesigned control logic, and thus avoids the passive situation of uncontrolled equipment tripping.

4.3 Verification of adaptability of control effect to system operating conditions

The blackout cases also show that there are hardly any records of correct active control actions during the cascading evolution. Even if the fault identification criteria and control logic are appropriate as judged by locally measured system conditions, changing system states can still make pre-specified control actions deficient, excessive, or even unintended or undesirable in effect. One example is the completely different location of the oscillation center in the India system dynamics on 30 July and on 31 July 2012.

Therefore, the adaptability of the effects of protection and control actions to the actual operating condition should be verified in an online environment, and if any problem is found, the system operating state should be adjusted or the automatic control decisions refreshed according to the online analysis results [5, 6]. In this way, control measures can be both proactive and orderly in a system emergency.
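An online check of this kind can be sketched as a comparison between a pre-armed control amount and the amount that the current operating point actually requires. The hypothetical function below flags an armed load-shedding amount as deficient, adequate or excessive. The 5,800 MW and 3,000 MW imbalances echo the two India islands, but the armed amount, the tolerable deficit and the 20 % excess margin are assumptions, and in a real DSA system the required amount would come from dynamic simulation rather than a subtraction.

```python
from dataclasses import dataclass


@dataclass
class ControlCheck:
    required_mw: float
    armed_mw: float
    status: str  # "deficient", "adequate" or "excessive"


def verify_armed_shedding(imbalance_mw: float, tolerable_mw: float, armed_mw: float,
                          excess_margin: float = 0.2) -> ControlCheck:
    """Check a pre-armed load-shedding amount against the current operating point.

    required = part of the post-contingency deficit the island cannot ride through
    on its own (tolerable_mw is a hypothetical, study-derived allowance).
    """
    required = max(0.0, imbalance_mw - tolerable_mw)
    if armed_mw < required:
        status = "deficient"
    elif armed_mw > required * (1.0 + excess_margin):
        status = "excessive"
    else:
        status = "adequate"
    return ControlCheck(required, armed_mw, status)


# The same hypothetical 2,000 MW armed amount evaluated at three operating points.
for imbalance in (5800.0, 3000.0, 1500.0):
    print(verify_armed_shedding(imbalance, tolerable_mw=1000.0, armed_mw=2000.0))
```

The same fixed setting is deficient at one operating point, adequate at another and excessive at a third, which is the adaptability problem described above in miniature.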

In current online DSA systems, one round of calculation typically takes 5–15 min, and the contingency list normally contains only N − 1/N − 2 scenarios. Therefore, the adaptability of control effects can be assessed only when the power system has settled into a relatively stable condition after a disturbance has already occurred and before another pre-specified disturbance happens. How to include in the contingency list the cascading scenarios that best reflect the most up-to-date system state needs further consideration.
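One reason contingency lists stop at N − 1/N − 2 is combinatorial growth: with n switchable elements there are C(n, k) outages of order k. The sketch below counts them and estimates the brute-force run time; the 2,000 elements, 1 s per time-domain case and 64 parallel workers are all hypothetical.

```python
from math import comb


def contingency_counts(n_elements: int, max_order: int) -> dict[int, int]:
    """Number of distinct k-element outage combinations for k = 1..max_order."""
    return {k: comb(n_elements, k) for k in range(1, max_order + 1)}


def wall_clock_hours(n_cases: int, seconds_per_case: float, workers: int) -> float:
    """Time to simulate n_cases on perfectly parallel workers (no screening)."""
    return n_cases * seconds_per_case / workers / 3600.0


# Hypothetical model with 2,000 switchable elements, 1 s per time-domain case
# and 64 parallel workers.
for order, n in contingency_counts(2000, 3).items():
    print(f"N-{order}: {n:,} cases, {wall_clock_hours(n, 1.0, 64):.2f} h")
```

Under these assumptions, exhaustive N − 2 alone takes roughly 8.7 h and N − 3 several thousand hours, far beyond a 5–15 min cycle, which suggests that cascading scenarios have to be selected from the current system state rather than enumerated.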

4.4 Real-time operational management of emergency control measures

The 2012 India blackouts revealed that there was practically no load relief from defense mechanisms such as UFLS, a natural result of violations of various system security standards and of power companies’ attention being focused solely on overdrawing from the grid.

Therefore, measures to ensure the effectiveness of stability control in a system emergency should not be limited to safeguarding the reliability of control devices on the grid side through operational maintenance. Real-time monitoring and management should be extended to the control execution terminals on the generation and load sides, for example by monitoring the amount of generation available for tripping and load available for shedding, to guarantee sufficient control effects when needed. To this end, an effective management mechanism is necessary, and the authority of dispatch control centers to adjust inappropriate control measures should be ensured.

The centralized management function of a stability control system can be elevated further to more advanced functions such as real-time correlated monitoring. For instance, correlated monitoring of power quantities (such as Path 44 loading) and of the operating logic of control systems (such as the SONGS separation scheme) can be implemented to give timely warning of control system operation that may result from changes in the grid operating state.
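Correlated monitoring of this kind can be sketched as relating a measured quantity to the threshold of the scheme it could trigger and estimating the time left. The hypothetical function below assumes, for illustration, that the 8,000 A Path 44 trip point quoted in Section 2.1 is the threshold of the separation scheme, uses a 90 % warning level and extrapolates linearly from the last two samples; the actual scheme logic (time delays, arming conditions) is not modelled.

```python
from typing import Optional


def correlated_alarm(samples: list[tuple[float, float]], trip_a: float = 8000.0,
                     warn_fraction: float = 0.9) -> tuple[str, Optional[float]]:
    """Relate a monitored path current to a separation scheme's trip threshold.

    samples: (time_s, current_a) measurements, oldest first (at least two).
    Returns an alarm level and a linear estimate of minutes until the threshold
    is reached (None if the current is not rising or is already above it).
    """
    (t0, i0), (t1, i1) = samples[-2], samples[-1]
    rate_a_per_s = (i1 - i0) / (t1 - t0)
    if i1 >= trip_a:
        level = "threshold reached"
    elif i1 >= warn_fraction * trip_a:
        level = "warning"
    else:
        level = "normal"
    rising_below = rate_a_per_s > 0 and i1 < trip_a
    minutes = (trip_a - i1) / rate_a_per_s / 60.0 if rising_below else None
    return level, minutes


# Hypothetical Path 44 current samples one minute apart.
level, minutes = correlated_alarm([(0.0, 6500.0), (60.0, 7300.0)])
print(level, f"{minutes:.1f} min")
```

The value of the alarm lies in naming the consequence ("separation scheme will operate in about a minute") rather than only reporting a limit violation, which is the link that was missing on September 8.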

4.5 Improvement of simulation accuracy

Simulation is the fundamental tool in various stages of grid planning, system operation analysis, protection and control decision-making, online DSA, post-event analysis and so on. Each occurrence of large disturbance in actual power systems provides an opportunity for checking accuracy of simulation via actual system responses.

Analysis in the inquiry report of the 2012 India blackouts is rather qualitative because of insufficient field data records and lack of detailed post-event simulation study results.

The inquiry report on the 2011 Southwest America blackout gives examples of the inadequacy of system simulation models for reproducing the actual event. For instance, the WECC dynamic simulation models used for near-term and long-term planning could predict neither the tripping of the SONGS generating units nor the collapse of the SDG&E and CFE systems. In the post-event simulations, UFLS operation should have prevented the frequency collapse of the SDG&E system, which was the opposite of what actually happened. Only after recorded details of the actual event, including UFLS activation programs and the automatic switching logic of capacitors, were added to the system model did the simulation results for the islanded region align better with the actual event following operation of the intertie separation scheme at SONGS. In addition, the impedance of IID’s CV transformers was 0.1 per unit in WECC’s planning model but 0.05 per unit in IID’s DSA model. As a result, the CV transformer loading calculated by the online DSA differed by approximately 16 % from that obtained with the planning model. Because the CV transformers were key facilities in the evolution of the event, this demonstrates the importance of data and modeling accuracy.
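Why a factor-of-two impedance difference matters can be seen from a two-path current divider: a transformer in parallel with an alternative path carries a share Zp/(Zt + Zp) of the transfer. The sketch below uses the 0.1 and 0.05 per-unit values from the report but a hypothetical 0.3 per-unit parallel path, so the resulting difference is illustrative and is not meant to reproduce the 16 % figure, which depends on the actual network.

```python
def transformer_share(z_tx_pu: float, z_parallel_pu: float) -> float:
    """Fraction of a transfer carried by a transformer in parallel with an
    equivalent alternative path (purely reactive current divider)."""
    return z_parallel_pu / (z_tx_pu + z_parallel_pu)


# Hypothetical equivalent impedance of the parallel path around the transformer.
Z_PARALLEL = 0.3
planning = transformer_share(0.10, Z_PARALLEL)  # impedance used in the planning model
dsa = transformer_share(0.05, Z_PARALLEL)       # impedance used in the DSA model
print(f"{planning:.3f} vs {dsa:.3f} of the transfer, ratio {dsa / planning:.2f}")
```

Here the lower impedance raises the transformer’s share by about 14 %. The relative effect grows as the parallel path gets stronger (smaller Zp) and shrinks as it gets weaker, so the same data error can matter more or less depending on the surrounding network.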

There were large deviations of grid voltage and frequency from nominal values in all the blackout cases, particularly at the final stage of cascading development. Whether simulation algorithms can appropriately deal with abnormal excursions of voltage and frequency should also be reviewed.

5 Conclusions

The 2011 Southwest America blackout and the 2012 India blackouts are used as examples in this paper. Analysis of the system behavior and of the causes of event evolution in these two events shows that opportunities for interrupting the cascading failure can be found at various stages, whether through preventive or emergency control technologies.

In order to use stability technologies to prevent and control cascading outages, it is important to build a power system security defense infrastructure based on advanced conceptual design. However, to ensure the effectiveness of technological measures during the cascading evolution process, several fundamental measures need to be strengthened first. These include a complete, thorough and timely understanding of system dynamic characteristics, the fault identification criteria of protection and control devices, analysis of the mutual influence between control logic and system response, verification of the adaptability of control effects to system operating conditions and improvement of simulation analysis.

Centralized generator tripping and load shedding automatically triggered by the event of sudden loss of a transmission interface is the most effective emergency control measure for prevention of cross-grid rerouting of large power flows, uncontrolled equipment tripping and successive grid islanding. To ensure the dependability and reliability of control effects, operational management of stability control measures should be strengthened and authorization for control centers to adjust these measures should be guaranteed.