1 Introduction

1.1 Motivation

Estimating the state of a power grid, i.e., recovering bus voltages and phase angles, was initially formalized in the 1970s. State estimation involves designing algorithms that leverage the data collected by various measurement units across the grid, as well as other information about the power grid (e.g., topology and dynamics), in order to form an estimation for the state of the grid [1]. These state estimations serve multiple purposes, including informing control actions, predicting loads, updating pricing policies, and identifying abnormalities in the power grid. In support of these tasks, various types of measurements are collected and transmitted to a control center via remote terminal units. Therefore, intelligent electronic devices and state estimation algorithms are key to building a real-time network model within the energy management system (EMS) [2, 3]. Traditional state estimation approaches, which are conducted centrally in the power grid control center, perform three main routines [4].

  1) Observability analysis: its role is to determine whether a unique estimation can be characterized for the state of the system. Observability analysis is generally performed prior to state estimation.

  2) State estimation: it is responsible for characterizing an optimal estimation of the complex voltages at different buses by leveraging the real-time measurements.

  3) Bad data detection: the estimations formed are used to determine whether the measurements bear any errors, to identify these errors when they are deemed to exist, and to eliminate them in order to enhance state estimation fidelity.

   There exists a rich literature on bad data detection under different assumptions on the data model or network topology. The existing design principles for bad data detection often rely on the gross measurement error, that is, the difference between the measurements and the predictions of the measurements computed from the state estimation. When the gross measurement error is small enough, the estimation is deemed reliable, and when the error is large enough, the measurements are considered to contain bad data [4]. Such bad data detection approaches are effective against bad data that has a random cause (e.g., a failure in the power grid). Nevertheless, when the disruptions are structured (not random), there is a high likelihood that the bad data can bypass the bad data detectors. For instance, when the disruptions affect the measurements in a way that conforms to the physical laws and the topology of the power grid, they can appear as legitimate measurements [5]. This possibility raises concerns about the security vulnerabilities that state estimation faces, which can be capitalized on by adversaries to launch attacks. Such attacks can, for instance, contaminate the measurements without being detected, while misleading the state estimators and rendering wrong estimations of the system. The possibility of such attacks becomes especially concerning as more advanced measurement units are incorporated into the EMS.

The effectiveness of cyber attacks in contaminating the measurements and misleading the state estimators, while remaining hidden from the bad data detector, strongly hinges on the extent of information that the attackers possess about the power grid. The two extreme cases, in which the attackers either have full and perfect information about the power grid or have no information at all, have been extensively studied in the literature. When the attacker has no information, all it can do is produce random bad data. Such bad data can be efficiently detected by the traditional bad data detectors [4], even though the existing approaches, as we will discuss later, are not optimal. In the other extreme, in which the attackers have full and perfect information about all the dynamics of the power grid, the attacks can be designed intelligently so that they appear as legitimate data and bypass the traditional bad data detection algorithms [5]. While such attacks can cause severe damage, the assumptions underlying them are not realistic. Specifically, the strong assumption that all the instantaneous dynamics of the power grid are fully known to the attackers is hard to meet in practice.

In this paper, we propose a framework for recovering the state of the system while facing the potential risk that the measurements are contaminated by random or structured bad data. Furthermore, when the data is contaminated by structured bad data (i.e., attacks), the attackers are assumed to have only partial information about the power grid topology and its time-varying dynamics. The objective of this framework is two-fold. The primary objective is forming a reliable estimation of the state. The secondary objective pertains to detecting whether there exists any source of random or structured bad data in the measurements.

1.2 Existing studies

The focus of this paper is on false data injection attacks (FDIAs). The main objective of FDIAs is to disrupt power grid functions while avoiding detection by bad data detectors [6]. Even though FDIAs mainly aim to distort state estimation, their disruptions extend further and can affect a wide range of control and dispatch decisions. More specifically, a compromised estimation of the system state can lead to non-optimal dispatch. There exist studies that investigate the minimum number of measurements that should be tampered with to mount an effective attack. The interplay between the number of measurement units protected (or compromised) and the effectiveness of the attacks is studied in [7].

An analytical approach to evaluate the impact of FDIAs that can evade bad data detectors and affect electricity market is formalized in [8]. The study in [9] presents another FDIA design strategy, which maximizes the generated market revenue with a single measurement attack. Based on the multi-step electricity price (MEP) model introduced in [10], the impact of FDIAs in the real-time market against MEP is investigated in [11]. In order to incorporate the inter-temporal constraints, [12] proposes an attack strategy to withhold generation capacity for profit by manipulating the ramp constraints of the generators during look-ahead dispatch. In [13], an FDIA strategy based on the geometric characterization of the real-time marginal prices on the state space of power grid is proposed.

Game-theoretic approaches to model the interactions between the attack and defense strategies are investigated in [14, 15]. Specifically, the study in [14] focuses on mechanisms based on attacks that disrupt state estimation, and consequently, manipulate the ensuing decisions that rely on the state estimation. The study in [15] examines compromising the communication channels that carry the measurement information in order to manipulate market decisions. The idea of directly jamming the pricing signals is studied in [16], where the attackers can make a profit without intruding into the power system or changing the reported data. The study in [17] analyzes attack strategies by using a nonlinear model for power systems and state estimators. The impacts of adversaries with limited information about the network on the market operations are studied in [18,19,20].

2 Preliminaries

2.1 Bad data and attack models

Consider a general non-linear system model, in which the measurement vector \({{\varvec{y}}}\in {\mathbb {R}}^n\) is related to the state of the system \({\varvec{x}}\in {\mathbb {R}}^m\) according to:

$$\begin{aligned} {{\varvec{y}}}= {\varvec{h}}({\varvec{x}}) + {{\varvec{z}}}\ \end{aligned}$$
(1)

where \({\varvec{h}}\) captures the dynamics and topology of the power grid, and \({\varvec{z}}\) accounts for the measurement noise. This model represents the instances in which the only source of contamination in the measurements is noise. When there exists a random failure in the network (e.g., malfunctioning measurement units), or an attacker or a group of attackers compromises the measurements, the non-linear system model changes according to:

$$\begin{aligned} {\varvec{y}}= {\varvec{h}}({\varvec{x}}) + {\varvec{z}}+ {\varvec{b}}\ \end{aligned}$$
(2)

where \({\varvec{b}}\) accounts for the effects of injected random or structured bad data. Based on the currently widely-used approaches, for a given state estimation, denoted by \({\hat{{\varvec{x}}}}\), the set of measurements is considered to contain bad data based on a gross measurement error test. Specifically, it is decided that bad data exists if the gross measurement error exceeds a pre-specified threshold \(\tau\), i.e.:

$$\begin{aligned} {\text{ declare bad data if }}\quad \Vert {\varvec{y}}-{\varvec{h}}({\hat{{\varvec{x}}}})\Vert _2\ge \tau \ \end{aligned}$$
(3)

   The key weakness of such a bad data detector is that it fails to detect bad data vectors \({\varvec{b}}\) that are designed properly so that the distorted measurement \({\varvec{h}}({\varvec{x}})+{{\varvec{z}}}+{\varvec{b}}\) appears as a legitimate measurement vector. For instance, in a linearized system model, when \({\varvec{b}}\) lies in the range space of the Jacobian matrix \({\varvec{H}}\) (found by linearizing \({\varvec{h}}\)), it can bypass the residue-based detectors, as discussed in [5]. Furthermore, even when bad data is detected, the only existing remedy is to collect fresh measurements in the hope of having better data, and subsequently producing a reliable estimation.
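To make this vulnerability concrete, the following sketch uses a hypothetical linearized model with a randomly generated Jacobian \({\varvec{H}}\) (all numerical values are illustrative, not from the paper): a bad data vector in the range space of \({\varvec{H}}\) leaves the residue in (3) unchanged, while a random gross error inflates it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 5                       # n measurements, m states: n >> m gives redundancy
H = rng.standard_normal((n, m))    # stand-in for the Jacobian of h
x = rng.standard_normal(m)         # true state
z = 0.01 * rng.standard_normal(n)  # measurement noise

def residual(y, H):
    """Gross measurement error ||y - H x_hat||_2 used by the detector in (3)."""
    x_hat = np.linalg.lstsq(H, y, rcond=None)[0]  # least-squares state estimate
    return np.linalg.norm(y - H @ x_hat)

y = H @ x + z                      # clean measurements, model (1)

b_random = np.zeros(n)             # random bad data: one gross error
b_random[3] = 5.0
c = rng.standard_normal(m)         # state offset chosen by the attacker
b_stealth = H @ c                  # structured bad data in the range space of H

print(residual(y, H))              # small: noise only
print(residual(y + b_random, H))   # large: detected by (3)
print(residual(y + b_stealth, H))  # equals the clean residual: undetected
```

The stealth attack shifts the estimate from \({\varvec{x}}\) to \({\varvec{x}}+{\varvec{c}}\) while leaving the residue untouched, which is exactly why the test in (3) is blind to it.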

The primary cause of such weaknesses of the bad data detector in (3) is that: ① the state estimation and the bad data detection decisions are treated as independent routines, which ignores the inherent coupling between the two decisions; and ② it tends not to fully capitalize on the rich redundancy in the measurements that arises because the dimension of the observation space n is significantly larger than that of the state space m. When these two routines (i.e., the state estimator and the bad data detector) are designed by properly leveraging the fundamental underlying connection, and the redundancy in the measurements is capitalized on effectively, it is possible to mitigate the effects of bad data to a large extent. Specifically, while the objective is estimating the state, a decision should also be made, in parallel, about the underlying observation model. These combined decisions can be cast as a composite hypothesis test problem, in which hypothesis \({{H}_0}\) represents the model in which the only data contamination is noise, and hypotheses \({{H}_1}\) and \({{H}_2}\) represent the cases in which the data is contaminated by structured and random bad data, respectively:

$$\begin{aligned} {\left\{ \begin{array}{ll} {{{H}}_0}: &{} {{\varvec{y}}}\;=\;{\varvec{h}}({\varvec{x}})\;+\;{{\varvec{z}}}\\ {{{H}}_1}: &{} {{\varvec{y}}}\;=\;{\varvec{h}}({\varvec{x}})\;+\;{{\varvec{z}}}\;+\;{\varvec{b}}\quad {\text{ structured bad data }}\\ {{{H}}_2}: &{} {{\varvec{y}}}\;=\;{\varvec{h}}({\varvec{x}})\;+\;{{\varvec{z}}}\;+\;{\varvec{b}}\quad {\text{ random bad data }} \end{array}\right. } \end{aligned}$$
(4)

We remark that the cases of random and structured bad data are treated under different models to emphasize that the nature and models of the data under these two scenarios are distinct. Specifically, random bad data accounts for naturally-occurring failures in the power grid, such as line outages when the grid is stressed. The disruptions in the measurements when such failures occur often follow a random behavior. In contrast, under cyber attacks, the disruptions are designed carefully in order to impose a certain interruption on the functions of the power grid. For instance, an attacker exploits some information about the network in order to launch an attack that effectively distorts the state estimation \({\hat{{\varvec{x}}}}\), while not being detected by the bad data detector.

2.2 Information model of attacker

In this paper, the focus is on the data injection attack model presented in (2). In such models, the attacker tampers with the measurement units (e.g., phasor measurement units) such that they report false data to the network operator. Such attacks can lead to a series of disruptions in the monitoring (e.g., state estimation) and the ensuing actions (e.g., generation and dispatching).

As commented earlier, the design of effective cyber attacks strongly hinges on the amount of information that the attacker has about the network topology and dynamics. All such information is embedded in \({\varvec{h}}\); for instance, in a linearized system, this information is embedded in the entries of the Jacobian matrix \({\varvec{H}}\). In order to distinguish between the full information about the network and what is known to the attacker, we define \({\bar{{\varvec{h}}}}\) as the partial information about \({\varvec{h}}\) known to the attacker. For instance, in a linearized setting, instead of the full information about \({\varvec{H}}\), the attacker knows only a noisy version of this matrix, which we denote by \(\bar{{\varvec{H}}}\). Clearly, the case of \({\bar{{\varvec{h}}}}={\varvec{h}}\) represents the scenario in which the attacker has full information about the network. In this paper, we consider a general setting and do not impose any constraint on the relevance of \({\bar{{\varvec{h}}}}\) to \({\varvec{h}}\). Such an assumption facilitates a wide range of attack information models. All the analyses provided are general and can be applied to all choices of \({\bar{{\varvec{h}}}}\), spanning the scenario of fully informed attackers (\({\bar{{\varvec{h}}}}={\varvec{h}}\)) to the more practical setting in which only partial information about \({\varvec{h}}\) is available to the attacker.

For a given \({\bar{{\varvec{h}}}}\), the attack strategy can be modeled as a function that maps \({\bar{{\varvec{h}}}}\) to \({\varvec{b}}\), i.e.,

$$\begin{aligned} \phi :{\mathbb {R}}^{n\times m}\rightarrow {\mathbb {R}}^{n\times 1} \end{aligned}$$
(5)

We remark that the optimal design of \({\varvec{b}}\) in a linearized system, when the information about the Jacobian matrix associated with \({\varvec{h}}\) is fully known to the attackers, is studied in [21].
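The effect of partial information can be illustrated with a small numerical sketch. Here we hypothetically instantiate the attack map \(\phi\) in (5) as \({\varvec{b}}=\bar{{\varvec{H}}}{\varvec{c}}\), where \(\bar{{\varvec{H}}}={\varvec{H}}+\sigma {\varvec{E}}\) is a noisy copy of the true Jacobian (this specific noise model and all numbers are illustrative assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 20, 5
H = rng.standard_normal((n, m))        # true Jacobian, known to the operator
x = rng.standard_normal(m)
y = H @ x + 0.01 * rng.standard_normal(n)

def residual(y_obs):
    """Gross measurement error of the operator's least-squares fit."""
    x_hat = np.linalg.lstsq(H, y_obs, rcond=None)[0]
    return np.linalg.norm(y_obs - H @ x_hat)

def attack(H_bar, c):
    """Attack map phi in (5): bad data built from the attacker's model H_bar."""
    return H_bar @ c

c = rng.standard_normal(m)             # intended state distortion
resids = []
for sigma in [0.0, 0.1, 0.5]:          # 0: full information; >0: partial information
    H_bar = H + sigma * rng.standard_normal((n, m))
    resids.append(residual(y + attack(H_bar, c)))
print([round(r, 3) for r in resids])   # residue grows as sigma grows
```

With \(\sigma =0\) the attack is perfectly stealthy; as the attacker's model degrades, part of \({\varvec{b}}\) leaks outside the range space of \({\varvec{H}}\) and inflates the residue, giving the detector something to work with.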

3 State recovery under bad data

3.1 Data models

We define the sets \(\varOmega _{{\varvec{x}}}\subseteq {\mathbb {R}}^m\) and \(\varOmega _{{\varvec{b}}}\subseteq {\mathbb {R}}^{n}\) as the spaces of valid values for \({\varvec{x}}\) and \({\varvec{b}}\), respectively. Furthermore, we assume that \({\varvec{x}}\) and \({\varvec{b}}\) are distributed in their designated spaces \(\varOmega _{{\varvec{x}}}\) and \(\varOmega _{{\varvec{b}}}\) according to known statistical models. It is noteworthy that the distribution of \({\varvec{x}}\) in the space \(\varOmega _{{\varvec{x}}}\) can be found by leveraging historical data about the state parameters. When such patterns are not available, or they are not reliable enough for forming a statistical model, we assume that \({\varvec{x}}\) is distributed in \(\varOmega _{{\varvec{x}}}\) according to a uniform distribution. Similarly, by leveraging the information and the historical data on the failure patterns of the measurement units, the distribution of \({\varvec{b}}\) in the space \(\varOmega _{{\varvec{b}}}\) can be characterized. Finally, in the case of structured bad data (attacks), due to the unknown nature of the attacks and attack strategies, we assume that \({\varvec{b}}\) takes values in its designated space \(\varOmega _{{\varvec{b}}}\) according to a uniform distribution.

We denote the probability density function (PDF) of \({\varvec{x}}\) under hypothesis \({H}_i\) by \(\pi _i({\varvec{x}})\), for \(i\in \{0,1,2\}\). Finally, by accounting for the randomness of the measurement noise \({{\varvec{z}}}\), the measurements under hypothesis \({H}_i\) are distributed according to:

$$\begin{aligned} {H}_i: \;\; {{\varvec{y}}}\sim {f}_i({{\varvec{y}}}\;|\;{\varvec{x}}) \quad \text{ and }\quad {\varvec{x}}\sim {\pi}_i ({\varvec{x}}) \end{aligned}$$
(6)

where \({f}_i\) is the conditional PDF of \({{\varvec{y}}}\) given \({\varvec{x}}\), which is governed by the distribution of the noise. Based on this formulation, the state estimation problem reduces to concurrently detecting the true hypothesis and estimating the unknown vector \({\varvec{x}}\).

There exist a few sub-optimal approaches to solving such combined problems. All these approaches decouple the joint problem into two disjoint estimation and detection routines. In one major class, the problem is reduced to estimation-driven detection: an estimation is formed under each hypothesis, reducing the problem to a pure detection problem, and then an optimal detection routine is carried out. The most prevalent approach in this direction is the generalized likelihood ratio test (GLRT). The second major class involves parallel detection and estimation, in which multiple estimations are formed under the various hypotheses while a detection decision is formed in parallel. If the detection rule is in favor of hypothesis \({H}_i\), then the estimation formed under hypothesis \({H}_i\) is admitted as the estimation of interest. Despite their popularity, all such approaches are sub-optimal.
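The estimation-driven (GLRT) philosophy can be sketched in a minimal form. In a hypothetical linear-Gaussian model (our illustrative choice, not the paper's setting), testing the noise-only hypothesis against unstructured bad data by maximizing the likelihood over the unknowns under each hypothesis reduces the GLRT to thresholding the normalized squared residue:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, sigma = 20, 5, 0.1
H = rng.standard_normal((n, m))        # illustrative linearized model

def glrt_stat(y):
    """GLRT statistic for H0 (noise only) vs unstructured bad data:
    under H0 the likelihood is maximized by the least-squares estimate,
    so the test reduces to the normalized squared residual."""
    x_hat = np.linalg.lstsq(H, y, rcond=None)[0]
    return np.linalg.norm(y - H @ x_hat) ** 2 / sigma ** 2

x = rng.standard_normal(m)
y0 = H @ x + sigma * rng.standard_normal(n)  # measurements under H0
b = np.zeros(n); b[[2, 7, 11]] = 3.0         # three gross errors (random bad data)
y2 = y0 + b

tau = 2 * (n - m)   # illustrative threshold: stat is chi-square(n - m) under H0
print(glrt_stat(y0) < tau, glrt_stat(y2) > tau)
```

The estimation step (forming \({\hat{{\varvec{x}}}}\)) and the detection step (thresholding) are carried out sequentially here, which is precisely the decoupling that the combined formulation of this paper avoids.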

In this paper, we take a radically different approach to treating the combined problem. Aiming to form reliable estimations, we provide a natural formulation in which the objective is optimizing a relevant estimation cost function while, in parallel, controlling the detection power. This approach results in novel optimal designs for the estimators and the detectors.

3.2 Bad data detection

To formalize the detection routine and characterize optimal detection rules, we start by defining a randomized test with decision rules denoted by \(\{\delta_0({{\varvec{y}}}),\delta _1({{\varvec{y}}}),\delta _2({{\varvec{y}}})\}\). In this test, given data \({{\varvec{y}}}\), the rule \(\delta _i({{\varvec{y}}})\) denotes the likelihood of deciding \({H}_i\) for \(i\in \{0,1,2\}\). These probability terms satisfy:

$$\begin{aligned} {\left\{ \begin{array}{l} \delta _i({{\varvec{y}}})\ge 0\\ \sum\limits _{i=0}^2\delta _i({{\varvec{y}}})=1 \end{array}\right. } \end{aligned}$$
(7)

   Accordingly, we define the decision vector as \({{{\varvec{\delta}} }}({{\varvec{y}}})=[\delta _0({{\varvec{y}}}),\delta _1({{\varvec{y}}}),\delta _2({{\varvec{y}}})]\). Furthermore, we denote the true hypothesis and the decision of the detector by \({T}\in \{{H}_0,{H}_1,{H}_2\}\) and \({D}\in \{{H}_0,{H}_1,{H}_2\}\), respectively.

Based on these definitions, the probability of deciding in favor of hypothesis \({H}_i\) while the true hypothesis is \({H}_j\), for \(i\ne j\), is given by:

$$\begin{aligned} {P}_{ij}({\varvec{\delta }}({{\varvec{y}}}))= & {} {\mathbb {P}}({D}={H}_i\;|\;{T}={H}_j)\nonumber \\= & {} \int _{{{\varvec{y}}}}\int _{{\varvec{x}}}\;\delta _i({{\varvec{y}}})\; f_j({{\varvec{y}}}\;|\;{\varvec{x}})\;\pi _j({\varvec{x}})\;{\mathrm{d}}{\varvec{x}}\; {\mathrm{d}}{{\varvec{y}}}\nonumber \\= & {} \int _{{{\varvec{y}}}}\;\delta _i({{\varvec{y}}})\; f_j({{\varvec{y}}})\;{\mathrm{d}}{{\varvec{y}}}\end{aligned}$$
(8)

   We have six such error probability terms. Next, by defining the estimation costs, we show how these detection error terms can be integrated with the estimation cost to form a combined approach for designing the estimators and detectors.

3.3 State estimation

Based on the observed data \({{\varvec{y}}}\), besides discerning the underlying true model \({H}_i\), we also form an estimation for \({\varvec{x}}\). We denote the estimation of \({\varvec{x}}\) based on the collected data \({{\varvec{y}}}\) under hypothesis \({H}_i\) by \({\hat{{\varvec{x}}}}_i({{\varvec{y}}})\). To quantify the fidelity of the estimation under hypothesis \({H}_i\), we define the cost function \({C}_i({\varvec{x}},{\hat{{\varvec{x}}}}_i({{\varvec{y}}}))\), which captures the difference between the estimation and the ground truth. A popular cost function pertains to the minimum mean-square error (MMSE) criterion, which is given by:

$${C}_i({\varvec{x}},{\varvec{u}})=\Vert {\varvec{x}}-{\varvec{u}}\Vert ^2$$
(9)

For a given generic cost function \({C}_i({\varvec{x}},{\varvec{u}})\), we will also evaluate the average posterior cost function. Such an average cost function quantifies the estimation error cost after observing \({{\varvec{y}}}\), and it is given by:

$$\begin{aligned} {C}_{i,p}({\varvec{u}}\;|\;{{\varvec{y}}})&\,{\mathop {=}\limits ^{\scriptscriptstyle \triangle }}\,{\mathbb {E}}_{i,{\varvec{x}}}[{C}_i({\varvec{x}},{\varvec{u}})\;|\;{{\varvec{y}}}] \end{aligned}$$
(10)

where the expectation is computed with respect to \({\varvec{x}}\) under hypothesis \({H}_i\). Accordingly, the minimum average posterior cost function is given by:

$$\begin{aligned} {C}_{i,p}^*({{\varvec{y}}})\,{\mathop {=}\limits ^{\scriptscriptstyle \triangle }}\,\inf _{{\varvec{u}}}{C}_{i,p}({\varvec{u}}\;|\;{{\varvec{y}}}) \end{aligned}$$
(11)

   These cost functions have pivotal roles in designing the estimator and detector as they capture the quality of the estimation. Finally, the optimizer of the average posterior cost is denoted by [22]:

$$\begin{aligned} {\hat{{\varvec{x}}}}_{i}^{*}({{\varvec{y}}})\,{\mathop {=}\limits ^{\scriptscriptstyle \triangle }}\,\arg \inf _{{\varvec{u}}}{C}_{i,p}({\varvec{u}}\;|\;{{\varvec{y}}}) \end{aligned}$$
(12)
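For instance, for the quadratic cost in (9), completing the square in (10) gives:

$$\begin{aligned} {C}_{i,p}({\varvec{u}}\;|\;{{\varvec{y}}})=\Vert {\varvec{u}}-{\mathbb {E}}_{i,{\varvec{x}}}[{\varvec{x}}\;|\;{{\varvec{y}}}]\Vert ^2 + {\mathbb {E}}_{i,{\varvec{x}}}\big [\Vert {\varvec{x}}-{\mathbb {E}}_{i,{\varvec{x}}}[{\varvec{x}}\;|\;{{\varvec{y}}}]\Vert ^2\;|\;{{\varvec{y}}}\big ] \end{aligned}$$

Hence, in this special case, \({\hat{{\varvec{x}}}}_{i}^{*}({{\varvec{y}}})={\mathbb {E}}_{i,{\varvec{x}}}[{\varvec{x}}\;|\;{{\varvec{y}}}]\) is the posterior mean under \({H}_i\), and the minimum average posterior cost \({C}_{i,p}^*({{\varvec{y}}})\) is the corresponding posterior variance.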

3.4 Combined state recovery and bad data detection

In this subsection, we propose an approach that incorporates both estimation and detection decision rules in a unified framework. Given randomized detection rules \({\varvec{\delta }}({{\varvec{y}}})\) and state estimators \({\varvec{u}}_i({{\varvec{y}}})\), under hypothesis \({H}_i\), we define the conditional average estimation costs as:

$$\begin{aligned}{J}_i (\delta _i({{\varvec{y}}}),{\varvec{u}}_i({{\varvec{y}}}))&\,{\mathop {=}\limits ^{\scriptscriptstyle \triangle }}\,{\mathbb {E}}_{i,{\varvec{x}}}[{C}({\varvec{x}},{\varvec{u}}_i({{\varvec{y}}}))\;|\;{D}={H}_i] \end{aligned}$$
(13)

The expectation is taken with respect to \({\varvec{x}}\) and \({{\varvec{y}}}\) under \({H}_i\). Given the individual cost functions under different hypotheses, we aggregate the three cost functions into a unified one. Specifically, for a given measurement vector \({{\varvec{y}}}\), randomized detection rules \({\varvec{\delta }}({{\varvec{y}}})\), and estimators \({\varvec{u}}({{\varvec{y}}})=[{\varvec{u}}_0({{\varvec{y}}}),{\varvec{u}}_1({{\varvec{y}}}),{\varvec{u}}_2({{\varvec{y}}})]\), we define:

$$\begin{aligned} {J}({\varvec{\delta }}({{\varvec{y}}}),{\varvec{u}}({{\varvec{y}}}))=\max _{i\in \{0,1,2\}}\;{J}_i(\delta _i({{\varvec{y}}}),{\varvec{u}}_i({{\varvec{y}}})) \end{aligned}$$
(14)

   This aggregate cost function captures only the performance of the estimators. To integrate the quality of the detectors, which is captured by the probability terms \({P}_{ij}({\varvec{\delta }}({{\varvec{y}}}))\) defined in (8), we formulate the combined problem as the one that minimizes the aggregate estimation cost subject to controlled levels for the error probability terms \({P}_{ij}({\varvec{\delta }}({{\varvec{y}}}))\), according to:

$$\begin{aligned} \left\{ \begin{array}{ll} {{{\mathcal {P}}}}(\varvec{\alpha })\,{\mathop {=}\limits ^{\scriptscriptstyle \triangle }} \inf \limits _{{\varvec{\delta }}({{\varvec{y}}}),{\varvec{u}}({{\varvec{y}}})} {J}({\varvec{\delta }}({{\varvec{y}}}),{\varvec{u}}({{\varvec{y}}})) \\ {\mathrm{s.t.}}\quad {{P}}_{ij}({\varvec{\delta }}({{\varvec{y}}}))\le \alpha _{ij} \;\;\;\;i\ne j \end{array} \right. \end{aligned}$$
(15)

Parameters \(\varvec{\alpha }=[\alpha _{ij}]\), where \(\alpha _{ij}\in (0,1)\), ensure that the probability of declaring \({H}_i\) while the underlying true hypothesis is \({H}_j\) is controlled at a desired level. In the next section, we discuss how the problem \({{\mathcal {P}}}(\varvec{\alpha })\) can be solved in closed form.

4 Optimal state estimator and decision rule

4.1 Feasibility of \({{\mathcal {P}}}(\varvec{\alpha })\)

Note that (15) does not always have a feasible solution for an arbitrary choice of \(\{\alpha _{ij}\}\). Specifically, from the Neyman-Pearson (NP) theory, we know that, when facing a multi-hypothesis testing problem, all the decision error probabilities cannot be made arbitrarily small at the same time. The set of simultaneously feasible choices of \(\{\alpha _{ij}\}\) can be found by solving the following problem, in which five of the error probabilities are controlled to remain below specified thresholds, and the sixth term is minimized. Without loss of generality, we aim to minimize \({P}_{01}({\varvec{\delta }}({{\varvec{y}}}))\), while controlling the rest of the error terms, i.e.:

$$\begin{aligned} \left\{ \begin{array}{ll} \beta \,{\mathop {=}\limits ^{\scriptscriptstyle \triangle }}\, \min \limits _{{\varvec{\delta }}({{\varvec{y}}})} {P}_{01}({\varvec{\delta }}({{\varvec{y}}})) \\ {\mathrm{s.t.}} \quad {P}_{ij}({\varvec{\delta }}({{\varvec{y}}}))\le \alpha _{ij}\quad (i,j)\ne (0,1) \end{array}\right. \end{aligned}$$
(16)

   This problem can be solved readily by leveraging the same line of argument as in the NP test [22]. Note that solving (16) is merely for the purpose of characterizing the optimal value \({\beta}\) and not the decision rule. Once this problem is solved, if \({\beta}\) satisfies \({\beta}\le {\alpha}_{01}\), then the combined estimation and detection problem in (15) is feasible; otherwise, it is infeasible.

4.2 Optimal state estimator

Close scrutiny of (15) indicates that the estimators appear only in the objective function of the optimization problem, while the constraints depend only on the detectors. This observation suggests that the problem in (15) can be decomposed into two problems. First, the estimators are characterized for any given set of detectors. Specifically, for any given choice of the detectors \({\varvec{\delta }}({{\varvec{y}}})\), the optimal estimators can be found as the solution to:

$$\begin{aligned} \inf _{{\varvec{u}}({{\varvec{y}}})}&{J}({\varvec{\delta }}({{\varvec{y}}}),{\varvec{u}}({{\varvec{y}}})) \end{aligned}$$
(17)

This observation is summarized in the following theorem.

Theorem 1

(state estimator) The solution to the optimization problem

$$\begin{aligned} {\bar{{\varvec{x}}}}^*({{\varvec{y}}})\;=\;\arg \inf _{{\varvec{u}}({{\varvec{y}}})} {J}({\varvec{\delta }}({{\varvec{y}}}),{\varvec{u}}({{\varvec{y}}})) \end{aligned}$$
(18)

is

$$\begin{aligned} {\bar{{\varvec{x}}}}^*({{\varvec{y}}})=[{\hat{{{\varvec{x}}}}}^*_0({{\varvec{y}}})\;,\;{\hat{{{\varvec{x}}}}}^*_1({{\varvec{y}}})\;,\; {\hat{{{\varvec{x}}}}}^*_2({{\varvec{y}}})] \end{aligned}$$
(19)

The proof can be found in Appendix A.

Irrespective of the structure of the detection rules, the result of Theorem 1 is that the Bayesian estimators are optimal. This implies that the combined estimation and detection problem can be reduced to a bad data detection problem, which we investigate in Subsection 4.3, followed by state estimators with the structures in (12).

4.3 Optimal bad data detectors

With the estimators designed in the previous subsection, these estimators can be substituted into problem (15), rendering a pure detection problem. The optimal detection rules can then be found as the solution to:

$$\begin{aligned} \left\{ \begin{array}{ll} {{\mathcal {P}}}(\varvec{\alpha })= \inf \limits _{{\varvec{\delta }}({{\varvec{y}}})} {J}({\varvec{\delta }}({{\varvec{y}}}),{\bar{{\varvec{x}}}}^*({{\varvec{y}}})) \\ {\mathrm{s.t.}}\quad {P}_{ij}({\varvec{\delta }}({{\varvec{y}}}))\le \alpha _{ij} \;\;\;i\ne j \end{array} \right. \end{aligned}$$
(20)

   For this purpose, we define:

$$\begin{aligned} {\tilde{J}}({\varvec{\delta }}({{\varvec{y}}})) = \inf _{{\bar{{\varvec{x}}}}({{\varvec{y}}})} {J}({\varvec{\delta }}({{\varvec{y}}}),{\bar{{\varvec{x}}}}({{\varvec{y}}})) = {J}({\varvec{\delta }}({{\varvec{y}}}),{\bar{{\varvec{x}}}}^*({{\varvec{y}}})) \end{aligned}$$
(21)

which transforms (20) into:

$$\begin{aligned} \left\{ \begin{array}{ll} {{{\mathcal {P}}}}(\varvec{\alpha })= \inf \limits _{{\varvec{\delta }}({{\varvec{y}}})} {\tilde{J}}({\varvec{\delta }}({{\varvec{y}}}))\\ {\mathrm{s.t.}} \quad {{P}}_{ij}({\varvec{\delta }}({{\varvec{y}}}))\le \alpha _{ij} \;\;\;i\ne j \end{array} \right. \end{aligned}$$
(22)

   By solving \({{\mathcal {P}}}(\varvec{\alpha })\) in (22), we find the closed-form characterization of the detection rules, which essentially determine whether the system is suffering from bad data, and if so, whether it is structured bad data (attack) or random bad data. For this purpose, by recalling the definitions of \({J}_i\), \({J}\), and \({\tilde{J}}\) in (13), (14), and (21), respectively, we obtain:

$$\begin{aligned} {\tilde{J}}({\varvec{\delta }}({{\varvec{y}}}))= & {} {J}({\varvec{\delta }}({{\varvec{y}}}),{\bar{{\varvec{x}}}}^*({{\varvec{y}}}))\nonumber \\= & {} \max _{i\in \{0,1,2\}}{J}_i(\delta _i({{\varvec{y}}}),{\hat{{\varvec{x}}}}^*_i({{\varvec{y}}}))\nonumber \\= & {} \max _{i\in \{0,1,2\}} {\mathbb {E}}_{i,{\varvec{x}}}[{C}_i({\varvec{x}},{\hat{{\varvec{x}}}}^*_{i}({{\varvec{y}}}))\;|\;{D}={H}_i]\nonumber \\= & {} \max _{i\in \{0,1,2\}} {\mathbb {E}}_{i,{\varvec{x}}}[{C}_{i,p}^*({{\varvec{y}}})\;|\;{D}={H}_i]\nonumber \\= & {} \max _{i\in \{0,1,2\}}\frac{\int _{{{\varvec{y}}}}\delta _i({{\varvec{y}}})\;f_i({{\varvec{y}}})\;{C}_{i,p}^*({{\varvec{y}}})\;{\mathrm{d}}{{\varvec{y}}}}{\int _{{{\varvec{y}}}}\delta _i({{\varvec{y}}})\;f_i({{\varvec{y}}})\;{\mathrm{d}}{{\varvec{y}}}} \end{aligned}$$
(23)

   We remark that each of the three terms involved in (23) is a linear-fractional (quasi-linear) function of \(\delta _i({{\varvec{y}}})\), and is therefore quasi-convex [23]. Furthermore, the pointwise maximum preserves quasi-convexity. Hence, \({\tilde{J}}({\varvec{\delta }}({{\varvec{y}}}))\) is quasi-convex, and the problem can be solved by finding the solutions to an equivalent family of feasibility problems [23,24,25]. More specifically, for solving \({{\mathcal {P}}}(\varvec{\alpha })\), we first characterize a relevant feasibility problem. For this purpose, based on (23), for any given \(t\in {\mathbb {R}}_+\) that satisfies \({\tilde{J}}({\varvec{\delta }}({{\varvec{y}}}))\le t\), we have, for \(i\in \{0,1,2\}\):

$$\begin{aligned} \int _{{{\varvec{y}}}}\delta _i({{\varvec{y}}})\;{f}_i({{\varvec{y}}})\;\big [{C}_{i,p}^*({{\varvec{y}}})-{t}\big ]\;{\mathrm{d}}{{\varvec{y}}}\;\le \; 0 \end{aligned}$$
(24)

   As a result, for any given set of values \(\varvec{\alpha }\), which control the bad data detection power, and any real number t, we form the following feasibility problem:

$$\begin{aligned} {{\mathcal {Q}}}(\varvec{\alpha },t)\,{\mathop {=}\limits ^{\scriptscriptstyle \triangle }}\,\left\{ {\varvec{\delta }}({{\varvec{y}}})\,:\, \begin{array}{ll} \int _{{{\varvec{y}}}}\delta _i({{\varvec{y}}})\;{f}_i({{\varvec{y}}})\;\big [{C}_{i,p}^*({{\varvec{y}}})-t\big ]\;{\mathrm {d}}{{\varvec{y}}}\le 0\quad i\in \{0,1,2\}\\ {P}_{ij}({\varvec{\delta }}({{\varvec{y}}}))\le \alpha _{ij} \quad i\ne j \end{array} \right\} \end{aligned}$$
(25)

   The relationship specified in (24) indicates that the two problems \({{\mathcal {Q}}}(\varvec{\alpha },t)\) and \({{\mathcal {P}}}(\varvec{\alpha })\) are related according to:

$$\begin{aligned} {\left\{ \begin{array}{ll} \text{ if }\;\;{{\mathcal {Q}}}(\varvec{\alpha },t)\ne \emptyset &{} \text{ then }\;\;{{{\mathcal {P}}}}(\varvec{\alpha })\le t \\ \text{ if }\;\;{{\mathcal {Q}}}(\varvec{\alpha },t)=\emptyset &{} \text{ then }\;\;{{{\mathcal {P}}}}(\varvec{\alpha })> t \end{array}\right. } \end{aligned}$$
(26)

   Based on this property, it can be readily verified that the optimal value of \({{{\mathcal {P}}}}(\varvec{\alpha })\) can be found through a bi-section search with the steps detailed in Algorithm 1.

Algorithm 1: Detection algorithm

  1:   Initialize \(t_{\min }=0\) and \(t_{\max }={\mathbb {E}}[{C}_i({\varvec{x}},{\mathbf{0}})\;|\;{{\varvec{y}}}]\)

  2:   Evaluate the average posterior costs in (10)

  3:   repeat

  4:      \(t_0\leftarrow (t_{\min }+t_{\max })/2\)

  5:      Solve \(\tilde{\mathcal{Q}}(\varvec{\alpha },t_0)\)

  6:      if \(\tilde{\mathcal{Q}}(\varvec{\alpha },t_0)> 0\) then

  7:         \(t_{\min }\leftarrow t_0\)

  8:      else

  9:         \(t_{\max }\leftarrow t_0\)

  10:     end if

  11:  until \(t_{\max }-t_{\min }\le \epsilon \) for \(\epsilon \) sufficiently small

  12:  \(\mathcal{P}(\varvec{\alpha })\leftarrow t_{\max }\)

  13:  Output \({\varvec{a}}\) and \({\varvec{c}}\) to characterize the rules in (36)
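The bi-section loop of Algorithm 1 can be sketched in a few lines. In the illustration below, `solve_Q_tilde` is a hypothetical stand-in (not from the paper) for a routine that returns the optimal slack value of the auxiliary convex problem \(\tilde{\mathcal{Q}}(\varvec{\alpha},t)\); that value is positive exactly when \(\mathcal{Q}(\varvec{\alpha},t)\) is infeasible.

```python
# Minimal sketch of the bi-section search in Algorithm 1.
# `solve_Q_tilde(alpha, t)` is a hypothetical stand-in for a solver of
# the auxiliary convex problem: it returns the optimal slack, which is
# positive exactly when the feasibility problem Q(alpha, t) is empty.
def bisection_search(solve_Q_tilde, alpha, t_min, t_max, eps=1e-6):
    """Approximate the optimal value of P(alpha) via bi-section over t."""
    while t_max - t_min > eps:              # step 11 stopping rule
        t0 = 0.5 * (t_min + t_max)          # step 4
        if solve_Q_tilde(alpha, t0) > 0:    # steps 5-6: Q(alpha, t0) empty
            t_min = t0                      # step 7: optimum lies above t0
        else:
            t_max = t0                      # step 9: optimum at or below t0
    return t_max                            # step 12: P(alpha) <- t_max
```

For instance, with a toy solver whose optimal slack is \(2-t\) (so the feasibility boundary sits at \(t=2\)), the search converges to 2 within the chosen tolerance.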

Based on the connection between the two problems \({{{\mathcal {Q}}}}({\varvec{\alpha}},{t})\) and \({{{\mathcal {P}}}}(\varvec{\alpha })\), we observe that instead of solving \({{{\mathcal {P}}}}(\varvec{\alpha })\) directly, we can solve \({{\mathcal {Q}}}({\varvec{\alpha}},{t})\) combined with a bi-section search over t. In the next step, we specify how to optimally solve \({{\mathcal {Q}}}({\varvec{\alpha}},{t})\). To proceed, we introduce the slack variable \(\gamma\) and define the following auxiliary problem, which can be readily verified to be convex.

$$\begin{aligned} \tilde{{{\mathcal {Q}}}}({\varvec{\alpha}},{t})\triangleq \left\{ \begin{array}{ll} \inf \limits _{{\varvec{\delta }}({{\varvec{y}}}),\gamma } \gamma \\ {\text{s.t. }} \int _{{{\varvec{y}}}}{\delta _i}({{\varvec{y}}})\;{f_i}({{\varvec{y}}})\;\big [{{C}}_{i,p}^*({{\varvec{y}}})-{t}\big ]\;{\mathrm{d}}{{\varvec{y}}}\le \gamma \;\;\;\;\;i\in \{0,1,2\}\\ {{P}}_{ij}({\varvec{\delta }}({{\varvec{y}}}))\le \alpha _{ij}+\gamma \;\;\;\;\;i\ne j \end{array} \right. \end{aligned}$$
(27)

   Based on the definitions of the problems \({{{\mathcal {Q}}}}({\varvec{\alpha}},{t})\) and \(\tilde{{{\mathcal {Q}}}}({\varvec{\alpha}},{t})\), the following two statements are equivalent:

$$\begin{aligned} {{{\mathcal {Q}}}}({\varvec{\alpha}},{t})\;=\;\emptyset \quad \Longleftrightarrow \quad \tilde{{{\mathcal {Q}}}}({\varvec{\alpha}},{t})\;>\; 0 \end{aligned}$$
(28)

   This equivalence implies that for establishing the feasibility of \({{{\mathcal {Q}}}}({\varvec{\alpha}},{t})\), we can equivalently compare the value of \({\tilde{{{\mathcal {Q}}}}}({\varvec{\alpha}},{t})\) with a fixed threshold. As the final step, we characterize the solution of \({\tilde{{{\mathcal {Q}}}}}({\varvec{\alpha}},{t})\), which in turn provides a closed-form characterization of the decision rules \({\varvec{\delta }}({{\varvec{y}}})\).

For solving \(\tilde{{{\mathcal {Q}}}}({\varvec{\alpha}},{t})\), which is a convex problem, we first form the Lagrangian by assigning the non-negative Lagrange multipliers \(a_i\), \(i\in \{0,1,2\}\), to the constraints:

$$\begin{aligned} \int _{{{\varvec{y}}}}\delta _i({{\varvec{y}}})\;{f}_i({{\varvec{y}}})\;\big [{C}_{i,p}^*({{\varvec{y}}})-{t}\big ]\;{\mathrm{d}}{{\varvec{y}}}\le \gamma \end{aligned}$$
(29)

and assigning the non-negative Lagrange multipliers \(c_{ij}\), for \(i\ne j\) and \(i,j\in \{0,1,2\}\), to the constraints:

$$\begin{aligned} {P}_{ij}({\varvec{\delta }}({{\varvec{y}}}))\le \alpha _{ij}+\gamma \end{aligned}$$
(30)

   By defining \({\varvec{a}}=[a_i]\) and \({\varvec{c}}=[c_{ij}]\), which satisfy:

$$\begin{aligned} \sum _ia_i+\sum _{ij}c_{ij}=1 \end{aligned}$$
(31)

the Lagrangian is given by:

$$\begin{aligned} {\mathcal{L}}({\varvec{\delta }},\gamma ,{\varvec{a}},{\varvec{c}})\triangleq & {} \left( 1-\sum _ia_i-\sum _{ij}c_{ij}\right) \gamma \nonumber \\+ & {} \sum _ia_i\int _{{{\varvec{y}}}}\delta _i({{\varvec{y}}})\;{f}_i({{\varvec{y}}})\; \big [{C}_{i,p}^*({{\varvec{y}}})-t\big ]\;{\mathrm{d}}{{\varvec{y}}}\nonumber \\+ & {} \sum _{ij}c_{ij}\Big ({P}_{ij}({\varvec{\delta }}({{\varvec{y}}}))-\alpha _{ij}\Big ) \end{aligned}$$
(32)

   As a result, the Lagrange dual function is given by:

$$\begin{aligned} {g}({\varvec{a}},{\varvec{c}})&\,\triangleq \min _{{\varvec{\delta }},\gamma }\mathcal{L}({\varvec{\delta }}, \gamma ,{\varvec{a}},{\varvec{c}}) \nonumber \\&= \min _{{\varvec{\delta }},\gamma }\Big \{\sum _ia_i\int _{{{\varvec{y}}}}\delta _i({{\varvec{y}}})\;{f}_i({{\varvec{y}}})\;\big [{{C}}_{i,p}^*({{\varvec{y}}})-{t}\big ]\;{\mathrm{d}}{{\varvec{y}}}\nonumber \\&\qquad +\,\sum _{ij}c_{ij}\Big ({P}_{ij}({\varvec{\delta }}({{\varvec{y}}}))\Big )\Big \}\nonumber \\&\qquad-\, \sum _{ij}c_{ij}\alpha _{ij} \end{aligned}$$
(33)

   By leveraging the expression of \({P}_{ij}({\varvec{\delta }})\) in (8), the Lagrangian dual can be equivalently stated as:

$$\begin{aligned} g({\varvec{a}},{\varvec{c}})= \min _{{\varvec{\delta }},\gamma } \sum _i\int \delta _i({{\varvec{y}}}) A_i({{\varvec{y}}}){\mathrm{d}}{{\varvec{y}}}-\sum _{ij}c_{ij}\alpha _{ij} \end{aligned}$$
(34)

in which we have defined:

$$\begin{aligned}{A}_i({{\varvec{y}}}) \triangleq a_i {f}_i({{\varvec{y}}})\;\big [{C}_{i,p}^*({{\varvec{y}}})-{t}\big ]+\sum _{j\ne i}{c_{ij}}{f_{j}}({{\varvec{y}}}) \end{aligned}$$
(35)

   Based on these observations and properties, the optimal detection rules are formalized in the next theorem.

Theorem 2

The problem \({{{\mathcal {P}}}}(\varvec{\alpha })\) has a globally optimal solution and the decision rule \({\varvec{\delta }}({{\varvec{y}}})\) that optimizes \({{{\mathcal {P}}}}(\varvec{\alpha })\) (and \( g({\varvec{a}},{\varvec{c}})\)) is given by:

$$\begin{aligned} \left\{ \begin{array}{cc} \delta _0({{\varvec{y}}})=1 &{} \quad {\mathrm{if}}\;\; A_0({{\varvec{y}}})\ge \max \{ A_1({{\varvec{y}}}), A_2({{\varvec{y}}})\}\\ \delta _1({{\varvec{y}}})=1 &{} \quad {\mathrm{if}}\;\; A_1({{\varvec{y}}})\ge \max \{ A_0({{\varvec{y}}}), A_2({{\varvec{y}}})\}\\ \delta _2({{\varvec{y}}})=1 &{} \quad {\mathrm{if}}\;\; A_2({{\varvec{y}}})\ge \max \{ A_0({{\varvec{y}}}), A_1({{\varvec{y}}})\} \end{array}\right. \end{aligned}$$
(36)

   As a result, based on Theorem 2, we start by computing the Lagrange multipliers \({\varvec{a}}\) and \({\varvec{c}}\) in order to evaluate the functions \(A_i({{\varvec{y}}})\). These functions determine under which model the system operates.
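As a concrete illustration, the rule in (36) can be evaluated pointwise once the multipliers are available. The sketch below is a toy instantiation: the densities \(f_i\), posterior costs \(C^*_{i,p}\), multipliers \(a_i\) and \(c_{ij}\), and threshold t used in the example are illustrative assumptions, not values from the paper.

```python
# Toy sketch of evaluating A_i(y) from (35) and applying the decision
# rule in (36). All numeric inputs below are illustrative assumptions.
def A(i, y, a, c, f, C_star, t):
    """Evaluate A_i(y) = a_i f_i(y) [C*_{i,p}(y) - t] + sum_{j!=i} c_ij f_j(y)."""
    return a[i] * f[i](y) * (C_star[i](y) - t) + sum(
        c[(i, j)] * f[j](y) for j in range(3) if j != i
    )

def decide(y, a, c, f, C_star, t):
    """Set delta_i(y) = 1 for the model whose A_i(y) is largest, per (36)."""
    vals = [A(i, y, a, c, f, C_star, t) for i in range(3)]
    return max(range(3), key=lambda i: vals[i])
```

With, for example, \(f_0(y)=1\), \(f_1(y)=0.5\), \(f_2(y)=0.2\), unit posterior costs, \(a_i=1\), \(c_{ij}=0.1\), and \(t=0.5\), the rule selects model 0, since \(A_0(y)\) dominates the other two terms.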

5 Case study

In this section, we evaluate the performance of the optimal framework on the IEEE 14-bus system, in which the measurement units undergo potential false data injection attacks. We evaluate both the DC linearized and the AC non-linear system models. In this setting, any combination of the 14 measurement units on the buses can be compromised.

The benchmark method against which we compare the approach developed in this paper is the detection-driven approach. In this approach, the effect of the state parameters is ignored, and a pure detection problem is considered to determine whether the measurements are entirely legitimate or whether they bear random or structured bad data. This is carried out by performing a simple hypothesis test over the three possible hypotheses \(\{{H}_0,{H}_1,{H}_2\}\) defined in (6). Once a decision is formed, an estimator is designed based on it to form reliable state estimations.

We compare the average estimation cost with that of a detection-driven approach, in which the decision about \(\{{H}_0,{H}_1,{H}_2\}\) is followed by Bayesian estimation. The degradation in the estimation cost, normalized by the estimation cost under an attack-free setting, is depicted in Fig. 1, which shows how the estimation quality suffers from the presence of random and structured bad data. The plots in this figure illustrate the variation of this estimation quality versus \(\alpha \triangleq \alpha _{ij}\), which controls the detection error rates, as specified in (15).

Figure 1 consists of three curves: the estimation cost averaged over the costs under the different hypotheses (q), the best estimate among the estimations under the different models (\(q_{\max }\)), and the worst estimate among the estimations under the different models (\(q_{\min }\)). We also depict the performance of the detection-driven approach, which appears as a single isolated point. The detection-driven approach is forced to operate at a specific detection quality; it does not enjoy the flexibility of the optimal approach, which can place any desired emphasis on the estimation and bad data/attack detection problems. Furthermore, the detection-driven approach produces considerably weaker estimations.

Fig. 1 Normalized estimation performance versus \(\alpha\)

6 Conclusion

In this paper, we have investigated non-linear state estimation in power systems when the system is vulnerable to structured or random bad data. Forming estimations in such scenarios is inherently coupled with detecting the true model of the system. We have shown that the existing approaches, which essentially decouple the involved estimation and detection routines, are sub-optimal. Based on that premise, we have provided a general framework that treats the state estimation and bad data detection problems in a unified way, and we have characterized the optimal state estimators and bad data detectors in closed form.