1 Introduction

The operation and structure of power systems are changing due to the energy transition, i.e., the increasing penetration of distributed generation and the displacement of fossil fuel generators. Moreover, the structure of the power system has evolved continuously to cope with climate change and weather variations. As a result, the power system has become more decentralized to ensure a secure and reliable electricity supply in remote and regional networks. Decentralized energy systems often consist of renewable energy generators such as photovoltaic (PV) and wind generation (WG), together with storage systems such as battery energy storage, flywheels, and others [1].

1.1 Literature gap

Figure 1 illustrates the differences in power flow as well as information flow between the traditional power grid and the smart grid with distributed energy resources (DERs). It can be observed that customers and distributed generators now actively participate in power system operation, making operation, control, and planning of the system more challenging. DERs are commonly integrated into distribution networks, and these sources could provide additional flexibility through coordinated control. However, in low voltage (LV) distribution networks, such controls are usually not available due to the "fit-and-forget" design approach and the low return on investment under the current market mechanisms in many power systems [2, 3]. Since the operation of PVs or WGs heavily depends on weather conditions, the massive integration of these generators may lead to frequency variation and voltage fluctuation in power systems. Moreover, these issues also affect the reliable and secure operation of distribution networks. Therefore, accurate forecasting, optimal planning, and state estimation for the distribution network are necessary.

Furthermore, the capital investment required to upgrade the existing infrastructure in the power network is relatively high. Therefore, precise and robust models are required to plan future development. Power flow simulation with DERs and emerging loads often requires fast and advanced computational facilities to process big data. Furthermore, the advances in processing power allow power system operators to use more sensing devices for monitoring and control. Data from these sensors can be used for optimal operation and planning [4]. However, some key challenges are still associated with the modelling and operation of distribution networks with DERs. Firstly, due to the weather dependency of renewable energy generators such as PV systems or wind generators, accurate short- to long-term forecasting is required. Moreover, the prediction of electric vehicle (EV) charging and user profiles is also necessary. Secondly, extreme events such as natural disasters (e.g. storms, bushfires, hurricanes, and others) and man-made events (e.g. cyber or physical attacks) can affect the components of power systems, causing severe power outages. Therefore, a suitable model or operational planning framework that considers such factors is required. However, the probability of occurrence of such events is low; thus, the modelling of the system may become complex.

Fig. 1
figure 1

Comparison between traditional grid and modern smart grid

1.2 Motivation

Several surveys of this domain have already been published. Arturo et al. [5] reviewed several state-of-the-art multi-objective planning methods for DERs. Erdinc et al. [6] analyzed the nature-inspired algorithms used to size and design a hybrid renewable energy system. A comprehensive summary of objectives, functions, constraints and optimization tools used in a smart microgrid is also presented in [7, 8]. Other works regarding optimal power flow and impacts of PV generation can be found in [9,10,11].

Despite being a well-established field of study, there are some open questions that need further investigation. These can be listed as follows:

  • Forecasting models for DERs and load flexing: The demand in the LV network and the power generated by DERs are time-varying. Therefore, accurate forecasting is essential in developing precise planning models and managing reliable systems.

  • Solution approaches for Optimal Power Flow (OPF) for distribution systems, considering the integration of DERs: Optimal Power Flow (OPF) is a well-known and powerful technique in power system operation and planning. It is also a complex mathematical problem to solve due to its non-convexity. Furthermore, the lack of sensing in the distribution networks makes optimal operation planning challenging.

  • Machine learning or deep learning techniques to support modelling, forecasting and state estimation: Machine learning or deep learning models can be used to estimate load demand or power generation in real time. However, obtaining sufficient and accurate training data remains a key issue.

1.3 Contributions

This review work aims to answer these questions with a particular focus on distribution systems. Apart from summarizing and comparing the methods in the literature, the potential and challenges of each aspect, together with insights for future research, are given. This paper contains a taxonomy of tools and techniques used for distribution system planning and operation studies. This work aims to guide early-career researchers and planning and design engineers towards a more profound understanding of distribution system planning and operation with DERs. Figure 2 shows the organization of this review work. Table 1 compares this work with some recent, notable review papers.

Fig. 2
figure 2

Organization of review process

1.4 Organization of the paper

The rest of the paper is organized as follows: Sect. 2 provides the methodology for review, along with the scopes and limitations of this work; Sect. 3 provides a taxonomy on forecasting algorithms applied to distribution networks with DERs, specifically PV systems; Sect. 4 reviews the mathematical models of the OPF and its variants, solution approaches used in recent works, and machine learning techniques used for solving OPF; Sect. 5 focuses on the recent development of state estimation methods in distribution networks; Future research directions are outlined in Sect. 6, and conclusions are given in Sect. 7.

2 Review methodology

This section explains the methodology used in this work to identify published works on distribution system planning and operation with DERs. Glasziou’s approach is used to conduct the review, following these steps:

  • Identify the research keywords.

  • Assess the relevance of the studies.

  • Assess the quality of the research, i.e., the reputation of the authors and the citations received.

For this purpose, research databases including IEEE Xplore, ScienceDirect, and Scopus were used. The following keywords were used in the title or keyword searches:

  • Distribution network planning, Optimal planning.

  • Distributed resources, renewable-based DG, energy storage.

  • Forecasting, demand forecasting, nowcasting, EV, heat and cooling load, transport behavior.

  • Optimization, algorithms, centralized, decentralized, distributed optimization.

  • State-estimation, observability of the distributed system.

Once the papers were filtered according to the eligibility criteria explained above, i.e., the reputation of the authors, citations received, and relevance of the work, 222 research papers were selected for this review. Of the reviewed papers, 77% are peer-reviewed journal articles, while the remaining 23% come from conference proceedings (the majority of these are IEEE conferences). Moreover, 56% of the papers are published by IEEE, Elsevier publishes 27%, and the rest are published by other publishers (i.e., IET, Springer). Figure 3 shows the distribution of the reviewed papers by publication year. From Fig. 3, it can be noticed that most of the reviewed works were published between 2016 and 2023. Therefore, it covers the significant and recent advancements in this area. Figure 4 shows how the papers were selected for this survey. As illustrated, the coverage of the reviewed works is diverse and extensive. It is worth noting that some country- and location-specific issues may not be covered in this work; nevertheless, this work has global relevance and value.

Table 1 Comparison with recent review works in the literature
Fig. 3
figure 3

Distribution of reviewed papers by publication year

The distribution system with DERs is the main focus of this work. Therefore, the forecasting techniques for PV and load (i.e., various load types) are reviewed, as these are the two most common and essential distribution network components. For brevity, uncertainty modelling in the distribution system with DERs has been excluded from this work. Furthermore, wind and other renewable sources are relatively rare in distribution systems and are therefore excluded from this study.

3 Forecasting taxonomy

This section focuses on solar generation and load forecasting, which are two essential factors for the operation and planning of distribution networks. Figure 5 provides a summary of forecasting techniques for both solar and load forecasting, which can be categorized into five main types. These details are provided in subsection 3.1. The advantages and disadvantages of each type of technique are summarized in Table 2.

3.1 Solar output forecasting

Solar generation data is collected in the form of a time series with various intervals, defined as a sequence of observations measured with respect to time [15]. The outputs of solar generation are volatile and weather dependent. Therefore, short-, medium-, and long-term forecasting is required for the operation and planning of the network. The metrics used for evaluating forecasting models, their corresponding formulas, outputs, and applications are presented in Table 3. From Table 3, it can be seen that MAE is the most basic metric used to evaluate the forecasting error, while indices such as forecasting skill or MASE were developed to handle extensive data and assess forecasts with greater accuracy. The forecast horizon is defined as the span of time into the future for which the PV power outputs are forecasted. Forecasting horizons can be divided into four types based on their duration and specific applications, as presented in Table 4. From the table, it is evident that nowcasting with a horizon of a few seconds to minutes is possible; this is usually applied in the real-time operation of the power system.
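
To make the error metrics in Table 3 concrete, the short sketch below computes MAE, RMSE, and MASE for a PV output forecast. The arrays and the one-step naïve reference used for MASE are illustrative assumptions, not data or definitions taken from the reviewed works.

```python
import numpy as np

# Illustrative half-hourly PV output (kW) and a forecast for the same period;
# the numbers are placeholders, not data from any reviewed study.
actual   = np.array([0.0, 1.2, 3.5, 5.1, 4.8, 2.9, 0.7, 0.0])
forecast = np.array([0.1, 1.0, 3.9, 4.6, 5.0, 2.4, 0.9, 0.0])

mae  = np.mean(np.abs(actual - forecast))            # Mean Absolute Error
rmse = np.sqrt(np.mean((actual - forecast) ** 2))    # Root Mean Square Error

# MASE scales MAE by the in-sample error of a one-step naive forecast
# (here the previous observation is used as the naive reference).
naive_mae = np.mean(np.abs(actual[1:] - actual[:-1]))
mase = mae / naive_mae

print(f"MAE={mae:.3f} kW, RMSE={rmse:.3f} kW, MASE={mase:.3f}")
```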

Fig. 4
figure 4

Schematic diagram of the reviewing methodology

Fig. 5
figure 5

Different forecasting models discussed here

Tables 5 and 6 summarize the forecasting techniques used for solar generation forecasting and load forecasting, respectively. The forecasting horizon, forecasting error, forecasting models, and load types are reported so that readers can use them as a benchmark for their research. The techniques can be broadly categorized into five types as follows:

  • Naïve model (also known as the Persistence model): this technique simply takes the last period’s actual value as the current period’s forecast. This model is usually used in very short- to short-term forecasting. However, its accuracy depends on the stability of the input data; for example, today’s weather conditions should be the same as those of yesterday. If there are sudden changes, the forecasting error becomes large [36].

  • Statistical models:

    • Auto-regressive moving average (ARMA) model: this model was proposed by Moran and Whittle [37]. The model is a combination of the Auto-regressive (AR) and Moving average (MA) models. Given a time series, the current value \(x_t\) can be estimated based on p lagged observations \(x_{t-1}, x_{t-2},..., x_{t-p}\), a white noise term \(\epsilon \), and a constant c. The AR term of order p is defined as:

      $$\begin{aligned} \text {AR}(p): x_t=c+\sum _{i=1}^{p}\varphi _ix_{t-i}+\epsilon _t \end{aligned}$$
      (1)

      where \(\varphi _i\) are the parameters. Next, the moving average (MA) model of order q is defined as follows:

      $$\begin{aligned} \text {MA}(q): x_t= \epsilon _t+\sum _{i=1}^{q}\theta _i\epsilon _{t-i} \end{aligned}$$
      (2)

      where \(\theta _i\) are parameters. Then the ARMA model can be formed:

      $$\begin{aligned}&\text {ARMA}(p,q): \nonumber \\&\quad x_t= c+\sum _{i=1}^{p}\varphi _ix_{t-i}+\epsilon _t+\sum _{i=1}^{q}\theta _i\epsilon _{t-i} \end{aligned}$$
      (3)
      Table 2 Summary of advantages and disadvantages of forecasting models

      Works that used ARMA can be found in [38,39,40]. One drawback of ARMA is that the time series needs to be stationary, that is, its properties do not depend on the time at which the series is observed. An extension of ARMA, called AR integrated MA (ARIMA), can deal with non-stationary data [15]. ARIMA assumes that the correlation between past and present time series values is linear. As real time series data is not always linear, ARIMA models might not be the best approach [28]. A minimal sketch of fitting an AR model by least squares is given after this list.

    • Regression models: this method aims at establishing a relationship between the explanatory (independent) variables and the dependent variable (output). In this model, the dependent variable is forecasted by a linear/non-linear combination of the independent variables. Abuella et al. [24] implemented multiple linear regression for short-term solar power forecasting. The model performed quite well, with a low root mean square error (RMSE) of 0.0137. The authors stated that the model’s performance would be better for nearer forecasting horizons than for farther horizons if the weather is clear. It is also suggested that additional historical data should be added to improve the model’s performance.

  • Physical models: this method uses mathematical equations to describe the physical state and dynamic motion of the atmosphere. The accuracy of this type of model is higher when the weather conditions are stable. As they are based on the motion of the atmosphere, they are highly sensitive to the weather forecast and should be designed specifically for a particular location [29, 30].

  • Machine learning/deep learning (ML/DL) based models: the strength of this type of model is that it can learn from data to enhance its performance. Modern computers allow the training and deployment of large machine learning/deep learning models. ML/DL has been used intensively for pattern recognition, data mining, classification problems, and forecasting. Common models used for forecasting are the Artificial Neural Network (ANN), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN). An illustration of these networks’ architectures can be found in Fig. 6, which also compares the three models based on their input data, complexity, parameter-sharing capabilities, and ability to detect spatial relationships in data. The CNN and RNN are built based on the ANN. A CNN makes use of "convolutional layers" and "pooling layers" for feature extraction, and the extracted features are then passed to an ANN for classification. An RNN learns features of the data through a memory of previous inputs stored internally in the network, as illustrated by the backward connection. Readers are encouraged to check the work in [41,42,43] for details about the mathematical concepts and architectures of those three networks. From the figure, it is evident that only the RNN has the ability to identify the spatial relationship of data.

    Table 3 Common forecasting metrics [16,17,18,19]
    Table 4 Summary of forecasting horizon
    • ANN: The ANN model mimics the activity of the human brain. A simple single-layered ANN consists of three parts: the input layer, the hidden layer, and the output layer. More hidden layers can be added to form the Multilayer Perceptron (MLP). When a nonlinear relationship exists in the data without any prior assumption, the ANN is more suitable than statistical methods. The ANN was used by Leva et al. [44] to conduct 24-hour-ahead forecasting of PV output power. The clear sky model was used for input data validation. The model attained an nRMSE of \(12.5\%-36.9\%\). Some authors used metaheuristics to optimize ANNs and achieved higher accuracy [45, 46].

    • CNN: The CNN belongs to a class of ANNs that employs a mathematical operation called "convolution" to process and extract features from the inputs. Due to this property, the CNN is widely used in image processing tasks. The CNN can also be used for time series analysis [47]. Sun et al. [48] developed the SUNSET model, in which the CNN proved to be an effective structure for correlating images with contemporaneous PV output. This work was further developed for short-term forecasting in [49], achieving a forecasting skill of \(15.7\%\) on the test set and \(16.3\%\) on cloudy days. A CNN-based model called Kloudnet was developed by Pothineni et al. [50] for estimating irradiance fluctuations from sky images and providing a short-term prediction of the irradiance state around the PV power plant. The forecasting horizon was 5–10 min. It was found that an 18-layer ResNet has the best performance, with \(91.3\%\) to \(92.9\%\) testing accuracy.

    • RNN: The RNN belongs to a class of neural networks in which previous outputs can be used as inputs through hidden states. The most widely used RNN is the Long Short-Term Memory network (LSTM), which has a wide range of applications in multiple domains [51]. In terms of PV forecasting, it has been shown that LSTM-based models can outperform ANN, CNN, and other conventional approaches discussed in [20, 52,53,54].

    It has been reported that the ML/DL based networks should be trained with more data to make them robust. Some studies have found that the optimization of inputs and outputs in terms of number and type (which is a challenging combinatorial optimization problem) may improve the forecasting quality [55, 56]. Metaheuristics were used for such applications in [55, 56]. The development of exact algorithms for this problem is worth consideration. Another potential direction of research is to develop integrated optimizers for training the ML/DL models. Hyperparameter optimization to enhance ML/DL network performance is also a challenging problem to solve.

  • Hybrid model: The hybrid model is a combination of at least two techniques. The hybrid model improves forecasting accuracy by taking advantage of each individual technique, and it has shown very competitive results compared to the stand-alone techniques. A hybrid model can be a combination of two different types of neural networks, such as a deep neural network with LSTM, a deep CNN with LSTM [54, 57, 58], a CNN with MLP [59], or an LSTM with an autoencoder [60]. Other combinations include LSTM with a feature selection block [61], LSTM with support vector regression [62], and bidirectional LSTM combined with a deep CNN with wide first-layer kernels [63]. Fuzzy logic can also be combined with neural networks to develop a hybrid model [64, 65]. A combination of deep CNN, wavelet transform, and quantile regression was proposed in [66] to provide hybrid deterministic-probabilistic forecasting. In this model, the deterministic part comprising wavelet transform and deep CNN was employed to decompose and extract features from the data; the processed information is then fed to the probabilistic part of the model. The research in [67] considers five forecasting models, namely Seasonal naive, Seasonal/Non-seasonal ARIMA, Seasonal/Non-seasonal autoregressive integrated moving average exogenous (SARIMAX/ARIMAX), Multiple linear regression, and Support vector regression. The authors proposed a framework based on Particle swarm optimization (PSO) to select the best combination of the five individual models so that more accurate forecasting results can be obtained. Numerical experiments show that this framework outperforms the individual models, and it can be adapted to any resolution or horizon. Since a hybrid model usually consists of at least two different techniques, the required computational effort increases significantly. Optimizing the parameters of hybrid models can also be challenging, since the optimal parameters of one technique can adversely affect the performance of the remaining ones. Therefore, an appropriate trade-off needs to be selected.
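
As a minimal illustration of the statistical models above, the sketch below fits the AR(p) model of Eq. (1) to a synthetic series by ordinary least squares and produces a one-step-ahead forecast. The series, the order p = 2, and the estimation route are assumptions chosen for brevity; in practice, library routines (e.g., from statsmodels) and model-selection criteria would typically be used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stationary series standing in for, e.g., de-trended PV output.
n = 500
series = np.zeros(n)
for t in range(2, n):
    series[t] = 0.6 * series[t - 1] - 0.2 * series[t - 2] + rng.normal(0, 0.1)

p = 2  # AR order (assumed)

# Build the regression x_t = c + sum_i phi_i * x_{t-i} + e_t  (Eq. 1).
X = np.column_stack([np.ones(n - p)] + [series[p - i:n - i] for i in range(1, p + 1)])
y = series[p:]

coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)   # [c, phi_1, ..., phi_p]
c, phi = coeffs[0], coeffs[1:]

# One-step-ahead forecast from the last p observations.
x_next = c + phi @ series[-1:-p - 1:-1]
print("estimated phi:", np.round(phi, 3), " one-step forecast:", round(float(x_next), 3))
```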

Table 5 Summary of solar generation forecasting techniques

3.2 Load forecasting

Load forecasting is important to make decisions on active and reactive power dispatch, load switching, infrastructure development, and network augmentation [68]. Risk-averse operation, demand monitoring, the demand for spinning reserve, and vulnerability to failures can also be identified through the use of load forecasting. Furthermore, load forecasting can also be used for system scheduling, reserve planning, and estimating day-to-day operation and efficiency [69]. For LV and MV networks, load forecasting can be used for network design and control [70,71,72], anomaly detection [73], or producing pseudo-measurements which can be used for state estimation [74]. Load data is also represented in the form of time series; hence the forecasting methods discussed in Sect. 3.1 can also be applied to this problem. The loads in LV networks are volatile due to consumers’ behavior [73]. Therefore, forecasting is usually performed on a short-term horizon. Authors in [75] tried to identify the significant factors affecting the demand of a residential LV network consisting of 128 customers. This work forecasted the next day’s total energy use and peak demand for each phase. A hybrid model was proposed, which combines the best traits of the autoregressive integrated moving average with exogenous variables (ARIMAX) and a neural network. This hybrid model exhibited better performance across all three phases. The LV network considered in [75] was revisited in [76], where an expert system was developed based on correlation clustering, discrete classification, a neural network, and a post-processing procedure to forecast the demand profile. This system aimed at addressing the issues caused by high variance and frequent random shocks. In [77], a multilayer perceptron (MLP) design was proposed, which utilizes a variable selection step via auto-correlation and a model selection algorithm to optimize the performance of the model. The proposed model outperformed the other time series methods considered in that work and was tested with realistic data from French distribution networks. Two non-seasonal ARIMA models and two seasonal sliding window-based models were used in [78] to forecast hourly electricity load at the district meter level. Results show that daily consumption data and aggregated hourly coefficients of daily profiles are enough for obtaining accurate hourly predictions of electricity load. Multinodal load forecasting was performed in [79] using a fuzzy-artmap neural network, which is built based on adaptive resonance theory and fuzzy logic. The proposed approach can deal with loads at substations, transformers, and feeders. An LSTM-based model for short-term household load forecasting was developed and tested on two different time series, namely aggregated load at the station level and data from smart meters at the household level [80]. The model outperformed other methods discussed in the paper. A method combining wavelet clustering analysis and kernel function weighting was introduced in [81]. It was used to forecast transformer load in distribution networks and tested on distribution transformer data in China. The obtained results were very competitive compared with ANN and the frequency component method. Syed et al. [82] proposed a framework combining clustering and deep learning models for short-term forecasting at the distribution transformer level. Clustering was used to enhance the scalability of the approach and its capability to analyze big data.
Distribution transformers were grouped by their energy consumption profile at the aggregated level by the K-Medoid algorithm, and then forecasting models were used within each cluster. Linear regression, a deep neural network, and LSTM were used for forecasting, and each of them was tested with and without clustering. It is reported that the clustering-based version can reduce the training time while demonstrating competitive results in comparison with the non-clustering-based version. The CNN can be combined with LSTM for very short-term up to medium-term load forecasting [33], which yields competitive results compared to LSTM, the radial basis function network, and the extreme gradient boosting algorithm [83]. Good data preparation and feature engineering can improve the accuracy of the forecasting model significantly. The number of lookbacks in LSTM also affects the forecasting performance, as illustrated in [84]. In [85], a hybrid model based on a combination of unidirectional LSTM and bi-directional LSTM was introduced and has shown better results in terms of complexity, training time, and forecasting accuracy compared with other DL models. Authors in [86] proposed a hybrid model which consists of an LSTM autoencoder, a bi-LSTM layer, stacks of LSTM layers, and a fully connected layer. This hybrid model was used for very short-term household load forecasting, and it can deal with nearly real-time data. The number of electric vehicles in distribution network footprints is increasing, with significant impacts on network operation, and the increased penetration of EVs could adversely affect the power system [87]. Therefore, forecasting and profiling of EV loads are required. In [87], two short-term load forecasting methods are compared for EV load: one is based on the driving characteristics of EVs, whereas the other is based on the Support Vector Machine (SVM). The SVM model considers daily loads, the weather conditions of the day, the temperature of the day, the number of vehicles, and the number of charging ports available as inputs. The results showed that both the MAE and RMSE of the SVM model were approximately three times lower than those of the other model. In [88], a generative adversarial network (GAN) was used to generate missing data, and a new gating mechanism called Mogrifier was proposed to enhance the LSTM network for daily EV load forecasting. The proposed Mogrifier-LSTM outperformed other benchmarking models, namely bidirectional LSTM, gated recurrent units (GRU), SVM, ARIMA, and SARIMAX, in terms of MAE and RMSE. Akil et al. [89] proposed a hybrid approach using empirical mode decomposition, Bayesian optimization, and neural networks for residential EV forecasting. Empirical mode decomposition is used to extract the features from the data. The extracted features are fed to a neural network, while Bayesian optimization is used to optimize the structure of the network. The authors also tested LSTM, bidirectional LSTM, and MLP, and it was shown that the hybrid LSTM outperformed the LSTM and MLP. A reinforcement learning technique called Q-learning was introduced in [90] to forecast the hourly EV load. Three scenarios were considered, namely uncoordinated charging (the vehicle is used during the day and charged in the evening), coordinated charging (the vehicle is charged during off-peak hours), and smart charging (the vehicle is charged whenever the battery is not full and the electricity price is low). The Q-learning technique was compared with ANN and RNN.
Although the Q-learning technique can achieve better accuracy than ANN and RNN, it requires 10,000 training epochs, while ANN and RNN require 3,000 and 1,000 epochs, respectively. Both aggregated and disaggregated EV loads for a one-hour forecasting horizon are considered by the authors in [91]. Six different methods were compared, i.e., ANN, RNN, LSTM, bidirectional LSTM, GRU, and stacked auto-encoders. All the models were tested with three different choices of lookback, namely 1, 5, and 15. The time step indicates how much information from the past is used to predict the current output. It was found that the LSTM was the best among all. Lu et al. [92] used a random forest model to forecast the EV load on a 15-minute horizon. Several charging stations were considered, and prediction for single stations and station groups was performed. The random forest technique outperformed the SVM for both case studies; furthermore, the SVM performed poorly for the station group case. Table 6 provides a summary of the models used in each study, their performance, and the type of load used for the study. The table shows that most recent works have considered short-term load forecasting (minutes to a day). Furthermore, single EV charging station data is used for EV load forecasting in most cases due to the lack of available data.
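
The sketch below illustrates the typical LSTM workflow used in the short-term load forecasting studies above: a sliding window of past observations (the "lookback") is mapped to the next value. The synthetic demand profile, network size, lookback of 48 steps, and training settings are assumptions for illustration only, not the configurations of the cited works.

```python
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(1)

# Synthetic half-hourly demand with a daily cycle (kW); placeholder data only.
t = np.arange(30 * 48)
load = 2.0 + np.sin(2 * np.pi * t / 48) + 0.1 * rng.normal(size=t.size)

lookback = 48  # the previous day predicts the next half-hour (assumed)
X = np.stack([load[i:i + lookback] for i in range(load.size - lookback)])
y = load[lookback:]
X = torch.tensor(X[..., None], dtype=torch.float32)   # (samples, lookback, 1 feature)
y = torch.tensor(y[:, None], dtype=torch.float32)     # (samples, 1)

class LSTMForecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, lookback, hidden)
        return self.head(out[:, -1])   # use the last time step only

model = LSTMForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()                  # MAE loss, a common choice in the cited works

for epoch in range(5):                 # a few full-batch epochs for illustration
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

with torch.no_grad():
    nxt = model(torch.tensor(load[-lookback:], dtype=torch.float32).view(1, lookback, 1))
print("one-step-ahead forecast (kW):", float(nxt))
```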

Table 6 Summary of recent works on load forecasting in distribution networks

4 Optimal system planning

In this section, we focus on the optimal power flow and its application in the planning and operation of distribution networks. Mathematical formulations of the techniques and the corresponding computational methods are reviewed and discussed.

Fig. 6
figure 6

Architecture of ANN, CNN, and RNN and their characteristics

4.1 Optimal power flow (OPF)

Optimal Power Flow (OPF) is a very well-known constrained optimization method and one of the most important and well-studied methods in power systems. The OPF is an extension of the economic load dispatch problem, with the addition of physical and technical constraints. The OPF was first introduced in 1962 by Carpentier [93]. The objective function of OPF could include the cost of generating real power, power loss, or consumption cost. The total generated power can also be assumed to be equal to the sum of the load and the losses at the system level. This implies that minimizing the loss at a fixed load is equivalent to minimizing the total generation. Some other specific objectives can also be used in the OPF [94]. Table 7 summarizes the objective functions employed in OPF studies at the distribution level, their characteristics, and the number of objectives optimized in each study. According to the table, only a few studies have considered multi-objective and non-convex functions for OPF in the distribution system.

Table 7 Objective functions used in recent works on OPF

Let N be the set of buses in the network; \(V_i\) be the voltage in complex form at bus i; \(P_{i}\) and \(Q_{i}\) be the active and reactive power at the \(i^{th}\) bus; and \(P_{ij}\) and \(Q_{ij}\) be the real and reactive power flow between buses i and j. Then the basic formulation of the OPF can be expressed as follows,

$$\begin{aligned} \min _{V,P,Q} f(V,P,Q) \end{aligned}$$
(4)

subject to

$$\begin{aligned}{} & {} P_{ij}+jQ_{ij}=V_i(V_i^*-V_j^*)Y_{ij}^*, \quad \forall i,j\in N \end{aligned}$$
(5)
$$\begin{aligned}{} & {} \sum _{j\ne i}P_{ij}=P_i, \quad \forall i\in N \end{aligned}$$
(6)
$$\begin{aligned}{} & {} \sum _{j\ne i}Q_{ij}=Q_i, \quad \forall i\in N \end{aligned}$$
(7)
$$\begin{aligned}{} & {} {\underline{P}}_i\le P_i\le {\overline{P}}_i, \quad \forall i\in N \end{aligned}$$
(8)
$$\begin{aligned}{} & {} {\underline{Q}}_i\le Q_i\le {\overline{Q}}_i, \quad \forall i\in N \end{aligned}$$
(9)
$$\begin{aligned}{} & {} P_{ij}^2+Q_{ij}^2\le {\overline{s}}_{ij}^2, \quad \forall i,j\in N \end{aligned}$$
(10)
$$\begin{aligned}{} & {} {\underline{V}}_i\le V_i\le {\overline{V}}_i, \quad \forall i\in N \end{aligned}$$
(11)

where \(*\) denotes the complex conjugate, \(Y_{ij}\) is the admittance value between buses i and j, \({\underline{P}}_i, {\underline{Q}}_i,{\underline{V}}_i\), \({\overline{P}}_i, {\overline{Q}}_i,{\overline{V}}_i\) are the lower and upper bounds of active power, reactive power, and voltage, respectively, \({\overline{s}}_{ij}\) is the apparent power capacity of the lines, and f denotes the objective function. The constraint in (5) defines the relationship between power flows and voltages, the constraints in (6) and (7) make sure that the active and reactive power in and out of node i are equal to the sum of the flows through the lines connected to bus i, and the constraint in (10) ensures that the complex power flow magnitude remains below the line capacity. Many extended OPF versions have also been developed. The key OPF versions are listed below:

  • Direct Current OPF (DC OPF): A simplified version of OPF in which the power flow constraints are approximated by linear constraints under certain assumptions [123].

  • Alternating Current OPF (AC OPF): The standard OPF method for power networks, based on the physical power flow characteristics of the system. This version can contain various constraints at a fixed time interval [124,125,126].

  • Dynamic OPF: An extended version of the static OPF which can deal with the load in multiple time periods [127].

  • Transient stability-constrained OPF: A variant of OPF in which static and dynamic constraints of the power network are considered simultaneously during the optimization. In this way, the system dynamics are considered in the optimization. However, this version is computationally expensive, and convergence is not guaranteed. If contingencies are involved, they should be selected carefully to avoid an overwhelming computational burden [128, 129].

  • Security-constrained OPF (SC OPF): An extension of OPF that involves contingency constraints. A contingency is an event that may cause one or more generators or lines to be removed from the network [130, 131].

  • Stochastic OPF: In this variant, uncertain parameters such as changes in constraints (constraints with random variables), or multiple scenarios are considered. These parameters can affect the final OPF outputs [132,133,134].

  • Probabilistic OPF: In this variant, probability distribution functions of dependent variables are estimated from the probability distributions of loads and other uncertain factors. The uncertain factors, in this case, do not affect the final OPF results [100, 135].

Fig. 7
figure 7

Graphical comparison of key OPF algorithms

A graphical comparison can be seen in Fig. 7. It can be seen that the DC OPF has the lowest complexity but is also the least robust. In order to make the model more accurate and robust, more complicated constraints are considered in the remaining models. Consequently, their complexity increases significantly.
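
As a concrete illustration of the simplest of these variants, the sketch below solves a small DC OPF with cvxpy: generation cost is minimized subject to linearized nodal balance, angle, and line-flow constraints. The three-bus data, costs, and limits are illustrative assumptions, not a benchmark from the literature.

```python
import numpy as np
import cvxpy as cp

# Illustrative three-bus DC OPF: generators at buses 0 and 1, load at bus 2.
lines = [(0, 1, 0.1), (1, 2, 0.2), (0, 2, 0.2)]   # (from bus, to bus, reactance x in p.u.)
load  = np.array([0.0, 0.0, 1.5])                 # demand per bus (p.u.)
gmax  = np.array([1.0, 1.0, 0.0])                 # generation limits (p.u.)
cost  = np.array([20.0, 30.0, 0.0])               # marginal costs ($/p.u.)
fmax  = 0.8                                       # thermal limit per line (p.u.)

g     = cp.Variable(3, nonneg=True)   # dispatch per bus
theta = cp.Variable(3)                # voltage angles (rad), lossless DC model

flows = [(theta[i] - theta[j]) / x for (i, j, x) in lines]

constraints = [theta[0] == 0, g <= gmax]          # slack angle and capacity limits
for b in range(3):                                # nodal power balance
    net_out = sum(f for f, (i, j, x) in zip(flows, lines) if i == b) \
            - sum(f for f, (i, j, x) in zip(flows, lines) if j == b)
    constraints.append(g[b] - load[b] == net_out)
constraints += [cp.abs(f) <= fmax for f in flows] # line flow limits

prob = cp.Problem(cp.Minimize(cost @ g), constraints)
prob.solve()
print("dispatch (p.u.):", np.round(g.value, 3), " cost ($):", round(prob.value, 2))
```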

4.2 Formulation of OPF and solution method

The power flow equations can be formulated by two equivalent models, i.e., the Bus Injection Model (BIM) [136] or the Branch Flow Model (BFM), the latter also known as the DistFlow model [137]. These two models are based on the undirected and directed graph representations of the power system, respectively, and are proved to be equivalent as reported in [138]. The OPF constraints in (5) are nonlinear due to the relationship between the bus voltages and phasor angles [139]. Additional constraints for more complicated OPF variants, such as load tap changing transformers and shunt capacitors, contain binary variables, and therefore more non-convex elements are introduced. The non-convex feasible space can hinder the performance of many optimization algorithms since it is difficult to find the global optimum, and they are more likely to find local optima [139]. Disconnected regions may also exist within the non-convex feasible space [140]. The OPF is proved to be non-deterministic polynomial-time hard (NP-hard) in both general and tree networks [141, 142].

In order to deal with non-convexity, convex relaxation and convex approximation are usually employed. Convex relaxation aims at enclosing the non-convex feasible space within a convex hull [9]. Once this is achieved, many convex programming solvers can easily find the global optimum. However, the solution obtained from the relaxed version needs to be verified for feasibility, in other words, whether it lies inside the original non-convex feasible space. Convex relaxations are proven to provide lower bounds for the original minimization problem of OPF [143]. Interested readers can find detailed tutorials on formulation and relaxation in [143, 144]. The most common types of relaxation are Second-Order Cone Programming (SOCP) relaxations and Semidefinite Programming (SDP) relaxations. In SOCP relaxations, a less restrictive inequality constraint is usually used to replace an equality constraint describing the line losses in the BFM. These SOCP relaxations are proved to yield the global solution to the original non-convex single-phase OPF problem for radial networks under some additional conditions, e.g., linear separability of the objective, voltage bounds, and limits on voltage angle differences across the lines [144]. The SDP relaxations generalize SOCP and are usually used for meshed networks and three-phase unbalanced network models. The SDP relaxations are formed by introducing linear constraints and a rank constraint on a matrix whose entries are voltage phasor products; this rank constraint is then replaced by a weaker positive semidefinite constraint on the matrix. Convex approximation does not try to construct a convex space over the original feasible space. Instead, this technique tries to approximate certain parts of the feasible space using a convex hull, so that a convex constraint approximates the non-convex power flow constraints. In this approach, the approximated problem may not contain all feasible solutions, or it may contain infeasible solutions. The most common approximation is the DC OPF model, which assumes that the lines are lossless (i.e. the resistance is very small compared to the reactance), that bus angle differences are small, and that nodal voltages are equal. These assumptions are not valid for distribution networks, as the X/R ratio is low and the line resistance cannot be ignored [145]. Authors in [146] showed that the DC OPF solution is never feasible for the original problem for several test networks; medium loading and low network loss conditions are more suitable for using DC OPF. Many modelling approaches have been introduced in previous works depending on the system types. These include multi-phase OPF [147], linearized OPF [148], OPF considering storage devices [149], unbalanced three-phase OPF [150], the alternating direction method of multipliers (ADMM) [151], and uncertainty-based OPF [152].
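
To make the SOCP relaxation of the branch-flow (DistFlow) model tangible, the sketch below relaxes the quadratic equality \(P^2+Q^2=\ell v\) into an inequality for a two-bus radial feeder and solves it with cvxpy. The line parameters, load, and voltage limits are illustrative assumptions only. When the relaxation is tight (the inequality is active at the optimum), the solution is also feasible for the original non-convex problem.

```python
import cvxpy as cp

# Two-bus radial feeder: substation (bus 0) -> load (bus 1); data assumed.
r, x = 0.03, 0.01            # line resistance and reactance (p.u.)
p_load, q_load = 0.8, 0.3    # load at bus 1 (p.u.)
v0 = 1.0                     # squared voltage magnitude at the substation (fixed)

P = cp.Variable()                # active power sent into the line (p.u.)
Q = cp.Variable()                # reactive power sent into the line (p.u.)
l = cp.Variable(nonneg=True)     # squared line-current magnitude
v1 = cp.Variable()               # squared voltage magnitude at bus 1

constraints = [
    P - r * l == p_load,                                  # active power balance at bus 1
    Q - x * l == q_load,                                  # reactive power balance at bus 1
    v1 == v0 - 2 * (r * P + x * Q) + (r**2 + x**2) * l,   # voltage drop along the line
    cp.square(P) + cp.square(Q) <= l * v0,                # relaxation of P^2 + Q^2 = l * v0
    0.95**2 <= v1, v1 <= 1.05**2,                         # voltage magnitude limits
]

prob = cp.Problem(cp.Minimize(r * l), constraints)        # minimize line loss
prob.solve()
print(f"loss = {r * l.value:.4f} p.u., |V1| = {v1.value ** 0.5:.4f} p.u.")
```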

The advanced computational facilities also pave the way for decentralized/distributed optimization, which can benefit from parallel computing. Distributed approaches have certain advantages compared to the centralized approach [153], and the key advantages are illustrated in Fig. 8.

Fig. 8
figure 8

Centralized versus Decentralized optimization

In Fig. 8, ’0’ means lower and ’2’ means higher. The centralized and decentralized optimization methods are compared in terms of scalability, data security, data sharing, computational speed, and infrastructure requirements. It is evident from Fig. 8 that decentralized optimization provides better scalability, data-sharing capability, and computational speed with lower central infrastructure requirements than the centralized approach. However, it relies on parallel computing, which requires additional computational resources at the distributed agents. Let us now recall the definition of a centralized optimization problem, which can be formulated as follows. Let x be the vector of variables, f(x) be the objective function, g(x) be the set of equality constraints, and h(x) be the set of inequality constraints. Then the centralized model can be expressed as:

$$\begin{aligned} \min _{x}~~&f(x) \end{aligned}$$
(12)
$$\begin{aligned} \text {subject to}~~&g(x)=0 \end{aligned}$$
(13)
$$\begin{aligned}&h(x)\le 0 \end{aligned}$$
(14)

Decentralized optimization, or distributed optimization, consists of two steps. In the first step, the original problem is split into several sub-problems. This means that the centralized objective function and constraints are also decomposed into sub-problem-specific objective functions and constraints. Each sub-problem is then assigned to an optimization agent. The second step is called "coordination", where the sub-problems are solved by their own agents. The agents then share their information, and the whole problem is solved when each agent has optimized its sub-problem while satisfying certain mutual predefined conditions. The decentralized optimization technique can be applied to solve the OPF problem. Let the set of sub-problems be \(N=\{1,...,n\}\), and the set of variables belonging to the \(i^{th}\) agent be \(x_i\), where \(x_i\) is a subset of the original variable vector x. The \(i^{th}\) sub-problem has the associated objective, equality constraints, and inequality constraints \(f_i(x_i),g_i(x_i),h_i(x_i)\), respectively. Each agent optimizes its own problem using a copy of \(x_i\), which is defined as \({\tilde{x}}_i\). A detailed tutorial on the distributed formulation can be found in [154]. The decentralized model can be expressed as:

$$\begin{aligned} \min ~~&\sum _{i\in N}f_i({\tilde{x}}_i) \end{aligned}$$
(15)
$$\begin{aligned} \text {subject to}~~&g_i({\tilde{x}}_i)=0 \end{aligned}$$
(16)
$$\begin{aligned}&h_i({\tilde{x}}_i)\le 0 \end{aligned}$$
(17)
$$\begin{aligned}&A[{\tilde{x}}_1^T \dots {\tilde{x}}_n^T]^T=0 \end{aligned}$$
(18)

where the last constraint is the "coordination" constraint expressing the dependencies among different agents (for example, shared objective functions or constraints), and A is a predefined matrix encoding these dependencies.

Considering the data exchange mechanism, distributed algorithms can be classified as either static (offline) or dynamic (online). In the static form, the control agents exchange information with their neighboring agents in each iteration to generate control set-points; a solution is obtained via the agents’ computations before any actions are taken. In the dynamic form, the control set-points are applied to the physical system immediately. The agents observe the control variables, such as nodal voltages, currents, or power flows, then communicate with their neighboring agents and perform the optimization process. In this case, the next iteration is performed according to the response of the system to the previous iteration, in a "feedback-based" manner. Another way to classify distributed algorithms is based on their implementation. If they are deployed with a shared database, they are called "federated" [155]; otherwise, they are referred to as "peer-to-peer" [156]. "Federated" algorithms require large computing and database infrastructure, while "peer-to-peer" ones require higher specifications on the facilities deployed at the distributed computing agents. Table 8 provides an overview of recent works considering both centralized and decentralized approaches. From Table 8, it is evident that primal-dual and dual-ascent algorithms are usually employed in decentralized approaches. It can also be observed that the decentralized algorithms are mainly used in balanced systems, whilst centralized algorithms are broadly applied to unbalanced systems, with some exceptions. These two methods try to solve the dual problem, consisting of a dual objective function and corresponding dual variables and constraints. Solving the dual problem provides a lower bound for the primal problem; this is called weak duality. Strong duality occurs when the dual objective value is equal to that of the primal. The dual problem is convex even if the primal is not, and is thus easier to solve [157]. Dual ascent algorithms use gradient/sub-gradient methods to solve the dual problem. There are certain drawbacks of using duality, including the complexity of evaluating the dual function and its possible non-differentiability. Strong duality also rarely holds in practice.
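
A minimal numerical sketch of the consensus form of distributed optimization described above is given below: two agents with simple quadratic local costs agree on a shared variable through ADMM-style local, coordination, and dual-update steps. The cost coefficients and penalty parameter are arbitrary assumptions used only to show the update pattern behind (15)-(18).

```python
import numpy as np

# Two agents minimize f_i(x) = 0.5 * a_i * (x - b_i)^2 on local copies x_i,
# subject to the coordination (consensus) constraint x_1 = x_2 = z.
a = np.array([1.0, 3.0])   # local cost curvatures (assumed)
b = np.array([2.0, 6.0])   # local cost minimizers (assumed)
rho = 1.0                  # ADMM penalty parameter (assumed)

x = np.zeros(2)            # local copies held by the agents
u = np.zeros(2)            # scaled dual variables ("prices")
z = 0.0                    # shared consensus variable

for k in range(50):
    # Local step: each agent minimizes f_i(x_i) + (rho/2)(x_i - z + u_i)^2,
    # which has a closed form for quadratic f_i.
    x = (a * b + rho * (z - u)) / (a + rho)
    # Coordination step: average the agents' proposals.
    z = np.mean(x + u)
    # Dual update: penalize the remaining disagreement.
    u = u + x - z

# The consensus value approaches argmin sum_i f_i = (a.b)/(sum a) = 5.0.
print("z =", round(z, 4), " x =", np.round(x, 4))
```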

4.3 Machine learning/deep learning OPF

More data from power systems are now available than before due to the increasing number of monitoring and sensing technologies deployed at different levels of the power system [167, 168]. Apart from forecasting, OPF is also a promising area for ML techniques, which aim to solve it in real time. The applications of ML techniques for OPF can be summarized as follows:

  • Predicting OPF outputs: In this approach, the inputs can be the nodal demand or the renewable energy power injection, and a corresponding OPF solution is obtained based on the inputs. A set of samples is created in this way, and the ML models are trained on the samples (a minimal sketch of this workflow is given after this list). A statistical learning framework was developed in [169] to solve the real-time OPF with stochastic generation. The authors utilized the set of active constraints, called bases, as learning features. The active power of the generators and the basic feasible solutions that satisfy technical constraints are used as decision variables. It is shown that ten bases are sufficient regardless of the problem size, and the number of important bases is uncorrelated with the problem size. A control policy was developed and trained on the selected important bases; the model can then be used for real-time control. Authors in [170] considered many supervised learning models for the security-constrained optimal generation dispatch prediction problem, where local features are used as inputs. Pearson’s correlation plot is used to remove uninformative features. The features are then fed into an ANN and a random forest model. An augmented-learning framework that combined ML models and physics-based network equations was introduced in [171]. First, the ML models are used to predict the bus voltage magnitudes and angles. Then, the nodal injection is calculated via the network equations. A deep learning model called DeepOPF is introduced in [172] to solve a DC OPF problem, with post-processing performed to ensure that the solution is feasible. An upgraded version of DeepOPF is proposed in [173]. This method can achieve a computational speedup of two orders of magnitude compared to conventional solvers with minor optimality loss. Another upgraded version is DeepOPF-V [174], which is capable of solving AC OPF and achieves a computational speedup of up to four orders of magnitude compared to conventional solvers. A competitive optimality gap is observed while preserving the feasibility of the solution. DeepOPF-FT is considered the latest version [175]. This version is capable of solving multiple AC OPF problems with flexible topologies and line admittances without retraining, and achieves better results than training one deep neural network for every combination of topology and admittance. DeepOPF-FT shows potential in solving AC OPF with multiple types of DERs. Authors in [176] employed a CNN to build a data-driven method, referred to as ConvOPF-DOP, to predict AC OPF outputs under unknown operating conditions and topologies. The ConvOPF-DOP can be 350 times faster than conventional methods in numerical experiments.

  • Predicting active constraints: Active constraints are defined as the inequality constraints that become equality constraints at certain solutions obtained through the optimization process. The OPF outputs obtained by ML prediction may be infeasible. Therefore, predicting the active constraints is an alternative way to address this issue. In [177], the authors tried to establish the optimal active constraints and developed an algorithm called DiscoverMass to collect and learn active sets. This strategy helps to reduce the dimension of the learning problem. A multilayer, fully connected neural network was developed in [178] for learning the active constraint sets of DC OPF at the optimal point. The predicted results allow real-time tracking of the optimal solution. A single-layer neural network was used in [179] to predict the "umbrella constraints", which are the necessary and sufficient constraints for the feasibility of the OPF problem. It can be observed that the sets of umbrella constraints are more robust to fluctuations in load demand. A scalable, data-driven learning framework was developed in [180, 181] to eliminate inactive constraints or zero-probability events from the AC OPF. In this framework, the joint chance constraints are approximated by a series of single chance constraints, which can cope with the uncertainty caused by the integration of DERs. Support Vector Classification (SVC) is also used to classify active/inactive constraints.

  • Learning control policy: In this approach, ML techniques are utilized to control the injection of power into the system. In [182], a decentralized optimal power flow (OPF) method based on machine learning was developed with multiple DERs taken into account. Data were gathered from metering infrastructure. A centralized OPF was then performed to find out how the DERs best behave by controlling active and reactive power, and decentralized learning was performed on each DER based on a regression method. The method was tested on both single-phase and three-phase systems. It is worth noting that the method performed well at locally reconstructing OPF solutions. The SVM was used in [183] for Volt-VAR control. First, a centralized, multi-period OPF-based formulation, which considers various active control measures and the uncertainty associated with renewable generation and loads, is used with historical data to generate a sequence of optimal DER set-points at different operating conditions. The proposed method is employed in the European LV network for testing and validation. This method was further extended in [184], where active power curtailment, battery storage, and load shifting are considered. Robust, stochastic OPF was investigated in [133, 134]. A multi-stage distributionally robust optimal control problem for OPF was formulated, and a distributed robust model predictive control algorithm was proposed to solve the subproblems at each stage. The framework is robust to inherent sampling errors. Two specific data-based OPF problems, which utilize DC approximations to construct chance-constrained solutions, were proposed for distribution networks and transmission systems; the former considers high solar penetration, while the latter focuses on security issues with high wind injection.

  • Data-driven stability constrained OPF: A data-driven approach, which ensures small-signal stability and N-1 contingency, is proposed in [185]. This work incorporated decision rules into a mixed-integer nonlinear SC OPF formulation. The non-convexity is relaxed using the second-order cone programming technique. A small-signal model of the system is used to gather the data, and a decision tree is applied to obtain the decision rules based on whether the scenarios fulfill the stability margin of the system or not. A multilayer neural network is used in [186] for the transient stability-constrained OPF. This model uses steady-state voltages and nodal active power generation as input features and predicts the critical clearing time constraints. The proposed method does not require dynamic simulation during optimization. The research in [187] uses MLPs to predict the quantile and expectation of the maximum constraint violation; the trained MLPs are then reformulated into mixed-integer linear constraints to build a surrogate model for solving joint chance-constrained OPF. Data augmentation is used to enhance the performance of the method. The proposed method is a potential solution to overcoming the data limitations of the constraint-learning approach.

    Table 8 Summary of recent works related to centralized/decentralized OPF
  • Warm start points prediction: Using warm-start solutions as the starting point for an iterative algorithm (for example, the Newton–Raphson method) can help ensure convergence [188]. In [188], a random forest model is employed to predict the AC OPF solution with demand as input. The predicted results are used as starting points for the solvers, and using the predicted solution as an initial point could outperform the conventional method in terms of computational efficiency. In another recent work [189], the authors proposed a machine learning framework to warm-start the decentralized ADMM algorithm by learning a close-to-optimal primal-dual solution. Numerical experiments showed that by using this approach in tandem with ADMM, solutions of similar quality to the original ADMM can be obtained with significantly reduced computational time.
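
The sketch below mirrors the "predicting OPF outputs" workflow referenced in the first item of this list: (load, dispatch) samples would normally be generated by solving many OPF instances offline, after which a neural network is trained as a surrogate for real-time use. Here a hypothetical affine dispatch rule stands in for the OPF solver, and the data, model size, and split are assumptions for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)

# Step 1: build a training set of (nodal load -> optimal dispatch) pairs.
# In practice each row would come from solving an OPF instance offline;
# here a hypothetical affine "solver" is used purely as a placeholder.
n_samples, n_buses = 2000, 6
loads = rng.uniform(0.2, 1.0, size=(n_samples, n_buses))

def placeholder_opf(load_row):
    total = load_row.sum() * 1.02                               # pretend 2% losses
    return np.array([min(total, 2.5), max(total - 2.5, 0.0)])   # cheap unit first

dispatch = np.apply_along_axis(placeholder_opf, 1, loads)

# Step 2: train the surrogate mapping loads -> dispatch.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000, random_state=0)
surrogate.fit(loads[:1800], dispatch[:1800])

# Step 3: real-time use - predict dispatch for unseen operating points.
pred = surrogate.predict(loads[1800:])
mae = np.mean(np.abs(pred - dispatch[1800:]))
print("surrogate MAE on held-out samples (p.u.):", round(float(mae), 4))
```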

Finally, Fig. 9 summarizes the advantages and disadvantages of using machine learning/deep learning for OPF, and Table 9 provides an overview of ML approaches for OPF. Table 9 also shows that several of the listed studies did not consider DERs in the proposed methods.

Fig. 9
figure 9

Advantages and Disadvantages of machine learning approaches for OPF

5 State estimation in distribution networks

Power system state estimation (SE) can be defined as a process that converts meter readings and other available information into an estimate of the current state of an electric power system. In other words, it is a numerical process that maps data measurements to state variables [198]. The flexibility of power systems at the LV and MV levels has increased as consumers shift from passive buyers to active users by installing DERs. Therefore, system operators need to cope with the increasing number of variables and the probabilistic load profiles of the network. At the same time, they also need to find solutions to improve reliability and maximize the hosting of DERs without violating the operating constraints. Therefore, distribution system state estimation (DSSE) has become an essential tool to monitor and control the distribution network to meet these paradigm shifts [199]. However, unlike in transmission systems, the distribution system's observability is low due to the following factors:

  • The number of measuring instruments in a distribution network is limited due to the cost of installation [200]. To address this problem, pseudo-measurements can be utilized; these measurements have played a vital role in distribution system state estimation [201, 202].

  • The integration of the DERs, such as PV systems, can cause fluctuation in the system state variables, such as voltage fluctuation [203]. The unbalanced loads in the distribution system also lead to difficulties in SE formulation.

  • Since the value of line reactance is low in the distribution system, leading to low X/R ratios, state estimation techniques that are usually used for transmission networks cannot be directly applied to distribution systems [145].

The DSSE has many applications in practice, including fault localization [204, 205], power quality estimation [206, 207], harmonic state estimation [208], short-circuit estimation [209], and outage management [210]. The output of DSSE is the set of state variables (either nodal voltages or currents) that describes the current state of the power system. Weighted Least Square (WLS) and variants of the Kalman Filter (KF) are two popular formulations for the solution of the SE problem. Section 5.1 provides a brief overview of these conventional methods, while Sect. 5.2 focuses on the use of ML/DL techniques for state estimation.

5.1 Conventional DSSE

5.1.1 Weighted least square

Here we revisit the mathematical model of WLS. Assume that x is a vector of n random variables \(x_1,x_2,...,x_n\), that y is another vector of \(m (> n)\) random variables \(y_1,y_2,...,y_m\), and that both are related as

$$\begin{aligned} y=Hx+r \end{aligned}$$
(19)

where H is a known \(m\times n\) matrix and \(r \in {\mathbb {R}}^m\) is a zero-mean random noise vector. In this equation, x represents the state variables that need to be estimated, while y represents the variables whose numerical values can be measured. The equation implies that the measurement vector y is linearly related to the state vector x and is also affected by the noise r. Let \({\hat{x}}\) be the desired estimate so that \({\hat{y}}=H{\hat{x}}\); then the estimation error can be defined as \({\tilde{y}}=y-{\hat{y}}\). The Least Square Estimation (LSE) can then be formed,

$$\begin{aligned} {\hat{x}}=\mathop {\mathrm {argmin}}\limits _{x}{\tilde{y}}^T{\tilde{y}}=\mathop {\mathrm {argmin}}\limits _{x} (y-{\hat{y}})^T(y-{\hat{y}}) \end{aligned}$$
(20)

LSE can be extended to Weighted Least Square (WLS) estimation by adding a real, symmetric weight matrix W defining the importance of some specific measurements,

$$\begin{aligned} {\hat{x}}=\mathop {\mathrm {argmin}}\limits _{x}{\tilde{y}}^T W {\tilde{y}}=\mathop {\mathrm {argmin}}\limits _{x} (y-{\hat{y}})^TW (y-{\hat{y}}) \end{aligned}$$
(21)

The choice of the matrix W varies, but the most common choice is the following diagonal matrix,

$$\begin{aligned} W=\begin{bmatrix} \frac{1}{\sigma _1^2} & \dots & 0\\ \vdots & \ddots & \vdots \\ 0 & \dots & \frac{1}{\sigma _m^2} \end{bmatrix} \end{aligned}$$
(22)

where \(\sigma _1^2, \dots , \sigma _m^2\) are the variances of the measurement errors corresponding to the elements of y. Some modifications of W can be found in [211]. WLS estimation has been used for DSSE for a long time. Baran and Kelley used a WLS-based model to monitor a distribution feeder in real time and also proposed a three-phase model [212]; a similar approach can be found in [213]. Singh et al. [214] assessed various SE algorithms, including WLS, and concluded that WLS works well if the noise characteristics are known, which are usually assumed to be Gaussian. Recent works that utilised WLS, all of which require smart meter data, can be found in [215,216,217]. WLS-based DSSE is a non-convex problem, so local optima are likely to be found, and with limited measurements more local optima may appear in the search space. Gauss-Newton type algorithms are commonly used to solve this problem, but their performance varies with the initialisation.
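To make the formulation above concrete, the following is a minimal Python sketch (not a real feeder model) that builds the diagonal weight matrix of Eq. (22) for a synthetic linear measurement model and recovers the states by solving the WLS normal equations \((H^TWH)\hat{x}=H^TWy\). In practical DSSE the measurement functions are nonlinear in the nodal voltages, so this linear solve would typically sit inside a Gauss-Newton iteration; all dimensions and noise levels below are illustrative assumptions.

# Minimal linear WLS sketch for the model y = Hx + r (placeholder data, not a feeder model)
import numpy as np

rng = np.random.default_rng(0)

n, m = 3, 6                                # n state variables, m (> n) measurements
H = rng.normal(size=(m, n))                # known measurement matrix (assumed)
x_true = rng.normal(size=n)                # "true" states, used only to synthesise data

sigma = np.full(m, 0.05)                   # per-measurement error standard deviations (assumed)
y = H @ x_true + rng.normal(scale=sigma)   # noisy measurements

W = np.diag(1.0 / sigma**2)                # diagonal weight matrix of Eq. (22)

# Solve the normal equations (H^T W H) x_hat = H^T W y
x_hat = np.linalg.solve(H.T @ W @ H, H.T @ W @ y)
print("estimation error:", np.linalg.norm(x_hat - x_true))

The weighting ensures that measurements with smaller error variance pull the estimate more strongly, which is exactly the role of W in Eq. (21).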

5.1.2 Kalman filter

The Kalman filter (KF) is a well-known approach for estimating power network states. The KF is a two-stage process: the first "prediction" stage calculates the current state values from the previous ones by means of a predefined process model, and the second "filtering" stage corrects the predicted state by taking into account the available measurements and the accuracies of both the process model and the measurements. The accuracy of the KF is influenced by the process noise covariance matrix and the measurement noise covariance matrix. Readers can refer to [218] for the detailed mathematical model. The use of the KF and its variants in state estimation is thoroughly discussed in [219], in which the author points out that model validation using different events is necessary to find the parameters that make the filtering algorithms most robust. Compared to other types of networks, works on LV distribution networks are still few in number. Some recent works can be found in [220,221,222,223]; these still rely on data from PMUs, and it is pointed out that the filter can produce unreliable results under strong system nonlinearities [222]. Napolitano et al. [224] developed a three-phase SE using low-rate smart meter measurements for an LV distribution network with PV plants; as time delays grew larger, the performance of the SE decreased significantly. Mitja et al. [225] developed a robust and fast SE based on the Extended Kalman Filter for the case of unreliable measurements, and the proposed approach outperformed the WLS algorithm. It can be concluded that future work should aim at investigating and optimizing the prediction covariance matrix, particularly for unpredictable energy production from distributed generators or load variations, and that more testing scenarios should be used to evaluate robustness.
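As an illustration of the predict/correct cycle described above, the sketch below implements one step of a plain linear Kalman filter. The process model F, measurement model H, and the covariance matrices Q and R are assumed placeholders; a practical DSSE implementation would normally use an Extended or Unscented variant to handle the nonlinear power-flow measurement functions.

# Minimal linear Kalman filter sketch: one prediction + filtering step per measurement
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    # Prediction stage: propagate state and covariance through the process model
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Filtering stage: correct the prediction with the new measurement z
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Illustrative use: a 2-state random-walk process observed through one noisy sensor
F = np.eye(2)                      # process model (placeholder)
H = np.array([[1.0, 0.0]])         # measurement model (placeholder)
Q = 1e-4 * np.eye(2)               # process noise covariance (assumed)
R = np.array([[1e-2]])             # measurement noise covariance (assumed)
x, P = np.zeros(2), np.eye(2)      # initial state and covariance
x, P = kalman_step(x, P, z=np.array([1.02]), F=F, H=H, Q=Q, R=R)

The relative sizes of Q and R govern how strongly the filter trusts the process model versus the incoming measurements, which is why tuning these covariance matrices is repeatedly highlighted in the literature above.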

5.2 Machine learning/deep learning approaches in DSSE

Machine learning algorithms have also been used in many previous studies to address DSSE issues. An ANN was used in [226] to create more accurate pseudo-measurements that follow a Gaussian distribution (i.e., derived from smart meter voltage measurements). The pseudo-measurements generated by the ANN were used for state estimation and resulted in higher accuracy. This approach was further investigated in [227], and it is concluded that the addition or removal of pseudo-measurements depends on the cases considered [228, 229]. Zamzam et al. [230] used a shallow ANN to learn the relationship between the measurements and the states: historical load and generation data are fed to the ANN to estimate the states, which are then refined by the Gauss-Newton algorithm. The ANN in [231] was trained using power flows obtained via a quadratic-based backward/forward sweep algorithm; this method requires no PMU data, but a network without DERs was used, so further investigation of the method is necessary. A multi-scenario framework using an ANN for networks with many DERs and few measurements was proposed in [232]. In this work, the ANN was trained over 2200 scenarios and five topology configurations, and the framework outperformed the previous methods mentioned in the literature. However, the case of an unbalanced grid is yet to be investigated, and the amount of training data required would increase significantly with dynamic topologies, such as transformers with taps. The ANN can also be used for decentralized state estimation [233]. An observable distribution system was investigated in [234], where a Bayesian approach was used to learn the probability distributions of active and reactive power injections from smart meter data; these distributions generate the samples used to train a deep neural network. Stochastic optimization and regularization methods were used to avoid overfitting. The approach is computationally efficient and robust against bad data and against variations of net consumption distributions under high penetration of DERs, which demonstrates its suitability for real-time operation. An autoencoder constructed from ANNs was applied to develop a three-phase state estimator for unbalanced distribution systems [235]. The method was tested on both medium-voltage and low-voltage networks. The results show that an autoencoder trained for each network feeder outperformed one trained for the whole system, and better results were obtained for autoencoders trained per network phase rather than for all three phases. The work in [236] proposed a physics-aware neural network whose architecture can be simplified by exploiting the separability of the estimation problem, so that the DSSE is partitioned into smaller problems around micro-PMU measurements, which are more accurate than any other measurements in the network. A graph convolutional NN [237, 238] was used to process data over graphs, and a greedy algorithm was used to determine where to place the micro-PMUs. With this approach, overfitting was reduced by removing unnecessary neural network connections, and it is worth noting that the method showed robustness against bad data. This work was revisited in [239], in which the physical connections of the network were further taken into consideration.
The layers of the neural networks were built based on those physical connections, resulting in a reduction in estimation time while preserving accuracy compared to the former work. A similar physics-aware framework was used in [240], which consists of two modules: an RNN predicts the state variables as pseudo-measurements, and these are then fed into a prox-linear neural network to perform state estimation. The prox-linear neural network was constructed by unrolling a proximal-linear solver [241]. Transfer learning was investigated for the first time in [242] to improve the performance of ANNs under topology changes, with training data obtained from the SIMONA simulation framework [243]. The approach shows promising potential as a flexible framework for different grids, especially distribution networks with low observability. Overall, the ANN is the most frequently used method in recent works on LV networks, either with or without DERs; however, the use of other models such as CNN and LSTM is worth considering. Similar to the discussion in Sect. 3.1, hyper-parameter tuning should also be considered in future research. Due to the stochastic behavior of the distribution network and the lack of measurements, advanced learning frameworks such as reinforcement learning are also a potential topic for further investigation. Table 10 summarizes the ML-based DSSE approaches discussed in this section; it shows that most of the proposed methods are applied and verified on unbalanced networks. Furthermore, all of these methods are tested on IEEE test systems, and many of the studies did not consider DERs. Based on the reviewed research, the key points of DSSE are summarized in Fig. 10, from which it can be seen that a robust method is required for real-time application in large systems.
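As a simple illustration of the ANN-based estimators discussed above, which learn a direct mapping from available measurements to state variables (in the spirit of [230]), the following PyTorch sketch trains a shallow feed-forward network on synthetic (measurement, state) pairs. The dimensions, layer width, and random training data are placeholders only; a real implementation would train on power-flow results or historical data and could refine the network output with a Gauss-Newton step.

# Sketch of a shallow feed-forward network mapping m measurements to n state variables
import torch
import torch.nn as nn

m_meas, n_state, hidden = 64, 32, 128    # placeholder dimensions

model = nn.Sequential(
    nn.Linear(m_meas, hidden),
    nn.ReLU(),
    nn.Linear(hidden, n_state),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Synthetic stand-in for (measurement, state) pairs, e.g. generated from power-flow simulations
y_meas = torch.randn(1000, m_meas)
x_state = torch.randn(1000, n_state)

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(y_meas), x_state)
    loss.backward()
    optimizer.step()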

Table 10 Machine learning approaches for DSSE
Fig. 10
figure 10

Summary of key points of DSSE for distribution networks

6 Future research directions

Modelling and monitoring distribution networks have become more challenging than ever due to the integration of DERs. Apart from their distinct dynamic characteristics, the increasing penetration of DERs, for example PV, introduces several technical challenges such as voltage violations, network constraint violations, reliability concerns, and harmonic issues. Modelling these issues requires several stochastic factors to be taken into account; one example is daily solar radiation, which demands accurate forecasting. As seen from the comprehensive reviews in Sects. 3–5, modern computational facilities pave the way for more advanced and robust techniques, such as distributed optimization and machine learning models, to be utilized. Based on the critically reviewed research, the following areas have been identified as being of interest for future research:

  • Developing new optimization algorithms for network planning: Several convex programming and relaxation techniques, such as SDP and SOCP, are widely used in various power system problems. However, introducing additional terms into the objective function or adding more realistic constraints may make the whole optimization problem non-convex, and the SDP or SOCP solutions may then not be feasible. Therefore, algorithms for non-convex problems need to be developed or improved. Scalable non-convex programming techniques, either centralized or decentralized, are yet to be widely applied. Robust optimization algorithms that handle the supply- and demand-side uncertainties present in distribution networks are also required for network planning. Another potential research direction is combining machine learning with optimization algorithms to reduce computational time, as illustrated in [189]. Other practical aspects, such as frequency stability, should also be incorporated into the optimization model.

  • Developing more interpretable algorithms for machine learning approaches: Research using neural networks often relies on default optimizers for training, usually ADAM or Levenberg-Marquardt. Developing new training algorithms is a potential research area, since it can enhance the training process. Optimizing hyper-parameters, such as the number of layers or the number of hidden nodes, is also a challenging problem that needs further investigation. Moreover, the machine learning articles reviewed by this survey overlook feature selection and sample selection methods. These methods should be considered in future works, since they reduce computational expense and enhance the accuracy of machine learning models. Feature selection and sample selection can be formulated as optimization problems, as shown in [247, 248], so developing efficient algorithms for tackling those problems could be another potential research direction.

  • Use of other machine learning techniques for DSSE: The ANN is the most common approach for DSSE. The use of other models, such as CNN, RNN, or reinforcement learning, which perform competitively on time-varying data, is worth investigating. The use of GANs for creating pseudo-measurements can also be considered. Graph neural networks [249], a modern architecture, should also be considered in future work, as electrical networks and their associated data have connections that can naturally be modelled as graphs. Few-shot learning is also a potential solution to the lack of training data.

  • Developing realistic test cases: The lack of new benchmark datasets and case studies prevents the development of new, practical models and tools. For example, developing machine learning approaches requires years of data to train the models correctly. In addition, new representative test cases with DERs need to be developed for better planning and operation with emerging technologies such as electric vehicles, electrified transportation, and others.

7 Conclusions

Distribution networks have many challenging planning and operation issues due to their dependency on consumer behaviors. With the proliferation of DERs, these challenges have become greater than before. The operation and planning of a distribution network therefore require new techniques, tools, and fast, accurate computational algorithms. This paper provides an overview of the latest work on distribution system planning and operation to help decision-makers and early-career researchers gain insight into recent findings. The study starts with the distribution network’s paradigm shift. It then reviews source and load forecasting methods and identifies the gaps in the reviewed techniques. Next, several optimization approaches, including centralized and decentralized optimization, are reviewed and discussed along with their corresponding solvers and algorithms. Machine learning approaches for OPF and the tools used to develop these methods are also reviewed. The application of distribution system state estimation is then presented, along with the corresponding tools and techniques. Finally, potential research directions, open questions, and current research gaps are identified based on the preceding discussion. This survey did not consider optimization under uncertainty, methods to model uncertainty, or problems involving interactions with electric vehicles; we leave those for future studies.