1 Introduction

DC microgrids have attracted increasing attention in recent years for two reasons: 1) the growing variety of distributed generators (DGs) is dominated by DC sources, such as photovoltaics (PVs), fuel cells (FCs) and energy storage systems (ESSs); and 2) DC loads, such as electric vehicles (EVs) and DC relays, are expected to become increasingly common in future smart grids [1,2,3]. Generally, a DC microgrid has several advantages over an AC microgrid: 1) it has lower power conversion losses because there are no AC/DC converters; 2) it avoids some problems that often occur in an AC microgrid, such as harmonics and synchronization; and 3) it can offer improved power quality and reliability, because reactive power compensation is not needed from the power supply [4,5,6]. Therefore, much current research focuses on the control and management of DC microgrids.

Droop control is generally accepted as an effective solution for DC microgrids, and such applications of droop control have been investigated in many papers [7, 8]. However, it is hard to achieve predictive, accurate load sharing and voltage regulation by using the droop control without communication, and moreover, both line impedances and output impedances of DGs will affect the accuracy of load sharing [9, 10]. Therefore, hierarchical control schemes which consist of primary and secondary control have been proposed and utilized to solve these problems.

The structures of hierarchical control schemes can be centralized or distributed [11,12,13]. A typical centralized control scheme was proposed in [14]; it collected global voltage and current information by low bandwidth communication and realized voltage restoration and enhanced current sharing accuracy. It is well known that a centralized control scheme requires a complicated communication network to collect global operating conditions and a powerful central controller to process the huge amount of information. Thus, centralized schemes are costly to implement and susceptible to single-point failures. Furthermore, taking the uncertainty of intermittent DGs into consideration, a generation fluctuation may result in unintentional structural changes in current flows, which will further increase the burden on centralized schemes [15, 16]. Advantages of a distributed scheme include the ability to survive unexpected disturbances and decentralized data updating, which leads to efficient information sharing and eventually a faster decision-making process and operation [8, 17,18,19].

Much research has focused on improving distributed control of AC or DC microgrids. Reference [20] proposed a distributed cooperative control strategy based on a multi-agent system (MAS) that involves primary and secondary frequency control and multi-stage load shedding to achieve cooperative frequency recovery. Reference [21] used input-output feedback linearization to convert secondary voltage control into a linear second-order tracker synchronization problem. A pinning-based scheme for microgrids was proposed in [22] to obviate the requirements for a central controller and a complex communication topology, and to achieve control under both fixed and uncertain communication topologies. With regard to DC microgrids, [23,24,25] proposed two kinds of distributed control schemes, which discover global current information and adjust the droop control gains using a distributed consensus algorithm, and achieved accurate load distribution in DC microgrids.

Hence, a distributed control scheme is adopted as a feasible solution for DC microgrids in this study. Another problem that needs to be considered in a DC microgrid is how to coordinate two inherently conflicting objectives: 1) voltage restoration in the DC buses; and 2) accurate current or load sharing among the DGs. For a DC microgrid, the average output current of the DGs reflects the load fluctuation of the whole system, and also reflects the voltage deviation caused by a load change. Thus, the average output current of the DGs is selected as the control input to simultaneously realize voltage adjustment and proportional current sharing.

To address the above problems, reinforcement learning (RL) has been introduced into the distributed control scheme as a possible solution [26]. RL is a simple iterative algorithm that learns to act in an optimal way through a reward signal that evaluates the performance of previously obtained solutions. Over the past few years, several multi-agent based RL algorithms have been proposed and applied to practical problems [27, 28]. The RL algorithm has several major advantages. First, it is an online learning algorithm that interacts directly with the environment, and it does not require an accurate model of the environment. Second, it only needs a reward function to evaluate the quality of a solution instead of complicated mathematical operations. Finally, it has the ability to escape local minima because it performs stochastic optimization [29,30,31,32].

Inspired by distributed control and the RL algorithm, a novel distributed RL (DRL) approach for a DC microgrid is proposed and investigated in this study. It can achieve the same control performances as a centralized control scheme while overcoming some of its problems. It also can coordinate voltage restoration and load sharing during secondary control, and implement accurate current sharing while recovering the DC voltages. DRL with reward feedback and applying the distributed consensus method through pinning control are the distinguishing features of this work. More specifically, the main contributions of this study are as follows:

  1) Proposal of a new DRL method, which combines RL and the distributed consensus method together to achieve an optimal solution for a DC microgrid.

  2) Proposal of an evaluation method using a global reward discovered locally, which can be used to evaluate the control performance of DRL considering both equal proportional current sharing and cooperative voltage restoration for an islanded DC microgrid.

  3) Proposal of a distributed consensus method through pinning control, which can be applied to discover global information or to achieve synchronization by seeking a pinning consensus value. Additionally, the corresponding adaptive updating method can adapt to changes of communication topology, including both exchanging coefficients and updating the identity of participating agents.

The rest of this paper is organized as follows: Section 2 presents a brief introduction to hierarchical control of a DC microgrid and the distributed consensus method through pinning control; Section 3 elaborates on the proposed DRL, including its reward function, the distributed consensus method through pinning control, and its detailed control process; the proposed DRL is simulated and investigated with a typical system in Section 4; and finally, conclusions are presented in Section 5.

2 Preliminary

2.1 Hierarchical cooperative control of DC microgrid

Typically, a DC microgrid has a two-layered hierarchical control structure, comprising the primary control layer and the secondary control layer. Primary control, which is usually implemented by droop control, aims at quick response to maintain the stability of a DC microgrid. Secondary control, in contrast, has two control objectives: 1) to restore voltage and 2) to share the load in a suitable proportion.

In contrast to droop control in an AC microgrid, droop control in a DC microgrid is based on the predefined relationship between voltage and current as follows:

$$\left\{ \begin{array}{l} U_{i} = U_{ref,i} - m_{i} I_{i} \\ U_{ref} = U_{N} - \frac{\lambda_{V}}{2} \\ m = \frac{\lambda_{V}}{I_{\max}} \end{array} \right.$$
(1)

where U ref,i is the voltage reference of the i th DG; m i is the droop control gain of the i th DG; I i is the measured value of the output current of the i th DG; λ V indicates the maximum voltage deviation; U N is the rated voltage; and I max is the maximum current of the droop controller.
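As a minimal numerical sketch of (1), the droop characteristic can be evaluated as follows. The function name `droop_voltage` is hypothetical, and the sample values are illustrative assumptions (600 V corresponds to the 0.6 kV rating used in Section 4; the 30 V deviation and 100 A limit are not taken from this paper).

```python
def droop_voltage(u_n, lambda_v, i_max, i_meas):
    """Evaluate the droop law in (1) for one DG.

    u_n: rated voltage U_N
    lambda_v: maximum voltage deviation
    i_max: maximum current of the droop controller
    i_meas: measured output current I_i
    Returns the tuple (U_ref, m, U_i).
    """
    u_ref = u_n - lambda_v / 2.0   # voltage reference, second line of (1)
    m = lambda_v / i_max           # droop gain, third line of (1)
    u = u_ref - m * i_meas         # output voltage, first line of (1)
    return u_ref, m, u


# Illustrative values only (assumed, not taken from the paper's tables).
u_ref, m, u = droop_voltage(u_n=600.0, lambda_v=30.0, i_max=100.0, i_meas=50.0)
print(u_ref, m, u)
```

A larger measured current lowers the output voltage linearly, which is the steady-state deviation that secondary control later corrects.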

However, fully decentralized droop control may cause steady state deviations if there is no communication among droop-controlled DGs. To address this problem, secondary control is utilized to improve voltage restoration in DC buses and realize predictive load sharing in a DC microgrid. It is accomplished by controlling the voltage reference U ref in (1) as follows:

$$\left\{ \begin{array}{l} U_{i} = (U_{ref,i}^{{}} + \Delta U_{i} ) - m_{i}^{{}} I_{i} \hfill \\ \Delta U_{i} = \Delta U_{C,i}^{{}} + \Delta U_{V,i}^{{}} \hfill \\ \end{array} \right.$$
(2)

where the adjustment of U ref can control both voltage and current. Thus, the control change of the voltage reference ΔU i is divided into the current adjustment term ΔU C,i and the voltage adjustment term ΔU V,i ; the control of ΔU C,i aims at realizing proportional power dispatch, and ΔU V,i aims at correcting the voltage deviation [19, 20].
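The effect of the secondary correction in (2) can be illustrated with a short sketch. The function name `secondary_voltage` and all numbers are assumptions chosen for illustration; how ΔU C,i and ΔU V,i are actually computed is the subject of Section 3.

```python
def secondary_voltage(u_ref, m, i_meas, du_c, du_v):
    """Apply the secondary correction in (2) to the droop law of one DG."""
    du = du_c + du_v                   # total reference shift, second line of (2)
    return (u_ref + du) - m * i_meas   # corrected output voltage, first line of (2)


# Assumed operating point: droop alone would give 585 - 0.3 * 50 = 570 V.
u_corrected = secondary_voltage(u_ref=585.0, m=0.3, i_meas=50.0, du_c=2.0, du_v=8.0)
print(u_corrected)
```

Both adjustment terms act on the same reference, which is why current sharing and voltage restoration have to be coordinated rather than tuned independently.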

2.2 Distributed consensus method through pinning control

2.2.1 Pinning-based distributed consensus method

Assume that r i denotes the state variable of agent i. The distributed consensus method through pinning control can be expressed in a discrete form as follows:

$$r_{i}^{[k + 1]} (t) = \sum\limits_{{j \in N_{i} }} {\alpha_{ij} \left\{ {r_{j}^{[k]} (t) - r_{i}^{[k]} (t)} \right\}} - d_{i} \left\{ {r_{i}^{[k]} (t) - r_{p}^{\ast} } \right\}$$
(3)

where i = 1, 2, …, n; j = 1, 2, …, n; n indicates the total number of participating agents; k is the discrete-time index; \(r_i^{[k+1]}\) is the state of agent i at iteration k + 1, which corresponds to the local information defined in this study; \(r_i^{[k]}, r_j^{[k]}\) are respectively the states of agents i and j at iteration k; and α ij is the connectivity coefficient between agents i and j. If agents i and j are connected through a communication line, α ij  ≠ 0; otherwise, α ij  = 0. N i expresses the neighboring agent set of the i th agent; d i is the pinning gain of the i th agent, d i  ≥ 0 with d i  = 0 when there is no pinning control over agent i; and \(r_p^{\ast}\) is the preset pinning consensus value of the consensus method.

Generally, the method in (3) can be used to control all agents to the preset pinning consensus value using the connectivity coefficients among them. When d i  = 0, the method in (3) also can be used to discover global information as for other average consensus methods [19, 20, 22, 26].
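A minimal sketch of the pinning-based iteration is given below. It assumes that the right-hand side of (3) acts as an increment added to the current state, that the coupling term is the difference between a neighbor's state and the agent's own state, and that the graph is a hypothetical 4-agent ring with a single pinned agent; none of these numbers come from this paper.

```python
def pinning_consensus_step(r, alpha, d, r_pin):
    """One synchronous update of all agent states.

    r: list of current states r_i[k]
    alpha: dict mapping agent i -> {neighbor j: alpha_ij}
    d: list of pinning gains d_i (0 means agent i is not pinned)
    r_pin: preset pinning consensus value r_p*
    ASSUMPTION: the right-hand side of (3) is added to r_i[k] as an increment.
    """
    new_r = []
    for i, r_i in enumerate(r):
        coupling = sum(a_ij * (r[j] - r_i) for j, a_ij in alpha[i].items())
        pinning = d[i] * (r_i - r_pin)
        new_r.append(r_i + coupling - pinning)
    return new_r


# Hypothetical 4-agent ring; only agent 0 knows the pinning value.
alpha = {0: {1: 0.25, 3: 0.25}, 1: {0: 0.25, 2: 0.25},
         2: {1: 0.25, 3: 0.25}, 3: {2: 0.25, 0: 0.25}}
d = [0.5, 0.0, 0.0, 0.0]
r = [0.2, 0.9, 0.4, 0.7]
for _ in range(400):
    r = pinning_consensus_step(r, alpha, d, r_pin=1.0)
print([round(x, 6) for x in r])
```

With at least one nonzero pinning gain on a connected graph, every state is drawn to the pinning value; with all gains set to zero and symmetric coefficients, the same update preserves the sum of the states and therefore performs average consensus.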

For convenient analysis, define the control error as:

$$e_{i}^{[k]} = r_{i}^{[k]} - r_{p}^{\ast}$$
(4)

Then, the distributed consensus method based on pinning described in (3) can be rewritten in terms of (4) as follows:

$$e_{i}^{[k + 1]} (t) = \sum\limits_{{j \in N_{i} }} {\alpha_{ij} \left\{ {e_{j}^{[k]} (t) - e_{i}^{[k]} (t)} \right\}} - d_{i} e_{i}^{[k]} (t)$$
(5)

Hence, the consensus process of the whole DC microgrid based on pinning can be illustrated as

$$\left\{ \begin{array}{l} \mathbf{E}^{[k+1]} (t) = \left[ \mathbf{A} - (\mathbf{D} \otimes \mathbf{I}) \right] \mathbf{E}^{[k]} (t) \\ \mathbf{A} = [\alpha_{ij} ] \\ \mathbf{D} = [d_{i} ] \end{array} \right.$$
(6)

where E [k] is the information matrix; A is the communication updating matrix, which is determined by the communication topology; D is the pinning matrix; I is the identity matrix; and “\(\otimes\)” denotes the Kronecker product.

2.2.2 Adaptive updating method

To adapt to communication link changes, a connectivity coefficient updating method is proposed in (7). Here, Δ(t) expresses the communication topology changes in a DC microgrid; δ is the consensus constant, whose value affects the convergence characteristics of the two-layer algorithm, 0 < δ < 2; n i,Δ(t) and n j,Δ(t) respectively indicate the number of agents in the neighborhood of agents i and j according to the communication topology. Both n i,Δ(t) and n j,Δ(t) are local information that can be detected by the corresponding agents, so (7) can adapt locally to communication link changes.

$$\alpha_{ij} = \left\{\begin{array}{ll} \frac{\delta}{{n_{i,\Delta (t)} + n_{j,\Delta (t)} }}& j \in N_{i,\Delta(t)}\\ 1 - \sum\limits_{{j \in N_{i,\Delta (t)} }} \frac{\delta}{{n_{i,\Delta (t)} + n_{j,\Delta (t)} }}& j = i \\ 0 & {\text{otherwise}}\end{array} \right.$$
(7)
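The update rule in (7) can be sketched as a small helper that rebuilds the coefficients whenever the neighbor sets change. The function name `connectivity_coefficients`, the default δ = 1 and the 3-agent example topology are assumptions for illustration only.

```python
def connectivity_coefficients(neighbors, delta=1.0):
    """Build alpha_ij from (7).

    neighbors: dict mapping agent i -> set of neighbor agents under the
               current communication topology
    delta: consensus constant, 0 < delta < 2
    Returns a nested dict alpha[i][j] that includes the self-weight alpha[i][i];
    pairs that are absent from the dict have coefficient 0.
    """
    if not 0.0 < delta < 2.0:
        raise ValueError("delta must satisfy 0 < delta < 2")
    alpha = {}
    for i, n_i in neighbors.items():
        # Off-diagonal entries use only the two local neighbor counts.
        row = {j: delta / (len(n_i) + len(neighbors[j])) for j in n_i}
        row[i] = 1.0 - sum(row.values())   # self-weight so that each row sums to 1
        alpha[i] = row
    return alpha


# Hypothetical 3-agent chain 0 - 1 - 2, then a new link 0 - 2 switches on.
before = connectivity_coefficients({0: {1}, 1: {0, 2}, 2: {1}})
after = connectivity_coefficients({0: {1, 2}, 1: {0, 2}, 2: {0, 1}})
print(before[0], after[0])
```

Because each coefficient depends only on the neighbor counts of the two agents involved, the resulting weights are symmetric, and only agents whose neighborhoods changed need to recompute their rows.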

Additionally, to adapt to changes in the number of agents and thereby meet the plug-and-play operation requirements for a DC microgrid, an agent identity updating method is proposed. If (3) is initialized with the predefined index i and all d i are set to 0, it converges to an average value n a,i that depends on the total number of agents. Thus, the total number of agents can be determined by

$$\left\{ \begin{array}{l} n_{a,i} = i / n_{\Delta (t)} \\ n_{\Delta (t)} = i / n_{a,i} = i / \left[ i / n_{\Delta (t)} \right] \end{array} \right.$$
(8)

where n a,i is the average value discovered by agent i, and n Δ(t) is the total number of participating agents in the DC microgrid, which will be adaptively adjusted when the number changes.

2.2.3 Stability proof

To verify the stability of the proposed information discovery method, a positive definite Lyapunov function L is defined, and its increment ΔL along the error dynamics in (5) is derived as follows:

$$\begin{aligned} \mathbf{L} & = \frac{1}{2}\sum\limits_{i = 1}^{n} (e_{i}^{[k]})^{T} e_{i}^{[k]} \\ \Delta \mathbf{L} & = \sum\limits_{i = 1}^{n} (e_{i}^{[k]})^{T} e_{i}^{[k + 1]} \\ & = \sum\limits_{i = 1}^{n} (e_{i}^{[k]})^{T} \left[ \sum\limits_{j \in N_{i}} \alpha_{ij} (e_{j}^{[k]} - e_{i}^{[k]}) - d_{i} e_{i}^{[k]} \right] \\ & \le \sum\limits_{i = 1}^{n} (e_{i}^{[k]})^{T} \sum\limits_{j \in N_{i}} \alpha_{ij} (e_{j}^{[k]} - e_{i}^{[k]}) - \sum\limits_{i = 1}^{n} d_{i} \left\| e_{i}^{[k]} \right\|^{2} \\ & \le \sum\limits_{i = 1}^{n} \left\| e_{i}^{[k]} \right\| \sum\limits_{j \in N_{i}} \alpha_{ij} (\left\| e_{j}^{[k]} \right\| + \left\| e_{i}^{[k]} \right\|) - \sum\limits_{i = 1}^{n} d_{i} \left\| e_{i}^{[k]} \right\|^{2} \\ & = \left| \mathbf{E}^{[k]} \right|^{T} (\mathbf{A} - \mathbf{D} \otimes \mathbf{I}) \left| \mathbf{E}^{[k]} \right| \end{aligned}$$
(9)

Therefore, to ensure the stability of the distributed consensus method, the stability condition can be finally expressed as

$$\mathbf{A} - \mathbf{D} \otimes \mathbf{I} \le 0 \Rightarrow \Delta \mathbf{L} \le 0$$
(10)

where ΔL ≤ 0 implies that the stability of the proposed consensus method can be ensured and consensus will be reached asymptotically.

3 Distributed reinforcement learning control (DRLC) for a DC microgrid

In this study, a DC microgrid is considered as an MAS, which includes distributed generator agents (DGAs), energy storage system agents (ESSAs) and load agents (LAs). By exploiting specific characteristics of agents in an MAS, such as autonomy, sociality, proactivity, and adaptability, the agents can provide greater functionality than traditional controls and cater to the special needs and difficulties of the proposed control [19,20,21,22]. The proposed DRL scheme can immediately take action in the event of disturbances and realize distributed decision-making to achieve cooperative recovery.

Furthermore, DRL for agents, which is a simple iterative algorithm by which optimal actions are learnt through rewards gained by exploring the unknown environment, can be applied to improve the control characteristics. As illustrated in Fig. 1, during the process of DRL, the solution is updated according to its performance as evaluated by the corresponding reward signal. Hence, each agent can optimize its control solution for the associated generator, storage, or load, while some elements of its solution can be communicated to other agents to arrive at a shared solution.

Fig. 1 Fundamental control structure of DRL

To implement such a DRL scheme, two related problems, namely defining the local reward function and achieving distributed consensus based on pinning, are described in detail below.

3.1 Definition of reward for DRL

For DRL, the main challenge is finding the global reward of the entire system. It is hard to obtain the global reward directly under a distributed communication framework in which each agent can exchange information only with its neighboring agents. Thus, a local reward function is designed to evaluate the performance of a candidate solution.

Firstly, to take into account the equal proportional current sharing in the DC microgrid, a proportional coefficient for the i th agent is defined by

$$\kappa_{i} = \frac{{I_{i} }}{{I_{N,i} }}$$
(11)

where I i is the measured current and I N,i is the rated current of the i th agent. By using the distributed consensus method illustrated in (3), κ i can be shared among the MAS as follows:

$$\kappa_{i}^{[k + 1]} (t) = \sum\limits_{{j \in N_{i} }} {\alpha_{ij} \left\{ {\kappa_{j}^{[k]} (t) - \kappa_{i}^{[k]} (t)} \right\}} \quad {d_{i} = 0}$$
(12)

where, because all d i are set to 0, all κ i will converge to the average consensus value \(\kappa_p^{\ast}\) of the current proportional coefficients, which can be determined as

$$\kappa_{p}^{\ast} = {\sum\limits_{i} {\kappa_{i} } } / n$$
(13)
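The discovery of the average proportional coefficient in (11)–(13) can be sketched as follows. The sketch assumes that each iteration adds the neighbor differences to the agent's own state with symmetric coefficients (so that the average is preserved), and it uses a hypothetical 5-agent ring with made-up currents; the actual topology and parameters of the simulated system are given in Fig. 3 and Table 1, not here.

```python
def proportional_coefficients(currents, rated_currents):
    """kappa_i = I_i / I_N,i as defined in (11)."""
    return [i / i_n for i, i_n in zip(currents, rated_currents)]


def average_consensus(kappa, alpha, steps=200):
    """Iterate the consensus update with all pinning gains d_i = 0.

    ASSUMPTION: the neighbor differences are added to the current state, and
    alpha is symmetric, so every agent converges to the average of kappa.
    """
    k = list(kappa)
    for _ in range(steps):
        k = [k_i + sum(a * (k[j] - k_i) for j, a in alpha[i].items())
             for i, k_i in enumerate(k)]
    return k


# Hypothetical 5-agent ring with equal coefficients (not the topology of Fig. 3).
n = 5
alpha = {i: {(i - 1) % n: 0.25, (i + 1) % n: 0.25} for i in range(n)}
kappa = proportional_coefficients([40.0, 60.0, 45.0, 80.0, 75.0],
                                  [50.0, 100.0, 50.0, 100.0, 100.0])
print(average_consensus(kappa, alpha))
```

Each agent ends up holding the same value, so the average in (13) becomes available locally without any agent collecting all the currents.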

Secondly, to take voltage restoration of the DC microgrid into consideration, voltage control should be coordinated with current control. The DC voltages need to be adjusted while maintaining the equal proportional current sharing. The voltage control adjustment ΔU V,i defined in (2) can be calculated as

$$\Delta U_{V,i} = \lambda_{i} \kappa_{p}^{\ast} = \lambda_{i} \left({\sum\limits_{i} {\kappa_{i} } } /n \right)$$
(14)

where λ i is the voltage control constant, which is set to bring the DC voltage to its new stable value.

Finally, the local reward function can be defined to solve the current sharing and voltage restoration problems as follows:

$$\eta_{i} = \frac{1}{{\Delta U_{V,i} + \left| {\kappa_{i} - \kappa_{p}^{\ast} } \right|}} = \frac{1}{{\lambda_{i} \kappa_{p}^{\ast} + \left| {e_{\kappa ,i} } \right|}}$$
(15)

using (14), where η i is the local reward defined for the i th agent, \(e_{\kappa,i} = \kappa_i - \kappa_p^{\ast}\) is the deviation of the i th proportional coefficient from the average, and the term \(\lambda_i \kappa_p^{\ast}\) decreases the sensitivity of η i and avoids a zero denominator.

Hence the global reward η is accordingly derived as the summation of all the local rewards:

$$\eta = \sum\limits_{i} {\eta_{i} } = \sum\limits_{i} {\frac{1}{{\lambda_{i} \kappa_{p}^{\ast} + \left| {e_{\kappa ,i} } \right|}}}$$
(16)

This global reward η can be used to evaluate the performance of a candidate solution; generally, the larger the global reward, the better the current solution.
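The reward definitions in (15) and (16) translate directly into code. In this sketch the constant multiplying \(\kappa_p^{\ast}\) in the denominator is taken to be the voltage control constant λ i of (14); the function names are hypothetical, and the sample numbers reuse the Case B values quoted in Section 4.2 (λ i = 0.232, \(\kappa_p^{\ast}=0.75\)).

```python
def local_reward(kappa_i, kappa_avg, lam_i):
    """Local reward of one agent, following (15)."""
    return 1.0 / (lam_i * kappa_avg + abs(kappa_i - kappa_avg))


def global_reward(kappas, kappa_avg, lams):
    """Global reward as the sum of all local rewards, following (16)."""
    return sum(local_reward(k, kappa_avg, lam) for k, lam in zip(kappas, lams))


# An agent exactly at the average attains the largest possible local reward.
print(local_reward(0.75, 0.75, 0.232))
# Any deviation from the average lowers the reward.
print(local_reward(0.85, 0.75, 0.232))
```

The first call evaluates to approximately 5.747, the pinning consensus value reported for Case B, which shows how the reward ceiling is fixed by λ i and \(\kappa_p^{\ast}\) alone.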

3.2 DRL based on pinning

Based on the above local and global rewards, the DRL can be recognized as an optimization method to maximize the global reward:

$$\max \quad \left\{ \eta = \sum\limits_{i} \frac{1}{\lambda_{i} \kappa_{p}^{\ast} + \left| e_{\kappa ,i} \right|} \right\}$$
(17)

Accordingly, the optimum of (17) is reached when all the local rewards of the DC microgrid converge to a common value \(\eta_p^{\ast}\), which is the well-known solution of (16) as shown in [19, 33]. From (16) and (17), it is clear that the global reward reaches its maximum value when all |e κ,i | become zero. Therefore, the pinning consensus value \(\eta_p^{\ast}\) of the global reward can be predefined by

$$\eta_{p}^{\ast} = \frac{1}{{\lambda_{i} \kappa_{p}^{\ast} }}$$
(18)

Hence, by using the distributed consensus method illustrated in Section 2.2, DRL with respect to the local reward η i can be accomplished as follows:

$$\eta_{i}^{[k + 1]} (t) = \sum\limits_{{j \in N_{i} }} {\alpha_{ij} \left\{ {\eta_{j}^{[k]} (t) - \eta_{i}^{[k]} (t)} \right\}} - d_{i} \left\{ {\eta_{i}^{[k]} (t) - \eta_{p}^{ \ast} } \right\}$$
(19)

When all the local rewards converge to the pinning consensus value preset in (18), the global reward will reach its maximum value of

$$\eta_{\infty } = \sum\limits_{i} {\eta_{i,\infty } } = \sum\limits_{i} {\frac{1}{{\lambda_{i} \kappa_{p}^{\ast} + \left| {e_{\kappa ,i,\infty } } \right|}}} = \sum\limits_{i} {\frac{1}{{\lambda_{i} \kappa_{p}^{\ast} }}} = n\eta_{p}^{\ast}$$
(20)

Based on the synchronization process of the local reward described in (19), DC current and voltage control can be realized, and the control structure of the DC microgrid implemented through this process is shown in Fig. 2. The entire control process of the proposed DRL can be described in the following steps:

Fig. 2 Fundamental structure of DRL

  • Step 1: To take into account the requirements for equal proportional current sharing and voltage restoration in the DC microgrid, the local reward function is defined for each agent as in (15), and the related current proportional coefficients κ i and the voltage control adjustments ΔU V,i are calculated as in (11)–(14).

  • Step 2: Accordingly, maximizing the corresponding global reward of the whole DC microgrid is the optimization objective for DRL, as described in (17).

  • Step 3: The distributed consensus method based on pinning described in Section 2.2 is used to solve this optimization problem, and the pinning consensus value for the DRL is preset according to (18).

  • Step 4: The proposed DRL is finally implemented to achieve an optimal solution and control the DGs asymptotically, coordinating equal proportional current sharing and voltage restoration of the DC microgrid through the synchronization process of the local rewards in (19), until the global reward reaches its maximum value in (20).
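Steps 1 to 4 can be traced numerically with the following sketch. It only iterates the reward signals and does not model how the converter currents and voltages respond. It assumes an increment form of the consensus update, a hypothetical 5-agent ring with one pinned agent, a common constant λ i = 0.2 and made-up proportional coefficients; none of these values come from Tables 1 to 3.

```python
def consensus(values, alpha, d=None, pin=0.0, steps=600):
    """Pinning-based consensus; d=None gives plain average consensus.

    ASSUMPTION: each step adds the neighbor differences and the pinning
    correction to the current state of every agent.
    """
    d = d if d is not None else [0.0] * len(values)
    x = list(values)
    for _ in range(steps):
        x = [x_i + sum(a * (x[j] - x_i) for j, a in alpha[i].items())
             - d[i] * (x_i - pin) for i, x_i in enumerate(x)]
    return x


# Hypothetical 5-agent ring and operating point (illustrative values only).
n = 5
alpha = {i: {(i - 1) % n: 0.25, (i + 1) % n: 0.25} for i in range(n)}
lam = 0.2                                  # same voltage control constant for all agents
kappa = [0.8, 0.6, 0.9, 0.8, 0.75]         # Step 1: kappa_i from (11)

kappa_avg = consensus(kappa, alpha)[0]     # Step 1: average consensus, (12)-(13)
eta = [1.0 / (lam * kappa_avg + abs(k - kappa_avg)) for k in kappa]  # local rewards (15)
eta_pin = 1.0 / (lam * kappa_avg)          # Step 3: pinning consensus value (18)

d = [0.5, 0.0, 0.0, 0.0, 0.0]              # only agent 0 is pinned (assumed)
eta_final = consensus(eta, alpha, d=d, pin=eta_pin)   # Step 4: synchronization (19)

# Step 2: the global reward rises from its initial value toward n * eta_pin, as in (20).
print(sum(eta), sum(eta_final), n * eta_pin)
```

Pinning a single agent is enough in this example because the ring is connected; the remaining agents receive the pinning value indirectly through their neighbors.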

4 Simulation studies

To investigate the effectiveness and adaptability of the proposed DRL, a typical DC microgrid containing 5 DGs is simulated in the PSCAD/EMTDC platform, and its configuration is shown in Fig. 3. The algorithms of the proposed DRL, including the calculation of local rewards, the distributed consensus method for information sharing, and the distributed consensus method through pinning control, are implemented in MATLAB to make full use of its mathematical capabilities. The PSCAD model and the MATLAB programs are then connected through a Fortran-based interface procedure [19]. The communication topology of the simulated DC microgrid is also illustrated in Fig. 3.

Fig. 3 Simulated DC microgrid

Using this simulation model, three case studies are presented in the following sections.

4.1 Case A: overload scenario

Initially, the DC microgrid operates in a stable islanded mode, and all DGs are controlled by droop control. At t = 1 s an overload occurs, so the power balance between supply and demand is lost, and the proposed DRL is immediately activated to maintain stable operation of the DC microgrid.

The rated voltage of the DC microgrid is 0.6 kV and its control parameters are given in Table 1.

Table 1 Control parameters of DC microgrid in Case A

Firstly, each agent obtains its current proportional coefficient κ i , and the average value \(\kappa_p^{\ast}\) is discovered by using the distributed consensus method described in (12). The synchronization process of the κ i is shown in Fig. 4a.

Fig. 4 Control performances of the proposed DRL in Case A

Secondly, with the discovered average value \(\kappa_p^{\ast}\), the associated values ΔU V,i can be calculated by (14), and with λ i  = 0.204 the pinning consensus value of the local reward can be preset as \(\eta_p^{\ast}=5.576\) according to (18). Thus, the local reward η i of the DRL defined in (15) is estimated by its corresponding agent, and the synchronization seeking process is shown in Fig. 4b.

Finally, the proposed DRL which coordinates the voltage restoration and equal proportional current sharing is implemented, and the current and voltage control per agent are shown in Fig. 4c and d, where the consensus convergence process can be seen clearly.

It can be observed in Fig. 4a and b that the distributed consensus method presented in Section 2.2 realizes two functions in this case: 1) discovering global information based on average consensus and obtaining the averaged current proportional coefficient, as shown in Fig. 4a, and 2) implementing the distributed consensus method based on local reward pinning to coordinate equal proportional current sharing and voltage restoration, as shown in Fig. 4b. Thus, in the DRL for equal proportional current sharing and voltage restoration, illustrated in Fig. 4c and d respectively, the current proportional coefficients of all agents converge to equal consensus values, and the voltages reach a corresponding new state.

4.2 Case B: overload and communication line switches on

In this case, the overload occurs in the DC microgrid at t = 1 s, and at the same time a new communication link between agent 1 (A1) and agent 3 (A3) switches on, as illustrated in Fig. 5. The control parameters of the proposed DRL in Case B are shown in Table 2.

Fig. 5 Communication topology changes in Case B

Table 2 Control parameters of DC microgrid in Case B

In contrast to Case A, to address the change in communication topology, the connectivity coefficients α ij for the newly connected agents are updated as described in (7). Then, by using the proposed distributed consensus method, the average value of the current proportional coefficients is discovered to be \(\kappa_p^{\ast}=0.75\). Additionally, with λ i  = 0.232, the pinning consensus value of the local reward can be preset as \(\eta_p^{\ast}=5.747\). The synchronization seeking processes of the current proportional coefficients and the local rewards are shown in Fig. 6a and b respectively.
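As a quick arithmetic check, substituting the Case B values quoted above into (18) reproduces the stated pinning consensus value:

$$\eta_{p}^{\ast} = \frac{1}{\lambda_{i} \kappa_{p}^{\ast}} = \frac{1}{0.232 \times 0.75} = \frac{1}{0.174} \approx 5.747$$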

Fig. 6 Control performances of the proposed DRL in Case B

Through (17)–(20) the global reward is maximized when the pinning-based distributed consensus is reached. It can be seen in Fig. 6c that the current proportional coefficients of all DGs converge asymptotically to a new common value, and the synchronization seeking process differs from that of Case A because of the additional communication link. Similarly, DRL-based voltage restoration also adapts to the changed communication topology, as can be seen in Fig. 6d.

4.3 Case C: overload and agent unplugs

In Case C, agent 5 (A5) unplugs from the DC microgrid at t = 1 s, and its corresponding communication link switches off accordingly; as a result, the communication topology of the simulated DC microgrid changes, as shown in Fig. 7.

Fig. 7 Communication topology changes in Case C

The control parameters are listed in Table 3, and the proposed control response is implemented as follows. Firstly, to adapt to the unplugging of A5, agent identities are updated according to the method described in (8). Only the neighboring agents of the unplugged A5 need to be updated. Secondly, after the adaptive updating, both the discovery of current proportional coefficients by distributed consensus and the pinning-based distributed consensus of the local reward can be implemented, as in Cases A and B.

Table 3 Parameters of DC microgrid in Case C

The average value of the current proportional coefficients in Case C is discovered to be \(\kappa_p^{\ast}=0.916\). Additionally, with λ i  = 0.236, the pinning consensus value of the local reward can be preset as \(\eta_p^{\ast}=4.761\). The synchronization seeking processes of the current proportional coefficients and the local rewards are shown in Fig. 8a and b.

Fig. 8 Control performances of the proposed DRL in Case C

In Fig. 8c and d it can be seen that the DC currents and voltages of all the DGs asymptotically converge to new common values through the proposed DRL, so the equal proportional current sharing and the voltage restoration problems are successfully coordinated, and the proposed DRL can be adaptively implemented when an agent is unplugged.

5 Conclusion

In this study, a novel DRL strategy has been proposed and investigated for an islanded DC microgrid. The strategy is implemented by integrating two methods: the distributed consensus method through pinning and the RL method.

The proposed distributed consensus method can be used to discover global information and implement pinning synchronization, and it can also meet the requirement to adapt to changes in the communication network, such as communication line switches or agent plug-and-play operations. The proposed DRL based on local and global rewards can be utilized to maximize the global reward and achieve an optimal solution for a DC microgrid. Hence, the proposed strategy can coordinate the equal proportional current sharing and the voltage restoration of an autonomous DC microgrid.

The effectiveness and advantages of this approach are demonstrated by simulating three representative cases of an overload condition, including addition of a new communication link and unplugging of a DG agent. The DRL method worked quickly and effectively in each case.