1 Introduction

Modern power systems have undergone major changes and enhancements since the ‘smart grid’ concept was first proposed in the early 21st century. The integration of renewable energy sources, the participation of the demand side and the deployment of advanced sensor infrastructure have further increased system complexity and have driven the restructuring of grid operations [1].

Residential and commercial buildings are large energy consumers in distribution systems. For example, in China buildings account for 33% of society’s total energy consumption [2]. Thermostatically controlled appliances (e.g., air conditioners, heaters), widely used in houses and buildings, have great potential to participate in demand side management (DSM) programs owing to their thermal storage capability. Extensive effort has been devoted to studying direct control techniques for thermostatically controlled loads (TCLs). For example, [3] outlined the fundamental requirements of direct load control (DLC) and presented a general optimization framework for feeder-scale load reduction, while [4] designed a priority-based control scheme for TCLs to participate in grid frequency regulation. Reference [5] proposed a two-stage dispatch method for TCLs where, in the first stage, a day-ahead scheduling model is solved to determine the optimal TCL dispatch and, in the second stage, a real-time control model allocates the desired setpoints to individual TCLs. In [6], the authors used a Markov transition matrix to model a population of TCLs for non-disruptive load reduction. Previous work of the authors made use of an advanced thermal inertia model to control TCLs in smart homes [7] and proposed a model for the coordinated dispatch of TCLs and thermal generation units [8]. In the above works, the ON/OFF control actions of TCLs are driven by the thermostat settings and by a pre-set indoor temperature range. In our previous works [9, 10], International Organization for Standardization (ISO) standard thermal comfort models, widely adopted to estimate the occupants’ degree of thermal comfort, were considered in the model representation for TCL dispatch. By integrating the thermal comfort model, the control actions of TCLs are driven by the calculated comfort level of the occupants instead of the thermostat settings and an indoor temperature dead-band.

In the smart grid paradigm, buildings can be aggregated to place load reduction bids in the day-ahead market. In this way, the aggregated TCLs act as a virtual power plant (VPP) [11], which can ‘generate more power’ for the grid by turning off some TCLs to decrease the load and can ‘generate less power’ by turning some TCLs on. Once the load reduction contracts are determined in the day-ahead market, the differences between the actual load shedding amounts and the contracted load reduction volumes are settled in the regulation market. Therefore, after the day-ahead contracts are determined, the TCL aggregator needs to optimize its real-time control actions on the TCLs by taking into account the contracts and real-time information (e.g., regulation prices, weather information). Although several works have studied TCL control schemes in the regulation market, e.g., [12, 13], they all consider TCL aggregators as ancillary service providers that follow certain frequency regulation signals.

In this context, this paper considers the TCL aggregator as an independent bidding actor in the day-ahead market, instead of an ancillary service provider, and studies its real-time operation once the day-ahead contracts have been determined. In particular, the main contributions of this paper include:

  1. 1)

    The use of an ISO standard thermal comfort model to probabilistically control the occupants’ thermal comfort within real-time indoor environments.

  2. 2)

    The establishment of a rolling horizon optimization (RHO) based real-time scheduling model aimed at maximizing the TCL aggregator’s profit in the regulation market while reducing the negative impacts of system uncertainties.

  3. 3)

    The implementation and use of a metaheuristic algorithm, i.e., natural aggregation algorithm (NAA), for solving the real-time scheduling model.

This paper is organized as follows. Section 2 introduces the market structure used in this study, followed by Section 3 which presents the modelling of TCL and thermal comfort indices. Section 4 depicts the proposed RHO based real-time scheduling model for TCL aggregators and Section 5 discusses the solution approach of the proposed model. Experimental studies are reported in Section 6 and conclusions are drawn in Section 7.

2 Electricity market structure

The real-time scheduling model studied in this paper is based on the Nordic energy market. This choice entails no loss of generality: with minor modifications, the model is also applicable to other market structures.

The Nordic energy market is a common market for electricity trading in the Nordic countries. According to the Nordic market report 2014 [14], the largest shares of the Nordic energy market in 2014 were held by Vattenfall (18.8%), Statkraft (13.6%), Fortum (12.1%), E.ON (7%) and other electricity producers (50%). The Nordic energy market is subdivided into three markets: Elspot, Elbas and the regulation market [15, 16]. These markets are briefly described in the following sub-sections.

2.1 Elspot

In the Nordic power market system, Elspot is the day-ahead market. In Elspot, market actors sign hourly contracts for the 24 hours of the next day. The spot market closes at 12:00 noon each day. After receiving the bids, the market operator constructs the power purchase and sale curves, and their intersection determines the clearance price and the volume traded in each hour of the next day [15]. The minimum contract size in the Elspot market is 0.1 MWh [17].

2.2 Elbas

Elbas is the intraday market. It opens at 15:00 each day to allow hourly energy trading of the market actors for the coming day. It closes 1 hour prior to the delivery. Elbas is considered to be an adjustment market which supplements Elspot and helps secure the necessary balance between supply and demand in the power market for Northern Europe. The minimum trading volume in Elbas market is 1 MWh per hour [18].

2.3 Regulation market

The gap between power generation and consumption is balanced by the transmission system operator (TSO) through the regulation market. The actors with power reserves place bids in the regulation market, and the bids are ordered by price and form a staircase for each delivery hour. At the end of each hour, the regulation price is determined according to the most expensive upward regulation measures or cheapest down regulation measures taken by the TSO [14]. The minimum bid size in the regulation market is 5 MW [19].

2.4 Balance settlement

The TSO allocates the regulation costs among balance responsible actors through the balance settlement. All the balance responsible actors pay or are paid according to the deviations between their actual and planned productions.

There are 4 situations for the balance settlements:

  1. 1)

    If only upward regulation is activated, the actors with negative imbalance pay the upward regulation price, which is greater than or equal to the spot price, while the actors with positive imbalance are paid the spot price.

  2. 2)

    If only downward regulation is activated, the actors with negative imbalance pay the spot price, and those with positive imbalance are paid the downward regulation price, which is less than or equal to the spot price.

  3. 3)

    If no regulation is activated, all transactions are settled at the spot price.

  4. 4)

    If both upward and downward regulations are activated, upward or downward regulation prices are applied depending on which regulation has the higher volume. If volumes for ordered upward and downward regulation are equal, then the spot price is applied.

To facilitate the participation of demand response, the Nordic market also allows the trading of ‘demand flexibility’ in both Elspot and Elbas, and the real-time imbalances are managed and settled by the TSO as usual [20].
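The four balance settlement situations listed above can be condensed into a single comparison of activated volumes. The sketch below is illustrative only: the function name and arguments are hypothetical, and it assumes that when both directions are activated, the hour is settled exactly as if only the dominant direction had been activated.

```python
def settlement_prices(spot, up_price, down_price, up_volume=0.0, down_volume=0.0):
    """Return (price paid by negative imbalance, price paid to positive imbalance).

    Assumption: the direction with the larger activated volume decides the
    settlement; equal volumes (including no regulation) settle at spot.
    """
    if up_volume > down_volume:
        # Situation 1, or situation 4 with upward regulation dominating.
        return up_price, spot
    if down_volume > up_volume:
        # Situation 2, or situation 4 with downward regulation dominating.
        return spot, down_price
    # Situation 3, or situation 4 with equal volumes.
    return spot, spot


# Hypothetical hour: spot 30, upward price 40, downward price 25.
print(settlement_prices(30.0, 40.0, 25.0, up_volume=10.0))
```

With these hypothetical prices, an actor that is short during an upward-regulated hour pays 40 instead of 30, which is the exposure the aggregator tries to limit in real time.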

3 Modeling of TCL and thermal comfort indices

A fundamental step of TCL dispatch is to understand the thermal transition process of the buildings and set up appropriate thermal comfort model for the occupants.

3.1 TCL thermal dynamics model

In this paper, the widely adopted R-C model is used for modelling the thermal transition of buildings with TCLs; it has been shown to capture the main thermal dynamics of buildings [3, 4]. The R-C model includes two parameters: thermal resistance (R) and thermal capacitance (C). Under the R-C model, the indoor temperature trajectory caused by a TCL is governed by the following first-order ordinary differential equation [3]:

$$\dot{T}_{\text{in}} (t ) { = }\frac{1}{{C_{\text{th}} R_{\text{th}} }}\left( {T_{\text{out}} (t )- T_{\text{in}} (t )- s (t )R_{\text{th}} P_{\text{rate}} } \right)$$
(1)

where \(C_{\text{th}}\) and \(R_{\text{th}}\) are the thermal capacitance and resistance of the building; \(s(t)\) is the state of the TCL (0 is OFF, 1 is ON); \(T_{\text{out}}(t)\) and \(T_{\text{in}}(t)\) are the outdoor and indoor air temperatures at time t; \(P_{\text{rate}}\) is the rated power of the TCL.

Alternative thermal models could be used in the proposed computational framework without significantly modifying the real-time scheduling model proposed in Section 4.

For example, the authors used an R-C thermal network model in [12] to model the thermal dynamics of commercial buildings and, in [8], employed a thermal inertia model by considering the wall’s thermal capacitance.
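To make the R-C dynamics of (1) concrete, the following sketch integrates the equation with a forward-Euler step. The discretization, the function names, the units and the numerical values are assumptions chosen for illustration; they are not taken from the paper.

```python
def step_indoor_temp(t_in, t_out, s, c_th, r_th, p_rate, dt_h):
    """Advance the indoor temperature by one forward-Euler step of Eq. (1).

    Assumed units: c_th in kWh/°C, r_th in °C/kW, p_rate in kW, dt_h in hours.
    """
    d_t_in = (t_out - t_in - s * r_th * p_rate) / (c_th * r_th)
    return t_in + dt_h * d_t_in


def simulate(t_in0, t_out_profile, states, c_th, r_th, p_rate, dt_h):
    """Return the indoor temperature at the end of each interval."""
    trajectory = []
    t_in = t_in0
    for t_out, s in zip(t_out_profile, states):
        t_in = step_indoor_temp(t_in, t_out, s, c_th, r_th, p_rate, dt_h)
        trajectory.append(t_in)
    return trajectory


# Hypothetical building: 4 kWh/°C, 5 °C/kW, 3 kW cooling TCL, 5-minute steps.
traj = simulate(26.0, [32.0] * 3, [0, 1, 1], 4.0, 5.0, 3.0, 5 / 60)
print(traj)
```

In this example the room warms slightly while the TCL is OFF and cools once it is switched ON, which is the behavior the scheduling model exploits.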

3.2 ISO 7730 thermal comfort model

One important consideration in TCL dispatch is the user’s comfort. In this paper, we employ an ISO standard thermal comfort model [21] to estimate the occupants’ degree of thermal comfort. The model used is the ISO 7730 model, which has been implemented in many HVAC systems [22, 23]. It gives an analytical representation of human thermal comfort through two indices: predicted mean vote (PMV) and predicted percentage of dissatisfied (PPD). PMV predicts the mean value of the votes of a large group of people on the ISO thermal sensation scale. Based on PMV, PPD predicts the percentage of a large group of people likely to feel ‘too warm’ or ‘too cool’.

The thermal sensation of a human is determined by the thermal balance of his or her body. In the ISO 7730 model, this balance is mainly influenced by four environmental factors (air temperature, air relative humidity, air velocity, and mean radiant temperature) and two individual factors (activity level and clothing insulation). Once these factors are determined, the 7-point thermal sensation of the body as a whole can be predicted by calculating the PMV, as illustrated in Fig. 1.

Fig. 1 Schematic of PMV calculation

By taking into account the above factors, the PMV can be calculated by (2) [21].

$$\begin{aligned} f_{\text{PMV}} & = (0.352\exp ( - 0.042M) + 0.032)[M - 0.35(43 - 0.061M - P_{\text{v}} ) \\ & \quad - 0.42(M - 50) - 0.0023M(44 - P_{\text{v}} ) - 0.0014M(34 - T_{\text{a}} ) \\ & \quad - 3.4 \times 10^{ - 8} f_{\text{cl}} ((T_{\text{cl}} + 273)^{4} - (T_{\text{mrt}} + 273)^{4} ) - f_{\text{cl}} h_{\text{c}} (T_{\text{cl}} - T_{\text{a}} )] \\ \end{aligned}$$
(2)

where \(M\) is the metabolic rate; \(P_{\text{v}}\) is the vapor pressure in ambient air; \(T_{\text{a}}\) is the ambient air temperature; \(f_{\text{cl}}\) is the clothing surface area factor; \(T_{\text{cl}}\) is the mean temperature of the outer surface of the clothed body; \(T_{\text{mrt}}\) is the mean radiant temperature; \(h_{\text{c}}\) is the heat transfer coefficient. The clothing surface temperature \(T_{\text{cl}}\) needs to be computed iteratively to find the root of the nonlinear equations in (3) and (4).

$$\begin{aligned} T_{\text{cl}} & = 35.7 - 0.032M - 0.18I_{\text{cl}} \{ 3.4 \times 10^{ - 8} f_{\text{cl}} [(T_{\text{cl}} + 273)^{4} - (T_{\text{mrt}} + 273)^{4} ] \\ & \quad + f_{\text{cl}} h_{\text{c}} (T_{\text{cl}} - T_{\text{a}} )\} \\ \end{aligned}$$
(3)
$$h_{\text{c}} = \left\{ {\begin{array}{*{20}l} {2.05\left| {T_{\text{cl}} - T_{\text{a}} } \right|^{0.25} } \hfill & {2.38\left| {T_{\text{cl}} - T_{\text{a}} } \right|^{0.25} > 10.4\sqrt {v_{\text{ar}} } } \hfill \\ {10.4\sqrt {v_{\text{ar}} } } \hfill & {2.38\left| {T_{\text{cl}} - T_{\text{a}} } \right|^{0.25} < 10.4\sqrt {v_{\text{ar}} } } \hfill \\ \end{array} } \right.$$
(4)

where \(I_{\text{cl}}\) is the thermal resistance of clothing (clo) and \(v_{\text{ar}}\) is the relative air velocity. \(f_{\text{cl}}\) and \(P_{\text{v}}\) are calculated based on (5) and (6), respectively [21]:

$$f_{\text{cl}} = \left\{ {\begin{array}{*{20}l} {1 + 1.290I_{\text{cl}} } \hfill & {I_{\text{cl}} \le 0.078} \hfill \\ {1.05 + 0.645I_{\text{cl}} } \hfill & {I_{\text{cl}} > 0.078} \hfill \\ \end{array} } \right.$$
(5)
$$P_{\text{v}} = rh \cdot 10\exp \left( {16.6536 - \frac{4030.183}{{T_{\text{a}} + 273}}} \right)$$
(6)

Given the PMV value, the PPD provides a quantitative prediction of the percentage of people who feel too cool or too warm [21]:

$$f_{\text{PPD}} = 100 - 95\exp \left( { - 0.03353f_{\text{PMV}}^{4} - 0.2179f_{\text{PMV}}^{2} } \right)$$
(7)
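As a quick numerical check of (7), the sketch below evaluates the PPD for a few PMV values. The function name is hypothetical; the coefficients are those of (7).

```python
import math


def ppd_from_pmv(pmv):
    """Predicted percentage of dissatisfied (%) from the PMV, following Eq. (7)."""
    return 100.0 - 95.0 * math.exp(-0.03353 * pmv ** 4 - 0.2179 * pmv ** 2)


for pmv in (0.0, 0.5, 0.85, -0.85):
    print(pmv, round(ppd_from_pmv(pmv), 2))
```

Two properties follow directly from the formula: the PPD never drops below 5% (reached at a PMV of zero), and it is symmetric in the PMV, so ‘too warm’ and ‘too cool’ are penalized equally. A PPD of about 20% corresponds to a PMV magnitude of roughly 0.85.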

3.3 Parameter determination of ISO 7730 model

As previously introduced, in the ISO 7730 model there are 4 environment factors and 2 individual factors. In the TCL control, the values of these parameters need to be determined when calculating the PPD value.

  1. 1)

    Determination of Activity Level and Clothing Insulation. In the ISO 7730 model, these two factors are represented by discrete numerical values. For example, the clothing insulation value of light summer clothing could be 0.5, and the activity level value of sedentary activity is 1.2 [16]. In real applications, these two factors can be estimated by analyzing the seasonal clothing characteristics of the occupants, the nature of the occupants’ jobs, etc. They can also be determined from historical recorded data or directly monitored by sensors.

  2. 2)

    Determination of Indoor Relative Air Humidity and Indoor Air Velocity. In the real applications, the values of indoor relative air humidity and velocity can be directly obtained through the deployed indoor sensors.

  3. 3)

    Determination of Indoor Air Temperature. The outdoor air temperature can be obtained by sensors and meteorological forecasting techniques. The indoor air temperature depends on the outdoor air temperature and TCL state. Once these are determined, the indoor air temperature trajectory can be calculated with (1).

  4. 4)

    Determination of Mean Radiant Temperature. The mean radiant temperature is defined as the uniform temperature of an imaginary enclosure in which the radiant heat transfer from the human body equals the radiant heat transfer in the actual non-uniform enclosure. It can be measured by sensors or calculated with approximation methods [24].

3.4 TCL grouping strategy

A single building cannot participate in the power market because of its limited capacity. When a large number of TCLs are aggregated, dispatching them individually would be computationally intractable. A feasible approach is to group the TCLs properly and dispatch them on a group basis [4]. In this paper, we first assume that all the occupants are in a moderate activity environment and, consequently, set the value of M to 1.2. This assumption covers many typical building scenarios, such as dwellings, offices and classrooms [25]. We then represent each TCL sample as a feature vector \(\left[ {C_{\text{th}} ,R_{\text{th}} ,I_{\text{cl}} ,P_{\text{rate}} } \right]\) and use the C-means clustering method [26] to cluster the TCL samples based on their feature similarities. Based on the clustering results, we separate the TCLs into multiple groups according to their parameters.

For each TCL group, an equivalent TCL model is then established whose parameters are the averaged values of the parameters of the TCL samples belonging to that group.
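The construction of the equivalent models can be sketched as follows. The snippet assumes that the group labels have already been produced by a clustering step (not shown) and simply averages the feature vectors within each group; the function name and sample values are hypothetical.

```python
from collections import defaultdict


def equivalent_models(samples, labels):
    """Average the [C_th, R_th, I_cl, P_rate] vectors of each group.

    samples: list of feature vectors; labels: group label of each sample.
    Returns a dict mapping each label to its averaged parameter vector.
    """
    groups = defaultdict(list)
    for vec, lab in zip(samples, labels):
        groups[lab].append(vec)
    return {
        lab: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lab, vecs in groups.items()
    }


# Three hypothetical TCLs, the first two assigned to the same group.
samples = [[4.0, 5.0, 0.5, 10.0], [6.0, 3.0, 0.7, 20.0], [2.0, 4.0, 1.0, 15.0]]
models = equivalent_models(samples, [0, 0, 1])
print(models)
```

Averaging is applied here to all four features because the text defines the equivalent model that way; the aggregated group power used later in the scheduling model is a sum over the group members, not an average.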

4 Rolling horizon optimization based real-time operation model for TCL aggregator

There are two major phases for TCL aggregators to participate in the Nordic power market. The first phase is the day-ahead stage, where the aggregator determines the bids based on the 24-hour-ahead forecast data. The second phase is the real-time operation stage, where the aggregator applies control actions to the TCLs based on the updated real-time information and the pre-determined contracts. In this paper, we assume the day-ahead contracts have already been determined and restrict our study to the real-time scheduling.

RHO has been shown to be an effective approach for real-time dispatch [27]. By continuously advancing over the optimization horizon, updating the system states and repeatedly performing the optimization over the rolling future horizon, RHO can effectively mitigate the negative impacts of forecast errors. In this paper, we employ the RHO strategy for the real-time scheduling of TCL aggregators.

4.1 Identification of stochastic variables

The stochastic variables involved in the real-time scheduling of the TCL aggregator include the 6 thermal comfort model parameters in Section 3 and the upward/downward regulation prices. Since the clothing condition (\(I_{\text{cl}}\)) and activity level (\(M\)) of the occupants often do not change significantly, in this paper we treat them as fixed values during the scheduling period. The other 4 thermal comfort parameters and the regulation prices can be forecast online by using weather forecast techniques or machine learning techniques.

4.2 RHO based real-time TCL scheduling model

In the real-time stage, the TCL aggregator decides ON/OFF control actions of each managed TCL group at each time interval. By employing the RHO strategy, the scheduling process is characterized by the following steps:

  1. 1)

    System modeling. The stochastic variables previously introduced are predicted over a future finite horizon, which is called the control window (or prediction window).

  2. 2)

    Objective definition. The scheduling objective of the TCL aggregator over the prediction window is specified.

  3. 3)

    Objective function optimization. The objective function is optimized as a function of the set of future TCL control signals to be applied to the system during the prediction window.

  4. 4)

    Receding horizon strategy. Only the control signal of the first time interval is applied to the TCL groups. In the next time step, the prediction window moves forward by one time interval and the whole procedure is repeated.
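The four steps above can be sketched as a generic rolling-horizon loop. This is only a structural illustration: `forecast`, `optimize` and `apply_first` are hypothetical callables standing in for the forecasting, optimization and actuation components, and the shrinking of the window at the end of the day is an assumption.

```python
def rolling_horizon(n_steps, window, forecast, optimize, apply_first, state):
    """Run a rolling horizon optimization over n_steps time intervals."""
    applied = []
    for t in range(n_steps):
        horizon = min(window, n_steps - t)       # assumed: window shrinks at the end
        predictions = forecast(t, horizon)       # step 1: predict over the window
        plan = optimize(state, predictions)      # steps 2-3: one control per interval
        state = apply_first(state, plan[0], t)   # step 4: apply first control only
        applied.append(plan[0])
    return applied, state


# Toy stand-ins: the "forecast" is a list of integers, the "optimizer"
# switches ON for odd values, and the state counts ON intervals.
applied, final_state = rolling_horizon(
    n_steps=4,
    window=3,
    forecast=lambda t, h: [t + k for k in range(h)],
    optimize=lambda state, pred: [p % 2 for p in pred],
    apply_first=lambda state, u, t: state + u,
    state=0,
)
print(applied, final_state)
```

The key design point is that the plan beyond its first entry is discarded at every step, so forecast errors further into the window never reach the TCLs directly.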

In each round of RHO, the scheduling objective of aggregator is to maximize its total profits over the prediction window:

$${ \hbox{max} }\; \, f(\varvec{s}_{\text{GP}} )= \sum\limits_{{t{ = }t '}}^{{t\prime { + }T_{\text{wd}} }} {\left( {C_{{{\text{spot,}}t}} { + }C_{{{\text{reg}},t}} - C_{{{\text{ls,}}t}} } \right)}$$
(8)

where T wd is the RHO control window size; \(C_{{{\text{spot,}}t}}\) is the revenue of the TCL aggregator in the spot market at time t; \(C_{{{\text{reg}},t}}\) is the revenue of the TCL aggregator in the regulation market at time t; \(C_{{{\text{ls,}}t}}\) is the TCL shedding cost at time t. The decision variables are collected in \(\varvec{s}_{\text{GP}}\), which is the TCL state matrix with size \(T_{\text{wd}} \times N_{\text{GP}}\), where \(N_{\text{GP}}\) is the number of TCL groups managed by the TCL aggregator. The entry \(\varvec{s}_{\text{GP}} (t ,i )\) represents the ON/OFF state of the i th TCL group at time t. \(C_{{{\text{spot,}}t}}\) is a deterministic value since the bids and clearance prices have already been determined:

$$C_{{{\text{spot,}}t}} = C_{{{\text{con,}}t}} p_{{{\text{spot,}}t}} \Delta t$$
(9)

where \(C_{{{\text{con,}}t}}\) is the contracted load reduction volume of the TCL aggregator at time t; \(p_{{{\text{spot,}}t}}\) is the clearance electricity price at time t; \(\Delta t\) is the duration of a time interval. When the actual shed load is smaller than the contracted load reduction volume, \(C_{{{\text{reg}},t}}\) is negative and represents the cost in the regulation market. \(C_{{{\text{reg}},t}}\) is calculated as follows:

$$C_{{{\text{reg}},t}} = \left\{ {\begin{array}{*{20}l} {p_{{{\text{pos,}}t}} (C_{{{\text{shed,}}t}} - C_{{{\text{con,}}t}} )\Delta t} \hfill & {C_{{{\text{shed,}}t}} \ge C_{{{\text{con,}}t}} } \hfill \\ {p_{{{\text{neg,}}t}} (C_{{{\text{shed,}}t}} - C_{{{\text{con,}}t}} )\Delta t} \hfill & {C_{{{\text{shed,}}t}} < C_{{{\text{con,}}t}} \, } \hfill \\ \end{array} } \right.$$
(10)
$$C_{{{\text{shed,}}t}} = \sum\limits_{{i{ = }1}}^{{N_{\text{GP}} }} {\left( {1 - \varvec{s}_{\text{GP}} (t ,i )} \right)P_{{{\text{GP}},i}} }$$
(11)
$$P_{{{\text{GP}},i}} = \sum\limits_{{j{ = 1}}}^{{N_{{{\text{TCL,}}i}} }} {P_{{{\text{rate,}}i,j}} }$$
(12)

where \(p_{{{\text{pos,}}t}}\) and \(p_{{{\text{neg,}}t}}\) are forecasted positive and negative imbalance price at time t; \(C_{{{\text{shed,}}t}}\) is the total shed TCL power of the aggregator at t; \(P_{{{\text{GP}},i}}\) is the aggregated power of the i th TCL group; \(P_{{{\text{rate,}}i,j}}\) is the rated power of the jth TCL of the i th group; \(N_{{{\text{TCL}},i}}\) is the number of TCLs of the i th TCL group.
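The revenue terms in (9) to (12) can be illustrated with a small numerical sketch. It reads (11) as the total power of the groups that are switched OFF, i.e. the sum of (1 − s) times the group power. Function names, units (kW, hours, price per kWh) and all numbers are assumptions for illustration.

```python
def shed_power(states, group_power):
    """Total shed power, Eq. (11): power of all groups that are OFF (s = 0)."""
    return sum((1 - s) * p for s, p in zip(states, group_power))


def spot_revenue(contracted, spot_price, dt_h):
    """Spot market revenue of one interval, Eq. (9)."""
    return contracted * spot_price * dt_h


def regulation_revenue(shed, contracted, p_pos, p_neg, dt_h):
    """Regulation market revenue of one interval, Eq. (10); negative means a cost."""
    price = p_pos if shed >= contracted else p_neg
    return price * (shed - contracted) * dt_h


# Hypothetical interval: three groups of 100, 150 and 200 kW, 5-minute step.
group_power = [100.0, 150.0, 200.0]
dt_h = 5 / 60
over = shed_power([1, 0, 0], group_power)    # 350 kW shed, above a 300 kW contract
under = shed_power([1, 0, 1], group_power)   # 150 kW shed, below the contract
print(spot_revenue(300.0, 0.04, dt_h))
print(regulation_revenue(over, 300.0, 0.03, 0.06, dt_h))
print(regulation_revenue(under, 300.0, 0.03, 0.06, dt_h))
```

In the second case the shortfall of 150 kW is charged at the negative imbalance price, which is why the aggregator has an incentive to track the contract when that price is high.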

\(C_{{{\text{ls,}}t}}\) is the TCL shedding cost, represented by the incentive scheme (or customer reward) provided by the TCL aggregator to the customers, so as to encourage their participation in the DSM programs. In this paper, we assume the customer reward increases when the thermal comfort tends to decrease:

$$C_{{{\text{ls,}}t}} = \sum\limits_{{i{ = }1}}^{{N_{\text{GP}} }} {C_{{{\text{ls,}}i,t}} }$$
(13)
$$C_{{{\text{ls,}}i,t}} = \left\{ {\begin{array}{*{20}l} 0 \hfill & {f_{{{\text{PPD,}}i ,t}} \le f_{\text{PPD}}^{\text{limit}} } \hfill \\ {\left( {\exp \left( {\frac{{f_{{{\text{PPD,}}i ,t}} }}{{f_{\text{PPD}}^{\text{limit}} }}} \right) - 1} \right)\Delta t} \hfill & {f_{{{\text{PPD,}}i ,t}} { > }f_{\text{PPD}}^{\text{limit}} } \hfill \\ \end{array} } \right.$$
(14)

where \(C_{{{\text{ls,}}i,t}}\) is the TCL shedding cost of the i th TCL group at time t; \(f_{{{\text{PPD,}}i ,t}}\) is the PPD value calculated on the equivalent model of i th TCL group at time t; \(f_{\text{PPD}}^{\text{limit}}\) is the allowable limit of the PPD value.
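A minimal sketch of the shedding cost in (13) and (14) is given below. It follows the equations as printed, so no incentive rate appears in the cost; the function name and the sample values are hypothetical.

```python
import math


def shedding_cost(ppd_values, ppd_limit, dt_h):
    """Total shedding cost of one interval over all TCL groups, Eqs. (13)-(14)."""
    total = 0.0
    for ppd in ppd_values:
        if ppd > ppd_limit:
            # The cost grows exponentially once the PPD exceeds the limit.
            total += (math.exp(ppd / ppd_limit) - 1.0) * dt_h
    return total


# Hypothetical PPD values (%) of three groups with a 20% limit.
print(shedding_cost([10.0, 20.0, 40.0], 20.0, 1.0))
```

Only the third group contributes in this example, because the first two are at or below the limit.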

Equations (13) and (14) show that, when the calculated PPD of the equivalent TCL group model does not exceed a pre-set threshold, the TCL shedding cost is zero; otherwise, the load shedding cost increases exponentially with the PPD. Model (8) is subject to the following constraints:

  1. a)

    TCL group state constraint

    $$\varvec{s}_{\text{GP}} (t ,i )\in \{ 0,1\} \quad \, \forall t{ = }t^{\prime } :t^{\prime } { + }T_{\text{wd}} ,\quad \, i{ = }1 :N_{\text{GP}}$$
    (15)
  2. b)

    Minimum online time constraint: it is applied to avoid mechanical wear of TCLs due to frequent ON/OFF actions:

    $$\tau_{i}^{\text{on}} (t) \ge \tau_{\hbox{min} }^{\text{on}}$$
    (16)
    $$\tau_{i}^{\text{on}} (t){ = }\left( {\tau_{i}^{\text{on}} (t - 1){ + }\varvec{s}_{\text{GP}} (t ,i )\Delta t} \right)\varvec{s}_{\text{GP}} (t ,i )$$
    (17)

    where \(\tau_{i}^{\text{on}} (t)\) is the accumulated online time of the i th TCL group at time t; \(\tau_{\hbox{min} }^{\text{on}}\) is the minimum required online time.
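The recursion in (17) accumulates the online time while a group stays ON and resets it to zero as soon as the group is switched OFF. The sketch below implements this recursion and, as an assumed interpretation of (16), checks the minimum online time only at the instants where a group is switched from ON to OFF. Function names are hypothetical.

```python
def online_times(states, dt, tau0=0.0):
    """Accumulated online time after each interval, Eq. (17)."""
    taus = []
    tau = tau0
    for s in states:
        tau = (tau + s * dt) * s   # grows while ON, resets to 0 when OFF
        taus.append(tau)
    return taus


def respects_min_on_time(states, dt, tau_min, tau0=0.0):
    """Assumed reading of Eq. (16): a group may only be switched OFF after
    it has been ON for at least tau_min."""
    taus = online_times(states, dt, tau0)
    for k in range(1, len(states)):
        if states[k - 1] == 1 and states[k] == 0 and taus[k - 1] < tau_min:
            return False
    return True


print(online_times([1, 1, 0, 1], dt=5))
print(respects_min_on_time([1, 0, 1, 1], dt=5, tau_min=10))
```

In the second call the group is switched OFF after a single 5-minute interval, which violates a hypothetical 10-minute minimum.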

5 Approach to solve the model

The proposed model is essentially a binary optimization problem over a finite horizon. With the introduction of the ISO 7730 thermal comfort model, the objective function (8) becomes highly nonlinear and non-convex; moreover, when calculating the PMV, the clothing surface temperature \(T_{\text{cl}}\) has to be computed iteratively. It would therefore be difficult, if not impossible, to solve this model with mathematical programming methods. Recently, the authors proposed a new metaheuristic method, referred to as the natural aggregation algorithm (NAA) [28, 29], which is used to solve the model in the experiments that follow. The NAA is employed in this study because it possesses strong global search capability in nonlinear spaces and has the potential to outperform other state-of-the-art heuristic algorithms [28].

5.1 Introduction of NAA

NAA mimics the collective intelligence of group-living animals in their resource sharing and competition process. Group-living animals tend to gather around multiple resources (e.g., shelters, food) to exploit them. Resources of higher quality attract more individuals, while overcrowding makes group members leave to explore better resources or join other groups. Biologists have established probabilistic models to describe the self-aggregation behavior of group-living animals and have mathematically proved that this behavior can help the swarm to optimally balance resource exploitation and exploration [30].

In NAA, a population of individuals is distributed over multiple sub-populations, each of which is called a ‘shelter’. The core of NAA is a stochastic migration model, in which individuals can dynamically migrate among sub-populations. In particular, in each generation, each individual placed at a shelter s first evaluates its probability of leaving its current shelter (\(Q_{s}\)):

$$Q_{s} =\,\frac{{\overline{{\theta_{s} }} }}{{1{ + }\left( {\frac{{x_{s} }}{{C_{s} }}} \right)^{2} }}$$
(18)

where \(\overline{{\theta_{s} }}\) is the normalized quality of shelter \(s\); \(x_{s}\) is the number of individuals currently in \(s\); \(C_{s}\) is the capacity of shelter s, representing the maximum number of individuals it can contain. Based on \(Q_{s}\), the individual decides whether or not to leave the current shelter. For each individual not part of any shelter, it randomly selects a shelter s and evaluates its probability of entering it as follows:

$$R_{s} { = }\left( {1 - \overline{{\theta_{s} }} } \right)\left( {1 - \frac{{x_{s} }}{{C_{s} }}} \right)$$
(19)

Based on \(R_{s}\), the individual decides whether or not to enter the shelter under consideration. After making the migration decisions, each individual placed at a certain shelter performs a located search, while each individual not included in any shelter performs a generalized search. Further details related to the NAA can be found in [28].
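The two migration probabilities in (18) and (19) can be evaluated as in the sketch below. The function names are hypothetical, and turning each probability into a decision by comparison with a uniform random number is an assumption about how the decision is drawn, not a statement of the NAA implementation in [28].

```python
import random


def leave_probability(theta_norm, x_s, c_s):
    """Probability of leaving shelter s, Eq. (18)."""
    return theta_norm / (1.0 + (x_s / c_s) ** 2)


def enter_probability(theta_norm, x_s, c_s):
    """Probability of entering shelter s, Eq. (19)."""
    return (1.0 - theta_norm) * (1.0 - x_s / c_s)


def decide(probability, rng=random):
    """Assumed decision rule: act when a uniform draw falls below the probability."""
    return rng.random() < probability


# Hypothetical shelter: normalized quality 0.5, 6 individuals, capacity 12.
print(leave_probability(0.5, 6, 12), enter_probability(0.5, 6, 12))
```

Note that (19) gives a zero entering probability once a shelter is filled to capacity, so a full shelter cannot grow further.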

5.2 Applying NAA on binary spaces

The NAA is designed for real-parameter optimizations over the continuous space. Since the proposed optimization function in (8) is a binary optimization problem, a mapping scheme needs to be adopted to map the NAA on binary spaces. In this study, we use the same mapping scheme applied in particle swarm optimization (PSO) [31]:

$$P(d_{ij} ) = \frac{1}{{1 + \exp ( - d_{ij} )}}$$
(20)
$$x_{ij} = \left\{ {\begin{array}{*{20}l} 1 \hfill & {rand < P(d_{ij} )} \hfill \\ 0 \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right.$$
(21)

where \(P(d_{ij} )\) is the bit change probability; \(d_{ij}\) is the difference of the j th dimensional value of individual i between the current generation and the last generation.
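The mapping in (20) and (21) can be sketched as follows. The function names are hypothetical, and the two-branch form of the sigmoid is an implementation choice made here to avoid numerical overflow for large negative inputs; it evaluates the same function as (20).

```python
import math
import random


def bit_probability(d):
    """Sigmoid of Eq. (20), written to avoid overflow for large |d|."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def to_binary(d_values, rng=random):
    """Map real-valued entries to bits following Eq. (21)."""
    return [1 if rng.random() < bit_probability(d) else 0 for d in d_values]


rng = random.Random(0)   # seeded only to make the example repeatable
print(to_binary([50.0, -50.0, 0.0], rng))
```

Large positive entries are mapped to 1 and large negative entries to 0 almost surely, while an entry of zero yields either bit with equal probability.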

5.3 Encoding real-time scheduling model into NAA

In each round of RHO, NAA is employed to solve model (8). We first transform the maximization problem described in (8) into a minimization problem by simply using the reciprocal of the objective function. In NAA, each individual is a \(T_{\text{wd}}\)-dimensional vector, representing a potential ON/OFF control scheme over the prediction window. The value of each dimension is binary: 0 (OFF) or 1 (ON).

The schematic of the proposed real-time scheduling model of TCL aggregators is depicted in Fig. 2.

Fig. 2 Schematic of proposed real-time control model

6 Simulation study

6.1 Simulation setup

The simulated TCL aggregator manages 300 buildings, with the rated power of each TCL randomly generated from the range [10 kW, 20 kW]. This setting represents a residential area of moderate scale, and the aggregated energy of the TCLs is significantly larger than the minimum contract size of the Elspot market (0.1 MWh). All buildings are assumed to be equipped with cooling TCLs (such as air conditioners). Based on [32], \(C_{\text{th}}\) of each TCL is randomly generated from 0.015 to 0.065 kWh/°C per square meter and the thermal conductance (1/\(R_{\text{th}}\)) is randomly generated from 0.001 to 0.003 kW/°C per square meter. The room areas of the 300 houses considered are randomly generated in the range [100 m², 500 m²]. \(\Delta t\) is taken as 5 minutes.

It is normally safe to assume \(I_{\text{cl}}\) and \(M\) as fixed values over the dispatch horizon. We set the value of \(M\) to be 1.2 as discussed in Section 3 and then randomly generate values of \(I_{\text{cl}}\) in the range [0.25, 1.65] [21]. \(\tau_{\hbox{min} }^{\text{on}}\) is set to five minutes.

For the indoor environment, it is safe to take the air velocity as 0.1 m/s [25]. Owing to the lack of relevant data, in this paper we do not consider solar radiation and set the mean radiant temperature of each TCL to be equal to the indoor air temperature. We also assume that the indoor relative humidity remains equal to 50%, regarded as a comfortable and healthy value for humans. This condition can be achieved through the use of automatic indoor humidity control devices. Based on the above settings, the number of stochastic variables over the scheduling horizon is reduced to 3: outdoor air temperature, up regulation prices and down regulation prices. The incentive rate \(\alpha\) is set to 300; \(f_{\text{PPD}}^{\text{limit}}\) is set to 20% according to the recommendations of the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) [25].

A 1-day real outdoor air temperature profile recorded by a meteorological observation station and a 1-day real regulation price profile downloaded from the Nordic market website [16] are used in the simulations presented in the following. Gaussian noise is added to simulate the corresponding very-short-term forecast profiles. The real and forecasted outdoor temperature profiles are shown in Fig. 3. The day-ahead contracted load reduction volumes are set as shown in Fig. 4, where the corresponding clearance prices, also obtained from [16], are plotted as well.

Fig. 3 Real & forecasted outdoor air temperature profiles

Fig. 4 Clearance price profile and contracted shed load capacities

The parameter settings of NAA are: population size = 60, maximum generation time = 300, \(N_{S} = 4\), \(C_{s} = 12\), \(\delta = 1\), \(C_{{r,{\text{local}}}} =0. 8\), \(\alpha = 1.2\), \(C_{{r,{\text{global}}}} = 0. 1\).

6.2 Numerical results

First, 300 TCL samples are generated by Monte Carlo sampling and then clustered with the K-means clustering method. Based on the clustering results, 33 TCL groups are formed, each consisting of at most 10 TCLs. Figure 5(c) shows the clustering of the 300 TCLs, where each cluster is identified by a particular color. Figure 5(a) shows the scatter of the equivalent models, and Fig. 5(b) illustrates the equivalent model and all sample points of a representative TCL group.

Fig. 5 TCL group clustering

Under the RHO strategy, as time proceeds, the system states (profit, indoor temperature, etc.) are updated after each round of RHO optimization with the realizations of the stochastic variables. The effect of the RHO strategy is illustrated in Fig. 6, which shows the total load shedding amounts over all the TCL groups for the first 3 rounds of RHO optimization. The TCL schedules clearly differ between the 3 rounds. These differences are mainly produced by the updates of the forecast and real-time information. The results indicate that the RHO strategy responds to updated information by adjusting the scheduling plan.

Fig. 6 Total load shedding of aggregator at first 3 rounds of RHO

We then extend the RHO process to cover the whole scheduling horizon (24 hours). Figure 7 compares the day-ahead load shedding bids with the actual shed loads. There are some deviations between the day-ahead bid volumes and the actual shed load volumes, which are attributed to two main reasons. On one hand, the bid volumes are determined on an hourly basis, but TCL control often needs a higher control frequency [12] (in our case, every 5 minutes), and the TCL group capacities are discrete values. This leads to unavoidable deviations. On the other hand, driven by the real-time regulation price, the actual load shedding amounts do not necessarily have to follow the bids strictly: the deviations can help the TCL aggregator save imbalance costs or make profits from the regulation market. In the optimal solution, the TCLs are ‘pre-cooled’ before the peak upward regulation price period by shedding less load than contracted, and then, during the peak upward regulation price period, more TCLs are switched off to mitigate the upward regulation risks. The numerical work was performed on a DELL workstation with 128 gigabytes of memory and 2 Intel Xeon processors. The average simulation time for one RHO round was approximately 242.7 seconds. Since this is shorter than the 5-minute (300-second) control interval, the computational requirements indicate that the proposed model can be used in practical applications.
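The first source of deviation, discrete group capacities, can be illustrated with a small sketch. The capacities and the bid below are hypothetical numbers, and the brute-force search is used only because the example is tiny; it is not the scheduling method of the paper.

```python
from itertools import combinations

def closest_shed(group_capacities, bid):
    """Brute-force the set of switched-off groups whose total shed load is closest to the bid."""
    best_subset, best_total = (), 0.0
    n = len(group_capacities)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            total = sum(group_capacities[i] for i in subset)
            if abs(total - bid) < abs(best_total - bid):
                best_subset, best_total = subset, total
    return best_subset, best_total

capacities_kw = [3.0, 4.5, 6.0]  # hypothetical TCL group capacities
bid_kw = 5.0                     # hypothetical contracted shed load

subset, total = closest_shed(capacities_kw, bid_kw)
deviation = total - bid_kw       # nonzero: no combination hits the bid exactly
```

With these assumed values, the closest achievable shed load is 4.5 kW, so a deviation of 0.5 kW from the bid remains even under the best ON/OFF combination.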

Fig. 7 Comparison of actual and contracted shed load and market regulation price profiles

As a demonstration, Fig. 8 shows in detail the applied ON/OFF control actions and the corresponding mean indoor temperature trajectory and mean PPD profile of the representative TCL equivalent group model. The optimization results show that, most of the time, the averaged PPD values of the TCL group are kept within an acceptable range (below 20%), while occasionally the PPD values are slightly higher. The variation scale of the PPD profile depends on the incentive rate \(\alpha\). A smaller value of \(\alpha\) gives the aggregator more flexibility to dispatch the TCLs, but also implies larger disturbances for the customers. In real applications, the incentive rate can be determined through negotiation between the customer and the utility. Fig. 8 also shows that during the high outdoor temperature period (say, 12:00 to 13:00), the TCL group is switched between ON and OFF status more frequently, so as to maintain a comfortable indoor environment.
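For readers who want to relate the 20% PPD threshold to the underlying comfort vote, the sketch below evaluates the standard ISO 7730 relation between the predicted mean vote (PMV) and PPD. It assumes that the PPD index used here is the ISO 7730 one; the function name `ppd_from_pmv` is introduced only for this example, and computing PMV itself (from air temperature, humidity, clothing, activity, etc.) is outside the scope of the sketch.

```python
import math

def ppd_from_pmv(pmv):
    """Predicted percentage of dissatisfied (%) as a function of PMV (ISO 7730)."""
    return 100.0 - 95.0 * math.exp(-0.03353 * pmv**4 - 0.2179 * pmv**2)

# PPD is symmetric in PMV and has its minimum of 5% at PMV = 0.
ppd_neutral = ppd_from_pmv(0.0)
ppd_slightly_warm = ppd_from_pmv(0.5)
ppd_warm = ppd_from_pmv(0.9)
```

Under this relation, a PPD below 20% corresponds to a PMV magnitude of somewhat less than 0.9, which gives a sense of how far the indoor conditions may drift from thermal neutrality while remaining in the acceptable range.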

Fig. 8 Dispatch results of representative TCL group

We next illustrate the efficiency of our TCL grouping strategy. A comparison case is designed in which the TCL groups are formed randomly from the whole sample set. The equivalent TCL group model for the comparison case is then generated by averaging the parameters of the samples. Figure 9 shows the indoor temperature and PPD trajectories of a representative TCL group under both cases. In Fig. 9, the red line represents the PPD trajectory of the equivalent TCL model and the gray dotted lines represent the PPD trajectory of each TCL in the group. To further illustrate the comparison, the small plots inset within each of the two graphs in Fig. 9 show the scatter of the PPD values of each TCL in the group at a randomly selected time interval under both cases. With our clustering strategy, the deviations between the individual TCLs and the equivalent TCL group model are much smaller than those of the comparison case, indicating that our method can significantly reduce the impact of TCL heterogeneity.
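The effect of grouping similar TCLs can be illustrated on a single parameter. In the sketch below, the thermal resistance samples are hypothetical, sorting followed by chunking is used as a simple one-dimensional stand-in for K-means, and the deviation is measured on the parameter itself rather than on the PPD trajectories shown in Fig. 9.

```python
import numpy as np

def mean_within_group_deviation(values, groups):
    """Average absolute deviation of each member from its group's mean value."""
    devs = [np.abs(values[g] - values[g].mean()) for g in groups]
    return float(np.concatenate(devs).mean())

rng = np.random.default_rng(1)
resistance = rng.normal(2.0, 0.2, 300)  # hypothetical TCL parameter samples

# Similarity-based grouping: neighbours in sorted order share a group.
similar_groups = np.split(np.argsort(resistance), 30)
# Comparison case: groups drawn at random from the whole sample set.
random_groups = np.split(rng.permutation(300), 30)

dev_similar = mean_within_group_deviation(resistance, similar_groups)
dev_random = mean_within_group_deviation(resistance, random_groups)
```

In this toy setting, `dev_similar` comes out well below `dev_random`, which mirrors the qualitative finding that an averaged equivalent model represents its members better when the group is formed from similar TCLs.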

Fig. 9 Comparison results of representative TCL group

We then compare our RHO approach with the case without RHO. In this comparison case, we perform a single 24-hour optimization subject to the same objective, based entirely on the forecasted data. The profits and costs of the TCL aggregator are then settled using the real data. The overall revenue and costs under the proposed model and the comparison model are shown in Table 1. Clearly, without the RHO strategy, the TCL aggregator incurs more customer reward costs and earns less market profit, owing to the forecast errors of the weather conditions and regulation prices.

Table 1 Real-time scheduling results of TCL aggregator

Lastly, we validate the performance of NAA on the proposed model by comparing it with four widely used heuristic algorithms: genetic algorithm (GA) [33], PSO [34], differential evolution (DE) [35], and artificial bee colony (ABC) algorithm [36]. The GA code is provided by Matlab; the DE and PSO codes are implemented as Matlab scripts; and the ABC code is obtained from [37]. The same population size and maximum generation time are applied to all five algorithms. For a fair comparison, five trials are performed for each algorithm and the averaged result is calculated. For convenience, we multiply the objective function (8) by −1 to transform the maximization problem (8) into a minimization problem. The convergence comparison results are shown in Fig. 10. The convergence curves indicate that, on the proposed model, NAA performs slightly better than DE and significantly better than GA, ABC, and PSO. This trend is generally consistent with the experiments of NAA on standard benchmark functions [28].
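The two bookkeeping steps of this comparison, negating the objective and averaging the convergence curves over five trials, can be sketched as follows. The toy `profit` function and the `random_search` optimizer are placeholders introduced for illustration; they are not objective (8) or any of the five algorithms compared.

```python
import numpy as np

def as_minimization(objective):
    """Wrap a function to be maximized as a cost to be minimized (multiply by -1)."""
    return lambda x: -objective(x)

def random_search(cost, dim, n_gen, pop_size, rng):
    """Placeholder optimizer: returns the best-so-far cost after each generation."""
    best, curve = np.inf, []
    for _ in range(n_gen):
        pop = rng.uniform(-1.0, 1.0, size=(pop_size, dim))
        best = min(best, min(cost(x) for x in pop))
        curve.append(best)
    return np.array(curve)

def averaged_curve(cost, n_trials=5, **kwargs):
    """Average the convergence curves of independent trials with different seeds."""
    curves = [random_search(cost, rng=np.random.default_rng(seed), **kwargs)
              for seed in range(n_trials)]
    return np.mean(curves, axis=0)

def profit(x):
    """Toy objective to maximize (hypothetical); its maximum of 0 is at x = 0."""
    return -float(np.sum(x**2))

curve = averaged_curve(as_minimization(profit), dim=3, n_gen=20, pop_size=10)
```

Using the same population size, generation limit, and number of trials for every algorithm, as done here through shared keyword arguments, keeps the averaged curves comparable.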

Fig. 10 Convergence comparison of five algorithms on the proposed model

7 Conclusion and future works

In this paper, we study the real-time scheduling scheme of the TCL aggregator in the power market, using the Nordic market structure as a case study. The thermal comfort of the residents is accounted for by means of an ISO standard thermal comfort model, and an RHO-based real-time TCL scheduling model is proposed to maximize the profit of the aggregator in the regulation market. A metaheuristic algorithm is applied to solve the model, and experiments are conducted to validate the efficiency of the proposed method. The simulation results show that the aggregated TCLs have the flexibility to help the TCL aggregator mitigate the imbalance cost risk in the real-time regulation market. They also confirm that the proposed TCL clustering strategy and thermal comfort model can keep the users’ thermal comfort well controlled.

This work is based on the assumption that the load shedding bids of the TCL aggregator have already been determined ahead of the real-time scheduling. The authors plan to extend the current work to identify optimal bidding strategies for the TCL aggregator participating in the market.