1 Introduction

Electricity generation systems in Europe and worldwide are undergoing a major transformation towards clean, low-carbon energy, through increased penetration of renewable energy sources and the electrification of a wide range of sectors. A key challenge in this transformation is the intermittency of many renewable energy sources. Solutions applied to improve the energy and environmental efficiency of residential and commercial heating and cooling (H&C) systems include various combinations of heat pumps (HPs) and energy storage; this storage may be thermal (e.g., water tanks, tanks with phase change materials) or electrical (e.g., batteries). On-site electricity generation using building-mounted photovoltaic (PV) panels has also been increasingly combined with innovative heat pump systems (Chua et al. 2010).

Currently, most of these electric H&C systems are controlled by simple rule-based control logic, which does not account for future weather conditions and thermal loads. However, an intelligent control system architecture able to take future external factors into account is needed to exploit the potential for demand response that these residential H&C systems offer. Such control systems should take the intermittent availability of renewable energy sources into account, both directly (e.g., through on-site solar panels) and indirectly (e.g., through adapting to grid incentives, such as dynamic electricity tariffs). Given the significance of the building heating sector, there is also a substantial potential for grid stabilization (Perez et al. 2016). In this context, model predictive control (MPC) approaches have been shown to be well suited to operate these systems efficiently (Yao 2021). Generally, MPC approaches use a simplified model to predict how the system will behave for given control signals and predicted external conditions. Then, a separate optimizer searches the solution space to find the control trajectory that respects all system and control constraints, and for which the simplified model yields the best objective function value (Yao 2021).

Building energy simulation packages like TRNSYS (Klein et al. 2017), Modelica, or EnergyPlus can accurately model a building or group of buildings. However, their potential application in MPC is limited: first, they require a significant effort to accurately set up and validate the model for each individual application; and second, they require long execution times. The execution time is particularly important in MPC, since an optimizer has to evaluate many solution candidates in a complex multidimensional search space in each optimization step. Consequently, MPC utilizing this kind of energy model needs to resort to heuristic or meta-heuristic approaches for optimization, as described in Corbin and Henze (2013). Heuristic approaches, however, do not search the entire search space and consequently are not guaranteed to find the global system optimum.

An approach that can find a global optimum is to formulate the problem as a mixed integer linear program (MILP), for which powerful commercial solvers are available (Sharma et al. 2016). The main drawback of this approach is that it requires the system representation to be linearized, which makes it unable to accurately predict the behaviour of a highly non-linear system, such as a residential H&C system. For example, the influence of ambient and system conditions on the coefficient of performance (COP) of the HP, or its ramping behaviour, cannot be taken into account. Previous research has addressed this shortcoming by using mixed integer non-linear programming (MINLP) approaches and linearizing around selected reference points (Lu 2015). The main shortcomings of both approaches, however, remain the large computational resource requirements due to the high number of independent decision variables, as well as the significant model mismatch due to the linearization of the system model inherent in any MI(N)LP approach. As a result, these approaches are often unsuitable for the residential sector (Drgoňa et al. 2020).

An alternative approach is to formulate the optimization as a sequential decision-making problem and solve it with Dynamic Programming (DP). DP, introduced by Bellman in the 1950s (Bellman 1954), offers several advantages in this context. First, it is guaranteed to find the globally optimal solution within the validity of the simplified model. Second, it can handle complex nonlinearities in the simplified model, as well as any arbitrary objective function with incrementally additive stage costs. Since its first formulation, it has been applied to a wide range of areas involving sequential decision-making problems. In the 1970s, for example, the approach was applied to solve multi-objective optimization problems associated with water reservoirs (Tauxe et al. 1979), and Shao et al. (2012) applied it to optimize weather-dependent shipping routes.

The key challenge in developing an effective DP implementation for MPC is to model the system with adequate accuracy within reasonable modelling and computational requirements. In this context, reduced order models (ROMs) have been adopted both for individual buildings (Cole et al. 2014) and entire district heating sectors (Kim et al. 2019). Using ROMs for a DP application poses a range of challenges and offers opportunities that are unique to this application. The key requirement is that the number of parameters describing the system state to be optimised be kept as small as possible, as each additional parameter adds another dimension to the optimization problem, increasing the computational complexity and effort by multiple orders of magnitude.

As an example, it is standard practice to model a TES water tank with multiple nodes, and then solve the mass and energy equation for each node (Johannes et al. 2005); this is not feasible within the DP optimization, as each of the nodes would need to be an individual state variable, rendering the problem infeasible to solve in real time. In this context, a single-node, concentrated parameters TES model is best suited instead. Notwithstanding the restrictions of describing the system state with limited parameters, physical processes leading to that state can be modelled in great detail, e.g., the approach can handle non-linear TES thermal losses, arbitrary and time-dependent HP operations, including ramps and modulations, and many more system complexities that MI(N)LPs cannot accurately capture.

MPC using the DP approach with ROMs has the potential to offer sufficient model accuracy at low run times while retaining global optimality within the validity of the model. Essentially, robust and fast ROMs can represent the system components in enough detail to allow the optimizer to find the global optimum with reasonable computational resources.

Successful applications of DP-powered MPC leveraging ROMs include industrial (Henze et al. 2008) as well as commercial (Henze 1995) cooling applications. DP has also been applied to optimize the operation of combined heat, power and cooling systems with promising results (Facci et al. 2014). Some researchers have also applied DP to residential H&C systems (Candanedo and Athienitis 2011). These studies, however, usually do not control the HP directly in the optimization and instead optimize the set points for TES and building temperatures over time. This can lead to redundant calculations and may miss the global optimum, as it severely limits the granularity at which a heat pump can operate.

This paper presents a novel DP-based MPC framework using ROMs to represent an H&C system's HPs, thermal energy storage (TES) and hydraulic circuit. To validate the ROMs, a detailed TRNSYS representation of a residential building in Northern Italy is developed, and a model mismatch analysis is performed, both for the individual components and for the system as a whole. Lastly, the paper compares the potential of the framework for cost saving and load shifting with conventional rule-based controllers.

Section 2 elaborates on how the DP-powered MPC is implemented in Python. It outlines how the individual ROMs are designed, and how they are combined into a single simplified system. It also shows how the TRNSYS system is designed, and how the simplified system is built to mirror the behavior of the TRNSYS ground truth. Section 3 analyzes the mismatch of both the individual ROMs and the entire simplified system compared to the full TRNSYS simulation, and briefly outlines an exemplary outcome of the DP optimization based on the models developed. Lastly, Section 4 summarizes the conclusions drawn from this work.

2 Methods

This section outlines the methods used in this work. Section 2.1 gives a general introduction to the receding horizon MPC approach. Section 2.2 explains the model development for the individual components of the H&C system. Section 2.3 then explains how the receding horizon MPC approach, in combination with the individual component models, is applied to the system to be optimized.

2.1 MPC using DP framework

The DP optimization is employed within a receding horizon MPC framework (Kwon and Han 2005): as the initial state of the system is known, but the ideal state at the end of the prediction horizon is not, it was decided to use a forward implementation of the DP approach. The advantage of this approach is that model dynamics are generally easier to compute forwards rather than backwards (Verdu and Poor 1984). At each time the optimization algorithm is running, all outdoor conditions and loads within a fixed prediction horizon are considered known. An optimal solution is then found for the entire prediction horizon.
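The receding horizon loop described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function name `run_receding_horizon` and the callables `optimize` and `plant_step` are hypothetical placeholders for the DP optimizer and the ground truth simulation, and the toy controller at the end is purely illustrative.

```python
def run_receding_horizon(initial_state, n_runs, horizon, optimize, plant_step):
    """Generic receding horizon loop (illustrative sketch).

    optimize(state, k, horizon) -> list of control actions for the whole horizon
    plant_step(state, action)   -> state after applying one action to the plant
    """
    state = initial_state
    applied = []
    for k in range(n_runs):
        # Plan over the full prediction horizon from the known current state.
        plan = optimize(state, k, horizon)
        # Apply only the first action; the rest of the plan is discarded
        # and recomputed at the next run with an updated forecast.
        state = plant_step(state, plan[0])
        applied.append(plan[0])
    return state, applied


# Toy example (assumption): charge by one unit per step until level 3 is reached.
final_state, applied = run_receding_horizon(
    initial_state=0,
    n_runs=5,
    horizon=96,
    optimize=lambda s, k, h: [1 if s < 3 else 0] * h,
    plant_step=lambda s, u: s + u,
)
```

In the toy run, the plan is recomputed at every step, so the controller stops charging as soon as the observed state reaches the target.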

In DP, at each timestep within the optimization horizon, the system is characterized by state variable(s) only, i.e., the state of charge of a thermal storage tank in this application. For each of the optimization timesteps, the DP searches the cheapest trajectory to each of the state variable's discretization levels, based on the previous step's results, and considering the external and internal factors and restrictions for this timestep. Since the problem has an optimal substructure, this strategy ensures that the globally optimal trajectory is found.
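A forward DP pass over a discretized state of charge can be sketched as below. This is a simplified illustration under stated assumptions: the names `forward_dp`, `backtrack` and `stage_cost` are hypothetical, the state is a single integer level index, and the toy stage cost (hold for free, or charge one level at the current price) stands in for the ROM-based cost calculation.

```python
import math


def forward_dp(n_steps, n_levels, start_level, stage_cost):
    """Cheapest cost to reach every level at every step, computed forwards.

    stage_cost(t, i, j) is the cost of moving from level i at step t to
    level j at step t + 1, or math.inf if that transition is infeasible.
    """
    cost = [[math.inf] * n_levels for _ in range(n_steps + 1)]
    prev = [[None] * n_levels for _ in range(n_steps + 1)]
    cost[0][start_level] = 0.0  # the initial state is known
    for t in range(n_steps):
        for i in range(n_levels):
            if math.isinf(cost[t][i]):
                continue  # level i is not reachable at step t
            for j in range(n_levels):
                c = cost[t][i] + stage_cost(t, i, j)
                if c < cost[t + 1][j]:
                    cost[t + 1][j] = c
                    prev[t + 1][j] = i
    return cost, prev


def backtrack(prev, end_level):
    """Recover the level trajectory that ends at end_level."""
    path = [end_level]
    for t in range(len(prev) - 1, 0, -1):
        path.append(prev[t][path[-1]])
    return path[::-1]


# Toy problem (assumption): holding is free, charging one level costs the price.
prices = [3.0, 1.0, 2.0]


def toy_cost(t, i, j):
    if j == i:
        return 0.0
    if j == i + 1:
        return prices[t]
    return math.inf


cost, prev = forward_dp(n_steps=3, n_levels=4, start_level=0, stage_cost=toy_cost)
```

Because each step only keeps the cheapest way to reach each level, the work grows with the number of steps times the square of the number of levels, rather than with the number of possible control sequences.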

For the optimisation to reach a meaningful result, it is crucial to use simplified models that are fast enough to allow the optimizer to run in real time with limited computational resources, while keeping the model mismatch at a manageable level. The framework detailed here employs state space models of the system's TES and HP with their respective unique operating characteristics, such as effects due to the HP inverter ramping up and down and the thermal losses of the storage.

The models need to accurately calculate how much energy is stored in the TES. If the state of charge is overestimated, the storage might run out of energy at the wrong point in time, as the optimiser allowed the TES to discharge for too long. Conversely, if the state of charge is underestimated, the HP might spend excessive hours trying to charge a TES that cannot accept any more energy. A pure state space representation of the components is not sufficient to accurately predict the state of charge of the TES, as errors due to time discretization lead to unacceptable rounding errors across the state of charge discretization. While in the TRNSYS 18 environment (Klein et al. 2017) this would be solved by simply reducing the timestep, this is not feasible in the ROMs due to the computational constraints.

In our approach, we couple conventional state space models with a minimal set of operational features, which significantly reduces computational complexity while maintaining a manageable model mismatch. Hereafter, we show how, for the system concerned, a time discretisation of 30 min and a state of charge discretisation of 50 steps are sufficient to keep the uncertainty of the model at acceptable levels for the optimizer to perform.

2.2 Models development

This Section describes the system modelling, namely the HP and TES models. The MPC framework is then applied in Sect. 2.3.

2.2.1 Heat pump model

The HP ground truth is modelled with a detailed grey box model in the TRNSYS environment (Type 202). This model is a sophisticated representation of a commercial, air-to-water HP functionality, including compressor speed ramps and modulations, and delivers accurate results.

The ROM of the same HP, suitable for DP optimisation, has been developed by adding complexity incrementally while comparing the simulation performance. First, the HP was modelled as a simple performance map, as a function of outdoor and supply temperatures. This representation can deliver fairly accurate predictions of the HP Coefficient of Performance (COP) and nominal thermal capacity. However, failing to represent the HP compressor speed changes can result in a significant mismatch between predicted and actual heating capacity.

The rate of change of the HP compressor speed is inherently limited in both directions due to the inertia of the various mechanical components in an HP; manufacturers usually limit the maximum ramp speed in the firmware to protect the critical components (primarily the compressor) from high mechanical stress and premature wear and tear (https://www.innovaenergie.com/site/assets/files/2792/n420674a_bollettino_tecnico_stone_m1_rev_02_it_link.pdf). After being switched on, the compressor must run for a specific duration at a minimum speed \({p}_{re{l}_{min}}\) before ramping up. Similarly, when ramping down, it cannot simply be switched off, but must keep running at \({p}_{re{l}_{min}}\) for some time before shutting down. Lastly, once the HP is switched off, it must remain off for a minimum amount of time. These constraints limit how fast the HP is able to respond to control signals in reality. Hence, a state variable (on/off, compressor speed) is implemented to track the HP status and is updated at each simulation timestep. This, in turn, defines the set of allowable states for the next simulation timestep. This approach provides a simple model for the HP with less model mismatch than the simple performance map HP model. So, for a given relative compressor speed \({{p}_{rel}}_{{t}_{0}}\) at time \({t}_{0}\) with a ramp speed \({c}_{r}\) and a control signal of 1, the relative compressor speed \({p}_{re{l}_{{t}_{1}}}\) at time \({t}_{1}\) is calculated as

$${p}_{re{l}_{{t}_{1}}}=\mathrm{min}\left({p}_{re{l}_{{t}_{0}}}+{c}_{r}, 1\right)$$
(1)

if the compressor is past the initial warmup phase.

Similarly, for a control signal of 0, \({p}_{re{l}_{{t}_{1}}}\) is calculated as

$${p}_{re{l}_{{t}_{1}}}=\mathrm{max}\left({p}_{re{l}_{{t}_{0}}}-{c}_{r}, {p}_{re{l}_{min}}\right)$$
(2)

until the compressor has spent sufficient time at \({p}_{re{l}_{min}}\) to be switched off.

To track a given load profile, and move toward a corresponding target HP compressor relative speed \({p}_{re{l}_{target}}\), the equation for \({p}_{re{l}_{{t}_{0}}}<{p}_{re{l}_{target}}\) is

$${p}_{re{l}_{{t}_{1}}}=\mathrm{min}\left({p}_{re{l}_{{t}_{0}}}+{c}_{r}, {p}_{re{l}_{target}}, 1\right).$$
(3)

Analogously, for \({p}_{re{l}_{{t}_{0}}}> {p}_{re{l}_{target}}\), \({p}_{re{l}_{{t}_{1}}}\) is calculated as

$${p}_{re{l}_{{t}_{1}}}=\mathrm{max}\left({p}_{re{l}_{{t}_{0}}}-{c}_{r},{p}_{re{l}_{target}}, {p}_{re{l}_{min}}\right).$$
(4)
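Equations (1) to (4) translate directly into two small update functions. The sketch below is illustrative only: the function names are hypothetical, the numerical values of \({c}_{r}\) and \({p}_{re{l}_{min}}\) are arbitrary, and the warm-up phase, the minimum run time at \({p}_{re{l}_{min}}\) and the minimum off time are deliberately left out.

```python
def ramp_on_off(p_rel_t0, c_r, p_rel_min, control):
    """Eq. (1) for a control signal of 1, Eq. (2) for a control signal of 0.

    Assumes the compressor is past its warm-up phase and has not yet been
    released for shutdown (those timers are not modelled in this sketch).
    """
    if control == 1:
        return min(p_rel_t0 + c_r, 1.0)
    return max(p_rel_t0 - c_r, p_rel_min)


def ramp_to_target(p_rel_t0, p_rel_target, c_r, p_rel_min):
    """Eq. (3) when below the target, Eq. (4) when above it."""
    if p_rel_t0 < p_rel_target:
        return min(p_rel_t0 + c_r, p_rel_target, 1.0)
    if p_rel_t0 > p_rel_target:
        return max(p_rel_t0 - c_r, p_rel_target, p_rel_min)
    return p_rel_t0  # already at the target speed


# Illustrative values (assumptions): ramp of 0.25 per step, minimum speed 0.25.
up = ramp_on_off(0.5, c_r=0.25, p_rel_min=0.25, control=1)
capped = ramp_on_off(0.875, c_r=0.25, p_rel_min=0.25, control=1)
down = ramp_on_off(0.5, c_r=0.25, p_rel_min=0.25, control=0)
track_up = ramp_to_target(0.5, 0.625, c_r=0.25, p_rel_min=0.25)
track_down = ramp_to_target(0.5, 0.375, c_r=0.25, p_rel_min=0.25)
```

In the tracking case the target itself acts as an additional bound, so the compressor speed never overshoots the requested value within a single timestep.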

In either operation mode, the performance map defines the nominal thermal capacity and the COP as

$${\dot{Q}}_{nom}=f\left({T}_{amb}, {T}_{circuit}\right)$$
(5)

$$CO{P}_{hp}=f\left({T}_{amb}, {T}_{circuit}, {p}_{re{l}_{{t}_{1}}}\right)$$
(6)

With \({p}_{re{l}_{{t}_{1}}}\) as calculated above, the thermal power of the heat pump is \({\dot{Q}}_{loa{d}_{hp}}= {\dot{Q}}_{nom}{p}_{re{l}_{{t}_{1}}}\), and the required electrical power \({P}_{e{l}_{hp}}\) is calculated as

$${P}_{e{l}_{hp}}=\frac{{\dot{Q}}_{loa{d}_{hp}}}{CO{P}_{hp}}$$
(7)
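As a short worked example of the thermal and electrical power calculation, the sketch below takes the nominal capacity and the COP as given inputs, since the performance map itself is not reproduced here. The function name `hp_power` and the COP value of 3.5 are assumptions chosen for illustration; only the 35 kW nominal capacity corresponds to the P-HP described in Sect. 2.3.

```python
def hp_power(q_nom_kw, cop, p_rel):
    """Thermal and electrical power of the HP for a given relative speed.

    q_nom_kw and cop would normally come from the performance map,
    Eqs. (5) and (6); here they are passed in directly.
    """
    q_load_kw = q_nom_kw * p_rel   # thermal power scales with relative speed
    p_el_kw = q_load_kw / cop      # Eq. (7)
    return q_load_kw, p_el_kw


# 35 kW nominal capacity, half speed, illustrative COP of 3.5 (assumption).
q_load, p_el = hp_power(q_nom_kw=35.0, cop=3.5, p_rel=0.5)
```

For these inputs the HP delivers 17.5 kW of heat for 5 kW of electrical power.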

2.2.2 Thermal energy storage model

For thermal storage purposes, a fully mixed water tank is considered and modelled as a first order dynamic system. The thermal storage is modelled with a state of charge (representative of the water temperature) equation. In addition, to account for the thermal losses through the tank walls, a constant heat-loss coefficient \(U{A}_{tank}\) of 3 W/K was chosen, which corresponds to a moderately efficient hot water storage tank according to standard hot water tank classification (Commission Delegated Regulation (EU) 2013). The schematic of the tank is shown in Fig. 1.

Fig. 1 Schematic of the ideally mixed thermal energy storage tank

This implies that for a water influx of \(\dot{m}\) at a temperature of \({T}_{in}\), a storage temperature of \({T}_{tank}\), and a mechanical room temperature of \({T}_{room}\), the continuous time energy balance equation for the tank is:

$$\frac{dQ}{dt} = \dot{m}{c}_{p,water}\left({T}_{in}-{T}_{tank}\right)-\left({T}_{tank}-{T}_{room}\right)U{A}_{tank}$$
(8)

And the continuous time equation for temperature dynamics of the tank is:

$$\frac{d{T}_{tank}}{dt}{V}_{tank}{\rho }_{water}{c}_{v,water}= \frac{dQ}{dt}$$
(9)

Since stratification is not modelled, water always leaves the tank at temperature equal to \({T}_{tank}\).

For the chosen simulation configuration and timestep of 6 min, the changes in tank temperature per timestep are small (< 0.2 K). The tank temperature can hence be assumed to be constant along the timestep itself, which means that for the computational implementation, the differential can be linearized, greatly simplifying the calculation of \({Q}_{tan{k}_{{t}_{1}}}\) and \({T}_{tan{k}_{{t}_{1}}}\), based on a timestep length of \({t}_{step}\) and known initial conditions \({Q}_{tan{k}_{{t}_{0}}}\) and \({T}_{tan{k}_{{t}_{0}}}\) for the time \({t}_{0}\). Therefore, the discretized equations are:

$${Q}_{tan{k}_{{t}_{1}}}={Q}_{tan{k}_{{t}_{0}}}+\left(\dot{m}{c}_{p,water}\left({T}_{i{n}_{{t}_{0}}}-{T}_{tan{k}_{{t}_{0}}}\right)-\left({T}_{tan{k}_{{t}_{0}}}-{T}_{roo{m}_{{t}_{0}}}\right)U{A}_{tank}\right){t}_{step}$$
(10)

and

$${T}_{tan{k}_{{t}_{1}}}={T}_{tan{k}_{{t}_{0}}}+ \frac{{Q}_{tan{k}_{{t}_{1}}}-{Q}_{tan{k}_{{t}_{0}}}}{{V}_{tank}{\rho }_{water}{c}_{v,water}}$$
(11)
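The discretized update of Eqs. (10) and (11) amounts to one explicit Euler step and can be sketched as follows. The function name `tes_step` is hypothetical, and the tank volume, water density and heat capacity are assumed values for illustration (a single heat capacity is used for both \({c}_{p,water}\) and \({c}_{v,water}\), which are nearly identical for liquid water); only \(U{A}_{tank}\) = 3 W/K and the 6 min timestep are taken from the text.

```python
def tes_step(q_tank, t_tank, m_dot, t_in, t_room, t_step,
             ua_tank=3.0,      # W/K, heat-loss coefficient from the text
             v_tank=1.0,       # m^3, assumed tank volume
             rho_water=1000.0, # kg/m^3, assumed
             c_water=4186.0):  # J/(kg K), assumed for both c_p and c_v
    """One explicit step of the fully mixed tank model, Eqs. (10) and (11)."""
    # Net heat flow into the tank: inflow enthalpy minus wall losses, in W.
    dq_dt = m_dot * c_water * (t_in - t_tank) - (t_tank - t_room) * ua_tank
    q_next = q_tank + dq_dt * t_step
    t_next = t_tank + (q_next - q_tank) / (v_tank * rho_water * c_water)
    return q_next, t_next


# Standby losses only: no flow, 40 degC tank in a 20 degC room, 6 min step.
q1, t1 = tes_step(q_tank=0.0, t_tank=40.0, m_dot=0.0, t_in=40.0,
                  t_room=20.0, t_step=360.0)
```

With no flow through the tank, the step removes 60 W for 360 s, i.e. 21.6 kJ, which for the assumed 1 m³ volume lowers the temperature by about 0.005 K.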

2.3 System application

The MPC framework and system modelling are applied to a multi-family house, modelled in the TRNSYS environment. The three-story building has 10 residential apartments and is assumed to be located in the north of Italy (Weather Archive Verzuolo 2023). The usual ambient temperature range in this area during January is from cold nights down to − 5 °C up to + 10 °C on sunny days, though a daily range from 0 to 5 °C is most common. During shoulder season, a typical daily temperature swing will be from 7 to 13 °C.

The water loop system used in the demo site building (https://www.happening-project.eu/wp-content/uploads/2022/06/15_happening_wp4_d4-5-system_design_verzuolo-italian-demo_20220325.pdf) is illustrated in Fig. 2. It employs an HP cascade where a primary HP (P-HP) system heats/cools the water in the circuit, maintaining the temperature close to the ambient temperature, and 21 dedicated micro HPs (m-HPs) are used to deliver energy from this ambient temperature loop into the individual dwellings' rooms.

Fig. 2 Hydraulic flowchart of the system as modelled by both TRNSYS and the ROMs. The three-way valve with T_set defines the fraction of the load mass flow going through the bypass, with the HP water pump defining how much of the remaining mass flow goes through the P-HP and the TES respectively

The cornerstone of this H&C setup is the direct integration of a TES into the ambient temperature loop with minimal additional energetic costs. The TES system should be charged when boundary conditions are favourable for the operation of the P-HP (cold mornings in cooling mode in summer and warm midday hours in heating mode in shoulder seasons and winter).

A three-way valve, using a temperature set point, controls the temperature distributed to the terminals from the P-HP and the TES.

The P-HP in the detailed model is designed to be similar to a typical domestic HP with a propane refrigerant (https://www.innovaenergie.com/site/assets/files/2792/n420674a_bollettino_tecnico_stone_m1_rev_02_it_link.pdf), with a nominal thermal capacity of 35 kW. All m-HPs, each with a thermal capacity of 1.5 kW, are individually modelled in the TRNSYS deck, and their evaporator inlet temperature range is assumed to be from 15 to 35 °C for both heating and cooling. They are used to determine the thermal load as a function of time; the resulting timeseries is used as the forecasted load input to the simplified system. For said simplified system, all m-HP loads were lumped together, and one aggregate COP is calculated for the entire heating/cooling load.

2.3.1 Model application

The P-HP of the application building was modelled as detailed in Sect. 2.2.1. The m-HPs were treated in a similar fashion as the P-HP, though without modelling the ramping behaviours. As the system scope assumes that the heating load is a fixed pattern (i.e., not calculated dynamically along with the MPC operation), the m-HPs were assumed not to exhibit a ramping behaviour, or a COP based on the speed of the compressor. Hence, the only variables defining the m-HP operation are the supply temperature and the room temperature, with higher supply temperatures and lower room temperatures leading to more favourable COPs during heating operations, and vice versa for cooling operations.

2.3.2 Optimization parameters

In this application, the optimizer was set to run every 30 min, which corresponds to 5 ground truth simulation timesteps of 6 min each. This was considered a good trade-off between minimizing the model mismatch and the required computational time. The prediction horizon was set to 96 optimization timesteps (48 h) with a discretization grid of 100 steps for the DP optimization. This enabled the TES to be used for both intra-day storage and inter-day storage, while keeping the computational requirements low. The optimization parameters are summarized in Table 1.

Table 1 Summary of optimization parameters for the simulation run

3 Results and discussion

The first part of this Section shows how close the individual simplified control models come to the detailed TRNSYS simulation models. The second part shows how well the individual components interact to form an entire modelled system. The third part then explores the potential for savings.

The MPC was run in a 1-year simulation, starting on the first day of January. As showing graphs for the entire simulation year would lead to overcrowding, specific representative subsets have been chosen for the plots: the middle of January for a strong heating load (see Figs. 3, 5, 6, 7, and 8), the middle of April for a typical shoulder season load (Fig. 9), and the middle of August for a typical cooling season load (Fig. 10). For showcasing the performance of the P-HP ROM, the data was taken from the entire heating season (Fig. 4). For the potential savings, November was chosen as it is a month with a representatively moderate heating load (Fig. 11).

Fig. 3 Mismatch of the temperatures in the system between the ROMs and the TRNSYS ground truth, with the ambient temperature shown for reference in the top plot. In the middle plot, the green and blue lines represent the ROM predicted temperature lift, the grey and black lines show the ground truth temperature lift. (Color figure online)

Fig. 4 Scatter plot of the predicted against the observed Coefficient of Performance for the P-HP for the heating season. For reference, the line representing a perfect forecast is shown in black

With a significant amount of renewable energy being integrated in the grid, the hours between 9 AM and 3 PM are considered favourable hours, during which a DSO could be motivated to incentivise the use of electricity to avoid local curtailment issues. The optimizer is thereby incentivized to shift electrical power consumption from the unfavourable to these favourable hours. The pricing strategy is summarized in Table 2. It is important to note that in this work the optimizer reacts to a relative electricity price (defined as the ratio of the price in favourable hours to the price in unfavourable hours); the approach can be used with any arbitrary price signal, and an arbitrary number of different price categories. All figures in this section were taken from a year-round simulation run with a price incentive of 0.5. This means that during the hours indicated by the grey bars in the charts (9 AM to 3 PM), the electricity price is half of what it is outside of said grey bars. The x-axis is the number of days since the first of January.
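The two-level relative price signal can be expressed as a simple function of the time axis used in the figures. This is a minimal sketch: the function name `relative_price` is hypothetical, and treating the favourable window as including 9 AM and excluding 3 PM is an assumption, since the boundary handling is not specified here.

```python
def relative_price(days_since_jan1, incentive=0.5, start_hour=9.0, end_hour=15.0):
    """Relative electricity price for a time given in days since 1 January.

    Returns `incentive` during the favourable window and 1.0 otherwise.
    The half-open window [start_hour, end_hour) is an assumption.
    """
    hour_of_day = (days_since_jan1 * 24.0) % 24.0
    if start_hour <= hour_of_day < end_hour:
        return incentive
    return 1.0


noon_day_21 = relative_price(21.5)        # 12:00 on day 21
early_morning = relative_price(21.25)     # 06:00 on day 21
window_start = relative_price(9.0 / 24.0) # 09:00 on day 0
```

Multiplying the electrical energy of each optimization step by this factor yields a stage cost in relative units, which is all the optimizer needs to rank candidate trajectories.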

Table 2 Summary of the pricing structure

3.1 Model validation and mismatch analysis

Figure 3 shows the ROM prediction and ground truth temperatures achieved, both by P-HP and m-HPs. For the P-HP, the middle plot shows the ROM temperature lift in the green and blue lines, which is compared to the ground truth temperature lift in grey and black lines. For the circuit temperature that is fed to the m-HPs, the bottom plot shows the ROM prediction in blue against the black ground truth. In general, there is a good agreement between prediction and ground truth. There is a tendency of the system to underestimate the HP temperature, but it only appears in certain operating conditions.

This mismatch is one of the reasons contributing to differences in COP prediction, which will be outlined below. One aspect that is visible in predicting the supply temperature to the heating circuit is that there are significant oscillations in the prediction that do not happen in reality. One reason for this is that the model does not account for thermal inertia of the heating circuit; any change in operational parameters in the model instantly affects the output temperature. This could be improved by including this thermal inertia in the simulation. Nonetheless, the mismatch implications are not expected to be significant when considering an entire day, as these oscillations mostly cancel out, and the general efficiency/operation is modelled with reasonable accuracy.

The next step is to evaluate the quality of the COP estimations. Figure 4 shows the ROM prediction of the COP of the primary HP against the TRNSYS model. In general, the prediction is in good agreement with the observed values, though the ROM consistently overestimates the COP relative to the ground truth. The root mean squared error of the prediction was calculated to be 0.46 [−], with a coefficient of variation of the root mean squared error (CVRMSE) of 9.8% over the course of the heating season.
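For reference, the two error metrics can be computed as sketched below. The function names and the sample values are illustrative, and the CVRMSE is assumed here to be the RMSE normalized by the mean of the observed values, which is the common definition; under that assumption, the reported figures would correspond to a mean observed COP of roughly 0.46 / 0.098 ≈ 4.7.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equally long sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def cvrmse(predicted, observed):
    """RMSE normalized by the mean observed value (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    return rmse(predicted, observed) / mean_observed


# Illustrative COP values only, not data from the simulation.
cop_rom = [4.0, 5.0, 6.0]
cop_trnsys = [3.0, 5.0, 7.0]
error = rmse(cop_rom, cop_trnsys)
relative_error = cvrmse(cop_rom, cop_trnsys)
```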

Similar information is also shown over time in Fig. 5, with a focus on how the mismatch develops over time. The top plot is again ambient temperature. The middle plot shows the ROM predicted COP of the P-HP in light green against the ground truth in dark green. The bottom plot shows the ROM predicted COP of the m-HP in blue against the ground truth in dark green.

Fig. 5 Model mismatch of the COPs in the system between the ROMs and the TRNSYS ground truth, with the state of charge of the TES shown for reference in the top plot. In the middle plot, the predicted versus ground truth COP is shown. The same applies to the bottom plot with respect to the m-HPs

Significant model mismatch is visible in the P-HP simulation; the model again consistently overestimates the COP to be achieved. Further research is required to understand what causes this mismatch, since the model uses the same performance map as the TRNSYS simulation, and it is generally able to predict HP operation temperatures. Nonetheless, it is important to highlight that even in the presence of significant mismatch, a systematic consistency is observed over different operating conditions, enabling two conclusions. First, a simple correction of the COP prediction is likely to increase accuracy significantly. Second, as this mismatch is so consistent, it is unlikely to systematically favour one strategy over the other; one way or another, the P-HP has to provide all energy that is used in the system, so the optimizer only decides when this energy is best added to the system. More research is still required on the overall effect, which can eventually improve the performance of the MPC.

When looking at the modelled COP of the m-HPs, the same oscillations seen in their operating temperatures are also observed in the COP prediction, since the COP of any HP depends directly on the operating temperatures. The decision not to account for thermal inertia in the heating circuit, as discussed for the temperature chart above, is also showing its effect here. However, as with the thermal effects, when looking at an entire heating process, these effects tend to cancel out, so the predictions of the overall efficiency for one heating period are relatively accurate.

Having given an overview of the model mismatch of the individual ROMs, the next section elaborates on the model mismatch of the entire simplified system.

3.2 Summary of results

Figure 6 shows the ROM predicted and ground truth storage state in the upper panel, with the ambient temperature in the lower panel. The black line represents the ground truth, as calculated by TRNSYS. For simplification, in providing an overview of the MPC operation, thermal losses through the tank walls are not considered here. However, thermal tank losses can be accurately modelled within the DP framework, as they can be incorporated into the DP discretization grid.

Fig. 6 MPC prediction of the state of charge of the TES over time, with the ambient temperature shown for reference in the bottom plot. The grey shades indicate the hours with lower electricity prices, and the rainbow lines indicate the prediction of each individual MPC run, where red indicates prediction made for the immediate future (< 3 h) and blue indicates prediction made further in the future (> 30 h). (Color figure online)

Each one of the rainbow lines in Fig. 6 corresponds to the TES state of charge prediction of one DP calculation, assuming no discrepancy between the forecasted and observed temperature and irradiation for the prediction horizon: each 48 h prediction starts in red for the first couple of hours, and then gradually turns yellow and green as the prediction moves further into the future. As expected, the predictions diverge further into the future: as new information becomes available, the optimizer chooses a different path. The results show that, while there is some mismatch for the individual timesteps, the curve's general shape is relatively consistent between the different runs.

Two takeaway points from this analysis should be highlighted. First, the ROM-based system predictions are reasonably close to the TRNSYS simulation. Second, the optimizer manages to alter the state of charge of the TES (the black line) in response to outdoor conditions. For example, one of the highest states of charge happens just at the end of the favourable period on day 21. This is because the optimizer sees that the operating conditions on day 22 are less favourable (as the ambient temperature is lower) than on day 21, so it charges the storage ahead of time. On day 24, the optimizer is aware that the temperatures the next day will be much warmer, so it only charges a minimum to cover the load until the beginning of the next favourable period.

One challenge in this representation of the storage state prediction is the fact that the optimizer does not just deliver one solution. For every timestep in the optimization horizon, every single feasible discretization step also represents a potential solution. So, assuming that at the end of the prediction horizon most of the 50 used discretization steps will be feasible, every optimization run leads possibly to 50 feasible solution candidates. The choice of this candidate significantly impacts the later shape of the TES state of charge graph. Potential strategies could be to choose the solution candidate with the least overall cost, or the solution candidate that finishes at exactly half the charge. As these solution candidates may or may not share the same initial control sequence, this choice potentially also influences the optimizer performance.

This paper focuses on the ability of the simplified system models to accurately reproduce a more complex TRNSYS system simulation; detailing the DP decision-making process is beyond its scope. For this analysis, the solution candidate chosen is the one whose final state is as close as possible to the initial state of the storage medium. Hence, the coloured lines in Fig. 6 were simplified for the remainder of this analysis to only show the part of the solution that is fed back to TRNSYS. As is inherent in MPC, only the first set of operational parameters is passed back to the TRNSYS simulation; the rest is discarded. Figure 7 shows the model mismatch analysis between the DP prediction and the TRNSYS model after the results of the DP have been fed back to the TRNSYS simulation.
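The candidate selection and receding-horizon step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy cost function, the five storage levels and the four-step horizon are all hypothetical, chosen only to show how a forward DP over a discretized state of charge produces one candidate per feasible final level, how the candidate closest to the initial state is picked, and how only the first control is kept.

```python
import math

def forward_dp(n_steps, levels, start, step_cost, max_jump):
    """Forward DP over a discretized state of charge (hypothetical sketch).

    cost[t][s] is the cheapest cost of reaching level s after t steps;
    prev[t][s] is the level at step t-1 on that cheapest path.
    """
    cost = [[math.inf] * levels for _ in range(n_steps + 1)]
    prev = [[None] * levels for _ in range(n_steps + 1)]
    cost[0][start] = 0.0
    for t in range(n_steps):
        for s in range(levels):
            if math.isinf(cost[t][s]):
                continue  # level s is not reachable at step t
            lo, hi = max(0, s - max_jump), min(levels - 1, s + max_jump)
            for s_next in range(lo, hi + 1):
                c = cost[t][s] + step_cost(t, s, s_next)
                if c < cost[t + 1][s_next]:
                    cost[t + 1][s_next] = c
                    prev[t + 1][s_next] = s
    return cost, prev

def pick_terminal(cost, start):
    """Among feasible final levels, pick the one closest to the initial level
    (ties broken by lower cost)."""
    feasible = [s for s, c in enumerate(cost[-1]) if not math.isinf(c)]
    return min(feasible, key=lambda s: (abs(s - start), cost[-1][s]))

def backtrack(prev, terminal):
    """Recover the level trajectory that ends in the chosen terminal level."""
    path = [terminal]
    for t in range(len(prev) - 1, 0, -1):
        path.append(prev[t][path[-1]])
    return path[::-1]

# Toy problem (all numbers are illustrative assumptions).
PRICES = [1.0, 1.0, 3.0, 3.0]  # first two steps are "favourable"
LOAD = 1                       # levels drained by the load in every step

def toy_cost(t, s, s_next):
    charge = s_next - s + LOAD  # levels the heat pump must supply
    if charge < 0:
        return math.inf         # the heat pump cannot remove energy here
    return PRICES[t] * charge

cost, prev = forward_dp(n_steps=4, levels=5, start=2,
                        step_cost=toy_cost, max_jump=1)
terminal = pick_terminal(cost, start=2)
path = backtrack(prev, terminal)
first_control = path[1] - path[0] + LOAD  # only this step would be applied
print(path, cost[-1][terminal], first_control)
```

In this toy case the storage is charged during the two cheap steps and discharged afterwards, ending at its initial level. In an MPC loop, only `first_control` would be passed to the plant before the optimization is repeated from the newly observed state.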

Fig. 7

Prediction of the state of charge of the TES for the entire prediction horizon (rainbow lines in the top plot) and for only the step that is fed back to the TRNSYS system (red crosses in the bottom plot). In both the top and bottom plots, the black line shows the ground truth as calculated by TRNSYS. (Color figure online)

This plot shows that the ROMs used in the simplified system do incur a certain degree of error when compared to the full TRNSYS simulation. It also shows that the prediction error generally increases over time. The yellow (middle) part of the prediction lines tends to be further away from the black ground truth than the red part (near future), and the blue (far future) part is even further away than the yellow one. One reason for this effect is the accumulation of model mismatch, but the major reason is that as time passes, the receding horizon optimizer sees further into the future, leading to different target trajectories. For illustration, the bottom plot of Fig. 7 shows the prediction error for the very first step of the MPC trajectory. As this is the step that is fed back to the TRNSYS system and implemented completely, the only reason for the gap here is the actual model mismatch. The discrepancies between the simplified system and the detailed system model are likely associated with the fact that the H&C system in the simplified model is assumed not to have thermal inertia, apart from the actual thermal storage. This means that, in the simplified model, changes in operational parameters have an immediate effect on the state of charge of the storage. Conversely, the TRNSYS model accounts for the thermal inertia in the water circuit, hence the changes in storage temperature are much smoother.

Figure 8 illustrates the mismatch of the TES state to the state prediction of the P-HP as it runs to meet the thermal load and charges the storage. The difference between the P-HP thermal power at the condenser and the load is the rate of change of the storage tank charge over a specific timestep. Hence, it is crucial to understand how well the prediction matches the actual HP operation, as this is the only way to actively alter the state of the storage. In general, the system uses the favourable hours (grey) to run the P-HP almost at nominal capacity, thereby charging the storage with the thermal energy not required for the supply of the m-HPs. Outside of the favourable hours, the optimizer runs the P-HP as little as possible to preserve enough charge in the TES to make it to the next favourable period. While there are noticeable differences between observed and predicted thermal P-HP power on the individual timestep level, when taking an entire P-HP operation cycle into consideration, the two are reasonably aligned.
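The energy balance stated above can be written compactly as follows. This is a sketch for the heating case that neglects storage losses, as the sentence above does. The symbols are introduced here for illustration and are not taken from the paper: $E_{\mathrm{TES}}$ is the thermal energy stored in the tank, $\dot{Q}_{\mathrm{cond}}$ the P-HP thermal power at the condenser, $\dot{Q}_{\mathrm{load}}$ the thermal load, and $\Delta t$ the timestep length.

```latex
\frac{\mathrm{d}E_{\mathrm{TES}}}{\mathrm{d}t}
  = \dot{Q}_{\mathrm{cond}} - \dot{Q}_{\mathrm{load}},
\qquad
E_{\mathrm{TES},k+1}
  = E_{\mathrm{TES},k}
  + \left(\dot{Q}_{\mathrm{cond},k} - \dot{Q}_{\mathrm{load},k}\right)\Delta t .
```

In this form, any error in the predicted $\dot{Q}_{\mathrm{cond},k}$ carries over directly into the predicted storage state for the following timestep, which is why the P-HP prediction is examined alongside the TES state of charge.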

Fig. 8

TES state of charge and HP ROM mismatch during the heating season. In the bottom plot, the P-HP power prediction shown in purple is compared to the ground truth in black, with the thermal load in yellow for reference purposes. (Color figure online)

In the shoulder season shown in Fig. 9, exemplified by 5 days in early April, the heating and cooling loads almost balance out. The amount of waste energy put into the water circuit over the course of the day is almost equal to the amount of energy drawn from the system for the preparation of DHW. The TES here works as a buffer, compensating for the fact that these two opposing loads do not occur at the same time of day. As a result, the system manages without turning on the P-HP for most of the time. It only has to intervene if the TES state of charge is running either too high or too low to cope with the forecasted loads. In this case, the DP uses the favourable (grey) hours to minimize the costs required to adjust the tank state of charge.

Fig. 9

TES state of charge and HP ROM mismatch during shoulder season. The integral of the thermal load (yellow line in the bottom plot) over time is very low, requiring only very infrequent activity by the P-HP, both predicted (purple) and observed in the ground truth (black). (Color figure online)

Figure 10 shows the results for the cooling case for 5 days in mid-August. A similar (though inverted) trend to that of the heating season can be observed. Minor discrepancies in the P-HP ROM prediction can be seen, which were also observed in the heating period. Once more, while the prediction for individual points in time has significant errors, the model mismatch is acceptably low over the entire P-HP daily operation period. The same applies to the mismatch in the TES state of charge predictions: while observed and predicted values differ significantly at individual points, the general trend is consistent between them. The storage charges (cooling down in this case) during the favourable midday hours marked in grey, and the second peak of cooling load (usually happening as people get home in the evening) is covered mainly from the storage, which avoids running the P-HP during the unfavourable but still warm hours of the late afternoon.

Fig. 10

TES state of charge and HP ROM mismatch during the cooling season. The top plot shows the state of charge of the TES over time, while the bottom plot shows the ROM HP activity in purple versus the ground truth HP activity in black. The thermal load is shown in yellow for reference

3.3 MPC results

As mentioned, the overall H&C system operation is optimized using the DP optimization approach within an MPC framework. The DP optimizer responds to the incentives it is exposed to, either internally (through time-dependent availability of solar electricity and part-load characteristics) or externally (through real-time variable electricity prices). As a result, the DP optimizer achieves the set objective by minimizing the energy cost of system operation.

Figure 11 shows the potential for energy saving and shifting in the application building during the month of November, with moderate heating requirements. The optimizer is run with various strengths of these incentives (referred to as Price 100, Price 60, and Price 20, respectively). Price 100 means that the price is the same for unfavourable and favourable hours. Price 60 means that during favourable hours, the price is 60% of that during the unfavourable hours, corresponding to a blend of electricity purchased from the grid and freely available PV electricity; Price 20 correspondingly means 20%. The cost of using basic control is also shown as a reference.
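As a minimal sketch of how such a price incentive could enter the cost calculation, the snippet below prices each timestep at either a base price or a reduced fraction of it. The function names, the normalized base price of 1.0 and the sample profile are hypothetical and only illustrate the Price 100 / Price 60 / Price 20 convention described above.

```python
def energy_cost(consumption_kwh, favourable, price_ratio, base_price=1.0):
    """Cost of a consumption profile under a two-level tariff.

    consumption_kwh: electricity used in each timestep (kWh)
    favourable:      True where the timestep lies in a favourable hour
    price_ratio:     favourable price as a fraction of the unfavourable
                     price (1.0 for Price 100, 0.6 for Price 60, ...)
    base_price:      price per kWh in unfavourable hours (assumed unit)
    """
    if len(consumption_kwh) != len(favourable):
        raise ValueError("profiles must have the same length")
    total = 0.0
    for kwh, is_favourable in zip(consumption_kwh, favourable):
        price = base_price * price_ratio if is_favourable else base_price
        total += kwh * price
    return total

def unfavourable_consumption(consumption_kwh, favourable):
    """Electricity used outside the favourable hours (kWh)."""
    return sum(kwh for kwh, fav in zip(consumption_kwh, favourable) if not fav)

# Illustrative profile only; these are not values from the study.
profile = [1.0, 2.0, 3.0]
fav_mask = [False, True, True]
for label, ratio in [("Price 100", 1.0), ("Price 60", 0.6), ("Price 20", 0.2)]:
    print(label, round(energy_cost(profile, fav_mask, ratio), 3))
```

With a fixed consumption profile, a lower ratio only lowers the cost of the favourable hours. The optimizer can therefore reduce cost only by moving consumption into those hours, which is the load shifting examined in Fig. 11.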

Fig. 11

Total consumption and load shifting through smart control for different price incentives. It shows that for stronger price incentives, the MPC is able to decrease electricity consumption at the unfavourable hours, albeit at the cost of increased total electricity consumption

The DP optimizer significantly reduces electricity consumption even without any price incentive, decreasing the daily energy requirement of the reference building by around 13% for this specific month (15.02 kWh/day vs. 17.23 kWh/day). This is because the system can leverage the different ambient temperatures throughout the day to charge the storage when ambient temperatures are favourable and run the P-HP less during the coldest hours. Once the price incentive is added, the total energy consumption increases, though the consumption during unfavourable hours continues to decrease. This was to be expected: load shifting relative to the optimum determined without a price incentive will always come at an energetic cost. Lastly, a saturation effect is visible. Even with a strong economic incentive, the DP optimizer is not able to significantly decrease daily unfavourable energy consumption below around 8 kWh/day (see the blue column for Price 60 and Price 20). This is likely because the m-HPs always run at a time when there is demand, regardless of the power price at that time.
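For reference, the relative saving quoted above follows directly from the two daily consumption figures, taking the basic control as the baseline:

```latex
\frac{17.23 - 15.02}{17.23} = \frac{2.21}{17.23} \approx 0.128 \approx 13\,\%
```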

4 Conclusions

This paper shows that reduced order models (ROMs) are suitable for use in a dynamic programming (DP)-based model predictive control (MPC) framework for heating and cooling (H&C) systems that combine heat pumps (HPs) with thermal energy storage. The framework is applied to a multi-family building in Northern Italy, and the models' mismatch is analysed using a detailed TRNSYS simulation as a reference ground truth.

The comparison of the ROMs for the system's primary heat pump (P-HP) and micro-heat pumps (m-HPs), as well as for the thermal energy storage (TES), with TRNSYS shows that the ROMs can provide adequate prediction capabilities when compared to their more complex TRNSYS counterparts. In particular:

  • The individual ROMs capture the dynamics of their more complex TRNSYS counterparts well. The modelled temperatures are in good agreement with the ground truth, even though some problems exist with oscillations. The same good model agreement is obtained with respect to the COPs, though it also has the same problems with oscillations, as well as a small tendency for the ROMs to overestimate the COP of the P-HP.

  • Combining the individual ROMs into a model of the complete H&C system yields a system representation with very manageable model mismatch, which can be used for a DP-based MPC framework. Figure 7 shows how the model mismatch increases the further the predictions go into the future. Figures 8, 9 and 10 show the HP operation predictions and ground truth for different times of the year. They show that the oscillations in the individual ROMs translate to some system mismatch for individual timesteps, but there is generally a good agreement between prediction and ground truth for P-HP operation and TES state of charge. They also show how, given a financial incentive, the MPC mostly uses the favourable hours to run the P-HP, reverting to unfavourable hours only when required by outside factors.

  • Initial results suggest that the DP-based MPC framework is able to exploit the potential for optimization inherent in HP cascades with thermal storage and variable power pricing. The assessment of the MPC framework when applied to the building concerned gives a first glimpse of the potential for cost reduction and load shifting. Without any financial incentive, power consumption decreases by around 13% compared to the baseline rule-based control strategy. It has furthermore been shown that the framework is able to adapt to price incentives rewarding energy consumption in favourable hours.

Future research should follow three distinct avenues. The first focus should be on the ROM performance. Measurable systematic and chaotic error persists, both in the individual ROMs and in how they are joined together to emulate the entire H&C system. Any improvement here is likely to significantly enhance the performance of the MPC framework as a whole.

The second avenue is to deepen the analysis of the performance of the MPC framework as a whole: a full-year analysis should be run to see how the MPC performs over a longer time frame, and the analysis should be repeated for different locations and ambient conditions. It would also be helpful to further study the potential for cost savings as a function of different price incentives. As the approach outlined in this paper copes with arbitrary price signals, it will be interesting to see what savings potential it can tap when faced with more complex price signals.

The third avenue concerns the expansion of the framework as a whole. In its current version, it only supports optimization for cases where a single storage medium is employed. Future research should focus on expanding the framework to handle more complex types of storage (like heavily stratified tanks), or multiple different storage units in the same system, both thermal and electric.