# Load forecasting using 24 solar terms


## Abstract

The calendar is an important driver of electricity demand, so many load forecasting models incorporate calendar information. Frequently used calendar variables include the hour of the day, the day of the week, and the month of the year. For the past several decades, the calendar most widely used in load forecasting has been the Gregorian calendar, which originated in ancient Rome and dissects a year into 12 months based on the Moon’s orbit around the Earth. Applications of alternative calendars have rarely been reported in the load forecasting literature. This paper seeks a better means than the Gregorian calendar of categorizing the days of a year for load forecasting. One alternative is the solar-term calendar, which divides the days of a year into 24 terms based on the Sun’s position in the zodiac. It originated in ancient China to guide agricultural activities. This paper proposes a novel method to model seasonal change for load forecasting by incorporating the 24 solar terms in regression analysis. The case study is conducted for the eight load zones and the system total of ISO New England. Results from both cross validation and sliding simulation show that the forecast based on the 24 solar terms is more accurate than its counterpart based on the Gregorian calendar.

## Keywords

24 solar terms, Calendar, Load forecasting, Multiple linear regression

## 1 Introduction

The accuracy of electric load forecasts is crucial to operations and planning activities in the electric power industry. During the past 30 years, most papers in the load forecasting literature have been devoted to investigating various forecasting techniques, such as regression analysis, time series analysis, and artificial neural networks [1, 2, 3, 4]. Some of them, together with several emerging techniques such as gradient boosting machines and random forests, have been recognized through notable load forecasting competitions [5, 6, 7, 8, 9]. Another track of research has focused on exploring the variables used in load forecasting models. The load, or the log-transformed load, is used as the dependent variable in almost every load forecasting paper [10, 11]. Frequently used explanatory variables include weather and calendar variables [12, 13, 14, 15]. For medium- and long-term load forecasting, economic variables are usually used to drive the trend [10].

A typical electricity demand series presents multiple seasonal patterns. For instance, the demand is high during summer and winter and low during spring and fall due to the response to cooling and heating needs. The demand during workdays is typically higher than that of the weekend days. The demand during the daytime is mostly higher and more volatile than that during the sleeping hours at night. As a result, many load forecasting models use the calendar variables such as months of a year, days of a week, and hours of a day as the inputs to capture the multiple seasonal patterns in the demand series [1, 9, 10, 12].

When modeling the annual seasonality, load forecasting models usually rely on the Gregorian calendar, which dissects the days of a year into 12 months based on the Moon’s orbit around the Earth. Many studies directly use *month* as a class variable in the load forecasting model [7, 9, 10, 12]. Other studies report the use of *season* as a class variable, where the season is usually pre-defined by grouping the 12 months of a year into summer and winter periods, four seasons, or four seasons plus some transition seasons [15, 16].

In fact, the seasons marked by changes in the weather result from the yearly orbit of the Earth around the Sun. Therefore, the widely used Gregorian calendar, which is based on the Moon’s orbit around the Earth, may not be an accurate indicator of seasonal change for load forecasting. Alternatively, the 24 solar terms, which originated in China to guide agricultural activities, were developed based on the Sun’s position in the zodiac. This leads to our research question: is the solar-term calendar better than the Gregorian calendar in load forecasting models?

This is the first formal study in the load forecasting literature that seeks better means than the Gregorian calendar to categorize days of a year. The case study is conducted for six states of the U.S. served by ISO New England (ISONE). Two forecast evaluation settings, namely cross validation and sliding simulation, are used to evaluate the effectiveness of the proposed method.

The rest of the paper is organized as follows: Section 2 introduces the case study data, the benchmark model using 12 months, and the proposed model using 24 solar terms. Section 3 describes the setup of the case study and presents the results. Section 4 discusses some practical considerations for implementing the proposed method. The paper is then concluded in Section 5 with a discussion on future research directions.

## 2 Background

### 2.1 Data

In this paper, we use nine years of hourly load and temperature data published by ISONE from 2008 to 2016 to conduct the case study [17]. ISONE serves the six states in the northeast U.S. including Connecticut (CT), Massachusetts (MA), Maine (ME), New Hampshire (NH), Rhode Island (RI), and Vermont (VT). Each of the five states except MA is considered as a load zone, while MA is dissected into three load zones, namely Northeastern MA and Boston (NEMASS), Southeastern MA (SEMASS), and Western Central MA (WCMASS). The system total (ISONE) is the sum of the load from these eight load zones. The load data is adjusted for the daylight-saving time (DST) as follows: at the beginning hour of DST, the zero reading is replaced by the average of the adjacent two hours; at the ending hour of DST, the load is divided by two.

Summary statistics of ISONE load zones and system total

| Zone | Load mean (MW) | Load STD (MW) | Temperature mean (°F) | Temperature STD (°F) |
|---|---|---|---|---|
| CT | 3545.50 | 769.36 | 51.66 | 19.25 |
| ME | 1306.67 | 205.43 | 47.89 | 18.30 |
| NEMASS | 2925.39 | 569.29 | 52.31 | 17.70 |
| NH | 1321.33 | 265.49 | 47.65 | 20.15 |
| RI | 935.28 | 204.02 | 52.39 | 18.00 |
| VT | 658.51 | 109.04 | 47.78 | 20.73 |
| SEMASS | 1710.37 | 388.76 | 52.39 | 18.00 |
| WCMASS | 1996.31 | 382.99 | 48.78 | 18.76 |
| ISONE | 14402.75 | 2838.89 | 50.64 | 18.48 |
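The daylight-saving adjustment described in this subsection can be sketched in a few lines. This is only an illustration, not the authors' processing code: the function name `adjust_dst` and the idea of passing the positions of the two affected hours as indices are assumptions made for the example.

```python
def adjust_dst(load, spring_idx, fall_idx):
    """Return a copy of an hourly load series with the two DST hours adjusted.

    load       -- list of hourly load readings (MW)
    spring_idx -- index of the hour skipped at the start of DST (reading is zero)
    fall_idx   -- index of the repeated hour at the end of DST (reading is doubled)
    Both index arguments are hypothetical conveniences for this sketch.
    """
    adjusted = list(load)
    # Start of DST: replace the zero reading with the mean of the adjacent hours.
    adjusted[spring_idx] = (load[spring_idx - 1] + load[spring_idx + 1]) / 2
    # End of DST: the hour holds two hours of load, so divide it by two.
    adjusted[fall_idx] = load[fall_idx] / 2
    return adjusted
```

For example, `adjust_dst([10.0, 0.0, 14.0, 12.0, 26.0, 13.0], 1, 4)` fills the zero reading with 12.0 and halves the doubled reading to 13.0.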

### 2.2 Benchmark model

Multiple linear regression (MLR) is a widely deployed load forecasting technique in the field. One frequently cited MLR-based load forecasting model is Tao’s Vanilla benchmark model, which was used as the benchmark model in the load forecasting track of the Global Energy Forecasting Competition 2012 [7].

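The model is specified as

$$
\begin{aligned}
E_{Load} = {} & \beta_0 + \beta_1 Trend + \beta_2 M + \beta_3 W + \beta_4 H + \beta_5 WH + \beta_6 T + \beta_7 T^2 + \beta_8 T^3 \\
& + \beta_9 TM + \beta_{10} T^2 M + \beta_{11} T^3 M + \beta_{12} TH + \beta_{13} T^2 H + \beta_{14} T^3 H
\end{aligned}
\tag{1}
$$

where $E_{Load}$ is the expected load; $\beta_i$ are the coefficients estimated using the ordinary least squares method; *Trend* represents the chronological trend; *M*, *W*, and *H* are class variables representing the coincident month of the year, day of the week, and hour of the day, respectively; and *T* is the coincident temperature. The products of these variables represent the cross effects, or interactions.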

### 2.3 Solar term model

Starting dates (solar dates) of the 24 solar terms of a year, 2008 to 2016 [18]

| Solar term | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 |
|---|---|---|---|---|---|---|---|---|---|
| Slight Cold | 6-Jan | 5-Jan | 5-Jan | 6-Jan | 6-Jan | 5-Jan | 5-Jan | 6-Jan | 6-Jan |
| Great Cold | 21-Jan | 20-Jan | 20-Jan | 20-Jan | 21-Jan | 20-Jan | 20-Jan | 20-Jan | 20-Jan |
| Vernal Begins | 4-Feb | 4-Feb | 4-Feb | 4-Feb | 4-Feb | 4-Feb | 4-Feb | 4-Feb | 4-Feb |
| Rain Water | 19-Feb | 18-Feb | 19-Feb | 19-Feb | 19-Feb | 18-Feb | 19-Feb | 19-Feb | 19-Feb |
| Insects Awaken | 5-Mar | 5-Mar | 6-Mar | 6-Mar | 5-Mar | 5-Mar | 6-Mar | 6-Mar | 5-Mar |
| Vernal Equinox | 20-Mar | 20-Mar | 21-Mar | 21-Mar | 20-Mar | 20-Mar | 21-Mar | 21-Mar | 20-Mar |
| Clear and Bright | 4-Apr | 4-Apr | 5-Apr | 5-Apr | 4-Apr | 4-Apr | 5-Apr | 5-Apr | 4-Apr |
| Grain Rain | 20-Apr | 20-Apr | 20-Apr | 20-Apr | 20-Apr | 20-Apr | 20-Apr | 20-Apr | 19-Apr |
| Summer Begins | 5-May | 5-May | 5-May | 6-May | 5-May | 5-May | 5-May | 6-May | 5-May |
| Grain Full | 21-May | 21-May | 21-May | 21-May | 20-May | 21-May | 21-May | 21-May | 20-May |
| Grain in Ear | 5-Jun | 5-Jun | 6-Jun | 6-Jun | 5-Jun | 5-Jun | 6-Jun | 6-Jun | 5-Jun |
| Summer Solstice | 21-Jun | 21-Jun | 21-Jun | 22-Jun | 21-Jun | 21-Jun | 21-Jun | 22-Jun | 21-Jun |
| Slight Heat | 7-Jul | 7-Jul | 7-Jul | 7-Jul | 7-Jul | 7-Jul | 7-Jul | 7-Jul | 6-Jul |
| Great Heat | 22-Jul | 23-Jul | 23-Jul | 23-Jul | 22-Jul | 22-Jul | 23-Jul | 23-Jul | 22-Jul |
| Autumn Begins | 7-Aug | 7-Aug | 7-Aug | 8-Aug | 7-Aug | 7-Aug | 7-Aug | 8-Aug | 7-Aug |
| Limit of Heat | 23-Aug | 23-Aug | 23-Aug | 23-Aug | 23-Aug | 23-Aug | 23-Aug | 23-Aug | 23-Aug |
| White Dew | 7-Sep | 7-Sep | 8-Sep | 8-Sep | 7-Sep | 7-Sep | 8-Sep | 8-Sep | 7-Sep |
| Autumnal Equinox | 22-Sep | 23-Sep | 23-Sep | 23-Sep | 22-Sep | 23-Sep | 23-Sep | 23-Sep | 22-Sep |
| Cold Dew | 8-Oct | 8-Oct | 8-Oct | 8-Oct | 8-Oct | 8-Oct | 8-Oct | 8-Oct | 8-Oct |
| Frost’s Descent | 23-Oct | 23-Oct | 23-Oct | 24-Oct | 23-Oct | 23-Oct | 23-Oct | 24-Oct | 23-Oct |
| Winter Begins | 7-Nov | 7-Nov | 7-Nov | 8-Nov | 7-Nov | 7-Nov | 7-Nov | 8-Nov | 7-Nov |
| Light Snow | 22-Nov | 22-Nov | 22-Nov | 23-Nov | 22-Nov | 22-Nov | 22-Nov | 22-Nov | 22-Nov |
| Great Snow | 7-Dec | 7-Dec | 7-Dec | 7-Dec | 7-Dec | 7-Dec | 7-Dec | 7-Dec | 7-Dec |
| Winter Solstice | 21-Dec | 22-Dec | 22-Dec | 22-Dec | 21-Dec | 22-Dec | 22-Dec | 22-Dec | 21-Dec |

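We use a solar-term variable, denoted by *SM*, to replace the variable month, denoted by *M*, in the benchmark model 1 to obtain our proposed model 2:

$$
\begin{aligned}
E_{Load} = {} & \beta_0 + \beta_1 Trend + \beta_2 SM + \beta_3 W + \beta_4 H + \beta_5 WH + \beta_6 T + \beta_7 T^2 + \beta_8 T^3 \\
& + \beta_9 T \cdot SM + \beta_{10} T^2 \cdot SM + \beta_{11} T^3 \cdot SM + \beta_{12} TH + \beta_{13} T^2 H + \beta_{14} T^3 H
\end{aligned}
\tag{2}
$$

where *SM* is a class variable representing the 24 solar terms of a year.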
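To use the solar term as a class variable, each day has to be mapped to the term in force on that day. The sketch below shows one way to do this for 2016 with the starting dates copied from the table above. The names `SOLAR_TERMS_2016` and `solar_term` are hypothetical, and assigning the first days of January to Winter Solstice (the term that began in the previous December) is an assumption of this example.

```python
from bisect import bisect_right
from datetime import date

# Starting dates of the 24 solar terms in 2016 (last column of the table above).
SOLAR_TERMS_2016 = [
    (date(2016, 1, 6), "Slight Cold"), (date(2016, 1, 20), "Great Cold"),
    (date(2016, 2, 4), "Vernal Begins"), (date(2016, 2, 19), "Rain Water"),
    (date(2016, 3, 5), "Insects Awaken"), (date(2016, 3, 20), "Vernal Equinox"),
    (date(2016, 4, 4), "Clear and Bright"), (date(2016, 4, 19), "Grain Rain"),
    (date(2016, 5, 5), "Summer Begins"), (date(2016, 5, 20), "Grain Full"),
    (date(2016, 6, 5), "Grain in Ear"), (date(2016, 6, 21), "Summer Solstice"),
    (date(2016, 7, 6), "Slight Heat"), (date(2016, 7, 22), "Great Heat"),
    (date(2016, 8, 7), "Autumn Begins"), (date(2016, 8, 23), "Limit of Heat"),
    (date(2016, 9, 7), "White Dew"), (date(2016, 9, 22), "Autumnal Equinox"),
    (date(2016, 10, 8), "Cold Dew"), (date(2016, 10, 23), "Frost's Descent"),
    (date(2016, 11, 7), "Winter Begins"), (date(2016, 11, 22), "Light Snow"),
    (date(2016, 12, 7), "Great Snow"), (date(2016, 12, 21), "Winter Solstice"),
]


def solar_term(day, table=SOLAR_TERMS_2016):
    """Return the solar term in force on `day` (a date within the table's year)."""
    starts = [start for start, _ in table]
    # Index of the last term whose starting date is on or before `day`.
    idx = bisect_right(starts, day) - 1
    if idx < 0:
        # Days before the first term of the year still belong to the
        # Winter Solstice term that began in the previous December.
        return table[-1][1]
    return table[idx][1]
```

The returned label can then serve as the 24-level class variable *SM* in place of the calendar month.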

## 3 Case study

The case study is built based on the nine years of data introduced earlier. For each of the eight load zones and the system total, we implement two forecast evaluation settings, cross validation and sliding simulation.

The last six years (2011–2016) are used to conduct the 6-fold cross validation. The data are divided into six pieces by calendar year. Each time, we use five years of data to estimate the parameters of the model and use them to predict the remaining year. All nine years (2008–2016) are used to conduct the sliding simulation. Each time, three consecutive years of history are used for parameter estimation. The resulting model is then used for one-year-ahead ex post forecasting.

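Forecast accuracy is measured by the mean absolute percentage error (MAPE):

$$
MAPE = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{A_i - P_i}{A_i} \right| \times 100\%
\tag{3}
$$

where *N* is the number of observations; $A_i$ is the actual load; and $P_i$ represents the predicted load. We evaluate the performance of the models based on the MAPE of each validation year and the average of MAPEs across all validation years.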
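As a concrete illustration of the error measure, a minimal MAPE function could look like the sketch below. The function name `mape` and the input checks are choices made for this example rather than part of the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    actual    -- sequence of actual loads A_i (must be non-zero)
    predicted -- sequence of predicted loads P_i, same length as `actual`
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")
    # Average the absolute errors relative to the actual load, then scale to %.
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

For instance, actual loads of 100 and 200 MW forecast as 110 and 180 MW are each off by 10%, so the MAPE is 10%.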

### 3.1 V-fold cross validation

Cross validation is a widely used forecast evaluation technique for avoiding potential overfitting issues [19]. In this study, we adopt the V-fold cross validation (VFCV) technique. It first partitions the data into *V* equally (or nearly equally) sized segments. One of the *V* segments is used as validation data and the remaining *V*-1 segments are used as training data. This process is repeated *V* times, with each segment used as the validation dataset exactly once. The performance of the model is usually evaluated based on its average performance across these *V* segments. In this study, we divide six years (2011 to 2016) of data into six segments by calendar year for cross validation. For example, when predicting the load for 2012, we use the data from 2011 to 2016, excluding 2012, as the training data to estimate the parameters of the model.
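The year-based partition described above can be written down compactly. The following is a sketch under the assumption that each calendar year forms one fold; the function name `yearly_cv_folds` is hypothetical.

```python
def yearly_cv_folds(years):
    """Return (training_years, validation_year) pairs, one per calendar year."""
    folds = []
    for held_out in years:
        # Train on every year except the one being validated.
        training = [y for y in years if y != held_out]
        folds.append((training, held_out))
    return folds


folds = yearly_cv_folds(list(range(2011, 2017)))
```

With the six years 2011 to 2016, this yields six folds, and the fold that validates on 2012 trains on 2011 and 2013 to 2016.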

MAPEs (in %) of the proposed (Pro.) and the benchmark (Ben.) models (6-fold cross validation)

| Zone | 2011 Pro. | 2011 Ben. | 2012 Pro. | 2012 Ben. | 2013 Pro. | 2013 Ben. | 2014 Pro. | 2014 Ben. | 2015 Pro. | 2015 Ben. | 2016 Pro. | 2016 Ben. | Average Pro. | Average Ben. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CT | | 4.49 | | 4.01 | | 3.70 | | 3.95 | | 4.29 | 4.74 | | | 4.18 |
| ME | | 3.94 | 3.39 | | 3.20 | | | 4.36 | 3.57 | | 6.44 | | 4.11 | |
| NEMASS | | 3.56 | | 3.39 | 3.47 | | | 3.48 | | 3.85 | 4.08 | | | 3.63 |
| NH | | 3.90 | | 3.25 | | 3.28 | | 3.47 | | 3.54 | | 3.62 | | 3.50 |
| RI | | 4.50 | | 3.66 | | 3.69 | | 3.70 | | 3.85 | | 3.69 | | 3.85 |
| SEMASS | | 5.01 | | 3.86 | | 3.79 | | 3.74 | | 4.49 | | 4.80 | | 4.28 |
| VT | | 3.79 | | 3.63 | | 3.58 | 3.32 | | 4.15 | | | 5.35 | | 3.95 |
| WCMASS | | 4.02 | | 3.38 | | 3.42 | | 3.54 | | 4.22 | | 4.33 | | 3.82 |
| ISONE | | 3.59 | | 3.04 | | 2.99 | | 3.05 | | 3.42 | 3.61 | | | 3.28 |

### 3.2 Sliding simulation

Another widely used forecast evaluation technique is the sliding simulation [20]. Compared with cross validation, sliding simulation better mimics forecasting operations in the real world. Specifically, the sliding simulation technique estimates the coefficients of a model on a historical window of pre-defined length (e.g., three years) and then forecasts one period ahead (e.g., one year). For example, we first forecast the year 2011 using the data from 2008 to 2010 as the training data. We then advance the forecast origin by a year to forecast 2012 using the data from 2009 to 2011 as the training data. We repeat this rolling process until we complete the forecasts for all six years (2011–2016).
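The rolling procedure above can be sketched as a generator of training windows and target years. The function name `sliding_windows` and its parameters are illustrative assumptions, not the authors' code.

```python
def sliding_windows(first_year, last_year, window=3):
    """Return (training_years, forecast_year) pairs for a sliding simulation.

    Each forecast year is predicted from the `window` years immediately
    before it; the forecast origin then advances by one year.
    """
    splits = []
    for target in range(first_year + window, last_year + 1):
        training = list(range(target - window, target))
        splits.append((training, target))
    return splits


splits = sliding_windows(2008, 2016)
```

With data from 2008 to 2016 and a three-year window, the first split trains on 2008 to 2010 to forecast 2011, and the last trains on 2013 to 2015 to forecast 2016.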

MAPEs (in %) of the proposed (Pro.) and the benchmark (Ben.) models (sliding simulation)

| Zone | 2011 Pro. | 2011 Ben. | 2012 Pro. | 2012 Ben. | 2013 Pro. | 2013 Ben. | 2014 Pro. | 2014 Ben. | 2015 Pro. | 2015 Ben. | 2016 Pro. | 2016 Ben. | Average Pro. | Average Ben. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CT | | 4.41 | | 3.97 | | 3.80 | | 4.17 | | 4.18 | 4.65 | | | 4.18 |
| ME | | 3.50 | 3.08 | | | 4.58 | | 3.51 | | 4.50 | 5.72 | | | 4.12 |
| NEMASS | | 3.26 | | 3.62 | 3.60 | 3.50 | | 3.67 | 3.96 | | 4.06 | | | |
| NH | | 3.76 | | 3.12 | | 3.33 | | 3.90 | | 3.73 | | 3.42 | | 3.54 |
| RI | | 3.89 | | 3.69 | | 3.66 | | 4.20 | 3.88 | | 3.70 | | | 3.83 |
| SEMASS | | 3.65 | | 3.72 | | 3.88 | | 4.47 | | 4.64 | | 4.49 | | 4.14 |
| VT | | 3.60 | | 3.61 | | 3.58 | | 3.74 | | 4.41 | 4.33 | | | 3.87 |
| WCMASS | | 3.03 | | 3.51 | | 4.01 | 3.89 | | 4.41 | | | 5.20 | | 3.98 |
| ISONE | | 3.13 | | 3.02 | 3.13 | | | 3.26 | | 3.54 | 3.41 | | | 3.23 |

## 4 Discussion

### 4.1 Other potential studies

In this paper, the sliding simulation was conducted for one-year-ahead forecasting under the regression framework. The results demonstrated that using the 24 solar terms outperforms using the 12 Gregorian calendar months. Additional empirical case studies are encouraged to investigate the performance of the proposed solar-term variable in load forecasting models for other forecast horizons and with other forecasting techniques.

The 24 solar terms originated in ancient China and were then widely used in Southeast Asia to guide agricultural activities. In this paper, the same solar terms were used for load forecasting in six different states in the northeast U.S. Although these six states are in a different climate zone from China or Southeast Asia, this case study showed that the solar terms can still effectively categorize the days of a year for load forecasting purposes. Following a similar analysis framework, studies can be conducted for other climate regions to gain a more comprehensive understanding of using solar terms for load forecasting.

### 4.2 Tradeoff

When a regression model includes class variables, the complexity of the model is affected by the number of levels of these class variables. For example, when the 24 solar terms replace the original 12 months of a year in the Vanilla benchmark model, 23 instead of 11 parameters need to be estimated for the class variable *SM*. The interaction terms between *SM* and the temperature variables further increase the number of estimated parameters. Estimating the two models takes almost the same time using the GLM procedure in SAS®/STAT [21]: it takes 0.134 second to estimate the benchmark model 1 for one zone and one test year (i.e., one regression model) and 0.146 second to estimate the proposed model 2 for the same zone and year. A more complex model, however, may cause an overfitting issue, as discussed below.
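A rough parameter count makes the difference in complexity concrete. The sketch below assumes reference-level (dummy) coding of each class variable and assumes the Vanilla structure consists of an intercept, a trend, main effects for the seasonal class variable, day of the week, and hour of the day, a day-by-hour interaction, a cubic temperature polynomial, and interactions of that polynomial with the seasonal variable and with the hour. The function name `vanilla_parameter_count` is hypothetical, and the totals are illustrative arithmetic rather than figures reported in the paper.

```python
def vanilla_parameter_count(season_levels, weekday_levels=7, hour_levels=24):
    """Count estimable parameters under the assumed Vanilla model structure."""
    s = season_levels - 1   # dummy columns for month or solar term
    w = weekday_levels - 1  # dummy columns for day of the week
    h = hour_levels - 1     # dummy columns for hour of the day
    intercept_and_trend = 2
    main_effects = s + w + h
    weekday_by_hour = w * h
    temperature_polynomial = 3       # T, T^2, T^3
    temperature_by_season = 3 * s    # each polynomial term times the season
    temperature_by_hour = 3 * h      # each polynomial term times the hour
    return (intercept_and_trend + main_effects + weekday_by_hour
            + temperature_polynomial + temperature_by_season
            + temperature_by_hour)


months = vanilla_parameter_count(12)       # 12 Gregorian months
solar_terms = vanilla_parameter_count(24)  # 24 solar terms
```

Under these assumptions the count rises from 285 to 333: 12 extra main-effect parameters plus 36 extra temperature interaction parameters.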

When there are sufficient observations in the training data, it is probably fine to use a complicated model. When the training data are limited, we may encounter overfitting caused by too many parameters being supported by too few observations. Avoiding overfitting, however, is both an art and a science. The forecaster needs to find a trade-off between the signal captured by the additional variables and the noise overfitted by the same variables. A similar tradeoff analysis was presented in [4], where the authors showed that the inclusion of many (but not too many) variables helps improve the forecast accuracy.

Given the same training data and the same model structure, using the solar terms with 24 levels is more likely to overfit the data than using the month with 12 levels. Meanwhile, the solar-term model is likely to capture more signal than the model based on 12 months. In the case study presented in this paper, the solar-term model wins. Nevertheless, the performance of solar-term models may degrade due to overfitting as the underlying model structure becomes even more complicated, for example by including lagged and moving-average temperature variables interacting with the solar terms. This study points to another future research direction: how to group similar solar terms together to reduce the complexity of the model?

### 4.3 Data-driven models

In this paper, we have compared the use of 12 months and 24 solar terms. It is worth noting that other types of calendars are adopted in different parts of the world. An alternative to these calendar-based models is a data-driven approach. Instead of adopting any calendar to come up with the “month-equivalent” variable, we can simply group the days of a year via trial and error, such as enumerating the group size from 1 to 365 days. When the ultimate goal is to improve forecast accuracy, such a data-driven approach may have an advantage over its alternatives. However, real-world applications look for features of a solution beyond its accuracy, such as interpretability, simplicity, consistency, and defensibility. For these reasons, practitioners tend to be more receptive to conventional calendars, whether the Gregorian calendar or the Chinese solar-term calendar.
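One simple reading of the trial-and-error grouping is to cut the year into consecutive blocks of a fixed number of days and treat the block index as the class variable. The sketch below follows that reading; the function names `day_group` and `candidate_level_counts` and the equal-width grouping are assumptions of the example, not a method evaluated in this paper.

```python
import math


def day_group(day_of_year, group_size):
    """Map a day of the year (1-based) to a zero-based group index."""
    if day_of_year < 1 or group_size < 1:
        raise ValueError("day_of_year and group_size must be positive")
    return (day_of_year - 1) // group_size


def candidate_level_counts(days_in_year=365):
    """Number of class levels produced by each group size from 1 to a full year."""
    return {size: math.ceil(days_in_year / size)
            for size in range(1, days_in_year + 1)}


levels = candidate_level_counts()
```

Each candidate group size would then be scored with the same cross validation or sliding simulation used for the calendar-based models, keeping in mind that small group sizes create many levels and raise the overfitting risk discussed in Section 4.2.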

## 5 Conclusion

In this paper, we used the solar terms as an alternative to the widely used class variable month in the load forecasting model. The case study was conducted for the eight zones and the system total of ISONE for one-year-ahead forecasting. The results from both V-fold cross validation and sliding simulation demonstrated the effectiveness of the proposed method in comparison to the Vanilla model, which uses month to categorize the days of a year. Additional empirical studies are encouraged to test the effectiveness of the proposed method for load forecasting problems with other forecast horizons, with different modeling techniques, or on datasets from different climate zones. The findings of this paper also raise other interesting questions for future research. Recall that grouping the days of a week was discussed in [12] to reduce the complexity of the model. Following a similar idea, one may consider grouping the solar terms to overcome the potential overfitting issue as the underlying model grows with additional weather variables. Readers are welcome to test other calendars and data-driven methods as well.

## References

- [1] Hippert HS, Pedreira CE, Souza RC (2001) Neural networks for short-term load forecasting: a review and evaluation. IEEE Trans Power Syst 16(1):44–55
- [2] Weron R (2006) Modeling and forecasting electricity loads and prices: a statistical approach. Wiley, New York
- [3] Hong T, Fan S (2016) Probabilistic electric load forecasting: a tutorial review. Int J Forecast 32(3):914–938
- [4] Wang P, Liu B, Hong T (2016) Electric load forecasting with recency effect: a big data approach. Int J Forecast 32(3):585–597
- [5] Ramanathan R, Engle R, Granger CWJ et al (1997) Short-run forecasts of electricity loads and peaks. Int J Forecast 13(2):161–174
- [6] Chen BJ, Chang MW, Lin CJ (2004) Load forecasting using support vector machines: a study on EUNITE competition 2001. IEEE Trans Power Syst 19(4):1821–1830
- [7] Hong T, Pinson P, Fan S (2014) Global energy forecasting competition 2012. Int J Forecast 30(2):357–363
- [8] Hong T, Pinson P, Fan S et al (2016) Probabilistic energy forecasting: global energy forecasting competition 2014 and beyond. Int J Forecast 32(3):896–913
- [9] Xie J, Liu B, Lyu X et al (2015) Combining load forecasts from independent experts. In: Proceedings of North American Power Symposium (NAPS), Charlotte, USA, 4–6 Oct 2015, p 5
- [10] Hong T, Wilson J, Xie J (2014) Long term probabilistic load forecasting and normalization with hourly information. IEEE Trans Smart Grid 5(1):456–462
- [11] Fan S, Hyndman RJ (2011) The price elasticity of electricity demand in South Australia. Energy Policy 39(6):3709–3719
- [12] Hong T (2010) Short term electric load forecasting. Dissertation, North Carolina State University
- [13] Goude Y, Nedellec R, Kong N (2014) Local short and middle term electricity load forecasting with semi-parametric additive models. IEEE Trans Smart Grid 5(1):440–446
- [14] Lloyd JR (2014) GEFCom2012 hierarchical load forecasting: gradient boosting machines and gaussian processes. Int J Forecast 30(2):369–374
- [15] Charlton N, Singleton C (2014) A refined parametric model for short term load forecasting. Int J Forecast 30(2):364–368
- [16] Hagan MT, Behr SM (1987) The time series approach to short term load forecasting. IEEE Trans Power Syst 2(3):785–791
- [17] ISO SMD Hourly Data (2016) ISO New England. https://www.iso-ne.com/isoexpress/web/reports/load-and-demand/-/tree/zone-info. Accessed 03 Mar 2017
- [18] Chinese Calendar (2017) Your Chinese astrology. http://www.yourchineseastrology.com/calendar/2008/. Accessed 03 Mar 2017
- [19] Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Stat Surv 4:40–79
- [20] Tashman LJ (2000) Out-of-sample tests of forecasting accuracy: an analysis and review. Int J Forecast 16(4):437–450
- [21] SAS®/STAT 9.3 User’s Guide. https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_glm_sect001.htm. Accessed 04 Oct 2017

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.