Forecasting implied volatilities of currency options with machine learning techniques and econometric models

Developing an effective modeling framework to minimize FX risk is of vital importance for hedgers and traders in FX markets. In this study, we compare the ability of Long Short-Term Memory (LSTM) models to that of Random Forest and AR-GARCH-type models for forecasting EURUSD implied volatility across the volatility surface. As our literature study argues, there are only a few published papers on this subject. We find that the LSTM model is the best model for shorter option maturities, whilst the AR-GARCH model is superior when the maturities increase. However, imposing other specifications and residual distributions for the GARCH models, we find that the AR-GARCH framework outperforms the more advanced machine learning models for all options. We observe that the LSTM model is able to capture immense and immediate changes in implied volatility, which is important for hedging against significant shifts in FX rates.


Introduction
Forecasting the implied volatility of foreign exchange (FX) options, and financial time series in general, is a challenging task, mainly due to incomplete information and unprecedented changes in economic trends and conditions. However, as transitions from a low to a high market volatility regime can be abrupt and short-lived, the development of an effective modeling framework is of critical importance for the design and implementation of active portfolio immunization strategies, in order to avoid sizeable drawdowns during periods of turmoil (Galakis & Vrontos, 2021).
Traditional econometric time series models struggle to capture non-linearity in data, incentivizing economic researchers to utilize more advanced models. Machine learning methods can alleviate the complexity of time series forecasting by identifying structures and patterns in the data, such as non-linearity and dependency between predictors. In particular, LSTM (Long Short-Term Memory) networks have received increasing attention in forecasting financial time series, albeit with mixed results.
Developing an effective modeling framework to minimize FX risk is of vital importance for hedgers and traders in FX markets. The subject of this study is to compare the predictive power of LSTM models to Random Forest and AR-GARCH-type models for forecasting the implied volatility of options on the EUR/USD foreign exchange rate.
We structure our analysis into two parts:
1. Statistical distribution: analyzing the univariate time series characteristics of the implied volatility components, including the dependency structure.
2. Forecasting models: evaluating and proposing forecasting models for implied volatility.
We use a comprehensive dataset of daily implied volatilities for maturities from one week through one year, covering a broad range of strike levels. The dataset spans important global macroeconomic events such as the financial crisis, the euro sovereign debt crisis and the Covid-19 outbreak.
We optimize an LSTM model on a training set of the data and compare its forecasting capability to a RF (Random Forest) model and AR-GARCH-type models. The target variable is the daily spot rate of implied volatility for the EUR/USD FX options. We impose RF and a Gaussian distributed AR(1)-GARCH(1,1) as benchmark models and compare their forecasting performance to the more advanced LSTM model. In addition to the benchmark AR(1)-GARCH(1,1) model, we extend the analysis with models that include an asymmetric GARCH term and moving average terms, along with Gaussian distributed and Student t-distributed residuals. The machine learning models are optimized using different hyperparameters, and we compare the best-fitted structures for each option to the benchmark models.
Our findings show that the LSTM model outperforms the benchmark models for shorter option maturities, whilst the AR-GARCH model is superior as the maturities increase. However, when imposing other specifications and residual distributions for the GARCH models, we find that the AR-GARCH framework outperforms the more advanced machine learning models for all options. For shorter maturities the t-distributed models perform best, while ARIMA-GARCH-type models perform better for longer maturities.
Further, this study is organized as follows: Section 2 discusses previous publications on implied volatility and the models we apply in our analysis. Section 3 presents the data, including statistical and distributional behavior. Section 4 presents the theory behind our models, and further describes the methodology and model architecture. In Section 5, we present results and findings. Section 6 summarizes our findings and concludes.

Literature Review
Garman and Kohlhagen (1983) derive a modification for currency options of the Black-Scholes option pricing formula, introduced by Fischer Black and Myron Scholes in 1973. According to Stan (1981) and Latane and Rendleman (1976), several studies have shown that implied volatility is a better forecaster of future price variability than measurements based on history (Bharadia et al., 1996). In recent years implied volatility has become a common estimator for forecasting purposes. Ornelas and Mauad (2019) find that the slopes of currency implied volatility term structures have predictive power for the behavior of exchange rates from both cross-sectional and time series perspectives. Carr et al. (2020) build a volatility index by formulating a variance prediction model using machine learning methods such as Feedforward Neural Networks and Random Forest on S&P 500 index options. According to Haug et al. (2010), the standard deviation of implied volatility shows evident variation over time and declines as time to maturity increases. These time-varying properties entail a major challenge: volatility clustering. That is, small (big) changes in the volatility tend to be followed by small (big) changes in the volatility (Mandelbrot, 1963).
A seminal approach to account for this was introduced in 1982, when Robert Engle proposed a non-linear model allowing the time-varying conditional variance to depend on the lagged values of the squared errors: the autoregressive conditional heteroskedasticity (ARCH) model. An extension that also allows the conditional variance to depend on lags of the conditional variance is the generalized ARCH model, or GARCH model, introduced by Bollerslev (1986). The GARCH model is more parsimonious than the ARCH model; it avoids overfitting and is still today a widely applied modeling framework for financial time series data.
In 1993, Glosten et al. formulated an extension to the GARCH model that accounts for an asymmetric response to a volatility shock, i.e., "good" news and "bad" news have different impacts on the subsequent period's volatility, known as the GJR-GARCH. Lim and Sek (2013) found that in "normal times", that is in post- and pre-crisis times, the symmetric GARCH performs well, while in times of big volatility fluctuations, i.e., times of crisis, the asymmetric model is preferred. Schmidt (2021) argues that the asymmetric models are better forecasters for financial indexes in the aftermath of the shock caused by the outbreak of the Covid-19 pandemic compared to the symmetric specifications. Poon and Granger (2001) argue that the simpler GARCH models seem to provide more extensive volatility forecasts when compared to the more sophisticated models. In contrast, the GJR-GARCH seems to forecast lower values due to its asymmetry for the financial markets, which helps this model to quickly revert from different volatility states. Ramasamy and Minusamy (2012) found that the asymmetric GJR does not improve the forecasting performance considerably compared to symmetric GARCH models. According to Javed and Mantalos (2013), the performance of information criteria for the GARCH(1,1) is satisfactory compared to higher order GARCH specifications.
The literature dedicated to implementing machine learning techniques for forecasting the implied volatility of FX options is scarce. However, some research exists regarding machine learning for predicting stock prices, returns and volatility. Galler and Kruzanowski (1993) 2021) study whether the application of machine learning approaches can outperform traditional econometric models in forecasting implied volatility indices. They conclude that certain machine learning techniques are strongly encouraged as they significantly improve the accuracy of the out-of-sample forecasts. However, they also report that the model accuracy is not consistent across all models.

Data -Distribution and Statistical Behavior
In this section, we discuss the statistical properties and distribution of our data. The dataset, retrieved from Bloomberg, consists of daily observations of implied volatilities for eleven options with distinct levels of moneyness and five different times to maturity during the period 02.01.2007 to 31.08.2021. This provides 55 distinct time series of implied volatility consisting of 164.670 observations, enabling us to analyze our models' forecasting performance for different maturities and distinct moneyness levels. Table 3.1 summarizes descriptive statistics for ATM (at-the-money) put options for the five distinct maturities. [1] The variance of the volatilities declines as the time to maturity increases. The shorter maturities have both higher peaks and lower troughs of implied volatility. In comparison, the longer maturities have higher average levels of implied volatility, measured in both mean and median (50% quantile). In this study we use put and call options with OTM delta values of 5, 10, 18, 25, 35 and ATM put options with a delta equal to 50. The level of implied volatility, measured in mean and different quantiles, is higher for options OTM than ATM or close to ATM, and it is higher for puts than for calls. This is also the case for the volatility (i.e., daily changes in the level of implied volatility). This distribution pattern is the same for all five distinct maturities and is referred to as the volatility smile, which is visualized in Fig. 3.1. The implied volatility is higher for OTM put options than for similar call options, consistent with a negative risk reversal, which measures the volatility smile's skewness. It is most common to measure the risk reversal for call and put options with a delta of 25 (McDonald, 2014). When looking at the 25-delta risk reversal, on average for the whole data sample, all risk reversals are negative and increasingly negative as the time to maturity increases. Further, the risk reversals become increasingly negative as the options become increasingly OTM and can be interpreted as a market-based measure of implied skewness. When comparing historical values of the implied volatility for one week and one year options (see Fig. 3.2), the variation in implied volatility is immense. Since the one-week option has a shorter time to maturity, its implied volatility reacts more to news and small shocks and is more volatile. The cost of short-term options is lower than that of long-term options. According to financial theory, FX option traders have limited capital, which results in higher demand for short-term options. This causes the longer maturity options to trend more, meaning they recover more slowly from massive shocks than the shorter maturities, implying that the time series for the longer maturities are non-stationary. When we study different maturities, we observe a decline in how often the option crosses its mean as the time to maturity increases. The fact that the amplitude of the daily changes in implied volatility declines will contribute to a change in the tails of the return distribution. Since there is a significant difference in the behavior and distribution of the different maturities and levels of moneyness, we expect that different models will fit better for different options. We will come back to this in Section 4.
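As a toy illustration of the sign convention for this skew measure, the following sketch uses made-up implied-volatility numbers (not values from the dataset):

```python
def risk_reversal_25(iv_call_25d, iv_put_25d):
    """25-delta risk reversal: call IV minus put IV (negative => put skew)."""
    return iv_call_25d - iv_put_25d

# Hypothetical quotes in volatility points: the OTM put is priced richer
# than the OTM call, so the risk reversal is negative.
rr = risk_reversal_25(iv_call_25d=9.8, iv_put_25d=10.6)
```

A negative value, as here, indicates the market prices downside (EUR-depreciation) protection more richly, consistent with the negative risk reversals observed across the sample.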
Figure 3.2 Implied volatility for ATM put options with one week and one year to maturity

Several macroeconomic factors impact the EUR/USD exchange rate and its implied volatility. Our data set stretches from January 2007 to August 2021, and during this time the financial markets worldwide endured multiple shocks and events that impacted the EUR/USD exchange rate. Figure 3.2 exhibits the implied volatility for the ATM one week and one year options. Particularly after the financial crisis in 2008, we see a considerable rise in implied volatility. Shocks such as the European debt crisis, the US debt ceiling crisis and, in more recent years, the Covid-19 pandemic also had a significant impact on the implied volatility. These different shocks captured in the data will affect the implied volatility, the daily return of the implied volatility, and the residual distribution of the options. Intuitively, this can cause challenges for the GARCH-type models, considering the normality assumption for the distribution of the residuals. When looking at the tails of the distribution, the outliers result in fatter tails for the shorter maturities, implying fatter tails than the normal distribution provides. We will further address these issues in Section 5.5.
Different shocks to the data produce positive skewness for all option maturities, as illustrated by the right-tailed empirical distribution of the ATM options in Fig. 3.3. The figure exhibits the ATM option for each maturity, and the distribution widens as the time to maturity increases. The empirical distribution has a double peak for the longest maturities, caused by more extended periods away from their respective mean values, as visualized in Fig. 3.2. The same distribution pattern holds for OTM options, and our findings regarding the statistical and distributional behavior of EUR/USD FX derivatives align with earlier literature.

Figure 3.3 Empirical distribution for ATM options for each maturity, Gaussian distribution drawn for each maturity

Methodology
In the following section we briefly introduce the most important theory behind the models we apply, along with the methodology for the analysis.

Forecast Evaluation
To evaluate the forecasting performance of the different models we use mean squared error (MSE), root mean squared error (RMSE) and mean absolute error (MAE). These statistical measurements are given by the formulas

$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(\sigma_t - \hat{\sigma}_t\right)^2 \quad (1)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\sigma_t - \hat{\sigma}_t\right)^2} \quad (2)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|\sigma_t - \hat{\sigma}_t\right| \quad (3)$$

where $\sigma_t$ is the implied volatility at time t, $\hat{\sigma}_t$ is the forecasted value of implied volatility at time t, and n is the number of observations. While the mean absolute error weights all errors equally, the mean squared error penalizes larger errors more heavily.
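The three error measures above can be computed directly; this is a minimal sketch, with the function and argument names chosen for illustration:

```python
import numpy as np

def forecast_errors(actual, forecast):
    """Return (MSE, RMSE, MAE) for a forecast series against the actual series."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    err = actual - forecast
    mse = np.mean(err ** 2)       # Eq. (1): mean of squared errors
    rmse = np.sqrt(mse)           # Eq. (2): square root of the MSE
    mae = np.mean(np.abs(err))    # Eq. (3): mean of absolute errors
    return mse, rmse, mae
```

Because RMSE is the square root of MSE, the two always rank models identically; MAE can rank them differently when large outlier errors dominate.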

Benchmark Models
We use GARCH and supervised Random Forest as our benchmark models. Tests for ARCH effects show that, for all options, we can reject the null hypothesis of no ARCH effects at any significance level. We conclude that there is evidence of autoregressive conditional heteroskedasticity in the squared residuals for all options.
In addition to the benchmark AR(1)-GARCH(1,1), we extend the analysis with a GJR-GARCH specification to analyze asymmetric behavior of shocks in the conditional variance. We compare both in-sample goodness of fit and out-of-sample forecast errors for all models, focusing on the out-of-sample forecast performance. Further, we test the data with additional lags of the AR(p) and MA(q) processes, and lastly, we perform the same regressions with a Student t-distribution from Eq. (7). Additional lags beyond an ARMA(1,1) process do not improve the models' forecasting accuracy and will not be included further in this study.
Non-stationary variables can produce spurious regressions, yielding a high R² and t-statistics which appear to be significant but carry no economic meaning (Enders, 2015). To test for stationarity, we use the Augmented Dickey-Fuller test.
Based on the Augmented Dickey-Fuller test and the Phillips-Perron test, we cannot, at a 5% significance level, reject the null hypothesis that the options with a time to maturity of 3 months and more follow unit root processes and are non-stationary. To avoid spurious regressions, we apply the first difference to all options with three months or more to maturity. When applying the first difference to the options with one week and one month to maturity, the forecast accuracy measured by MSE, RMSE and MAE deteriorates, along with the in-sample goodness of fit. Therefore, the benchmark AR(1)-GARCH(1,1) model is not differenced for the shorter maturities, i.e., one week and one month to maturity.
The in-sample goodness of fit for the models is compared using the information criteria Log-Likelihood, AIC and BIC. According to these information criteria, the in-sample model is considerably improved for all distinct options when adding a moving average (MA) term and other autoregressive term lags. When looking at the autocorrelation and partial autocorrelation, there are individual preferences for which lags of the AR(p) and MA(q) terms should be included, and show significance, for the different options. When testing for asymmetries in the conditional variance, the threshold term improves the goodness of fit and is statistically significant for all options across the levels of moneyness and maturities. Based on these findings, we can claim that there is an asymmetry in the volatility shocks, i.e., the volatility reacts differently to positive and negative shocks. When comparing the out-of-sample forecast accuracy measured in MSE, RMSE and MAE, the simple AR(1)-GARCH(1,1) outperformed the better in-sample specified models on shorter maturities. Interestingly, the better the in-sample model specification measured in goodness of fit, the poorer the out-of-sample forecasting accuracy for the short maturity options. However, the AR(1)-GARCH(1,1) is used as a benchmark model for simplicity.

Random Forest Model Architecture
We start our model architecture by implementing the Random Forest regressor from scikit-learn (Pedregosa et al., 2011). The sklearn.ensemble.RandomForestRegressor has several parameters that can be tuned. When building a decision tree, each time a split is to happen, a random sample of m predictors is chosen from a total of p predictors. The default number of features m used when making splits in a random forest regression is m = p in Python's sklearn, where p is the number of predictors in the regression problem. Breiman (2001) argued that the optimal number of features should be the square root of p. Hastie et al. (2008) argued that p/3 is the set of features best suited for the Random Forest regressor. However, Geurts et al. (2006) researched the ratio empirically and concluded that the optimal set of features is simply m = p. After running our Random Forest model for different sets of m on different options, we conclude that our model's optimal set of features is m = p (see Table 4.1). The next parameters we investigate are the number of trees and the length of the window size. Our initial model was equipped with ten trees and a window size of two. However, increasing the number of trees to the sklearn default of one hundred trees decreased the mean squared error significantly for options with a shorter time to maturity. We also ran the model with window sizes from two to forty. However, this was computationally too expensive to do for each option. Table 4.2 illustrates the mean squared error for an ATM option with one year to maturity. The table indicates that increments in window size do not improve the mean squared error.
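A minimal sketch of this windowed Random Forest setup; the synthetic series, the window construction and the split point are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def make_windows(series, window):
    """Turn a 1-D series into (lagged-window, next-value) supervised pairs."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

rng = np.random.default_rng(0)
iv = rng.standard_normal(300).cumsum()   # stand-in for an implied-volatility series
X, y = make_windows(iv, window=2)        # window size of two, as in the initial model

# n_estimators=100 is the sklearn default; max_features=None uses all p predictors (m = p)
rf = RandomForestRegressor(n_estimators=100, max_features=None, random_state=0)
rf.fit(X[:250], y[:250])
preds = rf.predict(X[250:])
```

Here each row of `X` holds the previous `window` observations, so the forest forecasts one step ahead from the lagged implied-volatility values.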

Artificial Neural Networks
Artificial neural networks are software implementations of the network of neurons present in the human brain. The neurons in the human brain can be thought of as organic switches, as they can change their output state depending on the strength of their electrical or chemical input. Each neuron has millions of connections with neighboring neurons (Neural Networks Tutorial - A Pathway to Deep Learning, 2017). This highly complex network allows the human brain to carry out its learning function by activating particular neural connections. The learning process includes feedback, resulting in strengthened neural connections when the expected outcome occurs. In this section we describe the model architecture for our LSTM model.

LSTM Data Split
First, we split our dataset into training, validation and test sets. This is important, as the model would appear near perfect if we fed it the test data. We settle on a 60:20:20 split, where 60% is used as the training set, 20% as the validation set and 20% as the test set. In other words, the first 1797 observations in the dataset are used to train the model, the next 599 observations are used to improve the model and fine-tune hyperparameters, and the last 599 observations are used for out-of-sample forecasts.
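The chronological split can be sketched as follows; the placeholder series length is chosen so the counts match those quoted above (1797/599/599):

```python
import numpy as np

iv = np.arange(2995, dtype=float)  # stand-in series with 2995 daily observations

n = len(iv)
train_end = round(n * 0.6)         # 60% training
val_end = round(n * 0.8)           # next 20% validation, final 20% test

train, val, test = iv[:train_end], iv[train_end:val_end], iv[val_end:]
```

Note the split is chronological rather than random: shuffling would leak future observations into the training set and overstate forecast accuracy.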
Neural network algorithms are stochastic, i.e., they make use of randomness, such as initializing random weights, which will yield different results for a network trained on the same data. To make the LSTM model reproducible, we fix a random seed, so that the pseudo-random number generator produces the same sequence of numbers, and hence the same initial weights, ensuring that the same result occurs when we run the same model twice.
Several parameters require tuning to optimize the LSTM model. The common practice is to evaluate every possible combination of parameters on the validation set and choose the combination that minimizes the chosen error measures. However, this approach becomes computationally expensive, especially with an increased window size. For this reason, we develop an architecture that starts by combining smaller sets of hyperparameters and, for each iteration, increases the hyperparameters by 10.
We start our architecture by implementing an LSTM model from the Keras functional API (Chollet, F., & others, 2015) with two hidden layers, searching for hyperparameters. We tried using different stacks of LSTM layers, which was computationally expensive and did not improve the model. The activation function ReLU has risen in popularity because of its computational efficiency (James, 2018). We also apply early stopping, which stops the network once the learning stops improving. [2] The hyperparameters that are left to tune are the following:
- Batch size
- Hidden neurons
- Window size
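A minimal sketch of such a two-hidden-layer LSTM with early stopping in Keras (shown here with the simpler Sequential interface rather than the functional API); the layer widths, window size and seed are illustrative assumptions, not the tuned values for every option:

```python
import tensorflow as tf
from tensorflow import keras

tf.random.set_seed(42)  # fix the stochastic weight initialization (arbitrary seed)

window, n_features = 2, 1
model = keras.Sequential([
    keras.layers.Input(shape=(window, n_features)),
    keras.layers.LSTM(20, return_sequences=True),  # first hidden LSTM layer
    keras.layers.LSTM(20),                         # second hidden LSTM layer
    keras.layers.Dense(1),                         # one-step-ahead forecast
])
model.compile(optimizer="adam", loss="mse")

# Early stopping halts training once the monitored validation loss stops improving
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                           restore_best_weights=True)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           batch_size=16, epochs=200, callbacks=[early_stop])
```

The `fit` call is left commented because it assumes windowed training and validation arrays built from the split described above.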

Batch Size
We use the previously stated model architecture to speed up the hyperparameter search to locate the optimal batch size. Our goal is to find a batch size that minimizes the mean squared error, and the common practice is to increase the batch size by powers of two because of computational efficiencies (Kandel & Castelli, 2020). Figure 4.5 shows the results for the batch sizes and indicates an optimal batch size of 16.
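The search over powers of two can be sketched as below; `train_model` is a hypothetical stand-in for fitting the LSTM and returning its validation MSE, so the sketch runs instantly:

```python
def train_model(batch_size):
    # Placeholder cost curve: in the study this would train the network with
    # the given batch size and return the validation-set mean squared error.
    return abs(batch_size - 16) / 100 + 0.05

candidates = [2 ** k for k in range(1, 8)]        # 2, 4, 8, ..., 128
errors = {b: train_model(b) for b in candidates}
best_batch = min(errors, key=errors.get)
```

With the placeholder curve the minimum sits at a batch size of 16, mirroring the result in Figure 4.5; with real training the loop is identical, only slower.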

Combinations of Window Sizes and Neurons
Further, we investigate the combinations of window sizes and hidden neurons, which are the parameters that most affect the model's ability to learn. We implement the optimal batch size of 16 in our initial model and develop an architecture that combines the different neurons and window sizes from two to fifty. Table 4.3 reports the mean squared errors for each combination for an ATM option with six months to maturity, with neurons along the vertical axis and window size along the horizontal axis. The optimal combination of neurons and window size is highlighted; the hyperparameter search indicates that a simple model of two lags and twenty hidden neurons minimizes the mean squared error for this specific option.
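The skeleton of that grid search is sketched below; `evaluate` is a toy stand-in for training an LSTM with a given window size and neuron count and returning its validation MSE, and the step-of-10 grids are assumptions matching the search strategy described above:

```python
import itertools

def evaluate(window, neurons):
    # Placeholder cost surface with its minimum at (window=2, neurons=20);
    # in the study this would train the LSTM and return its validation MSE.
    return (window - 2) ** 2 + (neurons - 20) ** 2

windows = range(2, 51, 10)   # 2, 12, 22, 32, 42 - coarse grid in steps of 10
neurons = range(10, 51, 10)  # 10, 20, 30, 40, 50

best = min(itertools.product(windows, neurons),
           key=lambda wn: evaluate(*wn))
```

For this toy surface the search returns the same optimum as Table 4.3: a window size of 2 and 20 hidden neurons.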

Forecasting Results
For the out-of-sample forecasts, forecasting accuracy generally increases with moneyness and time to maturity. From one week to maturity to one year to maturity, the average RMSE decreases from 0,7357 to 0,2417 for the GARCH model, from 0,7303 to 0,2521 for the LSTM and from 0,8224 to 0,2473 for the Random Forest; declines of respectively 67,16%, 65,47% and 69,92%. The daily change in the implied volatility, computed as the absolute value of the average daily change for each maturity, declines by 76,21% when the maturity increases from one week to one year. The decline in RMSE with longer maturities is therefore as expected beforehand. The daily change in the implied volatility is also lower for options ATM and close to ATM than for OTM options, and it is lower for call options than for put options with the same option delta (negative risk reversal).

ATM and OTM Options Summary
The results for an ATM put and an OTM put and call for each of the five specific times to maturity are presented in Table 5.1. The first column indicates the option's level of moneyness and time to maturity, the three following columns the forecasting accuracy of the AR(1)-GARCH(1,1) model, the three mid columns the results for the LSTM model, and the three rightmost columns the results for the Random Forest model.

Table 5.1. Forecast performance for OTM put/call options and ATM put options (excerpt; the three columns per model are MSE, RMSE and MAE):

Option          AR(1)-GARCH(1,1)        LSTM                    Random Forest
1 year call 5   0,0770 0,2776 0,1195    0,0815 0,2855 0,1232    0,0841 0,2900 0,1359

The benchmark AR-GARCH model performs superior to the machine learning methods for both put and call options with longer maturities. For all options with maturities of three months or more, the AR-GARCH model outperforms the machine learning models. The Random Forest model has the lowest forecast performance of the three models for all options across the different maturities. On shorter maturities, i.e., for options with one week and one month to maturity, the LSTM and AR-GARCH models are the best-fitted models. For an OTM put option with one week to maturity, the GARCH model is the best fitted with an RMSE of 0,7913, whilst the LSTM model has an RMSE of 0,7976. The Random Forest model performs significantly poorer with an RMSE of 0,9098, an increase of 14,97% compared to the GARCH model. It is somewhat surprising that the GARCH outperforms the LSTM model for this particular option, considering this is the most volatile of the fifty-five options. The LSTM performs better in terms of MAE, meaning it is not as robust to outliers in the test set as the GARCH model. When we perform a Diebold-Mariano (DM) test for this option, we see that there are no significant differences between LSTM and GARCH or between LSTM and RF. However, according to the DM test, the AR-GARCH is significantly better than the RF for one-week options. The AR-GARCH is significantly better than the LSTM for the ATM option, but not for OTM options, when time to expiration is one week.

One Week to Maturity
For all other options with a maturity of one week, the LSTM model outperforms the benchmark models on RMSE and MAE. An exception is a put option with a delta of 35, where the benchmark GARCH model has an MAE 1,96% lower than the LSTM, and a put option with a delta of 10, for which both models have an RMSE of 0,7635, a forecast accuracy 7,85% better than the Random Forest model. On average, for all options with one week to maturity, the LSTM outperforms the GARCH model by 0,75% in RMSE and 2,77% measured by MAE, whilst the LSTM is 11,07% and 14,99% lower than the Random Forest in terms of RMSE and MAE, respectively. These findings show that the benchmark AR-GARCH model is not significantly poorer than the LSTM model at shorter maturities, whilst the LSTM outperforms the Random Forest model considerably.
For options with one week to maturity, there is no significant difference between the LSTM and RF forecasts, according to the DM test.
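A simplified one-step-ahead Diebold-Mariano comparison can be sketched as below, using squared-error loss and the asymptotic normal approximation without a small-sample correction; the input series are assumed, not the study's forecasts:

```python
import numpy as np
from scipy import stats

def diebold_mariano(actual, f1, f2):
    """DM statistic and two-sided p-value comparing two forecast series.

    Negative statistic => forecast f1 has the smaller squared-error loss.
    """
    e1 = np.asarray(actual) - np.asarray(f1)
    e2 = np.asarray(actual) - np.asarray(f2)
    d = e1 ** 2 - e2 ** 2                   # loss differential series
    n = len(d)
    dm = d.mean() / np.sqrt(d.var(ddof=1) / n)
    p = 2 * (1 - stats.norm.cdf(abs(dm)))   # asymptotically standard normal
    return dm, p
```

For multi-step forecasts the denominator would need a HAC (Newey-West) variance estimate; the plain sample variance here is adequate only for one-step-ahead errors.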

One Month to Maturity
When the time to maturity increases to one month, the results fluctuate more. For the OTM and ATM put and call options depicted in Table 5.1, the benchmark AR-GARCH model seems to deliver the best forecast accuracy of the three models measured by RMSE. The LSTM performs equally well for the OTM call option and beats the benchmark AR-GARCH on MAE for the OTM put option. When comparing RMSE for all levels of moneyness, the benchmark AR-GARCH model performs on average 0,78% better than the LSTM model. The RF overestimates the peaks from COVID-19, whereas the LSTM model underestimates these shocks. All through the test period, which stretches from the end of September 2018 to August 2021, the AR-GARCH fits the rapid changes in implied volatility better than the machine learning models, especially around the extensive shocks, where the implied volatility rises significantly.

Longer Maturities
For all options with a time to maturity of three months and longer, the simple benchmark AR(1)-GARCH(1,1) model proved superior to the more complicated machine learning models across all moneyness levels. Due to non-stationarity at a 5% significance level, the first difference is applied for all options when the maturity reaches three months. The benchmark AR-GARCH model improves relative to the LSTM model with increasing maturity. At the same time, the Random Forest comes closer to the LSTM with increasing maturity, measured in average RMSE. However, the LSTM still outperforms the Random Forest for all moneyness levels except for a three-month put option with a delta of five and a call option with a delta value of 35. On average, the difference in RMSE between the Random Forest and LSTM declines from 11,07% for one-week options to 2,80% for one-year options. When comparing the MAE for the LSTM and Random Forest models, the differences are more significant, varying from the lowest for the three-month option at 8,24% to 14,99% for the one-week option. Interestingly, the difference in MAE between the two models increases as the time to maturity increases, with respectively 10,08% at six months and 11,25% at one year, while the difference in RMSE decreases. When conducting the DM test on the longer maturities, the results indicate no statistically significant difference between the forecasts. This result is expected, as the day-to-day changes in implied volatility decrease as maturity increases.

Other Findings
The distribution of the changes in implied volatility has high peaks and fat tails. As mentioned in Section 3, the distribution changes with time to maturity. When regressing in-sample, we assume that the residuals follow a normal distribution. This precondition is applied in the in-sample regressions for the benchmark AR(1)-GARCH(1,1) model. Performing the same regressions assuming a Student t-distribution, the average RMSE declined by 1,36% for one-week to maturity options, making the Student t-distributed model superior for forecasting compared to the benchmark AR-GARCH model. For options with one month to maturity, the Student t-distributed model performs 0,24% better than the benchmark AR-GARCH model. The t-distribution fits data with mean clustering and fat tails better than the normal distribution, and our findings support that the t-distributed models fit the data better for shorter maturities. The benchmark AR(1)-GARCH(1,1) model with normal distribution is better than the t-distribution for the first-order integrated options with a maturity of three months and more. We note that the in-sample goodness of fit decreases with maturity for the Student t-distributed model compared to the normal distribution model used as the benchmark.

Conclusion
The main objective of this study is to compare the predictive power of LSTM models with that of Random Forest and AR-GARCH-type models for forecasting the implied volatility of currency options. All regressions are conducted on daily observations of the spot rate of implied volatility for EUR/USD FX options. Implied volatility is of interest to market participants for hedging and trading purposes. To our knowledge, there are only a few prior studies applying machine learning models to forecast the implied volatility of currency options (see Section 2 above).
We find that the AR-GARCH model outperforms the LSTM model for longer maturities, and the RF model is the poorest overall forecaster. The LSTM is the better model for shorter maturities. Shorter maturity options are more volatile than the longer maturities, and the LSTM seems to capture rapid changes better than the benchmark models, which is consistent with the findings in previous literature. The LSTM model is able to capture immense and immediate changes in implied volatility, which is important for hedging against significant shifts in FX rates.
Overall, the Random Forest model is a poorer forecaster of implied volatility than the LSTM and AR-GARCH models for all moneyness levels and times to maturity.

Figures 5.4, 5.5 and 5.6 plot the forecasted values for ATM one-month-to-maturity put options for the LSTM, RF and AR-GARCH models. The plots show that the RF model struggles with the extensive shocks in implied volatility, especially around March 2020, when the COVID-19 pandemic had its outbreak worldwide. The RF overestimates the peaks from COVID-19, whereas the LSTM model underestimates them.

Figure 3.1 Volatility smiles for each distinct maturity, calculated as an average across the data sample. Top left is one year to maturity, top right six months, mid left three months, mid right one month and bottom one week to maturity. The level of implied volatility is along the vertical axis, and the level of moneyness along the horizontal axis.

Figure 4.1 80:20 data split for training and test set; the option exhibited is an ATM put option with six months to maturity.

Figure 4.4 Data split into training, validation and test sets for the LSTM model.

Figure 4.5 Optimal batch size for the LSTM network on an ATM (delta 50) put option with six months to maturity.

Figure 5.1 Forecast results for the LSTM model on an ATM one-week-to-maturity option, plotted against the actual spot implied volatility.

Figure 5.2 Forecast results for the Random Forest model on an ATM one-week-to-maturity option, plotted against the actual spot implied volatility.

Figure 5.3 Forecast results for the benchmark AR-GARCH model on an ATM one-week-to-maturity option, plotted against the actual spot implied volatility.

Figure 5.4 Forecast results for the LSTM model on an ATM one-month-to-maturity option, plotted against the actual spot implied volatility.

Figure 5.5 Forecast results for the Random Forest model on an ATM one-month-to-maturity option, plotted against the actual spot implied volatility.

Figure 5.6 Forecast results for the AR-GARCH model on an ATM one-month-to-maturity option, plotted against the actual spot implied volatility.

implemented deep learning to classify whether stock returns are positive or negative one year ahead. More recently, Krauss et al. (2017) used various machine learning models, such as deep learning and tree-based methods, to model S&P 500 constituents, and revealed that deep learning models performed exceptionally well in times of market turmoil. Yu and Li's (2018) findings are consistent with the claim that deep learning networks perform well during market turmoil: they forecast the volatility of the Shanghai composite stock price index using LSTM and GARCH, selecting only extreme values (highs and lows), and concluded that the LSTM model was superior. A paper somewhat similar to ours is Namin and Namini (2018). They compare an ARIMA model and a univariate multistep LSTM model imposed by Brownlee (2016) on different stock indexes, and conclude that the LSTM model outperforms the ARIMA model. Galakis and Vrontos (

Table 3.2 Descriptive statistics of implied volatility for options with one week to maturity. The put with delta 50 is ATM, and put and call options become increasingly OTM as the delta value decreases; Put 5 indicates a put option with an option delta of 5. Different quantiles measure the level of implied volatility throughout the data sample.

Table 3.3 Descriptive statistics of implied volatility for options with one year to maturity. The put with delta 50 is ATM, and put and call options become increasingly OTM as the delta value decreases; Put 5 indicates a put option with an option delta of 5. Different quantiles measure the level of implied volatility throughout the data sample.
Common practice is to use 80% of the dataset as a training set and 20% as an out-of-sample test set. This split produces 2396 observations for the training set and 599 for the out-of-sample test set.

4.2.1 Econometric Model
To confirm the presence of autoregressive conditional heteroskedasticity in the data, we perform Engle's Lagrange multiplier test, also known as an ARCH-LM test. The results of the ARCH-LM test can only be interpreted as an indication of whether ARCH effects are present or not, and according to Sjölander (2010), the test is biased in finite samples. It does not consider whether the stationarity constraints are met. Test results
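The chronological 80:20 split described above can be sketched as follows. The total of 2995 observations is implied by the reported counts (2396 + 599), and the placeholder series below stands in for the implied-volatility data.

```python
def train_test_split_ts(series, train_frac=0.8):
    """Chronological 80:20 split -- no shuffling, so the test set stays
    strictly out of sample in time."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

# With 2995 daily observations, an 80:20 split gives 2396 / 599,
# matching the counts reported above.
series = list(range(2995))  # placeholder for the implied-volatility series
train, test = train_test_split_ts(series)
print(len(train), len(test))  # → 2396 599
```

Keeping the split chronological matters for time series: a shuffled split would leak future information into the training set.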

Table 4.1 Search for optimal features. We selected m = p for our RF model.

Table 4.2 Search for optimal window size. Increasing the window size has a negligible effect on MSE.
, we choose to use the Keras LSTM built-in activation function tanh, as this function seems to work better for our datasets. We use sigmoid for the recurrent activation function and Adam as our optimizer. Adam is a variant of mini-batch gradient descent that adjusts the learning rate at each iteration for each model parameter (Chollet, F., & others, 2015). Our model is specified to minimize the mean squared error. Initially, we construct our model architecture with 300 epochs, a batch size of 64, a window size of 50 and 50 hidden neurons. Another issue with LSTM is overfitting: with excessive training, the model will learn the statistical noise in the training set, predicting the next value based on memory. To avoid overfitting, we implement
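The window-size-50 input construction described above can be sketched in plain Python before the arrays are handed to the Keras network. This is an illustrative sketch; the placeholder series below is hypothetical.

```python
def make_windows(series, window=50):
    """Turn a univariate series into (input window, next value) supervised
    pairs, as used to feed the LSTM: X_t = [y_{t-window}, ..., y_{t-1}],
    target y_t."""
    X, y = [], []
    for t in range(window, len(series)):
        X.append(series[t - window:t])
        y.append(series[t])
    return X, y

series = [float(i) for i in range(60)]  # placeholder implied-volatility series
X, y = make_windows(series, window=50)
print(len(X), len(X[0]), y[0])  # → 10 50 50.0
```

Each training example is thus the previous 50 observations, and the label is the next day's implied volatility.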

Table 4.3 Reports the 36 different mean squared error estimates for a delta 50 put option with six months to maturity, for different choices of neurons and window size. The number of

Table 5.1 The first column indicates the level of moneyness, measured in delta, for the different maturities. Delta 50 is the ATM option, and delta 5 is the OTM put and call option. The highlighted value indicates the best fitted value for that particular option for MSE, RMSE and MAE, respectively.
1,56% more accurate measured by MAE. The benchmark AR-GARCH model captures the outliers, i.e., significant sudden changes in the volatility, better than the LSTM model. It is also interesting that the LSTM model outperforms both benchmark models in terms of RMSE and MAE for OTM call options, i.e., options with a delta of 25 and lower. Comparing the LSTM to the Random Forest model, the RMSE and MAE are 4,97% and 10,82% lower for the LSTM model. The LSTM model outperforms the Random Forest model more often for call options than for put options. According to the DM test, the LSTM is significantly better than the RF for OTM call options, but not significantly better for ATM and OTM put options. Comparing the LSTM to the AR-GARCH model, the same applies: the LSTM is significantly better for OTM call options, and there is no significant difference for ATM and OTM put options. When comparing AR-GARCH to RF, RF is significantly poorer for ATM and OTM put options, and there are no significant differences for OTM call options.
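The DM (Diebold-Mariano) comparisons referenced above can be illustrated with a minimal one-step-ahead version of the test statistic. This sketch uses a squared-error loss and omits the HAC variance correction needed for multi-step forecasts; the toy error series are made up.

```python
import math

def dm_statistic(errors_a, errors_b, loss=lambda e: e * e):
    """One-step-ahead Diebold-Mariano statistic: the mean loss differential
    divided by its standard error. Asymptotically N(0, 1), so |DM| > 1.96
    suggests a significant difference at the 5% level; positive values
    favour model B."""
    d = [loss(a) - loss(b) for a, b in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    return mean_d / math.sqrt(var_d / n)

# Hypothetical forecast errors: model A is systematically less accurate than B,
# so the statistic comes out positive.
errors_a = [0.30, 0.25, 0.35, 0.40, 0.30]
errors_b = [0.10, 0.12, 0.08, 0.15, 0.10]
```

Swapping the two error series flips the sign of the statistic, which is why the test is symmetric in the pair of models being compared.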