1 Introduction

In practice, local volatility models (LVMs), stochastic volatility models (SVMs), and hybrid models (LSVMs) that combine them are widely used to model asset prices. However, exact analytical solutions for the prices of European options written on such underlying assets often do not exist. Meanwhile, in practice, the financial instruments that have sufficient liquidity to be used for model calibration are mostly limited to European options. Therefore, the feasibility of an efficient calculation of European option prices is a bottleneck in the modeling of underlying assets. Among the important studies, we refer to Rubinstein (1983), Dupire (1994), and Marris (1999) for LVMs and Hull and White (1987), Heston (1993), and Schöbel and Zhu (1999) for SVMs.

In LSVMs, a stock price S and its volatility v are assumed to follow the stochastic differential equation (SDE):

$$\begin{aligned} \left\{ \begin{array}{rcl} \displaystyle \frac{\textrm{d}S_t}{S_t} &=& (r(t) - d(t)) \textrm{d}t + \sigma (S_t, v_t) \textrm{d}W^S_t, \\ \textrm{d}v_t &=& \left( \theta (t) - \kappa (t) v_t \right) \textrm{d}t + \gamma (v_t) \textrm{d}W^v_t \end{array} \right. \end{aligned}$$
(1.1)

where the short rate r(t), the dividend rate d(t), \(\theta (t)\), and \(\kappa (t)\) are deterministic functions of time t, \(\sigma (s,v)\) is a deterministic function of the asset price and volatility, \(\gamma (v)\) is a deterministic function of volatility, and \(W^{S}\) and \(W^{v}\) are standard Brownian motions under the risk-neutral probability measure with \(\textrm{d}W^{S}_t \textrm{d}W^{v}_t = \rho \textrm{d}t\). It is assumed throughout this study that \(\sigma (s,v)\) and \(\gamma (v)\) are infinitely differentiable with respect to (s, v) and v, respectively. See, for example, Funahashi (2014) for LSVMs.

Many previous studies have addressed the pricing of financial derivatives that lack analytic solutions. For example, methods that numerically solve the stochastic differential equations using the finite difference method (FDM) or Monte Carlo (MC) simulation have been widely proposed. See, for example, Duffy (2006) for FDM and Brigo and Mercurio (2006) and Valentina (2023) for MC. However, calibrating the model using these methods becomes computationally intensive, as it involves solving an optimization problem through techniques such as Newton–Raphson, simplex, and Levenberg–Marquardt. These techniques iteratively update the model parameters until a stopping criterion is met. Practically, frequent re-calibration is required for trading desks and risk management purposes. Hence, the efficient calculation of European option prices remains a significant bottleneck in the modeling of underlying assets.

Many approximation methods have been proposed to fill this gap. In this study, we refer to some approximation approaches that have high generality. Fouque et al. (2003) used the singular perturbation method for stochastic volatility models, asymptotically expanding the partial differential equation around the invariant distribution of a stochastic volatility model to calculate an approximate solution for the option price. Hagan et al. (2002) derived the pricing formula for European call and put options when the underlying asset follows the SABR model using the singular perturbation method. Methods for approximating the transition probability and likelihood function of diffusion processes have also been proposed in Aït-Sahalia (2002, 2008). The author approximated the transition probability by expanding it in Hermite polynomials and calculated an approximate solution of the log-likelihood function when the underlying asset follows a multidimensional process. The asymptotic expansion method is also an active area of research. This approach is based on the idea of expanding the solution of a stochastic differential equation (SDE) into a power series of a small parameter, such as volatility. By truncating the series at a certain order, one can obtain an approximate solution for derivative prices or their Greeks. Takahashi (1999) applied this method to price European and average call options under general Markovian processes for underlying asset prices. Funahashi (2014) applied the Wiener–Itô chaos expansion to derive approximate closed-form formulas for vanilla option prices under LVMs and SVMs.

The use of artificial intelligence in the financial industry has made remarkable progress in the past 10 years. Particularly, in the field of financial engineering and quantitative finance, deep learning (DL) with artificial neural networks (ANNs) has been used to solve hedging and derivative pricing problems. Efforts to use DL and ANNs as alternatives to numerical computation and approximate solutions are being actively researched. Neural networks have the potential to handle derivatives with complex payoffs that cannot be solved analytically by traditional mathematical models because of their high ability to approximate nonlinear functions. Moreover, they can efficiently price many derivatives because of their rapid computation and parallel processing. An excellent, comprehensive review of the literature can be found in Ruf and Wang (2020).

Neural networks comprise two processes: training, in which the parameters of the neural network model are fitted to the training data, and prediction, in which values are estimated using the learned results. The former requires generating a large amount of training data and adjusting the hidden-layer parameters of the neural network through supervised learning, which is a time-consuming process; the latter, however, can estimate values rapidly. One advantage of using neural networks for pricing derivatives is that the time-consuming learning process can be performed offline, while online prediction is used in daily trading to swiftly calculate prices.

Hernandez (2017) applied neural networks to the calibration procedure that fixes the pricing-model parameters to highly liquid financial instruments. The author trained a feedforward neural network to directly return the calibrated parameters of a derivative pricing model. More specifically, after all of the training sets were generated using an option pricing formula, he swapped the roles of the option prices and model parameters so that the network directly returns the calibrated parameters of the pricing model. He showed that this inverse mapping method performs the calibration task directly with neural networks. An important advantage of this method is that it eliminates the need for iterative calculations by pre-learning the relationship between model parameters and model prices with a neural network.

Itkin (2015) suggested some limitations of the inverse mapping method proposed by Hernandez (2017), such as the lack of control over the inversion function. Therefore, the two-step process has recently become the dominant method: First, a feedforward neural network is trained offline with simulated data to estimate the value function for a given asset pricing model. Subsequently, model parameters are calibrated online with a traditional optimization method. In this process, the price calculation formula depends on the rapid predictions made by the neural network. Similar methods have been adopted in various studies, such as Liu et al. (2019a, b) and Horvath et al. (2021), and the references they cite. These studies examined different pricing models, such as the Black-Scholes, Heston (1993), and Bates (1996) models, and the rough Bergomi model. The results showed that ANNs can greatly reduce computation time, thus reducing the importance of calibration speed in model selection.

However, these methods incur a high computational cost for the offline training. This is because directly training an ANN model for option pricing requires a large number of numerical simulations to produce training and testing data. To achieve a high level of accuracy in derivative pricing, neural networks usually need between 100,000 and 1,000,000 training data points. The actual number of data points depends on factors such as the contract's expiry date, the volatility of the underlying asset, and the volatility of volatility. Moreover, financial firms handle thousands of products, each associated with several pricing models. To generate these training data points offline, they must run numerical simulations such as MC and PDE methods for each product and model combination. This task is remarkably computationally intensive, even if it is performed only once or a few times a year.

Efficiently training neural networks is an important area of research in the field of derivative pricing. McGhee (2018) proposed an accurate integration scheme for the SABR model (instead of the two-factor finite difference scheme, which is more accurate but time-consuming) and ran it 300,000 times to generate data sets for training and testing ANN models. The author showed that an ANN can construct highly efficient representations of both the integration scheme and the two-factor finite difference scheme. Funahashi (2021a) combined the advantages of asymptotic expansion (AE) and neural networks (ANNs) by training an ANN to learn the residual term between the option price C and its asymptotic approximation \({\bar{C}}\). This improved the stability and approximation accuracy of previous methods because (i) the option price, C, can start from an approximated price \({\bar{C}}\) that is adjacent to the original value, and the variance of the training data is reduced significantly, (ii) the residual term is a smooth and infinitely differentiable function, and (iii) the derivative of the residual term with respect to volatility is no longer bell-shaped, so exploding gradients are less likely to occur. See also Buccioni (2023) for a detailed discussion of this approach. Funahashi (2021a) showed empirically that this method can safely reduce the training set size to roughly one-hundredth to one-thousandth of that required by standard ANN training, with fewer layers and nodes, making the ANN training and prediction more robust. This method lowers the computational cost of the computationally expensive offline procedure and simultaneously increases the stability and accuracy of the online prediction of derivative prices. Funahashi (2023) applied the same method to price options under the SABR model.
The author trained an ANN to learn the difference between the implied volatility values obtained by numerical computation and those obtained by Hagan’s approximation formula. This enables one to calculate the implied volatility of deep-in-the-money and deep-out-of-the-money options more accurately and efficiently than conventional approximation methods. However, this method is not flawless either. Approximation methods are common in option pricing and many useful ones exist in the literature. Nonetheless, as the model and product become more realistic and complex, approximation formulas often become either unavailable or cumbersome and challenging to compute.

A similar approach was proposed by Kienitz et al. (2020). The authors used the difference between a target option price C on an original underlying asset process S and the option price \({\bar{C}}\) on a completely different model \({\bar{S}}\) that admits a tractable solution for \({\bar{C}}\). The authors regard this method as a control variate (CV) for neural networks; similar to its application in MC simulations, they used a completely different model for the approximate price to improve the quality of deep learning applied to option pricing problems. The ANN with quasi-process correction based on a different model has lower accuracy and slower convergence than the asymptotic correction based on the same model, as in Funahashi (2021a). However, it is more widely applicable because it does not require a complex approximation formula. This makes it easier to implement and use; hence, it can be applied to a wider range of options. Notably, as the approximation order increases, the asymptotic approximation becomes tedious and messy, and the number of expansion terms increases exponentially. However, Kienitz et al. (2020) do not reveal how to decide on a suitable model and the corresponding parameters; hence, their method is not yet suitable for practical applications. Accordingly, one of the aims of this study is to establish a unified approach to determining suitable parameters and appropriate models for the ANN with quasi-process correction. As will be demonstrated shortly, if one chooses a model with a distribution different from the original one, convergence will be slow, especially in deep-in-the-money and deep-out-of-the-money cases, resulting in worse predictions than direct ANN learning.

This paper is organized as follows: The next section introduces three previous studies that form the basis of this study. First, we provide an overview of the asymptotic approximation of derivative prices. Second, we introduce Funahashi's (2021a) method for improving the stability of neural networks by learning the difference between the price of derivatives and its asymptotic approximation. Third, we explain the quasi-process correction for neural networks. In Sect. 3, we propose and establish a new unified approach for determining suitable parameters and appropriate models for the ANN with quasi-process correction. Section 4 is devoted to numerical examples. By comparing the methods of Funahashi (2021a) and Kienitz et al. (2020) using European and Barrier options, we show that the former method has significantly higher accuracy for learning and prediction than the latter. However, as will be observed, the latter method does not require a complex approximation, and if one appropriately selects the base approximation model and sets the correct parameters of the selected model, it requires only one-tenth of the training data needed to directly learn the price of derivatives, even if a relatively simple model is used. The latter method proves particularly advantageous in cases where efficient approximation prices for derivatives are unavailable. In Sect. 5, we examine the circumstances in which our proposed neural network effectively learns derivative prices in the context of a complex local stochastic volatility model, using the SABR model as the base quasi-process. Finally, Sect. 6 concludes this paper.

2 Background

Before proposing our deep learning method for derivative pricing, this section summarizes the results obtained in previous studies to provide a foundation for the analysis given in the next section. To enhance intuitive understanding, in this section, we begin with a relatively simple model and progressively extend it to LSVM (1.1).

The price of the underlying asset \(S_t(\omega ) = S_t\) for \({0 \le t \le T}\) is assumed to follow the SDE

$$\begin{aligned} \frac{\textrm{d}S_t}{S_t} = \left( r(t) - d(t) \right) \textrm{d}t + \sigma (S_t, t) \textrm{d}W^S_t \end{aligned}$$
(2.1)

where \(\{W^S_t\}_{t \ge 0}\) is a standard Brownian motion under the risk-neutral measure. This model is called a local volatility model and is a special case of (1.1).

Suppose that the SDE (2.1) admits a solution. Denoting \(\Vert g \Vert _{t}^2 = \int _{0}^{t} g^2(u)\textrm{d}u\) and \(J_t(g)=\int _{0}^{t} g(u)\textrm{d}W^S_u\), we apply Itô's formula to obtain

$$\begin{aligned} S_t = F(0,t) \exp \left[ J_t(\sigma ) - \frac{1}{2} \Vert \sigma \Vert ^2_t \right] \end{aligned}$$
(2.2)

where \(F(0,t)= S_0 \textrm{e}^{\int _0^t (r(s)-d(s)) \textrm{d}s}\) is the forward price.
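To make the representation (2.2) concrete, the following sketch checks it by simulation in the constant-coefficient case \(\sigma (S_t,t) = \sigma \), where \(J_t(\sigma ) = \sigma W_t\) and \(\Vert \sigma \Vert ^2_t = \sigma ^2 t\); the parameter values are hypothetical.

```python
import numpy as np

# Monte Carlo sanity check of (2.2) with sigma(S_t, t) = sigma constant:
# then S_T = F(0,T) * exp(sigma*W_T - 0.5*sigma^2*T), and E[S_T] = F(0,T)
# because the stochastic exponential is a martingale.
rng = np.random.default_rng(0)
S0, r, d, sigma, T = 100.0, 0.02, 0.01, 0.2, 1.0
F0T = S0 * np.exp((r - d) * T)                        # forward price F(0,T)

W_T = np.sqrt(T) * rng.standard_normal(400_000)       # terminal Brownian values
S_T = F0T * np.exp(sigma * W_T - 0.5 * sigma**2 * T)  # equation (2.2)

rel_err = abs(S_T.mean() / F0T - 1.0)
print(F0T, S_T.mean(), rel_err)
```

The sample mean of \(S_T\) matches the forward price up to Monte Carlo error, consistent with the exponential-martingale form of (2.2).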

2.1 Wiener Itô Chaos expansion

We assume the following condition, which can be regarded as a stochastic version of the Picard iteration:

Assumption 2.1

Let \(S^{(0)}_t = F(0,t)\), where \(F(0,t)= S_0 \textrm{e}^{\int _0^t (r(s)-d(s)) \textrm{d}s}\), and \(S_t^{(m)}\) is defined successively by

$$\begin{aligned} S_t^{(m+1)} = F(0,t) \exp \left[ J_t(\sigma ^{(m)}) - \frac{1}{2} \Vert \sigma ^{(m)} \Vert ^2_t \right] , \end{aligned}$$
(2.3)

where \(\sigma ^{(m)}(t) = \sigma (S_t^{(m)}, t)\). It is assumed throughout the rest of the study that \(S_t^{(m)}(\omega )\) converges to \(S_t(\omega )\) as \(m \rightarrow \infty \) for P-a.s. \(\omega \in \Omega \).

Although Funahashi and Kijima (2015) gave a sufficient condition for the convergence in Assumption 2.1, the condition is often too strong for practical use. Hence, we only assume the successive substitution (2.3) in the following development.
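The successive substitution (2.3) can be illustrated numerically along a single discretized Brownian path; the displaced-diffusion-type local volatility and all parameter values below are hypothetical choices.

```python
import numpy as np

# Stochastic Picard iteration (2.3) along one fixed discretized Brownian path,
# with a displaced-diffusion local volatility sigma(s) = eps*(beta*s+(1-beta)*S0)/s
# and r = d = 0, so that F(0,t) = S0.  All parameter values are hypothetical.
rng = np.random.default_rng(1)
S0, eps, beta, T, n = 1.0, 0.3, 0.5, 1.0, 2_000
dt = T / n
dW = np.sqrt(dt) * rng.standard_normal(n)

def sigma(s):
    return eps * (beta * s + (1.0 - beta) * S0) / s

S = np.full(n + 1, S0)                        # S^(0)_t = F(0,t) = S0
gaps = []
for m in range(12):
    vol = sigma(S[:-1])                       # sigma^(m)(t_i), left endpoint
    J = np.concatenate(([0.0], np.cumsum(vol * dW)))       # J_t(sigma^(m))
    Q = np.concatenate(([0.0], np.cumsum(vol**2 * dt)))    # ||sigma^(m)||_t^2
    S_next = S0 * np.exp(J - 0.5 * Q)         # S^(m+1) from (2.3)
    gaps.append(np.max(np.abs(S_next - S)))
    S = S_next

print(gaps)   # sup-norm gap between successive iterates
```

On this path the gap between successive iterates shrinks rapidly, illustrating the pathwise convergence assumed above.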

Under this assumption, the third-order chaos expansion approximation of the process is as follows:

$$\begin{aligned} X_t : =\frac{S_t}{F(0,t)} - 1 = a_1(t) + a_2(t) + a_3(t) + R_4 , \end{aligned}$$
(2.4)

where \(R_n\) represents the contributions of multiple stochastic integrals of order n or higher, and \(a_1\), \(a_2\), and \(a_3\) are the first-, second-, and third-order chaos expansion terms, respectively:

$$\begin{aligned} a_1(t)&= \int \limits _{0}^{t} p_{1}(s) \textrm{d}W^S_{s}, \quad a_2(t) = \int \limits _{0}^{t} p_{2}(s) \left( \int \limits _{0}^{s} \sigma _{0}(u) \textrm{d}W^S_{u} \right) \textrm{d}W^S_{s}, \\ a_3(t)&= \int \limits _{0}^{t} p_{3}(s) \left( \int \limits _{0}^{s} \sigma _{0}(u) \left( \int \limits _{0}^{u} \sigma _{0}(r) \textrm{d}W^S_{r} \right) \textrm{d}W^S_{u} \right) \textrm{d}W^S_{s} \\ &\quad + \int \limits _{0}^{t} p_{4}(s) \left( \int \limits _{0}^{s} p_{5}(u) \left( \int \limits _{0}^{u} \sigma _{0}(r) \textrm{d}W^S_{r} \right) \textrm{d}W^S_{u} \right) \textrm{d}W^S_{s}. \end{aligned}$$

Notably, \(a_1(t)\) follows a normal distribution with zero mean and variance \(\Sigma _t = \int _0^t p_1^2(s) \textrm{d}s\). The \(p_k(t)\) are all deterministic functions:

$$\begin{aligned} p_{1}(s)&:= \sigma _{0}(s) + F(0,s) \sigma '_{0}(s) \left( \int \limits _{0}^{s} \sigma ^{2}_{0}(u) \textrm{d}u \right) + \frac{1}{2} F^2(0,s) \sigma ''_{0}(s) \left( \int \limits _{0}^{s} \sigma ^{2}_{0}(u) \textrm{d}u \right) ,\\ p_{2}(s)&:= \sigma _{0}(s) + F(0,s) \sigma '_{0}(s), \\ p_{3}(s)&:= \sigma _{0}(s) + 3 F(0,s) \sigma '_{0}(s) + F^2(0,s) \sigma ''_{0}(s),\\ p_{4}(s)&:= \sigma _{0}(s) + F(0, s) \sigma '_{0}(s), \\ p_{5}(s)&:= F(0, s) \sigma '_{0}(s), \end{aligned}$$

with \(\sigma _0(t) = \sigma (F(0,t), t)\), \(\sigma '_0(t) = \partial _x \sigma (x, t)|_{x=F(0,t)}\), \(\sigma ''_0(t) = \partial _{xx} \sigma (x, t)|_{x=F(0,t)}\).

We can justify this approximation through an analysis based on a small-volatility expansion. Let us denote \(f_0(t) = \sigma _0(t)\), \(f_i(t) = p_i(t)\) for \(i = 1, \ldots , 5\), and \(\bar{f}(t) = \max _k f_k(t) \in L_2([0,t])\) for all t. We then have

$$\begin{aligned} \mathbb {E}[a_n^2] \le \Vert \bar{f}\Vert _t^{2n} /n! \end{aligned}$$
(2.5)

Therefore, if \(\Vert \bar{f}\Vert _t\) is sufficiently small, the sum of iterated integrals beyond the nth order can be approximated as zero. More intuitively, to emphasize that the volatility is small, we rewrite \(\sigma _0 \rightarrow \epsilon \sigma _0\) and \(p_i(t) \rightarrow \epsilon p_i(t)\) and obtain

$$\begin{aligned} a_1(t) \rightarrow \epsilon a_1(t), \ \ a_2(t) \rightarrow \epsilon ^2 a_2(t), \ \ a_3(t) \rightarrow \epsilon ^3 a_3(t), \ \ R_4 \rightarrow \epsilon ^4 R_4 . \end{aligned}$$
(2.6)

We now insert these results into (2.4) to get

$$\begin{aligned} X_t: =\frac{S_t}{F(0,t)} - 1 \approx \epsilon a_1(t) + \epsilon ^2 a_2(t) + \epsilon ^3 a_3(t) + O(\epsilon ^4). \end{aligned}$$
(2.7)

Moreover, we note that because the right-hand side of (2.5) is divided by the factorial of n, it accelerates the convergence \(\mathbb {E}[a_n^2] \rightarrow 0 \ (n \rightarrow \infty )\) and improves the approximation accuracy of (2.4). In this study, we omit terms involving iterated integrals higher than the third order; that is, we set \(R_n \approx 0\) for \(n \ge 4\).
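The bound (2.5) can be checked by simulation in the simplest non-trivial case; the constant-coefficient setting below is a hypothetical test case in which the bound for n = 2 is attained exactly.

```python
import numpy as np

# Monte Carlo check of the moment bound (2.5) for n = 2 with constant
# coefficients p_2(s) = sigma_0(s) = c (a hypothetical test case).  Then
# a_2(t) = c^2 * int_0^t W_s dW_s and E[a_2^2] = c^4 t^2 / 2, which equals
# the bound ||f_bar||_t^{2n}/n! with f_bar = c and n = 2.
rng = np.random.default_rng(2)
c, T, n_steps, n_paths = 0.3, 1.0, 400, 100_000
dt = T / n_steps

W = np.zeros(n_paths)        # Brownian paths
I = np.zeros(n_paths)        # Ito sums approximating int_0^t W_s dW_s
for _ in range(n_steps):
    dW = np.sqrt(dt) * rng.standard_normal(n_paths)
    I += W * dW              # left-endpoint (Ito) evaluation
    W += dW

a2 = c**2 * I
bound = (c**2 * T) ** 2 / 2.0       # ||f_bar||_t^4 / 2!
print(np.mean(a2**2), bound)
```

The simulated second moment matches the theoretical value \(c^4 t^2/2\) up to Monte Carlo and discretization error.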

Let \(\Psi ^S(\xi )\) be the characteristic function of \(X_t\). Substituting (2.4), we obtain

$$\begin{aligned} \Psi ^S(\xi ) &= E[\textrm{e}^{i \xi X_t}] \approx E[\textrm{e}^{i \xi (a_1(t) + a_2(t) + a_3(t))}] \nonumber \\ &= E \left[ \textrm{e}^{i \xi a_1(t)} \left\{ 1 + i \xi a_2(t) + i \xi a_3(t) - \frac{1}{2} \xi ^2 a^2_2(t) + R_4 \right\} \right] \end{aligned}$$
(2.8)

Following the same approximation strategy, the remainder term can be ignored because, by the Cauchy–Schwarz inequality,

$$\begin{aligned} E[\textrm{e}^{i \xi a_1(t)} R_4] \le E[|\textrm{e}^{i \xi a_1(t)}|^2]^{\frac{1}{2}} E[|R_4|^2]^{\frac{1}{2}} = E[|R_4|^2]^{\frac{1}{2}} \approx 0. \end{aligned}$$

Hence, \(\Psi ^S(\xi )\) reduces to

$$\begin{aligned} \Psi ^S(\xi ) &\approx E \left[ \textrm{e}^{i \xi a_1(t)} \right] + i \xi E \left[ \textrm{e}^{i \xi a_1(t)} E[a_2(t) | a_1(t) ] \right] \nonumber \\ &\quad + i \xi E \left[ \textrm{e}^{i \xi a_1(t)} E[a_3(t) | a_1(t) ] \right] - \frac{1}{2} \xi ^2 E \left[ \textrm{e}^{i \xi a_1(t)} E[a^2_2(t) | a_1(t)] \right] . \end{aligned}$$
(2.9)

Now, using formulas provided in Appendix D of Funahashi and Kijima (2015), which are one-dimensional (1D) versions of Lemma 2.1 in Takahashi (1999), we can explicitly compute the conditional expectations as follows:

$$\begin{aligned} \mathbb {E}[ a_{2}(t) | a_{1}(t) = x ] = q^S_{1}(t) \left( \frac{x^{2}}{(\Sigma ^S_{t})^{2}}- \frac{1}{\Sigma ^S_t} \right) , \end{aligned}$$
(2.10)
$$\begin{aligned} \mathbb {E}[ a_{3}(t) | a_{1}(t) = x ] = q^S_{2}(t) \left( \frac{x^{3}}{(\Sigma ^S_{t})^{3}}- \frac{3x}{(\Sigma ^S_{t})^{2}} \right) , \end{aligned}$$
(2.11)
$$\begin{aligned} \mathbb {E}[ a^2_{2}(t) | a_{1}(t) = x ] = q^S_3(t) \left( \frac{x^{4}}{(\Sigma ^S_{t})^{4}} - \frac{6x^{2}}{(\Sigma ^S_{t})^{3}} + \frac{3}{(\Sigma ^S_{t})^{2}} \right) + q^S_{4}(t) \left( \frac{x^{2}}{(\Sigma ^S_{t})^{2}}- \frac{1}{\Sigma ^S_{t}} \right) + q^S_{5}(t), \end{aligned}$$
(2.12)

where the exact formulas of the deterministic functions \(\Sigma ^S_t\) and \(q^S_i(t)\) are given as

$$\begin{aligned} \Sigma ^S_t&= \int \limits _{0}^{t} p_1^2(s) \textrm{d}s, \\ q^S_{1}(t)&= \int \limits _{0}^{t} p_{1}(s) p_{2}(s) \left( \int \limits _{0}^{s} \sigma _{0}(u) p_{1}(u) \textrm{d}u \right) \textrm{d}s, \\ q^S_{2}(t)&= \int \limits _{0}^{t} p_{1}(s) p_{3}(s) \left( \int \limits _{0}^{s} \sigma _{0}(u) p_{1}(u) \left( \int \limits _{0}^{u} \sigma _{0}(r) p_{1}(r) \textrm{d}r \right) \textrm{d}u \right) \textrm{d}s \\ &\quad +\int \limits _{0}^{t} p_{1}(s) p_{4}(s) \left( \int \limits _{0}^{s} p_{1}(u) p_{5}(u) \left( \int \limits _{0}^{u} \sigma _{0}(r) p_{1}(r) \textrm{d}r \right) \textrm{d}u \right) \textrm{d}s, \\ q^S_3(t)&= \left( q^S_1(t) \right) ^2,\\ q^S_{4}(t)&= 2 \int \limits _{0}^{t} p_{1}(s) p_{2}(s) \left( \int \limits _{0}^{s} p_{1}(u) p_{2}(u) \left( \int \limits _{0}^{u} \sigma ^2_{0}(r) \textrm{d}r \right) \textrm{d}u \right) \textrm{d}s \\ &\quad + 2 \int \limits _{0}^{t} p_{1}(s) p_{2}(s) \left( \int \limits _{0}^{s} \sigma _{0}(u) p_{2}(u) \left( \int \limits _{0}^{u} \sigma _{0}(r) p_{1}(r) \textrm{d}r \right) \textrm{d}u \right) \textrm{d}s \\ &\quad + \int \limits _{0}^{t} p^2_{2}(s) \left( \int \limits _{0}^{s} \sigma _{0}(u) p_{1}(u) \textrm{d}u \right) ^{2} \textrm{d}s, \\ q^S_{5}(t)&= \int \limits _{0}^{t} p^2_{2}(s) \left( \int \limits _{0}^{s} \sigma ^2_{0}(u) \textrm{d}u \right) \textrm{d}s. \end{aligned}$$

Remark 2.1

As an example, when S(t) follows a displaced diffusion (DD) model

$$\begin{aligned} \textrm{d}S_t = (r(t) - d(t)) S_t \textrm{d}t + \bar{\epsilon } \left( \bar{\beta } S_t + (1 - \bar{\beta }) F(0,t) \right) \textrm{d}W^S_t, \end{aligned}$$
(2.13)

we have \(\sigma _0(s) = \bar{\epsilon }\), \(p_1(s) = \bar{\epsilon }\), \(p_2(s) = p_3(s) = p_4(s) = \bar{\beta } \bar{\epsilon }\), and \(p_5(s) = -(1 - \bar{\beta }) \bar{\epsilon }\). Hence, the six deterministic functions \(\Sigma ^S_t\) and \(q^S_i \ (i=1, \ldots , 5)\) become \(\Sigma ^S_t = \bar{\epsilon }^2 t\), \(q_1^S(t) = \frac{1}{2} \bar{\beta } \bar{\epsilon }^4 t^2\), \(q_2^{S}(t) = \frac{1}{6} \bar{\beta }^2 \bar{\epsilon }^6 t^3\), \(q_3^S(t) = \frac{1}{4} \bar{\beta }^2 \bar{\epsilon }^8 t^4\), \(q_4^S(t) = \bar{\beta }^2 \bar{\epsilon }^6 t^3\), and \(q_5^{S}(t) = \frac{1}{2} \bar{\beta }^2 \bar{\epsilon }^4 t^2\).
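The closed-form expressions in Remark 2.1 can be verified symbolically by substituting the displaced-diffusion coefficients into the iterated-integral definitions of \(\Sigma ^S_t\) and \(q^S_i(t)\); a sketch using sympy:

```python
import sympy as sp

# Symbolic check of Remark 2.1: with sigma_0 = eps, p_1 = eps,
# p_2 = p_3 = p_4 = beta*eps, and p_5 = -(1-beta)*eps, the deterministic
# functions Sigma_t and q_i reduce to the stated monomials in t.
t, s, u, r_ = sp.symbols('t s u r', positive=True)
eps, beta = sp.symbols('epsilon beta', positive=True)
sig0 = p1 = eps
p2 = p3 = p4 = beta * eps
p5 = -(1 - beta) * eps

Sigma = sp.integrate(p1**2, (s, 0, t))
q1 = sp.integrate(p1*p2*sp.integrate(sig0*p1, (u, 0, s)), (s, 0, t))
inner = sp.integrate(sig0*p1, (r_, 0, u))          # innermost integral
q2 = (sp.integrate(p1*p3*sp.integrate(sig0*p1*inner, (u, 0, s)), (s, 0, t))
      + sp.integrate(p1*p4*sp.integrate(p1*p5*inner, (u, 0, s)), (s, 0, t)))
q3 = q1**2
q4 = (2*sp.integrate(p1*p2*sp.integrate(p1*p2*sp.integrate(sig0**2, (r_, 0, u)),
                                        (u, 0, s)), (s, 0, t))
      + 2*sp.integrate(p1*p2*sp.integrate(sig0*p2*inner, (u, 0, s)), (s, 0, t))
      + sp.integrate(p2**2*sp.integrate(sig0*p1, (u, 0, s))**2, (s, 0, t)))
q5 = sp.integrate(p2**2*sp.integrate(sig0**2, (u, 0, s)), (s, 0, t))
print(Sigma, q1, q2, q3, q4, q5)
```

Each integral collapses to the monomial stated in the remark, which also provides a template for checking other local volatility specifications.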

Because \(a_1(t)\) follows a normal distribution with zero mean and variance \(\Sigma ^S_t\) and the conditional expectations in \(\Psi ^S(\xi )\) are polynomial functions, one can directly apply the Fourier inversion formula to obtain the probability density function \(f_{X_t}(x)\). More specifically, for any polynomial functions h(x) and g(x), we have

$$\begin{aligned} \frac{1}{2 \pi } \int \limits _{\mathcal {R}} \textrm{e}^{-iky} g(-ik) \mathbb {E}\big [h(Y) \textrm{e}^{ikY} \big ] \textrm{d}k = g\left( \frac{\partial }{\partial y}\right) h(y) n(y;0,\Sigma ) , \end{aligned}$$
(2.14)

where \(Y \sim N(0, \Sigma )\) and \(n(x;a,b)\) is the normal density function with mean a and variance b. Notably, the aforementioned formula is easily obtained by differentiating both sides of

$$\begin{aligned} \frac{1}{2 \pi } \int \limits _{\mathcal {R}} \textrm{e}^{-i k y} \mathbb {E}\big [h(Y) \textrm{e}^{ikY}\big ] \textrm{d}k = h(y) n(y;0,\Sigma ) \end{aligned}$$

with respect to y.

By applying (2.14) to each term in (2.9), the probability density function of \(X_t\) is approximated as

$$\begin{aligned} {\tilde{f}}_{X_t}(x) &= \frac{1}{2} n\left( x; 0, \Sigma _{t} \right) \bigg [ \frac{q_{3}(t)}{\Sigma _{t}^{3}} h_{6} \left( \frac{x}{\sqrt{\Sigma _{t}}} \right) + \frac{\left( 2 q_{2}(t) + q_{4}(t) \right) }{\Sigma _{t}^{2}} h_{4} \left( \frac{x}{\sqrt{\Sigma _{t}}} \right) \nonumber \\ &\quad + \frac{2 q_{1}(t)}{\left( \sqrt{\Sigma _{t}} \right) ^{3}} h_{3} \left( \frac{x}{\sqrt{\Sigma _{t}}} \right) + \frac{q_{5}(t)}{\Sigma _{t}} h_{2} \left( \frac{x}{\sqrt{\Sigma _{t}}} \right) + 2 \bigg ], \end{aligned}$$
(2.15)

where \(h_n(x)\) is the Hermite polynomial of order n:

$$\begin{aligned} h_{n}(x) = (-1)^{n} \textrm{e}^{x^2/2} \frac{\textrm{d}^{n}}{\textrm{d}x^{n}} \textrm{e}^{-x^2/2}, \quad n=1,2, \dots , \end{aligned}$$
(2.16)

with \(h_0(x)=1\).
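As a quick consistency check, the polynomials defined by (2.16) can also be generated by the classical three-term recurrence \(h_{n+1}(x) = x h_n(x) - n h_{n-1}(x)\):

```python
import sympy as sp

# Probabilists' Hermite polynomials: the Rodrigues-type formula (2.16)
# versus the three-term recurrence h_{n+1}(x) = x h_n(x) - n h_{n-1}(x).
x = sp.symbols('x')

def h_rodrigues(n):
    """h_n from the Rodrigues-type formula (2.16)."""
    expr = (-1)**n * sp.exp(x**2 / 2) * sp.diff(sp.exp(-x**2 / 2), x, n)
    return sp.expand(sp.simplify(expr))

h = [sp.Integer(1), x]                    # h_0(x) = 1, h_1(x) = x
for n in range(1, 6):
    h.append(sp.expand(x * h[n] - n * h[n - 1]))

print(h)   # h_0, ..., h_6
```

In particular, \(h_2(x)=x^2-1\), \(h_3(x)=x^3-3x\), \(h_4(x)=x^4-6x^2+3\), and \(h_6(x)=x^6-15x^4+45x^2-15\), which are the polynomials appearing in (2.15).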

Note that the value of the European call option with maturity T and strike K is given by

$$\begin{aligned} C^S(t) = \mathbb {E}\left[ \textrm{e}^{-\int \limits _{0}^{t} r(s) \textrm{d}s} \left( S_t - K \right) ^{+} \right] = S(0) \int \limits _{-\widetilde{K}}^{\infty } \left( x + \widetilde{K} \right) f_{X_t} (x) \textrm{d}x, \end{aligned}$$

where \(\widetilde{K}:=1-\frac{K}{F(0,t)}\). Hence, it follows that

$$\begin{aligned} C^S(t) \approx C^S_\textrm{App}(t) \end{aligned}$$

where the approximation holds by following our approximation strategy, and \(C^S_\textrm{App}(t) \) is given by

$$\begin{aligned} C^S_\textrm{App}(t) &= \frac{S_0 n(\widetilde{K};0, \Sigma _t)}{2} \bigg [ \frac{q_3(t)}{\Sigma _t^{2}} h_4 \left( \frac{\widetilde{K}}{\sqrt{\Sigma _t}} \right) + \frac{\left( q_4(t) + 2 q_2(t) \right) }{\Sigma _t} h_2 \left( \frac{\widetilde{K}}{\sqrt{\Sigma _t}} \right) \nonumber \\ &\quad -2 \frac{q_1(t)}{\sqrt{\Sigma _t}} h_1 \left( \frac{\widetilde{K}}{\sqrt{\Sigma _t}} \right) + q_5(t) + 2 \Sigma _t \bigg ] \nonumber \\ &\quad + S_0 \widetilde{K} \left( 1 - \Phi (-\widetilde{K} /\sqrt{\Sigma _t}) \right) , \end{aligned}$$
(2.17)

where \(\Phi (x)\) is the cumulative distribution function of the standard normal distribution.
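The formula (2.17) can be implemented in a few lines. The sketch below specializes it to the displaced-diffusion coefficients of Remark 2.1 with r = d = 0 and compares it with the exact DD price, which is available in closed form because \(\bar{\beta } S_t + (1-\bar{\beta }) S_0\) is lognormal; all parameter values are hypothetical.

```python
import numpy as np
from math import erf, exp, log, pi, sqrt

def Phi(x):                                   # standard normal CDF
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def c_app_dd(S0, K, eps, beta, T):
    """Third-order approximation (2.17) with the DD coefficients of
    Remark 2.1, assuming r = d = 0 so that F(0,T) = S0."""
    Sig = eps**2 * T
    q1 = 0.5 * beta * eps**4 * T**2
    q2 = beta**2 * eps**6 * T**3 / 6.0
    q3 = 0.25 * beta**2 * eps**8 * T**4
    q4 = beta**2 * eps**6 * T**3
    q5 = 0.5 * beta**2 * eps**4 * T**2
    Kt = 1.0 - K / S0                         # \tilde{K} with F(0,T) = S0
    z = Kt / sqrt(Sig)
    h1, h2, h4 = z, z*z - 1.0, z**4 - 6.0*z*z + 3.0
    n_Kt = exp(-Kt*Kt / (2.0*Sig)) / sqrt(2.0*pi*Sig)   # n(Kt; 0, Sigma_t)
    bracket = (q3/Sig**2*h4 + (q4 + 2.0*q2)/Sig*h2
               - 2.0*q1/sqrt(Sig)*h1 + q5 + 2.0*Sig)
    return 0.5*S0*n_Kt*bracket + S0*Kt*(1.0 - Phi(-z))

def c_exact_dd(S0, K, eps, beta, T):
    """Exact DD call price via the shifted lognormal representation:
    Y = beta*S + (1-beta)*S0 is lognormal with volatility beta*eps."""
    KY = beta*K + (1.0 - beta)*S0
    v = beta*eps*sqrt(T)
    d1 = (log(S0/KY) + 0.5*v*v) / v
    return (S0*Phi(d1) - KY*Phi(d1 - v)) / beta

strikes = np.linspace(0.6, 1.4, 9)
errs = [abs(c_app_dd(1.0, K, 0.3, 0.5, 1.0) - c_exact_dd(1.0, K, 0.3, 0.5, 1.0))
        for K in strikes]
print(max(errs))
```

For \(\bar{\beta } = 0\) the DD model is Gaussian, all \(q_i\) vanish, and (2.17) collapses to the exact Bachelier price; for \(\bar{\beta } > 0\) the correction terms capture the lognormal skew up to the truncation order.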

Remark 2.2

Thus far, we have derived the approximation formula for European options in the framework of the local volatility model. Notably, as demonstrated in Funahashi (2014), the formula (2.17) retains its identical form even when influenced by stochastic volatility, with \(\Sigma _t\) and \(p_i(t)\) for \(i = 1, \ldots , 5\) undergoing adjustments. The functions in the stochastic local volatility model are detailed in Appendix A.

Moreover, approximate closed-form solutions for a wide range of exotic derivatives, including Barrier, Asian, Basket, and VWAP options, can be obtained using the deterministic functions \(\Sigma ^S_t\) and \(q^S_i(t) \ (i=1, \ldots , 5)\). See Funahashi and Higuchi (2018) and Funahashi and Kijima (2014) for detailed discussions. In the following sections, we denote the price of these options by \(C^S\) and their approximate closed-form formula by \(C^S_\textrm{App}\).

2.2 ANN with asymptotic correction

Let \(\pmb {\xi }^\textrm{M} = \{ \xi ^\textrm{M}_1, \xi ^\textrm{M}_2, \ldots \}\) denote the input data observed in the market or determined in the contract, including the interest rate r, strike K, spot asset price \(S_0\), and maturity T: \(\pmb {\xi }^\textrm{M} = \{ r, K, S_0, T, \ldots \}\). \(\pmb {\xi }^\textrm{P} = \{ \xi ^\textrm{P}_1, \xi ^\textrm{P}_2, \ldots \}\) denotes the model parameters; for example, the DD model in (2.13) has two model parameters, \(\pmb {\xi }^\textrm{P}= \{ {\bar{\beta }}, {\bar{\epsilon }} \}\). Then, the option price with strike K and maturity T written on the asset S is given by

$$\begin{aligned} C^S(\pmb {\xi }) = \mathbb {E}[\textrm{e}^{-rT} g(S_T) ] \end{aligned}$$
(2.18)

where \(g(\cdot )\) is a payoff function and \(\pmb {\xi } = \{ \pmb {\xi }^\textrm{M}, \pmb {\xi }^\textrm{P} \}\).

Various ANN approaches have emerged for derivative pricing and financial asset pricing model calibration. One commonly employed technique involves utilizing deep learning to predict option prices. This process generally follows these steps:

(1) generate N sets of input vectors \(\pmb {\xi }_k\) for \(k = 1, 2, \ldots , N\),

(2) obtain \(C^S(\pmb {\xi }_k)\) for \(k = 1, 2, \ldots , N\) using numerical methods such as PDE and MC methods,

(3) train the ANN model, that is, minimize the loss function to determine the weights and biases of the ANN model using all pairs of inputs and outputs \(\{ \pmb {\xi }_k, C^S(\pmb {\xi }_k) \}_{k=1, \ldots , N}\), to obtain a map

$$\begin{aligned} \mathcal {M}_C: \pmb {\xi } \mapsto C^S_\textrm{ANN}, \end{aligned}$$

where \(C^S_\textrm{ANN}\) is the ANN prediction of the derivative price \(C^S\), and

(4) predict the option price at an arbitrary parameter set \(\pmb {\xi }\):

$$\begin{aligned} C^S(\pmb {\xi }) \approx C^S_\textrm{ANN}(\pmb {\xi }) \end{aligned}$$
(2.19)

One of the benefits of applying ANNs in the financial sector is that they can split the pricing process into two parts: Offline training steps (1)–(3), which use ANN models to obtain an accurate estimate of option prices, and prediction step (4), which forecasts and calculates option prices online with the trained ANN. The offline training is computationally intensive because it needs abundant numerical simulations to generate training and testing data, whereas the online prediction is fast enough for real-world applications. After the ANN models have learned the connection weights, the network can be reused to predict option prices for new input patterns. Therefore, practitioners can benefit from the online prediction with quick computations in their daily pricing work and perform the ANN training, which consumes much time, offline when they have enough time at hand.
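Steps (1)-(4) above can be sketched end-to-end with a tiny fully connected network. The stand-in "numerical pricer" below is the Bachelier formula, and the network size, learning rate, and parameter ranges are hypothetical choices made only to keep the example self-contained and fast:

```python
import numpy as np
from math import erf

# Direct pricing pipeline: (1) sample inputs, (2) price them with a stand-in
# "numerical" pricer (Bachelier formula), (3) fit a one-hidden-layer network
# by full-batch gradient descent, (4) predict at a new parameter set.
rng = np.random.default_rng(3)
Phi = np.vectorize(lambda x: 0.5 * (1.0 + erf(x / np.sqrt(2.0))))

def pricer(K, sigma, T=1.0, S0=1.0):          # Bachelier call price
    v = sigma * np.sqrt(T)
    d = (S0 - K) / v
    return v * np.exp(-d*d/2) / np.sqrt(2*np.pi) + (S0 - K) * Phi(d)

# (1)-(2): training inputs xi_k = (K, sigma) and target prices C(xi_k)
N = 512
X = np.column_stack([rng.uniform(0.8, 1.2, N), rng.uniform(0.1, 0.4, N)])
C = pricer(X[:, 0], X[:, 1])[:, None]

# (3): train a tiny MLP (2 -> 16 -> 1, tanh) with gradient descent
W1 = 0.5 * rng.standard_normal((2, 16)); b1 = np.zeros(16)
W2 = 0.5 * rng.standard_normal((16, 1)); b2 = np.zeros(1)
lr, losses = 0.05, []
for _ in range(3000):
    H = np.tanh(X @ W1 + b1)                  # hidden layer activations
    Y = H @ W2 + b2                           # predicted prices
    E = Y - C
    losses.append(float(np.mean(E**2)))
    gY = 2.0 * E / N                          # dLoss/dY
    gH = (gY @ W2.T) * (1.0 - H**2)           # back-prop through tanh
    W2 -= lr * H.T @ gY; b2 -= lr * gY.sum(0)
    W1 -= lr * X.T @ gH; b1 -= lr * gH.sum(0)

# (4): fast online prediction at a new parameter set
pred = np.tanh(np.array([1.0, 0.25]) @ W1 + b1) @ W2 + b2
print(losses[0], losses[-1], float(pred), float(pricer(1.0, 0.25)))
```

The split is visible in the code: the expensive loop in step (3) runs offline, whereas step (4) is a handful of matrix products that can be evaluated online in microseconds.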

The offline training is performed only once or a few times a year, but financial firms handle various products with many pricing models. Hence, for offline training, they need to run several hundred thousand to a few million numerical simulations for each product, which demands huge computational costs. Funahashi (2021a) combined the advantages of asymptotic expansion (AE) and ANNs by training an ANN on the residual term

$$\begin{aligned} D^S(\pmb {\xi }) = C^S(\pmb {\xi }) - C^S_\textrm{App}(\pmb {\xi }) \end{aligned}$$

between the option price, \(C^S(\pmb {\xi })\), and its asymptotic approximation, \(C^S_\textrm{App}(\pmb {\xi })\), to improve the stability and approximation accuracy. More specifically, the author proposed a mapping

$$\begin{aligned} \mathcal {M}_D: \pmb {\xi } \mapsto D_\textrm{ANN} \end{aligned}$$

and predicted the option value of arbitrary input \(\pmb {\xi }\) to be

$$\begin{aligned} C^S(\pmb {\xi }) \approx C^S_\textrm{App}(\pmb {\xi }) + D^S_\textrm{ANN}(\pmb {\xi }) \end{aligned}$$
(2.20)

If the base approximation is chosen appropriately, the variance of the target outputs \(\{ D^S(\pmb {\xi }_k) \}\) for \(k = 1, \ldots , N\) is very small compared to that of \(\{ C^S(\pmb {\xi }_k) \}\); hence, the prediction and convergence of the mapping \(\mathcal {M}_D\) become stable and fast.
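The variance reduction achieved by learning residuals can be illustrated with a self-contained toy example; it is not the LSVM of this paper. Here the "true" prices \(C^S\) are Black-Scholes prices with randomly drawn volatilities, and the base approximation \(C^S_\textrm{App}\) is the same formula with the volatility frozen at a reference level. All parameter ranges are hypothetical.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S0, K, r, sigma, T):
    """Black-Scholes call price, used here as a stand-in for the target price C^S."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

random.seed(0)
# Hypothetical parameter sets xi_k: vary the strike and the volatility.
params = [(100.0, 80.0 + 40.0 * random.random(), 0.01,
           0.15 + 0.20 * random.random(), 1.0) for _ in range(2000)]

prices = [bs_call(*p) for p in params]
# Crude base approximation C_App: same model, but with a frozen volatility of 0.25.
approx = [bs_call(S0, K, r, 0.25, T) for (S0, K, r, sigma, T) in params]
resid = [c - a for c, a in zip(prices, approx)]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(var(prices), var(resid))  # the residual targets have much smaller variance
```

Even with this deliberately crude base approximation, the residual targets are far less dispersed than the raw prices, which is the property that stabilizes the training of \(\mathcal {M}_D\).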

Moreover, when, for instance, the WIC expansion is used, the approximate option price is given by a sum of products of polynomials with the CDF and PDF of the standard normal distribution,Footnote 5 and hence \(D^S(\pmb {\xi })\) is a smooth, infinitely differentiable function. This is useful not only for improving the efficiency of the ANN training but also for stabilizing the computation of the Greeks, whose approximability is guaranteed by Hornik et al. (1990).

Consequently, using European and barrier options within the LSVM framework in (1.1), Funahashi (2021a) empirically shows that this ANN training and prediction method is robust. Additionally, the approach reduces the required training set size by a factor of roughly 100 to 1000 while using fewer layers and nodes in the ANN architecture. This indicates that the extensive numerical computations required for derivative price calculations, which constitute a significant portion of the calculation time, can be reduced by a factor of 100–1000. Thus, the computational cost of the resource-intensive offline procedures can be substantially alleviated. Moreover, this reduction in computational overhead simultaneously improves the stability and accuracy of the online predictions of derivative prices.

2.3 ANN with quasi-process correction

ANNs with an asymptotic correction approach require the derivation of an approximate option formula and the prediction of the impact of the residual terms of the approximation option formula using ANNs. Generally, it can improve the ANN prediction as the approximation order increases. Moreover, it can estimate the degree of error, and hence, one can safely choose an appropriate \(C^S_\textrm{App}\). Note that \(C^S_\textrm{App}\) could be a general approximation formula; Kienitz et al. (2020) and Funahashi (2022) examined SABR models using the approximate implied normal and lognormal volatility derived by Hagan et al. (2002), respectively, and Funahashi (2022) tested the free-boundary SABR model using the approximation formula proposed in Antonov et al. (2015).

However, as the approximation order increases, the calculation becomes tedious and the number of expansion terms grows exponentially. For exotic derivatives under an LSVM, it is generally difficult to obtain an approximate solution. Although Funahashi and Higuchi (2018) derived an approximate pricing formula for the barrier option under the Heston model using a WIC expansion, the derivation is messy and requires many calculations.

Conversely, Kienitz et al. (2020) treat the approximation as a control variate (CV) for neural networks. Similar to its application in MC simulations, they used a completely different model, \({\bar{S}}\), for the approximate price to improve the quality of deep learning applied to option pricing problems. They exploited the fact that, for each parameter set, \(\pmb {\xi }_k = \{ \pmb {\xi }^M, \pmb {\xi }^P \}\), of the original process, if the model parameters of the approximation process, \(\bar{\pmb {\xi }}_k = \{ \pmb {\xi }_k^M, \bar{\pmb {\xi }}_k^P \}\), are suitably chosen, then a large portion of the price is already mimicked by the approximate price, leaving the residual

$$\begin{aligned} D(\pmb {\xi }_k, \bar{\pmb {\xi }}_k) = C^S(\pmb {\xi }_k) - C^{{\bar{S}}}_\textrm{App}(\bar{\pmb {\xi }}_k) . \end{aligned}$$
(2.21)

Motivated by this fact, they generated the mapping

$$\begin{aligned} \mathcal {M}_{{\bar{D}}}: \{ \pmb {\xi }_k, \bar{\pmb {\xi }}_k \} \mapsto D_\textrm{ANN}. \end{aligned}$$

The authors examined the use of the Black-Scholes price as an approximate benchmark for pricing European options in the Heston (1993) model and, similarly, employed a co-terminal European swaption as an approximation for pricing Bermudan swaptions in the Hull-White model.

Although its accuracy and speed of convergence are inferior to those of the direct approximation formula for the same model proposed in Funahashi (2021a), the ANN with quasi-process correction is more widely applicable; the error arising from the model differences is left to the learning process of the neural network, without the need to derive a complex asymptotic formula. However, in most cases, how to determine an appropriate model for \(C^S_\textrm{App}\) and suitable parameters, \(\bar{\pmb {\xi }}_k\), is not clear; hence, their method cannot be applied to general problems. As will be subsequently discussed, it is essential to select \(C^S_\textrm{App}\) carefully. If one selects a model whose distribution differs from the original one, the convergence will be slow, especially in deep-in-the-money and deep-out-of-the-money cases. Moreover, the prediction will be inferior to direct ANN learning.

Before running the ANN process, \(\bar{\pmb {\xi }}_k = \{ \pmb {\xi }^M_k, \bar{\pmb {\xi }}^P_k \}\) is estimated from \(\pmb {\xi }_k = \{ \pmb {\xi }^M_k, \pmb {\xi }^P_k \}\) under the condition \(\pmb {\xi }^M_k = \{r, K, S_0, T, \ldots \}\). Recall that the ANN first generates a mapping \(\mathcal {M}_{{\bar{D}}}\) using the training data

$$\begin{aligned} \{ \{ \pmb {\xi }_k, \bar{\pmb {\xi }}_k \}, D(\{ \pmb {\xi }_k, \bar{\pmb {\xi }}_k \}) \}_{k=1, 2, \ldots , N} \end{aligned}$$

where \(D(\{ \pmb {\xi }_k, \bar{\pmb {\xi }}_k \})\) is defined in (2.21), and then predicts the option price at an arbitrary parameter set \(\{ \pmb {\xi }, \bar{\pmb {\xi }} \}\):

$$\begin{aligned} C^S(\pmb {\xi }) \approx C^{{\bar{S}}}(\bar{\pmb {\xi }}) + D_\textrm{ANN}(\{ \pmb {\xi }, \bar{\pmb {\xi }} \}) . \end{aligned}$$
(2.22)

The \(\bar{\pmb {\xi }}\) used in training (2.21) should be consistent with that used in prediction (2.22); otherwise, the base approximation oscillates, and the predicted value jumps and produces poor results.

Notably, for offline training, one can utilize the prices of the target derivatives, \(C^S(\pmb {\xi }_k)\), to calibrate \(\bar{\pmb {\xi }}_k\), because the training data have already been generated by numerical simulation. However, recall that the original goal is to find \(C^S(\pmb {\xi }_k)\) from \(\pmb {\xi }_k\) and \(\bar{\pmb {\xi }}_k\) using the mapping \(\mathcal {M}_{{\bar{D}}}\); hence, for online prediction, one does not have \(C^S(\pmb {\xi }_k)\) and cannot obtain \(\bar{\pmb {\xi }}_k\) by calibration. Thus, by some means, we must estimate the appropriate parameter \(\bar{\pmb {\xi }}_k\) from the parameter \(\pmb {\xi }_k\) without using prices. Moreover, recall that only one strike is available in \(\pmb {\xi }_k\) to generate \(\bar{\pmb {\xi }}_k\). Hence, even if one obtains a suitable implied volatility at the strike K, \(\bar{\pmb {\xi }}^P\) changes from strike to strike because the pricing model incorporates the volatility skew and smile across different strikes, which makes the base approximation unstable.

In the next section, we propose a replication strategy,

$$\begin{aligned} S \approx {\bar{S}}, \end{aligned}$$

that minimizes the error and the unsuitability for ANN training and prediction. In the following section, we show that this approach is free from the contract parameters, including strikes; hence, it fits our purpose.

3 Proposed method

One aim of this study is to establish a unified approach to determining the suitable parameters and appropriate models for the ANN with quasi-process correction. To achieve this goal, we use a replication technique proposed by Funahashi (2021b) to replicate a complex model \(S_t\) from a simpler model \({\bar{S}}_t\), for which the closed-form solution of the target contingent claim is available.

From (2.9), if one can set the parameters of the simpler model, \({\bar{S}}_t\), such that

$$\begin{aligned} \Sigma _t^{{\bar{S}}} = \Sigma _t^S \ \ \text{ and } \ \ q_i^{{\bar{S}}}(t) = q_i^{S}(t) \end{aligned}$$
(3.23)

for \(i = 1, \ldots , 5\), the characteristic functions of the target process, \(S_t\), can be approximated by that of \({\bar{S}}_t\), that is,

$$\begin{aligned} \Psi ^S(\xi ) \approx \Psi ^{{\bar{S}}}(\xi ). \end{aligned}$$

From the one-to-one relationship between the distribution function and the characteristic function, the marginal distribution of \(S_t\) can be approximated by that of \({\bar{S}}_t\), and the European call option price of S with maturity T and strike K is approximated by that of \({\bar{S}}\) because

$$\begin{aligned} C^S_\textrm{App}(t) = C^{{\bar{S}}}_\textrm{App}(t). \end{aligned}$$

Before proceeding, we review the roles of the functions \(\Sigma ^S_t\) and \(q^S_i(t)\) for \(i = 1, \ldots , 5\), and determine the priority of the equations in (3.23) with an analysis based on a small-volatility expansion. From (2.6), \(a_1(t)\) is \(O(\epsilon )\) and follows a normal distribution \(N(0, \Sigma ^S_t)\), where \(\Sigma ^S_t\) is \(O(\epsilon ^2)\). Hence, if \(\epsilon \) is sufficiently small, then \(S_t\) is approximated by the normal process

$$\begin{aligned} S_t \approx F(0,t) ( 1 + a_1(t) ) \end{aligned}$$
(3.24)

with mean F(0, t) and variance \(F(0,t)^2 \Sigma _t\). Meanwhile, \(a_2(t)\) and \(a_3(t)\) are \(O(\epsilon ^2)\) and \(O(\epsilon ^3)\), respectively, and they increase the accuracy of \(X_t\) (and thus of \(S_t\)) when \(\epsilon \) is not negligible. \(q^S_1(t)\), \(q^S_2(t)\), and \(q_3^S(t) - q_5^S(t)\) are derived from the conditional expectations \(\mathbb {E}[ a_{2}(t) | a_{1}(t) = x ]\), \(\mathbb {E}[ a_{3}(t) | a_{1}(t) = x ]\), and \(\mathbb {E}[ a^2_{2}(t) | a_{1}(t) = x ]\) in (2.9), whose asymptotic orders are \(O(\epsilon ^2)\), \(O(\epsilon ^3)\), and \(O(\epsilon ^4)\), respectively. These deterministic functions correct the characteristic function, \(\Psi (\xi )\), by including the influence of \(a_2(t)\), \(a_3(t)\), and \(a_2^2(t)\), respectively. Thus, the functions \(q_1(t)\), \(q_2(t)\), and \(q_3(t) - q_5(t)\) determine the sizes of the corrections at the asymptotic orders \(O(\epsilon ^2)\), \(O(\epsilon ^3)\), and \(O(\epsilon ^4)\), respectively. Therefore, the first priority is matching \(\Sigma _t\) and \(q_1(t)\), followed by \(q_2(t)\), and then \(q_3(t) - q_5(t)\).

As an example of a simple process, the DD model in (2.13) has only two parameters, \({\bar{\beta }}\) and \({\bar{\epsilon }}\); hence, no solution exists for the six equations in (3.23). Therefore, we match the first two: \(\Sigma ^{{\bar{S}}}_t = \Sigma ^S_t\) and \(q^{{\bar{S}}}_1 = q^S_1\). Because

$$\begin{aligned} \Sigma ^{{\bar{S}}}_t = ({\bar{\epsilon }}^{*})^2 t, \quad q_1^{{\bar{S}}}(t) = \frac{1}{2} {\bar{\beta }}^{*} ({\bar{\epsilon }}^{*})^4 t^2 , \end{aligned}$$
(3.25)

optimal \({\bar{\epsilon }}^*\) and \({\bar{\beta }}^*\) can be explicitly determined as

$$\begin{aligned} {\bar{\epsilon }}^* = \sqrt{\frac{\Sigma ^S_t}{t}}, \qquad {\bar{\beta }}^* = \frac{2 q_1^S(t)}{(\Sigma ^S_t)^2} . \end{aligned}$$
(3.26)
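The replication step in (3.26) can be sketched in a few lines; the inputs \(\Sigma^S_t\) and \(q_1^S(t)\) below are placeholder values, not outputs of the WIC expansion, and the round-trip check simply verifies consistency with (3.25).

```python
import math

def dd_replication_params(Sigma_t, q1_t, t):
    """Optimal displaced-diffusion parameters from (3.26): match the variance
    Sigma_t and the first correction function q1_t of the target model."""
    eps_bar = math.sqrt(Sigma_t / t)       # matches Sigma^Sbar_t = eps^2 t
    beta_bar = 2.0 * q1_t / Sigma_t**2     # matches q1^Sbar(t) = beta eps^4 t^2 / 2
    return eps_bar, beta_bar

# Round trip: plug the optimal parameters back into (3.25).
t, Sigma_t, q1_t = 2.0, 0.08, 0.002       # illustrative values only
eps_b, beta_b = dd_replication_params(Sigma_t, q1_t, t)
assert abs(eps_b**2 * t - Sigma_t) < 1e-12
assert abs(0.5 * beta_b * eps_b**4 * t**2 - q1_t) < 1e-12
```

Because both equations can be inverted in closed form, no numerical calibration is needed at this stage, which is precisely why the replication is stable across strikes.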

The European option price under the DD model can be analytically obtained as

$$\begin{aligned} C_\textrm{DD}^{{\bar{S}}}(T, K; {\bar{\beta }}, {\bar{\epsilon }}) = \textrm{e}^{-\int \limits _0^T r(s) \textrm{d}s} \textrm{Bl} \left( \frac{F(0,T)}{{\bar{\beta }}}, K+\frac{1-{\bar{\beta }}}{{\bar{\beta }}} F(0,T), {\bar{\beta }} {\bar{\epsilon }} \sqrt{T} \right) , \end{aligned}$$
(3.27)

where

$$\begin{aligned} \textrm{Bl}(F,K,v) = F \Phi (d_1(F,K,v)) - K \Phi (d_2(F,K,v)) , \end{aligned}$$
(3.28)

with

$$\begin{aligned} d_1(F, K, v) = \frac{\log \left( \frac{F}{K} \right) + \frac{v^2}{2}}{v}, \quad d_2(F, K, v) = \frac{\log \left( \frac{F}{K} \right) - \frac{v^2}{2}}{v}. \end{aligned}$$

Hence, the European option price under the SDE (2.1) is approximated by

$$\begin{aligned} C^{S}(S,T) \approx C_\textrm{DD}^{{\bar{S}}}(T, K; {\bar{\beta }}^*, {\bar{\epsilon }}^*) \end{aligned}$$

where the error can be explicitly estimated as

$$\begin{aligned} C^{S}(S,T) - C_\textrm{DD}^{{\bar{S}}}(T, K; {\bar{\beta }}^*, {\bar{\epsilon }}^*)&\approx C^{S}_\textrm{App}(\pmb {\xi }) - C^{{\bar{S}}}_\textrm{App}(\bar{\pmb {\xi }}) \nonumber \\&= \frac{S_0 n(\widetilde{K};0, \Sigma _t)}{2 \Sigma _t^{4}} \bigg [ D_3(t) (\widetilde{K}^{4}-6 \widetilde{K}^{2} \Sigma _t + 3 \Sigma _t^{2}) \nonumber \\&\quad + \Sigma _t^{2} \left( D_4(t) + 2 D_2(t) \right) \left( \widetilde{K}^{2} - \Sigma _t \right) + D_5(t) \Sigma _t^4 \bigg ] \end{aligned}$$
(3.29)

where \(\Sigma _t = \Sigma ^S_t = \Sigma ^{{\bar{S}}}_t\) and \(D_i(t) = q_i^S(t) - q_i^{{\bar{S}}}(t)\) for \(i =2, \ldots , 5\). Note that the leading terms in (2.17) vanish in (3.29).
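The displaced-diffusion pricing formula (3.27) with the Black-type function (3.28) can be sketched as follows; a flat short rate is assumed for the discount factor, and all numerical inputs are illustrative.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bl(F, K, v):
    """Black-type formula Bl(F, K, v), cf. (3.28)."""
    d1 = (math.log(F / K) + 0.5 * v * v) / v
    d2 = d1 - v
    return F * Phi(d1) - K * Phi(d2)

def dd_call(F0T, K, T, beta_bar, eps_bar, r=0.0):
    """European call under the displaced-diffusion model, cf. (3.27).
    A flat short rate r replaces the integral in the discount factor."""
    F_sh = F0T / beta_bar                        # shifted forward
    K_sh = K + (1.0 - beta_bar) / beta_bar * F0T # shifted strike
    v = beta_bar * eps_bar * math.sqrt(T)        # effective total volatility
    return math.exp(-r * T) * bl(F_sh, K_sh, v)

# Sanity check: with beta_bar = 1 and r = 0 the DD price collapses to the
# plain Black price.
p_dd = dd_call(100.0, 100.0, 1.0, 1.0, 0.2)
p_bl = bl(100.0, 100.0, 0.2)
assert abs(p_dd - p_bl) < 1e-12
```

The shift of both the forward and the strike by \((1-{\bar{\beta }})/{\bar{\beta }}\,F(0,T)\) is what lets a lognormal formula reproduce the skew implied by the displacement.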

The Black-Scholes model is another useful example:

$$\begin{aligned} \frac{\textrm{d}S_t}{S_t} = r \textrm{d}t + {\bar{\epsilon }} \textrm{d}W^S_t . \end{aligned}$$
(3.30)

In this case, only one parameter can be controlled; hence, this model is not very accurate at reproducing complex models. However, it is very flexible and can easily match the variance, \(\Sigma ^{{\bar{S}}}_t = \Sigma ^S_t\), with the optimal parameter \({\bar{\epsilon }}^*\):

$$\begin{aligned} {\bar{\epsilon }}^* = \sqrt{\frac{\Sigma ^S_t}{t}}. \end{aligned}$$
(3.31)

In Sect. 5, we compare the direct mapping, \(\mathcal {M}_C \), and our methods, \(\mathcal {M}_D\) and \(\mathcal {M}_{{\bar{D}}}\), with three base approximations: the approximate closed-form solution using the Wiener-Itô chaos expansion, and the replication method using the DD and BS models as base approximations.

4 Artificial neural networks

ANNs are a type of machine learning model inspired by biological neural networks. The perceptron, the prototype of ANNs, was proposed by Rosenblatt (1958); later, the backpropagation algorithm was developed and popularized by Rumelhart et al. (1986), which made it possible to efficiently perform the calculations necessary for updating the parameters when training multilayer neural networks. Today, ANNs are successful in many fields. The background of this success includes the development of the internet and related infrastructure, the availability of large-scale data that makes it possible to train neural networks on complex real-world problems without overfitting, and the dramatic improvement of computing hardware such as GPUs and multicore CPUs. For the history of multilayer neural networks, we cite Okatani (2015) and the references therein.

4.1 Feedforward neural network

Figure 1 shows the outline of a feedforward neural network. A perceptron (Fig. 1a) is a simple unit of a neural network that receives the input \(\pmb {x}=\{x_1, \ldots , x_n \}\), multiplies it by the weights \( W=\{w_1, \ldots , w_n \}\), which represent the strength of the connections between the layers, and adds a term called the bias, b, to perform the linear transformation

$$\begin{aligned} u = \sum _{i=1}^n w_i x_i + b. \end{aligned}$$

Subsequently, it applies a nonlinear activation function f to the value and computes the output

$$\begin{aligned} z=f(u). \end{aligned}$$

The activation function determines whether a neuron should be activated. Examples of activation functions include sigmoid, tanh, ReLU, and softmax; refer to Nwankpa et al. (2018) for an overview.
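The perceptron computation above (linear transform followed by an activation) can be sketched in a few lines; the weights and inputs below are arbitrary illustrative values.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def relu(u):
    return max(0.0, u)

def perceptron(x, w, b, f):
    """Single unit: linear transform u = w.x + b followed by activation z = f(u)."""
    u = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(u)

# Here u = 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0, so ReLU gives 0 and sigmoid gives 0.5.
z = perceptron([1.0, 2.0], [0.5, -0.25], 0.0, relu)
z2 = perceptron([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)
```

The choice of f only changes the last step; the linear part u is identical for every activation listed above.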

Fig. 1

Left-hand side panel indicates the perceptron, and the right-hand side panel shows a feedforward neural network

A feedforward neural network has a structure in which units (perceptrons) arranged in layers are connected between adjacent layers, and information propagates from the input side to the output side in one direction without any feedback loops; hence, it is sometimes called a multi-layer perceptron. The feedforward neural network in Fig. 1b consists of L layers, which we index from left to right as \(l=1, \ldots , L\). Here, \(l=1\) is the input layer, \(l=L\) is the output layer, and \(l=2, \ldots , L-1\) are called hidden layers. Moreover, we assume that each layer l has \(n_l\) nodes. From now on, we attach the layer number as a superscript to each variable to distinguish the inputs and outputs of each layer. When the input \(\pmb {z}^{(1)} = \pmb {x}\) is given to this network, the \((l+1)\)-th layer \((l=1, \ldots , L - 1)\) receives the output \(\pmb {z}^{(l)}\) from the previous layer and calculates

$$\begin{aligned} \pmb {u}^{(l+1)} = W^{(l+1)} \pmb {z}^{(l)} + \pmb {b}^{(l+1)} \end{aligned}$$
(4.1)

and then applies the activation function f to obtain the output

$$\begin{aligned} \pmb {z}^{(l+1)} = f(\pmb {u}^{(l+1)}) \end{aligned}$$
(4.2)

This way, information is propagated from the input layer through the hidden layer to the output layer, resulting in the final output being obtained as

$$\begin{aligned} \pmb {y} = \pmb {z}^{(L)}. \end{aligned}$$

Therefore, a feedforward neural network can be regarded as a deep nested function that gives the output

$$\begin{aligned} \pmb {y}(\pmb {x}; \pmb {w}) = {\bar{f}}(W^{(L)}f(W^{(L-1)}f(W^{(L-2)} \cdots f(W^{(2)} \pmb {x}+\pmb {b}^{(2)})+ \pmb {b}^{(L-1)}) + \pmb {b}^{(L)}) \end{aligned}$$
(4.3)

depending on the values of the weights \(W^{(l)}\) and biases \(\pmb {b}^{(l)}\) between the layers, given the input \(\pmb {x}\). Here, \(\pmb {w}=\{w_1, \ldots , w_P \}\) is a vector consisting of all elements of the \(W^{(l)}\) and \(\pmb {b}^{(l)}\), and \({\bar{f}}\) is the activation function of the output layer.
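The nested function (4.3) amounts to alternating affine maps (4.1) and activations (4.2). A minimal forward-pass sketch, with a linear output activation and hand-picked weights that are purely illustrative:

```python
def relu(u):
    return [max(0.0, ui) for ui in u]

def affine(W, z, b):
    """u^{(l+1)} = W^{(l+1)} z^{(l)} + b^{(l+1)}, cf. (4.1)."""
    return [sum(wij * zj for wij, zj in zip(row, z)) + bi
            for row, bi in zip(W, b)]

def forward(x, layers, f=relu, f_out=lambda u: u):
    """Evaluate the nested function (4.3); `layers` is a list of (W, b) pairs.
    An identity output activation is assumed here."""
    z = x
    for W, b in layers[:-1]:
        z = f(affine(W, z, b))
    W, b = layers[-1]
    return f_out(affine(W, z, b))

# Tiny 2-2-1 network: hidden u = (2-1, 1+0.5) = (1, 1.5), ReLU leaves it
# unchanged, and the output sums the hidden units: y = 2.5.
layers = [([[1.0, -1.0], [0.0, 1.0]], [0.0, 0.5]),
          ([[1.0, 1.0]], [0.0])]
y = forward([2.0, 1.0], layers)
```

Every layer repeats the same two operations, which is why the whole network can be written as the single composed expression in (4.3).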

In neural networks, when training data consisting of pairs of inputs \(\pmb {x}_i\) and outputs \(\pmb {y}_i\) are given

$$\begin{aligned} \{ (\pmb {x}_1, \pmb {y}_1), (\pmb {x}_2, \pmb {y}_2), \ldots , (\pmb {x}_N, \pmb {y}_N) \} \end{aligned}$$
(4.4)

the goal is to adjust \(\pmb {w}\) to reproduce these input–output pairs

$$\begin{aligned} \pmb {y}(\pmb {x}_i; \pmb {w}) \approx \pmb {y}_i , \end{aligned}$$
(4.5)

and to estimate an appropriate output \(\pmb {y}\) for an unknown input \(\pmb {x}\). An error function \(E(\pmb {w})\) is used as a measure of the closeness of (4.5). Additionally, it is common to normalize the data as a preprocessing step, because biases in the training data can hinder learning. The most widely used method normalizes the input–output data \(\pmb {x}_i=\{ x_{i1}, \ldots , x_{in} \} \ (i=1, \ldots , N)\) by

$$\begin{aligned} x_{ij} = \frac{x'_{ij} - \bar{x}'_j }{\sigma _j}, \quad \bar{x}'_j = \frac{1}{N} \sum _{i=1}^N x'_{ij}, \ \ \sigma _j = \sqrt{ \frac{1}{N} \sum \nolimits _{i=1}^N (x'_{ij} - \bar{x}'_j)^2 }. \end{aligned}$$

for each component \(j \ (j=1, \ldots , n)\) of the original input–output vector \(\pmb {x}'_i=\{ x'_{i1}, \ldots , x'_{in} \} \ (i=1, \ldots , N)\).
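The componentwise z-score normalization above can be sketched as follows; the sample matrix is illustrative, and the population (biased) standard deviation is used, as in the formula.

```python
import math

def zscore_normalize(X):
    """Normalize each component j across the N samples:
    x_ij = (x'_ij - mean_j) / std_j, with the population standard deviation."""
    N, n = len(X), len(X[0])
    means = [sum(X[i][j] for i in range(N)) / N for j in range(n)]
    stds = [math.sqrt(sum((X[i][j] - means[j]) ** 2 for i in range(N)) / N)
            for j in range(n)]
    return [[(X[i][j] - means[j]) / stds[j] for j in range(n)] for i in range(N)]

# Each column of the result has mean 0 and (population) variance 1.
Xn = zscore_normalize([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
```

After this transformation every input component lives on a comparable scale, which prevents components with large raw magnitudes from dominating the gradient updates.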

In this study, we use the squared error as the error function

$$\begin{aligned} E(\pmb {w}) = \sum _{i=1}^N E_i (\pmb {w}), \quad E_i (\pmb {w})= \frac{1}{2} \Vert \pmb {y}_i - \pmb {y}(\pmb {x}_i; \pmb {w}) \Vert ^2 . \end{aligned}$$
(4.6)

In other words, the purpose of learning is to find

$$\begin{aligned} \pmb {w}^* = \mathop {\arg \min }_{\pmb {w}} E(\pmb {w}) . \end{aligned}$$
(4.7)

4.2 Backpropagation

Generally, an optimization problem such as (4.7) is solved using methods such as Newton–Raphson, simplex, or Levenberg–Marquardt, which iteratively update the model parameters until a stopping criterion is met. However, in neural network problems, the scale of the optimization becomes large, making it difficult to calculate second- or higher-order derivatives. Therefore, gradient descent methods, which only require first derivatives, are used. In gradient descent, starting from a preset initial value \(\pmb {w}^0\), the current weight \(\pmb {w}^{m}\) is updated repeatedly as

$$\begin{aligned} \pmb {w}^{(m+1)} = \pmb {w}^{(m)} - \epsilon \Delta E, \quad \Delta E = \left\{ \frac{\partial E}{\partial w_1} , \ldots , \frac{\partial E}{\partial w_P} \right\} ^\textrm{T} \end{aligned}$$
(4.8)

to search for a local minimum point \(\pmb {w}^*\). Here, \(\epsilon \) is a coefficient that determines the size of the update and is called the learning rate.

In (4.6), the error function was calculated using all the training data; instead, at each step m, a suitable subset \(A_m \) (called a mini-batch, with \(|A_m|=M_m \le N\)) is selected from the training data, and the weights are updated using that mini-batch:

$$\begin{aligned} \pmb {w}^{(m+1)} = \pmb {w}^{(m)} - \epsilon \Delta E_{A_m}(\pmb {w}), \quad E_{A_m}(\pmb {w}) = \frac{1}{M_m} \sum _{i \in A_m} E_i(\pmb {w}). \end{aligned}$$
(4.9)

In particular, updating the parameters with the mini-batch size set to \(M_m = 1\) is called stochastic gradient descent (SGD). In (4.8), the objective function to be minimized is always the same, so once the iteration falls into a local minimum, it cannot escape; in SGD, by randomly selecting samples at each step m, the objective function differs every time, which greatly reduces the risk of getting stuck at an undesirable local minimum. In practice, \(M_m = 8\)–128 is often used to enjoy both the advantages of SGD and the benefits of efficient parallel computing.
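The mini-batch update (4.9) can be sketched on a toy least-squares problem; the data, learning rate, and batch size below are illustrative choices, not recommendations.

```python
import random

def sgd(w0, grad_Ei, N, epochs, M=8, eps=0.05, seed=0):
    """Mini-batch SGD as in (4.9): at each step, average the per-sample
    gradients over a randomly chosen mini-batch of size (at most) M."""
    rng = random.Random(seed)
    w = list(w0)
    idx = list(range(N))
    for _ in range(epochs):
        rng.shuffle(idx)                 # fresh random mini-batches each epoch
        for s in range(0, N, M):
            batch = idx[s:s + M]
            g = [0.0] * len(w)
            for i in batch:
                gi = grad_Ei(w, i)
                g = [gj + gij / len(batch) for gj, gij in zip(g, gi)]
            w = [wj - eps * gj for wj, gj in zip(w, g)]
    return w

# Toy problem: fit y_i = a x_i + b with E_i = 0.5 (a x_i + b - y_i)^2.
xs = [i / 10.0 for i in range(50)]
ys = [2.0 * x + 1.0 for x in xs]

def grad_Ei(w, i):
    r = w[0] * xs[i] + w[1] - ys[i]
    return [r * xs[i], r]               # (dE_i/da, dE_i/db)

a, b = sgd([0.0, 0.0], grad_Ei, len(xs), epochs=500)
```

On this noise-free data the iterates recover \(a \approx 2\) and \(b \approx 1\); the reshuffling at each epoch is exactly the mechanism described above for varying the objective from step to step.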

As discussed thus far, the efficient calculation of the gradient of the error function is crucial for performing gradient descent. However, in deep layers (layers close to the input) of a feedforward neural network, the gradients of the error function, \(\frac{\partial E(\pmb {w})}{\partial w^{(l)}_{ji}}\) and \(\frac{\partial E(\pmb {w})}{\partial b^{(l)}_{j}}\), become intricate. The backpropagation method offers an efficient approach to computing these gradients. Consider differentiating \(E_n(\pmb {w})\) with respect to the weights of the lth layer; from the chain rule, we get

$$\begin{aligned} \frac{\partial E_n(\pmb {w})}{\partial w^{(l)}_{j,i}} = \frac{\partial E_n(\pmb {w})}{\partial u^{(l)}_{j}} \frac{\partial u^{(l)}_{j}}{\partial w^{(l)}_{j,i}} \end{aligned}$$
(4.10)

Because \(u^{(l)}_{j} = \sum _{i=1}^{n_{l-1}} w^{(l)}_{j,i} z^{(l-1)}_{i} + b^{(l)}_j\), the second term on the right-hand side is given by

$$\begin{aligned} \frac{\partial u^{(l)}_{j}}{\partial w^{(l)}_{j,i}} = z^{(l-1)}_i . \end{aligned}$$
(4.11)

Concurrently, when considering the first term on the right-hand side, the effect of a change in \(u^{(l)}_{j}\) on \(E_n(\pmb {w})\) is transmitted through \(u^{(l+1)}_{k} = \sum _{j=1}^{n_{(l)}} w^{(l+1)}_{k,j} f(u^{(l)}_{j}) + b^{(l+1)}_k\); hence, using the chain rule again, we obtain

$$\begin{aligned} \frac{\partial E_n(\pmb {w})}{\partial u^{(l)}_{j}} = \sum _{k=1}^{n_{l+1}} \frac{\partial E_n(\pmb {w})}{\partial u^{(l+1)}_{k}} \frac{\partial u^{(l+1)}_{k}}{\partial u^{(l)}_{j}} = \sum _{k=1}^{n_{l+1}} \frac{\partial E_n(\pmb {w})}{\partial u^{(l+1)}_{k}} w^{(l+1)}_{k,j} f'(u^{(l)}_{j}) \end{aligned}$$

If we define \(\delta ^{(l)}_j = \frac{\partial E_n(\pmb {w})}{\partial u^{(l)}_{j}}\), then

$$\begin{aligned} \delta ^{(l)}_j= f'(u^{(l)}_{j}) \sum _{k=1}^{n_{l+1}} \delta ^{(l+1)}_k w^{(l+1)}_{k,j} . \end{aligned}$$
(4.12)

Therefore, substituting (4.11) and (4.12) into (4.10), we obtain

$$\begin{aligned} \frac{\partial E_n(\pmb {w})}{\partial w^{(l)}_{j,i}} = \delta ^{(l)}_j z^{(l-1)}_{i} \end{aligned}$$
(4.13)

Ultimately, we observe that the effect of a variation of \(w^{(l)}_{j,i}\), which represents the strength of the connection between unit i in layer \(l-1\) and unit j in layer l, on \(E_n(\pmb {w})\) is determined only by the delta of unit j, \(\delta ^{(l)}_j\), and the output of unit i in layer \(l-1\), \( z^{(l-1)}_{i}\). Notably, the delta of layer l, \(\delta ^{(l)}_j\), can be calculated according to (4.12) once the deltas of layer \(l+1\) are obtained. This can be repeated sequentially back from the output layer, and because the delta of the output layer

$$\begin{aligned} \delta ^{(L)}_j = \frac{\partial E_n(\pmb {w})}{\partial u^{(L)}_{j}} \end{aligned}$$

is given, we can calculate the delta of any layer. Therefore, from (4.12) and (4.13), we can calculate the gradients needed in (4.9). In this method, because the deltas are propagated from the output layer to the input layer, the error is corrected in the opposite direction of forward propagation; hence, the method is termed backpropagation.
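The delta recursion (4.12) and the gradient formula (4.13) can be verified numerically on a tiny network; the weights are arbitrary illustrative values, the hidden activation is a sigmoid (so \(f'(u) = z(1-z)\)), the output is linear, and the backpropagated gradient is checked against a finite difference.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

# Tiny network: 2 inputs -> 2 sigmoid hidden units -> 1 linear output.
W2 = [[0.3, -0.2], [0.1, 0.4]]; b2 = [0.05, -0.05]
W3 = [[0.7, -0.3]];             b3 = [0.1]
x, y_target = [1.0, 2.0], [0.5]

def forward(W2, b2, W3, b3):
    u2 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W2, b2)]
    z2 = [sigmoid(u) for u in u2]
    u3 = [sum(w * zi for w, zi in zip(row, z2)) + b for row, b in zip(W3, b3)]
    return u2, z2, u3

def loss(W2, b2, W3, b3):
    _, _, u3 = forward(W2, b2, W3, b3)
    return 0.5 * sum((y - u) ** 2 for y, u in zip(y_target, u3))

# Backpropagation: output delta, then (4.12) back to the hidden layer,
# then the gradient via (4.13): dE/dw^{(l)}_{j,i} = delta^{(l)}_j z^{(l-1)}_i.
u2, z2, u3 = forward(W2, b2, W3, b3)
delta3 = [u - y for u, y in zip(u3, y_target)]             # linear output layer
delta2 = [z2[j] * (1 - z2[j])                              # f'(u) = z (1 - z)
          * sum(delta3[k] * W3[k][j] for k in range(len(delta3)))
          for j in range(len(z2))]
grad_W2 = [[delta2[j] * x[i] for i in range(len(x))] for j in range(len(delta2))]

# Finite-difference check of one gradient component.
h = 1e-6
W2p = [row[:] for row in W2]
W2p[0][1] += h
fd = (loss(W2p, b2, W3, b3) - loss(W2, b2, W3, b3)) / h
assert abs(grad_W2[0][1] - fd) < 1e-5
```

A single backward pass produces every component of the gradient at roughly the cost of one forward pass, whereas the finite-difference check requires one extra forward pass per weight; this cost difference is the practical point of backpropagation.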

5 Numerical example

In this section, we compare the accuracy and effectiveness of the ANN predictions of our method based on asymptotic correction with those based on quasi-process corrections using numerical examples. To this end, we examine call option and barrier option prices under an LSVM

$$\begin{aligned} \left\{ \begin{array}{rcl} \displaystyle \textrm{d}S_t &{}=&{} r(t) S_t \textrm{d}t + v_t \left[ \beta S_t + (1 - \beta ) F(0,t) \right] \textrm{d}W^S_t, \\ \textrm{d}v_t &{}=&{} \left( \theta (t) - \kappa (t) v_t \right) dt + \nu v_t \textrm{d}W^v_t \end{array} \right. \end{aligned}$$
(5.14)

The ANN approximations of the implied volatilities (respectively, barrier option prices) using our proposed mapping \(\mathcal {M}_D\) and \(\mathcal {M}_{{\bar{D}}}\) are denoted by \(\sigma ^D_\textrm{ANN}\) (respectively, \(B^D_\textrm{ANN}\)), while those using the direct mapping \(\mathcal {M}\) are denoted by \(\sigma _\textrm{ANN}\) (respectively, \(B_\textrm{ANN}\)).

The implied volatilities \(\sigma _\textrm{MC}\), \(\sigma _\textrm{WIC}\), \(\sigma _\textrm{DD}\), and \(\sigma _\textrm{BS}\) are calculated using the MC simulation, the Wiener-Itô chaos expansion, the replicated DD model, and the mimicked BS model, respectively. Similarly, the barrier option prices \(B_\textrm{MC}\), \(B_\textrm{WIC}\), and \(B_\textrm{BS}\) are calculated using those methods.

In this study, we generate the training and testing data by following the method used in Funahashi (2023). Notably, our method trains the ANN model to predict implied volatilities, \(\mathcal {M}_D: \xi \mapsto \sigma ^D_\textrm{ANN}(\xi )\), using the differences between the Monte Carlo and approximate implied volatilities, \(D(\xi _k) = \sigma _\textrm{MC}(\xi _k) - \sigma _\textrm{App}(\xi _k)\), for \(k = 1, \ldots , N\) with respect to the parameters \(\xi _k = \{ (S_0)_k, r_k, \beta _k, (v_0)_k, \nu _k, \rho _k, \kappa _k, \theta _k, K_k \}\). To prepare N sets of vectors \(\{ \xi _k \}\) for \(k = 1\) to N, we first generate M sets of vectors \(\{ v_l \}_{l = 1, \ldots , M}\), where \(M=N/21\). Each \(v_l\) is a vector of 9 elements, namely \(T_l, (S_0)_l, r_l, \beta _l, (v_0)_l, \nu _l, \rho _l, \kappa _l, \theta _l\). The elements of each \(v_l\) are generated from a uniform distribution over the given ranges.

To generate appropriate strikes, we do not use fixed values because, depending on the combination of the 9 elements, the volatility of the underlying asset can become extremely small or large. Instead, we use each \(v_l\) to run an MC simulation with W trials and then set

$$\begin{aligned} K_{1}= & {} \max (\mu - 2 \sqrt{V}, 0.6 F(0,T_l)), \end{aligned}$$
(5.15)
$$\begin{aligned} K_{21}= & {} \min (\mu + 2 \sqrt{V}, 1.5 F(0,T_l)), \end{aligned}$$
(5.16)

and \(K_k = K_1 + (k-1) \Delta K\) for \(k = 2, \ldots , 20\), where \(\Delta K = \frac{K_{21} - K_1}{20}\). Here, for each sample path \(w = 1, \ldots , W\), we generate \(F(\omega _w; 0,T)\) and compute the mean \(\mu = E[F(0,T)]\) and variance \(V = E[(F(0,T) - \mu )^2]\). More specifically, we run M Monte Carlo simulations to create training and testing data of size \(N = M \times 21\). For each Monte Carlo simulation, W trials are run. The training and testing data \(\xi _{i,k} = \{ v_i, K_k \}\) and target values \(\sigma _\textrm{MC}(\xi _{i,k})\) (respectively, \(B_\textrm{MC}(\xi _{i,k})\)) are obtained simultaneously.
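The strike-grid construction (5.15)–(5.16) can be sketched as follows; the sample of simulated forwards is a hypothetical stand-in for the W Monte Carlo paths.

```python
import math

def strike_grid(F_samples, F0T):
    """Build the 21-point strike grid: K_1 and K_21 from (5.15)-(5.16) use the
    MC mean and variance of F(0,T); the remaining strikes are equally spaced."""
    W = len(F_samples)
    mu = sum(F_samples) / W
    V = sum((f - mu) ** 2 for f in F_samples) / W
    K1 = max(mu - 2.0 * math.sqrt(V), 0.6 * F0T)
    K21 = min(mu + 2.0 * math.sqrt(V), 1.5 * F0T)
    dK = (K21 - K1) / 20.0
    return [K1 + k * dK for k in range(21)]

# Toy sample of simulated forwards (a real run would use W = 500,000 paths).
Ks = strike_grid([95.0, 100.0, 105.0, 110.0, 90.0], 100.0)
assert len(Ks) == 21
```

Anchoring the grid to the simulated mean and variance keeps the strikes within a region where implied volatilities are well defined, regardless of how volatile the sampled parameter set happens to be.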

For each test, we run M = 1000–20,000 MC simulations to create N = 21,000–420,000 datasets and split them in an 80/20 ratio, where 80% is used for training and 20% for testing, which is a common practice in data science. All of the results of the following tests are created from the out-of-sample inputs, that is, the latter 20%, which is not used for the training.

In the training stage, the hyper-parameters are set with Adam as the optimizer, ReLU as the activation function,Footnote 6 six hidden layers, and 32 nodes for each layer. The epoch length and batch sizes are set to 100 and 128, respectively.

5.1 Call option

We first examine call option prices under the LSVM (5.14) using the WIC approximation, the replicated DD model with the optimal parameters in (3.26), and the replicated BS model with the parameter in (3.31) as the base approximations. Here, we use the Euler-Maruyama scheme with W = 500,000 trialsFootnote 7 and 100 simulation time steps.

We generated uniformly distributed random vectors \(\pmb {\xi }^\textrm{P} = \{ \beta , v_0, \nu , \rho , \kappa , \theta \}\) and \(\pmb {\xi }^\textrm{M} = \{ T, S_0, r \}\) within the ranges given in Table 1. The strikes are computed following the strategy discussed in the introduction of this section to create \(\pmb {\xi } = \{ \pmb {\xi }^\textrm{P}, \pmb {\xi }^\textrm{M} \}\), where \(\pmb {\xi }^\textrm{M} = \{ T, S_0, r, K \}\). \(\sigma _\textrm{MC}(\pmb {\xi })\) and \(\sigma _\textrm{WIC}(\pmb {\xi })\) were computed using \(\pmb {\xi }\). To ensure the stability of the estimation, we discard any implied volatilities that are too small or too large:

$$\begin{aligned} \sigma _\textrm{A}(\xi _l) < 0.05 \ \textrm{or} \ \sigma _\textrm{A}(\xi _l) > 0.8 . \end{aligned}$$
(5.17)

Therefore, for the actual ANN training and testing, we use \(N' (\le N)\) data sets, \(\{ \xi _n \}_{n = 1, \ldots , N'}\), which exclude these cases.

Table 1 Upper and lower limits of the input model parameters \(\{ T, S_0, r, \beta , v_0, \nu , \rho , \kappa , \theta \}\) generated by uniform random variables

For the WIC approximation, the deterministic functions of the LSVM, \(p_1\) to \(p_8\) defined in Appendix A, can be explicitly computed as

$$\begin{aligned} p_1= & {} \frac{e^{-s (2 \kappa +r)} \left( p_{1a} +p_{1b}-p_{1c}+p_{1d}+p_{1e}-p_{1f} \right) }{\kappa ^3}, \ p_2 = \frac{\beta e^{-\kappa s} \left[ \theta \left( e^{\kappa s}-1\right) +\kappa v_0 \right] }{\kappa }, \\ p_3= & {} e^{-\kappa s}, \ p_4 = \nu \left[ \frac{\theta \left( e^{\kappa s}-1\right) }{\kappa } + v_0 \right] , p_5 = \frac{\beta e^{-\kappa s} \left[ \theta \left( e^{\kappa s}-1\right) +\kappa v_0 \right] }{\kappa }, \ p_6 = 0, \\ p_7= & {} \left( \beta +e^{r s}-1\right) e^{-s (\kappa +r)}, \ p_8 = (\beta -1) e^{-\kappa s} \left[ \frac{\theta \left( e^{\kappa s}-1\right) }{\kappa } + v_0 \right] \end{aligned}$$

where \(p_{1a} = \theta e^{s (2 \kappa +r)} \left( \theta \nu \rho +\kappa ^2\right) \), \(p_{1b} = \kappa e^{s (\kappa +r)} \left[ -2 \theta ^2 \nu \rho s+\theta \kappa (2 \nu \rho s v_0-1)-2 \theta \nu \rho v_0 + \kappa v_0 (\kappa +\nu \rho v_0) \right] \), \(p_{1c} = \nu \rho e^{r s} (\theta -\kappa v_0)^2\), \(p_{1d} = (\beta -1) \theta ^2 \nu \rho e^{2 \kappa s}\), \(p_{1e} = (\beta -1) \kappa \nu \rho e^{\kappa s} \left( -2 \theta ^2\,s+2 \theta v_0 (\kappa s-1)+\kappa v_0^2\right) \), and \(p_{1f} = (\beta -1) \nu \rho (\theta -\kappa v_0)^2\). The optimal DD parameters, \(\bar{\pmb {\xi }}^\textrm{P}_i = \left\{ {\bar{\epsilon }}^*, {\bar{\beta }}^* \right\} \), and BS parameter, \(\bar{\pmb {\xi }}^\textrm{P}_i = \left\{ {\bar{\epsilon }}^* \right\} \), can be explicitly determined using (3.26) and (3.31), respectively. \(\sigma _\textrm{DD}(\bar{\pmb {\xi }}_i)\)Footnote 8 and \(\sigma _\textrm{BS}(\bar{\pmb {\xi }}_i)\) are computed using the optimized parameters, where \(\bar{\pmb {\xi }}_i = \{ \pmb {\xi }^\textrm{M}_i, \bar{\pmb {\xi }}^\textrm{P}_i \}\).

Here, we generate N = 21,000, 210,000, and 420,000 sets of \(\xi \) (M = 1000, 10,000, and 20,000 MC simulations \(\times \) 21 strikes per trial) and remove 1513, 12,327, and 25,724 data sets (i.e., \(N'\) = 19,487, 197,673, and 394,276), respectively, following the condition (5.17). We use the MC scheme with 500,000 trials and 100 simulation time steps to compute \(\sigma _\textrm{MC}\). A single Monte Carlo trial requires approximately 2.46 s to complete on a PC with an Intel Core i9-10980XE CPU with 18 cores and 36 threads. The test is performed as a multi-threaded application running on a multi-core processor with 20 cores, and it takes 41 min to compute M = 20,000 MC simulations. In contrast, the computational costs of the WIC approximation, the replicated DD model, and the replicated BS model for a call option, listed in Table 3, are swift enough for practical usage.

Figures 2 and 3 compare the implied volatilities of our proposed methods and MC results for M = 1000 (N = 21,000) and M = 10,000 (N = 210,000), respectively. The upper left-, upper right-, lower left-, and lower right-hand panels plot the implied volatilities \(\sigma _\textrm{ANN}\) vs. \(\sigma _\textrm{MC}\), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\) vs. \(\sigma _\textrm{MC}\), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\) vs. \(\sigma _\textrm{MC}\), and \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) vs. \(\sigma _\textrm{MC}\), respectively. The upper left-, upper right-, lower left-, and lower right-hand panels of Figs. 4, 5 and 6 show the frequency histograms of the differences between \(\sigma _\textrm{ANN}\) and \(\sigma _\textrm{MC}\), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\) and \(\sigma _\textrm{MC}\), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\) and \(\sigma _\textrm{MC}\), and \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) and \(\sigma _\textrm{MC}\), respectively. Here, \(\sigma _\textrm{ANN}\), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\), and \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) represent the implied volatilities calculated by the ANN through the direct mapping \(\mathcal {M}\), with WIC correction, using the DD model as quasi-process correction, and using the BS model as quasi-process correction, respectively. More specifically, \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\) is derived using the mapping \(\mathcal {M}_D\), while \(\sigma ^D_\textrm{ANN}(\mathrm DD)\) and \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) are computed using the mapping \(\mathcal {M}_{{\bar{D}}}\). Recall that the test data are the latter \(20\%\) of the N samples.

Fig. 2

Comparison of the implied volatilities derived by artificial neural network (ANN) and Monte Carlo (MC) results. The upper left, upper right, lower left, and lower right panels plot the implied volatilities \(\sigma _\textrm{ANN}\) vs. \(\sigma _\textrm{MC}\), \(\sigma ^D_\textrm{ANN}(\textrm{WIC})\) vs. \(\sigma _\textrm{MC}\), \(\sigma ^D_\textrm{ANN}(\textrm{DD})\) vs. \(\sigma _\textrm{MC}\), and \(\sigma ^D_\textrm{ANN}(\textrm{BS})\) vs. \(\sigma _\textrm{MC}\), respectively. Here, \(\sigma _\textrm{MC}\), \(\sigma _\textrm{ANN}\), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\), and \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) represent the implied volatilities calculated by MC simulation, ANN through direct mapping \(\mathcal {M}\), ANN with WIC correction, ANN using DD model as quasi-process correction, and ANN using BS model as quasi-process correction, respectively. The test data, that is, \(20\%\) of \(N=21,000\) (\(M=1000\)) samples, are used

Fig. 3

Comparison of the implied volatilities derived by artificial neural network (ANN) and Monte Carlo (MC) results. The upper left, upper right, lower left, and lower right panels plot the implied volatilities \(\sigma _\textrm{ANN}\) vs. \(\sigma _\textrm{MC}\), \(\sigma ^D_\textrm{ANN}(\textrm{WIC})\) vs. \(\sigma _\textrm{MC}\), \(\sigma ^D_\textrm{ANN}(\textrm{DD})\) vs. \(\sigma _\textrm{MC}\), and \(\sigma ^D_\textrm{ANN}(\textrm{BS})\) vs. \(\sigma _\textrm{MC}\), respectively. Here, \(\sigma _\textrm{MC}\), \(\sigma _\textrm{ANN}\), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\), and \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) represent the implied volatilities calculated by MC simulation, ANN through direct mapping \(\mathcal {M}\), ANN with WIC correction, ANN using DD model as quasi-process correction, and ANN using BS model as quasi-process correction, respectively. The test data, that is, \(20\%\) of \(N=210,000\) (\(M=10,000\)) samples, are used

Fig. 4

Frequency histograms of the differences between \(\sigma _\textrm{ANN}\) and \(\sigma _\textrm{MC}\) (upper left), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\) and \(\sigma _\textrm{MC}\) (upper right), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\) and \(\sigma _\textrm{MC}\) (lower left), and \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) and \(\sigma _\textrm{MC}\) (lower right). Here, \(\sigma _\textrm{MC}\), \(\sigma _\textrm{ANN}\), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\), and \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) represent the implied volatilities calculated by Monte Carlo (MC) simulation, artificial neural network (ANN) through direct mapping \(\mathcal {M}\), ANN with WIC correction, ANN using DD model as quasi-process correction, and ANN using BS model as quasi-process correction, respectively. The x-axis shows the difference in implied volatilities between MC results and those obtained by an ANN using four methods. The y-axis indicates how often each difference occurs. The test data, that is, 20% of \(N = 21,000\) (\(M=1000\)) samples, are used

Fig. 5

Frequency histograms of the differences between \(\sigma _\textrm{ANN}\) and \(\sigma _\textrm{MC}\) (upper left), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\) and \(\sigma _\textrm{MC}\) (upper right), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\) and \(\sigma _\textrm{MC}\) (lower left), and \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) and \(\sigma _\textrm{MC}\) (lower right). Here, \(\sigma _\textrm{MC}\), \(\sigma _\textrm{ANN}\), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\), and \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) represent the implied volatilities calculated by Monte Carlo (MC) simulation, artificial neural network (ANN) through direct mapping \(\mathcal {M}\), ANN with WIC correction, ANN using DD model as quasi-process correction, and ANN using BS model as quasi-process correction, respectively. The x-axis shows the difference in implied volatilities between MC results and those obtained by an ANN using four methods. The y-axis indicates how often each difference occurs. The test data, that is, 20% of \(N = 210,000\) (\(M=10,000\)) samples, are used

Fig. 6

Frequency histograms of the differences between \(\sigma _\textrm{ANN}\) and \(\sigma _\textrm{MC}\) (upper left), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\) and \(\sigma _\textrm{MC}\) (upper right), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\) and \(\sigma _\textrm{MC}\) (lower left), and \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) and \(\sigma _\textrm{MC}\) (lower right). Here, \(\sigma _\textrm{MC}\), \(\sigma _\textrm{ANN}\), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\), and \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) represent the implied volatilities calculated by Monte Carlo (MC) simulation, artificial neural network (ANN) through direct mapping \(\mathcal {M}\), ANN with WIC correction, ANN using DD model as quasi-process correction, and ANN using BS model as quasi-process correction, respectively. The x-axis shows the difference in implied volatilities between MC results and those obtained by an ANN using four methods. The y-axis indicates how often each difference occurs. The test data, that is, 20% of \(N = 420,000\) (\(M=20,000\)) samples are used

To understand this more intuitively, we compare the implied volatilities \(\sigma _\textrm{MC}\), \(\sigma _\textrm{ANN}\), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\), and \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) under various test data sizes, using the parameters given in Table 2. Figures 13, 14, and 15 in Appendix C show the cases where the test data, that is, the latter 20% of \(N = 21,000\), 210,000, and 420,000 (i.e., \(M = 1000\), 10,000, and 20,000), respectively, are used.

Table 2 Parameter sets of the comparative statics used in Figs. 13, 14 and 15. The table shows the values of the parameters T, \(S_0\), r, \(\beta \), \(v_0\), \(\nu \), \(\rho \), \(\kappa \), \(\theta \) for each of the four scenarios considered in the analysis. The scenarios are labeled as A, B, C, and D

As observed from Figs. 2–6 and 13–15, \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\) converges most swiftly. From Fig. 13, it is evident that \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\) mostly converges to \(\sigma _\textrm{MC}\) even in the case of \(M = 1000\), while the other methods do not. For \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\), the error is already kept within \(1\%\) even with \(N = 21,000\). When N is set to 210,000, \(\sigma ^D_\textrm{ANN}(\mathrm DD)\) converges next, and when N is set to 420,000, \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) converges. Conversely, \(\sigma _\textrm{ANN}\) exceeds \(1\%\) error even with \(N = 420,000\). From Fig. 14, \(\sigma ^D_\textrm{ANN}(\mathrm DD)\) has converged but \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) has not in Case A. From Fig. 15, \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) has converged but \(\sigma _\textrm{ANN}\) still shows some error in Cases B, C, and D.

As the distribution of the approximating model becomes more similar to that of the original, the amount of training data required for ANN learning decreases. The WIC expansion is the closest because it approximates the original distribution itself. Moreover, using the DD model, which approximates the original distribution well, reduces the training data much more than using the BS model with its log-normal distribution. Remarkably, however, even the BS model, whose distribution is very different from the original, lowers the required data amount compared with direct ANN learning. We provide a detailed discussion in Sect. 6.
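The underlying mechanism can be illustrated with a toy regression experiment. This is a hypothetical stand-in: a low-order polynomial fit replaces the ANN, and simple closed-form curves replace \(\sigma _\textrm{MC}\) and the analytic approximation; none of the functions below come from the paper. Learning the residual of a good prior needs far less data than learning the target directly:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma_true(x):
    # toy "true" target (stand-in for sigma_MC): smooth but wiggly
    return 0.2 + 0.05 * np.sin(3 * x) + 0.02 * x**2

def sigma_app(x):
    # toy analytic approximation (stand-in for sigma_WIC): captures the wiggle
    return 0.2 + 0.05 * np.sin(3 * x)

def fit_and_eval(train_x, target, deg=3):
    # fit a cubic to few samples, report max error on a dense test grid
    coef = np.polyfit(train_x, target(train_x), deg)
    test_x = np.linspace(-1.0, 1.0, 201)
    return np.max(np.abs(np.polyval(coef, test_x) - target(test_x)))

x = rng.uniform(-1.0, 1.0, 8)                     # only 8 training samples
err_direct = fit_and_eval(x, sigma_true)          # learn the target directly
err_resid = fit_and_eval(x, lambda t: sigma_true(t) - sigma_app(t))
# the residual is a smooth, low-order function (here 0.02*x**2),
# so it is recovered almost exactly from the same 8 samples
```

The direct fit must resolve the oscillatory component from scratch, while the residual fit only has to learn the small, smooth correction; this mirrors why the WIC-corrected and quasi-process-corrected mappings converge with far fewer samples.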

These results can also be confirmed from Table 3, which reports, with respect to N, the mean and variance of the differences between \(\sigma _\textrm{ANN}\) and \(\sigma _\textrm{MC}\), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\) and \(\sigma _\textrm{MC}\), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\) and \(\sigma _\textrm{MC}\), and \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) and \(\sigma _\textrm{MC}\), together with the computational times of online prediction and offline learning. In the table, “times (online) \(D_\textrm{ANN}\)” and “times (online) \(\sigma _\textrm{App}\)” show the computational times of the online prediction and approximation methods, respectively, whereas “time (offline)” indicates the computational time of the offline learning (excluding the numerical simulation). The direct mapping \(\mathcal {M}\) converges to the MC result as N increases, but the convergence speed is slower than that of the mapping \(\mathcal {M}_D\). This is consistent with the findings of Funahashi (2021a). To conduct a more detailed analysis, in Appendix D, we assess the impact and performance of both the new and previous methods across a range of ANN configurations, including different activation functions, numbers of nodes, and numbers of hidden layers.

Table 3 Mean and variance of the differences between \(\sigma _\textrm{ANN}\) and \(\sigma _\textrm{MC}\), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\) and \(\sigma _\textrm{MC}\), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\) and \(\sigma _\textrm{MC}\), and \(\sigma ^D_\textrm{ANN}(\mathrm BS)\) and \(\sigma _\textrm{MC}\). The number of Monte Carlo simulations performed is \(M = 1000\), 10,000, and 20,000 (i.e., \(N = 21,000\), 210,000, and 420,000, respectively). “times (online) \(D_\textrm{ANN}\)” and “times (online) \(\sigma _\textrm{App}\)” show the computational times of the online prediction and approximation methods, respectively, whereas “time (offline)” indicates the computational time of the offline learning

5.2 Barrier option

This subsection examines up-and-in barrier options as an example of an exotic derivative. These options cannot be exercised until the price of the underlying asset reaches or exceeds a predetermined barrier level, B. Once the barrier level is reached, the option becomes exercisable, and the holder can buy or sell the underlying asset at the strike price, depending on whether the option is a call or put.
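The knock-in mechanism described above can be sketched in a minimal Monte Carlo simulation. This is an illustrative stand-in only: geometric Brownian motion replaces the paper's LSVM dynamics, and all parameter values are hypothetical; the point is simply that the payoff is paid only on paths whose running maximum reaches the barrier.

```python
import numpy as np

rng = np.random.default_rng(0)
S0, K, B, r, sigma, T = 100.0, 100.0, 120.0, 0.01, 0.2, 1.0
n_paths, n_steps = 100_000, 100
dt = T / n_steps

S = np.full(n_paths, S0)
hit = np.zeros(n_paths, dtype=bool)
for _ in range(n_steps):
    z = rng.standard_normal(n_paths)
    # GBM step as a stand-in for the LSVM (5.14) dynamics
    S *= np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z)
    hit |= S >= B          # the option knocks in once the barrier is reached

# up-and-in call: a vanilla call payoff, but only on knocked-in paths
payoff = np.where(hit, np.maximum(S - K, 0.0), 0.0)
ui_price = np.exp(-r * T) * payoff.mean()
```

By construction the up-and-in price is bounded above by the vanilla call price on the same paths, with equality only if every in-the-money path also touched the barrier.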

We use the approximate barrier option formula, \(B_\textrm{WIC}(\pmb {\xi })\), proposed by Funahashi and Higuchi (2018). The price of an up-and-in barrier option with barrier level B, maturity T, and strike K is approximated by

$$\begin{aligned} \textrm{UI}(T,K)= & {} \textrm{e}^{-\int \limits _0^T r(s) \textrm{d}s} \bigg [ \frac{\textrm{e}^{\Omega _T} }{2 \sqrt{2 \pi } \Sigma _T} \left( \textrm{e}^{-\frac{(\omega ^1_T(K) - {\dot{\omega }}_T)^2}{2T}} X_1(T) - \textrm{e}^{-\frac{(\omega ^1_T(B) - {\dot{\omega }}_T)^2}{2T}} X_2(T) \right) \nonumber \\{} & {} + \frac{\textrm{e}^{\Omega _T}}{2 \Sigma _T} X_3(T) \left( \Phi \left( \frac{\omega ^1_T(B) - {\dot{\omega }}_T}{\sqrt{T}} \right) - \Phi \left( \frac{\omega ^1_T(K) - {\dot{\omega }}_T}{\sqrt{T}} \right) \right) \nonumber \\{} & {} + \frac{F(0,T)}{\sqrt{2 \pi } \Sigma _T^{\frac{5}{2}}} \textrm{e}^{-\frac{{\bar{B}}^2}{2 \Sigma _T}} \left\{ {\bar{B}}^2 ({\bar{B}} + {\bar{K}}) q(T) - {\bar{K}} q(T) \Sigma _T + \Sigma _T^3 \right\} \nonumber \\{} & {} + F(0,T) {\bar{K}} \left( 1- \Phi \left( \frac{{\bar{B}}}{\sqrt{\Sigma _T}} \right) \right) \bigg ]. \end{aligned}$$
(5.19)

where \({\bar{K}} := 1 - K / F(0,T)\) and \({\bar{B}} := B/F(0,T)-1\). \(\Phi (x)\) is the cumulative distribution function of the standard normal distribution. q(t), \(\Sigma _T\), \(\omega _t^{1}(B)\), \(\Omega _T\), \({\dot{\omega }}_T\), and \(X_i(T)\) are defined in Appendix B.

This formula is based on a second-order chaos expansion. Hence, compared with the third-order expansion used for the call option cases in Sect. 5.1, the accuracy of the approximation gradually worsens as volatility increases and maturity lengthens. The approximation can be extended to higher-order terms; see Section 5.4 of Funahashi and Higuchi (2018). However, for typical exotic derivatives, such an approximation either does not exist or requires very complex calculations. Therefore, we keep the approximation at the second order and observe the effect of the replication method, which is expected to be applicable to more general cases.

We consider two types of datasets. The first type, Case E, has volatilities at the level observed in a normal market, whereas the second type, Case F, allows for higher volatilities, at the same level as those used in the previous subsection, with higher barrier levels and interest rates. More specifically, we generate uniformly distributed random vectors \(\pmb {\xi }^\textrm{P} = \{ \beta , v_0, \nu , \rho , \kappa , \theta , \epsilon \}\) and \(\pmb {\xi }^\textrm{M} = \{ T, S_0, r, U \}\) within the ranges given in Table 4. Here, \(U_1\) and \(U_2\) are used to generate the barrier level \(B = S_0 \times U_1\) and the volatility of volatility \(\nu = v_0 \times U_2\), respectively.

Table 4 Upper and lower limits of the input model parameters \(\{ T, S_0, r, \beta , v_0, \nu , \rho , \kappa , \theta , U_1, U_2 \}\) generated by uniform random variables. \(U_1\) and \(U_2\) are used to generate \(B = S_0 \times U_1\) and \(\nu = v_0 \times U_2\), respectively

Notably, for an up-and-in barrier option, if the strike price exceeds the barrier level, that is, \(K > B\), then the option reduces to a standard call option, which we have already considered in the previous section. Therefore, we omit these cases from our analysis. For Case E, to generate appropriate strikes, we consider strikes

$$\begin{aligned} K_{1}= & {} \max (\mu - 1.1 \sqrt{V}, 0.9 F(0,T_l)), \end{aligned}$$
(5.20)
$$\begin{aligned} K_{21}= & {} \min (\min (\mu + 1.1 \sqrt{V}, 1.1 F(0,T_l)), B), \end{aligned}$$
(5.21)

and \(K_k = K_1 + (k-1) \Delta K\) for \(k = 2, \ldots , 20\), where \(\Delta K = \frac{K_{21} -K_1}{20}\), giving 21 strikes per trial. Whereas for Case F, we use (5.15) for \(K_1\) but modify \(K_{21}\) as

$$\begin{aligned} K_{21} = \min (\min (\mu + 2 \sqrt{V}, 1.5 F(0,T_l)), B). \end{aligned}$$
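The Case E grid construction in (5.20)–(5.21) can be sketched as follows. The values of \(\mu \), V, \(F(0,T_l)\), and B below are hypothetical placeholders chosen only to exercise the formulas:

```python
import numpy as np

# Hypothetical inputs (placeholders, not the paper's test values)
mu, V, F0T, B = 1.02, 0.04, 1.0, 1.25

# Case E endpoints, following (5.20)-(5.21)
K1 = max(mu - 1.1 * np.sqrt(V), 0.9 * F0T)
K21 = min(min(mu + 1.1 * np.sqrt(V), 1.1 * F0T), B)
# (Case F widens the upper endpoint: min(min(mu + 2*sqrt(V), 1.5*F0T), B))

dK = (K21 - K1) / 20.0
strikes = [K1 + (k - 1) * dK for k in range(1, 22)]   # K_1, ..., K_21
```

Capping \(K_{21}\) at B implements the observation above that strikes beyond the barrier reduce the contract to a vanilla call and are therefore excluded.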

For Case F, we adopt the same condition (5.17) as used in the call option case. For Case E, we lower the upper limits of the parameters that express maturity and volatility. This can make the volatility too small and cause the MC results to be inaccurate for deep-in-the-money or deep-out-of-the-money cases. Therefore, we further limit the acceptable range of implied volatility, removing data sets satisfying

$$\begin{aligned} \sigma _\textrm{A}(\xi _l) < 0.1 \ \textrm{or} \ \sigma _\textrm{A}(\xi _l) > 0.8 \end{aligned}$$
(5.22)

As in the call option cases, for the actual ANN training and testing, we use \(N' (\le N)\) data sets, \(\{ \xi _n \}_{n = 1, \ldots , N'}\), which exclude these cases.
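In code, this filtering step amounts to a boolean mask over the sample implied volatilities. The sketch below uses synthetic uniform draws in place of \(\sigma _\textrm{A}(\xi _l)\), so the counts are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_a = rng.uniform(0.0, 1.0, 1000)   # hypothetical implied vols (stand-ins)

# keep the complement of condition (5.22): drop sigma < 0.1 or sigma > 0.8
keep = (sigma_a >= 0.1) & (sigma_a <= 0.8)
sigma_train = sigma_a[keep]             # the N' <= N sets used for training/testing
```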

For Case E, we prepare N = 21,000 and 420,000 (M = 1000 and 20,000) data sets and filter out 651 and 15,414 (i.e., \(N'\) = 20,349 and 404,586) data sets, respectively, due to the condition (5.22). Conversely, for Case F, we prepare N = 105,000 and 420,000 (M = 5000 and 20,000) sets of \(\xi \) and remove 15,183 and 61,110 (i.e., \(N'\) = 89,817 and 358,890) data sets, respectively, following the condition (5.17). We use the MC scheme with 500,000 trials and 1000 simulation time steps.

A single Monte Carlo trial takes approximately 24.4 s to complete on a PC with an Intel Core i9-10980XE CPU with 18 cores and 36 threads. The test is performed as a multi-threaded application running on a multi-core processor with 20 cores, and it takes seven hours to perform M = 20,000 MC simulations. Because the deterministic functions \(\Sigma _t\) and q(t) (defined in Appendix B) used in the WIC approximation for the barrier option can be computed explicitly as follows, it takes only 0.175 ms to compute the WIC approximation for a barrier option, which is fast enough even compared with the online prediction of the neural network.

$$\begin{aligned} \Sigma _s= & {} \frac{e^{-2 \kappa s}}{2 \kappa ^3} \left( \theta ^2 \left( e^{2 \kappa s} (2 \kappa s - 3 ) + 4 e^{\kappa s}-1 \right) + \kappa ^2 v_0^2 \left( e^{2 \kappa s}-1 \right) + 2 \theta \kappa v_0 \left( e^{\kappa s}-1\right) ^2\right) , \\ q(s)= & {} \frac{e^{-3 \kappa s}}{6 \kappa ^6} \left( A(s) + B(s) \right) . \end{aligned}$$

where

$$\begin{aligned} A(s){} & {} = 3 \beta e^{\kappa s} \left[ \sinh (\kappa s) \left( \theta ^2 (\kappa s-1)+\kappa ^2 v_0^2\right) +\theta \cosh (\kappa s) (\theta (\kappa s-2)+2 \kappa v_0) \right. \\{} & {} \quad \left. + 2 \theta (\theta -\kappa v_0) \right] ^2, \\ B(s){} & {} = \kappa \nu \rho \left[ e^{3 \kappa s} \left( -16 \theta ^3+6 \theta ^2 \kappa (\theta s+v_0)+3 \theta \kappa ^2 v_0^2+\kappa ^3 v_0^3\right) \right. \\{} & {} \quad \left. + 6 \theta e^{2 \kappa s} \left( 3 \theta ^2-\kappa ^2 v_0 (2 \theta s+v_0)+\theta \kappa (2 \theta s-v_0)\right) \right. \\{} & {} \quad \left. - 3 \kappa e^{\kappa s} (\kappa v_0-\theta ) (\kappa v_0 (2 \theta s+v_0)-2 \theta (\theta s+v_0)) - 2 (\theta -\kappa v_0)^3 \right] . \end{aligned}$$
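These closed forms can be transcribed and sanity-checked numerically. The parameter values below are illustrative, not taken from the paper's test cases; the checks themselves (\(\Sigma _s \approx v_0^2 s\) and \(q(s) \rightarrow 0\) as \(s \rightarrow 0\)) follow from Taylor-expanding the displayed expressions.

```python
import numpy as np

def Sigma(s, kappa, theta, v0):
    """Direct transcription of the closed-form Sigma_s above."""
    e1, e2 = np.exp(kappa * s), np.exp(2 * kappa * s)
    return np.exp(-2 * kappa * s) / (2 * kappa**3) * (
        theta**2 * (e2 * (2 * kappa * s - 3) + 4 * e1 - 1)
        + kappa**2 * v0**2 * (e2 - 1)
        + 2 * theta * kappa * v0 * (e1 - 1) ** 2)

def q(s, kappa, theta, v0, beta, nu, rho):
    """Direct transcription of q(s) = e^{-3 kappa s} (A(s) + B(s)) / (6 kappa^6)."""
    e1, e2, e3 = np.exp(kappa*s), np.exp(2*kappa*s), np.exp(3*kappa*s)
    A = 3 * beta * e1 * (
        np.sinh(kappa*s) * (theta**2 * (kappa*s - 1) + kappa**2 * v0**2)
        + theta * np.cosh(kappa*s) * (theta * (kappa*s - 2) + 2 * kappa * v0)
        + 2 * theta * (theta - kappa * v0)) ** 2
    B = kappa * nu * rho * (
        e3 * (-16*theta**3 + 6*theta**2*kappa*(theta*s + v0)
              + 3*theta*kappa**2*v0**2 + kappa**3*v0**3)
        + 6*theta*e2 * (3*theta**2 - kappa**2*v0*(2*theta*s + v0)
                        + theta*kappa*(2*theta*s - v0))
        - 3*kappa*e1 * (kappa*v0 - theta) * (kappa*v0*(2*theta*s + v0)
                                             - 2*theta*(theta*s + v0))
        - 2 * (theta - kappa*v0)**3)
    return np.exp(-3*kappa*s) / (6 * kappa**6) * (A + B)

pars = dict(kappa=1.0, theta=0.2, v0=0.2)   # illustrative parameters
```

Closed forms like these are why the WIC price is in the sub-millisecond range: evaluation involves only exponentials and polynomials, with no path simulation.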

Using the Case E parameters in Table 4, Fig. 7 shows the comparisons between the ANN predictions and MC results of the up-and-in barrier option prices using the test data. The upper, middle, and lower panels show \(B_\textrm{ANN}\) vs. \(B_\textrm{MC}\), \(B^D_\textrm{ANN}(\mathrm WIC)\) vs. \(B_\textrm{MC}\), and \(B^D_\textrm{ANN}(\mathrm BS)\) vs. \(B_\textrm{MC}\), respectively. Here, \(B_\textrm{ANN}\), \(B^D_\textrm{ANN}(\mathrm WIC)\), and \(B^D_\textrm{ANN}(\mathrm BS)\) represent the barrier option prices calculated by the ANN through the direct mapping \(\mathcal {M}\), the ANN with WIC correction, and the ANN using the BS model as quasi-process correction, respectively. The left- and right-hand side panels indicate N = 21,000 (M = 1000) and N = 420,000 (M = 20,000), respectively. Concurrently, Fig. 8 shows frequency histograms of the differences between \(B_\textrm{ANN}\) and \(B_\textrm{MC}\) (upper), \(B^D_\textrm{ANN}(\mathrm WIC(2nd))\) and \(B_\textrm{MC}\) (middle), and \(B^D_\textrm{ANN}(\mathrm BS)\) and \(B_\textrm{MC}\) (lower panels).

Fig. 7

Comparisons between artificial neural network (ANN) prediction and Monte Carlo (MC) results of the up-and-in barrier option prices. The upper, middle, and lower panels show \(B_\textrm{ANN}\) vs. \(B_\textrm{MC}\), \(B^D_\textrm{ANN}(\mathrm WIC)\) vs. \(B_\textrm{MC}\), and \(B^D_\textrm{ANN}(\mathrm BS)\) vs. \(B_\textrm{MC}\), respectively. Here, \(B_\textrm{MC}\), \(B_\textrm{ANN}\), \(B^D_\textrm{ANN}(\mathrm WIC)\), and \(B^D_\textrm{ANN}(\mathrm BS)\) represent the barrier option price calculated by MC simulation, ANN through direct mapping \(\mathcal {M}\), ANN with WIC correction, and ANN using BS model as quasi-process correction, respectively. The left- and right-hand side panels indicate N = 21,000 (M = 1000) and N = 420,000 (M = 20,000), respectively. The Case E parameters in Table 4 are used

Fig. 8

Frequency histograms of the differences between \(B_\textrm{ANN}\) and \(B_\textrm{MC}\) (upper), \(B^D_\textrm{ANN}(\mathrm WIC(2nd))\) and \(B_\textrm{MC}\) (middle), and \(B^D_\textrm{ANN}(\mathrm BS)\) and \(B_\textrm{MC}\) (lower panels). Here, \(B_\textrm{MC}\), \(B_\textrm{ANN}\), \(B^D_\textrm{ANN}(\mathrm WIC(2nd))\), and \(B^D_\textrm{ANN}(\mathrm BS)\) represent the barrier option price calculated by Monte Carlo (MC) simulation, artificial neural network (ANN) through direct mapping \(\mathcal {M}\), ANN with WIC correction, and ANN using BS model as quasi-process correction, respectively. The x-axis shows the difference in barrier option prices between MC results and those obtained by an ANN using three methods. The y-axis indicates how often each difference occurs. The left- and right-hand side panels indicate N = 21,000 (M = 1000) and N = 420,000 (M = 20,000), respectively. The Case E parameters in Table 4 are used

Figures 7 and 8 indicate that the neural network of \(B^D_\textrm{ANN}(\mathrm WIC)\) converges most quickly. \(B^D_\textrm{ANN}(\mathrm WIC)\) has already converged for the most part with the learning data of N = 21,000 (M = 1000), whereas the estimation error of \(B^D_\textrm{ANN}(\mathrm BS)\) is noticeable at the right tail of the distribution, with errors exceeding \(-\)0.5. Even with a second-order WIC approximation, the convergence is faster than with the BS model. Conversely, \(B_\textrm{ANN}\) has not converged even with N = 420,000 (M = 20,000) and is insufficient for practical use.

Notably, however, the estimated results of \(B^D_\textrm{ANN}(\mathrm WIC)\) and \(B^D_\textrm{ANN}(\mathrm BS)\) do not differ as much as in the call option cases. The reduction to the second-order approximation has a significant impact on the accuracy of the model. At the same time, up-and-in barrier options have additional conditions built in, which severely limit the downside compared with their equivalent vanilla counterparts. Hence, even if we use an approximation based on the Black–Scholes model as an alternative to the Wiener–Itô chaos approximation, we can expect sufficient convergence if we keep N = 105,000–420,000.

Using the Case F parameters, Figs. 9 and 10 show the comparisons and the frequency histograms of the differences, respectively, between the ANN predictions and MC results of the up-and-in barrier option prices. In Fig. 9, the upper and lower panels show \(B_\textrm{ANN}\) vs. \(B_\textrm{MC}\) and \(B^D_\textrm{ANN}(\mathrm BS)\) vs. \(B_\textrm{MC}\), respectively, while the left- and right-hand side panels indicate N = 105,000 (M = 5000) and N = 420,000 (M = 20,000), respectively. In Fig. 10, the upper and lower panels indicate the differences between \(B_\textrm{ANN}\) and \(B_\textrm{MC}\) and between \(B^D_\textrm{ANN}(\mathrm BS)\) and \(B_\textrm{MC}\), respectively, while the left- and right-hand side panels indicate N = 105,000 (M = 5000) and N = 420,000 (M = 20,000), respectively.

Fig. 9

Comparisons between artificial neural network (ANN) prediction and Monte Carlo (MC) results of the up-and-in barrier option prices. The upper and lower panels show \(B_\textrm{ANN}\) vs. \(B_\textrm{MC}\) and \(B^D_\textrm{ANN}(\mathrm BS)\) vs. \(B_\textrm{MC}\), respectively, while the left- and right-hand side panels indicate N = 105,000 (M = 5000) and N = 420,000 (M = 20,000), respectively. Here, \(B_\textrm{MC}\), \(B_\textrm{ANN}\), and \(B^D_\textrm{ANN}(\mathrm BS)\) represent the barrier option price calculated by Monte Carlo (MC) simulation, artificial neural network (ANN) through direct mapping \(\mathcal {M}\), and ANN using BS model as quasi-process correction, respectively. The parameters are set to Case F in Table 4

Fig. 10

Frequency histograms of the differences between \(B_\textrm{ANN}\) and \(B_\textrm{MC}\) (upper) and \(B^D_\textrm{ANN}(\mathrm BS)\) and \(B_\textrm{MC}\) (lower panels). The x-axis shows the difference in barrier option prices between Monte Carlo (MC) results and those obtained by an artificial neural network (ANN) using two methods. Here, \(B_\textrm{MC}\), \(B_\textrm{ANN}\), and \(B^D_\textrm{ANN}(\mathrm BS)\) represent the barrier option price calculated by MC simulation, ANN through direct mapping \(\mathcal {M}\), and ANN using BS model as quasi-process correction, respectively. The y-axis indicates how often each difference occurs. The left- and right-hand side panels indicate N = 105,000 (M = 5000) and N = 420,000 (M = 20,000), respectively. The parameters are set to Case F in Table 4

As shown in Figs. 9 and 10, the neural network used in \(B^D_\textrm{ANN}(\mathrm BS)\) has sufficient accuracy for practical use when N = 105,000–420,000, while \(B_\textrm{ANN}\) does not converge sufficiently. This suggests that the direct method needs more training data, which in turn requires more time-consuming numerical simulation and hence significantly longer computation. Furthermore, the time required for offline learning cannot be overlooked. Therefore, even when an accurate approximate solution is not available for a derivative written on an underlying asset that follows a complex stochastic process, employing an ANN with a simple model correction, such as the Black–Scholes (BS) model, enables efficient learning and prediction.

6 Discussion

Here, we discuss the limitations of the ANN with quasi-process correction, using call option prices under the LSVM (5.14) with the SABR stochastic volatility model

$$\begin{aligned} \left\{ \begin{array}{rcl} \displaystyle \frac{\textrm{d}\bar{S}_t}{\bar{S}_t} &{}=&{} v_t \bar{S}^{\beta -1}_t \textrm{d}W^{{\bar{S}}}_t, \\ \textrm{d}v_t &{}=&{} \nu v_t \textrm{d}W^{{\bar{v}}}_t, \ \ v_0 = \alpha \end{array} \right. \end{aligned}$$
(6.1)

for the base approximation, where \(W^{{\bar{S}}}_t\) and \(W^{{\bar{v}}}_t\) are two standard Brownian motions with correlation \(\textrm{d}W^{{\bar{S}}}_t \textrm{d}W^{{\bar{v}}}_t = {\bar{\rho }} \textrm{d}t\).

For the WIC approximation, the first three functions of the SABR model are explicitly computed as

$$\begin{aligned} \Sigma ^{{\bar{S}}}_T= & {} \frac{\alpha \left[ \left\{ \alpha \beta T (\alpha (\beta -1)+2 \nu {\bar{\rho }} )+2 \right\} ^3-8 \right] }{12 \beta (\alpha (\beta -1)+2 \nu {\bar{\rho }} )}, \\ q^{{\bar{S}}}_1(T)= & {} \frac{1}{32} \alpha ^3 T^2 (\alpha \beta +\nu {\bar{\rho }} ) (\alpha \beta T (\alpha (\beta -1)+2 \nu {\bar{\rho }} )+4)^2, \\ q^{{\bar{S}}}_2(T)= & {} \frac{1}{384} \alpha ^4 T^3 \left( \alpha ^2 \beta (2 \beta -1)+3 \alpha \beta \nu {\bar{\rho }} +\nu ^2 {\bar{\rho }} ^2\right) (\alpha \beta T (\alpha (\beta -1)+2 \nu {\bar{\rho }} )+4)^3 \end{aligned}$$

To obtain appropriate SABR parameters, \(\bar{\pmb {\xi }} = \{ \alpha , \beta , \nu , \bar{\rho } \}\), we use the same correlation as in the LSVM (5.14), \(\bar{\rho } = \rho \). The remaining parameters, \(\pmb {\theta } = \{ v_0, \beta , \nu \}\), are obtained by solving the problem

$$\begin{aligned} \bar{\pmb {\theta }} = \underset{\pmb {\theta } \in \Theta }{{\text {argmin}}} \left( (\Sigma ^S_t - \Sigma ^{{\bar{S}}}_t)^2 + (q_1^S(t) - q_1^{{\bar{S}}}(t))^2 + (q_2^S(t) - q_2^{{\bar{S}}}(t))^2 \right) \end{aligned}$$
(6.2)

within the range \(\Theta \): \(0.05< v_0 < 2\), \(0.55< \beta < 0.998\), and \(0.002< \nu < 0.8\).
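The fit (6.2) is a three-equations-in-three-unknowns problem over the SABR functionals displayed above, and can be sketched with a damped Gauss-Newton iteration. The target values below are hypothetical stand-ins for the LSVM functionals \(\Sigma ^S_t\), \(q_1^S(t)\), and \(q_2^S(t)\) (here generated from known SABR parameters so the fit can be verified); the solver itself is our own illustrative choice, not the paper's.

```python
import numpy as np

def sabr_funcs(T, alpha, beta, nu, rho):
    """The three SABR functionals (Sigma, q1, q2) displayed above."""
    c = alpha * (beta - 1.0) + 2.0 * nu * rho
    sig = alpha * ((alpha * beta * T * c + 2.0) ** 3 - 8.0) / (12.0 * beta * c)
    q1 = alpha**3 * T**2 * (alpha * beta + nu * rho) \
         * (alpha * beta * T * c + 4.0) ** 2 / 32.0
    q2 = alpha**4 * T**3 \
         * (alpha**2 * beta * (2.0 * beta - 1.0)
            + 3.0 * alpha * beta * nu * rho + nu**2 * rho**2) \
         * (alpha * beta * T * c + 4.0) ** 3 / 384.0
    return np.array([sig, q1, q2])

def replicate(target, T, rho_bar, p0, iters=60, h=1e-7):
    """Damped Gauss-Newton fit of (alpha, beta, nu) to the three functionals."""
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        r = sabr_funcs(T, *p, rho_bar) - target
        if np.dot(r, r) < 1e-24:
            break
        J = np.empty((3, 3))
        for j in range(3):                      # forward-difference Jacobian
            dp = np.zeros(3); dp[j] = h
            J[:, j] = (sabr_funcs(T, *(p + dp), rho_bar)
                       - sabr_funcs(T, *p, rho_bar)) / h
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        t = 1.0
        while t > 1e-6:                         # backtrack until residual decreases
            r_new = sabr_funcs(T, *(p + t * step), rho_bar) - target
            if np.dot(r_new, r_new) < np.dot(r, r):
                break
            t *= 0.5
        p = p + t * step
    return p

T, rho_bar = 1.0, -0.3
true_p = (0.4, 0.8, 0.3)                        # hypothetical "LSVM-implied" values
target = sabr_funcs(T, *true_p, rho_bar)
fit = replicate(target, T, rho_bar, p0=[0.45, 0.75, 0.25])
```

In practice the search would additionally be constrained to the box \(\Theta \) stated above, e.g. by clipping the iterate after each step.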

Figure 11 shows frequency histograms of the differences between \(\sigma _\textrm{ANN}\) and \(\sigma _\textrm{MC}\) (upper-left panel), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\) and \(\sigma _\textrm{MC}\) (upper-right panel), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\) and \(\sigma _\textrm{MC}\) (lower-left panel), and \(\sigma ^D_\textrm{ANN}(\mathrm SABR)\) and \(\sigma _\textrm{MC}\) (lower-right panel). The parameters are generated by uniformly distributed random vectors within the range of Case E given in Table 4. Here, we generate N = 210,000 (i.e., M = 10,000) sets of \(\xi \) and remove 714 data sets (i.e., \(N'\) = 209,286) following the condition (5.17). Here, \(\sigma ^D_\textrm{ANN}(\mathrm SABR)\) represents the implied volatilities calculated by the ANN using the SABR model as quasi-process correction, and \(\sigma _\textrm{MC}\) is computed using the Euler–Maruyama scheme with W = 500,000 trials and 100 simulation time steps.

From Fig. 11, we can see that the differences between \(\sigma ^D_\textrm{ANN}(\mathrm SABR)\) and \(\sigma _\textrm{MC}\) are distributed nearer 0 than the differences between \(\sigma _\textrm{ANN}\) and \(\sigma _\textrm{MC}\). Therefore, the former approximation is generally more efficient than the latter. However, it is also evident that the approximation through the SABR model exhibits a relatively large error. This discrepancy arises because the SABR model, unlike the LSVM (5.14), cannot accommodate negative values. When attempting to replicate the LSVM using the SABR model, the latter tends to adopt values near 0 instead of negative values, inflating the probability density near 0.

Figure 12 shows the cumulative distribution functions of the LSVM model, the replicated SABR model, and the replicated DD model. Here, we use the low-bias simulation scheme for the SABR model proposed by Chen et al. (2012), which introduced an efficient algorithm to simulate the squared Bessel process with an absorbing boundary at zero, whereas the Euler–Maruyama scheme is used to compute the underlying asset price in the DD model. To compute the cumulative distribution function (CDF), we run 1,000,000 MC trials with 300 simulation steps. The parameters of the original LSVM model, \(\pmb {\xi }^M = \{ \epsilon ^*, \beta ^* \}\), the replicated DD model, \(\bar{\pmb {\xi }}^M = \{ \epsilon ^*, \beta ^* \}\) in (3.26), and the replicated SABR model, \(\bar{\pmb {\theta }}\) in (6.2), are listed in Table 5.

In this case, the discrepancy between the distributions of the original LSVM and the replicated SABR processes is relatively large, which can hamper the estimation rather than improve on directly estimating the option price with a neural network. This observation is consistent with the findings presented in Section 8.5 of Funahashi (2021b). In summary, considering the distribution shape, it is not advisable to employ the SABR model for replicating the LSVM model.

Thus, when using an NN based on the approximation of a quasi-process, it is crucial to select the probability distribution of the underlying asset carefully, whereas NNs utilizing asymptotic methods are free of these concerns because they directly use the original distribution. This represents a significant advantage of the latter approach.

Table 5 Parameters of the original LSVM model, replicated DD model, and replicated SABR model used to compute the CDFs in Fig. 12
Fig. 11

Frequency histograms of the differences between \(\sigma _\textrm{ANN}\) and \(\sigma _\textrm{MC}\) (upper-left panel), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\) and \(\sigma _\textrm{MC}\) (upper-right panel), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\) and \(\sigma _\textrm{MC}\) (lower-left panel), and \(\sigma ^D_\textrm{ANN}(\mathrm SABR)\) and \(\sigma _\textrm{MC}\) (lower-right panel). Here, \(\sigma _\textrm{MC}\), \(\sigma _\textrm{ANN}\), \(\sigma ^D_\textrm{ANN}(\mathrm WIC)\), \(\sigma ^D_\textrm{ANN}(\mathrm DD)\), and \(\sigma ^D_\textrm{ANN}(\mathrm SABR)\) represent the implied volatilities calculated by Monte Carlo (MC) simulation, an artificial neural network (ANN) through direct mapping \(\mathcal {M}\), an ANN with WIC correction, an ANN using the DD model as quasi-process correction, and an ANN using the SABR model as quasi-process correction, respectively. The x-axis shows the difference in implied volatilities between the MC results and those obtained by an ANN using the four methods. The y-axis indicates how often each difference occurs. The test data, that is, \(20\%\) of the N = 210,000 (M = 10,000) samples, are used. The parameter ranges are set to Case E in Table 4

Fig. 12

Cumulative distribution functions (CDF) of the LSVM model, the replicated SABR model, and the replicated DD model with the model parameters listed in Table 5

7 Conclusion

This study introduces two methods for efficiently learning derivative prices with neural networks. The first is to learn the difference between the derivative price and its asymptotic expansion; the second is to learn the difference between the prices of derivatives written on two different underlying asset prices, where one underlying follows the target complex stochastic process and the other follows a relatively simple stochastic process that admits a closed-form solution for the target derivative price. The former method has the advantage that it can be much more efficient than the latter, especially when an accurate approximate solution is available. The latter method is an alternative valuation method when no efficient approximate solution for the derivative value exists, provided one can freely determine the model parameters of the quasi-process that approximates the underlying asset process. Although the latter method requires more training data than the former, we demonstrate that it remains significantly more efficient than directly learning the derivative price with a neural network. Even if a relatively simple quasi-process, such as the Black-Scholes model, is employed, the learning and estimation efficiency are overwhelmingly superior to those of the direct method.
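The mechanics of the first method can be sketched as follows: learn only the residual between the target price and a closed-form base approximation \(C_\textrm{App}\), then add \(C_\textrm{App}\) back at evaluation time. In this minimal sketch, a low-degree polynomial regression stands in for the neural network, a Black-Scholes price with a perturbed volatility plays the role of the target model, and the Black-Scholes price itself plays the role of \(C_\textrm{App}\); these are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from math import erf, exp, log, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, sigma, r=0.0):
    # Black-Scholes call price; used both as the stand-in "target" model
    # and, with a different volatility, as the base approximation C_App
    d1 = (log(S / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

K = np.linspace(0.5, 1.5, 101)
target = np.array([bs_call(1.0, k, 1.0, 0.25) for k in K])  # "true" prices
c_app = np.array([bs_call(1.0, k, 1.0, 0.20) for k in K])   # closed-form C_App

# Residual learning: fit only target - C_App, then add C_App back at
# evaluation time. The polynomial regression stands in for the ANN.
deg = 5
resid_coef = np.polyfit(K, target - c_app, deg)
hybrid = np.polyval(resid_coef, K) + c_app

hybrid_err = np.max(np.abs(hybrid - target))  # error of the corrected price
app_err = np.max(np.abs(c_app - target))      # error of C_App alone
print(hybrid_err, app_err)
```

The practical gain reported in the paper comes from the residual being small and smooth, so the network (here, the polynomial) needs far less training data than a direct fit of the price surface.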

However, as shown in Section 6 using the SABR model, the latter method may have the same or even lower approximation accuracy than direct derivative-price learning if the quasi-process of the underlying asset price is not selected appropriately. Therefore, the key to the latter method is to determine appropriate model parameters that mimic the original underlying asset process. An important contribution of this study is a unified replication strategy for determining the model parameters of quasi-processes from the original underlying asset processes. Moreover, we showed that this approach is free from contract parameters, including strikes; hence, it fits our purpose. In summary, the two methods introduced in this study provide a more efficient way to learn derivative prices with neural networks. This is especially useful for stochastic volatility models and other cases where analytic solutions do not exist or are computationally expensive to obtain.

In practice, we first apply a general approximation method, such as the singular perturbation method or an asymptotic expansion, to the price of the derivative. If an accurate approximation is available, we use the first method (ANN with asymptotic correction). Conversely, if no approximate solution exists, or its accuracy is poor, or its computation is time-consuming, we calculate the difference between the target derivative price and the corresponding price under the quasi-process obtained by the replication method, that is, the second method (ANN with quasi-process correction). The proposed methods not only reduce the amount of training data required for the neural network offline but also significantly improve the accuracy of the online estimation used in daily trading.

Although this study only examines derivative-price estimation by Monte Carlo simulation, the proposed methods can be combined with other computationally expensive numerical techniques, such as partial differential equation (PDE) solvers, finite difference methods (FDMs), numerical integration, or approximation methods. Therefore, the approach is generally effective for problems whose solutions can be computed but take too long to be used in daily trading.

Our method also works in more general settings, including multi-dimensional diffusions. We can then consider the valuation of financial products such as basket options and spread options using the approximation formulas derived in Funahashi and Kijima (2014) for \(C_\textrm{App}\). Another possible extension is the valuation of American options. Liang et al. (2021) discuss the application of deep learning methods to the valuation of early-exercisable derivatives such as American and Bermudan options, a topic frequently discussed in the literature. Since the Fourier cosine expansion (COS) method, see, for example, Fang and Oosterlee (2009, 2011), is known to approximate American option prices well, it is the leading candidate for \(C_\textrm{App}\) to obtain more accurate and stable ANN training. Moreover, building on the base approximation \(C_\textrm{App}\), we can apply the quasi-process correction with a simpler model to train the ANN to learn the prices of American and Bermudan options under a complex model. A comparison between the method of Liang et al. (2021) and ours with this approach is left for future study.