Introduction

Multi-asset option pricing is an important topic, which has been gaining in prominence as the availability of such options has increased. Pricing these options presents additional challenges as it requires one to jointly model an entire basket of financial assets. Realistic models for multivariate financial returns tend to be quite complicated and depend on many parameters, which can be difficult to fit. Even worse, the number of parameters typically grows very quickly as the dimension increases, a phenomenon known as the “curse of dimensionality.” In this paper, we introduce a realistic model for multivariate financial returns, which fits the data well and where the number of parameters grows only linearly as the dimension increases. We give an approach for finding risk-neutral measures and for using Monte-Carlo methods to price multi-asset options. As an illustration, we apply the model to price multi-asset options in two, three, and four dimensions. Detailed goodness-of-fit methods show that this model fits the data very well.

Our model is based on the class of multivariate tempered stable (TS) distributions, which are obtained by modifying the tails of infinite variance stable distributions to make them lighter. This leads to tails that are more realistic than the extremely light tails of Brownian motion or the extremely heavy tails of stable distributions. Further, they allow for skewness and excess kurtosis. As such, they satisfy a number of well-known stylized facts about financial returns (Cont and Tankov 2004). Theoretical explanations for how these distributions arise in financial applications can be found in Grabchak and Samorodnitsky (2010) or Grabchak and Molchanov (2015). In the univariate case, empirical studies showing that they do a good job fitting financial returns can be found in, e.g., Carr et al. (2002), Fallahgoul and Loeper (2021), and the references therein. Similar results in the bivariate case can be found in Xia and Grabchak (2022). However, that methodology suffers from the curse of dimensionality and appears to be intractable in higher dimensions. This is a common issue in modeling with multivariate TS distributions and, hence, there has been little empirical work beyond the bivariate case. We remedy this by introducing a new model, which works well in higher dimensions. We call it the diagonal TS model. It is an extension of the diagonal model, which was originally introduced by Sharpe (1963) in the context of the normal distribution. We consider it in the context of TS distributions.

The rest of this paper is organized as follows. A brief review of the literature is given in "Literature review" Section. In "Multivariate tempered stable distributions" Section we recall the definition of the class of multivariate TS distributions and give some properties. In "Risk-neutral measures and general methodology" Section we discuss the problem of finding equivalent risk-neutral measures and give approaches for Monte-Carlo pricing of multi-asset options. In "The diagonal model" Section we formally introduce the diagonal TS model and discuss our methodology for data analysis, simulation, and parameter estimation under both the physical and the risk-neutral measures. In "Data analysis" Section we apply our methodology to fit the diagonal TS model to financial datasets in two, three, and four dimensions. Some conclusions and directions for future work are given in "Conclusion" Section.

Before proceeding, we introduce some notation. We write \({\mathbb {R}}^d\) to denote the space of d-dimensional column vectors and, for \(x\in {\mathbb {R}}^d\), we write \(x^\top\) to denote the transpose of x. We equip \({\mathbb {R}}^d\) with the usual inner product \(\langle \cdot ,\cdot \rangle\) and the usual norm \(|\cdot |\). Thus, for \(x,y\in {\mathbb {R}}^d\), we have \(\langle x,y\rangle =x^\top y\) and \(|x| = \sqrt{\langle x,x\rangle }\). We write \({\mathbb {S}}^{d-1}=\left\{ s\in {\mathbb {R}}^{d}: |s|=1\right\}\) to denote the unit sphere in \({\mathbb {R}}^{d}\). We write \(\mathfrak B({\mathbb {R}}^{d})\) and \(\mathfrak B({\mathbb {S}}^{d-1})\) to denote the Borel sets on \({\mathbb {R}}^{d}\) and \({\mathbb {S}}^{d-1}\), respectively. If \(\mu\) is a probability distribution on \({\mathbb {R}}^d\), we write \(X\sim \mu\) to denote that X is a d-dimensional random vector with distribution \(\mu\) and we write \(X_1,X_2,\dots ,X_n{\mathop {\sim }\limits ^{\textrm{iid}}}\mu\) to denote that \(X_1,X_2,\dots ,X_n\) are independent and identically distributed (iid) d-dimensional random vectors with common distribution \(\mu\). We write \({\mathbb {C}}\) to denote the set of complex numbers. For \(z\in {\mathbb {C}}\) we write \(\Re z\) to denote the real part and \(\Im z\) to denote the imaginary part. We write \(i=\sqrt{-1}\) to denote the imaginary unit. For \(a\in {\mathbb {R}}\), we write \((a)^+\) to denote \(\max \{a,0\}\).

Literature review

TS distributions are a class of models that are obtained by modifying the tails of stable distributions to make them lighter, which makes them more realistic for many applications. Perhaps the earliest models of this type are Tweedie distributions, which were introduced in Tweedie (1984) and then rediscovered in Koponen (1995). A general class of TS distributions was introduced in Rosiński (2007). That paper was also the first to introduce multivariate TS distributions. See the monograph Grabchak (2016) and the references therein for additional generalizations and many theoretical results in both the univariate and the multivariate settings.

The modeling of univariate financial returns with TS distributions goes back, at least, to Boyarchenko and Levendorskiĭ (2000) and Carr et al. (2002). Since then, a vast literature has developed. Empirical evidence shows that TS distributions do a good job fitting univariate returns, see, e.g., Carr et al. (2002), Fallahgoul and Loeper (2021), and the references therein. Theoretical explanations for how these distributions arise in financial applications can be found in Grabchak and Samorodnitsky (2010) and Grabchak and Molchanov (2015). There are many papers dealing with pricing single-asset options using TS distributions, see, e.g., Poirot and Tankov (2006), Černý and Kyriakou (2011), Rachev et al. (2011), Li et al. (2012), Küchle and Tappe (2014), or Section 7.1 in Grabchak (2016).

Much less work has been done in the multivariate setting. The difficulty stems from the fact that multivariate TS distributions depend on the spectral measure, which is an infinite dimensional parameter. As far as we know, the only paper to fit multivariate TS distributions to financial returns and check the goodness-of-fit is Xia and Grabchak (2022), which focused on the bivariate case. However, that approach suffers from the curse of dimensionality and appears to be intractable in higher dimensions. Multi-asset option pricing with TS and related distributions was considered in Linders and Stassen (2016), Guo et al. (2018), Fallahgoul et al. (2019), Kim et al. (2023), and Wu et al. (2023). These papers do not perform goodness-of-fit testing and, with one exception, the empirical work is limited to the bivariate case. The exception is Linders and Stassen (2016), where higher dimensional situations are considered. However, there dependence is modeled using a single correlation term, which is assumed to be constant over all pairs of assets. While this may be reasonable in some situations, it is not very realistic.

Perhaps the most famous method for pricing multi-asset options is multivariate Black–Scholes, see, e.g., Carmona and Durrleman (2005) or Björk (2009) for details. Here, pricing is easily implemented for any number of assets using Monte-Carlo methods. However, the approach depends on the assumption that returns jointly follow a multivariate normal distribution, which rarely holds in practice, see Cont and Tankov (2004). There are many other methods in the literature, see, e.g., Alexander and Venkatramanan (2012), Ruijter and Oosterlee (2012), Meng and Ding (2013), Chen and Wang (2020), and the references therein. We note that most of these focus on the bivariate case.

The model presented in this paper fills an important gap. First, it is realistic and we provide detailed goodness-of-fit results to show that it fits the data well. Second, the number of parameters grows only linearly with the dimension, which makes it tractable in more than two dimensions.

Multivariate tempered stable distributions

The characteristic function of a multivariate TS distribution \(\mu\) is given in Xia and Grabchak (2022). It can be written, for any \(z\in {\mathbb {R}}^d\), in the form

$$\begin{aligned} \hat{\mu }(z)= & {} \exp \left[ i\langle \gamma ,z \rangle +\int _{{\mathbb {S}}^{d-1}}\int _{0}^{\infty }(e^{i\langle s,z\rangle x}-1 )\dfrac{e^{-b(s) x}}{x^{1+\alpha }}\mathrm dx \sigma (\mathrm ds) \right] \nonumber \\= & {} \exp \left[ i\langle \gamma ,z \rangle + \Gamma (-\alpha )\int _{{\mathbb {S}}^{d-1}}\big ((b(s)-i\langle s,z\rangle )^{\alpha }-b^{\alpha }(s) \big )\sigma (\mathrm ds) \right] , \end{aligned}$$
(1)

where \(\alpha \in (0,1)\), \(\gamma \in {\mathbb {R}}^d\), \(b:{\mathbb {S}}^{d-1}\mapsto (0,\infty )\) is a Borel function, and \(\sigma\) is a finite Borel measure on \({\mathbb {S}}^{d-1}\). We denote this distribution by \(\mu =\textrm{TS}_{\alpha }(\sigma ,b,\gamma )\). Here \(\alpha\) is the index of stability, \(\gamma\) is the drift, b is the tempering exponent, and \(\sigma\) is the spectral measure. The function \(q(x,s) = e^{-b(s) x}\), \(x>0\), \(s\in {\mathbb {S}}^{d-1}\) is sometimes called the tempering function. So long as the support of the spectral measure \(\sigma\) contains at least d linearly independent vectors, the distribution \(\mu\) is absolutely continuous. However, with few exceptions, there is no closed form for its joint probability density function (pdf) or its joint cumulative distribution function (cdf). For this reason, when working with TS distributions, we prefer methods based on characteristic functions or on simulation.

Every TS distribution \(\mu =\textrm{TS}_{\alpha }(\sigma ,b,\gamma )\) is infinitely divisible with Lévy measure

$$\begin{aligned} L(B) = \int _{{\mathbb {S}}^{d-1}} \int _0^\infty 1_B(xs)\dfrac{e^{-b(s) x}}{x^{1+\alpha }}\mathrm dx \sigma (\mathrm ds), \ \ B\in \mathfrak B({\mathbb {R}}^d). \end{aligned}$$

In the limiting case, where \(b(s)=0\) for each \(s\in {\mathbb {S}}^{d-1}\), L reduces to the Lévy measure of an infinite variance stable distribution with index of stability \(\alpha\) and spectral measure \(\sigma\), see Samorodnitsky and Taqqu (1994). In this sense, the tempering function q tempers (i.e., modifies) the tails of the Lévy measure to make them lighter. This leads to a similar tempering of the tails of the corresponding distribution \(\mu\) and justifies calling b the tempering exponent and q the tempering function.

We note that one can consider TS distributions with any \(\alpha \in (-\infty ,2]\), see Cont and Tankov (2004) or Grabchak (2016). We focus on the case \(\alpha \in (0,1)\) for two reasons. First, when \(\alpha \le 0\), TS distributions can no longer be interpreted as modifications of stable distributions to make their tails lighter, which is a primary motivation for their use in finance, see Grabchak and Samorodnitsky (2010) or Grabchak and Molchanov (2015). Second, we exclude the case where \(\alpha \in [1,2)\) as there are no known exact simulation methods for this range, even in one dimension. For a discussion of approximate simulation methods, see Kawai and Masuda (2011). The focus on \(\alpha \in (0,1)\) is common for financial applications even in the univariate case, see, e.g., Küchle and Tappe (2014).

Associated with the distribution \(\mu =\textrm{TS}_{\alpha }(\sigma ,b,\gamma )\) is a TS Lévy process \(\{X_t:t\ge 0\}\), where \(X_1\sim \mu\). In this case, \(\mu\) uniquely determines the distribution of the entire process. In fact, Theorem 7.10 in Sato (1999) tells us that, if \({\hat{\mu }}\) is the characteristic function of \(X_1\), then for any \(t\ge 0\), the characteristic function of \(X_t\) is \(\left( {\hat{\mu }}(z)\right) ^t\), \(z\in {\mathbb {R}}^d\). Combining this with (1) shows that \(X_t\sim \textrm{TS}_{\alpha }(t\sigma ,b,t\gamma )\). See Sato (1999) or Cont and Tankov (2004) for more on infinitely divisible distributions and Lévy processes. From (1), we see that TS distributions do not have a diffusion component, which means that the corresponding Lévy processes are pure jump processes. See Cont and Tankov (2004) for a discussion on why jump processes tend to provide good models for financial returns.
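The scaling \(X_t\sim \textrm{TS}_{\alpha }(t\sigma ,b,t\gamma )\) stated above can be checked in one line. As a worked step, raising the second form of (1) to the power t gives

$$\begin{aligned} \left( \hat{\mu }(z)\right) ^t = \exp \left[ i\langle t\gamma ,z \rangle + \Gamma (-\alpha )\int _{{\mathbb {S}}^{d-1}}\big ((b(s)-i\langle s,z\rangle )^{\alpha }-b^{\alpha }(s) \big )\, t\sigma (\mathrm ds) \right] , \ \ z\in {\mathbb {R}}^d, \end{aligned}$$

which is (1) with \(\gamma\) and \(\sigma\) replaced by \(t\gamma\) and \(t\sigma\), while the tempering exponent b is unchanged.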

By differentiating the characteristic function, it is readily checked that for \(X\sim \textrm{TS}_{\alpha }(\sigma ,b,\gamma )\) the mean vector is

$$\begin{aligned} \mathrm E[X]= & {} \gamma +\Gamma (1-\alpha ) \int _{{\mathbb {S}}^{d-1}}(b(s))^{\alpha -1} s\sigma (\mathrm ds) \end{aligned}$$
(2)

and the covariance matrix is

$$\begin{aligned} \mathrm{Cov}(X)= & {} \Gamma (2-\alpha ) \int _{{\mathbb {S}}^{d-1}}(b(s))^{\alpha -2} ss^\top \sigma (\mathrm ds). \end{aligned}$$
(3)
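When the spectral measure is a finite sum of point masses, the integrals in (2) and (3) become finite sums and are easy to evaluate. The following is a minimal sketch under that assumption; the function name `ts_mean_cov` and the array layout (one unit vector per row) are our own illustrative choices, not notation from the paper.

```python
import math
import numpy as np

def ts_mean_cov(alpha, atoms, weights, b_vals, gamma):
    """Mean (2) and covariance (3) when sigma = sum_j weights[j] * delta_{atoms[j]}.

    atoms: (k, d) array whose rows are unit vectors s_j (assumed layout);
    weights: a_j = sigma({s_j}); b_vals: b(s_j); gamma: drift vector.
    """
    S = np.asarray(atoms, dtype=float)
    a = np.asarray(weights, dtype=float)
    b = np.asarray(b_vals, dtype=float)
    g = np.asarray(gamma, dtype=float)
    # (2): gamma + Gamma(1 - alpha) * sum_j a_j b_j^(alpha - 1) s_j
    mean = g + math.gamma(1 - alpha) * (a * b ** (alpha - 1)) @ S
    # (3): Gamma(2 - alpha) * sum_j a_j b_j^(alpha - 2) s_j s_j^T
    cov = math.gamma(2 - alpha) * (S.T * (a * b ** (alpha - 2))) @ S
    return mean, cov

# Illustrative parameter values (not taken from any dataset)
alpha = 0.7
atoms = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
atoms = atoms / np.linalg.norm(atoms, axis=1, keepdims=True)
mean, cov = ts_mean_cov(alpha, atoms, [0.5, 0.4, 0.3], [1.0, 1.5, 2.0], [0.01, -0.02])
print(mean)
print(cov)
```

The off-diagonal entries of the covariance matrix come only from atoms that are not aligned with a coordinate axis, which is how the spectral measure encodes dependence.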

While there is no simple definition for multivariate skewness and kurtosis (Jammalamadaka et al. 2021), the skewness of the ith component of X is given by

$$\begin{aligned} \frac{\Gamma (3-\alpha ) \int _{{\mathbb {S}}^{d-1}}(b(s))^{\alpha -3} s_i^3 \sigma (\mathrm ds)}{\left( \Gamma (2-\alpha ) \int _{{\mathbb {S}}^{d-1}}(b(s))^{\alpha -2} s_i^2 \sigma (\mathrm ds)\right) ^{3/2}} \end{aligned}$$

and the excess kurtosis of the ith component is given by

$$\begin{aligned} \frac{\Gamma (4-\alpha ) \int _{{\mathbb {S}}^{d-1}}(b(s))^{\alpha -4} s_i^4 \sigma (\mathrm ds)}{\left( \Gamma (2-\alpha ) \int _{{\mathbb {S}}^{d-1}}(b(s))^{\alpha -2} s_i^2 \sigma (\mathrm ds)\right) ^{2}}, \end{aligned}$$

where \(s_i\) is the ith component of vector s. We now give a result about linear transformations of TS distributions, which we will use when standardizing data.

Theorem 1

Let A be a positive definite \(d\times d\)-dimensional matrix, fix \(c\in {\mathbb {R}}^d\), and let \(X\sim \textrm{TS}_{\alpha }(\sigma ,b,\gamma )\). If \(Y=A X+ c\), then \(Y\sim \textrm{TS}_{\alpha }(\sigma ',b', \gamma ')\), where \(\gamma ' = A \gamma +c\),

$$\begin{aligned} \sigma ' (B) = \int _{{\mathbb {S}}^{d-1}} 1_{B}\left( \frac{A s}{|A s|}\right) |A s|^\alpha \sigma (\mathrm ds), \ \ \ B\in \mathfrak B({\mathbb {S}}^{d-1}), \end{aligned}$$

and

$$\begin{aligned} b'(s) = |A^{-1} s| b\left( A^{-1}s/|A^{-1}s|\right) , \ \ \ s\in {\mathbb {S}}^{d-1}. \end{aligned}$$

Proof

The characteristic function of Y is given, for \(z\in {\mathbb {R}}^d\), by

$$\begin{aligned} \mathrm E[e^{i\left\langle z, Y\right\rangle }]&= \exp \left[ i \left\langle c, z \right\rangle + i \left\langle \gamma , A z \right\rangle \right. \\&\left. \qquad + \int _{{\mathbb {S}}^{d-1}} \int _0^\infty \left( e^{ix\left\langle A z, s \right\rangle }-1 \right) x^{-1-\alpha }e^{-xb(s)}\mathrm dx \sigma (\mathrm ds) \right] \\&= \exp \left[ i \left\langle A \gamma +c,z \right\rangle \right. \\&\left. \qquad + \int _{{\mathbb {S}}^{d-1}} \int _0^\infty \left( e^{iu\left\langle z, \frac{A s}{|A s|} \right\rangle }-1 \right) u^{-1-\alpha }e^{-ub(s)/|A s|} \mathrm du \ |A s|^\alpha \sigma (\mathrm ds) \right] \\&=\exp \left[ \int _{{\mathbb {S}}^{d-1}} \int _0^\infty \left( e^{iu\left\langle z, s \right\rangle }-1 \right) u^{-1-\alpha }e^{-ub'(s)} \mathrm du \sigma '(\mathrm ds) \right. \\&\quad \left. +i \left\langle A \gamma +c ,z \right\rangle \right] , \end{aligned}$$

where we use the change of variables \(u=x|A s|\) and the facts that positive definite matrices are symmetric, invertible, and satisfy \(\left\langle y, Az\right\rangle =\left\langle Ay, z\right\rangle\) for any \(y,z\in {\mathbb {R}}^d\). \(\square\)
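For a spectral measure with finitely many atoms, Theorem 1 can be applied atom by atom: each atom \(s_j\) moves to \(As_j/|As_j|\), its mass is multiplied by \(|As_j|^\alpha\), and its tempering value is divided by \(|As_j|\). The sketch below illustrates this under the finite-support assumption and checks it against the covariance formula (3), since \(\mathrm{Cov}(AX+c)=A\,\mathrm{Cov}(X)A^\top\). The function names are hypothetical.

```python
import math
import numpy as np

def transform_finite_ts(alpha, atoms, weights, b_vals, gamma, A, c):
    """Parameters of Y = A X + c when sigma = sum_j weights[j] * delta_{atoms[j]}."""
    A = np.asarray(A, dtype=float)
    AS = np.asarray(atoms, dtype=float) @ A.T         # row j is A s_j
    norms = np.linalg.norm(AS, axis=1)                # |A s_j|
    new_atoms = AS / norms[:, None]                   # A s_j / |A s_j|
    new_weights = np.asarray(weights, dtype=float) * norms ** alpha
    new_b = np.asarray(b_vals, dtype=float) / norms   # b'(s') at the new atom s'
    new_gamma = A @ np.asarray(gamma, dtype=float) + np.asarray(c, dtype=float)
    return new_atoms, new_weights, new_b, new_gamma

def finite_ts_cov(alpha, atoms, weights, b_vals):
    """Covariance matrix (3) for a finitely supported spectral measure."""
    S = np.asarray(atoms, dtype=float)
    w = np.asarray(weights, dtype=float) * np.asarray(b_vals, dtype=float) ** (alpha - 2)
    return math.gamma(2 - alpha) * (S.T * w) @ S

# Illustrative values; A is symmetric positive definite
alpha = 0.7
atoms = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
atoms = atoms / np.linalg.norm(atoms, axis=1, keepdims=True)
weights, b_vals, gamma = [0.5, 0.4, 0.3], [1.0, 1.5, 2.0], [0.01, -0.02]
A = np.array([[2.0, 0.5], [0.5, 1.0]])
c = np.array([0.1, 0.2])

new_atoms, new_weights, new_b, new_gamma = transform_finite_ts(
    alpha, atoms, weights, b_vals, gamma, A, c)
cov_x = finite_ts_cov(alpha, atoms, weights, b_vals)
cov_y = finite_ts_cov(alpha, new_atoms, new_weights, new_b)
print(np.allclose(cov_y, A @ cov_x @ A.T))
```

Dividing \(b(s_j)\) by \(|As_j|\) agrees with the formula for \(b'\) in the theorem because \(|A^{-1}s'|=1/|As_j|\) when \(s'=As_j/|As_j|\).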

Simple tempered stable (STS) distributions are an important class of one-dimensional TS distributions that serve as the building blocks from which many other classes of TS distributions are built. A TS distribution \(\mu =\textrm{TS}_{\alpha }(\sigma ,b,\gamma )\) is an STS distribution when \(d=1\), \(\sigma (\{-1\})=0\), and \(\gamma =0\). Taking \(a=\sigma (\{1\})\ge 0\) and \(b=b(1)>0\), the characteristic function in (1) reduces to

$$\begin{aligned} \hat{\mu }(z)=\exp \left[ \int _{0}^{\infty }(e^{izx}-1 )\dfrac{ae^{-b x}}{x^{1+\alpha }}\mathrm dx \right] =\exp \left[ a\Gamma (-\alpha )\left( (b-iz)^{\alpha }-b^{\alpha }\right) \right] , \ \ \end{aligned}$$
(4)

for \(z\in {\mathbb {R}}\). We write \(\mu =\textrm{STS}_{\alpha }(a,b)\) in this case.
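The closed form on the right of (4) is straightforward to evaluate numerically. The sketch below is only an illustration: it codes that closed form and compares a finite-difference derivative at zero with the mean implied by (2) for \(d=1\), \(\sigma =a\delta _1\), and \(\gamma =0\). The helper names `sts_cf` and `sts_mean` are ours.

```python
import cmath
import math

def sts_cf(z, alpha, a, b):
    """Closed-form characteristic function of STS_alpha(a, b), right side of (4)."""
    # complex(b, -z) is b - i z; its real part is b > 0, so the principal power applies
    return cmath.exp(a * math.gamma(-alpha) * (complex(b, -z) ** alpha - b ** alpha))

def sts_mean(alpha, a, b):
    """Mean from (2) with d = 1, sigma = a * delta_1 and gamma = 0."""
    return a * math.gamma(1 - alpha) * b ** (alpha - 1)

# Illustrative parameter values
alpha, a, b = 0.6, 0.8, 1.5
h = 1e-5
# The derivative of the characteristic function at 0 equals i * E[X];
# approximate it with a central difference and divide by i.
numeric_mean = ((sts_cf(h, alpha, a, b) - sts_cf(-h, alpha, a, b)) / (2 * h) / 1j).real
print(numeric_mean, sts_mean(alpha, a, b))
```

Note that \(\Gamma (-\alpha )<0\) for \(\alpha \in (0,1)\), so the exponent has non-positive real part and the modulus of the result never exceeds one.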

The problem of simulation from STS distributions is well-studied. A simple rejection sampling algorithm is given, e.g., as Algorithm 0 in Kawai and Masuda (2011). A more efficient double rejection sampling algorithm is given in Devroye (2009). This was further optimized in Hofert (2011). This optimized version is implemented in the retstable function of the “copula” package for the statistical software R. This is the function that we use for all of the simulations in this paper.

The importance of STS distributions is motivated by the following result from Xia and Grabchak (2022), which says that, in any dimension d, every TS distribution whose spectral measure has finite support is the distribution of a linear combination of independent STS random variables.

Theorem 2

Let \(\mu =\textrm{TS}_{\alpha }(\sigma ,b,\gamma )\) such that there exist \(a_1,a_2,\dots ,a_k>0\) and \(s_1,s_2,\dots ,s_k\in {\mathbb {S}}^{d-1}\) with

$$\begin{aligned} \sigma =\sum ^{k}_{j=1}a_j\delta _{s_{j}}. \end{aligned}$$
(5)

If \(X_{1}, X_{2},\dots , X_{k}\) are independent random variables with \(X_{j} \sim \textrm{STS}_{\alpha }(a_{j},b_{j})\), where \(b_j=b(s_j)\), \({X}=(X_1,X_2,\dots ,X_k)^\top\), \(S=(s_1\ s_2\cdots s_k)\) is the \(d\times k\) matrix with columns \(s_1,s_2,\dots ,s_k\), and

$$\begin{aligned} Y=\gamma +X_{1}s_{1}+X_{2}s_{2}+\dots +X_{k}s_{k} = \gamma + SX, \end{aligned}$$
(6)

then \(Y\sim \mu\).

This theorem fully characterizes all TS distributions, where \(\sigma\) has finite support. In particular, it implies that every one-dimensional TS random variable \(X\sim \textrm{TS}_{\alpha }(\sigma ,b,\gamma )\) can be written in the form

$$\begin{aligned} X {\mathop {=}\limits ^{d}}\gamma +X_1-X_2, \end{aligned}$$
(7)

where \(X_1\sim \textrm{STS}_{\alpha }(\sigma (\{1\}),b(1))\) and \(X_2\sim \textrm{STS}_{\alpha }(\sigma (\{-1\}),b(-1))\) are independent.

Theorem 2 gives a simple method for simulating TS random variables with spectral measures that have finite support. In light of (6), we just need to simulate the appropriate STS random variables, which can be done as discussed just below (4). We note that even if \(\sigma\) has infinite support, this method can be used for simulation, although, in this case, it is only approximate. This is justified by the fact that every TS distribution can be approximated arbitrarily well by one where \(\sigma\) has finite support, see Xia and Grabchak (2022) for details.
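The simulation recipe based on (6) can be sketched as follows. This is a minimal illustration and not the optimized double rejection routine behind retstable: here each STS variate is drawn by plain rejection, proposing from a positive stable law (generated with Kanter's representation, which is our choice) and accepting with probability \(e^{-bS}\). All function names are hypothetical.

```python
import math
import numpy as np

def sample_sts(rng, n, alpha, a, b):
    """Draw n variates from STS_alpha(a, b) by rejection from a positive stable proposal."""
    # Scale so that the proposal has Laplace transform exp(a * Gamma(-alpha) * lam**alpha)
    scale = (a * math.gamma(1 - alpha) / alpha) ** (1 / alpha)
    out = np.empty(0)
    while out.size < n:
        m = 4 * n
        u = rng.uniform(0.0, math.pi, m)
        w = rng.exponential(1.0, m)
        # Kanter's representation of a positive stable variate with E[exp(-lam*S)] = exp(-lam**alpha)
        s = (np.sin(alpha * u) / np.sin(u) ** (1 / alpha)
             * (np.sin((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))
        s = scale * s
        # Exponential tempering: keep s with probability exp(-b * s)
        keep = rng.uniform(size=m) <= np.exp(-b * s)
        out = np.concatenate([out, s[keep]])
    return out[:n]

def sample_ts_finite(rng, n, alpha, atoms, weights, b_vals, gamma):
    """Rows are independent draws of Y = gamma + S X as in (6)."""
    S = np.asarray(atoms, dtype=float)                 # (k, d), row j is s_j
    X = np.column_stack([sample_sts(rng, n, alpha, a, b)
                         for a, b in zip(weights, b_vals)])   # (n, k)
    return np.asarray(gamma, dtype=float) + X @ S

# Illustrative parameter values
alpha = 0.5
atoms = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
atoms = atoms / np.linalg.norm(atoms, axis=1, keepdims=True)
weights, b_vals, gamma = [0.5, 0.4, 0.3], [1.0, 1.5, 2.0], [0.01, -0.02]

rng = np.random.default_rng(12345)
Y = sample_ts_finite(rng, 20000, alpha, atoms, weights, b_vals, gamma)
print(Y.mean(axis=0))   # compare with the mean vector in (2)
```

The acceptance probability of this simple scheme is \(\exp (-a\Gamma (1-\alpha )b^\alpha /\alpha )\), which can be very small for large a or b; that inefficiency is what the more refined algorithms cited above address.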

Risk-neutral measures and general methodology

Assume that the returns from a basket of assets jointly follow a multivariate TS distribution and that their evolution in time follows a TS Lévy process. Let \(r>0\) be the risk-free interest rate and, for \(i=1,2,\dots ,d\), let \(q_i\in [0,r)\) be the dividend rate for the ith asset. In order to price options on this basket, we must find an equivalent risk-neutral measure. We begin by carefully defining the underlying probability space.

Let \(\varOmega =D([0,\infty ),{\mathbb {R}}^d)\) be the space of càdlàg functions from \([0,\infty )\) into \({\mathbb {R}}^d\). For every \(t\in [0,\infty )\), define \(R_t:\varOmega \mapsto {\mathbb {R}}^d\) by \(R_t(\omega ) =\omega (t)\) for \(\omega \in \varOmega\). We call \(R=\{R_t:t\ge 0\}\) the canonical process. For \(t\ge 0\), we often write \({R_t}=(R_{1,t},R_{2,t},\dots ,R_{d,t})\). Let \({\mathscr {F}} = \sigma (R_t:t\ge 0)\), let \(\mathscr {F}_t=\bigcap _{s>t}\sigma (R_u:u\in [0,s])\) be the right-continuous natural filtration, and consider the space \((\varOmega ,{\mathscr {F}})\). For every TS distribution \(\mu\) on \({\mathbb {R}}^d\), there exists a probability measure \({\mathbb {P}}_\mu\) on \((\varOmega ,{\mathscr {F}})\) such that, under \({\mathbb {P}}_\mu\), R is a Lévy process with \(R_1\sim \mu\). For details see Section 33 in Sato (1999), Chapter 9 in Cont and Tankov (2004), or Section 4 in Rosiński (2007).

Let \(S_0=(S_{1,0},S_{2,0},\dots ,S_{d,0})\in {\mathbb {R}}^d\) be a (deterministic) vector of positive numbers and for \(t\ge 0\) set

$$\begin{aligned} {S_t }= (S_{1,t},S_{2,t},\dots ,S_{d,t}) = (S_{1,0} e^{R_{1,t}},S_{2,0} e^{R_{2,t}},\dots ,S_{d,0} e^{R_{d,t}}). \end{aligned}$$
(8)

Here \(S_0\) represents the (known) vector of prices at time 0 and, for \(t>0\), \(S_t\) and \(R_t\) represent, respectively, the (random) vectors of prices and (log) returns at time t.

Let \({\mathbb {P}}\) be a probability measure on \((\varOmega ,\mathscr {F})\) that governs the dynamics of the prices in the real world. This is the physical or market measure. We assume that, under \({\mathbb {P}}\), the process R is a TS Lévy process with \(R_{1}\sim \textrm{TS}_{\alpha }(\sigma ,b,\gamma )\). We can perform arbitrage-free option pricing so long as there exists a probability measure \({\mathbb {Q}}\) on \((\varOmega ,\mathscr {F})\) that is equivalent to \({\mathbb {P}}\) and satisfies

$$\begin{aligned} e^{-(r-q_i)t}S_{i,t}=\mathrm E_{{\mathbb {Q}}}[e^{-(r-q_i)u}S_{i,u}|\mathscr {F}_t], \ \ \ \ 0\le t\le u,\ i=1,2,\dots ,d, \end{aligned}$$
(9)

where \(\mathrm E_{{\mathbb {Q}}}\) is the expectation under \({\mathbb {Q}}\). In this case we call \({\mathbb {Q}}\) the equivalent risk-neutral or martingale measure.

In general, there may be many risk-neutral measures that are equivalent to \({\mathbb {P}}\). We look for those arising from multivariate Esscher transforms as these preserve the TS structure of the underlying process, i.e., the Esscher transform of a TS Lévy process is still a TS Lévy process, but with a different tempering exponent. Specifically, we consider the class of equivalent measures \({\mathbb {Q}}^{\eta }\), for \(\eta \in {\mathbb {R}}^{d}\), where the Radon-Nikodym derivative process is of the form

$$\begin{aligned} \frac{\mathrm d{\mathbb {Q}}^{\eta }}{\mathrm d{\mathbb {P}}}\bigg |_{\mathscr {F}_{t}}=\frac{e^{\left\langle \eta , R_{t}\right\rangle }}{\mathrm E_{{\mathbb {P}}}[e^{\left\langle \eta , R_{t}\right\rangle }]}, \ \ t\ge 0. \end{aligned}$$
(10)

For more on finding risk-neutral measures using multivariate Esscher transforms, see Gerber and Shiu (1994) or Tankov (2010). We now characterize when \({\mathbb {Q}}^{\eta }\) is risk-neutral. Recall that \(r>0\) is the risk-free interest rate and that, for \(i=1,2,\dots ,d\), \(q_i\in [0,r)\) is the dividend rate for the ith asset.

Theorem 3

Assume that, under \({\mathbb {P}}\), \(\{R_{t}: t\ge 0\}\) is a TS Lévy process with \(R_{1}\sim \textrm{TS}_{\alpha }(\sigma , b,\gamma )\) and let \(b_\eta (s) = b(s)-\left\langle s,\eta \right\rangle\). If \(\inf _{s\in {\mathbb {S}}^{d-1}}b_\eta (s)\ge 0\), then under \({\mathbb {Q}}^{\eta }\), \(\{R_{t}: t\ge 0\}\) is a TS Lévy process with \(R_{1}\sim \textrm{TS}_{\alpha }(\sigma , b_\eta ,\gamma )\). If \(\inf _{s\in {\mathbb {S}}^{d-1}}b_\eta (s) \ge 1\), then \({\mathbb {Q}}^{\eta }\) is risk-neutral if and only if

$$\begin{aligned} r-q_j=\gamma _{j}+\int _{{\mathbb {S}}^{d-1}}\Gamma (-\alpha ) [(b_\eta (s)-s_{j})^{\alpha }-b_\eta (s)^{\alpha }]\sigma (\mathrm ds),\ \ j=1,2,\cdots ,d, \end{aligned}$$
(11)

where \(s_j\) and \(\gamma _j\) are the jth components of s and \(\gamma\), respectively.

Proof

The fact that under \({\mathbb {Q}}^{\eta }\) we still have a Lévy process follows from Theorem 33.2 in Sato (1999). We now show that \(R_{1}\sim \textrm{TS}_{\alpha }(\sigma , b_\eta (s),\gamma )\) under \({\mathbb {Q}}^{\eta }\). Theorem 25.17 in Sato (1999) implies that the expectation in the denominator of (10) is finite when \(\inf _{s\in {\mathbb {S}}^{d-1}}b_\eta (s)\ge 0\) and that this expectation can be evaluated by formal substitution of \(z=-i\eta\) in (1). Similarly, for \(z\in {\mathbb {R}}^d\), formal substitution of \(z-i\eta\) in (1) gives

$$\begin{aligned} \mathrm E_{{\mathbb {Q}}^{\eta }}(e^{i\left\langle z,R_{t}\right\rangle })&=\mathrm E_{{\mathbb {P}}}\left( e^{i\left\langle z,R_{t}\right\rangle } \frac{e^{\left\langle \eta , R_{t}\right\rangle } }{\mathrm E_{{\mathbb {P}}} (e^{\left\langle \eta ,R_{t}\right\rangle })}\right) \\&=\exp \bigg \lbrace t\left[ \left\langle \gamma , iz+\eta \right\rangle +\int _{{\mathbb {S}}^{d-1}}\int _{0}^{\infty }(e^{\left\langle s,iz+\eta \right\rangle x}-1) \frac{e^{-b(s)x}}{x^{1+\alpha }}\mathrm dx\sigma (\mathrm ds)\right. \\&\left. -\left\langle \gamma , \eta \right\rangle -\int _{{\mathbb {S}}^{d-1}} \int _{0}^{\infty }(e^{\left\langle s,\eta \right\rangle x}-1)\frac{e^{-b(s)x}}{x^{1+\alpha }} \mathrm dx\sigma (\mathrm ds) \right] \bigg \rbrace \\&=\exp \bigg \lbrace t \left[ i \left\langle \gamma ,z \right\rangle +\int _{{\mathbb {S}}^{d-1}}\int _{0}^{\infty }(e^{i\left\langle s,z\right\rangle x }-1) \frac{e^{-(b(s)-\left\langle s,\eta \right\rangle )x}}{x^{1+\alpha }}\mathrm dx\sigma (\mathrm ds)\right] \bigg \rbrace . \end{aligned}$$

Next, note that when \(\inf _{s\in {\mathbb {S}}^{d-1}}b_\eta (s) \ge 1\) and \(0\le t<u\) we have

$$\begin{aligned} \mathrm E_{{\mathbb {Q}}^{\eta }}(e^{-(r-q_j)u}S_{j,u}\arrowvert \mathscr {F}_{t})&=e^{-(r-q_j)u}S_{j,0}\mathrm E_{{\mathbb {Q}}^{\eta }}(e^{R_{j,u}}\arrowvert \mathscr {F}_{t})\\&=e^{-(r-q_j)u}S_{j,0}\mathrm E_{{\mathbb {Q}}^{\eta }}(e^{R_{j,u}-R_{j,t}+R_{j,t}}\arrowvert \mathscr {F}_{t})\\&=e^{-(r-q_j)u}S_{j,0}e^{R_{j,t}}\mathrm E_{{\mathbb {Q}}^{\eta }}(e^{R_{j,u}-R_{j,t}}\arrowvert \mathscr {F}_{t})\\&=e^{-(r-q_j)u}S_{j,t}\mathrm E_{{\mathbb {Q}}^{\eta }}(e^{R_{j,u}-R_{j,t}})=e^{-(r-q_j)u}S_{j,t} \mathrm E_{{\mathbb {Q}}^{\eta }}(e^{R_{j,u-t}}), \end{aligned}$$

where the last line follows by the fact that Lévy processes have independent and stationary increments. For \({\mathbb {Q}}^{\eta }\) to be risk-neutral, we need (9) to hold, which is equivalent to \(\mathrm E_{{\mathbb {Q}}^{\eta }}(e^{R_{j,u-t}})=e^{(r-q_j)(u-t)}\). Since

$$\begin{aligned} \mathrm E_{{\mathbb {Q}}^{\eta }}(e^{R_{j,u-t}})&=\exp \left[ (u-t)\left( \gamma _{j}+\int _{{\mathbb {S}}^{d-1}}\Gamma (-\alpha )\big [((b(s)-\left\langle s,\eta \right\rangle ) -s_{j})^{\alpha }\right. \right. \\&\left. \left. \qquad -(b(s)-\left\langle s,\eta \right\rangle )^{\alpha }\big ]\sigma (\mathrm ds)\right) \right] , \end{aligned}$$

we have \(\mathrm E_{{\mathbb {Q}}^{\eta }}(e^{R_{j,u-t}})=e^{(r-q_j)(u-t)}\) if and only if

$$\begin{aligned} r-q_j=\gamma _{j}+\int _{{\mathbb {S}}^{d-1}}\Gamma (-\alpha )[((b(s)-\left\langle s,\eta \right\rangle )-s_{j})^{\alpha }-(b(s)-\left\langle s,\eta \right\rangle )^{\alpha }]\sigma (\mathrm ds), \end{aligned}$$

as required. \(\square\)

In practice, one must calibrate the parameters of the distribution \(\mu =\textrm{TS}_{\alpha }(\sigma ,b,\gamma )\). Since, under the physical measure \({\mathbb {P}}\), R is a Lévy process with \(R_1\sim \mu\) and Lévy processes have independent and stationary increments, it follows that \(R_{t}-R_{t-1}, R_{t-1}-R_{t-2},\dots ,R_{1} {\mathop {\sim }\limits ^{\textrm{iid}}}\mu\) whenever t is a positive integer. Thus, the sequence of log-returns over one time period forms a random sample from \(\mu\), which can then be used to estimate the parameters using, e.g., the method of maximum likelihood or the method of moments. Since there is no closed form for the joint pdf of a multivariate TS distribution, we instead use the method of characteristic functions discussed in Xia and Grabchak (2022). The details, in the context of the diagonal TS model, are discussed in "The diagonal model" Section below.

Once we have estimates \({\hat{\alpha }}\), \({\hat{\gamma }}\), \({\hat{b}}\), and \({\hat{\sigma }}\) of the parameters under the physical measure \({\mathbb {P}}\), our next step is to find the distribution under a risk-neutral measure \({\mathbb {Q}}\). Toward this end, we must find a vector \(\eta \in {\mathbb {R}}^d\) such that (11) holds. Numerically, we accomplish this by finding \({\hat{\eta }}\) that satisfies

$$\begin{aligned} {\hat{\eta }}=\mathop {\textrm{argmin}}_{\eta \in {\mathbb {R}}^d}\sum _{j=1}^d \bigg | r-q_j-\bigg ( {\hat{\gamma }}_{j}+\Gamma (-{\hat{\alpha }})\int _{{\mathbb {S}}^{d-1}}[({\hat{b}}_\eta (s)-s_{j})^{{\hat{\alpha }}} -{\hat{b}}_\eta (s)^{{\hat{\alpha }}}]{\hat{\sigma }}(\mathrm ds)\bigg )\bigg |, \end{aligned}$$
(12)

where \({\hat{b}}_\eta (s) = {\hat{b}}(s)-\langle s,\eta \rangle\) and \(s_j\) and \({\hat{\gamma }}_j\) are the jth coordinates of s and \({\hat{\gamma }}\), respectively. So long as the value of the objective function at \({\hat{\eta }}\) is close to 0 and \({\hat{b}}_{{\hat{\eta }}}(s)\ge 1\) for every \(s\in {\mathbb {S}}^{d-1}\), as required by Theorem 3, we have an equivalent risk-neutral measure \({\mathbb {Q}}={\mathbb {Q}}^{{\hat{\eta }}}\) under which \(\{R_t:t\ge 0\}\) is a Lévy process with \(R_1\sim \textrm{TS}_{{\hat{\alpha }}}({\hat{\sigma }},{\hat{b}}_{{\hat{\eta }}},{\hat{\gamma }})\). This can then be used to price options.
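To make the objective in (12) concrete, the sketch below evaluates it when the estimated spectral measure has finitely many atoms, so that the integral is a finite sum. It is only an illustration under that assumption: the function names are hypothetical, infeasible values of \(\eta\) (those with some \({\hat{b}}_\eta (s_j)<1\)) are assigned an infinite objective, and the drift in the example is deliberately constructed so that a chosen \(\eta\) solves (11) exactly.

```python
import math
import numpy as np

def rhs_of_11(eta, alpha, atoms, weights, b_vals, gamma):
    """Right side of (11) for j = 1..d when sigma = sum_j weights[j] * delta_{atoms[j]}."""
    S = np.asarray(atoms, dtype=float)                         # (k, d)
    a = np.asarray(weights, dtype=float)
    b_eta = np.asarray(b_vals, dtype=float) - S @ np.asarray(eta, dtype=float)
    # entry (m, j): (b_eta(s_m) - s_{m,j})^alpha - b_eta(s_m)^alpha
    terms = (b_eta[:, None] - S) ** alpha - (b_eta ** alpha)[:, None]
    return np.asarray(gamma, dtype=float) + math.gamma(-alpha) * (a @ terms)

def esscher_objective(eta, alpha, atoms, weights, b_vals, gamma, r, q):
    """Objective in (12); returns inf where Theorem 3 does not apply."""
    S = np.asarray(atoms, dtype=float)
    b_eta = np.asarray(b_vals, dtype=float) - S @ np.asarray(eta, dtype=float)
    if np.any(b_eta < 1.0):
        return math.inf
    resid = r - np.asarray(q, dtype=float) - rhs_of_11(eta, alpha, atoms, weights, b_vals, gamma)
    return float(np.sum(np.abs(resid)))

# Illustrative values (not estimates from data)
alpha, r = 0.5, 0.03
q = np.array([0.0, 0.01])
atoms = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
atoms = atoms / np.linalg.norm(atoms, axis=1, keepdims=True)
weights, b_vals = [0.5, 0.4, 0.3], [3.0, 4.0, 5.0]

# Build a drift for which eta_star solves (11) exactly, to test the objective
eta_star = np.array([0.5, -0.3])
gamma = (r - q) - rhs_of_11(eta_star, alpha, atoms, weights, b_vals, np.zeros(2))
print(esscher_objective(eta_star, alpha, atoms, weights, b_vals, gamma, r, q))
print(esscher_objective(eta_star + 0.1, alpha, atoms, weights, b_vals, gamma, r, q))
```

A function of this form can be handed to a general-purpose numerical minimizer to search for \({\hat{\eta }}\); returning infinity outside the feasible region is one simple way to keep the search where \({\hat{b}}_\eta \ge 1\).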

For simplicity, we focus on European style options. Consider a European option, whose payoff at time T is \(H(S_T)\), where \(S_T\) is given in terms of \(R_T\) by (8). In this case, at time 0, an arbitrage-free price of the option is given by

$$\begin{aligned} \pi =e^{-rT}\mathrm E_{{\mathbb {Q}}}[H(S_T)]. \end{aligned}$$
(13)

Some important payoff functions are:

$$\begin{aligned} H(S_T)= & {} \big (S_{i,T} - K\big )^+ \end{aligned}$$
(14)
$$\begin{aligned} H(S_T)= & {} \left( \min _{i}S_{i,T}-K\right) ^+ \end{aligned}$$
(15)
$$\begin{aligned} H(S_T)= & {} \left( \max _{i}S_{i,T}-K\right) ^+. \end{aligned}$$
(16)

All three are call options with maturity T and strike price K. The first is on the ith asset, the second is on the cheapest asset, and the third is on the most expensive asset. While the first can be evaluated using only univariate modeling, the rest require working with the joint distribution of returns on all assets. We call options with payoff function (15) a call on min and those with the payoff function (16) a call on max. To evaluate (13), we can use Monte-Carlo methods. The algorithm to do this is as follows.

Algorithm 1. Given: N is a large integer denoting the number of replications, \({\text{TS}}_{\hat{\alpha }}({\hat{\sigma }},\,{\hat{b}}_{{\hat{\eta }}},\,{\hat{\gamma }})\) is the calibrated distribution over one time period under the risk-neutral measure, \(S_0=(S_{1,0},S_{2,0},\dots ,S_{d,0})\) is the vector of spot prices, T is the time to expiration, and H is the payoff function.

  1. Set \(\pi =0\).

  2. Repeat N times:

    a. Simulate \({R}_T=(R_{1,T},R_{2,T},\dots ,R_{d,T})\) from \({\text{TS}}_{{\hat{\alpha}}} (T{\hat{\sigma}},\,{\hat{b}}_{{\hat{\eta }}} ,\,T{\hat{\gamma}})\)

    b. For \(j=1,2,\dots ,d\), set \(S_{j,T} = S_{j,0}e^{R_{j,T}}\)

    c. Set \({S_T} = (S_{1,T},S_{2,T},\dots ,S_{d,T})\)

    d. Set \(\pi = \pi + H(S_T)\)

  3. Return \(e^{-rT}\ \pi /N\).

The accuracy of this algorithm improves as N grows: when the payoff \(H(S_T)\) has finite variance under \({\mathbb {Q}}\), the standard error of the Monte-Carlo estimate is of order \(N^{-1/2}\). Note that we do not need to simulate a path of the process. We just need to simulate from the process at time T. Thus, to implement the algorithm we just need a way to simulate from the distribution \({\text{TS}}_{{\hat{\alpha}}}(T{\hat{\sigma }},{\hat{b}}_{{\hat{\eta }}},T{\hat{\gamma }})\). If \({\hat{\sigma }}\) has finite support, simulation can be done using Theorem 2 and the discussion just below it. If the support of \({\hat{\sigma }}\) is infinite, there are a number of approaches to simulation. For instance, we can approximate the distribution by the case where the support is finite, see Xia and Grabchak (2022). Other approaches can be found in Rosiński (2007), Grabchak (2019), and the references therein.
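Algorithm 1 can be sketched in a few lines once a routine for drawing \(R_T\) under the risk-neutral measure is available. In the illustration below that routine is passed in as a function, so any TS simulation method can be plugged in; the sampler used in the example is a multivariate normal placeholder chosen only to make the code run and is not the TS model. The loop of Algorithm 1 is replaced by one vectorized draw of N returns, and all names are hypothetical.

```python
import math
import numpy as np

def call_on_min(S_T, K):
    """Payoff (15) for each row of S_T."""
    return np.maximum(S_T.min(axis=1) - K, 0.0)

def call_on_max(S_T, K):
    """Payoff (16) for each row of S_T."""
    return np.maximum(S_T.max(axis=1) - K, 0.0)

def mc_price(sample_RT, payoff, S0, r, T, N, rng):
    """Monte-Carlo estimate of (13).

    sample_RT(rng, N) must return an (N, d) array of log returns R_T drawn
    under the risk-neutral measure (step 2a of Algorithm 1).
    """
    R_T = sample_RT(rng, N)
    S_T = np.asarray(S0, dtype=float) * np.exp(R_T)        # steps 2b and 2c, via (8)
    return math.exp(-r * T) * float(np.mean(payoff(S_T)))  # steps 2d and 3

def placeholder_sampler(rng, n):
    # NOT a TS sampler: independent normal log returns, for illustration only
    return rng.normal(loc=[0.0, 0.0], scale=[0.2, 0.3], size=(n, 2))

rng = np.random.default_rng(7)
S0, K, r, T = [100.0, 120.0], 100.0, 0.02, 0.5
price_min = mc_price(placeholder_sampler, lambda s: call_on_min(s, K), S0, r, T, 100000, rng)
price_max = mc_price(placeholder_sampler, lambda s: call_on_max(s, K), S0, r, T, 100000, rng)
print(price_min, price_max)
```

Separating the sampler from the pricer keeps the pricing step independent of how \({\hat{\sigma }}\) is represented, so the same code applies to finite-support and approximated spectral measures.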

The diagonal model

In the previous section, we introduced a general approach for pricing multi-asset options using TS Lévy processes. However, it is difficult to calibrate the general model as it depends on two infinite dimensional parameters: the function b and the measure \(\sigma\). For the model to be tractable, we must make these finite dimensional. In Xia and Grabchak (2022) this is accomplished by approximating \(\sigma\) with a spectral measure that has finite support. However, that approach requires very many masses, which leads to a huge number of parameters and essentially fails for more than two dimensions. In this section we introduce a parsimonious TS model that works well even in higher dimensions. We call it the diagonal TS model. For simplicity, we discuss everything in the context of returns over one time period and we denote the return over this time period by \(R=(R_1,R_2,\dots ,R_d)\) instead of the more cumbersome \(R_1=(R_{1,1},R_{2,1},\dots ,R_{d,1})\).

The diagonal model was first proposed, in the context of the normal distribution, by Sharpe (1963), see also Jorion (2007) for a discussion. The main idea is to assume that the common movements of all assets are due to a common factor represented by the market. Formally, we assume that there are d assets and that the return on the ith asset is given by

$$\begin{aligned} R_{i}=\gamma _i+\beta _iR_{m}+\varepsilon _i, \ \ i=1,2,\dots ,d, \end{aligned}$$
(17)

where \(\gamma _i,\beta _i\in {\mathbb {R}}\) for each \(i=1,2,\dots ,d\). Here \(R_{m}\) is the return on the market, \(\beta _i\) is the responsiveness to the market return, \(\gamma _i\) is an overall shift, and \(\varepsilon _i\) is the residual return. It is typically assumed that \(R_m,\varepsilon _1,\varepsilon _2,\dots ,\varepsilon _d\) are independent random variables. In the classical setting these are further assumed to have normal distributions. To allow for skewness and to ensure more realistic tails, we, instead, assume that they have one-dimensional TS distributions. In light of (7), this means that each of them can be written as the difference of two independent STS random variables, which leads to the following model.

Let \(X_1,X_2,\dots ,X_{2d+2}\) be independent STS random variables such that \(X_i\sim \textrm{STS}_\alpha (a_i,b_i)\) for \(i=1,2,\dots ,2d+2\). Note that the parameter \(\alpha\) is the same for all i. The model in (17) is then given by

$$\begin{aligned} R_1= & {} \gamma _1+(X_1-X_2)+\beta _1 (X_{2d+1}-X_{2d+2})\\ R_2= & {} \gamma _2+(X_3-X_4)+\beta _2(X_{2d+1}-X_{2d+2})\\ {} & {} \vdots \\ R_d= & {} \gamma _d+(X_{2d-1}-X_{2d})+\beta _d(X_{2d+1}-X_{2d+2}). \end{aligned}$$

In matrix notation, if \({R} = (R_1,R_2,\dots ,R_d)\), \({X}=(X_1,X_2,\dots ,X_{2d+2})\), \(\gamma =(\gamma _1,\gamma _2,\dots ,\gamma _d)\), and \(\beta =(\beta _1,\dots ,\beta _d)\) then

$$\begin{aligned} {R}= \gamma +S{X}, \end{aligned}$$
(18)

where

$$\begin{aligned} S= (s_1\ s_2\ \dots \ s_{2d+2}) = \begin{pmatrix} 1 & -1 & \cdots & 0 & 0 & \beta _1 & -\beta _1\\ 0 & 0 & \cdots & 0 & 0 & \beta _2 & -\beta _2\\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -1 & \beta _d & -\beta _d \end{pmatrix}. \end{aligned}$$
(19)
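The structure of (18) and (19) may be easier to see in code. The following sketch builds the matrix S for a given \(\beta\) and evaluates \(R=\gamma +SX\) for a given vector X. The function names `build_S` and `returns_from_factors` are hypothetical, and the inputs in the example are arbitrary numbers rather than fitted values.

```python
def build_S(beta):
    """Return the d x (2d+2) matrix S of (19) as a list of rows."""
    d = len(beta)
    S = [[0.0] * (2 * d + 2) for _ in range(d)]
    for i in range(d):
        S[i][2 * i] = 1.0           # +1 for the first idiosyncratic factor of asset i
        S[i][2 * i + 1] = -1.0      # -1 for the second idiosyncratic factor of asset i
        S[i][2 * d] = beta[i]       # loading on the first market factor
        S[i][2 * d + 1] = -beta[i]  # loading on the second market factor
    return S

def returns_from_factors(gamma, beta, X):
    """Evaluate R = gamma + S X, as in (18)."""
    S = build_S(beta)
    return [g + sum(s * x for s, x in zip(row, X)) for g, row in zip(gamma, S)]

# Arbitrary illustrative inputs for d = 2 (so X has 2d + 2 = 6 entries).
R = returns_from_factors([0.0, 0.0], [0.6, 0.8], [1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
```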

Without loss of generality and to ensure identifiability, we assume that \(\beta \in {\mathbb {S}}^{d-1}\), i.e., that \(|\beta |=1\), since, otherwise, we can incorporate \(|\beta |\) into the distributions of \(X_{2d+1}\) and \(X_{2d+2}\). It follows that \({R}\sim \textrm{TS}_\alpha (\sigma ,b,\gamma )\), where \(\sigma\) is given by (5) and \(b:{\mathbb {S}}^{d-1}\to (0,\infty )\) with \(b(s_i)=b_i\) for \(i=1,2,\dots ,2d+2\) and \(b(s)=1\) for any s that is not a column of the matrix S.

We denote the diagonal TS model by \(\textrm{DiTS}_\alpha (a,b,\beta ,\gamma )\), where \(a=(a_1,a_2,\dots ,a_{2d+2})\), \(b=(b_1,b_2,\dots ,b_{2d+2})\), \(\beta = (\beta _1,\beta _2,\dots ,\beta _d)\), and \(\gamma =(\gamma _1,\gamma _2,\dots ,\gamma _{d})\). Since \(|\beta |=1\), we only need to estimate \(d-1\) parameters to estimate \(\beta\), and, of course, we need one parameter to estimate \(\alpha\). Thus, in total, we must estimate \(6d+4\) parameters. In comparison, the approach used in Xia and Grabchak (2022) is only feasible in \(d=2\) dimensions and assumes that the spectral measure has masses in k equally spaced directions, where k is a tuning parameter. The number of parameters needed in that model is \(2k+d+1\). In practice, when modeling returns, it seems that k needs to be large and Xia and Grabchak (2022) took \(k=70\), which led to 143 parameters. In comparison, in the diagonal model, we only need 16 parameters in the two-dimensional case. Thus, the diagonal model is much more parsimonious. However, it requires us to estimate a direction vector \(\beta\), whereas in the method of Xia and Grabchak (2022) all directions are prespecified.
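The parameter counts quoted above can be checked with a short calculation. The helper names below are hypothetical and only restate the formulas \(6d+4\) and \(2k+d+1\) from the text.

```python
def dits_param_count(d):
    """Free parameters of the diagonal TS model in dimension d."""
    n_a = 2 * d + 2   # a_1, ..., a_{2d+2}
    n_b = 2 * d + 2   # b_1, ..., b_{2d+2}
    n_beta = d - 1    # beta lies on the unit sphere, so d - 1 free coordinates
    n_gamma = d       # gamma_1, ..., gamma_d
    n_alpha = 1       # common index alpha
    return n_a + n_b + n_beta + n_gamma + n_alpha

def spectral_param_count(k, d):
    """Parameter count 2k + d + 1 quoted for the k-direction spectral model."""
    return 2 * k + d + 1

counts = {d: dits_param_count(d) for d in (2, 3, 4)}
```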

To work with \(\beta\) we need a parametrization of the vectors in \({\mathbb {S}}^{d-1}\). Every vector \(s\in {\mathbb {S}}^{d-1}\) can be written in spherical coordinates, where it is uniquely determined by \(d-1\) angles, see e.g. Blumenson (1960) for details. We prefer to use the parametrization given in Tashiro (1977), which aims to have the parameters more uniformly traverse the sphere and was used there for the purpose of simulation from a uniform distribution on \({\mathbb {S}}^{d-1}\). For ease of reference, we now give their representation of \(\beta\) for dimensions 2, 3, and 4. In the two-dimensional case, we write \(\beta = (\beta _1,\beta _2)\) in terms of one parameter \(\theta \in [0,1)\) as \(\beta _1=\cos (2\pi \theta )\) and \(\beta _2=\sin (2\pi \theta )\). In the three-dimensional case, we write \(\beta = (\beta _1,\beta _2,\beta _3)\) in terms of two parameters \(w\in [0,1]\), \(\theta \in [0,1)\) as \(\beta _1=w, \ \beta _2=\sqrt{1-w^2}\cos (2\pi \theta )\), \(\beta _3=\sqrt{1-w^2}\sin (2\pi \theta )\). Finally, in the four-dimensional case, we write \(\beta = (\beta _1,\beta _2,\beta _3,\beta _4)\) in terms of three parameters \(w\in [0,1],\theta _1,\theta _2\in [0,1)\) as \(\beta _1= \sqrt{w}\cos (2\pi \theta _1), \ \beta _2=\sqrt{w}\sin (2\pi \theta _1),\ \beta _3=\sqrt{1-w}\cos (2\pi \theta _2), \ \beta _4=\sqrt{1-w}\sin (2\pi \theta _2)\).
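A minimal sketch of these three parametrizations follows. The function names `beta_2d`, `beta_3d`, and `beta_4d` are hypothetical; the formulas are exactly those stated above, and in each case the resulting vector has unit Euclidean norm.

```python
import math

def beta_2d(theta):
    """beta on the unit circle from theta in [0, 1)."""
    return (math.cos(2 * math.pi * theta), math.sin(2 * math.pi * theta))

def beta_3d(w, theta):
    """beta on the unit sphere in R^3 from w in [0, 1] and theta in [0, 1)."""
    r = math.sqrt(1.0 - w * w)
    return (w, r * math.cos(2 * math.pi * theta), r * math.sin(2 * math.pi * theta))

def beta_4d(w, theta1, theta2):
    """beta on the unit sphere in R^4 from w in [0, 1] and theta1, theta2 in [0, 1)."""
    r1, r2 = math.sqrt(w), math.sqrt(1.0 - w)
    return (r1 * math.cos(2 * math.pi * theta1), r1 * math.sin(2 * math.pi * theta1),
            r2 * math.cos(2 * math.pi * theta2), r2 * math.sin(2 * math.pi * theta2))

def norm(v):
    """Euclidean norm, used to confirm that |beta| = 1."""
    return math.sqrt(sum(x * x for x in v))
```

For example, `norm(beta_4d(0.3, 0.1, 0.7))` equals 1 up to floating-point error, so an optimizer can search over the unconstrained box of angles instead of the sphere itself.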

We now turn to the problem of parameter estimation. Let \(\Theta\) denote the parameter space of the diagonal TS model. Our estimation method is based on finding the TS distribution that minimizes the distance between its characteristic function and the empirical characteristic function. Let \(x_1,x_2,\dots ,x_n\) be a random sample from the diagonal TS model with d assets. Each \(x_j=(x_{1,j},x_{2,j},\dots ,x_{d,j})\in {\mathbb {R}}^d\) represents a vector of log-returns over one time period. The empirical characteristic function is given by

$$\begin{aligned} \hat{\mu }_{E}(z)=\dfrac{1}{n}\sum _{j=1}^{n}e^{i\langle z,x_{j}\rangle } = \dfrac{1}{n}\sum _{j=1}^{n}\cos (\langle z,x_{j} \rangle )+ \dfrac{i}{n} \sum _{j=1}^{n} \sin (\langle z,x_{j} \rangle ), \ \ z\in {\mathbb {R}}^d. \end{aligned}$$
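The empirical characteristic function is straightforward to evaluate. The sketch below uses only the Python standard library; the name `ecf` and the tiny dataset are hypothetical and serve only to illustrate the formula.

```python
import cmath

def dot(u, v):
    """Euclidean inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def ecf(z, xs):
    """Empirical characteristic function: the average of exp(i <z, x_j>) over the sample."""
    return sum(cmath.exp(1j * dot(z, x)) for x in xs) / len(xs)

# Hypothetical sample of three bivariate log-returns.
sample = [(0.01, -0.02), (0.00, 0.01), (-0.03, 0.02)]
value = ecf((1.0, 0.5), sample)
```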

Next we choose \(z_1,z_2,\dots ,z_m\in {\mathbb {R}}^d\) for some tuning parameter m and estimate the vector of parameters \(\theta\) by

$$\begin{aligned} {\hat{\theta }}= & {} \mathop {\textrm{argmin}}_{\theta \in \Theta } \sum _{\ell =1}^m |{\hat{\mu }}_{E}(z_\ell )-\hat{\mu }_{\theta } (z_\ell )|^2 \nonumber \\= & {} \mathop {\textrm{argmin}}_{\theta \in \Theta } \sum _{\ell =1}^m\left( |\Re \hat{\mu }_{E}(z_\ell )-\Re \hat{\mu }_{\theta } (z_\ell )|^2+ |\Im \hat{\mu }_{E}(z_\ell )-\Im \hat{\mu }_{\theta } (z_\ell )|^2\right) , \ \ \end{aligned}$$
(20)

where \({\hat{\mu }}_{\theta }\) is the characteristic function of a TS distribution with parameter vector \(\theta\). In Xia and Grabchak (2022) it is shown that for any \(z\in {\mathbb {R}}^d\)

$$\begin{aligned} \hat{\mu }_{\theta }(z) = \exp \big \{A(z)\big \} \Big (\cos \big (B(z)\big )+i\sin \big (B(z)\big )\Big ), \end{aligned}$$

where

$$\begin{aligned} A(z)= & {} \Gamma (-\alpha ) \sum _{j=1}^{2d+2} a_j\left( (b_j^2+\langle s_j,z\rangle ^2)^{\alpha /2}\cos \left( \alpha \arctan \left( \frac{\langle s_j,z\rangle }{b_j}\right) \right) -b_j^{\alpha } \right) ,\\ B(z)= & {} \langle \gamma ,z \rangle - \Gamma (-\alpha )\sum _{j=1}^{2d+2} a_j (b_j^2+\langle s_j,z\rangle ^2)^{\alpha /2}\sin \left( \alpha \arctan \left( \frac{\langle s_j,z\rangle }{b_j}\right) \right) , \end{aligned}$$

and \(s_{2d+1} = -s_{2d+2} = \beta\).
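The formulas for A(z) and B(z) translate directly into code. The following is a sketch under stated assumptions: the names `s_columns` and `dits_cf` are hypothetical, the parameter values in the example are arbitrary, and `math.gamma(-alpha)` is only defined when \(\alpha\) is not a non-negative integer, so the sketch is meant for non-integer \(\alpha\).

```python
import math

def dot(u, v):
    """Euclidean inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def s_columns(beta):
    """Columns s_1, ..., s_{2d+2} of S: e_1, -e_1, ..., e_d, -e_d, beta, -beta."""
    d = len(beta)
    cols = []
    for i in range(d):
        e = [0.0] * d
        e[i] = 1.0
        cols.append(tuple(e))
        cols.append(tuple(-v for v in e))
    cols.append(tuple(beta))
    cols.append(tuple(-v for v in beta))
    return cols

def dits_cf(z, alpha, a, b, beta, gamma):
    """Characteristic function exp(A(z)) * (cos B(z) + i sin B(z))."""
    g = math.gamma(-alpha)
    A = 0.0
    B = dot(gamma, z)
    for a_j, b_j, s_j in zip(a, b, s_columns(beta)):
        u = dot(s_j, z)
        modulus = (b_j ** 2 + u ** 2) ** (alpha / 2.0)
        angle = alpha * math.atan(u / b_j)
        A += g * a_j * (modulus * math.cos(angle) - b_j ** alpha)
        B -= g * a_j * modulus * math.sin(angle)
    return math.exp(A) * complex(math.cos(B), math.sin(B))

# Arbitrary illustrative parameters for d = 2.
phi = dits_cf((0.7, -1.3), 0.5, [1.0, 0.5, 0.8, 0.3, 0.7, 0.9],
              [2.0, 3.0, 1.5, 2.5, 4.0, 3.5], (0.6, 0.8), (0.1, -0.2))
```

A useful sanity check is that the function equals 1 at \(z=0\) and agrees with the complex form \(\exp \{i\langle \gamma ,z\rangle +\Gamma (-\alpha )\sum _j a_j[(b_j-i\langle s_j,z\rangle )^\alpha -b_j^\alpha ]\}\), of which A(z) and B(z) are the real and imaginary parts.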

In practice, we found that the optimization in (20) works better if the data has first been standardized by scaling and centering. This also reduces the number of parameters over which we need to optimize. The idea is to let \({\bar{x}}_k\) and \({\hat{\nu} _k}\) be the sample mean and sample standard deviation, respectively, of the kth component of the data, for \(k=1,2,\dots ,d\). Next, for \(j=1,2,\dots ,n\), we let \(y_j=(y_{1,j},y_{2,j},\dots ,y_{d,j})\), where

$$\begin{aligned} y_{k,j} = \frac{x_{k,j}-{\bar{x}}_k}{{\hat{\nu} _k}}, \ \ k=1,2,\dots ,d \end{aligned}$$

are the standardized observations. Equivalently,

$$\begin{aligned} y_j = A^{-1}\left( x_j -{\bar{x}}\right) , \ \ j=1,2,\dots ,n, \end{aligned}$$
(21)

where \(A = {\text{diag}}({\hat{\nu}}_{1} ,{\hat{\nu}}_{2} , \ldots ,{\hat{\nu}}_{d} )\) and \({\bar{x}}=({\bar{x}}_1,{\bar{x}}_2,\dots ,{\bar{x}}_d)\). While the observations \(y_1,y_2,\dots ,y_n\) are no longer independent, so long as the sample size n is large, the estimates of the means and standard deviations will have very small variances and thus the deviation from independence will be negligible. We will treat \(y_1,y_2,\dots ,y_n\) as a random sample from the diagonal TS model and use this sample to estimate the parameters. After centering, we can estimate the mean of the standardized observations to be zero. In light of (2), this means that

$$\begin{aligned} \gamma =-\Gamma (1-\alpha )\int _{{\mathbb {S}}^{d-1}}(b(s))^{\alpha -1}s\sigma (\mathrm ds) = -\Gamma (1-\alpha )\sum _{j=1}^{2d+2}a_j (b(s_j))^{\alpha -1}s_j. \end{aligned}$$
(22)

Thus, we do not need to estimate \(\gamma\) in this case. Instead, we just use (22) in place of \(\gamma\) in the formula for B(z). Now we only need to estimate \(5d+4\) parameters. Once we have estimated the parameters for the standardized dataset, we transform them to estimated parameters of the original dataset \(x_1,x_2,\dots ,x_n\) using Theorem 1.
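For the diagonal model, the sum in (22) runs over the \(2d+2\) columns of S. A minimal sketch of this computation is given below; the name `gamma_zero_mean` and the numerical inputs are hypothetical.

```python
import math

def gamma_zero_mean(alpha, a, b, cols):
    """Shift from (22): gamma = -Gamma(1 - alpha) * sum_j a_j * b_j**(alpha - 1) * s_j.

    cols lists the columns s_1, ..., s_{2d+2} of S as tuples of length d.
    """
    c = -math.gamma(1.0 - alpha)
    d = len(cols[0])
    return [c * sum(a_j * b_j ** (alpha - 1.0) * s_j[k]
                    for a_j, b_j, s_j in zip(a, b, cols))
            for k in range(d)]

# Columns of S for d = 2 with an arbitrary unit vector beta = (0.6, 0.8).
cols = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (0.6, 0.8), (-0.6, -0.8)]

# When every pair of opposite columns carries the same (a_j, b_j), the terms cancel.
g_sym = gamma_zero_mean(0.5, [1.0] * 6, [2.0] * 6, cols)
```

In the symmetric case the shift vanishes, which matches the intuition that no recentering is needed when the positive and negative parts of each factor have the same distribution.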

Specifically, assume that the estimated model for the standardized data is \({\text{DiTS}}_{{\hat{\alpha }^{\prime } }} (\hat{a}^{\prime } ,\hat{b}^{\prime } ,\hat{\beta }^{\prime } ,\hat{\gamma }^{\prime } )\), where we find \({\hat{\gamma }}'\) by plugging estimated values of the other parameters into (22). Theorem 1 implies that the estimated model for the original unstandardized data is \({\text{DiTS}}_{{\hat{\alpha }}}({\hat{a}},{\hat{b}},{\hat{\beta }},{\hat{\gamma }})\), where \({\hat{\alpha }}={\hat{\alpha }}'\), \({\hat{\beta }} = \frac{A{\hat{\beta }}'}{|A{\hat{\beta }}'|}\), \({\hat{\gamma }} = A{\hat{\gamma }}'+{\bar{x}}\), and the components of the vectors \({\hat{a}}\) and \({\hat{b}}\) are as follows. For \(i=1,2,\dots ,2d\) we have \({\hat{a}}_i = {\hat{a}}'_{i}{\hat{\nu }}_{i/2}^{{\hat{\alpha }}}\) and \({\hat{b}}_i = {\hat{b}}'_i/{\hat{\nu }}_{i/2}\) if i is even and \({\hat{a}}_i = {\hat{a}}'_{i}{\hat{\nu }}_{(i+1)/2}^{{\hat{\alpha }}}\) and \({\hat{b}}_i = {\hat{b}}'_i/{\hat{\nu }}_{(i+1)/2}\) if i is odd. For \(i=2d+1,2d+2\) we have \({\hat{a}}_i = {\hat{a}}'_i|A {\hat{\beta }}'|^{{\hat{\alpha }}}\) and \({\hat{b}}_i = {\hat{b}}'_i|A^{-1}{\hat{\beta }}| = {\hat{b}}'_i/|A{\hat{\beta }}'|\).
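The back-transformation described above can be sketched as follows. The function name `unstandardize` and the example inputs are hypothetical; the formulas are those stated in the paragraph, with `nu` holding the sample standard deviations and `xbar` the sample means.

```python
import math

def unstandardize(alpha_p, a_p, b_p, beta_p, gamma_p, nu, xbar):
    """Map parameters fitted to standardized data back to the original scale."""
    d = len(nu)
    A_beta = [nu[k] * beta_p[k] for k in range(d)]            # A beta'
    norm = math.sqrt(sum(v * v for v in A_beta))              # |A beta'|
    beta = [v / norm for v in A_beta]
    gamma = [nu[k] * gamma_p[k] + xbar[k] for k in range(d)]  # A gamma' + xbar
    a, b = [], []
    for i in range(1, 2 * d + 3):        # i is 1-based, as in the text
        if i <= 2 * d:
            k = (i + 1) // 2 - 1         # i/2 for even i, (i+1)/2 for odd i (0-based)
            scale = nu[k]
        else:
            scale = norm                 # the two market factors use |A beta'|
        a.append(a_p[i - 1] * scale ** alpha_p)
        b.append(b_p[i - 1] / scale)
    return alpha_p, a, b, beta, gamma

# Arbitrary illustrative inputs for d = 2.
alpha, a, b, beta, gamma = unstandardize(
    0.5, [1.0] * 6, [6.0] * 6, [0.6, 0.8], [0.5, 0.5], nu=[2.0, 3.0], xbar=[1.0, -1.0])
```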

We now turn to the problem of estimating the parameters under the risk-neutral measure. Toward this end, let

$$\begin{aligned} {\hat{S}}= \begin{pmatrix} 1 & -1 & \cdots & 0 & 0 & {\hat{\beta }}_1 & -{\hat{\beta }}_1\\ 0 & 0 & \cdots & 0 & 0 & {\hat{\beta }}_2 & -{\hat{\beta }}_2\\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -1 & {\hat{\beta }}_d & -{\hat{\beta }}_d \end{pmatrix}, \end{aligned}$$

let \({\hat{s}}_i\) be the ith column of \({\hat{S}}\), and note that the estimated spectral measure is \({\hat{\sigma }} = \sum _{i=1}^{2d+2} {\hat{a}}_i \delta _{{\hat{s}}_i}\). We can find an equivalent risk-neutral measure by estimating \({\hat{\eta }}\) using (12). Next, for \(i=1,2,\dots ,2d+2\), set \({\hat{b}}_{i,{\hat{\eta }}} = {\hat{b}}_i-\langle {\hat{s}}_i,{\hat{\eta }} \rangle\) and let \({\hat{b}}_{{\hat{\eta }}}=({\hat{b}}_{1,{\hat{\eta }}},\,{\hat{b}}_{2,{\hat{\eta }}},\dots ,{\hat{b}}_{2d+2,{\hat{\eta }}})\). With this notation, \({\text{DiTS}}_{{\hat{\alpha }}}({\hat{a}},\,{\hat{b}}_{{\hat{\eta }}},\,{\hat{\beta }},\,{\hat{\gamma }})\) is the estimated model under the risk-neutral measure.
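The shift \({\hat{b}}_{i,{\hat{\eta }}} = {\hat{b}}_i-\langle {\hat{s}}_i,{\hat{\eta }} \rangle\) can be illustrated with the rounded two-dimensional estimates reported in the data analysis section. The sketch below is a consistency check, not the authors' code: the function name `shift_tempering` is hypothetical, and because the inputs are rounded to three decimals the output matches the reported \({\hat{b}}_{{\hat{\eta }}}\) only up to rounding.

```python
def shift_tempering(b_hat, cols, eta_hat):
    """Return b_i - <s_i, eta> for each column s_i of the estimated matrix S."""
    return [b_i - sum(s * e for s, e in zip(s_i, eta_hat))
            for b_i, s_i in zip(b_hat, cols)]

# Rounded values reported for the two-dimensional example.
b_hat = [105.465, 118.740, 102.327, 109.843, 43.709, 40.720]
beta_hat = (0.485, 0.515)
eta_hat = (-2.546, -2.488)

# Columns e_1, -e_1, e_2, -e_2, beta, -beta of the estimated matrix.
cols = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0),
        beta_hat, tuple(-v for v in beta_hat)]

b_eta = shift_tempering(b_hat, cols, eta_hat)
```

Each entry of `b_eta` should remain positive, since it plays the role of a tempering parameter under the risk-neutral measure.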

We can now price options using Algorithm 1. We just need a method for simulation in Step 2a. Specifically, we need to simulate a vector \(R_T\) from the distribution written there as \({\text{TS}}_{\hat{\alpha }}\,(T{\hat{\sigma }},\,{\hat{b}}_{{\hat{\eta }}},\,T{\hat{\gamma }})\), which in the notation of this section is \({\text{DiTS}}_{{\hat{\alpha }}}\,(T{\hat{a}},\,{\hat{b}}_{{\hat{\eta }}},\,{\hat{\beta }},\,T{\hat{\gamma }})\). From (18) it follows that this simulation can be done by first simulating independent random variables \(X_1,X_2,\dots ,X_{2d+2}\), where \(X_i\sim {\text{STS}}_{{\hat{\alpha }}}(T{\hat{a}}_i,{\hat{b}}_{i,{\hat{\eta }}})\) for \(i=1,2,\dots ,2d+2\), and then taking

$$\begin{aligned} {R}_T = T{\hat{\gamma }}+{\hat{S}}{X}, \end{aligned}$$

where \(X=(X_1,X_2,\dots ,X_{2d+2})\).
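The assembly of \(R_T\) from the independent components can be sketched as follows. The one-dimensional STS sampler is passed in as a function, since its implementation is outside the scope of this sketch. The names `simulate_RT` and `gamma_stand_in` are hypothetical, and the gamma sampler is an assumption used only to make the example run: it corresponds to the limiting case \(\alpha \rightarrow 0\) and is not the sampler used in the paper.

```python
import random

def simulate_RT(T, a, b_eta, beta, gamma, sts_sampler):
    """One draw of R_T = T*gamma + S X with X_i drawn by sts_sampler(T*a_i, b_i)."""
    d = len(beta)
    X = [sts_sampler(T * a_i, b_i) for a_i, b_i in zip(a, b_eta)]
    market = X[2 * d] - X[2 * d + 1]                 # X_{2d+1} - X_{2d+2}
    return [T * gamma[i] + (X[2 * i] - X[2 * i + 1]) + beta[i] * market
            for i in range(d)]

# Stand-in sampler (assumption): a gamma law with shape a and rate b. It requires
# a > 0 and is a placeholder, NOT a tempered stable sampler.
rng = random.Random(1)

def gamma_stand_in(a, b):
    return rng.gammavariate(a, 1.0 / b)

# Hypothetical parameter values for d = 2.
draw = simulate_RT(5, [1.5, 1.2, 0.4, 0.3, 0.5, 0.7],
                   [108.0, 116.0, 105.0, 107.0, 46.0, 38.0],
                   [0.6, 0.8], [0.0, 0.0], gamma_stand_in)
```

Only \(2d+2\) univariate draws are needed per replication, which is what makes the diagonal model convenient for Monte-Carlo pricing in higher dimensions.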

Data analysis

In this section we apply our methodology to the modeling of several real-world financial datasets. For each example, we select a basket of stocks from the same sector. All of the data was downloaded from Yahoo! Finance. The simulations were performed on a MacBook Pro M1 Pro with 10-core CPU and 16-core GPU. On this computer, the two-dimensional example took 3.32 minutes, the three-dimensional example took 11.62 minutes, and the four-dimensional example took 33.13 minutes. These times include preprocessing, fitting the data, finding the risk neutral parameters, and performing Monte-Carlo option pricing.

Two-dimensional example

In this example we consider a basket of two stocks: Meta Platforms, Inc. (META) and Alphabet, Inc. (GOOGL). The dataset consists of vectors of log-returns on the closing prices for the period from May 31, 2012 to March 25, 2021. We note that the data for GOOGL has not been adjusted to reflect the 2022 stock split. The first component in the vector corresponds to META and the second to GOOGL for the same day. In total, the data consists of 2218 ordered pairs. We randomly split the data into two halves: training data and testing data, each consisting of 1109 pairs. Next, we check for normality. Figure 1 gives normal qq-plots for each component of the testing data. We also performed an adjusted Jarque-Bera test for normality on each component separately. In both cases, the p-value was less than \(10^{-16}\). Clearly, the normal distribution is not reasonable for either component. Instead, we follow our methodology and fit a bivariate diagonal TS model.

Fig. 1 Example \(d=2\). Normal qq-plots for the testing data

To fit the model, we begin by estimating the means and standard deviations of each component separately. We then standardize the training data as in (21). After standardizing the data, we do not need to fit the drift \(\gamma\), thus there are 14 parameters remaining to be fit. For this reason we take \(m=14\) to be the number of \(z_\ell\)’s in (20). We choose these to be evenly spaced on \({\mathbb {S}}^{1}\). Specifically, we take \(z_{\ell }=(\cos \theta _{\ell },\sin \theta _{\ell })\) with \(\theta _{\ell }= 2\pi (\ell -1)/m\), for \(\ell =1,2,\dots ,m\). Next, we fit the parameters using the standardized training data by minimizing the objective function in (20). To perform the optimization, we first used Particle Swarm Optimization (Kennedy and Eberhart 1995), as implemented in the hydroPSO function of the “hydroPSO” R package, to get initial values. These were then plugged into the optim function in R with the L-BFGS-B option. After optimization, the value of the objective function was \(9.330 \times 10^{-5}\). Next we applied Theorem 1 to get parameter estimates for the original unstandardized data. We found the estimated distribution to be \({\text{DiTS}}_{\hat{\alpha }}\,({\hat{a}},\,{\hat{b}},\,{\hat{\beta }},\,{\hat{\gamma }})\) with \(\hat{\alpha } = 0.014\), \({\hat{a}}=(1.560,\ 1.250,\ 0.434,\ 0.000,\ 0.461,\ 0.713)\), \({\hat{b}}=(105.465,\ 118.740,\ 102.327,\ 109.843,\ 43.709,\ 40.720)\), \({\hat{\beta }} = (0.485,\ 0.515)\), and \({\hat{\gamma }}=(-9.470 \times 10^{-5},\ 3.816 \times 10^{-5})\).
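The evaluation points and the objective in (20) can be sketched as follows. The names `circle_grid` and `cf_distance` are hypothetical, and the two characteristic functions are passed in as callables so that the sketch does not depend on a particular implementation. The optimization itself (particle swarm followed by L-BFGS-B) is not reproduced here.

```python
import math

def circle_grid(m):
    """Points z_l = (cos t_l, sin t_l) with t_l = 2*pi*(l - 1)/m for l = 1, ..., m."""
    return [(math.cos(2 * math.pi * (l - 1) / m), math.sin(2 * math.pi * (l - 1) / m))
            for l in range(1, m + 1)]

def cf_distance(empirical_cf, model_cf, zs):
    """Objective (20): the sum over z_l of |empirical_cf(z_l) - model_cf(z_l)|^2."""
    return sum(abs(empirical_cf(z) - model_cf(z)) ** 2 for z in zs)

zs = circle_grid(14)   # m = 14 evaluation points, as in the two-dimensional example
```

Since the absolute value of a complex number combines the real and imaginary parts, this single sum is equivalent to the second form of (20).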

Next, we check the goodness-of-fit. Since it is computationally intractable to evaluate the cdfs and pdfs of bivariate TS distributions, we use simulation-based approaches. Toward this end, we began by simulating 1109 observations from the fitted (i.e., estimated) model. We refer to this as the fitted data. It has the same number of observations as the testing data. In Fig. 2 we give qq-plots comparing the quantiles of the testing and fitted data. These suggest that the diagonal TS model provides a much better fit than the normal distribution. We also performed formal goodness-of-fit testing. First, we performed Kolmogorov–Smirnov (KS) tests comparing each component of the testing and fitted data, separately. The p-values for the first and second components are 0.499 and 0.210, respectively. Next, we tested both components together, using the kernel consistent density equality test, which was introduced in Li et al. (2009) and is implemented in the npdeneqtest function of the R package ‘np’. Here, the p-value is 0.788. These results suggest that the estimated diagonal TS model is reasonable for this data.
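For readers who want to reproduce the componentwise comparison, a two-sample KS test can be sketched with the standard library alone. This is an illustration under stated assumptions: the names `ks_two_sample` and `ks_asymptotic_pvalue` are hypothetical, and the p-value uses the classical asymptotic Kolmogorov series, which may differ slightly from the value returned by a given statistical package, especially for small samples.

```python
import math

def ks_two_sample(x, y):
    """Two-sample KS statistic: the largest gap between the two empirical cdfs."""
    x, y = sorted(x), sorted(y)
    n, m = len(x), len(y)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(x[i], y[j])
        while i < n and x[i] <= v:   # advance past all values <= v in x
            i += 1
        while j < m and y[j] <= v:   # advance past all values <= v in y
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_asymptotic_pvalue(d, n, m, terms=100):
    """Asymptotic p-value 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:                    # the series is numerically 1 in this range
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * s))
```

In the setting above, `x` would be one component of the testing data and `y` the same component of the fitted data.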

Fig. 2 Example \(d=2\). We give qq-plots comparing the testing and the fitted data. The x-axis is quantiles of test data and y-axis is quantiles of fitted data. The solid line is \(y=x\)

We now turn to the problem of finding a risk-neutral measure and pricing options. We measure time in trading days and, for concreteness, we assume that time \(t=0\) corresponds to when the market closed on Oct 22, 2021. Since META and GOOGL are both non-dividend paying stocks, we take the dividend rate to be 0 in both cases. For the interest rate, we use the yield on the 13-week treasury bill as the annualized interest rate. On Oct 22, 2021 the yield was \(\$0.05\). Transforming it to a daily interest rate gives \(r=0.05/252=1.984\times 10^{-4}\). Next, we use (12) to calibrate parameter \(\eta\). The optimization is performed using hydroPSO and we find \(\hat{\eta }=(-2.546, -2.488)\). At this value the objective function is \(2.229\times 10^{-15}\), which is very close to 0. It follows that, under the risk-neutral measure, the distribution of the log-return over one day is \({\text{DiTS}}_{{\hat{\alpha }}}\,({\hat{a}},\,{\hat{b}}_{{\hat{\eta }}},\,{\hat{\beta }},\,{\hat{\gamma }})\) with \({\hat{b}}_{{\hat{\eta }}}=(108.012,\ 116.194,\ 104.814,\ 107.355,\ 46.225, \ 38.204)\). Since we are assuming that the dynamics of the log-return follow a Lévy process, the log-return over a period of T days has distribution \({\text{DiTS}}_{{\hat{\alpha }}} (T{\hat{a}},\,{\hat{b}}_{{\hat{\eta }}} ,\,{\hat{\beta }},\,T{\hat{\gamma}})\) under the risk-neutral measure.

We can now use Monte-Carlo methods to price options. For simplicity, we focus on European-style options. In this case, we just need to evaluate (13) with the appropriate pay-off function. We do this using Algorithm 1 with \(N=5000\) replications. For all options considered in this section, we take the time to expiration to be \(T=5\) trading days, which (ignoring weekends) corresponds to expiration on Oct 29, 2021.

To get a baseline for the performance of our method, we begin by considering options based on just one asset. Such options are traded on standard exchanges and price data is readily available. In Table 1, we price four European call options, two for each company, and compare with the market prices. In the column labeled ‘\(\textrm{DiTS}\)’ we give the estimated price under our model and, for a comparison, in the column labeled ‘BS’ we give the estimated price under the classical Black–Scholes model. In both cases, we compare the estimated price with the market price using the relative error, which is given by

$$\begin{aligned} \text {Rel Error} = \frac{ \text {Estimated Price} - \text {Market Price} }{\text {Market Price} }\times 100\%. \end{aligned}$$

We can see that the relative error is always smaller for our model than for Black–Scholes. This is not surprising, as the Black–Scholes model assumes that returns follow a normal distribution, which is an assumption that we already rejected.
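The relative error defined above is simple to compute. In the sketch below the function name `rel_error` is hypothetical and the prices are made-up numbers used only to show the sign convention. They are not taken from Table 1.

```python
def rel_error(estimated_price, market_price):
    """Relative error in percent: positive when the model overestimates the price."""
    return (estimated_price - market_price) / market_price * 100.0

# Made-up prices for illustration only.
over = rel_error(11.0, 10.0)    # the model price is above the market price
under = rel_error(9.5, 10.0)    # the model price is below the market price
```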

We note that, when comparing estimated prices to market prices, one needs to be careful in how \(S_0\) is selected. Even if the market price is the closing price for the option on a given day, one should not, in general, select \(S_0\) to be the closing price for the stock. This is because stocks tend to be more liquid than options. Thus, while the last trade of the day on a stock is likely to be from very close to when the market closed, the last trade for the option is likely to be from much earlier. As such, to make things comparable, we select \(S_0\) to be the price of the stock at the time when the last trade on the option occurred.

Next, we turn to the pricing of multi-asset options. Specifically, we consider European call on min and call on max options. These are not traded on standard exchanges and price data is limited. For this reason, we do not make a comparison with market prices and we take \(S_0\) to be the closing prices of the two stocks. The results for 21 different strike prices are given in Tables 2 and 3.

Table 1 Example \(d=2\). Comparison of option price from \(\textrm{DiTS}\) and BS models with market prices. Here, ‘Rel Error’ refers to relative error. It is positive when we overestimate and negative when we underestimate
Table 2 Example \(d=2\). Multi-asset call on min option prices for several values of the strike price K
Table 3 Example \(d=2\). Multi-asset call on max option prices for several values of the strike price K
Fig. 3 Convergence in the Monte-Carlo pricing. The x-axis is the number of replications and y-axis is the resulting option price

Throughout this paper, we use \(N=5000\) replications for Monte-Carlo option pricing. We now show that this is large enough to get convergence. In Fig. 3 we plot the prices of call options on META and GOOGL using n Monte-Carlo replications as n ranges from 1 to 5000. We can see that, in both cases, the option price seems to converge by the time we get to \(n=1500\) and is quite stable by \(n=5000\). We performed similar experiments with the assets used in the three- and four-dimensional cases and obtained similar results.
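A convergence check of this kind amounts to tracking the running average of the discounted payoffs. A minimal sketch follows; the name `running_prices` is hypothetical, and the input is assumed to be the list of discounted payoffs \(e^{-rT}H(S_T)\) produced by the individual replications.

```python
def running_prices(discounted_payoffs):
    """Return the Monte-Carlo price estimate after n = 1, 2, ... replications."""
    estimates = []
    total = 0.0
    for n, payoff in enumerate(discounted_payoffs, start=1):
        total += payoff
        estimates.append(total / n)   # average of the first n discounted payoffs
    return estimates

# Toy input for illustration only.
path = running_prices([1.0, 3.0, 5.0, 3.0])
```

Plotting the returned list against n gives a curve of the type shown in Fig. 3.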

Three-dimensional example

We now consider a basket of three stocks: Netflix, Inc. (NFLX), DISH Network Corporation (DISH), and Charter Communications, Inc. (CHTR). All three are non-dividend paying stocks. The data consists of vectors of log-returns of closing prices from June 30, 2011 to June 30, 2021 and contains 2516 ordered triplets. We randomly split the data into training and testing datasets, each consisting of 1258 data points. Figure 4 gives normal qq-plots for each component of the testing data. Further, an adjusted Jarque-Bera test was performed on each component of the testing data. In all cases the p-value was less than \(10^{-16}\). Clearly, the normal distribution is not reasonable for any component. Instead, we follow our methodology and fit a three-dimensional diagonal TS model.

Fig. 4 Example \(d=3\). Normal qq-plots for the testing data

The approach is similar to what we did in the two-dimensional case. This time, after standardizing the data, we must fit 19 parameters and we take \(m=19\) to be the number of \(z_\ell\)’s in (20). Specifically, we take \(z_{\ell }=(w_\ell , \sqrt{1-w_\ell ^2}\cos \theta _\ell ,\sqrt{1-w_\ell ^2}\sin \theta _\ell )\), where \(\theta _\ell = 2\pi (\ell -1)/m\) and \(w_\ell = \ell /m\) for \(\ell =1,2,\dots ,m\). After optimization, the value of the objective function was \(2.521\times 10^{-4}\). We again apply Theorem 1 to get the parameter values of the original, untransformed data. We found the estimated distribution to be \({\text{DiTS}}_{{\hat{\alpha }}} (\hat{a},\,\hat{b},\,\hat{\beta },\,\hat{\gamma })\) with \(\hat{\alpha }=0.032\), \({\hat{a}}=(1.916, \ 0.584, \ 0.396, \ 0.078, \ 0.690, \ 1.656, \ 2.691,\ 2.567)\), \({\hat{b}}=(122.884, \ 67.043, \ 66.179, \ 36.556, \ 46.974, \ 76.662, \ 91.611,\ 78.544)\), \({\hat{\gamma }}=(-0.007, \ -0.002, \ 0.010)\), and \({\hat{\beta }} = (0.372, \ 0.382, \ 0.246)\).

To check the goodness-of-fit, we simulated 1258 observations from the fitted model. We refer to this as the fitted data. In Fig. 5 we give qq-plots comparing the quantiles of the testing and the fitted data. While not perfect, there is a clear improvement over the normal model. Next, we perform KS tests for each component separately. The p-values range from 0.138 to 0.715. Finally, we test all of the components together using the kernel consistent density equality test and obtain a p-value of 0.667. The results suggest that the estimated diagonal TS model is reasonable for this data.

Fig. 5 Example \(d=3\). We give qq-plots comparing the testing and the fitted data. The x-axis is quantiles of test data and y-axis is quantiles of fitted data. The solid line is \(y=x\)

Next, we turn to option pricing. We take Nov 18, 2021 (at closing) to Nov 26, 2021 to be the period of the option. Thus, the time to maturity is \(T=6\). We again use the yield on the 13-week treasury bill as the annualized interest rate. Its closing price on Nov 18, 2021 was \(\$0.045\), which leads to a daily interest rate of \(r=0.045/252=1.786\times 10^{-4}\). Using (12), we calibrate parameter \(\eta\) and get \(\hat{\eta }=(-30.434,\ -13.327,\ 23.153)\). The value of the objective function is \(6.396\times 10^{-16}\), which is very close to 0. It follows that, under the risk-neutral measure, the distribution of the log-return over one day is \({\text{DiTS}}_{{\hat{\alpha }}} ({\hat{a}},\,{\hat{b}}_{{\hat{\eta }}} ,\,{\hat{\beta}},\,{\hat{\gamma} })\) with \(\hat{b}_{{\hat{\eta }}}=(153.318, \ 36.608, \ 79.506, \ 23.230,\ 23.821, \ 99.815, \ 102.335, \ 67.820)\).

Table 4 Example \(d=3\). Comparison of option price from \(\textrm{DiTS}\) and BS models with market prices. Here, ‘Rel Error’ refers to relative error. It is positive when we overestimate and negative when we underestimate

For Monte-Carlo option pricing we again use Algorithm 1 with \(N=5000\) replications and for a baseline, we again begin with options on a single asset. Table 4 compares our prices with the market price and with the price from the Black–Scholes model. Our model again outperforms Black–Scholes. Turning to multi-asset options, Tables 5 and 6 give prices for European call on min and call on max options for several choices of the strike prices.

Table 5 Example \(d=3\). Multi-asset call on min option prices for several values of the strike price K
Table 6 Example \(d=3\). Multi-asset call on max option prices for several values of the strike price K

Four-dimensional example

In this example we consider a basket of four stocks: Advanced Micro Devices, Inc. (AMD), Fiserv, Inc. (FISV), Micron Technology, Inc. (MU), and Autodesk, Inc. (ADSK). The data consists of vectors of log-returns from June 30, 2011 to June 30, 2021 and contains 2516 ordered quadruples. We randomly split the data into training and testing datasets, each consisting of 1258 data points. To check normality, qq-plots for each component of the testing data are given in Fig. 6. Further, an adjusted Jarque-Bera test was performed on each component of the testing data separately, with each resulting in a p-value less than \(10^{-16}\). Clearly, the normal distribution is not reasonable for any component. Instead, we follow our methodology and fit a four-dimensional diagonal TS model.

After standardizing the data, we must fit 24 parameters. Thus, we took \(m=24\) to be the number of \(z_\ell\)’s in (20). We take \(z_{\ell }=(\sqrt{w_\ell }\cos \theta _{\ell }, \sqrt{w_\ell }\sin \theta _{\ell }, \sqrt{1-w_\ell }\cos \theta '_{\ell }, \sqrt{1-w_\ell }\sin \theta '_{\ell })\), where \(\theta _{\ell }= \dfrac{2\pi (\ell -1)}{m}\), \(\theta '_{\ell }= \dfrac{\pi (2\ell -1)}{m}\), and \(w_\ell =\ell /m\) for \(\ell =1,2,\dots ,m\). After optimization, the value of the objective function was \(2.717\times 10^{-3}\). Applying Theorem 1, we get parameter estimates for the original, untransformed data. We found the estimated distribution to be \({\text{DiTS}}_{{\hat{\alpha }}} ({\hat{a}},\,{\hat{b}},\,{\hat{\beta}},\,{\hat{\gamma}})\) with \(\hat{\alpha }=0.0103\), \({\hat{a}}=(0.962, \ 1.095, \ 0.794, \ 1.001, \ 1.853, \ 1.618, \ 0.815, \ 0.314,\ 1.854, \ 0.937)\), \({\hat{b}}=(46.993,\ 66.275, \ 122.963,\ 122.520, \ 73.792,\ 93.638, \ 72.149, \ 54.961, \ 29.384,\ 30.528)\), \({\hat{\beta }} = (-0.322, \ -0.134, \ -0.324, \ -0.219)\), and \({\hat{\gamma }}=(0.008,\ 0.007,\ 0.004,\ 0.003)\).

Fig. 6 Example \(d=4\). Normal qq-plots for the testing data

Next, we turn to goodness-of-fit testing. We begin by simulating 1258 observations from the fitted model, which we again refer to as the fitted data. In Fig. 7, we give qq-plots comparing the quantiles of the testing and the fitted data. We can see an improvement over the normal model. Next, we performed KS tests for each component and found that the p-values ranged from 0.165 to 0.840. We also performed a kernel consistent density equality test, which gave a p-value of 0.667. The results suggest that the estimated diagonal TS model is reasonable for this data.

Fig. 7 Example \(d=4\). We give qq-plots comparing the testing and the fitted data. The x-axis is quantiles of test data and y-axis is quantiles of fitted data. The solid line is \(y=x\)

For our options, we consider the period from May 3, 2022 to May 13, 2022. Thus, (ignoring weekends) the time to maturity is \(T=8\). We again use the yield on the 13-week treasury bill as the annualized interest rate. Its closing price on May 3, 2022 was \(\$0.898\), which leads to a daily interest rate of \(r=0.898/252=3.563\times 10^{-4}\). The only dividend paying stock is MU, which had a dividend rate of \(1.671 \times 10^{-5}\). Using (12) we calibrate the parameter \(\eta\) and get \(\hat{\eta }=(7.152, 53.246, -5.530, -11.476)\). The value of the objective function is \(4.268\times 10^{-15}\), which is again very close to 0. It follows that, under the risk-neutral measure, the distribution of the log-return over one day is \({\text{DiTS}}_{{\hat{\alpha }}} ({\hat{a}},\,{\hat{b}}_{{\hat{\eta }}} ,\,{\hat{\beta}},\,{\hat{\gamma}})\) with \({\hat{b}}_{{\hat{\eta }}}=(39.841,\ 73.426, \ 69.717, \ 175.767, \ 79.322, \ 88.108, \ 83.626, \ 43.485, \ 34.540, \ 25.372)\).

For Monte-Carlo option pricing, we again use Algorithm 1 with \(N=5000\) replications and, for a baseline, we begin with options on a single asset. Table 7 compares our prices with the market price and with the price from the Black–Scholes model. Our model again outperforms Black–Scholes. Turning to multi-asset options, Tables 8 and 9 give prices for European call on min and call on max options for several choices of the strike price.

Table 7 Example \(d=4\). Comparison of option price from \(\textrm{DiTS}\) and BS models with market prices. Here, ‘Rel Error’ refers to relative error. It is positive when we overestimate and negative when we underestimate
Table 8 Example \(d=4\). Multi-asset call on min option prices for several values of the strike price K
Table 9 Example \(d=4\). Multi-asset call on max option prices for several values of the strike price K

Conclusion

In this paper we considered multi-asset option pricing using multivariate TS Lévy processes. These models satisfy many stylized facts about financial returns and are known to be realistic and to provide a good fit to real-world data. There had previously been only limited work in this area, most of which focused on the bivariate case.

We accomplished four main tasks. First, we stated and proved Theorem 3, which gives an approach for finding a risk-neutral measure for essentially any TS Lévy process. Second, we developed the diagonal TS model, where the number of parameters grows only linearly in the dimension. Third, we fit this model to real-world datasets in two, three, and four dimensions. Detailed goodness-of-fit methods were used to show that the model fits well, whereas the more standard normal distribution (i.e., geometric Brownian motion) does not provide a good fit. Fourth, we used our model for option pricing, and we found that it gives prices that are closer to market prices than those provided by the classical Black–Scholes model.

There are several directions for future work. First, it is important to apply the model to more datasets and to verify how well it works. Second, while the current model satisfies many stylized facts about financial returns, it does not model volatility clustering. In future work, we will add this component to the model. This can be done using, e.g., a multivariate GARCH model (Rombouts and Stentoft 2011) or a multivariate stochastic volatility model (Muhle-Karbe et al. 2012). Third, in Theorem 3, in order to find a risk-neutral measure, we need to satisfy the assumption \(\inf _{s\in {\mathbb {S}}^{d-1}}b_\eta (s) \ge 1\). While, in practice, this was satisfied for all of the data that we analyzed and does not seem to be an issue, there may be situations where it is not satisfied. One can remove this assumption by considering tempering functions of the form \(q_p(x,s) = e^{-b(s)x^p}\) for some \(p>1\). See Grabchak (2021) and the references therein for a discussion of TS distributions with such tempering functions. We note that this model presents additional challenges as there is no closed form for its characteristic function, which would need to be evaluated numerically.