Nonparametric estimates of option prices via Hermite basis functions

We consider approximate pricing formulas for European options based on approximating the logarithmic return’s density of the underlying by a linear combination of rescaled Hermite polynomials. The resulting models, that can be seen as perturbations of the classical Black-Scholes one, are nonpararametric in the sense that the distribution of logarithmic returns at fixed times to maturity is only assumed to have a square-integrable density. We extensively investigate the empirical performance, defined in terms of out-of-sample relative pricing error, of this class of approximating models, depending on their order (that is, roughly speaking, the degree of the polynomial expansion) as well as on several ways to calibrate them to observed data. Empirical results suggest that such approximate pricing formulas, when compared with simple nonparametric estimates based on interpolation and extrapolation on the implied volatility curve, perform reasonably well only for options with strike price not too far apart from the strike prices of the observed sample.


Introduction
Our aim is to construct approximate pricing formulas for European options with fixed time to maturity by series expansion of return distributions, to discuss their implementation, and to test their empirical accuracy.The approach is nonparametric in the sense that we do not make any parametric assumption on the distribution of returns.Such distribution is instead approximated by (the integral of) a truncated series of suitably weighted and scaled Hermite polynomials, in such a way that the zeroth-order approximation coincides with the standard Black-Scholes model.
Let (Ω, F , P) be a probability space endowed with a filtration (F t ) t≥0 , on which all random elements will be defined.Let S : Ω×R + → R be the adapted price process of a dividend-paying asset, with adapted dividend process q : Ω × R + → R such that exp t 0 q s ds is bounded for every t ∈ R + , and let Y : Ω × R + → R be the corresponding adapted yield process defined by Denoting by β the adapted continuous strictly positive price process of the riskless cash account, i.e.
with r the risk-free rate, and by S := β −1 S the discounted asset price, we assume that the discounted yield process Y defined by is a martingale.In other words, we assume that P is a martingale measure.The martingale property of Y implies that the process S exp • 0 q s ds is also a martingale (see, e.g., [12] for details).In the Black-Scholes setting, one assumes that there exists a constant σ > 0 such that S t exp t 0 q s ds = S 0 exp σW t − 1 2 σ 2 t for every t ≥ 0, where W is a standard Wiener process.Therefore, denoting by Z a standard Gaussian random variable, one has in law for every t ≥ 0. Assuming for simplicity that q is constant, this implies where φ is the density of the standard Gaussian measure γ on R and g : R → R is any measurable function either positive or such that the above integral is finite.As is well known, this identity yields, as particular cases, the Black-Scholes formula for put and call options 1 , as well as the Black-Scholes PDE (see, e.g., [5]).Empirical evidence suggests that observed option prices are not compatible with such a model, and a vast literature exists about alternative models that offer a better accuracy for pricing purposes.A simple nonparametric approach consists in the following steps: estimate the implied volatility from a set of observed call and put option prices; view the implied volatility surface as a function of (at least) the time to maturity and the strike price, say v : [t 0 , t 1 ] × [k 0 , k 1 ] → R + ; given an option with strike price k and time to maturity t, obtain an estimate v(t, k) of the corresponding volatility by interpolation on v; obtain an estimate of the option price in the Black-Scholes setting with volatility v(t, k).This is reasonable if the point (t, k) belongs to the convex envelope of the set of points (t i , k i ) for which option prices are observed, and it was shown to perform well in practice in [12].
Restricting to the case where the time to maturity t is fixed, another approach consists in the estimation of the law of the return over the interval [0, t], and to integrate with respect to the estimated law to obtain estimates of option prices.To fix ideas, discarding dividends for simplicity, one could write S t = S 0 e R , where R is the logarithmic return over [0, t], and, assuming that R has a density f R , Eg(S t ) = R g S 0 e x f R (x) dx.
In order to proceed in a nonparametric way, i.e. without assuming that f R belongs to a family of density functions indexed by finitely many parameters, a possibility is to assume that f R belongs to L 2 (R), to expand it as a series with respect to a complete orthonormal basis, and to use as approximation a truncation of the series to a finite sum.For instance, Lemma 2.2 below yields, for any parameters m and σ > 0, as an identity of functions in L 2 (R), where h n is the n-th Hermite polynomial and Introducing the random variable X defined by R = σX + m, so that S t = S 0 exp(σX + m), and denoting the density of X by f , this is equivalent to writing Eg(S t ) = g(S 0 e σx+m )f (x) dx and approximating f by The coefficients m, σ as well as α 0 , . . ., α N can then be calibrated by minimizing a distance between observed prices and "approximate" prices implied by replacing f with f N in equation (1).Several procedures to achieve this are discussed in Section 4.Moreover, note that a zero-th order approximation of f reduces to Black-Scholes pricing, choosing σ = σ 0 √ t and m = −σ 2 0 t/2, with σ 0 the volatility of the underlying.Expansion in Hermite polynomials have already been used to approximate densities of financial returns (with fixed time) in diffusion and jump-diffusion models (see, e.g., [17] and references therein), but we are not aware of any previous work where the natural nonparametric Ansatz proposed here is studied.On the other hand, a somewhat related, short study using simulated index prices and other families of orthogonal polynomials can be found in [6].
Our main interest is to test the empirical performance of the approximate pricing approach described above, dubbed Hermite pricing for convenience, investigating its dependence on several factors, such as the number of Hermite polynomials used, the calibration procedure, the corresponding optimization algorithm, and so on.We shall consider as benchmark a simple nonparametric pricing technique based on the Black-Scholes model and interpolation on the implied volatility curve.It was shown in [12] that this simple method outperforms more sophisticated techniques based on estimating the density of logarithmic returns by second derivative of call prices with respect to strike results (cf., e.g., [1]), as well as some parametric methods.It seems therefore sufficient to use just this simple technique as term of comparison.
The extensive empirical study conducted here suggests that Hermite pricing performs reasonably well for options with strike price not too far away from the strike prices of observed option prices, and is quite unreliable otherwise.This appears to be the case across all calibration methods used, even though some techniques are more robust than others.Such an observation is certainly not surprising: most (nonparametric) methods generally suffer, roughly speaking, of poor performance on points that lie outside the convex hull of the observed data set, or, more generally, on regions of the data set that are "sparsely populated".On the other hand, Hermite polynomials are defined on the whole real line, so once the coefficients of a linear combination of them are estimated, the model can in principle produce estimates for any data points, without resorting on extrapolation, as is the case for the elementary implied volatility method already mentioned.Even on rather rich data sets, however, estimates obtained by Hermite pricing are often unreliable.In essence, we believe that one can reasonably conclude that Hermite pricing can usefully complement other pricing techniques, but it is not a plausible tool to price (nonparametrically) "outside the convex hull" of observed data points.The qualitative results of the empirical analysis on real data are essentially confirmed by an analogous statistical exercise conducted on a smaller synthetic dataset generated using (non-Gaussian) Hermite processes.
The content of the remaining part of the text is organized as follows: in Section 2 we collect some facts about Hermite polynomials, compute some integrals with respect to Gaussian measures (on the real line), and we recall the connection among European option prices, distributions of returns, and implied volatility.Pricing estimates for put options, essentially in closed form, implied by approximating the density of returns with finite linear combinations of rescaled Hermite polynomials are discussed in Section 3. Corresponding formulas for call options can be formally obtained in a very similar way, but the payoff function of put options is bounded, while the payoff function of call options is not.For this (technical) reason we concentrate on the case of put options.The important issue of calibration is discusses in Section 4. The main criterion is the minimization of the relative pricing error, both in ℓ 2 and in ℓ 1 sense (corresponding to least squares and least absolute deviation, respectively).While the objective functions are smooth, in general there is no convexity, so global optimization is hard.Explicit expressions for the minimum points cannot be obtained, so numerical minimization is needed.An extensive empirical analysis is carried out in Section 5: for fixed time and time to maturity, we use a set of option prices to calibrate the model, the pricing estimates of which are in turn compared with actual option prices.A similar analysis is carried out on a set of synthetic data in Section 6, where functionals obtained from the payoff function of put options are applied to exponentials of simulated Hermite processes.Finally, auxiliary material is collected in the appendix.

Notation
The usual Lebesgue spaces L p (R), p ∈ [1, ∞], will simply be denoted by L p .The scalar product in L 2 will be denoted by •, • .We shall use the same symbol for the scalar product in other spaces, whenever it is clear from the context what is meant.Given a countable set of indices I, we shall use standard notation for the usual sequence spaces ℓ p (I), defined as the set of sequences x = (x i ) i∈I such that Whenever I is omitted, it is either Z + or a finite set clear from the context.

Gaussian measures and Hermite polynomials
For any real numbers m and σ, with σ = 0, let γ m,σ denote the Gaussian measure on R with mean m and variance σ 2 , that is the measure having density with respect to Lebesgue measure given by If m = 0 and σ = 1, we shall just write γ in place of γ 0,1 .
The Hermite polynomials (h n ) n≥0 , defined by form a complete orthogonal system of the Hilbert space L 2 (γ).The first few of them are Moreover, ((n!) −1/2 h n ) n≥0 is a complete orthonormal basis of L 2 (γ) -see, e.g., [10] for details.
Simple calculations based on a change of variable immediately show that the rescaled shifted Hermite polynomials x → (n!) −1/2 h n (σ −1 (x − m)) form a complete orthonormal system of the Hilbert space L 2 (γ m,σ ).
The following observations are elementary but important in the sequel.

Integrals with respect to Gaussian measures
We shall need some explicit Gaussian indefinite integrals.In particular, for any n ≥ 1, integration by parts gives the identities (here and in the following c ∈ R denotes a constant), from which it follows that x and, by iteration: for n ≥ 4 even as well as, for n ≥ 5 odd, (if n = 5 the product (n − 1)(n − 3) • • • 4 must be interpreted as just equal to 4).Alternative expressions can be written in terms of the incomplete Gamma function, defined as Γ(s, x) := +∞ x y s−1 e −y dy for s ∈ C, Re s > 1 and x ≥ 0 (see, e.g., [7]).We have to distinguish two cases: (a) if n is even and a < 0, then and (b) in all other cases, +∞ a x n e −x 2 /2 dx = 2 We shall often use the following simple identity: for any a ∈ R, x 0 , x 1 ∈ [−∞, +∞], and measurable g such that x → g(x)e x 2 /2+ax ∈ L 1 (x 0 , x 1 ), one has which follows by − a 2 and a change of variable.We conclude computing the integrals of rescaled Hermite polynomials with respect to a standard Gaussian measure.
Proof.For any λ, x ∈ C the generating function identity holds, with uniform convergence of the series on compact sets (see, e.g., [10, p. 7].Then where the exchange of integration and summation can be justified by approximation and passage to the limit.Writing Setting immediately implies The identity implies, by linearity of complex differentiation, thus also In particular, we immediately have that m n = 0 for every n odd, as odd Hermite polynomials do not have terms of order zero.We are going to use the following expression for the coefficients of Hermite polynomials (see, e.g., [14, Eq. 18.5.13]): For any n ∈ 2N (the only case that matters for our purposes), the term of order zero has coefficient hence, recalling that Γ(k + 1) = k! for any integer k, It follows by the Legendre duplication formula thus also Remark 2.4.As it immediately follows from equations ( 8) and ( 9), the value on the righthand side of equation ( 7) is just the absolute value of the term of order zero in the Hermite polynomial of order n.

Pricing functionals
Let t > 0 be a fixed time and S t the discounted price at time t of an asset, which is supposed to be strictly positive and, for simplicity, with zero dividend process.The price at time zero of a put option with strike price k ≥ 0 can be written, setting k := β −1 t k and denoting the distribution function of log S t /S 0 by F , as Since (k − S 0 e x ) + = S 0 (k/S 0 − e x ) + for every k ≥ 0 and x ∈ R, we can and shall assume S 0 = 1 without loss of generality.As is well known, the pricing functional (at time zero) of a call option with strike k on the same asset, defined by is related to π by the put-call parity relation 1 − k = π c − π, which follows immediately by the identity We shall use the following properties of the function π, the short proof of which is included for the reader's convenience.A more detailed treatment can be found in [11].
Proof.Positivity and boundedness are trivial by definition, while monotonicity follows by (k 1 − e x ) + ≥ (k 2 − e x ) + and (e x − k 1 ) ≤ (e x − k 2 ) + for every x ∈ R whenever k 1 ≥ k 2 ≥ 0. Note that k → k − e x is 1-Lipschitz continuous for every x ∈ R. Since y → y + is 1-Lipschitz continuous, so is k → (k −e x ) + by composition, uniformly with respect to x ∈ R. The property is then preserved integrating with respect to a measure the total mass of which is one.The proof of convexity is similar: k → k − e x is affine, in particular convex, and y → y + is convex increasing, hence k → (k − e x ) + is convex.Finally, integration with respect to a positive measure preserves convexity.
Remark 2.6.All properties in the statement of the previous proposition hold also for call options, except for the boundedness.In fact, the integrand x → (e x − k) + in the definition of π c is unbounded, which implies that k → π c (k) is itself unbounded.The boundedness of the integrand in the definition of π plays a key role in the discussion to follow, and is the main reason for us to consider put options rather than call options.
It is clear that the distribution of logarithmic returns determines the pricing functional for put options π.The following proposition says that the correspondence is in fact bijective, i.e. prices of put options for all maturities determine the distribution of logarithmic returns.This can be seen as a non-smooth extension of a classical result going back (at least) to Breeden and Litzenberger [3].
Proof.The map is surjective by definition.To prove injectivity, let F 1 and F 2 be distribution functions and assume that hence, setting G := F 1 − F 2 and integrating by parts, Since this identity holds for every k > 0, the (signed) measure with density x → e x G(x) with respect to the Lebesgue measure is equal to the zero measure, hence G is equal to zero almost everywhere.Since G is càdlàg, it follows that G = 0 everywhere, i.e.
One can explicitly construct the inverse of the map F → π as follows: integrating by parts as in the previous proof yields 2 Since F is not necessarily continuous, but just càdlàg (right-continuous with left limits), one has, for any càdlàg function G, where, if G is continuous, one can obviously replace G(x−) by G(x).
which implies that the càdlàg version of the derivative of π coincides with F (log k).In particular, if F is continuous, then π is of class C 1 with π ′ (k) = F (log k) for all k > 0.
Let us also recall that, for any fixed time to maturity, there is a one-to-one correspondence between put option prices and the implied volatility.Let v t : R + → R + be the (unique) function satisfying BS(1, t, k, v t (k)) = π(k), where BS(s, t, k, σ) denotes the Black-Scholes price of a put option on an underlying with price s at time zero, time to maturity t, strike price k, interest rate equal to zero, and volatility σ.Then we immediately have the following claim.Since here we are concerned only with the case where the time to maturity t is fixed, we shall denote the volatility function just by v.
Proposition 2.8.There is a bijection between the implied volatility function v and the distribution function F of the logarithmic return.
Let us assume that F admits a density f ∈ L 2 .As mentioned in the introduction, we are going to construct a sequence of functions (f n ) converging to f in L 2 , hence it is natural to ask whether the sequence of approximations (π n (k)) defined by converges to π(k) as n → ∞.This is in general not the case, because the function x → (k−e x ) + belongs to L ∞ but not to L 2 , hence it induces a continuous linear form on L 1 , but not on L 2 .One can show, however, that put option prices for all k can be reconstructed from approximation to option prices with payoff of the type More precisely, to identify the pricing functional π, it suffices to know, for any sequence (f n ) converging to f in L 2 (R), the values θ k 1 ,k 2 , f n for all k 1 , k 2 > 0 and all n ≥ 0, where we recall that •, • stands for the scalar product of L 2 .In fact, since i.e. lim (see [11] for more detail).Taking into account Proposition 2.7, the proof of the following claim is then immediate.
Proposition 2.9.If there exists a sequence and the distribution F of logarithmic returns.
Completely analogously, if π(k 1 ) is known, then the convergence in equation ( 10) also holds with f ∈ L 1 , i.e. without any extra integrability assumption on f , because for every x ∈ R, hence the claim follows by dominated convergence.

Pricing estimates via Hermite series expansion
We are going to discuss the construction and some properties of a class of approximations of the pricing functional π for put options with fixed time to maturity based on Hermite series expansion of the density of logarithmic returns.Particular attention is given to reducing as many computations as possible to integrals of polynomials with respect to Gaussian measures.This is desirable in practical implementations because such integrals, as seen in §2.3, can be numerically computed in an efficient way.
Recall that time to maturity, denoted by t, is fixed.Let us define the (F t -measurable) random variable X by S t e qt = S 0 exp σX + m , where m and σ > 0 are constants, and is the mean dividend rate over the time interval [0, t].We assume that the law of X admits a density f .Then Moreover, as the discounted yield process associated to the (discounted) price process S is a martingale, one has Let us further assume that the density Note that f ∈ L 1 by definition of density, hence f is automatically in L 2 if, for instance, it is bounded (which is often the case for many parametric families of densities that are used to model returns).
By Lemma 2.2 there exists (α n ) ∈ ℓ 2 such that the sequence of functions (f N ) defined by one has Replacing f by f N in the previous formula, one obtains ) and and writing Therefore Note that all integrals with respect to the Gaussian density appearing in the above expansions can be computed in closed form, in terms of the Gaussian density and distribution functions, or in terms of incomplete Gamma functions, as shown in §2.3.
Remark 3.1.The Black-Scholes formula is a special case of the above with N = 0, replacing σ and m by σ 0 √ t and − 1 2 σ 2 0 t, respectively, where σ 0 stands for the volatility of the underlying.
We now discuss some properties of this class of approximations: (i) As it follows by §2.4, the convergence of f N to f in L 2 as N → ∞ does not imply that π N → π, but one has nonetheless enough information to uniquely determine π.
(ii) The function f N in general is not a density, as it is not guaranteed to be positive and its integral over the real line is not necessarily equal to one.Furthermore, in general [2], where the authors prove that the result is sharp, in the sense that convergence fails for p ∈ [1, 4/3] and for p ≥ 4, and [13]).This is the case, for instance, if the density f is bounded, in which case, by interpolation between L 1 and L ∞ , f belongs to is not preserved substituting f with f N .However, a kind of "asymptotic martingale property" holds: note that from below.Let ε > 0 be arbitrary but fixed.Then there exists a 0 = a 0 (ε) such that for every a > a 0 Let a > a 0 be arbitrary but fixed.By the Cauchy-Schwarz inequality, i.e. there exists Some of the above issues can be avoided assuming that there exists δ > 0 such that In fact, let ( f N ) be a sequence of function converging to f in L 2 and define (f N ) by In particular, even though f N is in general not a density, as its L 1 norm may not be equal to one, it does converge to a density as N → ∞, in the sense that where In particular, lim which is a kind of asymptotic martingale property improving upon equation (11).Approximate pricing formulas for put options involving only integrals of polynomials with respect to a standard Gaussian measure can also be obtained proceeding analogously to the case treated above, even though computations are more cumbersome.For the sake of completeness, full detail is provided in the appendix.The extra integrability assumption, however, could be too strong for certain applications, as it implies that X admits exponential moments.In fact, if x → e α|x| f (x) ∈ L 2 , then, for any β < α/2, the Cauchy-Schwartz inequality yields Note that exponential integrability of the return X = log S T is not needed to ensure that S T has finite expectation.

Calibration of approximate pricing functionals
For any m ∈ R, σ ∈ R + , and α = (α 0 , . . ., α N ) ∈ R N +1 , the approximate pricing method introduced in Section 3 can be represented as a function k → π(k; m, σ, α), where m, σ, α are treated as parameters (we omit the variable t because we assume, as before, that time to maturity is fixed).Let (k i ) i∈I be a set of strike prices for which prices of put options (π i ) i∈I = (π(k i )) i∈I are observed.Moreover, we assume that f N → f in L 1 , so that the correction procedure described in subsection 2.4 is not necessary.Even though this is a loss of (theoretical) generality, it does not imply any loss of precision in the empirical analysis carried out in the next section.
The approximating Hermite pricing model with parameters (m, σ, α) can be calibrated to observed prices via a minimization problem of the form inf where Θ stands for a subset of R × R + × R N +1 and L is a loss function defined on ℓ(I) × ℓ(I), with ℓ(I) denoting the vector space of sequences indexed by the set I. Since our main interest is the minimization of the relative pricing error, we shall set where y/x is defined pointwise, i.e. (y/x) i := y i /x i for every i ∈ I, and • is a norm on ℓ(I), typically the ℓ 2 norm, corresponding to ordinary least squares, or the ℓ 1 norm, corresponding to least absolute deviation.Note that L(x, y) = +∞ as soon as x i = 0 for some i ∈ I.However, in practice this does not cause trouble because no options with price zero are traded anyway.On the other hand, out-of-the-money options with very short time to maturity will have prices close to zero, hence calibration is sensitive to the presence of such option prices in the set (π i ) i∈I .In practice, this is also not too problematic, as one could use weighted norms on ℓ(I), or just disregard options with prices too close to zero, i.e. select a suitable subset I ′ of the index set I.
Let us write the objective function J as J = R , with Denoting the cardinality of I by |I|, the relative error R can be seen as a function from (a subset of) which turns out to be very regular.
The derivatives of R can be computed easily: assuming for simplicity |I| = 1, one has Explicit expressions can also be obtained for derivatives of higher order, which can be useful to check numerically first and second-order conditions for optimality.For instance, if the norm in the definition of J is the ℓ 2 norm, then the function (m, σ, α) → R(m, σ, α) 2 is continuously differentiable and its (Fréchet) derivative is 2 R, R ′ , where R ′ can be identified with the n R N +3 -valued functions On the other hand, using the above explicit expressions to identify possible local minima solving R, R ′ = 0 may not be feasible, as the equation is highly nonlinear.If J = R ℓ 1 , that is, if the optimality criterion is defined in terms of least absolute deviation, then J is not differentiable, because the ℓ 1 norm is not.For practical purposes, this suggests that derivative-free minimization algorithms should be preferred.
We are now going to discuss a convexity properties of J with respect to the variable α for fixed m and σ.It follows immediately from Section 3 that it is possible to write where Φ 1 and Φ 2 are R N +1 -valued functions depending on the parameters m and σ, but not on α.Therefore, defining the matrix Ψ ∈ R n×(N +1) by we have Although the objective function J is not convex, the function α → J(m, σ, α) is convex.This observation is useful in view of the identity inf m,σ,α where the minimizers of α → Ψα − 1 can be characterized by ∂ Ψα − 1 = 0, with ∂ denoting the subdifferential in the sense of convex analysis.If In particular, if Ψ ⊤ Ψ is invertible, then the minimizer is unique and equal to where Of course α * is nothing else than the estimate of α by ordinary least squares.
If instead the ℓ 1 norm is used in the definition of J, the function α → Ψα − 1 is not differentiable, and its subdifferential is multivalued, hence not easy to deal with.However, the minimization problem inf α Ψα − 1 ℓ 1 can be solved by linear programming, writing it in the equivalent form inf u,α As already mentioned, using the ℓ 1 norm is equivalent to estimating α by least absolute deviation, a method that is less sensitive to outliers than ordinary least squares, which corresponds to using the ℓ 2 norm.
We are now going to consider additional constraints on α, for fixed m and σ, implying that the approximation f N to the density f integrates to one and satisfies an approximate martingale condition, i.e. that respectively.Defining the vector c = (c 0 , c 1 , . . ., c N ) ∈ R 1+N by the first condition in equation ( 13) can be written as The vector c can be computed in close form thanks to Proposition 2.3.The approximate martingale condition, that is the second condition in equation (13), is equivalent to where, by equation ( 6), We are going to obtain closed-form expressions for the coefficients of the polynomial defined by More generally, let P n (x) ∈ R[x] be a polynomial of degree n, and let us compute the coefficient of the polynomial in R[σ] defined by One has It follows by the definition of the gamma function that and finally ), the approximate martingale condition ( 14) can be written as The first few polynomials F n (σ) are Both constraints in equation ( 13) are affine in α (for fixed m and σ), in particular they are convex, as well as their intersection Θ = Θ(m, σ) ⊂ R 1+N .Moreover, since F n (σ) > 0 for every σ > 0 and n ∈ N, and c n = 0 for every odd n, the two constraints are non-redundant.
The minimization with the constraints in equation ( 13) then becomes which is still a convex minimization problem.If the norm is the ℓ 1 norm, adding the constraint α ∈ Θ to the linear programming formulation of the minimization is trivial.On the other hand, if the norm is the ℓ 2 norm, we can no longer use ordinary least squares, but the minimization problem can be solved by quadratic programming.In fact, setting n := |I|, Remark 4.2.If N = 1, the constrained minimization problem (15) degenerates, in the sense that the constraints already uniquely identify the solution.In fact, the two affine equations c, α = 1 and F (σ), α = exp(−m − σ 2 /2) have a unique solution because the vectors c and F (σ) are independent.Similarly, if N = 2, each constraint identifies a plane of R 3 , hence their intersection is a line in R 3 , i.e. the constrained minimization problem can be reduced, by a reparametrization, to an unconstrained minimization problem in one real variable.More precisely, let A be the matrix defined by with c and F considered as row vectors, v a vector in R 3 generating the kernel of A, and α 0 any vector in Θ, i.e. any solution to the equation Then Θ = {α 0 + av} a∈R .Recalling that explicit computations show that a generator of the kernel of A is (1, σ/2, −1), and a solution to equation ( 16) is Finally, assume that, for given m and σ, α * = α * (m, σ) is a minimizer of the function α → Ψα − 1 , with or without the constraints in equation ( 13), and recall that Ψ depends on m and σ, but not on α.Then inf m,σ,α Unfortunately it does not look possible to make any claim about the convexity of the function to be minimized.Therefore, results obtained by numerical minimization may depend on the initialization and may get trapped at local minima.Empirical aspects related to this issue will be discussed in the next section.

Empirical analysis
We are going to test the empirical performance of several instances of the model introduced in Section 3, that differ among each other for the way they are calibrated and for some constraints on the parameters m, σ, and α.
The calibration of each instance of the model is done in the following way: given a set of option prices observed at the same day and with the same time to maturity, labeled from 1 to n, for each j = 1, . . ., n we use the data with label (1, . . ., j − 1, j + 1, . . ., n) to calibrate the model, and with the calibrated parameters we produce an estimate π j of the price π j of the j-th option.The relative absolute pricing error of π j with respect to π j is then defined as | π j /π j − 1|.
Before describing each calibration method in detail and the corresponding empirical performance, we briefly describe the data set used.We use S&P500 index option data 3 for the period January 3, 2012 to December 31, 2012.During 2012 the annualized mean and standard deviation of daily returns of the S&P500 index were equal to 11.09% and 12.64%, respectively.During the same period the 1-year T-bill rate was very close to zero, with minimal variations: in particular, its mean was equal to 0.16%, with a standard deviation equal to 0.023%.Our sample contains 77 408 observations of European call and put options, 46 854 of which are put options.Prices are averages of bid and ask prices.Data points with time to maturity shorter than one day or volume less than 100 are eliminated.Descriptive statistics of the whole dataset are collected in Table 1.
[Table 1 about here.] 3The raw data are obtained from Historical Option Data, see www.historicaloptiondata.com.
We focus on put options, and we eliminated from the dataset those put options that (i) do not display price monotonicity with respect to the strike price; (ii) have the same price and time to maturity but different strike price.In case (i) we eliminated options with low trading volume breaking the monotonicity condition, and in case (ii) we kept only the options with the highest and the lowest strike prices.This reduces the size of the sample to 43 469 put contracts.As is well known, index options on the S&P500 are very actively traded: the day with the largest number of unique put contracts is December 21, 2012, that has 14 expiration dates and 269 quoted put options prices (after the cleaning procedure described above).The underlying price for this trading day was 1 430.20 while the strike prices had values of 1 100, 1 310, and 1 425 at the 10 th , 50 th , and 90 th percentile, respectively, with 93% of the contracts in the money.The time to maturity ranges from 4 days to almost 3 years, in line with most other trading days.

A simplified model
The simplest calibration method that we consider slightly simplifies the setting of Section 3, assuming that there exists a constant σ 0 > 0 such that where t denotes time to maturity.Note that this can be considered as a perturbation of the Black-Scholes model, where the standard Gaussian density of suitably normalized returns is replaced by a finite linear combination of (scaled) Hermite polynomials.In fact, in the degenerate case where such linear combination reduces to a multiple of the Hermite polynomial of order zero, one recovers precisely the Black-Scholes model.Throughout this subsection we shall write σ in place of σ 0 for simplicity.The model's calibration can thus be formulated as the minimization problem where Ψ is the matrix defined in equation ( 12) and • is a norm on R n+1 .The first calibration technique that we consider starts, for any σ > 0, with the minimization problem inf which can be solved by the standard ordinary least squares method to provide a minimum point α * = α * (σ), as discussed in Section 4. Let E : ]0, ∞[ → R + be the function defined by and consider the minimization problem inf σ>0 E(σ).
Assuming that a minimum point σ * exists, we take σ * and α * (σ * ) as estimates of the parameters of the model.The calibration procedure thus obtained will be referred to as procedure H σ .The model produces pricing estimates that are consistently better than the standard Black-Scholes one for every N = 1, . . ., 5, in the sense that the 10%, 25%, 50%, 75%, 90%, and 95%, quantiles of the relative pricing error empirical distribution are smaller (up to the 75% quantiles they are around 50% smaller).The pricing error considerably improves with N = 2, remains essentially unchanged with N = 3, and improves again quite drastically with N = 4, to remain again unchanged with N = 5.The numerical results indicate that Hermite approximations truncated at even degree N are likely to be a better choice, at least in the setting of calibration procedure H σ .It should however be remarked that the pricing performance of Black-Scholes with interpolated implied volatility is still much better.Another important observation is that the size and frequency of large pricing errors increase with N , consistently with the "conventional wisdom" according to which the use of more and more basis functions may cause numerical instability.Finally, the relative error of Hermite pricing is particularly pronounced for options with strike price lying far away from the strike prices of observed options.This is checked by computing relative pricing errors only for those options with strike k such that k min < k < k max , where k min and k max are the smallest and the largest strike prices, respectively, of the options used for calibration.One finds that higher quantiles of the error distribution decrease considerably (cf.Table 3).This observation is consistent with approximations of densities by Hermite polynomials being usually good around the center of the density, but not much so in the tails, where they could even become negative (see, e.g., [9] for a more complete discussion).For this reason one cannot really expect good approximate pricing for options that are deep out of the money, unless prices of options with comparable strike prices are observed.A further natural idea to try to limit the occasional large pricing errors is to constrain the calibrated density to integrate to one and to satisfy an approximate martingale condition, as in equation ( 13).In particular, the calibration procedure resulting from adding these constraints to H σ , i.e. replacing equation ( 17) by inf where Θ accounts for the constraints mentioned above, as discussed in Section 4, is labeled H c,2 σ .Numerical results, however, are discouraging (cf.Table 3), and suggest that the extra computational burden is not worth.
Remark 5.1.The martingale condition is equivalent, under the present assumptions, to Recalling that Ee λX = e λ 2 /2 for every λ ∈ R if and only if X is a standard Gaussian random variable, it follows that if equation ( 18) is fulfilled for every t ≥ 0, then X is Gaussian.However, we do not require equation ( 18) to be verified for all t, but just for certain choices of t.Furthermore, we should recall that X itself depends on t, so it does not necessarily have to be Gaussian.
[Table 2 about here.] [Table 3 about here.] The calibration of model H σ discussed so far is somewhat inconsistent because it "mixes" the ℓ 2 and the ℓ 1 norms.It is then natural to ask whether a consistent use of the ℓ 1 norm, i.e. of least absolute deviations, would improve the statistics of relative pricing error.It turns out that this is hardly the case, with empirical results suggesting that the (relative) accuracy of procedure H σ is very satisfying.Moreover, the method of ordinary least squares is very fast and less prone to numerical instability in comparison to the method of least absolute deviations.In order to substantiate these claims, let us introduce further calibration procedures: if equation ( 17) is replaced by inf the resulting procedure is labeled H 1 σ .Consider now the (numerical) minimization problem inf with starting point (σ BS , α BS ), where σ BS is such that the ℓ 1 distance between observed option prices and Black-Scholes prices with volatility σ BS is minimized, and α BS = (1/ √ 2π, 0, . . ., 0).The resulting calibration procedure is labeled H 1,0 σ .If the initial point for the minimization algorithm is chosen as the minimum point of the H σ procedure, the resulting procedure is labeled H 1,2  σ .Note that, due to the lack of convexity of the function (σ, α) → Φ(σ)α − 1 , numerical minimization algorithms are only expected to converge to a local minimum around the initial point (σ 0 , α 0 ), for which there appears to be no "canonical" choice.Procedure H 1,0 σ amounts to looking for a Hermite model minimizing the ℓ 1 error starting its search on the "degenerate" Hermite model of order zero, i.e. from the Black-Scholes model.Similarly, procedure H 1,2  σ looks for a local minimum point around the optimal solution provided by H σ .It is perhaps useful to recall that in both procedures H σ and H 1 σ the minimization step in σ can be done with numerical algorithms that require just an upper and a lower bound, rather than a starting point.
Numerical results on our dataset indicate that (a) H 1 σ performs slightly better than H σ at the level of lower quantiles of the error distribution (up to 50%), and slightly worse at the level of higher quantiles, with the slight advantage reducing as the order N of the Hermite approximation increases; (b) The performance of H 1,0  σ is overall comparable to the ones of both H σ and H 1 σ for values of N up to three, while it is clearly worse for values of N = 4 and N = 5; (c) the minimum point of H 1,2  σ is consistently very close to the one of H σ , and, accordingly, the improvement in pricing error is very small across all values of N and percentiles of the error distribution.Moreover, the distribution of pricing error becomes almost indistinguishable from the one of H 1 σ as N increases (cf.Table 4).
[Table 4 about here.] These empirical observations suggest that, in spite of its theoretical inconsistency, procedure H σ is not necessarily worse than the sounder procedure H 1 σ .One should also take into account that, even though least absolute deviation is more robust to outliers than ordinary least squares, standard numerical routines for the former did not run nearly as smoothly as those for the latter in our dataset (see Appendix B for more detail about the numerical implementation of H 1 σ via linear programming, as outlined in Section 4).Moreover, the rather simple-minded procedure H 1,0  σ turns out to be a viable alternative for lower values of N , even though it is clearly considerably slower than H σ and H 1 σ , as it involves the minimization of a function on a higher-dimensional space.It seems interesting to observe that the lack of convexity mentioned above appears to have a considerable negative impact on the pricing error only for values of N larger than three.It is natural to speculate that, as the dimension of the state space over which the objective function is minimized increases, more and more local minima appear.

Analysis of the full Hermite model
We now turn to examining the empirical performance of the full model introduced in Section 3. The simplest calibration procedure, labeled H m,σ , consists in the minimization problem where, for any real numbers m and σ, with σ > 0, α * (m, σ) is a minimum point of the convex minimization problem inf The starting point for the numerical minimization algorithm over m and σ is chosen as the minimum point of calibration procedure H σ .More precisely, if (σ * , α * ) is the calibration produced by H σ , the initial point for the numerical solution of equation ( 19) is where t is the time to maturity.Adding the constraints in equation ( 13) to (20) produces the calibration procedure labeled H c,2 m,σ .In this case the numerical solution of equation ( 19) takes as starting point the minimum point obtained by calibration procedure H c,2 σ , in the same sense already discussed above.
Empirical results (see Table 5) show that the extra degree of freedom of H m,σ with respect to H σ produces massive improvements in pricing accuracy only for N ≤ 3, and a more modest improvement with N = 4 and N = 5.In particular, for N ≤ 3, all quantiles of the error distribution up to 95% are lower than the corresponding quantiles for the models in the previous subsection.For N = 4 and N = 5, quantiles up to 75% improve, but become worse at higher levels.This is not too surprising considering that extra parameters tend to improve accuracy but to worsen stability.On the other hand, the improvement of H c,2 m,σ with respect H c,2 σ is very strong for all values of N , to the point that, for N = 5, its performance is not much worse than the ones of H σ and H m,σ .Moreover, the large errors produced by H c,2 σ for N = 5 are considerably smaller than those of other procedures.However, empirical observations already made in the previous subsection are confirmed: passing from H m,σ to H c,2 m,σ reduces the number of large errors in some cases, but does not improve the precision: the error distribution of H c,2 m,σ dominates the one of H m,σ up to the 75% quantile across all values of N .
An important numerical observation is that the estimated values of α are often enormous (of order of magnitude 10 150 ).Even though such values can hardly be interpreted, they do not compromise, in the overwhelming majority of cases, neither calibration error nor pricing error.Perhaps somewhat surprisingly, at least from the point of view of numerical stability, adding lower and upper bounds to equation (20) produces worse results (numerical output relative to these attempts is not reproduced).On the other hand, in the case of calibration procedure H c,2 m,σ the minimization problem (20) subject to the additional constraints ( 13) is solved numerically using quadratic programming, for which, to avoid numerical crashes, it was necessary to constrain |α| to be less than the inverse of machine precision.This bound however is never reached, and estimates of α are in this case much better behaved.On the other hand, as already remarked, the calibration without constraints displays better pricing accuracy in the large majority of cases.
The calibration procedure obtained replacing the ℓ 2 norm in equation ( 20) by the ℓ 1 norm, which would naturally be labeled H 1 m,σ , turns out to be numerically very unstable on our dataset, with minimization by linear programming, via the GLPK routines, crashing too often to be usable.Roughly speaking, the reason is that the matrix Ψ(m, σ) becomes very singular and the numerical linear programming routines break down.For this reason, whenever Ψ(m, σ) is too "large" (see Appendix B for detail), we use instead the estimates produced by the procedures H 1,0 m,σ and H 1,2 m,σ , that correspond to the minimization of the function (m, σ, α) → Ψ(m, σ)α − 1 ℓ 1 over the set R × ]0, ∞[ × R 1+N , using as starting point the Black-Scholes parameters and the H σ parameters, respectively.With a slight abuse of notation, the procedures so obtained are still labeled H 1,0 m,σ and H 1,2 m,σ , respectively.Note that it would not make sense to use as starting point the parameters calibrated by H m,σ for the reasons discussed above.
The empirical results reported in Table 6 show that least absolute deviation estimates starting from the Black-Scholes parameters are no longer comparable to the estimates produced by the two-step OLS optimization (i.e. by H m,σ ), even for lower values of N .On the other hand, the performance of the H 1,2 m,σ procedure is indeed comparable to the one of H m,σ for N = 4 and N = 5, but it does not offer any worthy advantage, apart from the size of α.In fact, one should take into account that, for reasons already discussed above, the minimization algorithm used by H 1,2 m,σ is much slower than the two-step procedure of H m,σ .Moreover, H 1,2 m,σ has a performance that is only slightly better than the one of H 1,2 σ in the range N = 2 to N = 4, and essentially identical for N = 5 for percentiles up to 50%.Therefore, also by considerations of computational complexity, it does not seem particularly interesting.The results, however, are important in the sense that they confirm the good empirical performance of our proposed procedure H m,σ , in spite of its theoretical inconsistency.

Empirical analysis on synthetic data
We are going to describe the results of an empirical analysis, analogous to the one described in the previous section, on a set of synthetic data, generated using Hermite processes (see Appendix B for basic definitions and results, and, e.g., [15] and references therein for financial applications). 4Such an analysis can be considered as a sort of empirical robustness test, as a financial interpretation along the lines described in previous sections is, in general, not possible.More precisely, we shall produce synthetic data of the type where Y 0 > 0, σ > 0 and m are constants, and F is the distribution function of a (non-Gaussian) Hermite process at time one.The values π(k), however, cannot be interpreted as prices of options in a Hermite market, as Hermite processes are not semimartingales, hence the standard pricing methods in terms of expectations under a risk-neutral measure do not make sense any longer.
On the other hand, the problem of estimating π(k) (or, more generally, of estimating F , as explained in §2.4) from a finite set of observations (π(k i )) i∈I is meaningful for any distribution function F , independently of any financial interpretation.It is in this sense that the numerical results obtained should be interpreted as a sort of robustness test.
We produced synthetic values of π(k), as defined by (21), with the parameters k, Y 0 , σ and m chosen in terms of the dataset considered in the previous section.As a first step, we randomly selected 25 days from the dataset.For each day there are "blocks" of options with the same time to maturity.Let us now consider a day and a block fixed: we set Y 0 equal to the price of the underlying S 0 , and denoting the time to maturity and the calibrated Black-Scholes implied volatility for the block under consideration by t and σ 0 , respectively, we set Furthermore, random samples of a Hermite process with parameters k = 3 and H = 0.63 evaluated at time one, denoted by Z 3 0.63 (1), are generated using the weak convergence results gathered in Appendix B. 5 The empirical distribution function of the set of simulated random samples is denoted by F .Finally, we computed π(k) as in (21) for the values of k corresponding to the strike prices in the block under consideration in the original dataset.The whole procedure is repeated for each block of each day, thus obtaining a synthetic dataset that has approximately 10% the size of the real dataset used in the previous section.The empirical analysis described in the previous section is then applied to the synthetic data thus produced.
Before describing the results of the analysis, some remarks are in order.The choice of the parameters σ and m (see (22) above) is guided simply by an analogy to the case discussed in the previous section.In this regard it is probably worth mentioning that the process Y t = Y 0 exp σZ k H (t) does not have, in general, finite expectation, as elements of the n-th Wiener chaos, with n ≥ 3, do not admit any exponential moments (see [8,Corollary 6.13]).However, since 0 ≤ (k − e x ) + ≤ k for every x ∈ R, the expectations E(k − Y t ) + are always finite.Analogously, the distribution of Z k H (1) is not expected to have a density in L 2 (R) (see Appendix C for more detail).However, the empirical distribution function F is compactly supported and bounded, hence (a smoothed version of) its density is certainly in L 2 (R).
Let us now discuss the empirical results obtained on the synthetic dataset, on which we have applied the estimation methods H σ , H m,σ , H 1,0 σ , and H 1,2 σ , in addition to the Black-Scholes methods with implied volatility and with interpolation on the implied volatility curve (to which we shall refer as BS and BS i , respectively).Methods involving least absolute deviation techniques implemented via linear programming have been excluded because of their numerical instability (see the corresponding remarks in the previous section and Appendix D).Similarly, the constrained methods would not make sense in the present setting, as the synthetic data cannot be interpreted as prices, as already discussed, hence the approximate martingale property would just be a spurious constraint. 6ven though we shall make some comparisons between the empirical performance of the various methods on the real and the synthetic datasets, these must of course be taken with caution, at least because the latter dataset is much smaller than the former.
It turns out that also on synthetic data the BS i method displays an outstanding performance, that is much better than what the various other methods can achieve, consistently over all degrees of Hermite polynomials considered and all quantiles of the error distribution.The BS i method achieves better accuracy on the synthetic dataset than on real data.This may be explained by the fact that synthetic data are more "regular" than real data, in the sense that the latter are more noisy, hence may have a more irregular distribution.It is interesting also to observe that, for synthetic data, accuracy within the hull is much better than the accuracy on the whole dataset (i.e.including out-of-the-hull points).This points to the plausibility of the previous argument, in the sense that regularity of the distribution of synthetic data implies that estimates in the hull are particularly precise.On the other hand, the "naive" Black-Scholes estimator BS performs considerably worse on synthetic data than on real data.A possible explanation for this is that Hermite processes of order three, as the one used to generate the data, are strongly non-Gaussian.In a somewhat loose way, one may argue that the non-Gaussianity of the Hermite process used here is stronger than the non-Gaussianity of returns in real data.
Method H σ produces estimates that are considerably poorer than those produced by BS i , in analogy with the corresponding results for the real dataset.On the other hand, the accuracy improves considerably with respect to the BS method, showing that the Hermite approximation method captures deviations from Gaussianity to a certain extent.Note also that there is essentially no improvement passing from N = 4 to N = 5.It should also be mentioned that the method performs worse on the synthetic data than on the real data with N ≤ 3, while with N = 4, 5 the performance is very similar in the hull, but still worse (for the synthetic data) out of the hull.This is probably still due to a stronger deviation from Gaussianity in the synthetic data that cannot be captured sufficiently well by Hermite approximations of the density of order up to five.
Method H 1,0 σ performs significantly worse than the much quicker method H σ .As already remarked, the optimization algorithm suffers from the existence of many local minima, and the local minimum closest to the BS parameters, to which it converges, may arguably be quite far from the global minimum.This phenomenon was already observed in the case of real data, and it is even more pronounced for synthetic data.It is perhaps worth noting that the method has a median error that decreases as the order N increases, but produces large errors that strongly influence the error distributions at higher quantiles.
In contrast to H 1,0 σ , method H 1,2 σ searches for a local minimum, in the ℓ 1 sense, starting from the parameters of H σ .This method, that could be seen as a refinement of method H σ , has an entirely similar accuracy to that of the latter across all values of N .Strictly speaking, this may just be explained by the existence of a local minimum quite close to the initial datum for the search algorithm.In practice, however, in analogy to the case of real data, this shows that the much quicker method H σ , although theoretically not fully consistent, produces estimates that can hardly be improved by standard (non-global) optimization algorithms.It appears interesting to observe that the median error is worse in the synthetic data than in the real data, but that the frequency of large errors, at least for sufficiently high order N , is lower for synthetic data than for real data.This might be consistent with real data having a higher Gaussianity than synthetic data, but more extreme outliers.
Finally, the extra parameter m allows method H m,σ to achieve a higher accuracy than the simpler method H σ .This is of course not surprising from the mere statistical viewpoint, but it may be somewhat interesting nonetheless, considering how the synthetic data are generated (i.e., roughly speaking, choosing m as in H σ ).This observation can be interpreted as further evidence for a deviation from Gaussianity of Hermite processes that is hard to capture with Hermite approximations (of order up to five, at least).

Concluding remarks
We have analyzed the empirical performance of a class of nonparametric models to price European options with fixed time to maturity, based on approximating the density of logarithmic returns by truncated series of weighted and scaled Hermite polynomials.As a term of comparison we considered a simple Black-Scholes model coupled with linear interpolation on the implied volatility curve.The empirical performance of the methods, measured by off-sample relative pricing error, is studied on one year of daily European put options on the S&P500 index.The results suggest that Hermite models performs reasonably well for options with strike price not too far away from the strike prices of observed option prices.This appears to be the case across all calibration methods used.For options with strike price far apart from the set of strike prices of observed options, estimates obtained by the Hermite methods are less reliable than those obtained by the simple nonparametric Black-Scholes method mentioned above, and are in general not better otherwise.Therefore it seems fair to say that Hermite methods can be useful in particularly well-behaved situations, but cannot be considered reliable stand-alone nonparametric pricing tools.These qualitative observations are confirmed by a statistical exercise conducted on synthetic data generated in terms of a class of non-Gaussian stochastic processes (the Hermite processes).

A Pricing formulas under extra integrability conditions
Assume that there exists δ > 0 such that Let ( f N ) be a sequence of function converging to f in L 2 and define (f N ) by Moreover, ))e x 2 /2 dx,

B Hermite processes: weak convergence and simulation
We collect some facts about Hermite processes, using as main source [16] (see also [4]).
The Hermite process with parameters k and H is defined by where x + denotes the positive part of x and the integral is a multiple Wiener-Itô stochastic integral with respect to a Wiener process W with parameter space R. Then Z k H is a meanzero square-integrable process with stationary increments, Z k H (0) = 0, and E(Z k H (1)) 2 = 1.Moreover, Z k H is self-similar with parameter H, i.e.Z k H (t) = t H Z k H (1) in distribution for every t ∈ R + .Finally, for k ≥ 2 the process is not Gaussian.
Let g ∈ L 2 (γ) be a function such that γ(g) = 0. Then g can be written as The function g is said to have Hermite rank Taqqu has proved in [16, Theorem 5.5] a convergence theorems for integral functionals of a class of Gaussian processes X that admits a representation of the type where e : R → R is a function satisfying a set of conditions spelled out in [16, §2], that include lim s→∞ e(s) s H 0 −3/2 L(s) = 1 and σ := e L 2 (R) .Then one has, for any g ∈ L 2 (γ) of Hermite rank k, in the sense of weak convergence of measures in C[0, 1].Here A(x) is a normalizing constant defined by The process B H := Z 1 H is the usual fractional Brownian motion.The centered stationary Gaussian process X defined by X t := B H t − B H t−1 is called fractional Gaussian noise.The process X admits a representation of the type (23), with (see [16, §3]).As a consequence, lim in the sense of weak convergence of measures in the Skorokhod space D[0, 1] (cf.[16,Theorem 5.6]).
The convergence result (24) is amenable to practical implementation, as the (well-known) covariance function of B H is hence, by an elementary computation, the (stationary) correlation function of the corresponding fractional Gaussian noise X is given by

C On the density of a cubic function of a Gaussian
Let h(x) = x 3 − 3x be the Hermite function of order three.Setting and that the function h has a local maximum at −1 and a local minimum at 1.The inverse of the function h j , j = 1, 2, 3, will be denoted by h ← j .Let Z a standard Gaussian random variable.Then the distribution function of the random variable h(Z) can be written as The inverse function theorem readily shows that the limits of h ← In particular, there exists j ∈ {1, 2, 3} such that This in turn implies where lim y→2 h ← j (y) = 1 and hence G ′ (y) 2 tends to infinity as y → 2− as 1/(2 − y), which is not integrable.This implies that the (unbounded) density of h(Z) does not belong to L 2 (R).

D Numerical implementation
All numerical computations are done with Octave 7.1.0on Linux, using the Octave Forge packages statistics and optim.Minimization with respect to σ in H σ is done through the function fminbnd, with lower and upper bounds 0.1 and 1, respectively.Minimizations in H 1,0 σ and H 1,2  σ are done through the function fminsearch.Minimization in H 1 σ is done through the function glpk, i.e. through the GNU Linear Programming Kit (GLPK).In about 10% of the computations it returns no solution.For these points the result produced by H 1,2  σ is used instead.Without any constraint on α, the GLPK algorithm sometimes breaks down or returns very high values of α that translate into unusable pricing estimates.By trial-and-error, we determined that a reasonable bound on the absolute value of α is 10, and we implemented such a constraint.The issues with minimization via GLPK are much more severe in the case of H 1 m,σ .Simple bounds on the absolute value of α do not help in this case.The problem appears to be the "size" of the matrix Ψ, which depends on the parameters m and σ.Sporadic crashes of glpk (as opposed to very frequent ones) are obtained by running it conditional on the absolute value of the determinant of Ψ ⊤ Ψ being bounded by 10 6 .However, the proportion of pricing estimates obtained this way is only around 5% of the total.

1 ′ and h ← 2 ′ at 2, and the limits of h ← 2 ′ and h ← 3 ′ at − 2 ,
are infinite, hence it is not clear whether the square of the derivative of G is integrable on neighborhoods of 2 and −2.To answer this question, note that the cubic equation x 3 − 3x = y, with y ∈ ]−2, 2[, admits the three real solutionsx k = 2 cos 1 3 arccos y/2 − 2π 3 k , k = 0, 1, 2.

Table 1 :
Summary statistics for S&P500 index options dataThis table collects some simple statistics for prices of European call and put options on the S&P500 index.The sample period is January 3, 2012 to December 31, 2012.Implied volatilities, expressed in percentage points, are annualized, time to maturity is expressed in days, strike and futures prices are expressed in index points.

Table 2 :
Pricing errors Black and ScholesThe table reports selected quantiles of the distribution of empirical pricing errors (in percentage points) of the Black and Scholes estimators.Each column matches the corresponding one in the tables relative to Hermite pricing.Figures in parenthesis refer to pricing errors obtained excluding strikes outside the interval of observed ones.

Table 3 :
Pricing errors for Hermite models H σ and H c,2 σ The table reports selected quantiles of the distribution of empirical pricing errors (in percentage points) of the Hermite estimators Hσ and H c,2 σ .Figures in parenthesis refer to pricing errors obtained excluding strikes outside the interval of observed ones.

Table 4 :
Pricing errors of Hermite models H 1 The table reports selected quantiles of the distribution of the empirical pricing errors (in percentage points) of the Hermite estimators H 1 σ , H 1,0 σ , and H 1,2 m,σ .Figures in parenthesis refer to pricing errors obtained excluding strikes outside the interval of observed ones.

Table 5 :
Pricing errors of Hermite models H m,σ and H c,2 m,σ The table reports selected quantiles of the distribution of the empirical pricing errors (in percentage points) of the Hermite estimators Hm,σ and H c,2 m,σ .Figures in parenthesis refer to pricing errors obtained excluding strikes outside the interval of observed ones.

Table 6 :
Pricing errors of Hermite models H 1,0 m,σ and H1,2The table reports selected quantiles of the distribution of the empirical pricing errors (in percentage points) of the Hermite estimators H 1,0 m,σ and H 1,2 m,σ .Figures in parenthesis refer to pricing errors obtained excluding strikes outside the interval of observed ones.

Table 7 :
Synthetic data: pricing errors Black and ScholesThe table reports selected quantiles of the distribution of empirical pricing errors (in percentage points) of the Black and Scholes estimators.Results are calculated on a set of synthetic data, generated using Hermite processes.Each column matches the corresponding one in the tables relative to Hermite pricing.Figures in parenthesis refer to pricing errors obtained excluding strikes outside the interval of observed ones.

Table 8 :
Synthetic data: errors for Hermite models H σ and H m,σThe table reports selected quantiles of the distribution of empirical estimation errors (in percentage points) of the Hermite estimators Hσ and Hm,σ.Results are calculated on a set of synthetic data, generated using Hermite processes.Figures in parenthesis refer to estimation errors obtained excluding values of k outside the interval of observed ones.

Table 9 :
Synthetic data: errors of Hermite models H 1,0 σ and H1,2The table reports selected quantiles of the distribution of empirical estimation errors (in percentage points) of the Hermite estimators H 1,0 σ and H 1,2 σ .Results are calculated on a set of synthetic data, generated using Hermite processes.Figures in parenthesis refer to estimation errors obtained excluding values of k outside the interval of observed ones. σ.