1 Introduction

The classical Wiener–Hopf factorization of a probability measure was given by Spitzer [29] and Feller [13], and has a strong connection to random walks. This result was generalized by Rogozin [28], Fristedt [16], and other authors using approximation based on discrete-time skeletons. Greenwood and Pitman [17] used a direct approach which relies on excursion theory for the reflected process; for details, see [7, 20]. Another approach is presented in [14], where the link with scattering theory is also made. Presman [26] and Arjas and Speed [5] generalized the Spitzer identity in a different direction, to the class of Markov additive processes (MAPs) in discrete time (see also [2, 25]). Later, Kaspi [19] proved a Wiener–Hopf factorization for a continuous-time Markov additive process whose Markovian component has a finite state space and is ergodic. The fluctuation identity given by Kaspi [19] involves the distribution of the inverse local time. Dieker and Mandjes [12] investigate discrete-time Markov additive processes and use an embedding to relate these to a continuous-time setting (see also [9, 27]).

The use of MAPs is widespread, making them a classical model in applied probability with a variety of application areas, such as queues, insurance risk, inventories, data communication, finance, environmental problems, and so forth; see, e.g., [2, Chap. XI], [4, 9, 10, 15], [25, Chap. 7]. The reason is the need to model seasonality of prices, recurring everyday patterns of activity, bursty arrivals, occurrence of events in phases, and so on. This leads to regime-switching models, in which the process of interest is modulated by a background process. The so-called phase-type distributions also fit naturally into the framework of MAPs: a MAP with positive phase-type jumps can be reduced to a MAP with no positive jumps without losing any information. This procedure is called fluid embedding. Informally, it involves enlarging the state space of the background process and replacing the jumps by linear stretches of unit slope. Apart from the above, a MAP is a natural generalization of a Lévy process with many analogous properties and characteristics, although various new mathematical objects appear in the theory of MAPs, posing new challenges.
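To make the fluid embedding concrete, here is a minimal sketch (in Python; the rates, the phase-type parameters (β,T), and the convention that positive jumps arrive at rate λ_i while the background is in state i are all illustrative assumptions, not data from this paper) of the enlarged background generator: each positive phase-type jump is replaced by a sojourn in auxiliary phase states during which the path increases with unit slope.

```python
import numpy as np

def fluid_embed(Q, lam, beta, T):
    """Enlarged background generator for the fluid embedding of a MAP whose
    positive jumps are phase-type PH(beta, T) and arrive at rate lam[i] in
    state i.  In the N original states X evolves as before (jump part removed);
    in the m auxiliary phase states X increases with unit slope."""
    N, m = Q.shape[0], T.shape[0]
    t = -T.sum(axis=1)                     # phase-type exit rates
    G = np.zeros((N + N * m, N + N * m))
    G[:N, :N] = Q - np.diag(lam)           # original moves, jumps switched off
    for i in range(N):
        blk = N + i * m
        G[i, blk:blk + m] = lam[i] * beta  # start a jump: enter a phase state
        G[blk:blk + m, blk:blk + m] = T    # phase evolution (slope +1 for X)
        G[blk:blk + m, i] = t              # phase ends: return to state i
    return G

# toy example: two background states, Erlang(2) jumps in state 0 only
Q = np.array([[-1.0, 1.0], [3.0, -3.0]])
lam = np.array([2.0, 0.0])
beta = np.array([1.0, 0.0])
T = np.array([[-4.0, 4.0], [0.0, -4.0]])
G = fluid_embed(Q, lam, beta, T)
print(np.allclose(G.sum(axis=1), 0.0))    # a proper intensity matrix: True
```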

This paper presents a Wiener–Hopf factorization for a special, but nonetheless quite general, class of Markov additive processes. For this class of processes, we give a short proof of the Wiener–Hopf factorization based on the Markov property and additivity. We also express the terms of the Wiener–Hopf factorization directly in terms of the basic data of the process. Finally, we derive a Spitzer–Rogozin theorem for this class of processes, which serves to obtain Kendall's formula and the Fristedt representation of the cumulant matrix of the ladder epoch process. We also present the ballot theorem.

The paper is organized as follows. Section 2 introduces basic definitions, facts, and properties related to MAPs. In Sect. 3, we give the main results of this paper. Finally, in Sect. 4, we prove all the theorems.

2 Preliminaries

2.1 Markov Additive Processes

Before presenting our main results, we begin by defining the class of processes we intend to work with and their properties. Following [3], we consider a process X(t), where \(X(t)=X^{(1)}(t)+X^{(2)}(t)\) and the independent processes \(X^{(1)}(t)\) and \(X^{(2)}(t)\) are specified by the characteristics \(q_{ij}, G_{i}, \sigma_{i}, a_{i}, \nu_{i}(dx)\), which we shall now define. Let J(t) be a right-continuous, ergodic, finite state space, continuous-time Markov chain with \({\mathcal{I}}=\{1,\ldots,N\}\) and intensity matrix \(Q=(q_{ij})\). We denote the jump epochs of the process J(t) by \(\{T_{n}\}\) (with \(T_{0}=0\)). Let \(\{U^{(i)}_{n}\}\) be i.i.d. random variables with distribution function \(G_{i}(\cdot)\). Define the jump process by

For each \(j\in \mathcal{I}\), let \(X_{j}(t)\) be a Lévy process with the Lévy–Khinchine exponent:

\(\Psi_{j}(\alpha)=-ia_{j}\alpha+\frac{\sigma_{j}^{2}\alpha^{2}}{2}-\int_{-\infty}^{\infty}\bigl(e^{i\alpha y}-1-i\alpha y\,1_{\{|y|\leq 1\}}\bigr)\,\nu_{j}(dy),\)  (1)

where \(\int _{-\infty}^{\infty}(1\wedge |y|^{2})\, \nu_{j}(dy)<\infty\). By \(X^{(2)}(t)\) we denote the process which behaves in law like \(X_{j}(t)\) when \(J(t)=j\). We shall assume that the aforementioned class of MAPs is defined on a probability space with probabilities \(\{ \mathbb{P}_{i}:i\in \mathcal{I} \} \), where \(\mathbb{P}_{i}(\cdot)=\mathbb{P}(\cdot|J(0)=i)\), and a right-continuous natural filtration \(\mathbb{F}= \{\mathcal{F}_{t}: t \geq 0\}\). In fact, we can consider a more general MAP in which the additional jumps \(U_{n}^{(i)}\) appearing at the changes of state of J(t) may also depend on the state \(J(T_{n+1})\) (a so-called anticipative MAP). This can be done by considering the vector state space \(\mathcal{I}^{2}\) and a modified governing Markov process J on it. If each of the measures \(\nu_{i}\), as well as the distribution of each \(U^{(i)}_{n}\), is supported on \((-\infty,0]\), then we say that X is a spectrally negative MAP. These definitions and more on the basic characterization of MAPs can be found in [2, Chap. XI].
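For intuition, the following minimal simulation sketch (all parameters assumed for illustration) realizes such a spectrally negative MAP with Brownian behaviour in each state and an extra downward jump \(U^{(i)}\) drawn at each transition out of state i (one possible convention for the transition jumps):

```python
import numpy as np

rng = np.random.default_rng(1)

# illustrative two-state spectrally negative MAP (assumed parameters):
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])   # intensity matrix of J
a = np.array([0.4, -0.2])                   # Brownian drift in each state
sig = np.array([1.0, 0.5])                  # Brownian volatility in each state

def switch_jump(i):
    # U_n^{(i)} <= 0: here an exponential jump downwards (assumption)
    return -rng.exponential(0.5)

def simulate(t_end, i0=0):
    t, i, x, path = 0.0, i0, 0.0, [(0.0, 0.0, i0)]
    while t < t_end:
        hold = rng.exponential(1.0 / -Q[i, i])          # sojourn in state i
        dt = min(hold, t_end - t)
        x += a[i] * dt + sig[i] * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if dt == hold:                                   # J switches state
            x += switch_jump(i)                          # extra jump U^{(i)}
            p = np.maximum(Q[i], 0.0); p[i] = 0.0
            i = rng.choice(len(p), p=p / p.sum())
        path.append((t, x, i))
    return path

print(simulate(5.0)[-1])   # (time, X(t), J(t)) at t = 5
```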

2.2 Time Reversal

Predominant in the forthcoming analysis will be the use of the bivariate process \((\widehat{J},\widehat{X})\), representing the process (J,X) time-reversed from a fixed moment t in the future, when J(0) has the stationary distribution π. For definiteness, we mean

\(\widehat{X}(s)=X(t)-X((t-s)-),\qquad \widehat{J}(s)=J((t-s)-),\qquad 0\leq s\leq t,\)

under \(\mathbb{P}_{\pi}\). Note that \(\widehat{X}\) is also a Markov additive process. The characteristics of \((\widehat{J},\widehat{X})\) will be indicated by a hat over the existing notation for the characteristics of (J,X). Instead of talking about the process \((\widehat{J},\widehat{X})\), we shall also talk about the process (J,X) under the probabilities \(\{\widehat{\mathbb{P}}_{i}:i\in \mathcal{I}\}\). Note also, for future use, that following classical time-reversed path analysis, for y≥0 and s≤t,

(2)

where \(I(t)=\inf_{0\leq s\leq t} X(s)\), \(S(t)=\sup_{0\leq s\leq t} X(s)\), and \(\overline{G}(t)=\sup\{s<t: X(s)=S(s)\}\), \(\underline{G}(t)=\sup\{s<t: X(s)=I(s)\}\). (A diagram may help to explain the last identity.)

From now on, we assume that at least one of the processes \(X_{i}\) is neither a downward subordinator nor a compound Poisson process. To include compound Poisson processes \(X^{(i)}(t)\) in the main Theorem 1(i), it is necessary to work with the new definition \(\underline{G}(t)=\inf\{s<t: X(s)=I(s)\}\) instead of the previous one. Under the above assumption, we also have \(\overline{G}(t)=\sup\{s\leq t: X(s)=S(s)\}\) and \(\underline{G}(t)=\sup\{s\leq t: X(s)=I(s)\}\).

2.3 Ladder Height Process

We start by recalling the representation of the local time given in [19, formula (3.21)]. For a MAP, we say that a state \(i\in\{1,\ldots,N\}\) is regular if \(\mathbb{P}_{i}(R=0)=1\) for \(R=\inf\{t> 0: t\in \overline{\mathcal{M}}\}\), where \(\overline{\mathcal{M}}\) is the closure of \(\mathcal{M}=\{t\ge 0: X(t)=S(t)\}\). Let \(R(t)=\inf\{s>t: X(s)=S(s)\}-t\), and denote by \(\{U_{n}\}\) the stopping times at which \(R(t-)=0\), \(R(t)>0\), and J(t) is in an irregular state. Denote

By Kaspi [19, Theorem 3.28] (see also [22]), for a MAP we can define the ladder height process:

choosing the local time:

(3)

where \(L^{c}(t)\) is a continuous additive process that increases only on \(\mathcal{M}\) and \(\mathbf{e}_{1}^{(n)}\) are independent exponential random variables with intensity 1,

Obviously, to make the functional (3) measurable, we enlarge the probability space to include these exponential random variables. One can easily verify that \((L^{-1}(t),H(t),J(L^{-1}(t)))\) is again a (bivariate) MAP (see [19, p. 185]). For each moment of time, we can define the excursion:

where \(\partial\) is a cemetery state. Let \(\zeta(\epsilon_{t})=L^{-1}(t)-L^{-1}(t-)\) be the length of the excursion if \(\epsilon_{t}\neq\partial\). From (3) it follows that the excursion process \(\{(t, \epsilon_{t}),\;t\geq 0\}\) is a marked Cox point process (possibly stopped at the first excursion of infinite length) with intensity \(n(J(L^{-1}(t-)),\cdot)\) depending on the state process \(J(L^{-1}(t-))\). Denote by \(\mathcal{E}\) the σ-field on the excursion state space.

2.4 Spectrally Negative Markov Additive Process

Letting \({\mathbf{Q}}\circ{\widetilde{ \mathbf{G}}}(\alpha)=(q_{ij}\widetilde{G}_{i}(\alpha) )\), where \(\widetilde{G}_{i}(\alpha)= \mathbb{E} (\exp( \alpha U^{(i)}_{1}) )\), for a spectrally negative MAP we can define the cumulant generating matrix (cgm) of the MAP X(t):

\(\mathbf{F}(\alpha)=\operatorname{diag}\bigl(\psi_{1}(\alpha),\ldots,\psi_{N}(\alpha)\bigr)+\mathbf{Q}\circ\widetilde{\mathbf{G}}(\alpha),\)  (4)

where \(\psi_{j}(\alpha)=-\Psi_{j}(-i\alpha)\) for \(\Psi_{j}\) defined in (1). Perron–Frobenius theory identifies \(\mathbf{F}(\alpha)\) as having a real-valued eigenvalue with maximal real part, which we shall label κ(α). The corresponding left and right 1×N eigenvectors we label \(\mathbf{v}(\alpha)\) and \(\mathbf{h}(\alpha)\), respectively. In this text, we shall always write vectors in their horizontal form and use the usual superscript T to mean transpose. Since \(\mathbf{v}(\alpha)\) and \(\mathbf{h}(\alpha)\) are given up to multiplicative constants, we are free to normalize them such that

Note also that h(0)=e, the 1×N vector consisting of a row of ones. We shall write \(h_{i}(\alpha)\) for the ith element of h(α). The eigenvalue κ(α) is a convex function (this can also be easily verified) such that κ(0)=0 and κ′(0) is the asymptotic drift of X in the sense that, for each \(i\in \mathcal{I}\), we have \(\lim_{t\uparrow\infty}\mathbb{E} (X(t)|J(0)=i, X(0)=x)/t=\kappa^{\prime } ( 0 ) \). For the right inverse of κ, we shall write Φ.
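As a quick numerical illustration (a sketch under assumed parameters, in the Markov-modulated Brownian case where \(\psi_{j}(\alpha)=a_{j}\alpha+\sigma_{j}^{2}\alpha^{2}/2\) and there are no transition jumps), one can compute F(α), read off κ(α) as the eigenvalue of maximal real part, and check that κ(0)=0 and that κ′(0) equals the stationary drift \(\sum_{i}\pi_{i}a_{i}\):

```python
import numpy as np

Q = np.array([[-1.0, 1.0], [2.0, -2.0]])    # assumed intensity matrix
a = np.array([0.4, -0.2]); sig2 = np.array([1.0, 0.25])

def F(alpha):
    # cgm of a Markov-modulated Brownian motion: psi_j(alpha) on the diagonal
    return np.diag(a * alpha + 0.5 * sig2 * alpha**2) + Q

def kappa(alpha):
    ev = np.linalg.eigvals(F(alpha))
    return ev[np.argmax(ev.real)].real      # Perron eigenvalue

pi = np.array([2.0, 1.0]) / 3.0             # stationary law: pi Q = 0
h = 1e-6
print(abs(kappa(0.0)))                           # ~0: kappa(0) = 0
print((kappa(h) - kappa(-h)) / (2*h), pi @ a)    # kappa'(0) vs stationary drift
```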

It can be checked that under the following Girsanov change of measure,

\(\left.\frac{d\mathbb{P}_{i}^{\gamma}}{d\mathbb{P}_{i}}\right|_{\mathcal{F}_{t}}=e^{\gamma (X(t)-X(0))-\kappa (\gamma )t}\,\frac{h_{J(t)}(\gamma )}{h_{J(0)}(\gamma )},\)  (5)

the process \((X,\mathbb{P}_{i}^{\gamma })\) is again a spectrally negative MAP whose cumulant generating matrix \(\mathbf{F}_{\gamma}(\alpha)\) is well defined and finite for α≥−γ. In general, for all quantities calculated under \(\mathbb{P}^{\gamma}\), we will add a subscript γ. Further, if \(\mathbf{F}_{\gamma}(\alpha)\) has largest eigenvalue \(\kappa_{\gamma}(\alpha)\) and associated right eigenvector \(\mathbf{h}_{\gamma}(\alpha)\), then the triple \((\mathbf{F}_{\gamma}(\alpha),\kappa_{\gamma}(\alpha),\mathbf{h}_{\gamma}(\alpha))\) is related to the original triple \((\mathbf{F}(\alpha),\kappa(\alpha),\mathbf{h}(\alpha))\) via

\(\mathbf{F}_{\gamma}(\alpha)=\Delta_{h}(\gamma)^{-1}\,\mathbf{F}(\alpha+\gamma)\,\Delta_{h}(\gamma)-\kappa(\gamma)\mathbf{I},\qquad \kappa_{\gamma}(\alpha)=\kappa(\alpha+\gamma)-\kappa(\gamma),\qquad \mathbf{h}_{\gamma}(\alpha)=\mathbf{h}(\alpha+\gamma)\,\Delta_{h}(\gamma)^{-1},\)  (6)

where \(\mathbf{I}\) is the N×N identity matrix and \(\Delta_{h}(\gamma)=\operatorname{diag}\bigl(h_{1}(\gamma),\ldots,h_{N}(\gamma)\bigr).\)
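A small numerical check of (6), under the assumed Brownian example used earlier: the leading eigenvalue of \(\Delta_{h}(\gamma)^{-1}\mathbf{F}(\alpha+\gamma)\Delta_{h}(\gamma)-\kappa(\gamma)\mathbf{I}\) equals \(\kappa(\alpha+\gamma)-\kappa(\gamma)\), and at α=0 this matrix is a proper intensity matrix (the generator of J under \(\mathbb{P}^{\gamma}\)):

```python
import numpy as np

Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
a = np.array([0.4, -0.2]); sig2 = np.array([1.0, 0.25])
F = lambda al: np.diag(a * al + 0.5 * sig2 * al**2) + Q

def top(M):                       # Perron eigenvalue and (positive) eigenvector
    ev, V = np.linalg.eig(M)
    k = np.argmax(ev.real)
    return ev[k].real, np.abs(V[:, k].real)

gamma, alpha = 0.7, 0.3
kg, hg = top(F(gamma))            # kappa(gamma) and h(gamma)
Dh = np.diag(hg)
Fg = np.linalg.inv(Dh) @ F(alpha + gamma) @ Dh - kg * np.eye(2)

print(top(Fg)[0], top(F(alpha + gamma))[0] - kg)  # kappa_gamma(alpha) two ways
Fg0 = np.linalg.inv(Dh) @ F(gamma) @ Dh - kg * np.eye(2)
print(Fg0.sum(axis=1))            # ~0: F_gamma(0) is an intensity matrix
```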

Similarly, the time-reversed process \(\widehat{X}(t)\) is a spectrally negative MAP with characteristics \(\widehat{\mathbf{F}}\), \(\widehat{\mathbf{h}}\), \(\widehat{\kappa }\). To relate them to the original ones, recall that the intensity matrix of \(\widehat{J}\) must satisfy

\(\widehat{\mathbf{Q}}=\Delta_{\pi}^{-1}\,\mathbf{Q}^{T}\,\Delta_{\pi},\)
where \(\Delta_{\pi}\) is the diagonal matrix whose entries are given by the vector π. Hence, according to (4), we find that

\(\widehat{\mathbf{F}}(\alpha)=\Delta_{\pi}^{-1}\,\mathbf{F}(\alpha)^{T}\,\Delta_{\pi}.\)  (7)

Moreover, \(\widehat{\kappa } ( \alpha ) = \kappa ( \alpha ) \) and \(\widehat{\mathbf{h}}(\alpha)=\mathbf{v}(\alpha)\Delta_{\pi}^{-1}\) (see [21] for details).

Here and throughout, we work with the definition that \(\mathbf{e}_{q}\) is a random variable which is exponentially distributed with mean 1/q and independent of (J,X).

As much as possible, from now on we shall prefer to work with matrix notation. For a random variable Y and a (random) time τ, we shall understand \(\mathbf{E}(Y;J(\tau))\) as the matrix with (i,j)th element \(\mathbb{E}_{i}(Y; J(\tau)=j)\). For an event A, \(\mathbf{P}(A;J(\tau))\) will be understood in a similar sense. Let \(I_{ij}(q)=\mathbb{P}_{i,0}(J(\mathbf{e}_{q})=j)\); in other words,
\(\mathbf{I}(q)=q\,(q\mathbf{I}-\mathbf{Q})^{-1}.\)
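A small numerical sketch (toy generator assumed) computes \(\mathbf{I}(q)\) and verifies it against a Monte Carlo estimate of \(\mathbb{P}_{i}(J(\mathbf{e}_{q})=j)\):

```python
import numpy as np

Q = np.array([[-1.0, 1.0], [2.0, -2.0]])   # assumed background generator
q = 0.5
Iq = q * np.linalg.inv(q * np.eye(2) - Q)  # I(q) = q (qI - Q)^{-1}
print(Iq, Iq.sum(axis=1))                  # rows sum to 1: a stochastic matrix

# Monte Carlo check of P_1(J(e_q) = j) via the embedded chain
rng = np.random.default_rng(0)
hits = np.zeros(2); n = 20000
for _ in range(n):
    t, i, eq = 0.0, 0, rng.exponential(1.0 / q)
    while True:
        t += rng.exponential(1.0 / -Q[i, i])   # sojourn in current state
        if t > eq:
            break
        i = 1 - i                              # two states: always switch
    hits[i] += 1
print(hits / n)                            # ~ first row of I(q)
```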
The spectrally negative MAP is easier to analyze since its ladder height process \((L^{-1}(t),H(t),J(L^{-1}(t)))\) has an explicit cumulant generating matrix \(\Xi(q,\alpha)\). Let
\(\tau^{+}_{a}=\inf\{t\geq 0: X(t)>a\},\)
where a≥0. Denote by \(\Lambda(q)\) the generator of the Markov process \(\{J(\tau^{+}_{a}),a\geq 0\}\) under \(\mathbb{P}^{\Phi(q)}\). Pistorius [24] and Ivanovs et al. [18] show that it solves the following equation:

(8)

where the above equation is understood as a matrix equation obtained by putting −Λ(q) in place of α in (6) with γ=Φ(q), with the obvious meaning of the matrix exponential. In other words, we should understand the latter as

(9)

where the drift matrix is diagonal with entries \(a_{i}+\Phi(q)\sigma_{i}^{2} -\int_{-\infty}^{0} y1_{[-1,0]}(y)(1-e^{\Phi(q) y})\nu_{i}(dy)\) (i=1,…,N) along the diagonal; the diffusion matrix is diagonal with elements \(\sigma^{2}_{i} /2\) (i=1,…,N); the jump matrix is diagonal with entries \(e^{\Phi(q)\cdot}\nu_{i}(\cdot)\) (i=1,…,N); and the matrix \(\mathbf{Q}\circ\mathbf{G}_{\Phi(q)}\) has entries \(q_{ij}e^{\Phi(q)\cdot}G_{i}(\cdot)\) (i,j=1,…,N). For details, check [21] and [23, Prop. 5.6]. D'Auria et al. [6] give a numerical algorithm for calculating Λ(q) based on the theory of Jordan chains (a small numerical sketch is given at the end of this subsection). Note that the ladder height process can be identified as \(\{(\tau_{a}^{+}, X(\tau^{+}_{a})=a,J(\tau_{a}^{+})),\ a\ge 0\}\). It is a bivariate Markov additive process with the cumulant generating matrix:

(10)

for α,q>0. The above can be deduced from the equalities

(11)

and Theorem 1 of Kyprianou and Palmowski [21] stating that

(12)
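To illustrate the Jordan-chain (spectral) route to Λ(q) in the simplest setting, the sketch below, a Markov-modulated Brownian motion with assumed parameters and distinct roots only, computes the matrix exponent \(G(q)\) determined by \(\mathbf{E}(e^{-q\tau_{a}^{+}};J(\tau_{a}^{+}))=e^{G(q)a}\) from the roots of \(\det(\mathbf{F}(z)-q\mathbf{I})=0\) in the right half-plane; in view of (12), \(G(q)\) carries the same information as \(\Lambda(q)\). This is a minimal sketch, not the general algorithm of [1, 6].

```python
import numpy as np

# Markov-modulated Brownian motion, a toy spectrally negative MAP
# (all parameters are illustrative assumptions):
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])   # background intensity matrix
a = np.array([0.5, -0.3])                   # state-wise drifts
sig2 = np.array([1.0, 0.8])                 # state-wise variances
q = 0.7                                     # killing (discount) rate

# det(F(z) - qI) with F(z) = diag(a_i z + sig2_i z^2/2) + Q is a polynomial
# in z: multiply the two diagonal entries, subtract the off-diagonal product.
P = np.polynomial.polynomial
p1 = [Q[0, 0] - q, a[0], sig2[0] / 2]       # F_11(z) - q, ascending coefficients
p2 = [Q[1, 1] - q, a[1], sig2[1] / 2]       # F_22(z) - q
detp = P.polymul(p1, p2)
detp[0] -= Q[0, 1] * Q[1, 0]
roots = P.polyroots(detp)
right = [z for z in roots if z.real > 1e-9] # the N roots in the right half-plane

# For each root z_k, pick h_k with (F(z_k) - qI) h_k = 0; then
# E[e^{-q tau_a^+}; J(tau_a^+)] = e^{G a} with G h_k = -z_k h_k
# (distinct-roots case of the generalized Jordan chain method).
H, Z = [], []
for z in right:
    Fz = np.diag(a * z + sig2 * z**2 / 2) + Q
    w, v = np.linalg.eig(Fz - q * np.eye(2))
    H.append(v[:, np.argmin(np.abs(w))])    # eigenvector of the ~zero eigenvalue
    Z.append(-z)
H = np.column_stack(H)
G = (H @ np.diag(Z) @ np.linalg.inv(H)).real
print(G)   # a sub-intensity matrix: nonnegative off-diagonal, row sums <= 0
```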

3 Main Results

The main result of this paper is given in the next theorem.

Theorem 1

(i) For a general MAP, the random vectors \((S(\mathbf{e}_{q}),\overline{G}(\mathbf{e}_{q}))\) and \((S(\mathbf{e}_{q})-X(\mathbf{e}_{q}),\mathbf{e}_{q}-\overline{G}(\mathbf{e}_{q}))\) are independent conditionally on \(J(\overline{G}(\mathbf{e}_{q}))\), that is, for \(\alpha\in\mathbb{R}\) and ξ≥0,

(13)

and

(14)

(ii) For a spectrally negative MAP and α,ξ≥0,

(15)
(16)
(17)
(18)

Remark 1

Applying Theorem 1(i) to the reversed process yields a similar conclusion for the infimum functional. Namely, the processes \(\{(X(t),J(t)),\ 0\le t< \underline{G}(\mathbf{e}_{q})\}\) and \(\{(X(\underline{G}(\mathbf{e}_{q})+t)-X(\underline{G}(\mathbf{e}_{q})), J(\underline{G}(\mathbf{e}_{q})+t)),\ t\ge 0\}\) are independent conditionally on \(J(\underline{G}(\mathbf{e}_{q}))\).

Remark 2

For N=1 (hence Λ(q)=0, I(q)=1), the above theorem gives the well-known identities for a spectrally negative Lévy process:

\(\mathbb{E}\bigl[e^{-\xi\overline{G}(\mathbf{e}_{q})-\alpha S(\mathbf{e}_{q})}\bigr]=\frac{\Phi(q)}{\Phi(q+\xi)+\alpha},\qquad \mathbb{E}\bigl[e^{-\xi\underline{G}(\mathbf{e}_{q})+\alpha I(\mathbf{e}_{q})}\bigr]=\frac{q}{\Phi(q)}\cdot\frac{\Phi(q+\xi)-\alpha}{q+\xi-\psi(\alpha)},\)

where ψ(θ)=−Ψ(−iθ) is the Laplace exponent of X. Finally, for ξ=0, the above theorem gives an already known identity for a spectrally negative MAP (see [21]):

(19)

which was derived using the Asmussen–Kella martingale.
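Returning to the case N=1: since for a Lévy process \((\mathbf{e}_{q}-\overline{G}(\mathbf{e}_{q}), X(\mathbf{e}_{q})-S(\mathbf{e}_{q}))\) has the same law as \((\underline{G}(\mathbf{e}_{q}), I(\mathbf{e}_{q}))\), the two scalar factors above (with α replaced by −α in the first) multiply, by simple algebra, to the full space–time transform, as the Wiener–Hopf factorization requires:

\(\mathbb{E}\bigl[e^{-\xi\mathbf{e}_{q}+\alpha X(\mathbf{e}_{q})}\bigr]=\frac{q}{q+\xi-\psi(\alpha)}=\frac{\Phi(q)}{\Phi(q+\xi)-\alpha}\cdot\frac{q}{\Phi(q)}\cdot\frac{\Phi(q+\xi)-\alpha}{q+\xi-\psi(\alpha)}.\)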

Remark 3

It is hard to give an explicit expression for the quantity appearing in Theorem 1(ii), which depends on the matrix Ξ defined in (10) and hence requires solving the matrix equation (8). There are only a few known examples; see, e.g., the examples given in [10]. There are, however, two numerical methods in the literature: the first uses an iteration scheme, and the second uses the theory of generalized Jordan chains; see, e.g., [1, 6].

We also prove the following counterpart of the Spitzer–Rogozin version of the Wiener–Hopf factorization and of the Fristedt theorem:

Theorem 2

Assume that the matrix \(\mathbb{E}\exp\{i\alpha X(1)\}\) has distinct eigenvalues and that for any t,s≥0:

(20)

Then

and

By the Markov property, assumption (20) heuristically means that, for the MAP (X,J) over the time interval t+s, going above 0 at time t and then below the present value is statistically equivalent to going first below 0 at time s and then above the present value. This assumption is satisfied, for example, for the two-state Markov process J(t) with X(t) evolving as the constant 0 when J(t)=1 and as a Brownian motion B(t) when J(t)=2. It is not satisfied, for example, for the two-state Markov process J(t) with \(q_{12}=q_{21}=\lambda\) and X(t) evolving as t when J(t)=1 and as −t+B(t) when J(t)=2. Indeed, take α=0 and note that for s↓0 we have \(\mathbb{P}_{1}(X(s)<0, J(s)=1)=\int_{0}^{s} \mathbb{P}_{1}(T_{1}\in dw)\mathbb{P}_{2}(X(s-w)<w, J(s-w)=1)\leq \int_{0}^{s} \mathbb{P}_{1}(T_{1}\in dw)\mathbb{P}_{2}(X(s-w)<w, J(s-w)=2)+ \mathrm{o}(s)=\mathbb{P}_{1}(X(s)<0, J(s)=2)+\mathrm{o}(s)\). Hence for s,t↓0, we have

The following generalizations of Kendall’s identity and the ballot theorem also hold.

Theorem 3

Consider a spectrally negative Markov additive process X(t) with absolutely continuous transition probabilities. Under the assumptions of Theorem 2, we have

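For orientation, when N=1 the statement reduces to the classical Kendall identity for a spectrally negative Lévy process (recalled here for comparison):

\(t\,\mathbb{P}(\tau_{x}^{+}\in dt)\,dx = x\,\mathbb{P}(X(t)\in dx)\,dt,\qquad x,t>0.\)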
Theorem 4

Let

where \(\{\sigma(t),\ t\geq 0\}\) is a Markov additive subordinator without a drift component and c>0. Under the assumptions of Theorem 2, the following identity holds:

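For N=1 and a process of the form X(t)=ct−σ(t) (the natural scalar specialization), this is Takács' classical ballot theorem (recalled for comparison):

\(\mathbb{P}\bigl(X(s)>0\ \text{for all}\ s\in(0,t]\,\big|\,X(t)=y\bigr)=\frac{y}{ct},\qquad 0<y\leq ct.\)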
Summarizing, the theorems given here may be seen as a foundation of the fluctuation theory for (spectrally negative) MAPs and may serve to derive counterparts of the well-known identities for Lévy processes.

4 Proofs

4.1 Proof of Theorem 1

(i) Sampling the MAP (X(t),J(t)) up to an exponential random time \(\mathbf{e}_{q}\) corresponds to sampling the marked Cox point process (double Poisson point process) of the excursions up to time \(L(\mathbf{e}_{q})\). Moreover, since, conditionally on a realization of the process J(t), the point process \((t,\epsilon_{t})\) is a non-homogeneous marked Poisson process, we know that, conditionally on \(J(L^{-1}(\sigma^{A}-))\) for
\(\sigma^{A}=\inf\{t>0:\ \epsilon_{t}\in A\},\)
the point process \(\{(t,\epsilon_{t}),\ t<\sigma^{A}\}\) is independent of \(\epsilon_{\sigma^{A}}\). Indeed, for Borel sets \(B_{1}\) and \(B_{2}\) and \(k_{0}=\max\{k: T_{k}\leq\sigma^{A}\}\), we have

Hence

Consider now

and

Note that \(\sigma_{2}\) is \(\sigma^{A}\) for \(A=\{\zeta(\epsilon)>\mathbf{e}_{q}\}\) and possibly \(\sigma_{1}=\infty\) (e.g., when the set of maxima has Lebesgue measure 0). If \(\sigma_{2}<\sigma_{1}\), then, conditionally on \(J(L^{-1}((\sigma_{1}\wedge\sigma_{2})-))=J(L^{-1}(\sigma_{2}-))\), the process

(21)

is independent of \(\epsilon_{\sigma_{2}}=\epsilon_{\sigma_{1}\wedge\sigma_{2}}\). If \(\sigma_{1}<\sigma_{2}\), then \(\epsilon_{\sigma_{1}}=\epsilon_{\sigma_{1}\wedge\sigma_{2}}=\partial\) and is also independent of the process (21). Hence, conditionally on \(J(L^{-1}((\sigma_{1}\wedge\sigma_{2})-))\), the excursion \(\epsilon_{\sigma_{1}\wedge\sigma_{2}}\) is independent of the process (21). Note also that

(22)

and the last excursion \(\epsilon_{\sigma_{1}\wedge\sigma_{2}}\) occupies the final \(\mathbf{e}_{q}-\overline{G}(\mathbf{e}_{q})\) units of time in the interval \([0,\mathbf{e}_{q}]\) and reaches the depth \(X(\mathbf{e}_{q})-S(\mathbf{e}_{q})\). The proof of the identities (22) follows completely the arguments given in [17, Sect. 4]. These identities complete the proof of the first part of Theorem 1(i). Note that \((\mathbf{e}_{q}-\overline{G}(\mathbf{e}_{q}), X(\mathbf{e}_{q})-S(\mathbf{e}_{q}))\) has the same law as \((\widehat{\underline{G}}(\mathbf{e}_{q}), \widehat{I}(\mathbf{e}_{q}))\). The second part of Theorem 1(i) now follows from the first part applied to the reversed process.

(ii) To prove Theorem 1(ii), we follow [8]. Fix n∈ℕ and set

(23)

where [⋅] stands for the integer part. Applying the strong Markov property at time \(\tau_{k/n}^{+}\) and using (12) yields

(24)

Letting n→∞, we have \(i_{n}^{+}\to S(\mathbf{e}_{q})\). Moreover, \(S(\mathbf{e}_{q})\geq X_{\tau_{i_{n}^{+}}^{+}}\geq i_{n}^{+}\) and \(X_{\overline{G}(\mathbf{e}_{q})}=S(\mathbf{e}_{q})\). Hence \(\tau_{i_{n}^{+}}^{+}\to \overline{G}(\mathbf{e}_{q})\), where we use the fact that, since compound Poisson processes \(X_{j}\) are excluded, the supremum \(S(\mathbf{e}_{q})\) is uniquely attained. Hence the left-hand side of the above equation converges by the dominated convergence theorem, and thus the right-hand side also converges. Note that for any matrix A,

(25)

Thus

for some matrix B, using the fact that the matrix \((\Phi(q+\xi)+\alpha)\mathbf{I}-\Lambda(q+\xi)\) is invertible for q>0. Taking ξ=α=0, we obtain

which completes the proof of (16). Similarly, from (24),

and therefore,

(26)

for

(27)

Now (15) will follow straightforwardly from (27) and (16) since, by (12),

and hence

(28)

Identity (17) follows from (15) and Theorem 1(i). Indeed, note that the LHS of (13) applied to the dual equals . Putting (15) into (13) (where both identities are taken for the dual) produces (17). Finally, from the proof of Theorem 1(i), it follows that

which completes the proof of (18) in view of (15).

4.2 Proof of Theorem 2

For a general matrix A with distinct eigenvalues \(\lambda_{i}\) (hence with independent eigenvectors \(s_{i}\)) such that \(\mathrm{Re}\,\lambda_{i}>0\), using the Frullani integral and the representation \(A=S\,\mathrm{diag}\{\lambda_{i}\}\,S^{-1}\) with \(S=(s_{1},\ldots,s_{N})\), for q>0 we can derive the following identities:

(29)

and

(30)
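For example, the scalar Frullani integral \(\int_{0}^{\infty}(e^{-qt}-e^{-\lambda t})\,t^{-1}\,dt=\log(\lambda/q)\), valid for \(\mathrm{Re}\,\lambda>0\), applied to each eigenvalue through the representation \(A=S\,\mathrm{diag}\{\lambda_{i}\}\,S^{-1}\), yields the prototype identity

\(qA^{-1}=\exp\Bigl(-\int_{0}^{\infty}\bigl(e^{-qt}\mathbf{I}-e^{-At}\bigr)\,\frac{dt}{t}\Bigr);\)

identities (29) and (30) are of this type.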

Lemma 1

Under assumption (20), for ξ strictly larger than the largest real part of an eigenvalue of F(α),

Proof

By the additivity of the process X(t), there exists a matrix F such that \(\mathbb{E}\exp\{i\alpha X(t)\}=\exp\{\mathbf{F}(i\alpha)t\}\) (see [2, Prop. XI.2.2, p. 311]). Note that this matrix also has distinct eigenvalues. Then we have

Note that, by identity (20), the matrices

and

commute. This gives the assertion of the lemma by the factorization. □

From Lemma 1 and Theorem 1, using classical extension arguments for ξ≥0, we have

(31)

where

(32)

and

(33)

From Theorem 1(i), it follows that the matrices H(α,ξ) and T(α,ξ) are invertible. Thus,

(34)

Moreover, each entry of the matrix H(α,ξ) is analytic in the upper half of the complex plane, and the same applies to the matrix \(\mathbf{H}^{-1}(\alpha,\xi)\). Thus each entry of the LHS of (34) extends analytically to the lower half of the complex α-plane, and similarly each entry of the matrix on the RHS of (34) extends analytically to the upper half of the complex α-plane. Hence, by Morera's theorem, the matrices on both sides of (34) can be defined on the whole α-plane. Observe that each entry of these matrices is a continuous and bounded function. Indeed, from definitions (32) and (33), by Jensen's inequality it follows that each entry of the matrices H(α,ξ) and T(α,ξ) is bounded in the respective regions. Note that the reciprocal of the determinant of H(α,ξ) is also bounded. Indeed, by (10), (16), and (32), we have

(35)

as |α|→∞ in the upper half of the complex plane, where f(α)∼g(α) means that f(α)/g(α)→1. Now the fact that each entry of \(\mathbf{H}^{-1}(\alpha,\xi)\) is bounded follows from the Phragmén–Lindelöf theorem (see [11, Cor. 4.4]) and the asymptotics

which is a consequence of (35). Similarly, one can prove that each entry of the second factor on the RHS and on the LHS of (34) is bounded. Thus, by Liouville's theorem, each entry of (34) must be constant. Putting α=ξ=0 gives the assertion of the theorem.

4.3 Proof of Theorem 3

From (15) and (10),

where \(A=\Delta_{h}(\Phi(q))(\Phi(q)\mathbf{I}-\Lambda(q))\Delta_{h}(\Phi(q))^{-1}\). Using (30) and (12), this gives

In view of Theorem 2, this completes the proof.

4.4 Proof of Theorem 4

If there is an atom at x=ct in \(\mathbf{P}(X(t)\in dx;J(t))\), then the assertion of the theorem remains true. Assume now that \(\mathbf{P}(X(t)\in dx;J(t))\) is absolutely continuous. By Kendall's identity given in Theorem 3, it then suffices to prove that

or that for all q>0 and sufficiently large s>0

(36)

which is equivalent to

(37)

We prove (37) by passing from its left-hand side to its right-hand side. Let \(\overline{q}=q-\kappa(s)\). The change of measure (5) and the Wiener–Hopf factorization given in Theorem 1(ii) yield

where \(\pi^{s}\) is the stationary distribution of J under \(\mathbb{P}^{s}\). Note that

and from (10)

Hence

Using classical arguments for the reversed process, note that under \(\mathbb{P}^{s}_{\pi^{s}}\) we have

(38)

We can now proceed as follows:

where in the first equality we use (11) and in the third we apply (38). The last equality gives the right-hand side of (37), which completes the proof.