The Skorokhod Topologies

This paper presents a gentle and informal introduction to the Skorokhod topologies. The focus is on motivating examples and concepts.


Introduction
As soon as we stray from the theory of continuous stochastic processes, we need a suitable space of discontinuous functions and a topology on it. Skorokhod proposed in [Sko56] the topology that is predominantly used today and that has since inherited his name. When I started to work with discontinuous stochastic processes and needed to understand the Skorokhod space, I struggled to find textbooks or lecture notes providing an easy start into the topic. The general tenor is that "constructing [the] Skorokhod topology and deriving tightness criteria are rather tedious" (see [JS03, Chapter VI]). That gave me the impression that the Skorokhod topology is a very technical tool without any real motivation.
After working with it for some years, I believe that there are simple and intuitive ideas underlying this construction which might facilitate its understanding. Unfortunately, these are not the main focus of most textbooks, as the proofs are already long enough as they are. For the same reason, very few textbooks explore all four Skorokhod topologies; most focus only on the main one, also known as the $J_1$-topology. Here, I want to take the time to expose the ideas underlying the four Skorokhod topologies.
This paper is not meant to give a complete overview of the Skorokhod topologies, nor will I include any proofs. Instead, I will concentrate on pictures, examples and heuristics in the hope of building intuition. Nevertheless, I try to give a reference for every major fact I mention. Note that I will not discuss non-Skorokhod topologies such as the S-topology defined in [Jak97].
Most of what I present is taken from the two books [Bil99; Whi02], which I highly recommend for delving into the subject: the former focuses on the prevalent topology on real-valued processes, the latter takes a more general approach. Some general results are taken from [EK86]. For examples of how to apply these general results to measure-valued processes, I recommend the first chapter of [Eth00].
The rest of the paper has a very simple structure: first, I try to motivate why we need a new topology and what we should expect it to look like. Next, I derive the two main topologies, $J_1$ and $M_1$, for real-valued processes on a finite time interval. Finally, I conclude by presenting generalisations of these topologies.
Acknowledgement. I would like to thank Andrey Pilipenko for bringing the collection [DKP+16] of selected works of Skorokhod to my attention. This research has been partially funded by Deutsche Forschungsgemeinschaft (DFG) through grant CRC 1114 Scaling Cascades in Complex Systems, Project Number 235221301, Project C02 Interface dynamics: Bridging stochastic and hydrodynamic descriptions.

Motivation
In this first section, I illustrate why we need to define a new topology and what properties we would want it to have.

Convergence of Continuous Processes
Let us start with the most famous example of convergence of stochastic processes: Donsker's Theorem, a.k.a. the functional central limit theorem. Consider a simple random walk $(S_n)_{n\ge0}$ on $\mathbb{Z}$ starting at $S_0 = 0$, and define the rescaled and interpolated continuous process
$$Y^N_t := \frac{1}{\sqrt{N}}\Big(S_{\lfloor Nt\rfloor} + (Nt - \lfloor Nt\rfloor)\big(S_{\lfloor Nt\rfloor+1} - S_{\lfloor Nt\rfloor}\big)\Big), \qquad t \in [0,1].$$
We consider $(Y^N_\cdot)_{N\ge1}$ as a random sequence in $C([0,1])$ endowed with the Borel σ-algebra of the topology of uniform convergence. Donsker's Theorem states that the sequence $(Y^N_\cdot)_{N\ge1}$ converges in distribution to a standard Brownian motion $B$ on the time interval $[0,1]$. To prove this, one uses that $C([0,1])$ endowed with the topology of uniform convergence is a Polish space (i.e. separable and completely metrisable), so that we may apply Prokhorov's Theorem.
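As an illustration only, here is a minimal Python sketch of the process $Y^N$, assuming the walk is given by its $\pm1$ increments; the helper name `donsker_path` is hypothetical. It uses NumPy's linear interpolation between the points $(k/N, S_k)$ followed by the diffusive rescaling, and checks by simulation that $Y^N_{1/2}$ has roughly the variance $1/2$ of $B_{1/2}$.

```python
import numpy as np

def donsker_path(steps, t):
    """Evaluate Y^N_t for the walk with increments `steps` (length N) at times t in [0, 1]."""
    steps = np.asarray(steps, dtype=float)
    n = len(steps)
    s = np.concatenate(([0.0], np.cumsum(steps)))  # S_0, S_1, ..., S_N
    grid = np.arange(n + 1) / n                     # the times k/N
    # Linear interpolation between the points (k/N, S_k), then diffusive rescaling by sqrt(N).
    return np.interp(t, grid, s) / np.sqrt(n)

rng = np.random.default_rng(seed=0)
walks = rng.choice([-1, 1], size=(2000, 400))  # 2000 independent walks with N = 400 steps
half = np.array([donsker_path(w, 0.5) for w in walks])
# Y^N_{1/2} should be approximately Gaussian with mean 0 and variance 1/2, like B_{1/2}.
print(half.mean(), half.var())
```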
Theorem 2.1 (Prokhorov; see e.g. [Bil99, Section 5] or [Bog07, Theorems 8.6.2 and 8.9.4] for more complete statements). Let $E$ be a Polish space with its Borel σ-algebra. Let $\mathcal{P}(E)$ denote the set of probability measures on this measurable space, endowed with the topology of weak convergence of measures. Then the following holds true:
1. the space $\mathcal{P}(E)$ is again Polish;
2. a set of probability measures $K \subseteq \mathcal{P}(E)$ is relatively compact if and only if it is tight, i.e. for every $\varepsilon > 0$ there is a compact set $C \subseteq E$ with $\mu(C) \ge 1 - \varepsilon$ for all $\mu \in K$.
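Corollary 2.2. A sequence of processes $(X^N)_{N\ge1} \subset C([0,1])$ converges in law to some $X \in C([0,1])$ if and only if the sequence $(X^N)_{N\ge1}$ is tight and the finite-dimensional distributions of $(X^N)_{N\ge1}$ converge to those of $X$, i.e. for all $k \ge 1$ and all $t_1, \dots, t_k \in [0,1]$,
$$\big(X^N_{t_1}, \dots, X^N_{t_k}\big) \xrightarrow[N\to\infty]{\text{law}} \big(X_{t_1}, \dots, X_{t_k}\big).$$

That means that one only needs to check those two things to prove Donsker's Theorem. More precisely, we really need to worry only about tightness, as the convergence of the finite-dimensional distributions is an application of the usual (multivariate) Central Limit Theorem. Since tightness is a statement about compact sets of the underlying space, we need a good characterisation of compact sets in $C([0,1])$. For this, we use another powerful theorem:

Theorem 2.3 (Arzelà-Ascoli; see e.g. [Bil99, Section 7]). A set $K \subseteq C([0,1])$ is relatively compact w.r.t. the topology of uniform convergence if and only if
i) the set $K$ is bounded at $0$, i.e. $\sup_{f\in K} |f(0)| < \infty$;
ii) the set $K$ is uniformly equicontinuous, i.e. for every $\varepsilon > 0$ there exists some $\delta > 0$ such that
$$|f(s) - f(t)| < \varepsilon \quad \text{for all } f \in K \text{ and all } s, t \in [0,1] \text{ with } |s - t| < \delta.$$

In terms of the modulus of continuity $\omega_\delta(f) := \sup_{|s-t|\le\delta} |f(s) - f(t)|$, the second condition may be rewritten as
$$\lim_{\delta\to0} \sup_{f\in K} \omega_\delta(f) = 0.$$

Corollary 2.4. A sequence of processes $(X^N)_{N\ge1} \subset C([0,1])$ is tight if and only if
i) the sequence is bounded at $0$ in the sense that for all $\varepsilon > 0$, there exists $a > 0$ such that $\sup_{N\ge1} \mathbb{P}(|X^N_0| > a) \le \varepsilon$;
ii) for every $\varepsilon > 0$, one has
$$\lim_{\delta\to0} \limsup_{N\to\infty} \mathbb{P}\big(\omega_\delta(X^N) \ge \varepsilon\big) = 0.$$

With all this machinery at our disposal, it becomes straightforward to prove Donsker's Theorem. One only needs to do two things:
1. use Prokhorov's Theorem to prove relative compactness via tightness;
2. identify all limit points through their finite-dimensional distributions.
Don't get me wrong: each point in itself might be difficult to prove. But at least we have a strategy for approaching the problem. It turns out that this strategy is very general: only the first point depends on the particular topology which we put on the space of functions. For it to work, we first and foremost need Prokhorov's Theorem. That means that we want a Polish topology. But there is a second ingredient on which we relied heavily: to prove tightness, we need a good way to characterise the compact sets of the topology we work in. In the case of the uniform topology on $C([0,1])$, this is taken care of by the Arzelà-Ascoli Theorem.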
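To make the modulus of continuity $\omega_\delta(f)$ concrete, here is a minimal Python sketch for a function sampled on an equispaced grid of $[0,1]$; the helper `modulus_of_continuity` is hypothetical, and since it only compares grid points it yields a lower bound for the true supremum. For a continuous function the value shrinks with $\delta$, whereas a jump keeps it at the jump height, which is exactly why condition ii) fails for discontinuous limits.

```python
import numpy as np

def modulus_of_continuity(values, delta):
    """Grid version of omega_delta(f) = sup_{|s-t| <= delta} |f(s) - f(t)|,
    for values f(k/n), k = 0, ..., n, sampled on an equispaced grid of [0, 1]."""
    values = np.asarray(values, dtype=float)
    n = len(values) - 1
    max_lag = min(n, int(np.floor(delta * n + 1e-12)))  # largest l with l/n <= delta
    best = 0.0
    for lag in range(1, max_lag + 1):
        best = max(best, float(np.max(np.abs(values[lag:] - values[:-lag]))))
    return best

grid = np.linspace(0.0, 1.0, 101)
smooth = np.sqrt(grid)                # continuous: omega_delta -> 0 as delta -> 0
step = (grid >= 0.5).astype(float)    # jump of height 1 near t = 1/2
for delta in (0.16, 0.04, 0.01):
    print(delta, modulus_of_continuity(smooth, delta), modulus_of_continuity(step, delta))
```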

Convergence of Discontinuous Processes
We will now try to apply our insights from the previous section to the convergence of discontinuous processes. More precisely, we first need to identify what space of functions we are interested in and what topology we can endow it with.
A first idea could be to go to the next bigger space we are familiar with and which extends the topology of uniform convergence: the space of bounded measurable functions $B([0,1])$ with the topology of uniform convergence. The way I am presenting this, it becomes clear that this is not a good choice: even though the space is complete, it is not separable and therefore not Polish. And when we want to do probability theory, that is not a good sign, particularly with Prokhorov's Theorem in mind. Despite having quite a nice characterisation of its compact sets similar to the Arzelà-Ascoli Theorem 2.3, $B([0,1])$ is not the right space to work in.
Now that we have ruled out the obvious choice, we need to decide how to proceed. The first step is to choose the right space of functions. In other words: what type of functions are relevant to us? Note that we are mostly interested in martingales and Markov processes, often characterised through their generator or their martingale problem. For these, we have a very nice regularity result:

Theorem 2.5 (see e.g. [Low09]). A sub- or supermartingale $(M_t)_{t\ge0}$ with respect to a right-continuous filtration has a càdlàg (i.e. right-continuous with left limits) modification whenever $t \mapsto \mathbb{E}[M_t]$ is right-continuous. In particular, a martingale w.r.t. a right-continuous filtration always has a càdlàg modification.
That indicates that the space of càdlàg functions seems to be the right choice. We will denote this so-called Skorokhod space by $D_{[0,1]}(\mathbb{R})$. In general, the Skorokhod space of càdlàg functions on $[0,T]$ (resp. $[0,+\infty)$) with values in a (hopefully Polish) space $E$ will be denoted by $D_{[0,T]}(E)$ (resp. $D_{[0,+\infty)}(E)$).
Now that we have identified the "right" space of functions, we need to identify the "right" topology. It turns out that there is not one single good topology. So instead, we will identify the properties a good topology should have. The most important one is the applicability of Prokhorov's Theorem. In other words, we want

(Polishness Property) The topology is Polish.

to hold. In most applications, it is enough to weaken this condition to

(SepProkhorov Property) The space is separable; and if a family of measures on it is tight, then it is relatively compact w.r.t. the topology of weak convergence.

which amounts to the "important" part of Prokhorov's Theorem. However, it is usually preferable to have a Polish space. The second important ingredient in our strategy was the Arzelà-Ascoli Theorem, which characterises the compact sets for the topology of uniform convergence. Hence, we would want

(ArzAsc Property) An Arzelà-Ascoli type theorem describing compact sets exists.

If both conditions (Polishness Property) and (ArzAsc Property) are satisfied, we have a "good" topology. Nevertheless, there are other properties that one could wish for. For example, it would be great if the new topology extended the topology of uniform convergence on $C([0,1])$. More formally, this means that

(Extension Property) The trace topology on $C([0,1])$ (i.e. the subspace topology induced on $C([0,1]) \subseteq D_{[0,1]}(\mathbb{R})$) is the topology of uniform convergence.

Even though this property seems very natural, there is an important argument against it: the space of continuous functions with the topology of uniform convergence is complete. That means that if the new topology extends it, a sequence of continuous functions cannot converge to a discontinuous function. In other words, a topology extending the topology of uniform convergence may be too strong for some applications.
There is a last "bonus" property which would be nice to have. Imagine a sequence of functions converging to some continuous function. Since the limit lies in the subspace where the topology is "stronger", it would be great if this automatically strengthened the mode of convergence to uniform convergence, i.e.

(Bonus Property) If $f_n \to f$ in the new topology and $f \in C([0,1])$, then $\|f_n - f\|_\infty \to 0$.

If the new topology satisfies (Bonus Property), we would not need to worry about the topology of uniform convergence any more at all. Whenever the limit is continuous, we get uniform convergence for free! Equipped with these constraints, we will start constructing Skorokhod topologies!
Skorokhod Topologies on $D_{[0,1]}(\mathbb{R})$
In this main section, we construct the topologies on the Skorokhod space: first, we will try to understand how we might want to tweak the uniform topology; then, we will see the actual definitions of these topologies. A little word of caution: the rigorous proofs of the facts that I will state are very technical. So instead, I will use the old magician's trick and refer to the books [Bil99; Whi02], which present those proofs nicely.

What Exactly Goes Wrong with the Uniform Topology?
In Section 2.2, I pointed out that one major problem of the uniform topology is that it is not separable any more. I did not give any proof of this statement, so here it is: all functions of the form $\mathbf{1}_{[x,1)}$, $x \in (0,1)$, are at $\|\cdot\|_\infty$-distance one from each other. Indeed, if we take $x < y < 1$, then
$$\big\|\mathbf{1}_{[x,1)} - \mathbf{1}_{[y,1)}\big\|_\infty = \big\|\mathbf{1}_{[x,y)}\big\|_\infty = 1.$$
This proves that there is an uncountable family of functions which are all at distance one from each other in the uniform topology. Hence, the uniform topology is not separable on $D_{[0,1]}(\mathbb{R})$: the open balls of radius $1/2$ around these functions are pairwise disjoint, so no countable set can intersect all of them.
But let us take another point of view. Perhaps the non-separability is not the main problem here. Perhaps it is rather a consequence of an even bigger problem: shouldn't the convergence
$$\mathbf{1}_{[x+1/n,1)} \xrightarrow[n\to\infty]{} \mathbf{1}_{[x,1)}$$
hold from an intuitive point of view? That would immediately force these functions to get "closer" together and prevent the non-separability. This insight transforms the problem of finding a topology similar to the uniform topology but avoiding non-separability into the problem of tweaking uniform convergence so that this sort of convergence is allowed.
To narrow down what exactly keeps these indicator functions apart, let us have a closer look at what ε-balls look like in the uniform topology. Let us take the example of $f = \mathbf{1}_{[1/2,1)}$. Then, the ε-ball around $f$ contains all functions whose graphs lie in the so-called ε-tube around the graph of $f$, see Figure 1. This tube forces functions to be ever closer to $f$, but allows them to wiggle a little bit up and down. Note that this corresponds to a spatial wiggle. When functions are continuous, that is all perfectly fine, because a wiggle in time can be translated into a wiggle in space. However, when we have a discontinuity, this is not true any more. As soon as we move the discontinuity a bit to the left or to the right, we necessarily leave the tube and are immediately "far away" from $f$.
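The non-separability argument can be checked numerically with a minimal Python sketch, assuming the indicators are sampled on a fine grid (which only approximates the supremum norm); the helpers `indicator` and `sup_dist` are hypothetical. All pairwise distances equal one, and moving the jump by $1/n$ does not reduce the distance, however large $n$ is, as long as the grid resolves the gap.

```python
import itertools
import numpy as np

grid = np.linspace(0.0, 1.0, 10001)

def indicator(x):
    """Samples of 1_{[x, 1)} on the grid (note the value 0 at t = 1)."""
    return ((grid >= x) & (grid < 1.0)).astype(float)

def sup_dist(f, g):
    """Grid approximation of the uniform distance ||f - g||_inf."""
    return float(np.max(np.abs(f - g)))

jumps = [0.1 * k for k in range(1, 10)]
pairwise = [sup_dist(indicator(x), indicator(y)) for x, y in itertools.combinations(jumps, 2)]
print(min(pairwise), max(pairwise))  # every pair is at distance one

# Moving the jump by 1/n does not help: the distance stays one for every n.
shifted = [sup_dist(indicator(0.5 + 1.0 / n), indicator(0.5)) for n in (10, 100, 1000)]
print(shifted)
```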
That means that we need to modify the ε-tubes to allow for some temporal wiggle in addition to the spatial wiggle already accounted for. There are two options that may come to mind. The more minimalistic approach would be to extend the ε-tube by a little bit at a discontinuity to get ε-"gloves", see Figure 2a. The other, more generous, approach would be to connect the two ends of the ε-tube at a discontinuity, see Figure 2b.
In reality, Skorokhod defined in his foundational paper [Sko56] four different topologies, a strong and a weak version for each approach. They are now commonly referred to by the rather obscure names $J_1$-, $J_2$-, $M_1$- and $M_2$-topologies. The J-topologies arise from the minimalistic approach and the M-topologies from the more generous one. For all those intimidated by these cryptic names: in the end, only the $J_1$-topology is commonly used and therefore referred to as the Skorokhod topology. It appears that the M-topologies also have their uses in various problems, whereas you will most certainly not encounter the $J_2$-topology at all. That means that this mess reduces to i) one main ($J_1$-)Skorokhod topology everybody should be familiar with and ii) a second type of (M-)Skorokhod topologies one should have a general idea of.
Figure 3: Relationships between the different Skorokhod topologies. Here $\tau \to \sigma$ means that $\tau$ is stronger than $\sigma$ in the sense that any sequence converging in $\tau$ also converges in $\sigma$. U denotes the topology of uniform convergence.
In this paper, I will only discuss the two main topologies $J_1$ and $M_1$. But before getting into the details, I want to illustrate how these topologies relate to each other. Keeping the above in mind, we should expect the J-topologies to be stronger than the M-topologies, as the latter allow more functions to be close. However, the situation is a bit more complicated, see Figure 3. The good news is that the Skorokhod topology $J_1$ is stronger than all the other "new" topologies. That means that whenever a convergence is shown to hold in $J_1$, it holds in all the other Skorokhod topologies. Figure 4 illustrates which additional types of convergence the different topologies allow. These examples are taken from [Whi02, Figure 11.2] and can partially be found already in [Sko56], see also [DKP+16, Limit Theorems for Stochastic Processes].

The Topologies $J_1$ and $M_1$
Now comes the more difficult part of translating our intuition into real definitions. We will start with the Skorokhod topology $J_1$ and finish with the $M_1$-topology.
In the minimalistic setting, we want to allow for some temporal wiggle without connecting the two ends of the ε-tube across a discontinuity. To reformulate this mathematically, we will perform a time change. By a change of time, I mean that we take a strictly increasing bijection $\lambda : [0,1] \to [0,1]$ and consider $f \circ \lambda$ instead of $f$. Naturally, we are only interested in parametrisations that are "close" to the natural flow of time, corresponding to the trivial parametrisation $\mathrm{id} : t \mapsto t$. In other words, we need to penalise parametrisations which are "too far away" from $\mathrm{id}$. This leads to the following definition of distance:
$$d_{J_1}(f, g) := \inf_{\lambda} \max\big(\|\lambda - \mathrm{id}\|_\infty, \|f \circ \lambda - g\|_\infty\big),$$
where the infimum is taken over all increasing bijections $\lambda$ of $[0,1]$. It can be shown that $d_{J_1}$ is a metric, usually referred to as the Skorokhod metric (see e.g. [Bil99, Section 12]), and one defines $J_1$ to be the topology induced by this metric.
Unfortunately, it turns out that this is actually a bad metric in the sense that it is not complete. Consider the following example from [Bil99, Example 12.2]. Take the indicator functions $f_n := \mathbf{1}_{[0,2^{-n})}$ and define the change of time $\lambda_n$ as the linear interpolation of the three points $(0,0)$, $(2^{-n}, 2^{-(n+1)})$ and $(1,1)$, see Figure 5. One easily checks that the change of time is such that
$$f_{n+1} \circ \lambda_n = f_n.$$
To bound the penalty $\|\lambda_n - \mathrm{id}\|_\infty$ on the change of time, note that $\lambda_n$ and $\mathrm{id}$ differ maximally at $t = 2^{-n}$, where $|\lambda_n(t) - t| = 2^{-n} - 2^{-(n+1)} = 2^{-(n+1)}$. This gives
$$d_{J_1}(f_{n+1}, f_n) \le \max\big(\|\lambda_n - \mathrm{id}\|_\infty, \|f_{n+1} \circ \lambda_n - f_n\|_\infty\big) = 2^{-(n+1)}.$$
In particular, this error is summable and we conclude that $(f_n)_n$ is a Cauchy sequence. Since $f_n(x)$ converges to $0$ for all $x \in (0,1]$, the only possible limit is the null function $f = 0$. However, whatever change of time we apply, the null function doesn't change. Hence,
$$d_{J_1}(f_n, 0) = \inf_{\lambda} \max\big(\|\lambda - \mathrm{id}\|_\infty, \|f_n\|_\infty\big) = 1 \quad \text{for all } n \ge 1.$$
That means that although $(f_n)_n$ is Cauchy, it does not converge. The problem of the Skorokhod metric is quite subtle and lies within the penalty we put onto the parametrisation $\lambda$: we measure the absolute distance between $\lambda$ and $\mathrm{id}$. However, it is better to think of parametrisations as a modified flow of time. In this sense, it would be better to measure the difference in flow speed. In other words, we want parametrisations with a nearly constant speed
$$\frac{\lambda(t) - \lambda(s)}{t - s} \approx 1.$$
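The identities in this example can be verified numerically with a minimal Python sketch on a dyadic grid, chosen so that every $2^{-n}$ is a grid point; the helpers `f`, `lam` and `check` are hypothetical names. For the specific time change $\lambda_n$, it evaluates both terms of the $d_{J_1}$ penalty (which gives the upper bound on $d_{J_1}(f_{n+1}, f_n)$ above) and the speed of $\lambda_n$ on $[0, 2^{-n}]$, which stays at $1/2$.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 2**12 + 1)  # dyadic grid: contains 2^{-n} for n <= 12

def f(n, s):
    """f_n = 1_{[0, 2^{-n})}."""
    return (s < 2.0**-n).astype(float)

def lam(n, s):
    """Time change lambda_n: linear interpolation of (0, 0), (2^{-n}, 2^{-(n+1)}), (1, 1)."""
    return np.interp(s, [0.0, 2.0**-n, 1.0], [0.0, 2.0**-(n + 1), 1.0])

def check(n):
    """Return (||lambda_n - id||, ||f_{n+1} o lambda_n - f_n||, speed of lambda_n on [0, 2^{-n}])."""
    penalty = float(np.max(np.abs(lam(n, t) - t)))
    mismatch = float(np.max(np.abs(f(n + 1, lam(n, t)) - f(n, t))))
    slope = float(lam(n, 2.0**-n) / 2.0**-n)
    return penalty, mismatch, slope

for n in range(1, 6):
    print(n, check(n))  # penalty 2^{-(n+1)}, mismatch 0, slope 1/2
```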
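In the above example, the slope of $\lambda_n$ will never converge to 1 (not even pointwise), as
$$\frac{\lambda_n(2^{-n}) - \lambda_n(0)}{2^{-n}} = \frac12$$
for every $n \ge 1$. To fix this, we introduce the new penalty
$$\|\lambda\|^\circ := \sup_{0 \le s < t \le 1} \left|\log \frac{\lambda(t) - \lambda(s)}{t - s}\right|,$$
leading to the modified metric
$$d^\circ_{J_1}(f, g) := \inf_{\lambda} \max\big(\|\lambda\|^\circ, \|f \circ \lambda - g\|_\infty\big).$$
It turns out that both metrics are equivalent, i.e. they induce the same topology. However, this modified metric is complete! For this reason, this metric is sometimes called the Skorokhod metric instead of the previous one, leaving the original metric without any special name. It can be shown that $J_1$ is separable (see e.g. [Bil99, Theorem 12.2]); together with the completeness of $d^\circ_{J_1}$, this means that $J_1$ satisfies the (Polishness Property).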
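For a piecewise linear time change, every chord slope $(\lambda(t) - \lambda(s))/(t - s)$ is a weighted average of the slopes of the linear pieces, so $\|\lambda\|^\circ$ is the largest $|\log(\text{slope})|$ over the pieces. The minimal Python sketch below (hypothetical helpers `log_slope_penalty` and `penalties`) uses this to compare both penalties for $\lambda_n$: the uniform distance to $\mathrm{id}$ vanishes, while $\|\lambda_n\|^\circ = \log 2$ for every $n$. Since any time change matching the jump of $f_n$ to that of $f_{n+1}$ must satisfy $\lambda(2^{-n}) = 2^{-(n+1)}$, and any other one leaves a uniform error of 1, one gets $d^\circ_{J_1}(f_n, f_{n+1}) \ge \log 2$, so $(f_n)_n$ is no longer Cauchy.

```python
import math

def log_slope_penalty(knots_x, knots_y):
    """||lambda||° for the piecewise linear, strictly increasing lambda through the points
    (knots_x[i], knots_y[i]). Chord slopes are weighted averages of piece slopes,
    so the supremum is attained on a single linear piece."""
    penalty = 0.0
    for x0, x1, y0, y1 in zip(knots_x, knots_x[1:], knots_y, knots_y[1:]):
        slope = (y1 - y0) / (x1 - x0)
        penalty = max(penalty, abs(math.log(slope)))
    return penalty

def penalties(n):
    """(||lambda_n - id||_inf, ||lambda_n||°) for lambda_n through (0,0), (2^{-n}, 2^{-(n+1)}), (1,1)."""
    xs, ys = [0.0, 2.0**-n, 1.0], [0.0, 2.0**-(n + 1), 1.0]
    sup_distance = max(abs(y - x) for x, y in zip(xs, ys))  # attained at a knot
    return sup_distance, log_slope_penalty(xs, ys)

for n in (1, 5, 10, 20):
    print(n, penalties(n))  # first entry -> 0, second entry stays log 2
```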
Recalling that the uniform topology on continuous functions does not care about small temporal distortions, one easily verifies that $J_1$ also satisfies the (Extension Property), i.e. if a sequence of continuous functions converges in $J_1$, then the convergence is uniform (and conversely).
The only thing we still need is a good description of compact sets, i.e. we need to check that $J_1$ satisfies the (ArzAsc Property). Fortunately, there is indeed a result similar to the Arzelà-Ascoli Theorem! The only thing we need to adapt is the definition of the modulus of continuity, so that it ignores jump discontinuities. This is achieved by allowing the function to jump at a finite number of points:
$$\omega'_\delta(f) := \inf_{(t_i)} \max_{1 \le i \le k} \sup_{s, t \in [t_{i-1}, t_i)} |f(s) - f(t)|,$$
where the infimum is taken over all finite partitions $0 = t_0 < t_1 < \dots < t_k = 1$ with $\min_{1 \le i \le k} (t_i - t_{i-1}) > \delta$. Note that the innermost supremum only ranges over the right-open interval $[t_{i-1}, t_i)$, allowing for jumps at the times $t_i$. To distinguish it from the "real" modulus of continuity, I refer to it as a modulus of continuity type function.
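Theorem 3.1 (see e.g. [Bil99, Section 12]). A set $K \subseteq D_{[0,1]}(\mathbb{R})$ is relatively compact w.r.t. $J_1$ if and only if
i) the set $K$ is uniformly bounded: $\sup_{f\in K} \|f\|_\infty < \infty$;
ii) the modulus of continuity type function vanishes uniformly over $K$:
$$\lim_{\delta\to0} \sup_{f\in K} \omega'_\delta(f) = 0.$$
Note that it is not enough to have a uniform bound on $|f(0)|$ as before! This was possible in the case of continuous functions, because we also imposed uniform equicontinuity. Since we now allow for jumps, we need to strengthen this condition to a uniform bound on the entire interval.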
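To get a feeling for $\omega'_\delta$, one can compute it for a function sampled on a grid by dynamic programming over partitions. The minimal Python sketch below assumes that the partition points are restricted to the grid (an approximation), and the helper `omega_prime` is hypothetical. It shows that a single jump is ignored, whereas two jumps closer together than $\delta$ are not.

```python
import numpy as np

def omega_prime(values, delta):
    """Grid approximation of the J1 modulus omega'_delta(f) for f sampled at t_j = j/n.
    best[j] is the smallest achievable max-oscillation over partitions of [0, t_j) into
    intervals [t_i, t_j) of length > delta with endpoints on the grid; the value at t = 1
    itself is never used, matching the half-open intervals."""
    values = np.asarray(values, dtype=float)
    n = len(values) - 1
    best = np.full(n + 1, np.inf)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            if (j - i) / n <= delta:  # this and all later intervals [t_i, t_j) are too short
                break
            piece = values[i:j]       # samples in [t_i, t_j)
            oscillation = piece.max() - piece.min()
            best[j] = min(best[j], max(best[i], oscillation))
    return float(best[n])

n = 100
step = np.zeros(n + 1)
step[50:] = 1.0    # one jump at t = 0.5
spike = np.zeros(n + 1)
spike[50:52] = 1.0  # jumps at t = 0.50 and t = 0.52, closer than delta = 0.1
print(omega_prime(step, 0.1), omega_prime(spike, 0.1))
```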
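That means that $J_1$ satisfies all of the constraints that are really important to us. The only thing left to check is the (Bonus Property) of strengthened convergence whenever the limiting function is continuous. From the heuristic, it is conceivable that this holds for $J_1$, as we only allow for small distortions in time. This idea can be made rigorous by noting that we may shift the change of time onto the limit: take $f_n \to f$ in $J_1$ with $f$ continuous. Then there is a sequence $(\lambda_n)_n$ of time changes such that $\|\lambda_n - \mathrm{id}\|_\infty \to 0$ and $\|f_n \circ \lambda_n - f\|_\infty \to 0$. Hence
$$\|f_n - f\|_\infty \le \|f_n - f \circ \lambda_n^{-1}\|_\infty + \|f \circ \lambda_n^{-1} - f\|_\infty = \|f_n \circ \lambda_n - f\|_\infty + \|f \circ \lambda_n^{-1} - f\|_\infty \xrightarrow[n\to\infty]{} 0,$$
where the last term vanishes because $f$ is uniformly continuous and $\|\lambda_n^{-1} - \mathrm{id}\|_\infty = \|\lambda_n - \mathrm{id}\|_\infty \to 0$. An extension of this statement can be found e.g. in [EK86, Proposition 3.6.5] and the preceding comment.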
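We finish with the $J_1$-topology by pointing out that there is an immediate drawback in satisfying the (Extension Property): continuous functions cannot converge to discontinuous functions. Indeed, if $f_n \to f$ in $J_1$ with all $f_n$ continuous, then the continuous functions $f_n \circ \lambda_n$ converge uniformly to $f$, so $f$ is continuous; in other words, $C([0,1])$ is closed in $J_1$! This explains why we cannot expect to see convergences of the type shown in Figure 4b.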
Let us now come to the $M_1$-topology. Recall that the idea was to generously extend the ε-tube across the discontinuity, see Figure 2b. To make this more rigorous, we will work with the so-called completed graph
$$\Gamma(f) := \big\{(t, x) \in [0,1] \times \mathbb{R} : x \in [f(t-), f(t)]\big\},$$
containing the graph of $f$ together with the straight lines connecting the two ends of a jump discontinuity. Here, I use the notation $f(t-)$ to denote the left limit of $f$ at $t$, which exists by definition of $D_{[0,1]}(\mathbb{R})$ (with the convention $f(0-) := f(0)$), and $[a, b]$ denotes the segment between $a$ and $b$, also when $a > b$. For example, in Figure 2b, the completed graph corresponds to the graph together with the dotted line.
From here on, the idea is very similar to the one used for the $J_1$-topology. To allow for some freedom, we again use parametrisations. The only difference is that we now parametrise the completed graph, i.e. we take continuous non-decreasing functions $(\lambda, \rho) : [0,1] \to \Gamma(f)$ which are onto. Here, $\lambda$ is the temporal component and $\rho$ is the spatial component. Note also that we use the intuitive order on $\Gamma(f)$ to define monotonicity:
$$(t_1, x_1) \le (t_2, x_2) \iff t_1 < t_2, \ \text{or} \ t_1 = t_2 \text{ and } |f(t_1-) - x_1| \le |f(t_2-) - x_2|.$$
In words, we use the ordering that comes from drawing the completed graph from left to right without lifting the pencil. Such a pair $(\lambda, \rho)$ is called a parametric representation of $f$. We then define the distance between two functions $f$ and $g$ through
$$d_{M_1}(f, g) := \inf \max\big(\|\lambda - \lambda'\|_\infty, \|\rho - \rho'\|_\infty\big),$$
where the infimum is taken over all parametric representations $(\lambda, \rho)$ of $f$ and $(\lambda', \rho')$ of $g$, i.e. as the minimal distance between any two parametric representations of $f$ and $g$. Again, one checks that $d_{M_1}$ indeed is a metric, and one defines $M_1$ to be the induced topology. The advantage of this metric is that we get the additional convergence illustrated in Figure 4b. Symmetrically, the drawback is that $M_1$ does not verify the (Extension Property). Apart from this, all other constraints are satisfied: $M_1$ defines a Polish topology on $D_{[0,1]}(\mathbb{R})$, see e.g. [Whi02, Theorem 12.8.1], and it also satisfies the (Bonus Property) of strengthened convergence whenever the limit is continuous. The description of compact sets is a bit more difficult, but it is still possible. Before we define the new modulus of continuity we will need here, one should keep in mind that $M_1$ is weaker than $J_1$. That means in particular that the above criterion for compactness (Theorem 3.1) is still sufficient. Then again, in most cases where it applies, one works directly with the $J_1$-topology.
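The convergence of Figure 4b can be made tangible with a minimal Python sketch that writes down one explicit parametric representation of the step $f = \mathbf{1}_{[1/2,1]}$ (chosen to avoid a jump at $t = 1$) and one of a continuous ramp $g_h$ rising from 0 to 1 on $[1/2 - h, 1/2]$. The helpers `step_rep`, `ramp_rep` and `m1_upper_bound` are hypothetical. Because $d_{M_1}$ is an infimum over all representations, the computed value (about $h$) is only an upper bound on $d_{M_1}(g_h, f)$, but it already shows $g_h \to f$ in $M_1$. In contrast, $g_h \circ \lambda$ is continuous for every time change $\lambda$, so $\|g_h \circ \lambda - f\|_\infty \ge 1/2$ and there is no convergence in $J_1$.

```python
import numpy as np

S_KNOTS = [0.0, 1 / 3, 2 / 3, 1.0]  # parameter values at which the pieces change

def step_rep(s):
    """Parametric representation (lambda, rho) of the completed graph of f = 1_{[1/2, 1]}:
    move along the time axis, climb the vertical segment at t = 1/2, then move on."""
    lam = np.interp(s, S_KNOTS, [0.0, 0.5, 0.5, 1.0])
    rho = np.interp(s, S_KNOTS, [0.0, 0.0, 1.0, 1.0])
    return lam, rho

def ramp_rep(s, h):
    """Parametric representation of the continuous ramp g_h: 0 on [0, 1/2 - h],
    linear from 0 to 1 on [1/2 - h, 1/2], and 1 afterwards."""
    lam = np.interp(s, S_KNOTS, [0.0, 0.5 - h, 0.5, 1.0])
    rho = np.interp(s, S_KNOTS, [0.0, 0.0, 1.0, 1.0])
    return lam, rho

def m1_upper_bound(h, num=3001):
    """max(||lambda - lambda'||, ||rho - rho'||) for these two representations, on a grid of s."""
    s = np.linspace(0.0, 1.0, num)
    lam_f, rho_f = step_rep(s)
    lam_g, rho_g = ramp_rep(s, h)
    return float(max(np.max(np.abs(lam_f - lam_g)), np.max(np.abs(rho_f - rho_g))))

for h in (0.1, 0.01, 0.001):
    print(h, m1_upper_bound(h))  # roughly h, so g_h -> f in M1 as h -> 0
```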
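Define the new modulus of continuity type function
$$\omega^{M_1}_\delta(f) := \sup_{\substack{0 \le t_1 < t_2 < t_3 \le 1 \\ t_3 - t_1 \le \delta}} \ \inf_{\alpha \in [0,1]} \big|f(t_2) - \big((1-\alpha) f(t_1) + \alpha f(t_3)\big)\big|.$$
Using the notation $d(x, A)$ for the distance between $x$ and the set $A$, this can be written more compactly as
$$\omega^{M_1}_\delta(f) = \sup_{\substack{0 \le t_1 < t_2 < t_3 \le 1 \\ t_3 - t_1 \le \delta}} d\big(f(t_2), [f(t_1), f(t_3)]\big).$$
A small modulus of continuity type function ensures that on small time intervals, the function is nearly monotone: each intermediate value $f(t_2)$ lies close to the segment between $f(t_1)$ and $f(t_3)$. This is a way to exclude oscillations, but allows for ever steeper slopes, see Figure 4b.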
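Theorem 3.2 (see e.g. [Whi02, Section 12.12]). A set $K \subseteq D_{[0,1]}(\mathbb{R})$ is relatively compact w.r.t. $M_1$ if and only if
i) the set $K$ is uniformly bounded: $\sup_{f\in K} \|f\|_\infty < \infty$;
ii) the oscillations vanish uniformly over $K$:
$$\lim_{\delta\to0} \sup_{f\in K} \omega^{M_1}_\delta(f) = 0,$$
together with a suitable control of the oscillations near the endpoints $0$ and $1$ (see the reference for the precise statement).
The above description seems to differ from the short characterisation in Theorem 3.1, but a similar description for compact sets w.r.t. $J_1$ can be found e.g. in [Bil99, Theorem 12.4].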
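The quantity $\omega^{M_1}_\delta$ can be approximated for a sampled function with a minimal Python sketch, assuming the times $t_1 < t_2 < t_3$ are restricted to an equispaced grid; the helper `m1_oscillation` is hypothetical. A steep monotone ramp and a plain step both have zero $M_1$ oscillation, whereas an overshoot (jumping to 2 before settling at 1) is penalised by the distance from 2 to the segment $[0, 1]$.

```python
import numpy as np

def m1_oscillation(values, delta):
    """Grid approximation of omega^{M1}_delta(f): the largest distance from f(t2) to the
    segment [f(t1), f(t3)] over grid times t1 < t2 < t3 with t3 - t1 <= delta."""
    values = np.asarray(values, dtype=float)
    n = len(values) - 1
    window = int(np.floor(delta * n + 1e-12))  # largest index gap k - i allowed
    best = 0.0
    for i in range(n + 1):
        for k in range(i + 2, min(i + window, n) + 1):
            lo, hi = min(values[i], values[k]), max(values[i], values[k])
            middle = values[i + 1:k]
            # distance from each intermediate value to the interval [lo, hi]
            dist = np.maximum(0.0, np.maximum(lo - middle, middle - hi))
            best = max(best, float(dist.max()))
    return best

grid = np.linspace(0.0, 1.0, 101)
steep_ramp = np.clip(20.0 * (grid - 0.45), 0.0, 1.0)  # monotone with slope 20
step = np.where(np.arange(101) >= 50, 1.0, 0.0)       # a single jump
overshoot = step.copy()
overshoot[50] = 2.0                                    # jump to 2, then back down to 1
print(m1_oscillation(steep_ramp, 0.1), m1_oscillation(step, 0.1), m1_oscillation(overshoot, 0.1))
```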
We finish this section with a small overview of which topologies have which properties. For completeness, I also include the topologies $J_2$ and $M_2$. These additional properties follow columnwise from [Whi02, Theorem 11.6.6]; [DKP+16, Limit Theorems for Stochastic Processes, Section 2.7] and [Whi02, Theorem 12.12.2]; a similar argument for $J_2$ as in Section 3 and, for $M_2$, the fact that it is weaker than $M_1$; and [Whi02, Corollary 12.11.1] together with the fact that $J_2$ is stronger than $M_2$.

Convergence of Stochastic Processes in $J_1$ and $M_1$
In this section, I simply restate the previous compactness results in terms of tightness of stochastic processes on $D_{[0,1]}(\mathbb{R})$. I will furthermore state the generalisation of Corollary 2.2 to the topologies $J_1$ and $M_1$. This is not immediate, because the projections $\pi_t : f \mapsto f(t)$ are not continuous any more: already in $J_1$, the projection $\pi_t$, $t \in (0,1)$, is continuous at $f$ if and only if $f$ is continuous at $t$. (The projections $\pi_0$ and $\pi_1$ are always continuous.) That means that we cannot hope to have the convergence of all finite-dimensional distributions. Instead, we define the set of all continuity points of a stochastic process $X$ by
$$T_X := \big\{t \in [0,1] : \mathbb{P}(X_{t-} = X_t) = 1\big\}, \qquad X_{0-} := X_0.$$
Note that $0$ is always a continuity point of any stochastic process $X$, but $X$ may be discontinuous at $t = 1$. As such, it would be better to say that $T_X$ is the set of times $t \in [0,1]$ such that the projection $\pi_t$ is almost surely continuous at $X$.
For sufficient criteria of tightness in $J_1$ in a very general setting, one may refer to [EK86, Chapter 3], or to [Eth00, Chapter 1] for a less general approach. Convergence criteria for $M_1$ and $M_2$ may be found in [Whi02, Chapter 12].

Extending $J_1$ and $M_1$ to More General Skorokhod Spaces
There are some generalisations we are interested in. The first concerns the time interval: obviously, there is no problem with substituting $[0,1]$ by some other finite time interval $[0,T]$, but what about the entire half line $[0,+\infty)$? Secondly, we would like to be able to consider the space $D_{[0,T]}(\mathbb{R}^k)$ and compare its topology with the product topology on $D_{[0,T]}(\mathbb{R})^k$. Finally, we would like to generalise the topology to a more general range space $E$ to get Skorokhod topologies on $D_{[0,T]}(E)$.
This section is relatively short, as I will simply point out possible restrictions and the modifications needed to generalise the topologies.

Extending Time
As mentioned above, there is no difficulty in extending the topology to $D_{[0,T]}(\mathbb{R})$ for any finite time horizon $T > 0$: one simply replaces every occurrence of the endpoint 1 in the definitions by $T$.
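Once we can define the topology on any finite time horizon, we would like to extend it to $D_{[0,+\infty)}(\mathbb{R})$. A naive approach would be to say that $f_n$ converges to $f$ if and only if all restrictions to time intervals of the form $[0,T]$ converge to the restrictions of $f$ to these intervals. However, this approach is doomed: the sequence given by $f_n := \mathbf{1}_{[1-1/n,+\infty)}$ should intuitively converge to $f := \mathbf{1}_{[1,+\infty)}$, but the restrictions to $[0,1]$ do not converge in $J_1$, since every time change on $[0,1]$ fixes the endpoint $1$, where the restricted limit has its jump. The problem is that the endpoint of the time interval is different from all other points. Since this problem disappears whenever the endpoint is a continuity point of the limit function, one remedy is to say that $f_n$ converges to $f$ if and only if all restrictions to the time intervals of the form $[0,T]$, with $T$ a continuity point of $f$, converge to the restrictions of $f$ to these intervals. A different approach is demonstrated in [Bil99, Section 16].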
It is possible to define a metric which induces this topology, see e.g. [Whi02, Sections 3.3 and 12.9] or [Bil99, Section 16]. One checks that $D_{[0,+\infty)}(\mathbb{R})$ is again a Polish space, see e.g. [Bil99, Theorem 16.3] in the case of $J_1$. By our definition, compactness can be checked by restricting to compact time intervals and using the compactness criteria discussed above.

Product Skorokhod Topologies
The next step is to generalise the topologies to processes with values in $\mathbb{R}^k$. The easiest solution is to wait for the next section and take $\mathbb{R}^k$ as the range space. However, there is a second way to get a topology on $D_{[0,1]}(\mathbb{R}^k)$, using the fact that $D_{[0,1]}(\mathbb{R}^k) = D_{[0,1]}(\mathbb{R})^k$ in the sense of a bijection. From the point of view of topologies, this implies that we may endow $D_{[0,1]}(\mathbb{R}^k)$ with the product topology on $D_{[0,1]}(\mathbb{R})^k$. It turns out that this topology differs from the topology we would get from the next section. More precisely, the product topology is weaker. For this reason, we speak of the strong topologies obtained by viewing $\mathbb{R}^k$ as the range space and the weak topologies obtained as product topologies. Intuitively, the product topology allows for different time parametrisations in every component, whereas in the strong topology, the same time change is used for all components.
A detailed discussion of the weak M-topologies can be found in [Whi02, Chapter 12]. Unfortunately, I have not found much literature on the weak J-topologies. It might be that they are less commonly used. One use may be found e.g. in [EK19, Proof of Lemma 4.3], see also [Whi02, Theorem 11.5.1].

Towards General Range Spaces
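Let $E$ be a general Polish space. When going back to the definition of $J_1$ in Section 3.2, we only need to replace $\|f \circ \lambda - g\|_\infty$ by
$$\sup_{t \in [0,1]} d_E\big(f(\lambda(t)), g(t)\big),$$
where $d_E$ is a complete metric on $E$. It turns out that everything else goes through without any problem. To generalise the compactness result, we only need to adjust the definition of the modulus of continuity again, to
$$\omega'_\delta(f) := \inf_{(t_i)} \max_{1 \le i \le k} \sup_{s, t \in [t_{i-1}, t_i)} d_E\big(f(s), f(t)\big),$$
with the infimum over the same partitions as before (and to replace the uniform bound by the requirement that, for each $t$ in a dense set of times, the values $\{f(t) : f \in K\}$ lie in a compact subset of $E$). One can verify that $J_1$ preserves all the good properties as long as $E$ is Polish, see e.g. [EK86, Chapter 3].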
One would hope that this is also possible with the $M_1$-topology. However, its definition relies on the fact that we can define the straight line between two points in $E$. In other words, we need a linear structure to define $M_1$, i.e. we can generalise $M_1$ only to Banach spaces. In this setting, the interval $[f(t_1), f(t_2)]$ has to be interpreted as the line segment $\{(1-\alpha) f(t_1) + \alpha f(t_2) : \alpha \in [0,1]\}$ from $f(t_1)$ to $f(t_2)$. This restriction is another reason why $J_1$ has become the more prevalent topology.

Figure 5: The change of time $\lambda_n$ compared to the usual time flow $\mathrm{id}$.
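It turns out to be enough to check the convergence of the finite-dimensional distributions at times in $T_X$: its complement in $[0,1]$ is at most countable, so $T_X$ is dense in $[0,1]$, see [Bil99, Section 13].

Theorem 3.3 (see e.g. [Whi02, Theorem 11.6.6]). A sequence of càdlàg processes $(X_n)_{n\ge1}$ converges in law to a càdlàg process $X$ w.r.t. either $J_1$ or $M_1$ if and only if $(X_n)_{n\ge1}$ is tight in the respective topology and all finite-dimensional distributions at times $t_i \in T_X$ converge to those of $X$.

Theorem 3.4 (Tightness in $J_1$, see e.g. [Bil99, Theorem 13.2]). A sequence of càdlàg processes $(X_n)_{n\ge1}$ is tight w.r.t. $J_1$ if and only if
i) $\lim_{a\to\infty} \limsup_{n\to\infty} \mathbb{P}\big(\|X_n\|_\infty \ge a\big) = 0$;
ii) for every $\varepsilon > 0$, it holds that $\lim_{\delta\to0} \limsup_{n\to\infty} \mathbb{P}\big(\omega'_\delta(X_n) \ge \varepsilon\big) = 0$.

Theorem 3.5 (Tightness in $M_1$, see e.g. [Whi02, Theorem 12.12.3]). A sequence of càdlàg processes $(X_n)_{n\ge1}$ is tight w.r.t. $M_1$ if and only if
i) $\lim_{a\to\infty} \limsup_{n\to\infty} \mathbb{P}\big(\|X_n\|_\infty \ge a\big) = 0$;
ii) for every $\varepsilon > 0$, it holds that $\lim_{\delta\to0} \limsup_{n\to\infty} \mathbb{P}\big(\omega^{M_1}_\delta(X_n) \ge \varepsilon\big) = 0$, together with the analogous control of the oscillations near the endpoints $0$ and $1$, see [Whi02, Section 12.12].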