The Manifold Of Variations: hazard assessment for short-term impactors

When an asteroid has a few observations over a short time span the information contained in the observational arc could be so little that a full orbit determination may be not possible. One of the methods developed in recent years to overcome this problem is based on the systematic ranging and combined with the Admissible Region theory to constrain the poorly-determined topocentric range and range-rate. The result is a set of orbits compatible with the observations, the Manifold Of Variations, a 2-dimensional compact manifold parametrized over the Admissible Region. Such a set of orbits represents the asteroid confidence region and is used for short-term hazard predictions. In this paper we review the Manifold Of Variations method and make a detailed analysis of the related probabilistic formalism.


Introduction
An asteroid just discovered has a strongly undetermined orbit, being it weakly constrained by the few available astrometric observations. This is called a Very Short Arc (VSA, Milani et al. (2004)) since the observations cover a short time span. As it is well-known, the few observations constrain the position and velocity of the object in the sky-plane, but leave almost unknown the distance from the observer and the radial velocity, respectively the topocentric range and range-rate. This means that the asteroid confidence region is wide in at least two directions instead of being elongated and thin. Thus one-dimensional sampling methods, such as the use of the Line Of Variations (LOV, Milani et al. (2005)), are not a reliable representation of the confidence region. Indeed, VSAs are often too short to allow a full orbit determination, which would make it impossible even to define the LOV. To overcome the problem three systems have been developed in recent years: Scout at the NASA JPL (Farnocchia et al., 2015), NEOScan at SpaceDyS and at the University of Pisa (Spoto et al., 2018;, and neoranger at the University of Helsinki (Solin and Granvik, 2018).
Scout 1 and NEOScan 2 are based on the systematic ranging, an orbit determination method which explores a suitable subset of the topocentric range and range-rate space (Chesley, 2005). They constantly scan the Minor Planet Center NEO Confirmation Page (NEOCP), with the goal of identifying asteroids such as NEAs, MBAs or distant objects to confirm/remove from the NEOCP and to give early warning of imminent impactors. NEOScan uses a method based on the systematic ranging and on the Admissible Region (AR) theory (Milani et al., 2004) to include in the orbit determination process also the information contained in the short arc, although little. Starting from a sampling of the AR, a set of orbits compatible with the observations is then computed by a doubly constrained differential correction technique, which in turn ends with a sampling of the Manifold Of Variations (MOV, Tommei (2006)), a 2-dimensional manifold of orbits parametrized over the AR and representing a two-dimensional analogue of the LOV. This combination of techniques provided a robust short-term orbit determination method and also a two-dimensional sampling of the confidence region to use for the subsequent hazard assessment and for the planning of the follow-up activity of interesting objects.
In this paper we review the algorithm at the basis of NEOScan and examine in depth the underlying mathematical formalism, focusing on the short-term impact monitoring and the impact probability computation. In Section 2 we summarise the AR theory and the main definitions concerning the MOV. Section 3 contains the results to derive the probability density function on an appropriate integration space, by assuming that the residuals are normally distributed and by a suitable linearisation. In Section 4 we show that without the linearisation assumed in the previous section we find a distribution known to be inappropriate for the problem of computing the impact probability of short-term impactors (Farnocchia et al., 2015). Lastly, in Section 5 we give a probabilistic interpretation to the optimisation method used to define the MOV, namely the doubly constrained differential corrections.

The Admissible Region and the Manifold Of Variations
As anticipated in the introduction, the systematic ranging method on which NEOScan is based makes a deep use of the Admissible Region. In this section we briefly recap its main properties and we point the reader to further references.
When in presence of a VSA, even when we are not able to fit a full least squares orbit, we are anyway able to compute the right ascension α, the declination δ, and their time derivativesα andδ, to form the so-called attributable (Milani et al., 2004). For the notation, we indicate the attributable as where all the quantities are referred to the mean of the observational times. The AR has been introduced to constrain the possible values of the range ρ and the range-rateρ that the attributable leaves completely undetermined. Essentially, we require that the observed object is a Solar System body, discarding the orbits corresponding to too small objects to be source of meteorites (the so-called shooting star limit). The AR turns out to be a compact subset of R ≥0 × R and to have at most two connected components. In the most common cases the AR has one component and the case with two components indicates the possibility for the asteroid to be distant. Since the AR is compact we can sample it with a finite number of points, and we use two different sampling techniques depending on the geometric properties of the AR and on the existence of a reliable least squares orbit. A nominal solution is an orbit obtained by unconstrained (i.e. full) differential corrections, starting from a preliminary orbit as first guess. If the differential corrections achieve convergence and the value of the geodesic curvature signal-to-noise ratio is greater than 3 (Milani et al., 2007), we say that the nominal orbit is reliable.
Grid sampling. When a reliable nominal solution does not exist, the systematic ranging is performed by a two-step procedure and in both steps it uses rectangular grids over the AR. In the first step a grid covers the entire AR, with the sample points for ρ spaced either uniformly or with a logarithmic scale depending on the geometry of the AR (see Figure 1, left). The corresponding sampling of the MOV is then computed, with the doubly constrained differential corrections procedure described below. Then we are able to identify the orbits which are more compatible with the observations by computing their χ 2 value (see Equation (2.2)), and we can restrict the grid to the region occupied by such orbits. Thus the grid used in the second step typically covers a smaller region and thus it has a higher resolution than the first one (see Figure 1, right).
Cobweb sampling. If a reliable nominal orbit exists, instead of using a grid we compute a spider web sampling in a suitable neighbourhood of the nominal solution. This is obtained by following the level curves of the quadratic approximation of the target function used to minimise the RMS of the observational residuals (see Figure 2).
For the formal definition of the AR and the proof of its properties the reader can refer to Milani et al. (2004). The operative definition used in NEOScan and a detailed explanation of the sampling techniques can be found in Spoto et al. (2018) and Del .
We now describe the Manifold Of Variations, a sample of orbits compatible with the observational data set. In general, to obtain orbits from observations we use the least squares method. The target function is where x are the orbital elements to fit, m is the number of observations used, ξ is the vector of the observedcomputed debiased astrometric residuals, and W is the weight matrix. As explained in the introduction, in general a full orbit determination is not possible for short arcs. Nevertheless, from the observational data set it is possible to compute an attributable A 0 . The AR theory has been developed to obtain constraints on Figure 1. Admissible Region sampling with the rectangular grids. Left. AR sampling performed with the first grid, covering the whole AR. Right. Sampling of the region containing the orbits with χ < 5 resulting from the first grid. In both plots the sample points are marked in blue when χ ≤ 2 and in green when 2 < χ < 5. the values of (ρ,ρ), so that we can merge the information contained in the attributable with the knowledge of an AR sampling. The basic idea of this method is to fix ρ andρ at some specific values ρ 0 = (ρ 0 ,ρ 0 ) obtained from the AR sampling, compose the full orbit (A 0 , ρ 0 ), and fit only the attributable part to the observations with a suitable differential corrections procedure.
Definition 2.1. Given a subset K of the AR, we define the Manifold Of Variations to be the set of points (A * (ρ 0 ), ρ 0 ) such that ρ 0 ∈ K and A * (ρ 0 ) is the local minimum of the function when it exists. We denote the Manifold Of Variations with M.
Remark 2.2. In general M is a 2-dimensional manifold, since the differential of the map from the (ρ,ρ) space to M has rank 2 (see Lemma 3.2).
The set K is the intersection of a rectangle with the AR in the case of the grid sampling, whereas K is an ellipse around ρ * in the cobweb case, where x * = (A * , ρ * ) is the reliable nominal orbit. To fit the attributable part we use the doubly constrained differential corrections, which are usual differential corrections but performed on a 4-dimensional space rather than on a 6-dimensional one. The normal equation is (2.1) We call K the subset of K on which the doubly constrained differential corrections converge, giving a point on M.
Definition 2.3. For each orbit x ∈ M we define the χ-value to be where Q * is the minimum value of the target function over K , that is Q * := min ρ∈K Q(A * (ρ), ρ). The orbit for which this minimum is attained is denoted with x * and referred to as the best-fitting orbit.
The standard differential corrections are a variant of the Newton method to find the minimum of a multivariate function, the target function Q (Milani and Gronchi, 2010). The uncertainty of the result can be described in terms of confidence ellipsoids (optimisation interpretation) as well as in the language of probability (probabilistic interpretation). In Section 5 we give a probabilistic interpretation to the doubly constrained differential corrections procedure.

Derivation of the probability density function
Starting from Spoto et al. (2018) we now give the proper mathematical formalism to derive the probability density function on a suitable space S which we define below. Upon integration this will lead to accurate probability estimates, the most important and urgent of which is the impact probability computation for a possible imminent impactor.
Our starting point is a probability density function on the residuals space, to propagate back to S. In particular, we assume the residuals to be distributed according to a Gaussian random variable Ξ, with zero mean and covariance Γ ξ = W −1 , that is Without loss of generality, we can assume that where I m is the m × m identity matrix. This is obtained by using the normalised residuals in place of the true residuals (see for instance Milani and Gronchi (2010)). For the notation, from now on we will use ξ to indicate the normalised residuals. Note that with the use of normalised residuals the target function becomes Q(x) = 1 m ξ(x) ξ(x). 3.1. Spaces and maps. Let us introduce the following spaces.
(i) S is the sampling space, which is R ≥0 × R if the sampling is uniform in ρ, R 2 if the sampling is uniform in log 10 ρ, and R ≥0 × S 1 in the cobweb case.
(ii) K ⊆ R ≥0 × R has already been introduced in Section 2, and it is the subset of the points of the AR on which the doubly constrained differential corrections achieved convergence.
(iv) M is the Manifold Of Variations, a 2-dimensional submanifold of X .
(v) R m is the residuals space, whose dimension is m ≥ 6 since the number of observations must be ≥ 3.
We now define the maps we use to propagate back the density p Ξ . First, the residuals are a function of the fit parameters, that is ξ = F (x), with F : X → R m being a differentiable map. The second map goes from the AR space to the MOV space.
Definition 3.1. The map f µ : K → M is defined to be f µ (ρ) := (A * (ρ), ρ), where A * (ρ) ∈ A is the best-fit attributable obtained at convergence of the doubly constrained differential corrections.
Lemma 3.2. The map f µ is a global parametrization of M as a 2-dimensional manifold.
Proof. The set K is a subset of R 2 with non-empty interior, the map f µ is at least C 1 and its Jacobian matrix is from which is clear that it is full rank on K .
The last map to introduce is that from the sampling space S to the AR space R.
Definition 3.3. The map f σ : S → R is defined according to the sampling technique: (i) if the sampling is uniform in ρ, f σ is the identity map; (ii) if the sampling is uniform in log 10 ρ, we have S = R 2 and f σ (x, y) := (10 x , y); (iii) if we are in the cobweb case S = R ≥0 × S 1 and the map f σ is given by where λ 1 > λ 2 are the eigenvalues of the 2 × 2 matrix Γ ρρ (x * ), the latter being the restriction of the covariance matrix Γ(x * ) to the (ρ,ρ) space, and v 1 is the unit eigenvector corresponding to λ 1 .
We then consider the following chain of maps and we use it to compute the probability density function on S obtained by propagating p Ξ back.
3.2. Conditional density of a Gaussian on an affine subspace. In this section we establish general results about the conditional probability density function of a Gaussian random variable on an affine subspace. Let m and N be two positive integers, with m > N . Let B ∈ M m,N (R) a m × N matrix with full rank, that is rk(B) = N . Consider the affine N -dimensional subspace of R m given by We can also assume that ξ * is orthogonal to W , that is ξ * ∈ Imm(B) ⊥ , otherwise it is possible to subtract the component parallel to W . Given the random variable Ξ with the Gaussian distribution on R m as in (3.1), we want to find the conditional probability density p Ξ|W of Ξ on W . Let R ∈ M m (R) be a m × m rotation matrix and let f R : R m → R m be the affine map Throughout this section, we use the notation f R (ξ) =: ξ R = ξ ξ , with ξ ∈ R m−N and ξ ∈ R N . We choose R in such a way that (⇒) Condition (3.3) implies that for all ξ ∈ R N there existsx ∈ R N such that 0 ξ = RBx. Since the multiplication by RB is injective, suchx is unique. Therefore there exists a well-defined map P : R N → R N such that P (ξ ) =x. The map P is linear and injective since Ker P = {0}, thus it is a bijection. If A is the N × N matrix associated to P −1 , it easily follows RB = ( 0 A ).
Theorem 3.6. The conditional probability density of Ξ is N (0, I N ), that is Proof. By the standard propagation formula for probability density functions under the action of a continuous function, we have that The conditional probability density of the variable f R (Ξ) on f R (W ) = U (see Lemma 3.5-(i)) is obtained by using that ξ = 0 and Lemma 3.5-(ii). We have where we have used that R n exp − 1 2 y y dy = (2π) n/2 for all n ≥ 1. Now the thesis follows by noting that the random variable f R (Ξ)|f R (W ) coincides with Ξ .
Corollary 3.7. The conditional probability density of Ξ on W is where ξ ∈ W .
Proof. From Lemma 3.5-(i) we know that f R | W is a bijection. Thus for all ξ ∈ R N there exists a unique ξ ∈ W such that 0 ξ = f R (ξ) = R(ξ − ξ * ). Now it suffices to use this fact in the equation for p Ξ (ξ ) in Theorem 3.6.
3.3. Computation of the probability density. We first derive the probability density function on the Manifold Of Variations from that on the normalised residuals space. The computation is based on the linearisation of the map F around the best-fitting orbit. In Section 4 we will instead show the result of the full non-linear propagation of the probability density function and discuss the reason for which this approach, although correct, is not a suitable choice for applications.
Theorem 3.8. Let X be the random variable of X . By linearising F around the best-fitting orbit x * , the conditional probability density of X on T x * M is given by Proof. The map F is differentiable of class at least C 1 and the Jacobian matrix of F is the design matrix B(x) = ∂F ∂x (x) ∈ M m,N (R). Since the doubly constrained differential corrections converge to x * , the matrix B(x * ) is full rank. It follows that the map F is a local parametrization of in a suitable neighbourhood of ξ * := F (x * ) = ξ(x * ). The set V is thus a 2-dimensional submanifold of the residuals space R m . Consider the differential We claim that ξ * is orthogonal to T ξ * V : since x * is a local minimum of the target function Q that is ξ(x * ) = ξ * ∈ Imm (B(x * )) ⊥ . By applying Corollary 3.7 we have that for ξ ∈ T ξ * V . The differential map is continuous and invertible (since it is represented by the matrix A(x * ), as in Lemma 3.4), thus we can use the standard formula for the transformations of random variables to obtain is the 6 × 6 normal matrix of the differential corrections converging to x * . Furthermore, in the approximation used, we have Now the thesis follows by normalising the density just obtained.
We now complete the derivation of the probability density function on the space S. In particular, to obtain the density on the space R we use the notion of integration over a manifold because we want to propagate a probability density on a manifold to a probability density over the space that parametrizes the manifold itself (see Theorem 3.9). Then the last step involves two spaces with the same dimension, namely R and S, thus we simply apply standard propagation results from probability density functions (see Theorem 3.10).
Theorem 3.9. Let R be the random variable on the space R. Assuming (3.4) to be the probability density function on M, the probability density function of R is where χ 2 (ρ) = χ 2 (x(ρ)) and G µ is the Gramian determinant Moreover, neglecting terms containing the second derivatives of the residuals multiplied by the residuals themselves, for all ρ ∈ K we have Proof. We have already proved that the map f µ : K → M is a global parametrization of M. From the definition of integral of a function over a manifold we have and the thesis follows since from Equation (3.2) The equation for ∂A * ∂ρ (ρ) is proved in Spoto et al. (2018), Appendix A, but to be complete we repeat the argument here. Let ρ ∈ K . By definition, the point x(ρ) = (A * (ρ), ρ) ∈ M is a zero of the function g : X → R defined to be The function g is continuously differentiable and we have where we neglected terms containing the second derivatives of the residuals multiplied by the residuals themselves. The matrix C A (x(ρ)) is invertible because ρ ∈ K , which means that the doubly constrained differential corrections did not fail. By applying the implicit function theorem, there exists a neighbourhood U of ρ, a neighbourhood W of A * (ρ), a continuously differentiable function f : The derivative ∂g ∂A is already computed above. For ∂g ∂ρ (x) we have The thesis now follows from Equation (3.5).
The final step in the propagation of the probability density function consists in applying the last map f σ : S → K , where S := f −1 σ (K ) is the portion of the sampling space mapped onto K . Theorem 3.10. Let S be the random variable on the space S. Assuming (3.4), the probability density function of S is where we used the compact notation χ 2 (s) = χ 2 (x(ρ(s))), G µ (s) = G µ (ρ(s)), and G σ is the Gramian of the columns of Df σ (s), so that G σ (s) = | det Df σ (s)|.
Proof. Equation (3.6) directly follows from the transformation law for random variables between spaces of the same dimension, under the action of the continuous function f σ : it suffices to change the variables from the old ones to the new ones and multiply for the modulus of the determinant of the Jacobian of the inverse transformation, that is f σ .
The determinant det Df σ (s) depends on the sampling technique and it is explicitly computed in Spoto et al. (2018), Appendix A. In particular, the straightforward computation yields the following: (i) if the sampling is uniform in ρ then det Df σ (s) = 1 for all s ∈ S = R ≥0 × R; (ii) if the sampling is uniform in the logarithm of ρ then det Df σ (s) = log(10)ρ(s) for all s ∈ S = R 2 ; (iii) in the case of the spider web sampling det Df σ (s) = r √ λ 1 λ 2 for all s ∈ S = R ≥0 × S 1 .
For the sake of completeness, we now recall how the density p S is used for the computation of the impact probability of a potential impactor. Each orbit of the MOV sampling is propagated for 30 days, recording each close approach as a point on the Modified Target Plane (Milani and Valsecchi, 1999), so that we are able to identify which orbit are impactors. Let V ⊆ M be the subset of impacting orbits, so that f −1 σ (f −1 µ (V)) ⊆ S is the corresponding subset of sampling variables. The impact probability is then computed as Of course the above integrals are evaluated numerically as finite sums over the integration domains: since they are subsets of S we use the already available sampling described in Section 2. In particular, we limit the sums to the sample points corresponding to MOV orbits with a χ-value less than 5. As pointed out in Spoto et al. (2018), this choice guarantees to NEOScan to find all the impacting regions with a probability > 10 −3 , the so-called completeness level of the system (Del Vigna et al., 2019).

Full non-linear propagation
In this section we compute the probability density function on the space R obtained by a full non-linear propagation of the probability density on the residuals space. In particular, the map F is not linearised around x * as we did in Theorem 3.10. This causes the inclusion in Equation (3.4) of the contribution of the normal matrix C as it varies along the MOV and not the fixed contribution C(x * ) coming from the orbit x * with the minimum value of χ 2 . With the inclusion of all the non-linear terms the resulting density of R has the same form as the Jeffreys' prior and is thus affected by the same pathology discussed in Farnocchia et al. (2015). This is the motivation for which we adopted the approach presented in Section 3.3.
Theorem 4.1. The probability density function of the variable R resulting from a full non-linear propagation of the probability density of Ξ is where C ρρ = Γ −1 ρρ and Γ ρρ ∈ M 2 (R) is the restriction of the covariance matrix Γ to the R space. Proof. The map f µ : K → M is a global parametrization of M. From the properties of the map F it is easy to prove that the map F • f µ : K → F (M) is a global parametrization of the 2-dimensional manifold V = F (M). From the notion of integration on a manifold we have and by the chain rule so that the Gramian matrix results to be where the matrices C AA = C A , C Aρ , C ρA = C Aρ , and C ρρ are the restrictions of the normal matrix C(x(ρ)) to the corresponding subspace. From Theorem 3.9 we have so that the previous expression becomes where the last equality is proved in Milani and Gronchi (2010), Section 5.4.

Conditional density on each attributable space
In this section we prove that the conditional density of the attributable A given ρ = ρ 0 ∈ K is Gaussian, providing a probabilistic interpretation to the doubly constrained differential corrections.
Once ρ 0 ∈ K has been fixed, consider the fibre of ρ 0 with respect to the projection from X to R, so that The fibre H ρ0 is diffeomorphic to A and thus H ρ0 is a 4-dimensional submanifold of X , actually a 4dimensional affine subspace, and the collection {H ρ0 } ρ0∈K is a 4-dimensional foliation of A × K ⊆ X . Theorem 5.1 gives a probability density function on each leaf of this foliation. Moreover, let us denote by φ ρ0 : A → H ρ0 the canonical diffeomorphism between A and H ρ0 , that is φ ρ0 (A) := (A, ρ 0 ) for all A ∈ A.
Theorem 5.1. Let A be the random variable of the space A. For each ρ 0 ∈ K , the conditional probability density function of A given R = ρ 0 is where we have used the compact notation C A (ρ 0 ) := C A (A * (ρ 0 ), ρ 0 ).
Proof. Define the map G ρ0 := F • φ ρ0 : A → R m . The differential of G ρ0 is represented by the design matrix B A ∈ M m,4 (R) introduced in (2.1). Consider the point x 0 = (A * (ρ 0 ), ρ 0 ) ∈ H ρ0 , where A * (ρ 0 ) is the best-fit attributable corresponding to ρ = ρ 0 , that exists since ρ 0 ∈ K . Given that the doubly constrained differential corrections converge to x 0 , the matrix B A (x 0 ) is full rank. It follows that the map G ρ0 is a global parametrization of that turns out to be a 4-dimensional submanifold of the residuals space R m , at least in a suitable neighbourhood of ξ 0 := F (x 0 ) = G ρ0 (A 0 (ρ 0 )). The map G ρ0 induces the tangent map between the corresponding tangent bundles DG ρ0 : T A → T V ρ0 . In particular we consider the tangent application (DG ρ0 ) A * (ρ0) : T A * (ρ0) A → T ξ0 V ρ0 .
To use this map for the probability density propagation, we first need to have the probability density function on T ξ0 V ρ0 , that is an affine subspace of dimension 4 in R m . By Theorem 3.6 we have that the conditional probability density of Ξ on T ξ0 V ρ0 is N (0, I 4 ). Let R represent the rotation of the residuals space R m such that condition (3.3) holds, and let A(x 0 ) ∈ M 4 (R) as in Lemma 3.4, so that the matrix A(x 0 ) −1 represents the inverse map (DG ρ0 ) A * (ρ0) −1 . By the transformation law of a Gaussian random variable under the linear map A(x 0 ) −1 , we obtain a probability density function on the attributable space A given by where A is the random variable on A and Γ A (A * (ρ 0 )) = A(x 0 ) −1 I 4 (A(x 0 ) −1 ) = A(x 0 ) −1 (A(x 0 ) −1 ) .
As a consequence, the normal matrix of the random variable A is which is in turn the normal matrix of the doubly constrained differential corrections leading to x 0 , computed at convergence. This completes the proof since A and A×{ρ 0 } are diffeomorphic and thus the density p A (A) is also the conditional density of A given R = ρ 0 .

Conclusions and future work
In this paper we considered the hazard assessment of short-term impactors based on the Manifold Of Variations method described in Spoto et al. (2018). This technique is currently at the basis of NEOScan, a software system developed at SpaceDyS and at the University of Pisa devoted to the orbit determination and impact monitoring of objects in the MPC NEO Confirmation Page. The aim of the paper was not to describe a new technique, but rather to provide a mathematical background to the probabilistic computations associated to the MOV.
Concerning the problem of the hazard assessment of a potential impactor, the prediction of the impact location is a fundamental issue. Dimare et al. (2020) developed a semilinear method for such predictions, starting from a full orbit with covariance. The MOV, being it a representation of the asteroid confidence region, could be also used to this end and even when a full orbit is not available. The study of such a technique and the comparison with already existing methods will be subject of future research.