Impact probability computation of Near-Earth Objects using Monte Carlo Line Sampling and Subset Simulation

This work introduces two Monte Carlo (MC)-based sampling methods, known as line sampling and subset simulation, to improve the performance of standard MC analyses in the context of asteroid impact risk assessment. Both techniques sample the initial uncertainty region in different ways, with the result of either providing a more accurate estimation of the impact probability or reducing the number of samples required during the simulation with respect to standard MC techniques. The two methods are firstly described and then applied to some test cases, providing evidence of the increased accuracy or the reduced computational burden with respect to a standard MC simulation. Finally, a sensitivity analysis is carried out to show how parameter setting affects the accuracy of the results and the numerical efficiency of the two methods.

other celestial bodies, with the risk of impacting and contaminating them. For this reason, planetary protection policies set specific requirements to avoid the contamination of celestial bodies due to man-made debris in interplanetary missions, with time periods under study that generally span up to 100 years (Kminek , 2012). The estimation and propagation of the orbital state of these objects is therefore of paramount importance.
Current approaches for robust detection and prediction of planetary encounters mainly refer to linearised models or full nonlinear orbital sampling. The application of linear methods in the impact plane was introduced by Chodas (1993), whereas the introduction of the Monte Carlo technique to this problem was developed by Yeomans and Chodas (1994) and Chodas and Yeomans (1999), and it is based on the sampling of the linear six dimensional confidence region at the observation epoch and the integration over the time interval of investigation using fully nonlinear equations (Milani et al. , 2002). Milani (1999), Milani et al. (2000b) and Milani at al. (2000a) applied the multiple solutions approach to sample the central line of variations (LOV) of the nonlinear confidence region at the initial epoch and then numerically integrate over the time span of interest in a similar way. This method currently represents the reference approach for impact monitoring.
The preferred approach depends on the uncertainty in the estimated orbit, the investigated time window and the dynamics between the observation epoch and the epoch of the expected impact. Linear methods are preferred when linear approximations are reliable for both the orbit determination and uncertainty propagation. When these assumptions are not valid, more computationally intensive techniques are used: among these, Monte Carlo methods are the most accurate but also the most computationally intensive, whereas the LOV method guarantees compute times 3-4 orders of magnitude lower than those required in MC simulations, though the LOV analysis may grow quite complex after it has been stretched and folded by multiple close planetary encounters (Farnocchia et al. , 2015;Losacco et al. , 2018).
We present in this paper two advanced Monte Carlo methods, known as line sampling and subset simulation, and we show their possible application to NEA impact probability computation. The line sampling method probes the impact region of the uncertainty domain by using lines instead of points. The impact probability is estimated via analytical integration along such lines, resulting in a more accurate solution. The subset simulation method computes the impact probability as the product of larger conditional probabilities. The method progressively identifies intermediate conditional levels moving towards the impact event, reducing the overall number of samples required for the estimation.
The paper is organised as follows. Sections 2 and 3 show a detailed theoretical descriptions of the two methods. Then, the results of four different test cases are presented in Section 4, and a comparison with the performance of standard MC is shown. Finally, the sensitivity of both methods to parameter setting is investigated in Section 5.

Line sampling
The Line Sampling (LS) method is a Monte Carlo-based approach for the estimation of small probabilities. It was originally developed for the reliability analysis of failure in complex structural systems (Zio and Pedroni , 2009), and it was adapted and applied to the estimation of impact probability of small objects with major celestial bodies for this work. The main concept of this method is the estimation of the probability via analytical evaluation, by identifying within the uncertainty domain the boundaries of the impact regions, i.e. the set of initial conditions that lead to an impact within the given time. This result is obtained by considering several one-dimensional problems across the uncertainty domain. The analytical evaluations are carried out along lines following a reference direction, which is determined so that it points toward the impact region of the domain. If this direction is properly chosen, the method can considerably reduce the number of required propagations with respect to a standard MC.
The LS method has four steps: 1) the mapping of random samples from the physical coordinate space into a normalised standard space; 2) the determination of the reference direction α; 3) the probing of the impact region along the lines following the reference direction; 4) the estimation of the impact probability. A summary of each step is provided hereafter, along with an introduction to the choices made for the numerical implementation of the LS technique.

Mapping onto the standard normal space
The random vectors x ∈ R n of physical coordinates (position and velocity) need to be mapped onto the standard normal space. This transformation grants efficiency to the method, especially for problems with high dimensionality, as each component θ j , j = 1, ..., n of the new parameter vector θ ∈ R n , to which x is mapped, is associated with a standard normal distribution. The joint probability density function (pdf) of these random parameters is where φ j denotes the unit Gaussian pdf associated with the j-th component of θ (Zio and Pedroni , 2009): This enables a simplification of the computation of the probability later in the procedure, as it reduces the problem to a series of one-dimensional analytical evaluations. The direct and the inverse transformations, from the physical domain to the standardised one and vice versa, preserve the joint Cumulative Distribution Function (CDF) between the two coordinate spaces, and they are defined as: with Φ and F being the CDF of the unit Gaussian distribution and the input uncertainty distribution of the problem, respectively. The Rosenblatt transformation is applied in this work (Rosenblatt , 1952), since, for Gaussiandistributed uncertainty parameters (as in the cases under study), both the direct and the inverse transformations (respectively Eqs. 4 and 5) become linear (Zio and Pedroni , 2009;Rosenblatt , 1952).

Determination of the reference direction
The reference direction α can be determined in different ways (Zio and Pedroni , 2009). In this work, it is determined as the direction of a normalised "center of mass" of the impact region. This region is approximated by applying the Metropolis-Hastings algorithm to generate a Monte Carlo Markov Chain (MCMC) lying entirely in the impact domain, starting from an impacting initial condition (Zio and Pedroni , 2009). In the current implementation, this starting condition is found with an optimisation process aimed at minimising the planetocentric distance from the target body (the MATLAB function fmincon was applied), but alternative methods, such as a pure MC sampling, can be applied. The reference direction α is then computed in the standard normal space as where θ u , u = 1, ..., N S are the points of the Markov chain made of N S samples converted from the physical space into the standard normal space. The simulations performed for the Markov chain require additional computational effort with respect to standard MC methods. Nevertheless, this option provides a good coverage of the impact region and a resulting better accuracy of the final probability estimate.

Line sampling
For each random sample θ k , k = 1, ..., N T , a line parallel to α is defined in the standard normal space according to the parameter c k , such that where α, θ k is the scalar product between α and θ k . In this way, the problem is reduced to a series of one-dimensional evaluations associated with each sample, with c k normally distributed in the standard space.
The standard domain is then explored along each line by iteratively evaluating a performance function Y (c) to identify the values of c k corresponding to the intersections between the line and the impact region, as displayed in Fig.  1. The performance function considered in this work is the minimum distance from the celestial body of interest (e.g. the Earth) in the given time window. Due to the nature of the problem under analysis (that is, a single close approach event within a given time interval), two intersections between each line and the impact region are found, meaning that two values of c k exist for each standard normal random sample θ k , k = 1, ..., N T where the performance function is equal to zero: Y (c k 1 ) = 0 and Y (c k 2 ) = 0.
An iterative process (i.e. numerical Newton iterations) is used to identify (c k 1 , c k 2 ), thus requiring extra evaluations for each standard normal random sample θ k , k = 1, ..., N T with respect a standard MC simulation. The method adopted here makes use of Newton iterations; however, since no analytical expression for the performance function Y (c) exists, its derivative is approximated numerically.

Estimation of the impact probability
Once the values (c k 1 , c k 2 ), k = 1, ..., N T are known for all the sampling lines, the unit Gaussian CDF provides each random initial condition θ k , k = 1, ..., N T with the conditional impact probabilityP k (I): The total probability of the eventP (I) (which is identified with a planetary collision in the approach presented in this paper) and the associated variancê σ 2 (P (I)) are then approximated aŝ 3 Subset simulation The Subset Simulation (SS) method is a Monte Carlo method based on the principle of computing small event probabilities as the product of larger conditional probabilities (Au and Beck , 2010;Cadini et al. , 2012;Zuev et al. , 2012). Given a target event F , i.e. an event whose small probability is to be computed, let Given a sequence of conditional probabilities, the target event probability can be written as where P (F i+1 |F i ) represents the probability of F i+1 conditional to F i . In the approach presented in this paper, the event is identified with a planetary collision. The method is initialised using standard Monte Carlo to generate N samples at the so called conditional level 0 (CL0) starting from the available estimate of the object state vector and its uncertainty at the observation epoch. The same number of samples is generated for each subsequent conditional level. Once the event region F 1 is identified, an MCMC Metropolis Hastings algorithm is used to generate conditional samples in F 1 (Metropolis at al. , 1953;Hastings , 1970). The subsequent intermediate region F 2 is then located, and other samples are generated. The procedure is repeated until the impact region is identified. An illustration of the SS method is given in Fig. 2.
Though originally developed for the identification of structural failures, this approach was then used in different research areas, including the assessment of collision probability among resident space objects (see Morselli et al. (2015)). In this work, the intermediate event regions are identified by assuming a fixed value of conditional probability P (F i+1 |F i ) = p 0 . The identification of each conditional level is affected by this value and changes accordingly step by step, as explained hereafter. Following the general description offered in Morselli et al. (2015), the resulting SS algorithm goes through the steps 1. Set i = 0 and generate N samples x 0,k 0 , k = 0, ..., N at CL0 by standard Monte Carlo starting from the available estimate of the object state vector at the epoch t 0 . 2. Propagate each sample and compute its minimum distance from the selected celestial body (e.g. the Earth) in the given time window. 3. Sort the N samples in descending order according to the associated planetocentric distance. 4. Identify an intermediate threshold value D i+1 as the (1 − p 0 )N -th element of the list. Define the (i + 1)-th conditional level as where d is the planetocentric distance. Considering how the threshold was defined, the associated conditional probability where D lim is the target threshold distance, go the last step, otherwise select the last p 0 N samples of the list x i,j 0 , j = 0, ..., p 0 N . By definition, these samples belong to the (i + 1)-th conditional level. 6. Using MCMC, generate (1 − p 0 )N additional conditional samples starting from the previously selected seeds belonging to F i+1 . A sample is set to belong to F i+1 according to the following performance function: 7. Set i = i + 1 and return to step 2 8. Stop the algorithm.
The total number of generated samples is where n is the overall number of conditional levels required to reach the impact region. Since the conditional probability is equal to p 0 for each level, the impact probability becomes where N F is the number of samples belonging to the last conditional level whose planetocentric distance is lower than D lim . Zuev et al. (2012) suggest a Bayesian post-processor for SS (SS + ) to refine the computed impact probability and determine higher moments. If we define the first moment of the distribution of the impact probability becomes whereas the second moment is expressed by Therefore, the variance of the estimator can be computed as Equations 17 and 19 are the references for the analyses presented in this paper.
The main setting parameters of the method are the selected fixed conditional probability p 0 , the number of samples per conditional level N , and the proposal auxiliary distribution for the generation of the samples for each MCMC phase. The relevance of each parameter to the accuracy and efficiency of the method is discussed in the following paragraphs. Table 1 Nominal equinoctial parameters (second row) and related uncertainty as covariance matrix (last six rows) for asteroid 2010 RF 12 at 6656 MJD2000 (March 23, 2018).

Numerical simulations
This section is devoted to assess the performance of the LS and SS methods on the computation of the impact probability with respect to standard MC for four different test cases with decreasing expected impact probability: the NEAs 2010 RF 12 , 2017 RH 16 , 2009 JF 1 , and 99942 Apophis. The asteroid 2010 RF 12 is a small NEA that on date has the highest probability of hitting the Earth (around 1/16) during a close fly-by in 2095; this event was chosen as a test case due to the high expected impact probability. The object 2017 RH 16 is also a NEA featuring a close approach with the Earth in 2026, with a currently expected impact probability of 1/689. The third test case, asteroid 2009 JF 1 , will have a close fly-by with the Earth in 2022, with an expected impact probability of 1/4464. The last case is asteroid 99942 Apophis. For this latter case, in order to test the two methods on the worst scenario, the initial conditions are sampled from the estimate obtained by discarding observations performed after 2009. In such a scenario, after a very close encounter in 2029, the asteroid would have a resonant return with the Earth in 2036, with an expected impact probability of 3·10 −5 . In each case, the studied event is modelled as an impact with the Earth's surface in a time window around the nominal date of the close approach associated with the highest impact risk.
In all cases, the initial conditions are obtained in terms of equinoctial parameters (Broucke and Cefola , 1972), with initial uncertainties in the form of covariance matrices, and then converted into Cartesian coordinates for the propagation. All the required data were retrieved in July 2018 from the Near Earth Object Dynamic Site (https://newton.spacedys.com/neodys/) and the NEO Risk List from ESA (http://neo.ssa.esa.int/risk-page), and are reported in Tables 1-4.
The comparison between the standard MC and the two proposed approaches is performed by analysing the following parameters: the number of random initial conditions N IC (equal to the number of lines in the LS, and to the number of samples per conditional level in the SS) Table 2 Nominal equinoctial parameters (second row) and related uncertainty as covariance matrix (last six rows) for asteroid 2017 RH 16 at 6475 MJD2000 (September 24, 2017).  Table 4 Nominal equinoctial parameters (second row) and related uncertainty as covariance matrix (last six rows) for asteroid 99942 Apophis at 1820 MJD2000 (December 25, 2004).
the total number of orbital propagations N P (larger than N IC for both LS, due to the iterative procedure implemented, and SS, due to the required conditional levels) the impact probability estimateP (I) the sample standard deviationσ ofP (I) the coefficient of variation δ of the probability estimate, defined asσ/P (I) the Figure of Merit (FoM) of the method, defined as 1/(σ 2 · N P ) The overall number of propagations N P is selected as a measure of the computational burden of the methods. The standard deviation and the coefficient of variation are instead used as indicators of the accuracy of the result, with lower values corresponding to lower variability. The FoM involves both the variance and the required number of propagations, and it is a measure of the computational efficiency and impact probability variability: the higher the value, the higher the efficiency of the method (Zio and Pedroni , 2009;Morselli et al. , 2015).
The propagations are carried out in Cartesian coordinates with respect to the J2000 reference frame centred in the Solar System Barycenter (SSB), with the inclusion of the gravitational contributions of the Sun, all the major planets, and the Earth's moon, including relativistic effects modelled as in Armellin et al. (2010). All the physical constants (gravitational parameters, planetary radii, etc.) and the ephemerides are obtained from the JPL Horizons database via the SPICE toolkit (https://naif.jpl.nasa.gov/naif/).
The propagations are carried out in normalised units (with reference length and time equal to 1 AU and 1 solar year, respectively) using the adaptive Dormand-Prince Runge-Kutta scheme of 8 th order (RK78), with absolute and relative tolerances both set to 10 −12 . Table 1 shows the initial conditions in terms of nominal equinoctial parameters and related uncertainties for asteroid 2010 RF 12 on March 23, 2018. Figure 3(a) shows the result of a standard MC sampling of the initial uncertainty set for asteroid 2010 RF 12 in terms of deviations of semi-major axis and equinoctial longitude from the nominal initial conditions, with red dots representing initial conditions leading to an impact at the investigated epoch. The resulting estimated impact probability is 6.51·10 −2 , whereas the estimated Poisson statistics uncertainty is 2.47·10 −3 . Figures 3(b) and 3(c) show the results obtained with LS and SS, respectively. The grey dots in Fig. 3(b) are the samples from the initial distribution that do not lead to impact, whereas the green dots are the initial conditions lying on the boundaries of the impact region as identified by the LS algorithm. Using 1000 initial samples with 8 propagations for the iterative process along each sampling line, the resulting probability is 6.57·10 −2 , whereas the estimated uncertainty is 3.30·10 −3 . Figure 3(c), instead, shows the evolution of the conditional samples obtained with SS. The method was applied with 1000 samples per conditional level, a fixed conditional probability equal to 0.2, and an auxiliary distribution centered in the current sample and with the same magnitude of the original one. In the plot, gray dots represent samples drawn at CL0 by standard MC, whereas different colors are used for each conditional  level. As can be seen, only two conditional levels are generated to obtain an impact probability of 6.53·10 −2 . Table 5 shows a comparison between the three methods. For both LS and SS, two additional simulations are presented: the results obtained using a number of samples granting the same accuracy level of standard MC (σ M C ), and the results obtained by performing the same number of propagations of the standard MC (N M C P ). As can be seen, for the analysed test case, neither LS nor SS offer a real advantage with respect to the standard MC: the latter guarantees the same accuracy level with a very similar or lower number of propagations, or higher accuracy with the same number of propagations. This result is strictly related to the relatively high value of expected impact probability, as shown in Table 5. Table 2 reports the initial conditions in terms of nominal equinoctial parameters and related uncertainties for the second test case, asteroid 2017 RH 16 , whereas Fig. 4 shows the results of the numerical simulations obtained with the three methods. An overview of the obtained performance is shown in Table 6. The expected probability is 1.42·10 −3 , which is lower than the one associated with asteroid 2010 RF 12 . It is here interesting to compare the results obtained with the different methods. As can be seen from Table 6, both LS and SS outperform standard MC in terms of both achieved accuracy and computational cost. More in detail, the same accuracy level of the standard MC simulation can be obtained by both LS and SS with a number of propagations that is one order of magnitude lower. Alternatively, LS and SS achieve a higher accuracy level with the same number of samples adopted for the standard MC. The superiority of LS and SS is further confirmed by the values of the FoM.

Asteroid 2009 JF 1
The initial conditions of asteroid 2009 JF 1 are reported in Table 3, whereas Fig. 5 and Table 7 show the results of the numerical simulations obtained with the three methods. The expected impact probability (1.47·10 −4 ) is one order of magnitude lower than in the previous test case. The improvements granted by LS and SS over the standard MC, which have been highlighted in the previous test case, are even more evident here: LS and SS provide the same accuracy as standard MC with a number of propagations that is a factor of 23 and 5 lower, respectively. Table 4 shows the initial conditions in terms of nominal equinoctial parameters and related uncertainties for the last test case, asteroid 99942 Apophis, whereas Fig. 6 and Table 8 report the results of the numerical simulations obtained with the three methods. This is the most challenging test case, due to the presence of the deep close encounter in 2029, which triggers the resonant return in 2036.  In addition, this test case features a further reduction of about one order of magnitude in the expected impact probability. The performance comparison in Table 8 confirms the benefits provided by both LS and SS over the standard MC. In addition, the relative performance of LS against SS can be assessed here: while SS is very efficient in providing reliable impact probability results for relatively low number of samples, LS definitely provides the most accurate results as the number of initial samples (and thus of sampling lines) increases.

Sensitivity analysis
This section reports an analysis of the sensitivity of the LS ans SS performance to the different setting parameters. For both methods, the analysis is carried out in terms of the performance criteria defined in Section 4.

Line Sampling
As described in Section 2, the determination of the sampling direction and the identification of the boundaries of the impact region are the two main aspects of the LS method. In the following paragraphs, their effect on the method performance is investigated.

Sampling direction
The reference direction is the main parameter affecting the numerical performance of the line sampling method, since the subsequent sampling procedure for the identification of the boundaries of the impact region depends on it. As already mentioned in Section 2.2, this direction can be determined in different ways, which are introduced in (Zio and Pedroni , 2009). For this analysis, the reference direction is first identified with the procedure in Section 2.2 and then slightly perturbed. The NEAs 2010 RF 12 and a) b) c) Fig. 6 Samples dispersion in the initial uncertainty space (δa,δl) for the case of asteroid 99942 Apophis: a) initial conditions leading to impact obtained via standard MC, b) boundaries of the subdomain identified via LS, c) samples per conditional level obtained with SS.  99942 Apophis are used as reference test cases, since they feature the highest and lowest impact probability, respectively, among all the cases addressed in Section 4. In the examples, the deviation of the α direction is obtained by adding a component of arbitrary size orthogonal to the nominal direction in the (δa,δl) space (the same outcome could be obtained by using a lower number of points in the initial Markov chain, resulting in an uneven coverage of the impact region). The results in Tables 9 and 10 reveal that the LS performance drops off in the perturbed case, since the method shows larger standard deviations and lower FoMs with respect to the results presented in Section 4 (reported again in these tables for reader's convenience). The performance loss is contained for  2010 RF 12 , due to the high impact probability and because the sampling lines still intersect most part of the impact region (see Fig. 7). For 99942 Apophis, the loss is more relevant, as a larger number of samples is needed to converge to a solution because both the impact probability is lower and fewer sampling lines intersect the impact region. It is notable, however, how the LS method in the case of 2010 RF 12 can uncover an extended and curved impact region when a different reference direction is used to sample the uncertainty distribution.

Identification of the boundaries
The identification of the boundaries of the impact region is another key parameter of the method. Due to the nonlinear nature of the problem under analysis, Table 11 Application of LS with a lower number of Newton's iterations to the case of asteroid 2010 RF 12 . Comparison against the standard MC simulation. The subscripts nom and sens refer to the nominal solution and the results of the sensitivity analysis, respectively.
In the implementation adopted for this work, the intersections with the impact region along each sampling line are determined via Newton's iterations and interpolation. The result is the numerical approximation of the values of c k (the parameter describing the k-th sampling line) at the two intersection points, i.e. (c k 1 , c k 2 ). It is worth specifying that, since no information about the derivative of the performance function is available (see Section 2), each iteration involves two orbital propagations, which are needed for the numerical differentiation.
To show how the accuracy of this numerical process affects the accuracy of the overall probability estimation, the asteroids 2010 RF 12 and 99942 Apophis are taken again as reference test cases. Starting from the same random initial conditions associated to the nominal results presented in Section 4, a lower number of Newton's iterations are here imposed, which worsens the accuracy of the approximation of the intersection points along each sampling line. For the sake of a more complete comparison, in both test cases the results are reported for both the same number of initial conditions and the same total number of propagations.
Tables 11 and 12 show the results of the numerical simulations. The results are reported for the same number of sampling lines and are compared with those of the standard MC and with the nominal results already reported in Tables 5 and 8. Both analyses show that an incorrect determination of the boundaries of the impact region can lead to inaccurate impact probability estimation, even though comparable values ofσ(P (I)) are obtained in both cases (as already explained in Section 2.4, this depends on the number of line integrals that are evaluated). This can be explained by considering that the overall probability is estimated through the sum of 1D integrals, which are computed on each sampling line between the two (c k 1 , c k 2 ) values. Thus, evaluating the integrals with different values is equivalent to considering a different impact region with a different associated impact probability. While using one less iteration does not affect the value of the probability in a relevant way, especially in the case of 99942 Apophis, further reductions could yield significantly wrong estimations. The minimum number of iterations for a correct estimate depends on the probability level (the lower the probability is, the more difficult is to identify the boundaries, and thus more iterations are required), on the shape of the impact region, and on the sampling direction itself, and must be selected to trade off between accuracy and computational effort.

Subset Simulation
The analysis presented in Section 4 was carried out considering different values of N , whereas predefined p 0 and auxiliary distribution were selected. This section is devoted to study the impact of these two parameters on the accuracy and efficiency of the SS method.

Conditional probability level
The conditional probability p 0 is a critical parameter of the SS method. It affects the number of intermediate regions F j that are needed to reach the target impact region, and in turn the efficiency of SS (Zuev et al. , 2012). Smaller p 0 values grant fewer intermediate levels, though may require a larger number of samples per conditional level for the accurate estimation of the small conditional probability. Conversely, the number of samples needed per conditional level can be reduced by increasing the value of p 0 , with the drawback of increasing the number of conditional levels to reach the final impact region as well. The selection of p 0 must take all these considerations into account.
The value of p 0 also affects the overall number of samples. Assuming not to alter the value of N , the expression for the total number of samples show that N T depends on both p 0 and n CL . The aim of our analysis is to investigate the impact of p 0 on the performance of the SS method. In order to decouple the effect of p 0 and N T , the analysis is carried out by fixing the overall number of samples and changing the value of p 0 . This of course implies that the value of N must be tuned accordingly. This value is selected by estimating the required number of conditional samples based on the known estimate of the impact probability for the specific test case. Given a test case with estimated impact probabilityp, the procedure is the following: 1. Select the value of p 0 . 2. Select the overall number of samples N T . 3. Guess the number of conditional levels n CL . 4. Compute the value of N from Eq. 20. The number must be an integer value, so round the value to the closest integer. 5. Use Eq. 20 to change the value of p 0 with the rounded value of N . 6. Compute n imp from Eq. 17 and round the value to the closest integer. 7. Use Eq. 17 to change the value of p ref . 8. Consider n imp : if n imp < N p 0 , stop the algorithm, otherwise set n CL = n CL + 1 and go back to step 5.
The above routine provides the estimate of the value of N required to start the SS method with different values of p 0 , without changing the overall number of samples.
The analysis is run on the four test cases of Section 4. The overall number of samples is set to 10000, which is sufficiently large to be used in all the test cases. Different values of p 0 are tested. The lower limit (p 0 = 0.001) was selected to be compatible with the value of N T in case the simulation performs only one iteration, i.e. in case SS collapses into an MC simulation. The upper limit, instead, was selected very close to 1 (0.99), since the SS procedure is singular for p 0 = 1. Figure 9 shows the trend of δ for all the four test cases and for increasing values of p 0 . As can be seen, the test cases of asteroids 2017 RH 16 , 2009 JF 1 and 99942 Apophis show a general decreasing trend, whereas only small variations appear for asteroid 2010 RF 12 . In the latter case, δ initially increases while moving from very low values of p 0 (i.e., MC simulation) to larger values. Then, it reaches its minimum at p 0 ≈ 0.4 and finally slightly increases again. This trend confirms the results reported in Section 4 for the same test case, where MC outperformed SS with a value of conditional probability equal to 0.1.
As can be seen from Fig. 9, the trend of the coefficient of variation depends on the expected impact probability, with the worst performance obtained in general when SS degenerates into MC, and with a decrease of δ for increasing values of p 0 . Nevertheless, when the value of conditional probability exceeds 0.2-0.3, no significant improvement can be obtained in terms of accuracy. It should be noted that, if the overall number of samples is fixed, the number of conditional levels increases, and the number of samples per conditional level decreases. In practical situations, when the values of N and p 0 are selected a priori and the overall number of samples is automatically determined by the required number of conditional samples, increasing p 0 may provide improvements in terms of accuracy, with the drawback of increasing the number of propagated samples and required pointwise propagations. Therefore, it is reasonable to say that selecting p 0 out of the range 0.1-0.4 does not necessarily grant sufficient accuracy to the SS method.

Proposal probability density function
Another important parameter affecting the accuracy and the efficiency of the SS method is the proposal distribution. The proposal probability density function plays a key role in the definition of the efficiency of the SS method, as it governs the correlation between consecutive samples of the MCMC algorithm, thus affecting the variance of the estimate of the impact probability. It was observed that the efficiency of the Metropolis-Hastings algorithm mainly depends on the spread of the proposal distribution, rather than its type (Zuev et al. , 2012). Both small and large spreads tend to increase the dependency between successive sampling, thus increasing the variance of the estimator. Large spread may reduce the acceptance rate in the Metropolis-Hastings algorithm, thus increasing the number of repeated MCMC samples. Small spreads may instead lead to high acceptance rate, but still produce very correlated samples, which negatively affects the variance. Zuev et al. (2012) proposed an adaptive algorithm capable of scaling the proposal distribution in order to satisfy some acceptance rate requirements. In this work, a fixed proposal distribution was adopted.
In this section, different spreads and shapes of the proposal distribution are tested to investigate their effect on the results. The analysis focuses on the test case of asteroid 99942 Apophis. Tables 13 and 14 show the results obtained by considering N = 1000, p 0 = 0.1 and two different proposal distributions: a first one with exactly the same shape of the one used in Section 4, but centered in the current sample, and a second one obtained by adopting a diagonal covariance matrix with the same variances as the nominal distribution, still centered in the current sample. The associated covariance matrices are scaled by 3 different scaling factors (1, 10 2 , 10 6 ). Table 13 shows that, for the nominal proposal distribution, the algorithm cannot converge to the correct solution if we maintain both the original shape and size. During the MCMC process, indeed, the algorithm cannot easily find new acceptable samples. Thus, the validity of the process is compromised and the algorithm provides wrong solutions. Only by reducing the size of the proposal distribution, a good acceptance rate can be obtained and the algorithm provides accurate estimates of the impact probability. The behaviour of the algorithm improves when the diagonal covariance matrix is used, as acceptable results are obtained even for larger sizes of the proposal distribution (see Table 14).
The selection of the proposal distribution, therefore, is probably the most critical aspect of the method. As a general guideline, the first step consists in investigating the spread of the original distribution, and scale it properly. As suggested by the presented analyses, a scaling factor of around 10 2 is needed to obtain good acceptance rates during the process, thus preserving the validity of the method. As for the shape of the distribution, according to the selected test case, both a diagonal or the nominal distribution can provide good results if properly scaled. Once again, a careful investigation of the nominal covariance matrix is of fundamental importance.

Conclusions and future work
This paper introduced the application of two advanced Monte Carlo sampling methods, the line sampling and subset simulation, to the critical issue of computing the impact probability of near Earth asteroids. Different test cases were presented and the performance of the two methods were compared against standard Monte Carlo. Both methods allow significant savings in computational time with respect to standard MC, as they reduce the number of required propagations while granting the same level of accuracy by either identifying optimal sampling directions or limiting the sampling to specific regions of the phase space. Further analysis has shown that the performance of each method can suffer from particular choices of the parameters characterising their implementation, either affecting the accuracy of the probability estimate or the computational load. Future work will focus on improving the efficiency of the two methods. Part of the work will be dedicated to the improvement of the identification of the boundaries of the impact region for the LS method, by finding more efficient iterative procedures. A significant effort will be then devoted to extend the applicability of the LS method to scenarios with multiple impact events and different impact regions. An analysis on the sensitivity of the SS method to the proposal distribution will be carried out, with the aim of obtaining more rigorous guidelines for its selection. All these activities will be carried out in order to investigate the possibility of coupling the action of the two methods in a single sampling technique.