1 Introduction

In risk analysis, uncertainties are generally classified into two categories: aleatory and epistemic. Aleatory uncertainty is an inherent variation associated with the physical system or the environment. It may arise from environmental randomness, variation in space, fluctuations in times or other system variability. Given the repetitiveness of the error, it is generally well quantified with a known probability distribution, and in principle, it cannot be eliminated by further observational data, although it may be better characterized (Kolmogorov and Bharucha-Reid 2018). Conversely, epistemic uncertainty is due to a lack of information about the system or phenomenon under investigation. It may come from a lack of experimental data to characterize new materials and processes, poor understanding of coupled physics phenomena or a lack of knowledge about the model formulation (see, e.g., Helton 1994; Oberkampf et al. 2002; Helton et al. 2004; Zio and Pedroni 2013).

In the last decades, there has been an increasing awareness that classical probability theory is inadequate for the treatment of epistemic uncertainty. Therefore, different non-probabilistic approaches have been developed: imprecise probability, also known as interval analysis, after Walley (1991) and Berger et al. (1994); probability bound analysis, which combines probability analysis and interval analysis (see, e.g., Ferson and Ginzburg 1996); Dempster–Shafer theory, which uses Belief and Plausibility functions, in the two forms proposed by Dempster et al. (1968) and Shafer (1976); possibility theory, which is a special case of interval analysis; and Dempster–Shafer theory (see, e.g., Baudrit and Dubois 2006). For a review on methods of representing uncertainty, see, e.g., Zio and Pedroni (2013).

Recently, Tardioli and Vasile (2016) have presented an approach to the design of optimal collision avoidance and reentry maneuvers under uncertainty. They considered a dynamical model with six aleatory variables (the three components of the position and velocity vectors of the spacecraft) and four epistemic variables (some model parameters). The uncertainty was propagated through the dynamics, and the expectation of an optimal deflection maneuver was computed with two different techniques: one using Belief and Plausibility functions and the other using families of parametric distributions. The Belief and Plausibility functions use no a priori assumption on the kind of the distribution, but any distribution enclosed between an upper and lower distribution is admissible. In the parametric distribution approach, the assumption is that the probability density function of the uncertain quantity belongs to a family of known distributions parametrized with unknown parameters. Following this work, in this paper, we use a parametric distribution approach to include epistemic uncertainties in the estimation of impact probabilities for potentially hazardous asteroids.

Asteroid (99942) Apophis was discovered by R.A. Tucker, D.J. Tholen and F. Bernardi at Kitt Peak, Arizona on June 2004 (Minor Planet Supplement 109613). With a minimum orbit intersection distance less than 0.05 au and an absolute magnitude less than 22 (i.e., its diameter is larger than 140 m), Apophis is classified as potentially hazardous asteroid. In December 2004, both SentryFootnote 1 and NEODySFootnote 2 reported a probability of impact with our planet of a few percent in April 2029 (Chesley 2006). Although further observations reduced Apophis’ orbital uncertainty and ruled out any impact possibility for 2029, the asteroid remains an interesting object worth investigating. In fact, because of the scattering effect of the 2029 encounter, even small perturbations to the orbit of Apophis affect subsequent impact predictions (Giorgini et al. 2008).

Chesley (2006) shows that the exact encounter circumstances in 2029 depend on the magnitude of the Yarkovsky effect (order of \(10^{-15}\) au/day\(^2\)). The Yarkovsky effect is a non-gravitational perturbation due to the anisotropic emission of thermal radiation of Solar System objects that causes a secular drift in the semimajor axis (Bottke Jr et al. 2006; Vokrouhlický et al. 2015a). For some near-Earth asteroids, the orbital drift associated with the Yarkovsky effect can be measured from the orbital fit to the observations (Farnocchia et al. 2013b). That was the case of (101955) Bennu (Chesley et al. 2014) and (29075) 1950 DA (Farnocchia and Chesley 2014). When the astrometry data do not allow such a detection a Monte Carlo simulation relying on the best physical model currently available is performed. Nevertheless, the assumed distributions on the asteroid physical parameters are themselves uncertain and may rely on subjective judgment. That was the case of Apophis as analyzed by Farnocchia et al. (2013a), which we revisit within the epistemic uncertainty framework in this paper.

2 Apophis risk analysis and Yarkovsky effect

The geometry of close approaches can be described projecting the relative Earth–asteroid distance on an appropriate plane called the \(b\)-plane or target plane, which is defined as the plane including the center of the Earth and normal to the incoming asymptote of the small body trajectory with respect to the planet during the close encounter (see, e.g., Milani et al. 2002; Valsecchi et al. 2003). Conventionally, the coordinates on the \(b\)-plane are called \((\xi ,\zeta )\) and are defined so that the projection of the Earth’s heliocentric velocity onto the \(b\)-plane defines the negative \(\zeta \)-axis. The \(\zeta \) coordinate tells whether the asteroid is early or late for the minimum possible distance encounter. The \(\xi \) coordinate is the minimum distance that can be obtained by varying the timing of the encounter. The \(b\)-plane coordinates leading to a resonant impact lie on predictable circles (Valsecchi et al. 2003). The intersection between the orbital uncertainty region and a Valsecchi circle is called keyhole (Chodas 1999) and corresponds to a future impact. Assigning a probability to the uncertain variables involved in the dynamical system, a probability of impact can be estimated. For a comprehensive review on the \(b\)-plane and the corresponding encounter analysis, refer to Farnocchia et al. (2019).

Farnocchia et al. (2013a) presented an impact risk analysis for asteroid (99942) Apophis. The authors located 20 keyholes on the 2029 \(b\)-plane and found that the probability density function of the \(\zeta \)-coordinate was completely driven by the dispersion due to the Yarkovsky effect. The Yarkovsky perturbation was modeled as a transverse acceleration \(A_2/r^2\) where r is the heliocentric distance and \(A_2\) is a function of certain physical quantities (diameter, Bond albedo, bulk density, thermal inertia, rotational period and spin orientation). Thus, different physical characteristics of the asteroid result in a different Yarkovsky perturbation. Farnocchia et al. (2013a) assumed a specific distribution for each of the unknown physical parameters that define the Yarkovsky effect. Then, for each keyhole the impact probability was computed as the product of the keyhole width and the value of the \(\zeta \) distribution, evaluated in a point of the keyhole. The keyhole related to April 2068, situated in the core of the \(\zeta \) probability density distribution, was found to have the highest risk encounter with an impact probability of three in a million. On the other hand, the keyhole associated with April 2036, situated in the tail of the distribution, had the highest width (about 600 m) with an impact probability of seven in a billion.

These results were superseded by later astrometry and physical characterization (Vokrouhlický et al. 2015b; Brozović et al. 2018). However, in this work we consider Apophis’ 2013 scenario and validate our uncertainty quantification method by computing an upper and lower limit for the impact probabilities related to the 2036 and 2068 keyholes on the 2029 \(b\)-plane.

3 Uncertainty quantification of the physical model

The Yarkovsky perturbation can be approximated as (Vokrouhlický et al. 2015b)

$$\begin{aligned} a_t \simeq \dfrac{4(1-A)}{9}\varPhi (1~{\mathrm{au}})f(\theta )\cos \gamma , \end{aligned}$$
(1)

where A is the Bond albedo, \(\varPhi (1~{\mathrm{au}}) = 3 G_S /(2 D\rho c)\) is the standard radiation force factor at 1 au, \(G_S = 1361\) W/m\(^2\) is the solar constant, D is the asteroid’s diameter, \(\rho \) is the bulk density, \(\gamma \) is the spin axis obliquity and \(\theta _{rot}\) is the thermal parameter, which depends on the rotation period, spin rate, thermal inertia, thermal emissivity, geometric albedo and radial distance from the Sun. These physical quantities are in general poorly characterized, especially shortly after discovery when there is little or no information regarding the key physical characteristics. Therefore, the uncertainties associated with the physical quantities are epistemic rather than aleatory. Following Tardioli and Vasile (2016), we employ a parametric distribution approach to treat epistemic uncertainties.

In a parametric distribution approach, the actual probability density function is unknown, but belongs to a family of distributions parametrized with unknown parameters. For example, a family of Gaussian probability density function (\({\mathrm{pdf}}_\mathcal {N}\)) with mean \(\mu \in [0.4,0.5]\) and standard deviation \(\sigma \in [0.5,1.5]\) is

$$\begin{aligned}{}[\mathrm{pdf}_\mathcal {N}] = \{ \mathrm{pdf}_\mathcal {N}(u;\mu ,\sigma ): 0.4\le \mu \le 0.6, \ 0.05\le \sigma \le 0.15,\ \forall u\in \mathbb {R}\}, \end{aligned}$$
(2)

or using Gaussian cumulative distribution functions (\({\mathrm{cdf}}_\mathcal {N}\)),

$$\begin{aligned}{}[{\mathrm{cdf}}_\mathcal {N}] = \{ {\mathrm{cdf}}_\mathcal {N}(u;\mu ,\sigma ) : 0.4\le \mu \le 0.6,\ 0.05\le \sigma \le 0.15,\ \forall u\in \mathbb {R}\}\ . \end{aligned}$$
(3)

The families of distributions in Eq. (2) and Eq.(3), often called probability box, or, shortly, p-box, are illustrated in Fig. 1.

Fig. 1
figure 1

P-box (grey region) defined by a family of Gaussian pdf’s (left) and a family of Gaussian cdf’s (right)

A p-box of pdf’s (resp. cdf’s) is a set enclosing the minimum and maximum values of the pdf’s (resp. cdf’s). Particularly, the p-box of cdf’s can always be enclosed from above and from below by piece-wise curves, called, respectively, upper expectation \(E_u\) and lower expectation \(E_l\) (Ferson et al. 2015):

$$\begin{aligned}{}[{\mathrm{cdf}}] = \{{\mathrm{cdf}}\,:\, E_l \le {\mathrm{cdf}} \le E_u\}\ . \end{aligned}$$
(4)

In the multivariate case, under the independence and non-correlation assumption, the joint probability distribution belongs to a p-box whose elements are the product of univariate pdf’s. For example, the p-box of two families of Gaussian pdf’s as in Eq. (2) can be written as

$$\begin{aligned} \big [\mathrm{pdf}_{\mathcal {N}_1\mathcal {N}_2}\big ] = \big \{\mathrm{pdf}_{\mathcal {N}_1} \times \mathrm{pdf}_{\mathcal {N}_2}\, : \, \mathrm{pdf}_{\mathcal {N}_1}\in \big [\mathrm{pdf}_{\mathcal {N}_1}\big ]\ \wedge \ \mathrm{pdf}_{\mathcal {N}_2}\in \big [\mathrm{pdf}_{\mathcal {N}_2}\big ] \big \}\ . \end{aligned}$$
(5)

A similar definition holds for the product of two families of Gaussian cdf’s.

Table 1 Distribution families associated with the uncertain variables defining the Yarkovsky effect for Apophis’ 2013 scenario

For the Farnocchia et al. (2013a) scenario, the families of distributions associated with Apophis’ physical quantities describing the Yarkovsky effect are reported in Table 1. To enhance the variability in the impact probability, we purposely considered a wide range of parameters, thus not necessarily representing the most realistic scenario.

  • Diameter.Delbò et al. (2007a) estimate a diameter \(D=(270\pm 60)\) m, while Müller et al. (2014) derive greater values \(D=375^{+14}_{-10}\) m. Licandro et al. (2016) confirm a diameter in range 380–393 m. We model the diameter uncertainty with normal distributions with mean between 270 and 385 m, and standard deviation between 5 and 60 m.

  • Slope parameter.Pravec et al. (2014) assume a slope parameter \(G=0.24\pm 0.11\) for S- and Q-type asteroids. Mommert et al. (2014) use \(G=0.18\pm 0.13\), obtained averaging all G measurements of asteroids from JPL Small-Body Database Search EngineFootnote 3. Farnocchia et al. (2013a) assumed \(G = 0.15\pm 0.05\). Therefore, we choose normal distributions with mean between 0.15 and 0.24, and standard deviation between 0.05 and 0.15.

  • Geometric albedo. From polarimetric observations, Delbò et al. (2007a) find a geometric albedo \(p_V=0.33\pm 0.08\). Müller et al. (2014) obtain \(p_V=0.30^{+0.05}_{-0.06}\) from far-infrared observations with ESA’s Herschel space observatory. Recent thermal infrared observations put \(p_V\) in the range 0.24–0.33 (Licandro et al. 2016), in accordance with radar observations that gives \(p_V=0.35\pm 0.10\) (Brozović et al. 2018). Farnocchia et al. (2013a) used \(p_V = 0.23\pm 0.02\). Therefore, we use lognormal distributions whose corresponding Gaussian distributions have mean between 0.23 and 0.35, and standard deviation between 0.02 and 0.10. Then, the Bond albedo can be derived from the slope parameter and the geometric albedo through the formula \(A=(0.29 + 0.684 G)\,p_V\) (Bowell et al. 1989).

  • Bulk density. From the grain density and total porosity reported in Binzel et al. (2009), Farnocchia et al. (2013a) obtained a lognormal distribution with mean \(2.2~\hbox {g/cm}^3\) and variance \(0.3~\hbox {g}^2/\hbox {cm}^6\). However, Apophis is a Sq-type asteroid and the typical density of this taxonomic class is \(2.78 \pm 0.85~\hbox {g/cm}^3\) (Carry 2012). Thus, we use lognormal distributions with mean between 2 and \(3~\hbox {g/cm}^3\), and standard deviation between 0.5 and \(0.85~\hbox {g/cm}^3\).

  • Thermal inertia. According to (Delbò et al. 2007b, Fig. 6), the thermal inertia \(\varGamma \) of near-Earth objects ranges between 100 and 1 000 \(\hbox {Jm}^{-2}\hbox {s}^{-0.5}\hbox {K}^{-1}\); thus, the uncertainty of \(\varGamma \) can be represented by \(\log _{10}(\varGamma )=2.5\pm 0.5\). Müller et al. (2014) use full range of \(\varGamma \) of 250–800 \(\hbox {Jm}^{-2}\hbox {s}^{-0.5}\hbox {K}^{-1}\), with best solution at \(\varGamma = 600\) \(\hbox {Jm}^{-2}\hbox {s}^{-0.5}\hbox {K}^{-1}\), giving \(\log _{10}(\varGamma )=2.7\pm 0.25\). Licandro et al. (2016) constrain the thermal inertia of Apophis to lie in the range 50–500 \(\hbox {Jm}^{-2}\hbox {s}^{-0.5}\hbox {K}^{-1}\), giving \(\log _{10}(\varGamma ) = 2.2\pm 0.5\). Farnocchia et al. (2013a) used a generic relationship between the thermal inertia and the diameter which led to \(\log _{10}(\varGamma )=2.65\pm 0.08\). Hence, we consider a normal distribution for \(\log _{10}(\varGamma )\) with mean between 2.2 and 2.7, and standard deviation between 0.08 and 0.5.

  • Rotational period. The uncertainty on the rotational period is very small and it does not affect the impact probability (Farnocchia et al. 2013a). Pravec et al. (2014) report a rotational period \(P_{\mathrm{rot}}=(30.56\pm 0.01)\) h; therefore, we used a normal distribution with this parameter.

  • Obliquity. In our scenario, we neglect Apophis’ complex rotation and retrograde motion (Pravec et al. 2014). We consider a simple discrete model with 4 bins in \(-1\le \cos \gamma \le 1\) and positive frequencies \(f_1,f_2,f_3,f_4\) such that \(f_1+\ldots +f_4=1\).

Table 2 Uncertainty space defined by the physical quantities involved in the Yarkovsky effect for Apophis’ 2013 scenario

In total, we have six epistemic variables and one aleatory variable. The uncertainty analysis under aleatory and epistemic uncertainty starts from the definition of the uncertainty space \(\mathcal {U}\), which is a seven-dimensional hyper-rectangle defined by the full ranges of Apophis’ physical parameters involved in the Yarkovsky effect (see Table 2). The underlying assumption is that of independence and non-correlation among the uncertain variables.

Farnocchia et al. (2013a) map the Yarkovsky-related semimajor axis drift onto the 2029 b-plane and its \(\zeta _\mathrm{2029}\) coordinate. Denoting this map as \(f:\mathcal {U}\rightarrow \mathbb {R}\), we want to assess the probability of the event \(\mathcal {A}=\{u\in \mathcal {U}:|f(u)-\zeta _o|\le w/2\}\), where \(\zeta _o\) is the center of a generic keyhole on the 2029 \(b\)-plane and w its width. If all uncertain variables are aleatory, then the probability of \(\mathcal {A}\) coincides with the impact probability associated with the keyhole \(\zeta _o\): \({\mathrm{IP}}={\mathrm{P}}(\mathcal {A})\). Instead, if the distribution parameters associated with the uncertain physical quantities vary into intervals, then the probability of \(\mathcal {A}\) is bound by a lower and upper value given by

$$\begin{aligned} {\mathrm{P}}_*(\mathcal {A}) = \min _{\mathbf {j}\in J}\int _{\mathcal {A}} \mathrm{pdf}_{\mathbf {j}}(\mathbf {u})\, \, \mathrm {d}\mathbf {u},\qquad {\mathrm{P}}^*(\mathcal {A}) = \max _{\mathbf {j}\in J}\int _{\mathcal {A}} \mathrm{pdf}_{\mathbf {j}}(\mathbf {u})\, \, \mathrm {d}\mathbf {u}, \end{aligned}$$
(6)

where \(\mathbf {u}=(D,G,p_V,\rho ,\varGamma ,\cos \gamma )\) is the epistemic variable vector,

$$\begin{aligned} \mathbf {j}=(\mu _1,\sigma _1,\ldots ,\mu _5,\sigma _5,f_1,f_2,f_3) \end{aligned}$$
(7)

is the unknown distribution parameter vector (the parameters defining the first six distribution families in Table 1), \(J=J_1\times \ldots J_{13}\) is the distribution parameter space with \(J_i\) the range defined in Table 1 for each \(i=1,\ldots 13\) and \(\mathrm{pdf}_\mathbf {j}\) is the joint probability distribution product varying in the p-box \([\mathrm{pdf}_\mathbf {j}:\mathbf {j}\in J]\) with

$$\begin{aligned} \mathrm{pdf}_\mathbf {j}\propto & {} \mathrm{pdf}_{\mathcal {N}(\mu _1,\sigma _1)}\times \mathrm{pdf}_{\mathcal {N}(\mu _2,\sigma _2)}\times \mathrm{pdf}_{\mathcal {LN}(\mu _3,\sigma _3)}\times \mathrm{pdf}_{\mathcal {LN}(\mu _4,\sigma _4)} \nonumber \\&\times \, \mathrm{pdf}_{\mathcal {N}(\mu _5,\sigma _5)}\times \mathrm{pdf}_{\mathcal {N}(\mu _{rot},\sigma _{rot})}\times \mathrm{pdf}_{\mathcal {D}_4(f_1,f_2,f_3,1-f_1-f_2-f_3)}, \end{aligned}$$
(8)

where \(\mathcal {N}\) is the Gaussian, \(\mathcal {LN}\) is the lognormal and \(\mathcal {D}_4\) the discrete 4-bin distribution. As a consequence, the impact probability associated with the keyhole \(\zeta _o\) is itself uncertain and the following inequalities hold

$$\begin{aligned} {\mathrm{IP}}_* = {\mathrm{P}}_*(\mathcal {A})\le {\mathrm{IP}}\le {\mathrm{P}}^*(\mathcal {A}) = {\mathrm{IP}}^*\ . \end{aligned}$$
(9)

The two optimization problems in Eq. (6) can be solved numerically by replacing the calculation of the exact integrals with approximated forms using a sampling scheme. In fact, given M sample points \(\mathbf {u}_1,\ldots ,\mathbf {u}_M\), each integral in Eq. (6) can be approximated as

$$\begin{aligned} \sum _{k=1}^{M} I_\mathcal {A}(\mathbf {u}_k)\, p_k,\quad p_k\propto \mathrm{pdf}_\mathbf {j}(\mathbf {u}_k),\quad \sum _{k=1}^M p_k = 1, \end{aligned}$$
(10)

where \(I_\mathcal {A}\) is the indicator function of the set \(\mathcal {A}\), that is 1 if \(\mathbf {u}_k\) belongs to \(\mathcal {A}\), 0 otherwise. In this work, we will use a scheme with \(10^{12}\) uniformly distributed Monte Carlo points.

4 Results

We consider Apophis’ 2013 scenario and the keyholes found by Farnocchia et al. (2013a) on the 2029 \(b\)-plane. The uncertainties for the physical parameters are reported in Tables 1 and 2. For two keyholes, we compute the maximum and minimum of the impact probability. One keyhole corresponds to an impact in 2036. This is the widest keyhole and is located in the tail of the probability distribution. The other keyhole corresponds to an impact in 2068. It is much smaller but within the core of the distribution and yielded the highest impact probability in Farnocchia et al. (2013a). These two keyholes require negative values of \(A_2\) in order to be possible. In this paper, we are considering all possible distributions for the obliquity (which determines the sign of \(A_2\)). Thus, the minimum value of IP is always zero and our uncertainty analysis gives an upper bound for IP rather than a proper interval.

Results are given in Table 3. To validate our method, for each keyhole, we compute a reference impact probability corresponding to fixed distributions derived by the ones used in Farnocchia et al. (2013a) (see Table 4). The distribution parameters corresponding to \({\mathrm{IP}}^*\) are found using multiple runs of a global optimization method proposed by Vasile et al. (2011) and called IDEA (inflationary differential evolution algorithm). The values, out of all the runs of IDEA, that give the highest impact probability are reported in Table 5 and illustrated in Fig. 2. Figure 3 shows the uncertainty region of \(\zeta _{2029}\) due to the uncertainties on the physical parameters associated with the Yarkovsky effect. The distributions associated with the reference and to the maximum impact probability for the 2036 and 2068 keyholes are also displayed.

Table 3 Upper bound of the impact probability (\(\hbox {IP}^*\)) for the 2036 and 2068 keyholes relative to the distribution families in Table 1
Table 4 Reference distribution parameters
Table 5 Distribution parameters corresponding to the maximum impact probability (\(\hbox {IP}^*\)) for the 2036 and 2068 keyholes, respectively

5 Discussion and conclusions

We presented an approach to include epistemic uncertainty in the physical model of the Yarkovsky effect and to estimate an upper bound for the impact probability of a potentially hazardous asteroid. To illustrate the method, we considered asteroid Apophis and the 2036 and 2068 keyholes as discussed by Farnocchia et al. (2013a). The maximum impact probability required the solution of an optimization problem with a multidimensional integral as objective function.

Our analysis was not meant to provide the most current and realistic hazard assessment for Apophis. Rather, for demonstration purpose, we chose a wide range of parameters that result in very conservative upper bounds for the impact probability. For example, if we had fixed the 4-bin distribution of the obliquity and set the frequencies to the reference values, we would have found a maximum impact probability of \(2.69\times 10^{-6}\) for the 2036 keyhole and of \(7.13\times 10^{-6}\) for the 2068 keyhole.

Fig. 2
figure 2

Probability boxes and probability density functions of the asteroid physical quantities for the 2036 and 2068 keyholes corresponding to \(\hbox {IP}^*\) and \(\hbox {IP}^{\mathrm{ref}}\)

Fig. 3
figure 3

Probability box on the 2029 b-plane associated with the distribution families in Table 1 in the pdf-space (left) and the cdf-space (right). The origin of the \(\zeta _{2029}\)-coordinate is set at 47659.24 km. The curves represent the distribution functions corresponding to \({\mathrm{IP}}^{\mathrm{ref}}\) and \(\hbox {IP}^*\) for the 2036 and 2068 keyholes

As more asteroids are discovered, the number of cases where the impact hazard assessment is affected by the Yarkovsky effect and its modeling is set to increase. The technique presented in this paper provides a rigorous framework to characterize the sensitivity of the impact probability to the physical model assumptions and in turn characterize its degree of variability.

The spin state of Apophis may change as a result of the 2029 encounter (Scheeres et al. 2005). As already discussed by Farnocchia et al. (2013a), such a change would slightly change the location of the keyholes, but have a small effect on the impact probability. Therefore, our analysis remains valid even in that case. Eventually, we remark that that the parametric distribution approach can be applied to more general families of distributions, for example, defined by a weighted combination of kernels.