Introduction: The Drake Equation

In 1961 the American astronomer Frank D. Drake tried to estimate the number N of communicating civilizations in the Milky Way galaxy by virtue of a simple equation now called the Drake equation. N was written as the product of seven factors, each a kind of filter, every one of which must be sizable for there to be a large number of civilizations:

Ns:

the number of stars in the Milky Way Galaxy

fp:

the fraction of stars that have planetary systems

ne:

the number of planets in a given system that are ecologically suitable for life

fl:

the fraction of otherwise suitable planets on which life actually arises

fi:

the fraction of inhabited planets on which an intelligent form of life evolves (Human History)

fc:

the fraction of planets inhabited by intelligent beings on which a communicative technical civilization develops (as we have it today); and

fL:

the fraction of planetary lifetime graced by a technical civilization (a totally unknown factor).

Written out, the equation reads

$$ N = N_s \cdot f_p \cdot n_e \cdot f_l \cdot f_i \cdot f_c \cdot f_L $$
(1)

All of the f’s are fractions, having values between 0 and 1; they will pare down the large value of Ns. To derive N we must estimate each of these quantities. We know a fair amount about the early factors in the equation, the number of stars and planetary systems. We know very little about the later factors, concerning the evolution of life, the evolution of intelligence or the lifetime of technical societies. In these cases our estimates will be little better than guesses.
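As a minimal sketch of how Eq. 1 is evaluated, the product can be written in a few lines of Python. Only the star count (350 billion) is a figure quoted in this paper; every other input below is a hypothetical placeholder chosen purely for illustration, and the names `drake_factors` and `drake_n` are ours.

```python
import math

# Illustrative inputs. Only Ns = 350 billion is quoted in this paper;
# all other values are hypothetical placeholders, not estimates.
drake_factors = {
    "Ns": 350e9,  # number of stars in the Galaxy
    "fp": 0.5,    # fraction of stars with planetary systems (placeholder)
    "ne": 1.0,    # suitable planets per system (placeholder)
    "fl": 0.5,    # fraction on which life arises (placeholder)
    "fi": 0.2,    # fraction on which intelligence evolves (placeholder)
    "fc": 0.2,    # fraction developing communication (placeholder)
    "fL": 1e-6,   # fraction of planetary lifetime (placeholder)
}

def drake_n(factors):
    """Classical Drake equation (1): N is the plain product of the seven factors."""
    return math.prod(factors.values())

N = drake_n(drake_factors)
print(f"N = {N:.4g} communicating civilizations")
```

Because N is a plain product, shrinking any single factor by a factor of ten shrinks N by the same factor, which is why every filter must be sizable for N to be large.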

In the fifty years that have elapsed since Drake proposed his equation, a number of scientists and writers have tried either to improve it or to criticize it in many ways. For instance, in 1980 C. Walters, R. A. Hoover, and R. K. Kotra (Walters et al. 1980) suggested inserting a new parameter in the equation to take interstellar colonization into account. In 1981 S. G. Wallenhorst (Wallenhorst 1981) tried to prove that there should be an upper limit of about 100 to the number N. In 2004 L. V. Ksanfomality (Ksanfomality 2004) again asked for more new factors to be inserted into the Drake equation, this time in order to make it compatible with the peculiarities of planets of Sun-like stars. The temporal aspect of the Drake equation was also stressed by M. M. Ćirković (Ćirković 2004). But while these authors were concerned with improving the Drake equation, others simply did not consider it useful and preferred to forget about it, like M. J. Burchell (Burchell 2006).

Also, it has been correctly pointed out that the habitable part of the Galaxy is probably much smaller than the entire volume of the Galaxy itself (the important relevant references are Gonzalez et al. 2001; Lineweaver et al. 2004; and Gonzalez 2005). For instance, it might be a sort of torus centered around the so-called “corotation circle”, i.e. a circle around the Galactic Bulge such that stars orbiting around the Bulge and within such a torus never fall inside the dangerous spiral arms of the Galaxy, where supernova explosions would probably fry any living organism before it could develop to the human level or beyond. Fortunately for Humans, the orbit of the Sun around the Bulge is just a circle staying within this torus for 5 billion years or more (Marochnik and Mukhin 1988; Balazs 1988).

In all cases the final result about N has always been a sheer number, i.e., a positive integer number ranging from 1 to thousands or millions. This “integer or real number” aspect of all variables making up the Drake equation is what this author regarded as “too simplistic”. He extended the Drake equation so as to embrace Statistics in his 2008 paper (Maccone 2008). This paper was later published in Acta Astronautica (Maccone 2010a), and more mathematical consequences were derived in Maccone (2010b) and Maccone (2011).

Statistical Drake Equation

Consider Ns, the number of stars in the Milky Way Galaxy, i.e. the first independent variable in the Drake equation (1). Astronomers tell us that there should be about 350 billion stars in the Galaxy. Of course, nobody has counted all stars in the Galaxy! There are too many practical difficulties preventing us from doing so: just to name one, the dust clouds that don’t allow us to see even the Galactic Bulge (central region of the Galaxy) in visible light, although we may “see it” at radio frequencies like the famous neutral hydrogen line at 1420 MHz. So, it doesn’t really make much sense to say that \( N_s = 350 \times 10^{9} \), or similar fanciful exact integer numbers. It is more scientific to say that the number of stars in the Galaxy is 350 billion plus or minus, say, 50 billion (or whatever values the astronomers may regard as more appropriate).

It thus makes sense to REPLACE each of the seven independent variables in the Drake equation (1) by a MEAN VALUE (350 billion, in the above example) PLUS OR MINUS A CERTAIN STANDARD DEVIATION (50 billion, in the above example).

By doing so, we have made a step ahead: we have abandoned the too-simplistic Eq. 1 and replaced it by something more sophisticated and scientifically serious: the STATISTICAL Drake equation. In other words, we have transformed the simplistic classical Drake equation (1) into a statistical tool capable of investigating a host of facts hardly known to us in detail. In other words still:

  1) We replace each independent variable in (1) by a RANDOM VARIABLE, labelled \( D_i \) (from Drake);

  2) We assume the MEAN VALUE of each \( D_i \) to be the same numerical value previously attributed to the corresponding input variable in (1);

  3) But now we also ADD A STANDARD DEVIATION \( \sigma _{{D_{i} }} \) on each side of this mean value, as provided by the knowledge obtained by scientists in the discipline covered by each \( D_i \).

Having so done, we wonder: how can we find out the PROBABILITY DISTRIBUTION for each \( D_i \)? For instance, shall that be a Gaussian, or what? This is a difficult question, for nobody knows, for instance, the probability distribution of the number of stars in the Galaxy, not to mention the probability distribution of the other six variables in the Drake equation (1). In 2008, however, this author found a way to get around this difficulty, as explained in the next section.

The Statistical Distribution of N is Lognormal

The solution to the problem of finding the analytical expression for the probability density function of the positive random variable N is as follows:

  1) Take the natural logs of both sides of the statistical Drake equation (1). This changes the product into a sum.

  2) The mean values and standard deviations of the logs of the random variables \( D_i \) may all be expressed analytically in terms of the mean values and standard deviations of the \( D_i \) (Maccone 2008).

  3) The Central Limit Theorem (CLT) of statistics states that (loosely speaking) if you have a SUM of independent random variables, each of which is ARBITRARILY DISTRIBUTED (hence, also including uniformly distributed), then, when the number of terms in the sum increases indefinitely (i.e. for an infinitely long sum of random variables), the SUM RANDOM VARIABLE APPROACHES A GAUSSIAN.

  4) Thus, ln(N) approaches a Gaussian.

  5) Namely, N approaches the LOGNORMAL DISTRIBUTION (as discovered back in the 1870s by Sir Francis Galton). Table 1 shows the most important statistical properties of a lognormal.

  6) The mean value and standard deviation of this lognormal distribution of N may both be expressed analytically in terms of the mean values and standard deviations of the logs of the \( D_i \) already found previously, as shown in Table 1.

Table 1 Summary of the properties of the lognormal distribution that applies to the random variable N = number of ET communicating civilizations in the Galaxy

For all the relevant mathematical proofs, more mathematical details and a few numerical examples of how the Statistical Drake Equation works, please see Maccone (2010a).
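The steps above can be sketched numerically. The example below assumes, purely for illustration, that each Drake factor is uniformly distributed with a given mean and standard deviation (the uniform case is one of the arbitrary distributions the CLT admits). Only the pair for Ns (350 ± 50 billion) comes from the text; all other (mean, std) pairs are hypothetical placeholders, and the helper names are ours. With only seven terms the Gaussian limit holds approximately, not exactly.

```python
import math
import random
import statistics

def uniform_bounds(mean, std):
    """Bounds [a, b] of the uniform distribution with the given mean and std."""
    half_width = math.sqrt(3.0) * std
    a, b = mean - half_width, mean + half_width
    if a <= 0.0:
        raise ValueError("lower bound must be positive to take logs")
    return a, b

def log_moments_uniform(a, b):
    """Exact mean and variance of ln(D) for D uniform on [a, b], 0 < a < b."""
    la, lb = math.log(a), math.log(b)
    mean_log = (b * (lb - 1.0) - a * (la - 1.0)) / (b - a)
    second = (b * (lb * lb - 2.0 * lb + 2.0)
              - a * (la * la - 2.0 * la + 2.0)) / (b - a)
    return mean_log, second - mean_log ** 2

# (mean, std) per factor. Only the first pair is quoted in the text;
# the remaining six are hypothetical placeholders.
inputs = [
    (350e9, 50e9),   # Ns
    (0.5, 0.1),      # fp
    (1.0, 0.3),      # ne
    (0.5, 0.1),      # fl
    (0.2, 0.05),     # fi
    (0.2, 0.05),     # fc
    (1e-6, 3e-7),    # fL
]

bounds = [uniform_bounds(m, s) for m, s in inputs]
moments = [log_moments_uniform(a, b) for a, b in bounds]

# Steps 1-2: ln(N) is a sum, so its mean and variance are sums too.
mu = sum(m for m, _ in moments)
sigma2 = sum(v for _, v in moments)

# Monte Carlo check of the analytic mu and sigma^2 of ln(N).
rng = random.Random(0)
log_n = [sum(math.log(rng.uniform(a, b)) for a, b in bounds)
         for _ in range(20000)]
sample_mu = statistics.fmean(log_n)
sample_sigma2 = statistics.pvariance(log_n)

print(f"analytic  mu = {mu:.4f}, sigma^2 = {sigma2:.4f}")
print(f"simulated mu = {sample_mu:.4f}, sigma^2 = {sample_sigma2:.4f}")
print(f"lognormal mean of N = {math.exp(mu + sigma2 / 2.0):.1f}")
```

The last line uses the standard lognormal mean, exp(μ + σ²/2); a histogram of `log_n` would show the roughly bell-shaped profile that step 4 describes.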

Darwinian Evolution as Exponential Increase of the Number of Living Species

Consider now Darwinian Evolution. To assume that the number of species increased exponentially over the 3.5 billion years of evolutionary time span is certainly a gross oversimplification of the real situation, as proven, for instance, by Rhode and Muller (2005). However, we will assume this exponential increase of the number of living species in time just in order to cast the theory into a mathematically simple and fruitful form. Later we will do better, we hope.

In other words, we assume that 3.5 billion years ago there was on Earth only one living species, whereas now there may be (say) 50 million living species or more (see, for instance, the site http://en.wikipedia.org/wiki/Species). Note that the actual number of species currently living on Earth does not really matter as a number for us: we just want to stress the exponential character of the growth of species. Thus, we shall assume that the number of living species on Earth increases in time as E(t) (standing for “exponential in time”):

$$ E(t) = A\;{e^{{B\,t}}} $$
(2)

where A and B are two positive constants that we will soon determine numerically. Let us now adopt the convention that the current epoch corresponds to the origin of the time axis, i.e. to the instant t = 0. This means that all the past epochs of Darwinian Evolution correspond to negative times, whereas the future ahead of us (including finding ETs) corresponds to positive times. Setting t = 0 in (2), we immediately find

$$ E(0) = A $$
(3)

showing that the constant A equals the number of living species on Earth right now. We shall assume

$$ A = 50\,{\text{million}}\,{\text{species}} = 5 \cdot {10^7}{\text{species}}{.} $$
(4)

To also determine the constant B numerically, consider the two values of the exponential (2) at two different instants \( t_1 \) and \( t_2 \), with \( t_1 < t_2 \), that is

$$ \left\{ {\begin{array}{*{20}{c}} {E\left( {{t_1}} \right) = A\,{e^{{B\,{t_1}}}}} \hfill \\ {E\left( {{t_2}} \right) = A\,{e^{{B\,{t_2}}}}.} \hfill \\ \end{array} } \right. $$
(5)

Dividing the lower equation by the upper one, A disappears and we are left with an equation in B only:

$$ \frac{{E\left( {{t_2}} \right)}}{{E\left( {{t_1}} \right)}} = {e^{{B\left( {{t_2} - {t_1}} \right)}}}. $$
(6)

Solving this for B yields

$$ B = \frac{{\ln \left( {E\left( {{t_2}} \right)} \right) - \ln \left( {E\left( {{t_1}} \right)} \right)}}{{{t_2} - {t_1}}}. $$
(7)

We may now impose the initial condition stating that 3.5 billion years ago there was just one species on Earth, the first one (whether this was RNA is unimportant in the present simple mathematical formulation):

$$ \left\{ {\begin{array}{*{20}{c}} {{t_1} = - 3.5 \cdot {{10}^9}{\text{years}}} \\ {E\left( {{t_1}} \right) = 1\quad {\text{whence}}\quad { \ln }\left( {E\left( {{t_1}} \right)} \right) = \ln (1) = 0.} \\ \end{array} } \right. $$
(8)

The final condition is of course that today (\( t_2 = 0 \)) the number of species equals A given by (4). Upon substituting both (4) and (8) into (7), the latter becomes:

$$ B = - \frac{{\ln \left( {E\left( {{t_2}} \right)} \right)}}{{{t_1}}} = - \frac{{\ln \left( {5 \cdot {{10}^7}} \right)}}{{ - 3.5 \cdot {{10}^9}\,{\text{years}}}} = \frac{{5.065 \cdot {{10}^{{ - 9}}}}}{{\text{year}}} = \frac{{1.605 \cdot {{10}^{{ - 16}}}}}{{\sec }}. $$
(9)

Having thus determined the numerical values of both A and B, the exponential in (2) is fully specified. This curve is plotted in Fig. 1 over the last billion years only, rather than over the full range between −3.5 billion years and now.
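The arithmetic behind Eq. 9 can be checked with a short script. This is a sketch under one stated assumption: the year-to-second conversion uses a year of 365.25 days, which the text does not specify but which reproduces the quoted value of B. The names `B_per_year`, `B_per_second` and `E` are ours.

```python
import math

# Assumption: one year = 365.25 days (the conversion is not stated in the text).
SECONDS_PER_YEAR = 365.25 * 24 * 3600

A = 5e7        # species alive today, Eq. (4)
t1 = -3.5e9    # years; epoch of the first species, Eq. (8)

# Eq. (9): B = -ln(E(t2)) / t1, with E(t2) = A at t2 = 0.
B_per_year = -math.log(A) / t1
B_per_second = B_per_year / SECONDS_PER_YEAR

def E(t_years):
    """Darwinian exponential of Eq. (2), with t measured in years."""
    return A * math.exp(B_per_year * t_years)

print(f"B = {B_per_year:.4e} per year = {B_per_second:.4e} per second")
print(f"E(t1) = {E(t1):.6f} species, E(0) = {E(0.0):.3e} species")
```

By construction the curve passes through one species at t = −3.5 billion years and through A species today.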

Fig. 1 Darwinian Exponential curve representing the growing number of species on Earth up to now

Introducing the “Darwin” (d) Unit, Measuring the Amount of Evolution That a Given Species Reached

In all sciences “to measure is to understand”. In physics and chemistry this is done by virtue of units such as the meter, second, kilogram, coulomb, etcetera. So, it appears useful to introduce a new unit measuring the degree of evolution that a certain species has reached at a certain time t of Darwinian Evolution, and the obvious name for such a new unit is the “Darwin”, denoted by a lower case “d”. For instance, if we adopt the exponential evolution curve described in the previous section, we might say that the dominant species on Earth right now (Humans) have reached an evolution level of 50 million darwins.

How many darwins may an alien civilization have already reached? Certainly more than 50 million, i.e. more than 50 Md, but we will not find out until SETI succeeds for the first time.

We are not going to discuss this notion of measuring the “amount of evolution” any further, since we are aware that endless discussions might come out of it. But it is clear to us that such a new measuring unit (and ways to measure it for different species) will sooner or later have to be introduced to make Evolution a fully quantitative science.

Darwinian Exponential as the Envelope of All b-Lognormals Representing Each a Different Species Started by Evolution at the Time t = b > 0 (Cladistics)

How is it possible to “match” the Darwinian exponential curve with the lognormals appearing in the Statistical Drake Equation?

Our answer to this question is to let the Darwinian exponential become the ENVELOPE of the b-lognormals representing the cladistic branches, i.e. the new species that were produced by Evolution at different times as Evolution unfolded.

Let us now have a look at Fig. 2 hereafter.

Fig. 2 Darwinian Exponential as the ENVELOPE of b-lognormals. Each b-lognormal is a lognormal starting at a time (t = b = birth time) larger than zero and represents a different species “born” at time b of the Darwinian Evolution

The envelope shown in Fig. 2 is NOT really an envelope in the strictly mathematical sense explained in calculus textbooks. However, it is “nearly the same thing in practice” because it actually is the geometric LOCUS OF THE PEAKS of all b-lognormals. We shall now explain this in detail.

First of all, let us write down the equation of the b-lognormal, i.e. of the lognormal starting at any positive instant t = b > 0 (while ordinary lognormals all start just at zero):

$$ \left\{ {\begin{array}{*{20}{c}} {{\text{b\_lognormal}}\left( {t,\mu, \sigma, b} \right) = \frac{1}{{\sqrt {{2\pi }} \sigma \left( {t - b} \right)}}{e^{{ - \frac{{{{\left( {\ln (t - b) - \mu } \right)}^2}}}{{2{\sigma^2}}}}}}} \\ {holding\;for\;t > b\;and\;up\;to\;t = \infty .} \\ \end{array} } \right. $$
(10)

Then, notice that its PEAK falls at the abscissa p and ordinate P given, respectively, by (see the 8th and 9th lines in Table 1):

$$ \left\{ {\begin{array}{*{20}{c}} {p = b + {e^{{\mu - {\sigma^2}}}}{\text{ = b\_lognormal\_peak\_abscissa,}}} \hfill \\ {P = \frac{{{e^{{\frac{{{\sigma^2}}}{2} - \mu }}}}}{{\sqrt {{2\pi }} \sigma }} = {\text{b\_lognormal\_peak\_ordinate}}{.}} \hfill \\ \end{array} } \right. $$
(11)
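Equations 10 and 11 can be cross-checked numerically. The sketch below evaluates the b-lognormal at the claimed peak abscissa and compares the result with the claimed peak ordinate. The values of μ, σ and b are hypothetical, and returning zero for t ≤ b is our convention, since Eq. 10 is only stated for t > b.

```python
import math

def b_lognormal(t, mu, sigma, b):
    """b-lognormal pdf of Eq. (10). Returns 0 for t <= b (our convention)."""
    if t <= b:
        return 0.0
    x = t - b
    gauss = math.exp(-(math.log(x) - mu) ** 2 / (2.0 * sigma ** 2))
    return gauss / (math.sqrt(2.0 * math.pi) * sigma * x)

def b_lognormal_peak(mu, sigma, b):
    """Peak abscissa p and ordinate P of Eq. (11)."""
    p = b + math.exp(mu - sigma ** 2)
    P = math.exp(sigma ** 2 / 2.0 - mu) / (math.sqrt(2.0 * math.pi) * sigma)
    return p, P

# Hypothetical parameters, chosen only to exercise the formulas.
mu, sigma, b = 0.5, 0.4, 2.0
p, P = b_lognormal_peak(mu, sigma, b)

print(f"peak abscissa p = {p:.6f}")
print(f"peak ordinate P = {P:.6f}")
print(f"pdf evaluated at p = {b_lognormal(p, mu, sigma, b):.6f}")
```

Evaluating the pdf slightly to either side of p gives smaller values, which is consistent with p being the location of the maximum.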

Can we MATCH the second of Eqs. (11) with the Darwinian Exponential (2)? Yes, if we require \( P = A\,e^{B\,p} \) at time t = p, that is, if we set:

$$ \left\{ {\begin{array}{*{20}{c}} {A = \frac{1}{{\sqrt {{2\pi }} \sigma }}} \\ {{e^{{B\;p}}} = {e^{{\frac{{{\sigma^2}}}{2} - \mu }}}} \\ \end{array} } \right.\;\quad {\text{that}}\;{\text{is}}\quad \left\{ {\begin{array}{*{20}{c}} {A = \frac{1}{{\sqrt {{2\pi }} \sigma }}} \\ {B\;p = \frac{{{\sigma^2}}}{2} - \mu .} \\ \end{array} } \right. $$
(12)

The last system of two equations may then be inverted, i.e. exactly solved with respect to μ and σ:

$$ \left\{ {\begin{array}{*{20}{c}} {\sigma = \frac{1}{{\sqrt {{2\pi }} A}}} \\ {\mu = - B\;p + \frac{1}{{4\pi {A^2}}}} \\ \end{array} } \right. $$
(13)

showing that each b-lognormal in Fig. 2 (i.e. its μ and σ) is perfectly determined by the Darwinian Exponential (namely by A and B) plus a precise value of the birth time b. In other words, this is a one-parameter (the parameter is b) family of curves that are all constrained between the time axis and the Darwinian Exponential.
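A short numerical sketch of the inversion (13) follows. For simplicity it is parameterised by the peak time p, because Eq. 13 is written in terms of p; the birth time b is then recovered from the first of Eqs. (11). The values of A, B and p are toy numbers chosen for readability, not the Darwinian values of Eqs. (4) and (9), and the function names are ours.

```python
import math

def mu_sigma_from_envelope(A, B, p):
    """Eq. (13): the mu and sigma of the b-lognormal whose peak lies on A*exp(B*p)."""
    sigma = 1.0 / (math.sqrt(2.0 * math.pi) * A)
    mu = -B * p + 1.0 / (4.0 * math.pi * A ** 2)
    return mu, sigma

def birth_time(mu, sigma, p):
    """Birth time b from the first of Eqs. (11): p = b + exp(mu - sigma^2)."""
    return p - math.exp(mu - sigma ** 2)

# Toy values (hypothetical), not the Darwinian A and B.
A, B, p = 0.8, 0.3, 2.0

mu, sigma = mu_sigma_from_envelope(A, B, p)
b = birth_time(mu, sigma, p)

# Peak ordinate from the second of Eqs. (11), to compare with the envelope.
P = math.exp(sigma ** 2 / 2.0 - mu) / (math.sqrt(2.0 * math.pi) * sigma)
envelope_at_p = A * math.exp(B * p)

print(f"mu = {mu:.6f}, sigma = {sigma:.6f}, b = {b:.6f}")
print(f"peak ordinate P = {P:.6f}, envelope A*exp(B*p) = {envelope_at_p:.6f}")
```

The two printed ordinates agree, which is the matching condition (12) expressed numerically.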

Clearly, as one moves to higher values of b, the peaks of these curves become narrower and narrower and higher and higher. For instance, Fig. 3 shows the two b-lognormals corresponding to the two largest mass extinctions on Earth, which occurred about 250 and 64 million years ago, respectively (end of the Paleozoic and Mesozoic eras, respectively).

Fig. 3 Darwinian Exponential as the ENVELOPE of two important b-lognormals: those positioned at the P/T and K/T mass extinctions, ending the Primary (or Paleozoic) Era and the Secondary (or Mesozoic) Era, respectively

Cladogram Branches Made up by Increasing, Decreasing or Stable (Horizontal) Exponential Arches

It is now possible to understand how cladograms shape up in our mathematical theory of Evolution: they depart from the time axis at the birth time (b) of the new species and then either:

  1) INCREASE if the b-lognormal of the i-th new species has

$$ \left\{ {\begin{array}{*{20}{c}} {{A_i} = \frac{1}{{\sqrt {{2\pi }} {\sigma_i}}}} \\ {{B_i} = \frac{{\frac{{{\sigma_i}^2}}{2} - {\mu_i}}}{{{p_i}}} > 0\;\;{\text{that}}\;{\text{is}}\,\,\frac{{{\sigma_i}^2}}{2} > {\mu_i}.} \\ \end{array} } \right. $$
(14)
  2) DECREASE if the same b-lognormal has

$$ \left\{ {\begin{array}{*{20}{c}} {{A_i} = \frac{1}{{\sqrt {{2\pi }} {\sigma_i}}}} \\ {{B_i} = \frac{{\frac{{{\sigma_i}^2}}{2} - {\mu_i}}}{{{p_i}}} < 0\;\;{\text{that}}\;{\text{is}}\,\,\frac{{{\sigma_i}^2}}{2} < {\mu_i}.} \\ \end{array} } \right. $$
(15)
  3) STAY CONSTANT (i.e. rather than exponential arches we have horizontal segments) for all time values for which the i-th b-lognormal is characterized by:

$$ \left\{ {\begin{array}{*{20}{c}} {{A_i} = \frac{1}{{\sqrt {{2\pi }} {\sigma_i}}}} \\ {{B_i} = 0\;\;{\text{that}}\;{\text{is}}\,\,\frac{{{\sigma_i}^2}}{2} = {\mu_i}.} \\ \end{array} } \right. $$
(16)

This case really is the most “routine” one, inasmuch as the given species neither increases nor decreases in time, but rather, for generations and generations, “the parents are born, mate, babies are born, the parents die, the babies mate, and so on endlessly”. This we call a STATIONARY species. And, mathematically, the surprise is that a STATIONARY species is no longer described by b-lognormals, but rather by the new probability density found by substituting the second of Eqs. (16), i.e. \( \mu = \sigma^2 /2 \), into (10), with the result that (10) becomes the NEW STATIONARY pdf:

$$ f_{\text{NoEv}} \left( {t,\sigma, b} \right) = \frac{1}{{\sqrt {{2\pi }} \sigma \sqrt {{t - b}} }}{e^{{ - \frac{{{{\left( {\ln (t - b)} \right)}^2}}}{{2{\sigma^2}}}}}}{e^{{ - \frac{{{\sigma^2}}}{8}}}}. $$
(17)

In plain words, this is the pdf for species that undergo NO EVOLUTION at all! Clearly, more words and examples would be needed to better clarify our theory, but we have no space for that here, just as we had only 12 min for our talk!
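The substitution leading to Eq. 17 can be verified numerically: a b-lognormal (10) evaluated with μ = σ²/2 should coincide with the NoEv density at every t > b. The sketch below checks this at a few sample points. The values of σ and b are hypothetical, and the function names are ours.

```python
import math

def b_lognormal(t, mu, sigma, b):
    """b-lognormal pdf of Eq. (10), defined here for t > b only."""
    x = t - b
    gauss = math.exp(-(math.log(x) - mu) ** 2 / (2.0 * sigma ** 2))
    return gauss / (math.sqrt(2.0 * math.pi) * sigma * x)

def f_noev(t, sigma, b):
    """Stationary (NoEv) pdf of Eq. (17), defined here for t > b only."""
    x = t - b
    gauss = math.exp(-math.log(x) ** 2 / (2.0 * sigma ** 2))
    damping = math.exp(-sigma ** 2 / 8.0)
    return gauss * damping / (math.sqrt(2.0 * math.pi) * sigma * math.sqrt(x))

# Hypothetical parameters, chosen only to exercise the formulas.
sigma, b = 0.7, 1.0
sample_times = [1.1, 1.5, 2.0, 3.0, 6.0]

for t in sample_times:
    lhs = b_lognormal(t, sigma ** 2 / 2.0, sigma, b)  # Eq. (10) with mu = sigma^2/2
    rhs = f_noev(t, sigma, b)                         # Eq. (17)
    print(f"t = {t:4.1f}: Eq.(10) -> {lhs:.8f}, Eq.(17) -> {rhs:.8f}")
```

The agreement follows from expanding the square in the exponent of (10): the cross term contributes a factor \( \sqrt{t - b} \), which turns the prefactor 1/(t − b) into \( 1/\sqrt{t - b} \), and the constant term gives \( e^{-\sigma^2/8} \).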

Table 2 hereafter shows the main statistical properties of this new NoEv probability density function. They were evaluated by the author by virtue of a suitable Maxima code, where Maxima is the symbolic manipulator described at the site http://maxima.sourceforge.net/.

Table 2 Summary of the statistical properties of the new random variable NoEv given by Eq. 17 and representing the STATIONARY LIFE of a new species born at time b and undergoing NO EVOLUTION thereafter

KLT-Filtering in the Hilbert Space and Darwinian Selection Are “the Same Thing” In Our Theory

As a glance at the future developments of our mathematical theory of Darwinian Evolution, let us now recall that the KLT is a principal axes transformation in the Hilbert space spanned by the eigenfunctions of the autocorrelation of a noise plus a possible signal in it. Put this way, the KLT (standing for Karhunen-Loève transform) may look “hard to understand” (Maccone 2010c; Szumski 2011). But we wish to describe in simple words how it amounts to the well-known Darwinian Selection process. In fact, consider a Euclidean space with a large number N of dimensions. Specifying a point there means giving N coordinates. Each coordinate we assume to be a function of the body that Humans have in common with other animals, but that other animals may OR MAY NOT (because they are too primordial) have in common with Humans. Then, the axis representing Humans in this N-space has the largest variance of the set of points around it because Humans have ALL functions. Monkeys have NEARLY the same number of functions as Humans but in practice they have FEWER of them. Thus, the Monkey axis in the N-space has the SECOND LARGEST VARIANCE around it. In the mathematical jargon of the KLT this is re-phrased by saying that Humans are the DOMINANT = FIRST EIGENVALUE in the KLT of the N-space, whereas Monkeys are the SECOND EIGENVALUE, and so on for lower species, which are really almost “noise” (i.e. rubbish) when compared to Humans.
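To make the phrase “dominant eigenvalue” concrete, here is a toy, finite-dimensional sketch of a principal axes transformation: it builds the covariance matrix of a few points and finds the axis of largest variance. Power iteration is used only because it is short and self-contained; it is our choice for illustration, not the method of Maccone (2010c), and the four sample points are hypothetical.

```python
import math

def covariance_matrix(points):
    """Covariance matrix of a list of equal-length coordinate tuples."""
    n, dim = len(points), len(points[0])
    means = [sum(pt[k] for pt in points) / n for k in range(dim)]
    return [[sum((pt[i] - means[i]) * (pt[j] - means[j]) for pt in points) / n
             for j in range(dim)] for i in range(dim)]

def dominant_eigenpair(matrix, iterations=200):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    dim = len(matrix)
    v = [1.0] + [0.0] * (dim - 1)  # start vector; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(dim)) for i in range(dim)]
    eigenvalue = sum(v[i] * mv[i] for i in range(dim))  # Rayleigh quotient
    return eigenvalue, v

# Hypothetical 2-D data: wide spread along (1, 1), narrow spread along (1, -1).
points = [(1.0, 1.0), (-1.0, -1.0), (0.5, -0.5), (-0.5, 0.5)]

cov = covariance_matrix(points)
first_eigenvalue, first_axis = dominant_eigenpair(cov)

print("covariance matrix:", cov)
print(f"dominant eigenvalue = {first_eigenvalue:.6f}")
print("dominant axis =", [round(c, 6) for c in first_axis])
```

In this toy data the dominant axis is the diagonal direction along which the points spread most; the second eigenvalue, belonging to the perpendicular direction, is four times smaller.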

Now consider filtering, i.e. extracting a tiny signal from thick noise by virtue of the KLT (this works much better with the KLT than with the trivial FFT used by engineers all over the world, but that is another story, for which the reader may see Maccone 2010c).

So, just as Darwinian Evolution FILTERED HUMANS OUT OF A LOT OF “NOISE” (i.e. other lower-level living organisms), so the KLT applied to the above large N-dimensional space may DESCRIBE MATHEMATICALLY the SELECTION carried out by Darwinian evolution across 3.5 billion years.

But that requires another paper at least, or, better, the new book entitled “Mathematical SETI” that this author is now writing.

Conclusion

Evolution, as it occurred on Earth over the last 3.5 billion years, is just one chapter of the larger book encompassed by the Drake equation, which covers a time span of 10 billion years or so.

In this paper we sought to outline a unified and simple mathematical vision of both Evolution and SETI, as the title of this paper says.

Our vision is based on the lognormal probability distribution characterizing N in the Statistical Drake Equation.

We have shown that the envelope of such lognormal distributions “changing in time” (b-lognormals) may account for the exponential increase of the number of living species on Earth over 3.5 billion years.