Introduction

The origin of living systems on Earth is a mystery whose answer will likely never be definitively known. This is not because we cannot study origins (many fine researchers have devoted their lives to this subject), but rather because there are necessary limits to human scientific knowledge. Like many events in evolutionary history, the origin of life was likely contingent on conditions that cannot be known with certainty and may even have been swayed by chance astronomical events (meteor impacts, showers of matter from comets) that will be difficult if not impossible to recapitulate. Thus, most scientists remain agnostic about the exact mechanism of life's origins, and rightly so.

However, this ambiguous starting point should not be taken as a rebuff of the science that underlies our understanding of origins. Evolutionary biologists will sometimes suggest that origins is a subject distinct from the evolutionary history of life, but in so doing they reveal themselves as closet vitalists who assume that life is different from nonlife. Rather, origins is merely one stage of the grand history of replicators, which have elaborated themselves over time from simple strings of nucleic acids to complex strings of nucleic acids surrounded by the diverse biological bags that we see today.

In understanding origins, I believe there are several key issues that need to be dealt with, starting with a definition (or lack of definition) for life itself. From there, we consider possible mechanisms for origins and finally deal, in a rough and qualitative way, with the probabilities for the likely terrestrial mechanism. In each section, there are many opportunities to add new questions to an already long list and to recognize that the subject of origins is fraught with mystical significance. Readers are welcome to question assumptions and conclusions and to initiate analyses of their own that can contribute to the field. However, as with all science, such questions should be bounded by naturalism, to avoid the temptation to slide into the supernatural just because the natural is often frustrating.

What Is Life?

One issue with identifying life's origins is that no one really seems to know what life is. There have been varying definitions over the years, most of which have focused on the properties of living systems that we know (rather than on a more fundamental definition of life) and none of which has stood up to intellectual scrutiny. From my own vantage, there is a simple reason for this: there is no such thing as life; life is a term for poets, not for scientists. It is baggage from a vitalistic era that has little meaning in a more scientific era. That said, we can utilize the definition put forth by Gerry Joyce for NASA: Life is a self-sustaining system capable of Darwinian evolution. While this definition is more than a bit circular (“Life is the thing that we see, and the properties that we see are the properties that make it life.”), it may contain a key bit of truth: Self-sustaining systems (whether they be called “life” or not) may require Darwinian evolution. This hypothesis or observation can be examined at a more fundamental level by comparison with other systems that do not obviously bear the hallmarks of life.

First, it is clear that the living systems we can observe all contain linear strings of information based on DNA. Interestingly, scientists have created other replicating linear (and nonlinear) strings of information that are not based on DNA. The best example of this comes from Reza Ghadiri at the Scripps Research Institute. Dr. Ghadiri and his coworkers took a helical structure made by peptides, a coiled coil known as a leucine zipper, in which leucine residues interdigitate with one another at regular positions, leading to structural stability. They broke a leucine zipper in two and used a whole leucine zipper to template the ligation of the two half zippers. This ligation yielded another whole zipper, which could dissociate from its template and become a template itself for another cycle of replication, ultimately leading to the self-replication of wholes from halves (Fig. 1a). The very interesting thing about this system was that it could not mutate in the way that DNA does (Lee et al. 1997). When different residues (isoleucine or valine) were introduced near the leucine–leucine interdigitation, the isoleucine-substituted peptide could template not only itself but also the valine-substituted peptide, and vice versa. This led to a mutualistic, interconnected cycle in which valine-substituted replicators could make isoleucine-substituted replicators, and vice versa (Fig. 1b). Contrast this with DNA: when a guanosine residue mutates to adenosine, the adenosine no longer pairs efficiently with cytosine. The mutation is quantized; it leads to an offspring with a unique identity, one that is not lost through mutualism. The mutant offspring of DNA of necessity compete with their parents for resources; the mutant offspring of peptide replicators do not. Darwinian evolution is possible with DNA in a way that is not possible with peptides. This also brings out a feature of Darwinian evolution that is not often noted: the basis of evolution is not only survival, but competition itself.
Not all replicating systems compete. To the extent that the reader still wishes to use the hackneyed term “life,” one could say that life (or at least the living systems that we know of) of necessity competes.
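The contrast between quantized DNA mutants and cross-replicating peptides can be made concrete with a toy discrete-generation model. This is a sketch, not a model of the actual Lee et al. (1997) kinetics: each entry K[i][j] is a hypothetical rate at which template j produces copies of species i. The DNA-like matrix lets each template copy only itself, with the mutant assumed to be 10% faster; the peptide-like matrix idealizes cross-replication by letting each template make both products equally.

```python
def step(p, K):
    """One generation: template j makes K[i][j] * p[j] copies of species i; renormalize to fractions."""
    new = [sum(K[i][j] * p[j] for j in range(len(p))) for i in range(len(p))]
    total = sum(new)
    return [x / total for x in new]

def run(p, K, generations):
    """Iterate the population fractions p for a number of generations."""
    for _ in range(generations):
        p = step(p, K)
    return p

# DNA-like (hypothetical rates): each template copies only itself; the mutant (index 1) is 10% faster.
K_DNA = [[1.0, 0.0],
         [0.0, 1.1]]

# Peptide-like (idealized assumption): each template makes both products equally,
# so the mutant's speed benefits parent and mutant alike.
K_PEPTIDE = [[0.5, 0.55],
             [0.5, 0.55]]

start = [0.99, 0.01]  # parent fraction, mutant fraction
print(run(start, K_DNA, 200))
print(run(start, K_PEPTIDE, 200))
```

Under these assumptions the faster DNA-like mutant takes over the population, whereas the peptide-like mixture settles at a fixed 50:50 ratio after a single generation, so the mutant's advantage never becomes a distinct, competing lineage.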

Fig. 1

Peptide replicators. A Leucine zipper peptides templating their own coupling from half molecules (red electrophile substrate and blue nucleophile substrate). B Peptide hypercycles. Green represents a peptide with, say, valine in a key position, while red represents a peptide with an isoleucine in the same position. Unlike nucleic acids, they are capable of efficient cross-replication.

Second, it is clear that the living systems we can observe replicate their information strings from simpler compounds. DNA is replicated via the polymerization of nucleoside triphosphates. Again, this is not a requirement for replication; it is merely one form of replicator. Scientists have created other replicators that do not resynthesize their information but instead present it in a new form. One example of this is prions, disease-associated forms of proteins that replicate by inducing conformational changes in their normal counterparts. A more experimentally tractable example, though, has recently arisen from the field of DNA computation. Two DNA hairpins can be constructed such that they would prefer to pair with one another, making a long, double-stranded molecule (Fig. 2) (Yin et al. 2008). However, they cannot, as the sequences that would be involved in the pairing are hidden within the hairpins. This is known as a “kinetic trap”: a reaction that is energetically favorable but very slow. Once a hairpin opens, even transiently, it can potentially react with the other hairpin, but until it does, the hairpins will remain…hairpins. Now, if a catalyst strand (not unlike a polymerase) is introduced into the reaction, it will help open one of the hairpins and thus speed the formation of the double-stranded product; in the process, the catalyst is itself released so that it can act on other hairpins. Overall, the hairpin substrates become a double-stranded product with the help of a catalyst strand, just as nucleoside triphosphate substrates become double-stranded DNA with the help of a polymerase catalyst. The analogy is inexact because there is no “template” that guides the formation of the double-stranded DNA. The product is not a copy of anything; it is merely the most stable state. But this is in fact the point.
Conformational replicators, whether prions or DNA computers, are quite different from synthetic replicators in that they “fall down” to the most stable state and do not populate multiple, intermediate states of roughly equal energy, which is what DNA helices with different sequences do. It can be argued that the huge energy driver available from covalent bond formation is what allows the plethora of information-rich intermediates to be populated, whereas the energy ensembles that conformational replicators inhabit are anathema for a Darwinian replicator. Or in other words, just as lacking a competitive identity disqualifies you from being a living system, an insufficient kinetic barrier between substrates and products prevents you from having an identity, much less a competitive identity.
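The catalytic hairpin cycle of Fig. 2 can be sketched as a mass-action model integrated with a simple Euler step. The rate constants, concentrations, and the two-step mechanism (C1 opens H1; H2 then binds the exposed toehold and displaces C1) are illustrative assumptions in arbitrary units, not values from Yin et al. (2008). The point is only that a substoichiometric catalyst, recycled at each turn, converts many times its own amount of kinetically trapped hairpins.

```python
def simulate_hairpins(c0: float, h0: float = 1.0, k_leak: float = 1e-4,
                      k_open: float = 1.0, k_displace: float = 1.0,
                      dt: float = 0.01, t_end: float = 100.0) -> float:
    """Return the amount of H1:H2 duplex formed by t_end (toy model, arbitrary units)."""
    h1, h2 = h0, h0          # kinetically trapped hairpins
    c, h1c = c0, 0.0         # free catalyst, and catalyst bound to an opened H1
    duplex = 0.0
    t = 0.0
    while t < t_end:
        r_leak = k_leak * h1 * h2        # rare spontaneous (uncatalyzed) reaction
        r_open = k_open * h1 * c         # catalyst invades and opens H1
        r_disp = k_displace * h1c * h2   # H2 displaces the catalyst, forming the duplex
        h1 += (-r_leak - r_open) * dt
        h2 += (-r_leak - r_disp) * dt
        c += (r_disp - r_open) * dt      # catalyst is released and recycled
        h1c += (r_open - r_disp) * dt
        duplex += (r_leak + r_disp) * dt
        t += dt
    return duplex

with_catalyst = simulate_hairpins(c0=0.1)
without_catalyst = simulate_hairpins(c0=0.0)
# The last value approximates turnovers per catalyst molecule.
print(with_catalyst, without_catalyst, with_catalyst / 0.1)
```

With these hypothetical parameters, a catalyst at one-tenth the hairpin concentration drives most of the hairpins into duplex within the simulated time, whereas the uncatalyzed leak converts only about one percent.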

Fig. 2

Conformational nucleic acid replicators. Two stable hairpins, H1 and H2, could potentially form a longer, double-stranded molecule, H1:H2 (bottom left). However, they are kinetically trapped. Addition of a catalyst strand (C1) leads to invasion and strand displacement of H1 (letter a), revealing a “toehold” region (3*). 3* can in turn invade H2 (letter b), ultimately resulting in release of C1 (letter c) and a repeat of the cycle.

Where Did Life Come From?

There are various theories about where life came from, but they essentially boil down to two: life arose on the planet Earth, or life arose elsewhere and seeded the planet Earth. Given that the latter theory merely postpones consideration of the possibilities inherent in the former, we'll assume for the moment that life arose on Earth. To the extent that this is true, we search for ways to examine the events that likely occurred. Unfortunately, lacking a time machine, we cannot say with any authority what must have occurred. We are forced to rely upon inference, and that inference comes from three sources: (a) paleontology, (b) molecular paleontology, and (c) experimental science.

While paleontology tells us with some authority that there must have been a thing called “dinosaurs,” it tells us with somewhat less authority that there must have been a thing called “cells,” and when such cells arose. The problem is that cells are much less morphologically distinct in the fossil record than is, say, a T. rex, and that the older the rocks one examines, the less likely it is that a fossil, bacterial or otherwise, will have been preserved. That said, there are at least some fossil stromatolites that resemble modern-day biological consortia. These fossils are likely the remains of bacterial superstructures that lived in tidal pools and harvested sunlight. So at best, what we can know from paleontology is that bacteria existed a long time ago, and from isotopic records, we can discern that life quickly took over the planet and eventually altered its chemistry to the oxygen-rich environment we see today.

Molecular paleontology is a more secure source of knowledge about what early living systems looked like. This is because the molecules inside of organisms are likely better conserved than the varying shells that surround them. This hypothesis in turn derives from the way in which evolution works on small molecules. The structures of molecules such as amino acids are constrained because they are used in so many different polymers and processes. The near universality of the genetic code makes it likely that the 20 common amino acids we know today were the same 20 common amino acids that were present billions of years ago. In a broader form, this is called the “principle of many users,” and it also applies to other small molecules, including nucleotides and cofactors.

What's very interesting is that most of the molecules that sit at the core of modern metabolism are nucleotides or are derived from nucleotides. The principal “energy coin” of most cells is ATP, adenosine triphosphate, a nucleotide. The principal “redox coins” are FAD and NAD, both of which also contain adenosine. The largest cofactor, coenzyme B12 (adenosylcobalamin), contains adenosine, and the versatile one-carbon carrier, folate, is derived from GTP. Wherever you look, you find nucleotides. This suggests that when metabolism was invented, it was invented based on the material at hand (nucleotides), and once multiple users started relying on these metabolites, their structures became fixed in time, like insects in amber. We could no more now go back and change adenosine to 2,6-diaminopurine riboside than we could globally change arginine to homoarginine. The system would crash; we are constrained to use the ribonucleotides that were present from the start.

The other hint that a metabolism based on RNA catalysis may have preceded our modern metabolism based on protein catalysis is the fact that the ribosome is a ribozyme. The engine of protein biosynthesis is composed largely of RNA and is serviced by a variety of tRNA machines. The core of the ribosome, its active site, where peptide bond formation takes place, is almost devoid of accessory proteins (Noller et al. 1992). These observations are all consistent with the evolution of protein biosynthesis in the context of a complex RNA world (Fig. 3) (Benner et al. 1989). In this view, the modern domains of life were preceded by a last common ancestor or “progenote” that had already invented translation (hence the uniformity of the genetic code) and that was metabolically complex. Other molecular lineages that could provide more information on the putative RNA world have long since gone extinct, leaving us with only chemical inferences. Fortunately, at least some of these chemical inferences can be tested in the laboratory, as described below.

Fig. 3

A putative RNA world. The three domains of modern life, eubacteria, archaebacteria, and eukaryotes, clearly have a last common ancestor that was already metabolically complex. The chemical nature of the “metabolic fossils” conserved between these domains suggests that this antecedent arose from an RNA world in which ribozymes were the principal catalysts. The likely extinction of many early molecular lineages obscures any attempt to identify one or more origins of life.

If there was an RNA world in which ribozymes rather than proteins directed a wide swath of metabolism, then it should be possible to recreate these ribozymes, or more appropriately their doppelgangers, in the laboratory. This has indeed proven to be the case, using a technique known as directed evolution or in vitro selection. Large, random sequence libraries of molecules can be generated synthetically and then sieved for functionality, such as the ability to bind metabolites or catalyze reactions (Ellington et al. 2009). In this way, ligand-binding RNAs (aptamers) were discovered that could bind a range of small molecules, as were new ribozyme catalysts with properties that would have been valuable in a nascent RNA world, such as the ability to carry out phosphodiester and carbon–carbon bond rearrangements. More importantly, many of the reactions involved in translation, such as tRNA charging and peptide bond formation, can be performed by selected ribozymes, providing further evidence for the emergence of the ribosome from a community that was initially filled with ribozymes (Orgel 1968; Crick 1968).

The complexity of the selected functional RNAs varies widely but gives some idea of the probability (or improbability) of their having arisen de novo. It was especially exciting to identify a ribozyme that could form 3′–5′ phosphodiester linkages, the same linkages that modern life's proteinaceous polymerases create when they synthesize nucleic acids (Bartel and Szostak 1993). A large randomized pool (>200 nucleotides) was generated, and ribozyme ligases that could append a specific sequence tag to themselves were selectively amplified by reverse transcription and PCR. After multiple cycles of selection and amplification, ligase activity was indeed enriched in the pool. Further characterization revealed seven different families of ligases. Surprisingly, all but one of them catalyzed a 2′–5′ ligation reaction; the exception was the class I Bartel ligase, which formed the natural 3′–5′ linkage. This ribozyme was large and relatively complex; indeed, additional experiments that determined its informational complexity suggested that it should have been selected only about once in every ten thousand times the experiment was carried out (Ekland et al. 1995). This has been taken to mean that there may be many different ligases of roughly equal complexity in the vast sequence space that was explored, and thus that complex structures and catalytic functionalities could in fact have been discovered in early evolution. While the initial catalytic rate of the class I ligase was modest, additional engineering and modification have converted the ligase into a limited polymerase, capable of acting on exogenous templates and comparable to at least some protein enzymes that catalyze similar reactions (Wochner et al. 2011). Joyce and his coworkers have even adapted the ligase to continuous evolution, in which it is capable of self-improvement in a chemostat-like environment (Wright and Joyce 1997).
While we still await the evolution of a “xeroxase,” a ribozyme that can replicate itself, these initial steps toward self-replication provide strong experimental validation of the possibility of an ancient RNA world.
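The scale of the search described above can be put in numbers with a little arithmetic. The sketch below assumes a 200-nucleotide random region (the text says only that the pool was >200 nucleotides) and a pool of about 10^15 distinct molecules, a typical order of magnitude for in vitro selection that is assumed here rather than taken from the cited papers. It also treats the chance that a pool contains at least one copy of a rare functional sequence as a Poisson event, so that a hit rate of about once per ten thousand experiments corresponds to an expected 10^-4 such molecules per pool.

```python
import math

def log10_sequence_space(n_random: int) -> float:
    """log10 of the number of distinct sequences with n_random positions (4 bases each)."""
    return n_random * math.log10(4)

def log10_fraction_sampled(n_random: int, pool_size: float) -> float:
    """log10 of the fraction of sequence space covered, assuming every pool molecule is unique."""
    return math.log10(pool_size) - log10_sequence_space(n_random)

def prob_at_least_one(pool_size: float, frequency: float) -> float:
    """Poisson probability that a pool holds >= 1 sequence occurring at the given frequency."""
    expected = pool_size * frequency
    return -math.expm1(-expected)  # 1 - exp(-expected), accurate for small values

POOL = 1e15   # assumed pool size
N = 200       # assumed length of the random region

print(round(log10_sequence_space(N), 1))          # about 120.4
print(round(log10_fraction_sampled(N, POOL), 1))  # about -105.4
# Hypothetical per-sequence frequency that would make a class I-like ligase
# turn up in roughly one experiment out of ten thousand.
freq = 1e-4 / POOL
print(prob_at_least_one(POOL, freq))
```

Even a single selection therefore samples on the order of one part in 10^105 of the possible sequences; that ligases were found at all under such sparse sampling is what suggests functional solutions are not vanishingly rare.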

The Inevitability of Life

While we can now posit a pathway from what some would call prebiotic compounds to living systems (from chemicals to replicating systems), the plausibility of individual steps in this pathway remains unknown (and probably unknowable). Nonetheless, arguments can be made that, over time, certain features of living systems were all but inevitable.

First, there is the inevitability of base pairing. As indicated above, nucleic acids are very special molecules, and complementary nucleobase interactions are very special interactions. Over the universe of possible compounds, it is likely that nucleobases are privileged for replication. This does not mean that nucleobase interactions will necessarily be seen in biomolecules…except for the fact that nucleobases can also be generated by relatively simple prebiotic routes. Oró and coworkers demonstrated the simplicity of adenine generation (Yuasa et al. 1984), while Miller and coworkers have chimed in with guanine (Levy et al. 1999). The formation of pyrimidines and of the glycosidic bonds that link nucleobases to nucleic acid backbones remains problematic, but the presence of nucleobases capable of taking up their unique replication functionality was all but assured on planet Earth.

Second, there is the inevitability of function. It has proven possible to select functional nucleic acids from even relatively small random sequence pools, lending credence to the de novo emergence of function at even the earliest junctures. It is possible that functionality was selected even in advance of replication, allowing certain classes of nucleic acid chemistries, sequences, or structures to build up in isolated environments. In addition, the emergence of nucleic acid functionality reinforces the inevitability of base pairing and expands upon the attendant inevitability of self-replication discussed below. More function can be garnered from very short nucleic acids than from other classes of compounds, including short peptides, because very short nucleic acids are already capable of forming structures (by virtue of base pairing) and thus of forming pockets for interacting with other molecules or performing catalysis.

Third, there is the inevitability of self-replication. To the extent that oligomers capable of base pairing existed, there would have been a strong driver for the emergence of self-replication. The seminal experiments of Orgel and von Kiedrowski show us that even very simple oligonucleotides can catalyze template-directed ligation and reproduce themselves (Orgel 1992). While it is true that such parabolic replicators would have been limited by product release (a newly made copy tends to stay bound to its template rather than becoming free to template again), it is also true that correctly paired substrates and products would have increased proportionately in a population relative to mismatched pairs. In a sea of prebiotically available sugars and nucleobase variants, oligonucleotide hybridization and self-replication could have led to the purification of chemically correct compounds (e.g., ribose backbones with guanine) relative to incorrect compounds (e.g., arabinose backbones with 1-methylguanine). The correct compounds would find themselves in strings that got progressively longer (and that would enjoy a further replicative advantage), while the incorrect compounds would remain stubby and incompetent.
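The difference between parabolic and exponential growth follows from a standard idealization: if most template is tied up in product duplexes, the concentration of free template scales roughly with the square root of the total, giving dx/dt = k·sqrt(x) (von Kiedrowski's square-root law) instead of dx/dt = k·x. The sketch below compares the closed-form solutions of the two rate laws; the rate constant and initial amount are arbitrary choices for illustration.

```python
import math

def exponential(x0: float, k: float, t: float) -> float:
    """Solution of dx/dt = k*x: every new copy is immediately free to act as a template."""
    return x0 * math.exp(k * t)

def parabolic(x0: float, k: float, t: float) -> float:
    """Solution of dx/dt = k*sqrt(x): growth throttled by slow product release."""
    return (math.sqrt(x0) + k * t / 2.0) ** 2

for t in (0, 5, 10, 20):
    print(t, parabolic(1.0, 1.0, t), round(exponential(1.0, 1.0, t), 1))
```

With k = 1 and x0 = 1, the parabolic replicator reaches 36 units at t = 10 while the exponential one exceeds 22,000; product inhibition turns a potential population explosion into a slow, polynomial climb.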

Fourth, there is the inevitability of mononucleotide polymerization. While correct chemistry aids hybridization and thus replicability, the longer the oligonucleotide substrate, the more easily it can absorb the energetic cost of a mispair and still bind its template. Therefore, replicators would “selfishly” select not only for correct chemistry but eventually for shorter and shorter substrates (James and Ellington 1999). This selection would also have been driven by the ready exhaustion of rare longer substrates relative to more plentiful shorter ones. In the limit, monomer polymerization is the only strategy likely to be sustainable, in terms of both fidelity and substrate availability. Consequently, there would have been strong evolutionary driving forces for the emergence of the xeroxase/replicase, a polymerase capable of acting on itself and/or another template. This argument makes the experimental proof that ribozyme polymerases could have existed, and the optimization of such polymerases by selection, all the more compelling.
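The claim that longer substrates can absorb a mispair can be illustrated with a deliberately crude two-state binding model. It assumes, hypothetically, a uniform -1.5 kcal/mol per base pair, a net +4 kcal/mol penalty for a single mismatch, a 1 µM substrate concentration, and a dissociation constant K_d = exp(ΔG/RT) relative to a 1 M standard state; real duplexes would need nearest-neighbor parameters. Discrimination is taken here as the fraction of correctly paired substrate bound divided by the fraction of mismatched substrate bound.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)

def fraction_bound(n_bp: int, mismatches: int, conc_m: float = 1e-6,
                   dg_bp: float = -1.5, ddg_mismatch: float = 4.0,
                   temp_k: float = 298.0) -> float:
    """Fraction of substrate bound to its template in a toy two-state model (hypothetical parameters)."""
    dg = n_bp * dg_bp + mismatches * ddg_mismatch   # binding free energy, kcal/mol
    k_d = math.exp(dg / (R_KCAL * temp_k))          # dissociation constant, M (1 M standard state)
    return conc_m / (conc_m + k_d)

def discrimination(n_bp: int) -> float:
    """How strongly a correctly paired substrate outcompetes one carrying a single mismatch."""
    return fraction_bound(n_bp, 0) / fraction_bound(n_bp, 1)

for n in (3, 4, 6, 8, 10):
    print(n, round(discrimination(n), 2))
```

Under these assumptions, discrimination stays in the hundreds for very short substrates but collapses toward 1 by about ten base pairs, because both the matched and mismatched substrates are then bound almost completely.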

Fifth, there is the inevitability of cellularization. One of the first problems an efficient replicator would have encountered would have been parasitization. Fortunately, cell-like entities may already have been available that nascent replicators could use to escape their parasitic derivatives. The Luisi lab has generated lipid replicators that have semidefined compositions, rather than defined sequences or structures, and has demonstrated the self-replication of micelles. The compound ethyl caprylate slowly hydrolyzes in alkaline solution, yielding ethanol and sodium caprylate, which is amphipathic and forms micelles. These micelles in turn catalyze further hydrolysis, thereby accelerating micelle formation. Once the caprylate concentration exceeds its critical micelle concentration, micelles form and their concentration increases exponentially.
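The autocatalytic behavior described for the ethyl caprylate system can be sketched with a minimal kinetic model: the ester hydrolyzes slowly on its own, and caprylate present above a critical micelle concentration (CMC) forms micelles that accelerate further hydrolysis. All rate constants, the CMC value, and the assumption that micellar caprylate scales simply as (caprylate - CMC) are hypothetical, chosen in arbitrary units to show the shape of the curve rather than to fit the Luisi lab's data.

```python
def simulate_micelles(s0: float = 1.0, cmc: float = 0.1, k_bulk: float = 0.001,
                      k_micelle: float = 0.5, dt: float = 0.01, t_end: float = 200.0):
    """Euler integration of ester hydrolysis autocatalyzed by product micelles (toy model)."""
    ester, caprylate = s0, 0.0
    t = 0.0
    trace = []  # (time, micellar caprylate) pairs
    while t < t_end:
        micellar = max(0.0, caprylate - cmc)            # only caprylate above the CMC forms micelles
        rate = (k_bulk + k_micelle * micellar) * ester  # background plus micelle-catalyzed hydrolysis
        ester -= rate * dt
        caprylate += rate * dt
        t += dt
        trace.append((t, max(0.0, caprylate - cmc)))
    return trace

trace = simulate_micelles()
lag_end = next(t for t, m in trace if m > 0.0)  # first time any micelles exist
print(round(lag_end))
print(round(trace[-1][1], 2))                  # micellar caprylate at the end of the run
```

With these parameters, no micelles exist for roughly the first hundred time units; once the product crosses the CMC, micelle concentration rises steeply and then levels off as the ester is consumed, giving the lag-then-burst profile characteristic of autocatalysis.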

Sixth, there is the inevitability of metabolism. As we have already mentioned, nucleic acid replicators would likely have exhausted their foodstuffs relatively quickly and would have been forced to develop adjunct catalysts to resupply the replicator. A network of reactions in which ribozymes were replicated as long as they supplied the replicator could readily have formed; such a network would have been the first genome, irrespective of whether the ribozyme templates/genes were covalently connected or not. Experimentally, the Holliger lab has coaxed a ribozyme polymerase into synthesizing another ribozyme, the hammerhead cleavase, much as an ancient polymerase would have coordinated the production of the ribozymes in its subordinate metabolism. Unfortunately, maintaining such a network in the face of parasites (above) would have been difficult, and it is therefore likely that cellularization preceded (or was coincident with) the development of metabolism. Moreover, while a nucleic acid replicator and its catalytic adjuncts could have invaded a replicating lipid amalgam, there would have been no way to ensure the continued replication of the cell-like compartment…unless ties between nucleic acid catalysis and lipid metabolism were built. Such ties would cement the ad hoc cellularization arrangement.

Seventh, there is the inevitability of diversification. At this point, we are talking about a replicating genome within a cell with attendant metabolism. For all practical purposes, we are talking about the equivalent of a modern cell. Many more changes would occur to this cell before it eventually became us, including the invention of translation, the bottlenecking through the last common ancestor, and then the remarkable diversification into the three domains of life we know today. However, at some level, these are all just details, rather than the more fundamental story of life's origins.

Conclusion

It is hoped that the reader has gained some insight into at least one scientist's view of life, its origin, the mechanism of its origin, and the reason that a naturalistic view of such origins is not as frustrating as one might initially think. The “seven inevitabilities of the origins of life” echo a previous exposition by Cairns-Smith (Seven Clues to the Origin of Life) but are hopefully more rigorous both philosophically and experimentally. In the end, though, this guide is just a jumping-off point for your own explorations. In this regard, the reader is invited to join an ongoing discussion that will likely never be fully resolved.