Introduction

A major unsolved issue in the study of the origin of life is the nature of the evolutionary processes that led to the selection of the L-α-amino acids found in proteins from the large pool of prebiotic compounds. The ease of their formation in one-pot reactions such as those in Miller-Urey type experiments suggests they were present on the primitive Earth, a possibility strongly supported by the chemical characterization of approximately 80 different amino acids in carbonaceous chondrites (Burton et al. 2012). However, the fact that a number of amino acids found in contemporary life forms can be synthesized non-enzymatically under laboratory conditions or are present in meteorites does not necessarily imply that they were also essential for the appearance of life.

It is also possible that not all amino acids present in proteins today were available in the prebiotic environment, and that some are in fact the outcome of biological evolution. For instance, although a non-enzymatic synthesis of histidine has been reported (Shen et al. 1990), His may be a vestigial remnant of a catalytic nucleotide from the RNA World (White III 1976). Arginine and lysine, which are essential in nucleic acid-binding domains, are conspicuously absent both in carbonaceous chondrites and in prebiotic experiments (McDonald and Storrie-Lombardi 2010). Some of these compounds may have formed in minute amounts in the parent bodies of meteorites but have since decomposed (cf. Cleaves 2010). This possibility finds some support in the presence of small amounts of D/L ornithine, which is a product of arginine decomposition, in the preserved sample extracts from Miller’s simulation of the action of spark discharges over a reduced mixture of volcanic gases during prebiotic times (Johnson et al. 2008).

In an insightful paper published over 30 years ago, Weber and Miller (1981) discussed the occurrence of the twenty α-amino acids found in proteins. As they underlined, it is quite surprising that α-aminobutyric acid, α-aminoisobutyric acid, norvaline, alloisoleucine, norleucine, and homoserine, some of which are major prebiotic products, are missing in extant proteins (Table 1). Specifically, they raised the issue of the absence of hydrophobic amino acids including norvaline (Nva, 2-aminopentanoic acid) and norleucine (Nle, 2-aminohexanoic acid), which, as they noted, “is most striking and a major challenge to any attempt to account for the selection of the twenty protein amino acids” (Weber and Miller 1981, p. 227). They suggested that this could be due to norvaline’s enhanced rotational freedom around the protein backbone compared to that of valine, which could alter the conformation of proteins or prevent their proper folding. However, they concluded that this was not a compelling explanation and suggested that perhaps the absence of norvaline was the outcome of chance events or a frozen coding accident.

Table 1 Prebiotic synthesis of methionine and some protein and non-protein hydrophobic amino acids*

As discussed below, a number of recent developments allow a reexamination of this issue based on the evidence of the presence of norvaline and norleucine in extant proteins. This may provide insights that change our understanding of the composition of primitive proteins. We discuss the cases of norvaline and norleucine separately.

L-Norvaline is Incorporated into Proteins

Detectable amounts of L-norvaline were first reported in a serine-rich antifungal peptide produced by Bacillus subtilis which also included Asp, Glu, Thr, Ala, Tyr and Phe (Nandi and Sen 1953), and in Serratia marcescens regulatory mutants of leucine biosynthesis (Kisumi et al. 1976). These observations were confirmed by reports of the presence of small amounts of norvaline instead of leucine in the α and β subunits of recombinant human hemoglobin produced by Escherichia coli (Apostol et al. 1997).

It is now well established that, together with norleucine and homoisoleucine, the intracellular accumulation of norvaline results from the low substrate specificity of the branched-chain amino acid pathway enzymes (Kisumi et al. 1976; Bogosian et al. 1989). As is typical of a number of enzymes, substrate specificity is far from absolute, and many of the branched-chain amino acid biosynthetic enzymes can act on a number of related α-ketoacids. There is conclusive experimental evidence demonstrating that in recombinant strains of E. coli both norvaline and norleucine are synthesized by chain elongation from pyruvate-derived α-ketobutyrate as an alternate substrate of α-isopropylmalate synthase (EC 2.3.3.13). As demonstrated by Apostol et al. (1997), pyruvate is the primary substrate in the biosynthesis of both Nva and Nle. High concentrations of intracellular pyruvate lead to the rapid synthesis and accumulation of norvaline, at a rate that is at first similar to that reported for norleucine, but after 5 h Nva accumulates 3 or 4 times faster than norleucine (Apostol et al. 1997).

Recent experimental evidence has shown that the in vivo production and accumulation of norvaline is rapidly enhanced under low oxygen pressures in a non-recombinant E. coli strain (Soini et al. 2008). Under anaerobic conditions, such as those that likely existed in the primitive environment prior to the development of an oxidizing atmosphere, high glucose concentrations lead to a rapid accumulation of pyruvate, which is immediately used as an alternative substrate for direct keto chain elongation to α-ketobutyrate first, and then to α-ketovalerate which undergoes transamination and forms L-norvaline (Soini et al. 2008).

Measurements of amino acid concentrations in clarified crude broth lysates containing medium and cells indicated a norvaline concentration of 1.0 mM after 10 h, as high as that observed for aspartic acid (Soini et al. 2008); that is, the enzyme-mediated production of norvaline is as efficient as that of Asp, a proteinogenic amino acid. The intracellular accumulation of pyruvate under the anaerobic conditions and high levels of glucose described by Soini et al. (2008), or due to other universally distributed pathways such as alanine transamination, may be interpreted as an ancient process that may have taken place during the early stages of metabolic evolution.

The misincorporation of norvaline for leucine in recombinant hemoglobin is correlated with the ratio of free norvaline/leucine, suggesting that its presence is due to the misaminoacylation of tRNALeu (Apostol et al. 1997). Although both leucine and norvaline are good helix formers (Lyu et al. 1991; Padmanabhan and Baldwin 1994), the incorporation of Nva in place of Leu in recombinant hemoglobin is not randomly distributed. This suggests that norvaline in certain positions of the recombinant hemoglobin may affect structural requirements for protein folding, insertion of the heme group or the assembly of the functional enzyme (Apostol et al. 1997).

The complex structure of extant enzymes is the evolutionary outcome of a complex series of events including internal gene duplications, slippage, accretion of functional domains and fusion events. It is likely that the earliest genetically encoded enzymes were small polypeptides with relatively simple structure, and that the basic traits of primitive active sites have been preserved during evolution and are among the oldest components of universally distributed extant enzymes. Although leucine, together with other nonpolar amino acids, plays a key role in aggregation processes that lead to proper protein folding, it is, with few exceptions, conspicuously absent in the catalytic sites of most enzymes (Holliday et al. 2009). As shown in Table 2, only 15 of the 335 enzyme crystal structures available in the MACiE database (http://www.ebi.ac.uk/thornton-srv/databases/MACiE/queryMACiE.html), which includes detailed information on the catalytic activity of every Enzyme Commission (EC) enzyme subclass, have Leu in their catalytic site. Two of these 15 enzymes (malonyl-CoA-acyl carrier protein transacylase and phosphoenolpyruvate mutase) may be present in the three major cellular lineages due to horizontal gene transfer (HGT) events, and others have a restricted taxonomic distribution or are oxygen-dependent, which suggests that they are relatively recent developments (Table 2). As discussed below, these data suggest that leucine’s catalytic activities are rather limited, and that its substitution by the less hydrophobic norvaline in small, structurally simple primitive polypeptides during early biochemical evolution would have affected mostly their folding properties.

Table 2 Enzymes with leucine residues in their reactive center (based on Holliday et al. (2007, 2009))

Branched-Chain Amino Acid Aminoacyl-tRNA Synthetases Err

The fidelity of protein synthesis is dependent on accurate substrate recognition by the aminoacyl-tRNA synthetases (aaRS), which catalyze the first step of protein biosynthesis. Each aaRS recognizes a single cognate amino acid and covalently attaches it to the correct tRNA. The key role of the editing activity in recognizing the amino acid that corresponds to the precise anticodon is a striking demonstration of the dependency of the genetic code on the aaRS proofreading and editing of their mistakes (Döring et al. 2001; Ribas de Pouplana and Schimmel 2004).

However, such precision is far from absolute and in fact varies for different amino acids among the different aaRS (Young et al. 2011). Many aaRS cannot discriminate between cognate amino acids and structurally similar non-cognate amino acids in the synthetic reaction (Budisa 2004; Yadavalli and Ibba 2012; Cvetesic et al. 2013). This is the case for the branched aliphatic amino acids. Leucyl-, isoleucyl- and valyl-tRNA synthetases (LeuRS, IleRS and ValRS, respectively) are members of an ancient homologous set of enzymes that are part of the class I subgroup of aaRS endowed with editing activities that hydrolyze mischarged tRNAs (Brown and Doolittle 1995). Their homologous editing domains share the same fold, which must have appeared soon after the aaRS catalytic core structure was established, but prior to the duplications and divergence events that led to the three synthetases (Cusack et al. 2000; Ribas de Pouplana and Schimmel 2004; Zhu et al. 2007). This evolutionary sequence of events can be understood in terms of the selection pressure imposed by the loose chemical specificity of primitive enzymes.

As Apostol et al. (1997) concluded, the incorporation of norvaline into proteins results from the misaminoacylation of tRNALeu. Recent experimental data have demonstrated that the E. coli elongation factor Tu cannot discriminate against mischarged tRNALeu, as it binds with similar affinities to both leucyl-tRNALeu and the mischarged valyl-tRNALeu, putting the burden of the editing activity on the aaRS itself (Cvetesic et al. 2013). Since the norvaline three-carbon side chain cannot be excluded from the mischarged LeuRS, evasion of the translational proofreading activities leads to norvaline-containing proteins. As noted above, the available data show that the misincorporation of norvaline for leucine is correlated with the ratio of free norvaline/leucine, suggesting that significant amounts of Nva can be incorporated in the hydrophobic regions of proteins. This possibility is amenable to experimental analysis with model polypeptides.
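The dependence of misincorporation on the free norvaline/leucine ratio can be illustrated with the textbook expression for two substrates competing for a single enzyme, in which the relative rates are set by the ratio of specificity constants (kcat/Km) multiplied by the ratio of substrate concentrations. The following is only a minimal sketch under strong assumptions: LeuRS is assumed to distinguish leucine from norvaline through a single discrimination factor, and editing by LeuRS and any downstream selection are ignored. The function name and the `discrimination` value of 100 are hypothetical placeholders, not measured parameters from the studies cited here.

```python
def misincorporation_fraction(nva_conc, leu_conc, discrimination):
    """Fraction of tRNA(Leu) charged with norvaline under simple competition.

    nva_conc, leu_conc: free amino acid concentrations (any common unit).
    discrimination: (kcat/Km for Leu) / (kcat/Km for Nva); a hypothetical
    placeholder value, not taken from the literature cited in the text.
    """
    if nva_conc < 0 or leu_conc < 0 or discrimination <= 0:
        raise ValueError("concentrations must be >= 0 and discrimination > 0")
    # Relative charging rates: each substrate weighted by its specificity constant.
    nva_rate = nva_conc
    leu_rate = discrimination * leu_conc
    total = nva_rate + leu_rate
    if total == 0:
        raise ValueError("at least one amino acid must be present")
    return nva_rate / total


if __name__ == "__main__":
    # Illustrative norvaline/leucine ratios with leucine fixed at 1 (arbitrary units).
    for ratio in (0.1, 1.0, 10.0):
        fraction = misincorporation_fraction(ratio, 1.0, discrimination=100.0)
        print(f"Nva/Leu = {ratio:>4}: misacylated fraction = {fraction:.4f}")
```

In this simplified picture the misacylated fraction rises monotonically with the norvaline/leucine ratio, which is qualitatively consistent with the correlation reported by Apostol et al. (1997); a quantitative treatment would need measured kinetic constants and an explicit editing step.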

Size is Not Enough: Norleucine as a Substitute for Methionine in Protein Chains

Norleucine is also a by-product of the leucine biosynthetic pathway enzymes starting from pyruvate or α-ketobutyrate in place of α-ketoisovalerate, and can be misincorporated in place of methionine in recombinant proteins (Bogosian et al. 1989). Methionine and norleucine have comparable sizes and slightly different electric charges, but Met is much more hydrophilic than Nle (Table 4). It is known that the in vitro aggregation properties of human prion protein are dramatically altered by changing methionine for the more hydrophobic norleucine (Wolschner et al. 2009), and there is ample literature showing that it can replace Met in a number of bacterial proteins, in many cases without loss of catalytic activity (Munier and Cohen 1959; Cohen and Munier 1959; Cowie et al. 1959; Anfinsen and Corley 1969; Old and Jones 1975). Like leucine, methionine is a good helix-forming amino acid (Pace and Scholtz 1998), and is rarely found in the catalytic site of enzymes (Holliday et al. 2009). As shown in Table 3, only ten of the 335 enzymes present in the MACiE database have methionine in their catalytic site, and in only one of these ten enzymes (4-cresol dehydrogenase) does it partake in catalysis. Of these ten entries, two (chloride peroxidase and xylose isomerase) may be present in the three primary kingdoms due to horizontal gene transfer (HGT) events, and three others (nitrate reductase, 4-cresol dehydrogenase, and phosphoenolacetaldehyde hydrolase) have a restricted taxonomic distribution or are oxygen-dependent, which suggests that they are relatively recent developments.
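To put the MACiE counts quoted in the text on a common scale, the short sketch below converts them into percentages of the 335 database entries. The counts (15 entries with Leu, 10 with Met) are those given in the text; the variable and function names are illustrative only.

```python
MACIE_TOTAL_ENZYMES = 335  # MACiE entries cited in the text

# Number of entries with the residue in the catalytic site, as cited in the text.
CATALYTIC_SITE_COUNTS = {"Leu": 15, "Met": 10}


def catalytic_site_percentage(count, total=MACIE_TOTAL_ENZYMES):
    """Percentage of database entries with a given residue in the catalytic site."""
    if total <= 0 or not 0 <= count <= total:
        raise ValueError("count must lie between 0 and a positive total")
    return 100.0 * count / total


if __name__ == "__main__":
    for residue, count in CATALYTIC_SITE_COUNTS.items():
        pct = catalytic_site_percentage(count)
        print(f"{residue}: {count}/{MACIE_TOTAL_ENZYMES} = {pct:.1f}%")
```

Both residues thus appear in the catalytic sites of fewer than 5% of the entries (about 4.5% for Leu and 3.0% for Met).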

Table 3 Enzymes with methionine residues in their reactive center (based on Holliday et al. (2007, 2009))

Norleucine can be substituted for methionine in recombinant bovine protein (Bogosian et al. 1989). The presence of Nle in the dimer interface of the human enteric α-defensin HD5, a multifunctional small antimicrobial peptide, leads to an atypical parallel mode of dimerization but without affecting its antibacterial activity (Rajabi et al. 2012). As suggested by Barker and Burton (1979), the toxic effects of norleucine on microbial growth (Richmond 1962) are due to its inhibition of methylation reactions. Indeed, in addition to its role as a building block in proteins, methionine is the immediate precursor of S-adenosylmethionine (SAM), which is one of the major methyl-group donors in trans-methylation reactions in contemporary biochemistry. Accordingly, the suggestion that ribonucleotide-like coenzymes are remnants of an ancestral stage in which ribozymes played a more conspicuous role in metabolism (Orgel and Sulston 1971; White III 1976) would imply that methionine was incorporated into biological systems mostly because of its involvement in methyltransferase activities that evolved in a primordial RNA-dependent world (Parker et al. 2011b).

Conclusions

As summarized by Cleaves (2010), the presence of the 20 protein α-amino acids defies a simple, linear explanation and should be seen as the outcome of a complex combination of their prebiotic accessibility, their metabolic availability, and the selection advantages derived from their different functional groups. Evidence of the incorporation of norvaline and, to a lesser extent, norleucine and other amino acids suggests that the lack of absolute substrate specificity, which characterizes not only the enzymes of branched-chain amino acid biosynthesis but also a number of aaRS, has played a role in defining the expression of the genetic code.

The high values of amino acid half-lives as compared to those of ribonucleotides and RNA (cf. Cleaves 2012) support the possibility that when protein biosynthesis first evolved, amino acids of prebiotic origin were still available. The inventory of prebiotic compounds may not have included key amino acids like histidine but, as reviewed here, the available evidence suggests that, in addition to valine, leucine and isoleucine, a number of related compounds such as norvaline, norleucine, α-amino-n-butyric acid and alloisoleucine, among others, must have also been present (Table 1). The alcoholic fermentation of α-amino-n-butyric acid (Kepner et al. 1954) suggests that some of these compounds may have ended up as sources of carbon and energy for primordial heterotrophs.

The evidence that leucyl-, valyl- and isoleucyl-tRNA synthetases are the outcome of serial gene duplications implies that they descend from a largely unspecific enzyme (Zhu et al. 2007) whose substrate ambiguity, we hypothesize, may have led to the incorporation of higher amounts of prebiotic norvaline in early proteins. Evolutionary analysis of substrate promiscuity in enzymes mediating physiological and metabolic adaptations demonstrates that in many cases it represents the starting point towards the evolution of new catalytic activities (Tawfik and Khersonsky 2010; Tawfik 2013). As summarized by Liu and Schultz (2010), there is evidence that under in vivo or in vitro conditions the lack of absolute specificity of the elongation factor Tu and the ribosome can lead to the incorporation of noncanonical amino acids and other compounds, including D-amino acids and α-hydroxy acids. These observations support Woese’s contention that primitive protein synthesis was an error-prone process producing small catalytic peptides with somewhat different sequences lacking the accuracy and specificity of extant enzymes (Woese 1965, 1987).

The exhaustion of the prebiotic budget of norvaline, norleucine and other non-protein amino acids did not stop their misincorporation into proteins. The mechanisms described here may have operated during the early stages of biochemical evolution, but continued afterwards, when the development of the biosynthesis of branched-chain amino acids led to norvaline and norleucine as by-products. The incorporation in proteins of norvaline in place of leucine is an outcome of the combination of the substrate ambiguity and multifunctionality of both leucyl-tRNA synthetase and the branched-chain amino acid biosynthetic enzymes. Such functional flexibilities may confer evolutionary advantages, especially under anaerobic conditions that favor the accumulation of pyruvate.

The lack of absolute substrate specificity is neither primitive nor a specific adaptation, but the outcome of the inability of a number of enzymes to distinguish between related substrates. Comparative quantifiable data on chemical and physical properties of coded and non-coded amino acids provide important insights into the processes that led to the establishment of the standard genetic code (Lu and Freeland 2008; Philip and Freeland 2011). As summarized by Freeland and his coworkers, the hydrophobicity and charge of amino acids play a key role in protein folding and stability (cf. Lu and Freeland 2008). As shown in Table 4, although the electric charges of leucine and norvaline are comparable, Nva is not only significantly smaller than Leu, but also less hydrophobic. Protein folding is driven by the formation of a hydrophobic core, and it is reasonable to assume that as enzymes became increasingly complex, replacing leucine with the smaller and less hydrophobic norvaline would affect the packing of secondary structural elements required for the formation of the core. It would also affect protein-ligand binding and protein-protein interactions. It can be hypothesized that the lower hydrophobicity of norvaline would probably also limit its ability to form stabilizing structures such as the Leu-rich repeats and the Leu-zippers that play a key role in a number of DNA- and RNA-binding proteins.

Table 4 Hydrophobicity (log P), size (van der Waals volume) and electrical charge (pI) of hydrophobic amino acids*

The evidence that norvaline biosynthesis is enhanced under anaerobic conditions (Soini et al. 2008) is a surprising demonstration of the indirect biochemical and metabolic consequences of the planetary transition to a highly oxidizing environment (Lazcano 2012). It also suggests that the broad LeuRS editing properties of Aquifex aeolicus (Zhu et al. 2007), a deeply branching hyperthermophilic microaerophilic bacterium (Deckert et al. 1998), may be understood in terms of their role in limiting the misincorporation of norvaline and norleucine that accumulate because of enhanced anaerobic pyruvate production. The results reviewed here also suggest that the search for hypothetical extraterrestrial life forms should consider the possibility that other biochemistries (Bada 2001; Pace 2001) may be based on a wider range of monomers.