Introduction

The protein world hypothesis proposes that life originated from proteins (Lacey et al. 1999; Ikehara 2002; Andras 2006). Because the chemical structures of amino acids are simpler than those of nucleic acids, amino acids can be synthesized from inorganic substances more easily (Miller 1953; Schlesinger and Miller 1983). Additionally, peptides are more stable than RNA (Vékey et al. 1996; Lönnberg 2011). In the protein world hypothesis, pseudoreplication is thought to have occurred instead of the replication observed in extant life. Pseudoreplication is a process where proteins comprising the same set of amino acids, which possess similar but non-identical structures, are generated by a random process without resorting to exact duplication (Ikehara 2009). The GADV hypothesis is a form of the protein world hypothesis (Ikehara et al. 2002; Ikehara 2002, 2005, 2009; Oba et al. 2005).

In the GADV hypothesis as proposed by Ikehara et al. (2002), life originated from proteins constructed of only glycine (Gly, G), alanine (Ala, A), aspartic acid (Asp, D), and valine (Val, V). These proteins are called [GADV]-proteins. These four amino acids can be easily synthesized from inorganic substances suggested to have been present in the primordial Earth environment (Miller 1953; Schlesinger and Miller 1983; Kobayashi et al. 1998, 1999; Takahashi et al. 1999), and they all correspond to the generic genetic code triplet GNC (N = G, C, A, or U). Among these four amino acids, both hydrophobic (Ala and Val) and hydrophilic (Asp) amino acids are included, and Gly, Ala, Asp, and Val are frequently found in extant proteins. Thus, Ikehara et al. hypothesized that proteins constructed from Gly, Ala, Asp, and Val could form α-helix and β-sheet structures similar to those of extant proteins. Although complete replication, in which a biopolymer identical to the template is produced, cannot occur in a protein world without nucleic acids, pseudoreplication has been proposed to occur for [GADV]-proteins (Ikehara 2009). Because [GADV]-proteins include only four types of amino acids, Ikehara et al. conjectured that the three-dimensional (3D) structures of randomly generated [GADV]-proteins may occasionally be similar to those of their “parent” [GADV]-proteins (Ikehara 2009). In this scenario, “parent” [GADV]-proteins catalyze the syntheses of Gly, Ala, Asp, and Val, and these amino acids are randomly polymerized to generate “child” [GADV]-proteins. Although the sequences of these child proteins differ from those of the parent proteins, the GADV hypothesis predicts that their 3D structures may occasionally be similar (Ikehara 2009). Pseudoreplication is also considered in the garbage-bag world (Dyson 1985) and Garakuta world hypotheses (Kobayashi et al. 2010), in which the origin of metabolism is thought to have preceded the origin of replication.
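The correspondence between the GNC triplets and the four amino acids can be checked against the standard genetic code. The following minimal sketch is illustrative only; the names `GNC_CODONS` and `gnc_amino_acids` are our own and do not come from the cited works.

```python
# Standard genetic code restricted to codons of the form GNC (RNA alphabet).
GNC_CODONS = {
    "GGC": "Gly",  # N = G
    "GCC": "Ala",  # N = C
    "GAC": "Asp",  # N = A
    "GUC": "Val",  # N = U
}

# Three-letter to one-letter amino acid codes for the four residues involved.
ONE_LETTER = {"Gly": "G", "Ala": "A", "Asp": "D", "Val": "V"}


def gnc_amino_acids():
    """Return the set of one-letter codes encoded by the four GNC codons."""
    return {ONE_LETTER[name] for name in GNC_CODONS.values()}


if __name__ == "__main__":
    print(sorted(gnc_amino_acids()))
```

Each codon differs only in its middle base, which is why a four-letter amino acid alphabet follows directly from a GNC-only code.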

The GADV hypothesis is a hypothesis for the origin of life, as well as for the origin of proteins. Extant proteins are constructed of approximately 20 types of amino acids [called the “Magic 20” (Crick et al. 1957)]. Several researchers considering the origin of proteins have assumed that primitive proteins were constructed of a more limited number of amino acids (Zuckerkandl et al. 1971; Jordan et al. 2005; van der Gulik et al. 2009; Oshima 2011). In “limited amino acid set” hypotheses for primitive proteins, Gly, Ala, Asp, and Val are frequently included. For example, van der Gulik et al. (2009) hypothesized that peptides including only Gly, Ala, Asp, Val, and metal ions played important roles in the protobionts of the RNA world hypothesis.

As previously mentioned, [GADV]-proteins attract attention in considerations of the origin of life and/or origin of proteins. However, the 3D structure formations of [GADV]-proteins have been insufficiently investigated, especially at the atomic level. Although investigations using the 3D structures of extant proteins and observations of the enzymatic activities of [GADV]-proteins have been conducted, the 3D structures for [GADV]-proteins themselves have not yet been examined (Ikehara 2002; Oba et al. 2005).

Previously, we predicted the 3D structures of randomly generated [GADV]-proteins using protein structure prediction methods (Oda et al. 2013). Although threading, in which the known 3D structures of extant proteins are used, cannot construct stable structures of [GADV]-proteins, ab initio modeling, in which information about extant proteins is used to a lesser degree than in threading, can predict the 3D structures of some [GADV]-proteins (Oda et al. 2013). Additionally, molecular dynamics (MD) simulations, in which only Newton’s equations of motion are used and no information about the 3D structures of extant proteins is required, are useful for structural predictions of [GADV]-proteins (Oda et al. 2013). However, known structural information for extant proteins is still used even for ab initio modeling, and the 3D structures of short peptides cannot be predicted by this method (Xu and Zhang 2012). Therefore, predictions of the 3D structures for primitive short peptides proposed by the GADV hypothesis cannot be conducted by ab initio modeling. In addition, because the computational costs of normal MD simulations for the structural predictions of proteins are very high, normal MD simulations at ordinary temperatures are difficult to perform.

In this study, replica-exchange molecular dynamics (REMD) simulations (Sugita and Okamoto 1999) were conducted to predict the 3D conformations of randomly generated peptides including only Gly, Ala, Asp, and Val ([GADV]-peptides). In an REMD simulation, many replicas of the system are generated, and MD simulations at different temperatures are conducted for the different replicas. At predetermined time intervals, exchanges between replicas are attempted and accepted or rejected according to a statistical-mechanical criterion (Sugita and Okamoto 1999). Thus, conformations sampled at high temperatures can contribute to the ensembles at ordinary temperatures, and the structural searches of proteins or peptides can be conducted efficiently. The 3D conformations of [GADV]-peptides with 20 residues were investigated using REMD simulations in this study. Random peptides with only 20 residues are not expected to form fixed stable structures, because the intramolecular interactions that are indispensable to the formation of fixed 3D structures cannot be sufficiently formed. In fact, even one of the smallest well-known natural proteins, i.e., the human parathyroid hormone (1–34), consists of 34 residues (Jin et al. 2000). Thus, the conformational tendencies and abilities of secondary structure formation were evaluated for the [GADV]-peptides.
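The replica exchange step can be illustrated with the Metropolis-type acceptance rule commonly used in temperature REMD, in which a swap between replicas at temperatures T_i and T_j with potential energies E_i and E_j is accepted with probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/(k_B T). The sketch below is a generic illustration of this rule and not the AMBER implementation; the function names and the example energies are hypothetical.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def exchange_probability(temp_i, energy_i, temp_j, energy_j):
    """Acceptance probability for swapping two replicas (temperature REMD).

    Temperatures are in K and potential energies in kcal/mol.
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    if exponent >= 0.0:
        return 1.0  # favorable swaps are always accepted
    return math.exp(exponent)


def attempt_exchange(temp_i, energy_i, temp_j, energy_j, rng=random):
    """Return True if the swap is accepted in this attempt."""
    probability = exchange_probability(temp_i, energy_i, temp_j, energy_j)
    return rng.random() < probability


if __name__ == "__main__":
    # Hypothetical energies: the colder replica currently has the lower energy.
    print(exchange_probability(300.0, -105.0, 316.4, -100.0))
```

Under this rule, a swap that moves the lower-energy conformation to the lower temperature is always accepted, whereas the reverse swap is accepted only with a Boltzmann-weighted probability.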

Methods

Peptide Sequences

In silico structural predictions were conducted for computationally generated random peptides including only Gly, Ala, Asp, and Val. The probability of occurrence was equal for all four amino acids. In this study, 40 sequences consisting of 20 residues each were used; they are shown in Table 1. In the GADV hypothesis, aggregates of short [GADV]-peptides are considered to have played enzymatic roles in the earliest stages of life. The lengths of peptides that are nonenzymatically generated under the estimated primordial Earth environment have previously been reported (Ferris et al. 1996; Imai et al. 1999; Oba et al. 2005; Futamura and Yamamoto 2005), and some researchers have reported that peptides of a few dozen residues can be generated. For example, Ferris et al. (1996) reported obtaining 55-residue peptides. Therefore, 20-residue peptides were used in the present study (peptides 1 to 40 in Table 1). Linear structures constructed using the tleap module of AMBER12 (Case et al. 2012) were used as the initial structures of the REMD simulations to avoid any initial structure biases.
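The sequence generation described above can be sketched as uniform sampling over the four residue types. This is an illustration under our own assumptions: the function names and the seed are hypothetical, and the output will not reproduce the sequences in Table 1.

```python
import random

AMINO_ACIDS = "GADV"  # Gly, Ala, Asp, Val, each with equal probability


def random_gadv_peptide(length, rng):
    """Draw one random [GADV]-peptide of the given length."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def generate_peptides(count=40, length=20, seed=0):
    """Generate `count` random peptides; the seed only makes runs repeatable."""
    rng = random.Random(seed)
    return [random_gadv_peptide(length, rng) for _ in range(count)]


if __name__ == "__main__":
    for index, sequence in enumerate(generate_peptides(), start=1):
        print(f"peptide {index}: {sequence}")
    # Size of the sequence space for 20-residue [GADV]-peptides (4 ** 20).
    print(len(AMINO_ACIDS) ** 20)
```

The 40 sequences examined therefore represent a very small sample of the 4^20 (about 1.1 × 10^12) possible 20-residue [GADV]-peptides.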

Table 1 [GADV]-peptides used in this study

REMD Simulations

In the preparatory calculations before the MD simulations, structural optimizations of the randomly generated, computationally constructed [GADV]-peptides were performed. The maximum number of optimization cycles was 20,000. Multiple MD simulations at different temperatures were performed in parallel for REMD, and in this study, the following 16 temperatures were used: 269.7 K, 284.4 K, 300.0 K, 316.4 K, 333.8 K, 352.0 K, 371.3 K, 391.7 K, 413.1 K, 435.7 K, 459.6 K, 484.8 K, 511.3 K, 539.3 K, 568.8 K, and 600.0 K. In preparation for REMD, 200-ps normal MD simulations with a 2-fs time step were performed to achieve equilibria at the given temperatures. Then, 50-ns REMD calculations with a 2-fs time step were performed. The number of MD steps between exchange attempts was set at 500. Because high-temperature MD calculations were included in the REMD simulations, artificial (undesirable) chiral inversions frequently occurred. Thus, structural restraints based on the initial structures were adopted to avoid chiral inversions. The mbondi3 atomic radii and the GBn model (Mongan et al. 2006) with igb = 8, which is one of the generalized Born implicit solvent methods using re-optimized parameters (Nguyen et al. 2013), were used. The GBn model was used to reproduce the influence of the water solvent. The concentration of counter-ions for the ion screening of interactions according to the Debye–Hückel limiting law was set at 0.1 M, because 0.1 to 0.2 M is the concentration range recommended to simulate physiological conditions (Case 2005). The lengths of bonds containing hydrogen atoms were constrained by SHAKE (Ryckaert et al. 1977). The AMBER ff12SB force field was used for the preparatory optimizations and the production MD. The cutoff for non-bonded interactions was 999 Å, i.e., effectively no cutoff. The calculations were conducted using AMBER12 (Case et al. 2012).
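Two pieces of arithmetic follow from these settings. First, the 16 listed temperatures appear to be geometrically spaced; the text does not state how they were chosen, so the formula used below (300 K multiplied by integer powers of 2^(1/13)) is only our inference from the listed values. Second, the number of exchange attempts follows from the simulation length, the time step, and the exchange interval. The function names are hypothetical.

```python
# Temperatures (K) as listed in the text.
REPORTED_TEMPERATURES = [
    269.7, 284.4, 300.0, 316.4, 333.8, 352.0, 371.3, 391.7,
    413.1, 435.7, 459.6, 484.8, 511.3, 539.3, 568.8, 600.0,
]


def geometric_ladder(reference=300.0, steps_per_doubling=13,
                     lowest_index=-2, highest_index=13):
    """Geometric ladder T_k = reference * 2 ** (k / steps_per_doubling).

    Assumption: this spacing is inferred from the listed values and is not
    described in the text itself.
    """
    return [reference * 2.0 ** (k / steps_per_doubling)
            for k in range(lowest_index, highest_index + 1)]


def exchange_attempts(total_ns=50, time_step_fs=2, steps_per_attempt=500):
    """Number of exchange attempts in one REMD run (integer arithmetic)."""
    total_fs = total_ns * 1_000_000          # 1 ns = 10**6 fs
    total_steps = total_fs // time_step_fs   # MD steps in the whole run
    return total_steps // steps_per_attempt


if __name__ == "__main__":
    for listed, computed in zip(REPORTED_TEMPERATURES, geometric_ladder()):
        print(f"{listed:6.1f} K  vs  {computed:8.3f} K")
    print(exchange_attempts())
```

With 500 steps of 2 fs between attempts, exchanges are attempted every 1 ps, which gives 50,000 attempts over the 50-ns run.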

Analysis

For the analyses of the results of the REMD simulations, the trajectories of the last 10 ns were used, because the early parts of the trajectories were influenced by the initial structures. The results of REMD simulations of structurally known peptides suggest that it is preferable to eliminate the first 20 ns of REMD trajectories (Oda et al. 2011). In this study, because structurally unknown random [GADV]-peptides were used, the first 40 ns of the trajectories were eliminated. The root mean square deviations (RMSDs) were calculated for the trajectories, and the stabilities of the [GADV]-peptide structures were evaluated. Even if the RMSDs did not completely converge, the tendencies to form 3D structures were investigated. The structures at 40 ns were used as the reference structures for the RMSD calculations, because the starting point of the analyses was 40 ns. The secondary structures of all residues of the [GADV]-peptides were assigned along the trajectories using the DSSP method (Kabsch and Sander 1983). The occurrence rates of secondary structures in the trajectories were also calculated. For example, when the occurrence rate of the helix structure was 0.5 for one residue, the residue was included in helix structures for 5 ns of the last 10 ns of the simulation. For the assignments of secondary structure, the residues classified as α helix, 3₁₀ helix, and π helix were defined as helix residues, while those classified as parallel β sheet, antiparallel β sheet, and hydrogen-bonded turn were defined as β-structure residues. The definitions of these secondary structures are identical to those used by Ikebe et al. (2007). In addition, the root mean square fluctuations (RMSFs) were calculated to evaluate the fluctuations of the residues. Small RMSFs indicate that the residues were located in structurally fixed portions of the peptides, which may form fixed 3D structures. For the RMSF calculations, the average structures of the peptides during the last 10 ns of the simulations were used as the reference structures. The solvent-accessible surface areas (SASAs) and polar surface areas (PSAs) of the peptides were also calculated to evaluate the molecular sizes and solubilities of the peptides. The RMSD, DSSP, and RMSF calculations were performed using the cpptraj module of AmberTools (Case et al. 2012). The SASAs and PSAs were calculated using the dms program (Richards 1977; Huang 2002).
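The occurrence-rate calculation can be sketched as follows. This is a minimal illustration and not the cpptraj workflow used in the study: the label strings, the function names, and the toy trajectory are hypothetical, and only the grouping of classes into helix and β-structure residues follows the definitions given above.

```python
# Hypothetical per-residue labels; the grouping follows the definitions in the text.
HELIX_LABELS = {"Alpha", "3-10", "Pi"}   # α helix, 3₁₀ helix, π helix
BETA_LABELS = {"Para", "Anti", "Turn"}   # parallel/antiparallel sheet, H-bonded turn


def occurrence_rates(frames, labels):
    """Fraction of frames in which each residue carries one of `labels`.

    `frames` is a list of frames; each frame is a list with one label per residue.
    """
    n_frames = len(frames)
    counts = [0] * len(frames[0])
    for frame in frames:
        for index, label in enumerate(frame):
            if label in labels:
                counts[index] += 1
    return [count / n_frames for count in counts]


def count_above_threshold(rates, threshold=0.5):
    """Number of residues whose occurrence rate is at least `threshold`."""
    return sum(1 for rate in rates if rate >= threshold)


if __name__ == "__main__":
    # Toy trajectory: 4 frames of a 3-residue peptide (illustrative data only).
    toy_frames = [
        ["Alpha", "None", "Turn"],
        ["Alpha", "None", "None"],
        ["None", "3-10", "Turn"],
        ["None", "None", "Turn"],
    ]
    helix = occurrence_rates(toy_frames, HELIX_LABELS)
    beta = occurrence_rates(toy_frames, BETA_LABELS)
    print(helix, count_above_threshold(helix))
    print(beta, count_above_threshold(beta))
```

Multiplying a rate by the 10-ns analysis window gives the time a residue spent in that class, so a rate of 0.5 corresponds to 5 ns.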

Results and Discussion

First, the RMSDs of the trajectories for the last 10 ns were calculated. The average values of the RMSDs during the trajectories of the last 10 ns are illustrated in Fig. 1. As shown in this figure, the RMSDs were large for all [GADV]-peptides. This result indicates that the simulations did not completely converge and that completely fixed 3D structures could not be formed for these short [GADV]-peptides. However, the RMSD values were widely different for each peptide: for example, the average RMSD was less than 5.0 Å for peptide 12 and greater than 6.5 Å for peptide 28. These results suggest that the 3D structure formation abilities of [GADV]-peptides with 20 residues depend on their amino acid sequences and that comparatively stable 3D structures may be formed from the appropriate sequences of peptides.

Fig. 1

Average root mean square deviations (RMSDs) during the trajectories of the last 10 ns for glycine, alanine, aspartic acid, and valine [GADV]-peptides with 20 residues

The numbers of residues assigned as helix or β structures in 50 % or more of the trajectories for the last 10 ns are indicated in Fig. 2. As shown in this figure, the occurrence frequency of β structures was much lower than that of helix structures. In addition, only turn structures were observed as β structures in peptides 20, 29, and 31, and parallel and antiparallel β sheet structures were not observed when the threshold was 50 % for the trajectories of the last 10 ns. Because the peptides used in this study were only 20 residues in length, the interactions that are indispensable to maintaining stable sheet structures could not be formed. Conversely, helix structures were frequently observed in [GADV]-peptides with 20 residues. For peptides 4 and 13, 8 of the 20 residues were assigned as helix structures in 50 % or more of the trajectories for the last 10 ns. Although these peptides also did not form completely fixed 3D structures, as shown in Fig. 1, the results shown in Fig. 2 indicate that these sequences are prone to form secondary structures. The RMSD of peptide 4 was comparatively small among the 40 peptides, and the RMSD of peptide 13 was close to the average. Thus, these structures seem not to be divergent. These results suggest that peptides with 20 residues including only Gly, Ala, Asp, and Val can form secondary structures without aggregation. As shown in our previous study (Oda et al. 2013), the ability to form secondary structures seems to be higher for longer sequences, and β sheet structures, which were rarely observed in this study, could be found in such longer peptides. Additionally, aggregates of short peptides may form secondary structures more frequently than do single short peptides. Indeed, β sheet structures are observed more frequently than helix structures in long chains because of the high β sheet formation ability of Val (Oda et al. 2013). Therefore, [GADV]-proteins with several secondary structures can be generated. However, few to no residues were included in secondary structures for many peptides. Among the peptides in which no residues were included in secondary structures, many (such as peptide 23) also had large RMSD values, and these peptides seemed to form random coils. Although amino acids are considered to be randomly polymerized under the GADV hypothesis, the results indicate that both [GADV]-peptides that form stable structures and those that do not are constructed.

Fig. 2

Numbers of residues assigned as helix and β structures

In Fig. 2, the numbers of residues included in secondary structures with occurrence rates of 50 % or more are shown. The residues in some peptides formed secondary structures with significantly higher occurrence rates. Although these peptides did not form completely fixed 3D structures, stable secondary structures were frequently found in these peptides. The occurrence rates of helix structures for each residue in peptide 9, one of these peptides, are shown in Fig. 3. For comparison, the occurrence rates of helix structures for each residue in peptide 13, in which the number of helix residues was the largest (as reported in Fig. 2), are also illustrated. Although only six helix residues were observed for peptide 9 (Fig. 2), the occurrence rates of helix structures for four of these six residues (Val8–Ala11) were greater than 70 %. The highest occurrence rate, for Ala9, was 78.8 %. While eight helix residues were observed for peptide 13, a larger number than in peptide 9, the highest occurrence rate in this peptide, for Asp5, was only 63.3 %. These results suggest that the residues of [GADV]-peptides occasionally form secondary structures with high probability and that fixed secondary structures may be formed even from randomly generated [GADV]-proteins.

Fig. 3

Occurrence rates of helix structures for each residue in peptides 9 and 13

Examples of the obtained conformations are illustrated in Figs. 4 and 5. The occurrence rates of helix and sheet structures for each residue in these peptides are also shown. The structure shown in Fig. 4 is that of peptide 4 at 40.011 ns, and the structure shown in Fig. 5 is that of peptide 31 at 45.008 ns. Although these structures were transient, helices and/or sheets were occasionally formed in the REMD trajectories (about 70 % for peptide 4), as shown in the occurrence rates of Figs. 4 and 5. As shown in Fig. 4, helices were formed in both the N-terminal and C-terminal regions for peptide 4. Conversely, in peptide 31, two residues had antiparallel β-sheet occurrence rates greater than 10 % (Ala2 and Ala8). Thus, antiparallel β-sheet structure (yellow region) may be observed in Fig. 5. These results suggest that REMD simulations can evaluate the abilities of secondary structure formation for [GADV]-peptides. Using normal MD simulations, 3D structures cannot be sufficiently formed for most [GADV]-proteins, even if simulations are conducted for hundreds of nanoseconds. However, the REMD method allows 3D structures to be predicted with comparatively lower computational costs than normal MD.

Fig. 4

Occurrence rates of secondary structures and the three-dimensional (3D) structure at 40.011 ns for peptide 4

Fig. 5

Occurrence rates of secondary structures and the three-dimensional (3D) structure at 45.008 ns for peptide 31

The numbers of residues with RMSFs less than or equal to 4.0 Å are shown in Fig. 6. These values indicate the numbers of “rigid” (less fluctuating) residues: a larger number for a peptide indicates a greater tendency to form a fixed structure. Although RMSFs cannot be simply compared to the abilities of secondary structure formation, some peptides, such as peptide 4, have both many rigid residues and high secondary structure formation ability. These peptides may be used as structural models of primitive proteins. Additionally, peptides 9 and 13 seem to form helix structures (as shown in Fig. 3). Although the numbers of rigid residues were relatively small for peptides 9 and 13, the presence of nine and eight rigid residues for peptides 9 and 13, respectively, indicated that their structures were fixed to a certain degree. However, some peptides possessed large RMSDs and low numbers of rigid residues; secondary structures were not observed in these peptides. For example, peptide 23 had an average RMSD of 6.485 Å (Fig. 1), zero residues included in secondary structures (Fig. 2), and zero rigid residues (Fig. 6). Such peptides did not appear to be components of proteins that can form fixed 3D structures in primitive life, because they seemed to be structurally too flexible. To investigate this aspect, we plan to perform simulations of [GADV]-peptide aggregates and/or longer [GADV]-peptides.
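The counting of rigid residues can be illustrated with a simplified RMSF calculation. The sketch below assumes that each residue is represented by a single site (for example, one representative atom) and that all frames have already been superimposed; neither assumption is taken from the text, and the function names and toy coordinates are hypothetical.

```python
import math


def rmsf_per_site(frames):
    """RMSF of each site about its average position.

    `frames` is a list of frames; each frame is a list of (x, y, z) tuples in Å,
    one per site. Frames are assumed to be superimposed beforehand.
    """
    n_frames = len(frames)
    n_sites = len(frames[0])
    values = []
    for site in range(n_sites):
        # Average position of this site over all frames (the reference).
        mean = [sum(frame[site][axis] for frame in frames) / n_frames
                for axis in range(3)]
        # Mean squared displacement from the average position.
        msd = sum(
            sum((frame[site][axis] - mean[axis]) ** 2 for axis in range(3))
            for frame in frames
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


def count_rigid(rmsf_values, cutoff=4.0):
    """Number of sites with RMSF less than or equal to `cutoff` (Å)."""
    return sum(1 for value in rmsf_values if value <= cutoff)


if __name__ == "__main__":
    # Toy data: two sites over two frames (illustrative coordinates only).
    toy_frames = [
        [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
        [(2.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    ]
    fluctuations = rmsf_per_site(toy_frames)
    print(fluctuations, count_rigid(fluctuations))
```

In the toy data, the first site fluctuates by 1.0 Å and counts as rigid under the 4.0 Å cutoff, whereas the second fluctuates by 5.0 Å and does not.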

Fig. 6

Numbers of “rigid” residues. The numbers of residues with root mean square fluctuations (RMSFs) smaller than or equal to 4.0 Å are shown

SASAs and PSAs were evaluated for the last 10 ns of the trajectories of the [GADV]-peptides, and the average values were calculated. The SASA and PSA values varied widely across the examined peptides despite their identical lengths. The maximum values for SASA and PSA were 1449.451 Å² and 594.1586 Å² (both in peptide 2), respectively. The minimum values for SASA and PSA were 1168.158 Å² (peptide 34) and 327.7273 Å² (peptide 4), respectively. SASA represents the molecular size, and PSA indicates the hydrophilicity of the peptide (Palm et al. 1997). These results indicate that peptides with a wide variety of physicochemical properties can be constructed even when only Gly, Ala, Asp, and Val are used as peptide components. The values of PSA/SASA, representing the ratio of PSA (the surface area corresponding to polar atoms) to SASA (the surface area of the whole molecule), are shown in Fig. 7. Because polar moieties have high affinities for water, PSA/SASA is expected to reflect peptide water solubility. The wide range of PSA/SASA values suggests that [GADV]-peptides have widely varying solubilities. For example, the PSA/SASA value of peptide 4, which has a very high ability of helix formation (Fig. 4), was the lowest of all peptides. This result indicates that the hydrophobic surface was comparatively large in peptide 4. The hydrophobic surface is known to play an important role in the molecular recognition of proteins (Yamaotsu et al. 2008; Oda et al. 2009). Protein-protein interfaces are hydrophobic in many cases, and the ligand binding sites of proteins frequently include hydrophobic surfaces (Ajay and Murcko 1995; Oda et al. 2009). The GADV hypothesis assumes that aggregates of short peptides played enzymatic roles in the early stages of life. Therefore, peptides with low PSA/SASA values, like peptide 4, may play important roles in the GADV hypothesis. It should be noted that soluble peptides also may have played important roles in the primordial soup, possibly serving as homogeneous catalysts, i.e., primitive enzymes. Together, the results indicate that peptides including only Gly, Ala, Asp, and Val could have populated the primordial soup with significant variation.
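As a worked example of the PSA/SASA ratio, the value for peptide 2 can be computed from the averages quoted above, since peptide 2 is the only peptide for which both quantities are given in the text. The function name `polar_fraction` is hypothetical.

```python
def polar_fraction(psa, sasa):
    """Ratio of polar surface area to solvent-accessible surface area."""
    if sasa <= 0.0:
        raise ValueError("SASA must be positive")
    return psa / sasa


if __name__ == "__main__":
    # Average values (Å^2) quoted in the text for peptide 2.
    peptide2_ratio = polar_fraction(594.1586, 1449.451)
    print(f"PSA/SASA for peptide 2: {peptide2_ratio:.3f}")
```

For peptide 2, roughly 41 % of the accessible surface is therefore polar; the ratios of the other peptides cannot be recomputed from the values quoted in the text.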

Fig. 7

Polar surface areas (PSAs)/solvent-accessible surface areas (SASAs) for glycine, alanine, aspartic acid, and valine [GADV]-peptides

Conclusion

In this study, the 3D conformations of randomly generated peptides including only Gly, Ala, Asp, and Val were investigated using computational methods. The results of this study indicate that some of the 40 examined peptides formed stable secondary structures to a certain degree. In addition, a wide variety of peptides with a wide variety of physicochemical properties, such as molecular size and water solubility, can be constructed even when using only Gly, Ala, Asp, and Val as peptide components. These results support the hypothesis that primitive proteins could have been constructed from a limited range of amino acids early in the history of life.