Collagen’s Triglycine Repeat Number and Phylogeny Suggest an Interdomain Transfer Event from a Devonian or Silurian Organism into Trichodesmium erythraeum
- Cite this article as:
- Layton, B.E., D’Souza, A.J., Dampier, W. et al. J Mol Evol (2008) 66: 539. doi:10.1007/s00239-008-9111-7
Two competing effects at two vastly different scales may explain collagen’s current translation length. The necessity to have long molecules for maintaining mechanical integrity at the organism and supraorganism scales may be limited by the need to have small molecules capable of robust self-assembly at the nanoscale. The triglycine repeat regions of all 556 currently cataloged organisms with collagen-like genes were ranked by length. This revealed a sharp boundary in the GXY transcript number at 1032 amino acids (344 GXY repeats). An anomalous exception, however, is the intron-free Trichodesmium erythraeum collagen gene. Immunogold atomic force microscopy reveals, for the first time, the presence of a collagen-like protein in T. erythraeum. A phylogenetic protein sequence analysis which includes vertebrates, nonvertebrates, shrimp white spot syndrome virus, Streptococcus equi, and Bacillus cereus predicts that the collagen-like sequence may have emerged shortly after the divergence of fibrillar and nonfibrillar collagens. The presence of this anomalously long collagen gene within a prokaryote may represent an interdomain transfer from eukaryotes into prokaryotes that gives T. erythraeum the ability to form blooms that cover hundreds of square kilometers of ocean. We propose that the collagen gene entered the prokaryote intron-free only after it had been molded by years of mechanical selective pressure in larger organisms and only after large, dense food sources such as marine vertebrates became available. This anomalously long collagen-like sequence may explain T. erythraeum’s ability to aggregate and thus concentrate its toxin for food-source procurement.
Keywords: Collagen, Persistence length, Intron, Trichodesmium erythraeum, Interdomain transfer, Cyanobacteria
Collagen is the most abundant protein by total mass in humans (Di Lullo et al. 2002) and across all organisms (Nimni et al. 1988). Fibrillar collagen is highly organized at critical mechanical stress locations such as bones, tendons, ligaments, nerve sheaths, skin, and cornea. In single-celled organisms, the necessity for a protein typically found in connective tissue is less clear. Exceptions are the collagen-like fragments found in Streptococcus equi (Liden et al. 2008) and Bacillus cereus (Daubenspeck et al. 2004). Nevertheless, collagen may have been one of the key emergent bridges from single-celled prokaryotes to multicellular eukaryotes. For a review of the importance of the triglycine repeat and how it is maintained in fibrillar collagens, see Kadler et al. (2007). Current consensus holds that the collagen gene began as a relatively short sequence that allowed small groups of cells to organize into larger systems and perhaps eventually differentiate into complex colonies. Through mutation and natural selection this gene eventually lengthened, diversified through gene duplication events, and was refined into its characteristic triple-helical, self-interacting structure, allowing greater mechanical stresses to be sustained among cells (Boot-Handford and Tuckwell 2003; Buehler 2006b; Wada et al. 2006). This gave the organism the ability to maintain cellular diversity, accelerate quickly, absorb mechanical energy, and resist fracture (Buehler 2006b), allowing organisms that possessed it to rise to the “top” of the food chain.
The recently predicted, but yet undetected collagen-like gene found in Trichodesmium erythraeum by Orcutt et al. (2002) shares many characteristics with vertebrate collagen. It has a triglycine repeat region that is approximately 10% longer than that of most vertebrates, and it shares several identical or similar residues in its N and C termini. An intriguing possibility is that it acquired this gene through viral-mediated interdomain transfer from a larger marine organism.
T. erythraeum is a colonial marine cyanobacterium that can be seen with the naked eye. During periods of low wind stress and warm temperatures it forms blooms of surface aggregations that can be tens of thousands of kilometers wide (Capone et al. 1997). Blooms of this size have been seen in NASA satellite images (Negri 2006). Trichodesmium was named for its appearance: the Greek word “trichoma” for hair and “desmus” for bonded, hence “bonded hair.” As the cells age, they become positively buoyant and rise to the surface (Walsby 1994). Once these segmented structures reach a critical length, they fracture, allowing new growth to occur at a new set of free ends as they enter their exponential growth phase (Bell et al. 2005). Its collagen-like gene may serve the purpose of maintaining colony contact even as individual cells lose direct contact with each other. These blooms or “rafts” consist of healthy and aged cells mixed with detritus in a “mucilaginous matrix” (Endean and Monks 1993). According to observations from the tropical and subtropical North Atlantic, Trichodesmium produces more nitrogen than any other macroscopic (0.5–4 mm) cyanobacterium and about half of the new nitrogen used for primary production (Capone et al. 1997). In fact, the biological productivity of large expanses of the ocean is often limited by the availability of nitrogen, and Trichodesmium, as an N2 fixer, is therefore of critical importance for supporting the metabolic requirements of a vast number of non-nitrogen-fixing organisms.
Genetic characterization of Trichodesmium species suggests that two distinct clades are present in the oceans: one including the closely related species T. thiebautii, T. tenue, T. hildebrandtii, and K. spiralis and the other containing only T. erythraeum as determined primarily by comparing introns and intein presence (Orcutt et al. 2002). The T. erythraeum ribonucleotide reductase (RIR) gene was found to encode four inteins and three group II introns, which is extremely unlikely to have occurred by chance, considering the rarity of inteins and introns (Liu et al. 2003) within this genus.
If early collagen genes were to survive as “the protein of choice” in large, multicellular organisms, maintaining the mechanical integrity of tissues both under static loading conditions and under extreme loading conditions such as avoiding predators or pursuing prey, then they must have been selected or “purified” by two competing metrics: the ability to avoid rupture at high mechanical stress and the ability to self-assemble rapidly. This raises the question: What is the optimal collagen translation length?
We present a theoretical framework for how the collagen translation length is balanced by the competing requirements of being long enough to support tensile loads while, at the same time, being short enough to self-assemble robustly. We also present experimental results demonstrating the presence of a collagen-like protein in T. erythraeum and a phylogenetic analysis. Experimental methods include immunogold atomic force microscopy, sequence alignment, and phylogeny tree construction.
Restated, the contour length, l, of the collagen triglycine repeat should have a tendency to increase as the mass, m, of an organism, and the accelerations, a, it is subjected to increase. However, having a tissue with a large cross-sectional area, A will diminish the need for long fibril-forming molecules.
The relatively nonmotile organism T. erythraeum uses gas vesicles rather than flagella to stratify (Walsby 1994). Thus forces generated by its environment, such as wind and ocean currents, that are likely to rend its colonies apart are represented by the ma term of Eq. 3. This force could also be expressed as being proportional to Stokes drag: −6πηrv, where η is the fluid viscosity, r is the cell radius, and v is the fluid velocity. Although T. erythraeum has not been observed to manufacture true collagen fibrils, this does not preclude the presence of single-, double-, or triple-helical collagen at the cell surfaces that provides a cohesive material for maintaining colony integrity.
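To give a sense of scale for the Stokes drag term mentioned above, the following is a minimal sketch that evaluates the magnitude 6πηrv. The viscosity, radius, and velocity are illustrative assumptions chosen only to show the calculation; they are not measurements from this study.

```python
import math


def stokes_drag(viscosity_pa_s: float, radius_m: float, velocity_m_s: float) -> float:
    """Magnitude of Stokes drag on a sphere, 6 * pi * eta * r * v, in newtons."""
    return 6.0 * math.pi * viscosity_pa_s * radius_m * velocity_m_s


# Hypothetical illustration values (assumptions, not data from the paper).
eta = 1.0e-3   # Pa*s, roughly the viscosity of water near room temperature
r = 5.0e-6     # m, assumed cell radius
v = 1.0e-2     # m/s, assumed fluid velocity relative to the cell

force = stokes_drag(eta, r, v)
print(f"Drag magnitude: {force:.3e} N")
```

Because the expression is linear in each argument, doubling the relative fluid velocity doubles the force that the cohesive material must resist.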
At the molecular scale, a contour length that is significantly longer than the persistence length may become a liability to self-assembly. Restated, the contour length l competes with the persistence length lp. Once l surpasses lp, the probability of self-entanglement increases, thus diminishing the probability that a molecule such as collagen will aggregate radially and axially into fibrils. Recent work by Buehler (2006a) indicates qualitatively that longer molecules do have higher probabilities for their two ends to come into close proximity through the accumulation of molecular-scale bends. For a review on the mechanisms of fibrillogenesis see Hulmes (2002).
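One standard way to make the competition between contour length and persistence length concrete is the worm-like chain (Kratky-Porod) model, in which the mean-square end-to-end distance is 2·lp·l·[1 − (lp/l)(1 − e^(−l/lp))]. The sketch below evaluates that expression; it is not the analysis used in this paper. The rise per residue (0.286 nm) and the persistence length (15 nm) are assumptions chosen for illustration, and published persistence-length estimates for collagen vary widely.

```python
import math


def wlc_mean_square_end_to_end(contour_nm: float, persistence_nm: float) -> float:
    """Worm-like chain mean-square end-to-end distance in nm^2.

    <R^2> = 2 * lp * L * [1 - (lp / L) * (1 - exp(-L / lp))]
    """
    ratio = contour_nm / persistence_nm
    # 1 - (1 - exp(-x)) / x is rewritten with expm1 for numerical accuracy.
    return 2.0 * persistence_nm * contour_nm * (1.0 + math.expm1(-ratio) / ratio)


ASSUMED_RISE_PER_RESIDUE_NM = 0.286   # assumption: axial rise per residue
ASSUMED_PERSISTENCE_NM = 15.0         # assumption: illustrative value only

for repeats in (344, 383):            # GXY repeat counts quoted in the text
    contour = repeats * 3 * ASSUMED_RISE_PER_RESIDUE_NM
    rms = math.sqrt(wlc_mean_square_end_to_end(contour, ASSUMED_PERSISTENCE_NM))
    print(f"{repeats} repeats: contour {contour:.1f} nm, "
          f"rms end-to-end / contour = {rms / contour:.3f}")
```

A ratio near 1 corresponds to a nearly straight rod, whereas a ratio well below 1 indicates a chain whose ends are more likely to approach one another, which is the qualitative liability described above.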
Materials and Methods
Triglycine repeat ranking
Samples of T. erythraeum were provided by John Waterhouse, Woods Hole Oceanographic Institution. Upon receipt, samples of T. erythraeum suspended in 50-ml vials of native ocean water were aliquoted into 1.5-ml vials and immediately stored at −70°C. Frozen samples were thawed at room temperature and centrifuged at 10,000 g for 5 min. Pellets were then pipetted onto standard microscopy slides (Fisher Scientific, USA), coverslipped, and imaged in phase-contrast mode at 200× on an Olympus IX81 inverted light microscope. Images were captured digitally with a SPOT-RT camera (Diagnostic Instruments, Sterling Heights, MI).
Environmental scanning electron microscopy
Samples from the same aliquot as used for light microscopy were also prepared for environmental scanning electron microscopy in a Philips XL30 ESEM (FEI, Hillsboro, OR) with a gaseous secondary electron detector (GSED) at a chamber pressure of 4.0 Torr and 15 kV. Samples were placed directly on an aluminum stub, which was then placed on a sample holder inside the ESEM chamber in environmental (wet) mode. Excess water was evaporated from the sample to facilitate observation. By circulating water molecules and maintaining the chamber pressure within the range of 2–10 Torr, ESEM allows insulating samples to be observed without a coating and keeps hydrated samples moist.
Bovine collagen was used as a positive control and E. coli were used as a negative control. Fresh samples of E. coli were supplied by MinJun Kim of Drexel University. This technique was chosen after initial efforts with immunohistochemistry revealed that T. erythraeum emits strong autofluorescence at each of the three standard fluorescence microscopy colors, FITC, TRITC, and rhodamine, making standard immunofluorescence techniques impractical. E. coli were cultured and stored as described elsewhere (Steager et al. 2007). Both of the bacteria samples were thawed at room temperature and lightly vortexed. Five hundred microliters of T. erythraeum and 200 μl of E. coli were each pipetted into separate microcentrifuge tubes. The two specimens were then placed in a microcentrifuge for 2 min at 10,000 g. After the supernatant was removed, 500 μl of 1.8% formaldehyde (diluted from Fisher Scientific BP531-500) was added. Specimens were then vortexed and kept at 5°C. Bovine collagen strands (Sigma C9879) (Einbinder and Schubert 1951) were teased apart into a single fragment of approximately 100 μm³ and placed in a 1.5-ml microcentrifuge tube. All three specimens were then spun for 1 min at 13,000 rpm and the supernatant was removed. Two hundred microliters of PBS (0.01 M, pH 7.4) was added to each while the samples were kept at 5°C. Fifty microliters of rabbit collagen I + II + III + IV + V primary antibody (ab24117; Abcam, Cambridge, MA) was diluted in 1.5 ml of PBS (0.01 M, pH 7.4). There is some contention as to what the specific binding sites are for collagen antibodies. It is generally assumed that the triglycine repeat portion is nonimmunogenic, and that the globular N and C termini are antigenic. Thus we chose a polyclonal antibody with a high likelihood of binding to a wide array of collagens, since the T. erythraeum collagen is yet to be classified. The mixture was vortexed and then centrifuged at 10,000 g for 5 min to remove any impurities.
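The primary antibody dilution described above (50 μl of antibody into 1.5 ml of PBS) can be expressed as a fold dilution with a short calculation. This is a sketch of the arithmetic only; the function name is a hypothetical helper introduced here for illustration.

```python
def dilution_fold(stock_ul: float, diluent_ul: float) -> float:
    """Return the fold dilution (final volume / stock volume) for a v/v dilution."""
    if stock_ul <= 0 or diluent_ul < 0:
        raise ValueError("volumes must be positive (stock) and non-negative (diluent)")
    return (stock_ul + diluent_ul) / stock_ul


# Volumes taken from the protocol text: 50 ul antibody into 1500 ul PBS.
primary_fold = dilution_fold(50.0, 1500.0)
print(f"Primary antibody dilution: 1:{primary_fold:.0f} (v/v)")
```

Note that the fold is computed against the final volume (antibody plus PBS), not against the diluent volume alone.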
The specimens were centrifuged for 1 min at 10,000 g and the supernatant was removed. Two hundred fifty microliters of the supernatant of the diluted primary antibody was added to each specimen. Next the specimens were vortexed and kept at 5°C for 30 min. To prepare the anti-rabbit IgG (whole-molecule) gold colloid secondary antibody (Sigma G7402), 1400 μl of 0.5 M NaCl (diluted from Ricca Chemical 7215-16), 1.4 mg of BSA (Sigma A7030), 0.7 μl of Tween 20, and 70 μl of FBS (Fisher Scientific BW-14-503F) were added to a microcentrifuge tube and vortexed. Seventy-two microliters of this solution was discarded, 100 μl of secondary antibody was added, and then the mixture was vortexed. The three specimens were centrifuged for 2 min at 13,000 rpm and the supernatant was removed. Three hundred microliters of PBS was added to T. erythraeum and E. coli and the specimens were centrifuged for 2 min at 10,000 g. The supernatant was removed, and the wash was repeated. Six hundred microliters of PBS was added to the bovine collagen and the specimen was centrifuged for 2 min at 10,000 g. The supernatant was removed, and 250 μl of diluted 1000:1 secondary antibody was added to all three specimens. The specimens were then vortexed and kept in the dark at 5°C for 1 h. The specimens were centrifuged for 5 min at 10,000 g and the supernatant removed. Three hundred microliters of PBS was then added to each specimen and the samples were vortexed. The specimens were centrifuged for 5 min at 10,000 g and the supernatant was removed. This process was repeated two more times, for a total of three washes with PBS. Fifty microliters of PBS was added to each specimen, and the specimens were kept at −20°C in the dark until imaging. Samples were imaged on a Digital Instruments Series 3100 Nanoscope in air-tapping mode with DNP-S tips (Veeco Probes) at resonance frequency (∼300 kHz). Image sizes ranged from 153 to 500 nm², and all were sampled at a resolution of 256 × 256 pixels.
Images were postprocessed using the Digital Instruments 5.12r5 Nanoscope software. Individual features with diameters between 2 and 15 nm were selected visually and counted in four images from each of the three samples: the positive control, bovine collagen (BC); the negative control, E. coli (EC); and the experimental sample, T. erythraeum.
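The protocol above reports some centrifugation steps in g and others in rpm. The two are related by the standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm², with the rotor radius r in centimeters. The sketch below applies that conversion. The rotor radius is not given in the text, so the 7 cm used here is purely an assumption for illustration.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_from_rpm(rpm: float, rotor_radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) for a given speed and radius."""
    return RCF_CONSTANT * rotor_radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, rotor_radius_cm: float) -> float:
    """Rotor speed in rpm needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * rotor_radius_cm))


ASSUMED_ROTOR_RADIUS_CM = 7.0  # assumption: the rotor is not specified in the text

print(f"13,000 rpm -> {rcf_from_rpm(13000, ASSUMED_ROTOR_RADIUS_CM):.0f} g")
print(f"10,000 g   -> {rpm_from_rcf(10000, ASSUMED_ROTOR_RADIUS_CM):.0f} rpm")
```

Since the result depends on the rotor, the rpm and g figures in the protocol can only be compared once the actual rotor radius is known.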
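The counting criterion described above (features with diameters between 2 and 15 nm) can be sketched as a simple filter. In the study the features were selected visually, so this is only an illustration of the size window, and the diameters below are hypothetical placeholder values, not measured data.

```python
def count_features(diameters_nm, min_nm=2.0, max_nm=15.0):
    """Count features whose diameter lies within [min_nm, max_nm] inclusive."""
    return sum(1 for d in diameters_nm if min_nm <= d <= max_nm)


# Hypothetical diameters (nm) for two images; placeholders for illustration only.
example_images = {
    "image_1": [1.2, 4.8, 9.5, 14.9, 22.0],
    "image_2": [3.1, 6.0, 15.0, 30.5],
}

for name, diameters in example_images.items():
    print(name, count_features(diameters))
```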
Homology scoring and phylogenetic tree construction
Collagen sequence accession numbers and organisms used for T. erythraeum collagen phylogeny tree construction
Alvinella pompejana (hydrothermal worm)
Bacillus cereus ATCC 10987
Danio rerio (zebrafish)
Danio rerio (zebrafish)
Drosophila melanogaster (fruit fly)
Ephydatia muelleri (Mueller’s freshwater sponge)
Gallus gallus (chicken)
Gallus gallus (chicken)
Homo sapiens (human)
Homo sapiens (human)
Homo sapiens (human)
Mus musculus (house mouse)
Mus musculus (house mouse)
Mytilus edulis (common mussel)
Oncorhynchus mykiss (rainbow trout)
Oncorhynchus mykiss (rainbow trout)
Oncorhynchus mykiss (rainbow trout)
Paralichthys olivaceus (bastard halibut)
Paralichthys olivaceus (bastard halibut)
Rana catesbeiana (bullfrog)
Rana catesbeiana (bullfrog)
Riftia pachyptila (thermal tubeworm)
Shrimp white spot syndrome virus
Tetraodon nigroviridis (pufferfish)
Trichodesmium erythraeum IMS101
These sequences were multiply aligned using the ClustalW program (Higgins et al. 1994) and then refined using the TreeRefiner program (Manohar and Batzoglou 2005). Formatting was performed using Boxshade 3.21 from EMBnet. The final tree was created using the Jukes-Cantor (1969) method of determining evolutionary distance and then using the sequence neighbor-joining (NJ) method (Gascuel 1997) as implemented in the Matlab R2007a bioinformatics toolbox (Cai et al. 2005). Gaps were treated as ‘missing values’ rather than as ‘differences’ so as not to overestimate resulting distance values. Bootstrap values were obtained from the ClustalW program using the default parameters of 1000 bootstrap repetitions and are reported as percentages (Higgins et al. 1994).
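The distance step described above can be illustrated with a small sketch: a proportion of differing sites computed over aligned positions, skipping any position where either sequence has a gap, followed by the Jukes-Cantor correction for a 20-state (amino acid) alphabet, d = −(19/20) ln(1 − (20/19) p). This is not the Matlab toolbox code; treating "missing values" as pairwise gap deletion is an assumption made here, and the short sequences are invented for illustration.

```python
import math


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites, ignoring positions where either side is a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # gap treated as missing data, not as a difference
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return mismatches / compared


def jukes_cantor_distance(p: float, states: int = 20) -> float:
    """Jukes-Cantor corrected distance for an alphabet with `states` symbols."""
    b = (states - 1) / states
    if p >= b:
        return math.inf  # saturated: the correction is undefined
    return -b * math.log(1.0 - p / b)


# Invented toy alignment, for illustration only.
p = p_distance("GPP-AP", "GPAGAP")
print(f"p = {p:.3f}, Jukes-Cantor d = {jukes_cantor_distance(p):.4f}")
```

A matrix of such pairwise distances is the input that a neighbor-joining routine would then use to build the tree.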
We used the NJ method since we are comparing multiple organisms from vastly different environmental niches. This is justified since it is likely that the selection criteria for the maintenance of the triglycine repeat region of marine cyanobacteria are as different from those for a fibrillar vertebrate collagen as are the selection criteria for afibrillar versus fibrillar collagens. We chose the NJ method rather than the unweighted pair group with arithmetic mean (UPGMA) method, since UPGMA only provides information about the relative order of the evolutionary path (e.g., Wiersma et al. 2005), whereas the NJ method provides an estimate of evolutionary distance.
Finally, to test whether the triglycine region artificially “forced alignment” between the T. erythraeum sequence and the other sequences included in the tree construction, we ran ClustalW with the glycine weight set to zero. This more rigorous approach was performed to evaluate the phylogeny purely on the nonglycine regions. With the glycines, which comprise nearly one-third of the sequence, absent from the analysis alignment and phylogeny, tree construction is more stringent, thus making the results more compelling.
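The idea of judging similarity on the nonglycine positions alone can be sketched with a simple identity calculation that skips any aligned position containing glycine. This is a simpler stand-in for the idea, not the ClustalW glycine-weight setting used in the analysis, and the two short sequences are invented for illustration.

```python
def identity_excluding(seq_a: str, seq_b: str, excluded: str = "G", gap: str = "-") -> float:
    """Fraction of identical aligned residues, skipping gaps and excluded residues."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        if a in excluded or b in excluded:
            continue  # drop positions contributed by the excluded residue type
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else float("nan")


# Invented toy alignment of GXY-like triplets, for illustration only.
seq_1 = "GPPGAKGER"
seq_2 = "GPAGAKGDR"
with_glycine = identity_excluding(seq_1, seq_2, excluded="")
without_glycine = identity_excluding(seq_1, seq_2)
print(f"identity with G: {with_glycine:.3f}, without G: {without_glycine:.3f}")
```

In this toy case the identity drops once glycines are excluded, which shows how a conserved every-third-residue glycine can inflate apparent similarity.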
Triglycine repeat ranking
The shelf in the graph in Fig. 2 indicates that this critical length for the collagen triple helix occurs at about 344 triglycine repeats. The obvious outlier, T. erythraeum, at 383 triglycine repeats suggests that this shelf is created by the necessity for the contour length, l, to be less than some critical multiple of the persistence length, lp. Indeed, Buehler (2006b) has predicted that past a contour length of ∼200 nm, extra length becomes a liability. This is explained through the following mechanics argument. A single collagen triple helix has a rupture force of approximately 22.5 nN (Buehler 2006a). By contrast, hydrogen bonds have rupture forces three orders of magnitude lower (Gao et al. 2002), and individual cross-link rupture forces are approximately one order of magnitude lower (Sulchek et al. 2006). Thus, to break a single collagen triple helix, approximately 1000 hydrogen bonds, or approximately 10 cross-links, must be present. Therefore, a single collagen triple helix with a fixed cross-sectional area that accumulates more cross-links than the covalent bonds of its triple helix can support is more likely to rupture than a shorter triple helix with fewer cross-links. From this we conclude that there is no evolutionary advantage for individual triple helices to grow beyond a length that permits excessive accumulation of cross-links.
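The numbers in this argument can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in the text (344 and 383 repeats, a 22.5 nN helix rupture force, and the stated orders of magnitude); the per-bond forces it prints are simply those ratios, not independently measured values.

```python
RESIDUES_PER_REPEAT = 3  # one GXY triplet

shelf_repeats = 344      # shelf observed in the ranking
outlier_repeats = 383    # T. erythraeum

shelf_residues = shelf_repeats * RESIDUES_PER_REPEAT
outlier_residues = outlier_repeats * RESIDUES_PER_REPEAT
excess_fraction = outlier_repeats / shelf_repeats - 1.0

helix_rupture_nN = 22.5                      # quoted from the text
hbond_rupture_nN = helix_rupture_nN / 1000   # three orders of magnitude lower
crosslink_rupture_nN = helix_rupture_nN / 10 # one order of magnitude lower

print(f"Shelf: {shelf_residues} residues; outlier: {outlier_residues} residues")
print(f"Outlier exceeds shelf by {excess_fraction:.1%}")
print(f"Implied per-bond forces: H-bond {hbond_rupture_nN * 1000:.1f} pN, "
      f"cross-link {crosslink_rupture_nN:.2f} nN")
```

The 344-repeat shelf corresponds to the 1032 amino acids cited in the abstract, and the roughly 11% excess for T. erythraeum is consistent with the "approximately 10% longer" figure given in the introduction.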
Environmental scanning electron microscopy
Atomic force microscopy
Immunogold atomic force microscopy
We have provided the first evidence that the collagen-like gene in the marine cyanobacterium T. erythraeum is expressed and present on the cell surface. We have also provided theoretical evidence that the collagen translation length may be determined by competing metrics of strength and self-assembly. Phylogenetic analysis indicates that the collagen-like gene appeared in this organism after the divergence of fibrillar and afibrillar collagens, but before the divergence of the fibrillar collagens. Finally, we argue that the maintenance of this gene within the genome of T. erythraeum provides it with a selective advantage in that it allows aggregations that enable it to “prey” on larger organisms through concentration of its neurotoxin and through mechanical gill-clogging mechanisms. Four discussion sections—theoretical, experimental, bioinformatic, and ecological—follow.
The triglycine portion of the collagen gene transcript appears to have reached an evolutionary “ledge” at approximately 340 GXY. This ledge likely appears at this specific length for two reasons: (1) if it were to become longer, this would represent a liability for self-assembly; and (2) a longer triple helix would not necessarily increase fibril strength, as additional length would allow the possibility of additional cross-links and hydrogen bonds to accumulate along its length in excess of what the covalent bonds of the triple helix itself can support.
Long collagen transcripts are driven by the need for a sufficient number of cross-links to develop between a single triple helix and its approximately 24 neighboring triple helices through hydrogen bonding and hydroxyproline-lysine cross-links. However, the need to self-assemble keeps the collagen molecule from becoming too long so that self-interaction prior to fibrillogenesis is less likely. The persistence length/contour length ratio has recently been discussed, but not systematically studied to determine the likelihood of self-interaction (Buehler 2006a).
In rope building, long subfibers are clearly an asset for developing great tensile strength. Shear lag theory (Weitsman and Beltzer 1992) states that the tensile force within a subcomponent of a tension member is proportional to the amount of shear stress developed within it. This theory remains valid for the molecular scale within and among individual collagen triple helices: the ratio between the sum of the strength of collagen’s cross-links and that of the triple helix itself determines its success or failure as an effective molecular rope. Adding additional binding sites along a triple helix might overwhelm the bond strength along a single triple helix (Buehler 2006b).
We have provided the first evidence that the abnormally long triglycine repeat within the collagen-like gene of T. erythraeum is being expressed. The collagen of T. erythraeum, which shares a great deal of identity with the Clade A fibrillar collagens of large vertebrates, apparently does not form fibrils. Other discoveries of “superlong” collagen molecules from the cuticle of marine tube worms and annelids (Gaill et al. 1991), interpreted from histograms taken from rotary-shadowed TEM images, indicate that there may be collagen molecules up to 2.4 μm in length. However, since this publication, the full collagen sequence data for the two organisms Riftia pachyptila and Alvinella pompejana, from which these samples were taken, indicate that their collagen protein sequences are substantially shorter: 1027 and 890, respectively. Gel electrophoresis data from Gaill et al. (1991) indicated that these two marine organisms do have massive collagens and that banded fibrils are present in the cuticle and interstitial tissues of both. However, the possibility exists that the molecular lengths measured by Gaill et al. were fibril fragments rather than individual triple helices.
The lack of fibrils present in our samples indicates that although the collagen-like sequence of T. erythraeum shares a great degree of identity with the Clade A fibrillar collagens, the lack of posttranslational enzymes required for N- and C-terminal cleavage prevents fibrillogenesis. Clearly further work is required to determine if the N termini of the T. erythraeum collagens are capable of performing the trimerizing required to initiate tropocollagen formation.
There are multiple alternatives for the origin of the collagen-like gene of T. erythraeum. One possibility is that it acquired its collagen-like gene intron-free through a horizontal transfer at a time when large vertebrates were prevalent, during the Devonian or Silurian periods. The second alternative is that it evolved from a shorter prokaryotic version of the protein via repeat expansion. The relative likelihood of these two alternatives may be evaluated through more exhaustive phylogenetic analyses once additional sequence data become available from a greater number of organisms. Nakamura et al. (2004) found, in analyzing whole genomes of 116 prokaryotes, that 14% of open reading frames were subjected to recent horizontal gene transfer. The most frequently transferred genes were those related to cell surface function, DNA binding, and pathogenicity.
Support for the former alternative is that shrimp white spot syndrome virus also carries a long, intron-free collagen sequence, with a length of 5054 bp (van Hulten et al. 2001). Lateral gene transfer from within a given species, from organelles to the nucleus, is a commonly observed occurrence (e.g., Adams et al. 1999) and has been used to estimate the amount of time organelles such as mitochondria have inhabited eukaryotic cells (Parkinson et al. 2005). Lateral gene transfer has also been observed and discussed among prokaryotes (P → P), among eukaryotes (E → E), from prokaryotes to eukaryotes (P → E), and, most recently, from eukaryotes to prokaryotes (E → P) (Jenkins et al. 2002). An E → P event might help explain why the gene for an extracellular matrix protein typically associated with vertebrates or multicellular metazoans might have found its way into contemporary prokaryotes through an event such as viral infection or naked DNA transfer. If this occurred in T. erythraeum, it may have happened via a spliced mRNA intermediate as discussed by Andersson (2005). The lability of exposed RNA makes this less likely, but the abundance of introns in eukaryotic collagen DNA (50 for human type I αI) and the lack thereof in T. erythraeum make it likely that the infectious agent path is the only way for interdomain transfer to have occurred (e.g., Davis 2002). Another alternative is that this event occurred via a viral transfection event as mentioned by Gogarten (2003). The ecological relationships among viral phages and prokaryotes are vast and complex and may offer a viable explanation as to how this particular collagen-like gene entered the prokaryotic genome (Weinbauer 2004).
Regarding the second alternative, namely, that the origin of Trichodesmium erythraeum’s collagen-like gene was from a repeat expansion mechanism, the probability of this happening compared to the probability of transfection is difficult to estimate, but is presumably less involved and more probable in an organism with a plasmid-based DNA instruction set. Similar hypotheses have been put forth as an explanation for the large variety of collagen genes currently observed naturally (Boot-Handford and Tuckwell 2003).
The primary argument against our contention that the collagen-like gene found in T. erythraeum was inherited through horizontal gene transfer is that two vertebrate species such as human vs. fish (trout) sequences are nearly identical, whereas less identity is shared between T. erythraeum and any of the three vertebrates in Fig. 8. More specifically, 400 million years of evolution between two vertebrates such as trout and human has not greatly changed the amino acid sequence of the repeat region. However, with a few notable exceptions, such as the approximately 15% identity in the C terminus, and a few tenuous identities in the N terminus, the nonglycine residues of T. erythraeum within the triglycine region appear to share little identity with those of the vertebrates. Why, then, should T. erythraeum maintain such a long, interrupted, intron-free collagen-like sequence in its genome? This may be partially explained as follows: the fibrillar and afibrillar collagens have diverged to a much greater extent within individual species than, for example, human type I collagen and trout type I collagen have. Thus it is not merely the presence of a perfectly uninterrupted triglycine repeat region, but also the preservation of other critical residues such as prolines and lysines that make triple-helix formation and fibrillogenesis possible. This is reasonable since greater selective pressure is placed on organisms that rely on perfect triglycine repeats and their associated cross-bridge-forming lysines and rotationally stiff prolines to maintain fibril-forming collagens. Indeed, fibril formation confers a selective advantage, as its disruption in osteogenesis imperfecta illustrates. In an organism such as T. erythraeum that apparently does not form fibrils, but does apparently rely on its collagen-like protein for survival, the maintenance of the identity of its second and third residues is likely not critical, whereas the maintenance of its triglycine repeat appears to be especially valued.
Based on prior satellite imagery and our own evidence of filamentous structures found in unlabeled atomic-force microscopy preparations, we have presented evidence suggesting that this filamentous type of collagen might be a mat-forming matrix similar to other afibrillar collagens. A gene does not long remain within a genome unless it is serving the purpose of increasing organism (and gene) survivability (Dawkins 1989). Any “excessive” genes that do not contribute to fitness are quickly eliminated from the genome (Pal et al. 2003). We suggest that the remarkably long collagen-like protein found in T. erythraeum confers a selective advantage to its host organism by enabling it to maintain large colonies that give it the selective advantage of concentrating its secreted toxin (Cox et al. 2005; Wolk 1973), thus potentially enriching its available food supply.
Early in life’s history, when there were no multicellular organisms, organic energy supplies were likely scarce and diffuse. In this environment, it would likely have been to a single-celled organism’s advantage to diffuse or actively move away from its neighbors to reduce competition for resources. However, if a large, swimming energy source were present, the ability to colonize might prove to be advantageous (Burchard 1981; Martin 2002). A single bacterium attempting to intoxicate and kill an organism in the ocean would have little chance of success. But if colonization were to be made possible by the inclusion of an extracellular matrix protein, and large volumes and high concentrations of toxin could be produced, a larger organism might be killed and used as a food source. This purported selective advantage of colonizing in a community that lacks signaling and motility necessitates a glue to hold the bacterial colony together. This extraordinarily long collagen-like triglycine sequence may have provided this early glue, effectively creating the “earth’s first fishing net.”
While other collagen fragments appear in bacteria such as Bacillus cereus, these are likely used to attach to host extracellular matrix rather than for colonization purposes. Indeed, no other marine species of prokaryote has been shown to colonize to the extensive degree that T. erythraeum does. This cooperative nature of a group of primitive cells may even provide clues as to the origins of multicellular primitive organisms such as the hydra and sponges.
Interestingly, the triglycine repeat motif of collagen shares similar characteristics with two diseases: fragile X syndrome (Fu et al. 1991) and Huntington’s disease (Andrew et al. 1993). Genes responsible for both of these diseases cause the repeat of either a single amino acid or a triad of amino acids. While the molecular machinery enables the “gene stuttering” necessary for producing collagen, the glue of multicellular life, it may also have the detrimental effect of allowing some disease states to persist.
We would like to thank Pia Rossi for her help with environmental scanning electron microscopy, The State of Pennsylvania Department of Health award “Nanotechnology Meets Neuroscience” 4100026196-240418, the Nanotechnology Institute of Philadelphia for provision of the AFM, the Keck Foundation, and the National Science Foundation (Grant BES-0216343) for provision of the ESEM. Equally beneficial were fruitful discussions with David Hulmes, Fredrick Silver, and Donald McEachron.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.