KEGG orthology-based annotation of the predicted proteome of Acropora digitifera: ZoophyteBase - an open access and searchable database of a coral genome

Dunlap, Walter C; Starcevic, Antonio; Baranasic, Damir; Diminic, Janko; Zucko, Jurica; Gacesa, Ranko; van Oppen, Madeleine J H; Hranueli, Daslav; Cullum, John; Long, Paul F

doi:10.1186/1471-2164-14-509

Database
Open access
Published: 26 July 2013

Volume 14, article number 509, (2013)

BMC Genomics


Walter C Dunlap¹,²,
Antonio Starcevic⁴,
Damir Baranasic⁴,
Janko Diminic⁴,
Jurica Zucko⁴,
Ranko Gacesa⁴,
Madeleine J H van Oppen¹,
Daslav Hranueli⁴,
John Cullum⁵ &
…
Paul F Long²,³


Abstract

Background

Contemporary coral reef research has firmly established that a genomic approach is urgently needed to better understand the effects of anthropogenic environmental stress and global climate change on coral holobiont interactions. Here we present KEGG orthology-based annotation of the complete genome sequence of the scleractinian coral Acropora digitifera and provide the first comprehensive view of the genome of a reef-building coral by applying advanced bioinformatics.

Description

Sequences from the KEGG database of protein function were used to construct hidden Markov models. These models were used to search the predicted proteome of A. digitifera to establish complete genomic annotation. The annotated dataset is published in ZoophyteBase, an open access format with different options for searching the data. A particularly useful feature is the ability to use a Google-like search engine that links query words to protein attributes. We present features of the annotation that underpin the molecular structure of key processes of coral physiology that include (1) regulatory proteins of symbiosis, (2) planula and early developmental proteins, (3) neural messengers, receptors and sensory proteins, (4) calcification and Ca²⁺-signalling proteins, (5) plant-derived proteins, (6) proteins of nitrogen metabolism, (7) DNA repair proteins, (8) stress response proteins, (9) antioxidant and redox-protective proteins, (10) proteins of cellular apoptosis, (11) microbial symbioses and pathogenicity proteins, (12) proteins of viral pathogenicity, (13) toxins and venom, (14) proteins of the chemical defensome and (15) coral epigenetics.

Conclusions

We advocate that providing annotation in an open-access searchable database available to the public domain will give an unprecedented foundation to interrogate the fundamental molecular structure and interactions of coral symbiosis and allow critical questions to be addressed at the genomic level based on combined aspects of evolutionary, developmental, metabolic, and environmental perspectives.

Background

All of the reef-building corals (Scleractinia; phylum Cnidaria) that create the vast calcium carbonate deposits of coral reefs have evolved an endosymbiotic partnership with photosynthetic dinoflagellates of the genus Symbiodinium (Dinophyceae), commonly known as zooxanthellae, which reside within the gastrodermal cells of their scleractinian host [1–3]. Coral-algal symbiosis is a cooperative metabolic adaptation necessary for survival in the shallow oligotrophic (nutrient-poor) waters of tropical and subtropical marine environments [4, 5] that drives the productivity of coral reefs [6]. Coral reefs provide habitat and trophic support for many thousands of marine species, the richness of which rivals the biodiversity of tropical rainforests [7]. Underlying the basic requirements of corals for growth, reproduction and survival are special needs to accommodate symbiont-specific host recognition and to control innate and responsive immune systems; what is likely to emerge from future research is the extent to which the host is involved in direct regulation of its endosymbiont populations. Much is understood about the cellular biology of cnidarian-dinoflagellate symbiosis (reviewed in [8]), but less is known at the molecular level of coral symbiology. There is little opposition to the contention that environmental and anthropogenic disturbances are causing alarming losses to coral reefs ([9] and references therein). Threats to productivity are being imposed by the disruption of coral symbiosis (apparent as “coral bleaching”) caused in response to increasing thermal stress attributed to global warming [10, 11], from an increase in stress-related coral disease [12–14], from the discharge of domestic and industrial wastes, pollutants from agricultural development and the transport of sediments in terrestrial runoff [15, 16], and potentially from imminent declines in coral calcification owing to rising ocean acidification [17–19]. Accordingly, we require a better understanding of the molecular stress responses and adaptive potential of corals. Such information is necessary to predict bleaching events and so better inform effective management policies for the conservation of coral reef ecosystems [20–24].

To understand how coral holobionts respond to environmental change at the molecular level, the identification of genes that may respond by transcription to stress is of primary importance [25]. Thus, the use of transcriptomic methodologies to identify stress-responsive genes has been highly successful [26–32]. High-throughput transcriptome profiling has allowed changes in gene expression across thousands of genes to be measured simultaneously. Fuelled by this data-generating power, the number of coral-based studies utilising transcriptomics to investigate molecular responses to environmental stressors has expanded greatly through the acquisition of expressed sequence tag (EST) gene libraries, the fabrication of microarray biochips used to estimate levels of mRNA expression, and direct analysis using next-generation, high-throughput sequencing. However, much of this work has been conducted using the aposymbiotic state of pre-settlement coral larvae, so transcribed genes relevant to metamorphosis and the cytobiology of the adult polyp are limited to a few recent studies [33–36]. In addition, the transcriptome does not provide the structural framework and essential regulatory elements of the functional genome for comprehensive evaluation. Recently, deep metatranscriptomic sequencing of two adult coral holobiomes has been made available on searchable databases: PocilloporaBase for Pocillopora damicornis [36] and PcarnBase for Platygyra carnosus [37]. In contrast, high-throughput metaproteomic analyses to quantify the product yield of stress-response genes of the coral holobiome are yet to be widely adopted by the coral reef scientific community, despite the proteome being the ultimate measure of the coral phenotype [38, 39].

The early accumulation of transcriptomic data revealed that a small proportion of coral ESTs matched genes known previously only from other kingdoms of life, implying that the ancestral animal genome contained many genes traditionally regarded as ‘non-animal’ that have been lost from most animal genomes [40]. Furthermore, an unexpected revelation from EST data is the greater extent to which coral sequences resemble human genes than those of the Drosophila and Caenorhabditis model invertebrate genomes [41, 42]. Comparative genomic analysis has revealed higher genetic divergence and massive gene loss within the ecdysozoan lineages. Hence, many genes assumed to have much later evolutionary origins are likely to have been present in an ancestral or early-diverged metazoan [43]. While much of the animal kingdom remains to be explored, examples of the metazoan phylum Cnidaria provide a unique insight into the deep evolutionary origins of at least some vertebrate gene families [42]. Thus, the complete genomic sequence of a coral is likely to reveal many genes previously assumed to be strictly vertebrate innovations. To date, cnidarian genomes have been published for the sea anemone Nematostella vectensis [42] and the hydroid Hydra magnipapillata [44]. Only the coral genome of Acropora digitifera is available without restriction on use of its published sequence [45], but the compiled sequence has not been fully annotated. At the time of this writing, the genome assembly of Acropora millepora has been released to the public domain [46], also without full annotation, but an embargo is imposed on use of these data that is highly restrictive to the progress of further studies. Understanding how genomic variation affects molecular and organismal biology is the ultimate justification of genome sequencing, and annotation is an essential step in this process. We envisage that unrestricted access to annotation of the A. digitifera genome will provide an unprecedented foundation to freely interrogate the generic molecular structure, possible endobiotic interactions and the response of coral to environmental stress. Accordingly, we offer annotation of the predicted proteome of A. digitifera on the open access and searchable database, ZoophyteBase [47]. Use of the ZoophyteBase search engines will allow genes of encoded proteins to be identified that can be examined in the context of the cellular physiology, processes of ecological significance, the evolutionary and developmental biology of corals and the functional metabolism of the holobiont that collectively underpin the health of coral reefs.

Construction and content

ZoophyteBase is an open access and searchable database of complete annotation of the predicted proteome of the coral A. digitifera [48]. It was constructed using the MEGGASENSE system, which is a general system for constructing annotation databases from different sorts of input data (DNA reads, assembled genomes, predicted proteomes), with the possibility of using different combinations of analysis tools to create the annotation (Gacesa et al, in preparation). In the case of ZoophyteBase, hidden Markov model (HMM) profiles [49] were chosen as the annotation tool rather than the more common BLAST searches [50]. HMM profiles are constructed from multiple alignments of protein families and contain information about conserved differences in amino acid residues as well as deletions and insertions [49]. This is particularly important for a coral database, as corals are evolutionarily distant from most other organisms. Known homologous sequences present in the databases will therefore usually have relatively low similarity to coral sequences, making BLAST searches inaccurate. The statistical information in an HMM profile gives more sensitive and accurate detection of sequence homology. An additional advantage of HMM profiles is that the statistical significance of hits (the expected value) is much more accurate than that calculated by BLAST programs.
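The idea that a profile captures position-specific conservation from a multiple alignment can be illustrated with a deliberately simplified sketch. The code below builds per-column log-odds scores only; it has no insert or delete states and no transition probabilities, so it is a position-specific scoring matrix rather than a true profile HMM, and it is not the MEGGASENSE implementation. The alignment, the uniform background frequency, the pseudocount and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # uniform background: a simplifying assumption


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores (bits) per alignment column."""
    profile = []
    for column in zip(*alignment):
        # Gap characters are ignored; a real profile HMM models them explicitly.
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, sequence):
    """Sum per-column log-odds for an ungapped sequence of profile length."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must equal profile length")
    return sum(col[res] for col, res in zip(profile, sequence))


# Made-up alignment of a hypothetical protein family (not real data).
family = ["GDSGKT", "GDTGKS", "GESGKT", "GDSGKS"]
profile = build_profile(family)

member_like = score(profile, "GDSGKS")  # resembles the family at every column
unrelated = score(profile, "WWWWWW")    # residues never seen in the alignment
```

A sequence that follows the conserved columns scores above zero, while one built from residues absent from the alignment scores below zero, which is the position-specific information that a single pairwise comparison does not carry.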

The quality of sequence annotation is limited by the accuracy of the information in the database used. It is well known that there are many problems with annotation in large uncurated databases such as the NCBI GenBank nr sequences. The KEGG database [51] is widely accepted as the most accurate database for functional annotation. It organises sequences as groups of KEGG orthologues: sets of homologous sequences, drawn from as wide a range of organisms as possible, that share an assigned molecular function. These functions are arranged hierarchically and grouped into biological pathways. The sequences belonging to KEGG orthologues were used to construct HMM profiles for annotating the coral sequences. Accordingly, the 23,524 predicted proteins encoded in the coral genome were analysed using HMM profiles. If a protein showed a highly significant match (“hit”) to a single HMM profile, this was used to create a “trusted” annotation of the sequence. Choosing a cut-off for this criterion is not trivial, because longer sequences tend to have more significant e-values. For construction of ZoophyteBase an e-value cut-off of 1e-5 was used. This resulted in 19,044 predicted proteins giving “trusted” sequence annotation. For many of these proteins there were two or more highly significant hits to established HMM profiles. In these cases, the most significant hit was used to construct our “best-fit” annotation file, but other hits can be viewed by the database user so that expert knowledge can be employed to override the automatic annotation function. In 8,004 of the 19,044 annotated proteins, more than one annotation was assigned on the basis of non-overlapping regions within the protein, and these were used to construct the “best-fit” annotation file. We interpreted these as “fusion” events generated by the in silico protein prediction method used, and these proteins were treated as multiple rather than single encoded proteins. Hence, this analysis resulted in the annotation of 33,195 proteins in total, generated from the original 23,524 predicted coral proteins. This is a very conservative annotation scheme, so it can be assumed that most of the annotations are biologically meaningful. Almost 81% (19,044 out of 23,524) of the predicted proteome was assigned using this method.
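The selection logic described above (significance cut-off, best-fit hit, and splitting on non-overlapping regions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the MEGGASENSE code: the `Hit` record, the `annotate` function, the greedy best-first selection, the use of "less than or equal to" for the cut-off and the placeholder identifiers are all hypothetical. Only the 1e-5 cut-off and the protein counts come from the text.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-5  # cut-off stated in the text


@dataclass(frozen=True)
class Hit:
    protein: str   # predicted protein identifier (placeholder values below)
    ko: str        # KEGG orthologue label (placeholder values below)
    evalue: float
    start: int     # 1-based, inclusive
    end: int


def annotate(hits):
    """Map each protein to its non-overlapping significant hits, best first."""
    by_protein = {}
    for hit in hits:
        if hit.evalue <= E_VALUE_CUTOFF:
            by_protein.setdefault(hit.protein, []).append(hit)

    annotations = {}
    for protein, candidates in by_protein.items():
        chosen = []
        # Greedy choice (an assumption): take the most significant hit first,
        # then add further hits only if they overlap none already chosen.
        for hit in sorted(candidates, key=lambda h: h.evalue):
            if all(hit.end < c.start or hit.start > c.end for c in chosen):
                chosen.append(hit)
        annotations[protein] = chosen
    return annotations


hits = [
    Hit("p1", "KO_A", 1e-40, 1, 200),
    Hit("p1", "KO_B", 1e-12, 150, 300),  # overlaps KO_A: an alternative hit only
    Hit("p1", "KO_C", 1e-20, 320, 500),  # separate region: a second annotation
    Hit("p2", "KO_D", 1e-3, 1, 100),     # above the cut-off: p2 stays unannotated
]
result = annotate(hits)

# Fraction of the predicted proteome with a "trusted" annotation (text values).
trusted_fraction = 19044 / 23524
```

In this toy input, `p1` receives two annotations from non-overlapping regions, which corresponds to the "fusion" case in which one predicted protein is treated as two. The rejected overlapping hit would remain available to the user as an alternative rather than being discarded.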

Utility

The MEGGASENSE system was used to generate a web interface for ZoophyteBase. The home page (Figure 1A) gives access to several functions. A text version of the entire annotation can be downloaded for manual inspection. There is a proteome overview that gives statistics about the database and a breakdown of the annotated functions into different categories of genes. A particularly useful feature of ZoophyteBase is the ability to use text queries with a search engine that returns relevant results even when the key words of a search do not exactly match those used to describe a functional protein. The search engine uses text from the KEGG database, PubMed and other sources to link query words to protein data; it is an intelligent Google-like search engine implemented on the search platform Lucene/Solr [52]. This helps to overcome the common problem that different terminology is used by different groups of researchers. The use of this search function is illustrated by the query “phagocytosis” (Figure 1B). This inquiry finds 42 hits to KEGG orthologue profiles. One of the hits corresponds to amphiphysin (a synaptic vesicle protein) with annotation of two protein homologues encoded in the coral genome. On the data page there is a brief description of the function of amphiphysin together with a PubMed literature reference. The sequences of the predicted coral proteins (Figure 1C) can be retrieved, and it is also possible to analyse such data with computer aided drug design methods [53] to extract conserved domains. There are also two tools for the user to examine matches to protein sequences. The user can carry out a BLAST search against the coral protein sequence or analyse the predicted sequence against the HMM profiles used to annotate the coral proteome. These tools require the user only to paste their query into the sequence window.
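The principle of linking a query word to a protein through its associated text, rather than through its name, can be shown with a toy inverted index. This sketch is not Lucene/Solr and omits ranking, stemming and synonym handling. The function names are hypothetical and the record texts are invented for illustration; only the description of amphiphysin as a synaptic vesicle protein is taken from the text above.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(records):
    """Map every token in a record's descriptive text to the record names."""
    index = defaultdict(set)
    for name, text in records.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def search(index, query):
    """Return the names whose text contains every token of the query."""
    token_sets = [index.get(token, set()) for token in tokenize(query)]
    return sorted(set.intersection(*token_sets)) if token_sets else []


# Invented descriptive text standing in for KEGG and PubMed material.
records = {
    "amphiphysin": "Synaptic vesicle protein; linked literature discusses phagocytosis.",
    "syntaxin": "SNARE protein involved in membrane fusion of vesicles.",
}
index = build_index(records)

by_process = search(index, "phagocytosis")
by_two_words = search(index, "Phagocytosis vesicle")
no_match = search(index, "calcification")
```

Here the query "phagocytosis" retrieves amphiphysin even though the word does not appear in the protein name, because the match is made against the text attached to the record.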

In this manuscript we demonstrate the utility of ZoophyteBase by presenting predicted gene-encoded proteins revealed by annotation of the A. digitifera genome that have physiological, biological and environmental significance. We discuss features of importance in coral physiology: (1) regulatory proteins of symbiosis, (2) planula and early developmental proteins, (3) neural messengers, receptors and sensory proteins, (4) calcification and Ca²⁺-signalling proteins, (5) plant-derived proteins, (6) proteins of nitrogen metabolism, (7) DNA repair proteins, (8) stress response proteins, (9) antioxidant and redox-protective proteins, (10) proteins of cellular apoptosis, (11) microbial symbioses and pathogenicity proteins, (12) proteins of viral pathogenicity, (13) toxins and venom, (14) proteins of the chemical defensome and (15) coral epigenetics.

Discussion

Regulatory proteins of symbiosis

Metabolic cooperation is a key feature of coral-algal symbiosis that allows reef-building corals to inhabit the often nutrient-poor waters of tropical oceans [54]. In this phototrophic symbiosis, fixed carbon produced by resident algae is released to the host for nutrition, and the algal symbionts benefit by acquiring the inorganic nutrient wastes of host metabolism [2, 55]. The symbiotic dinoflagellates reside and proliferate within a specialised phagosome (the symbiosome) maintained within host gastrodermal cells. This arrangement requires complex biochemical coordination by the coral at various metabolic stages: endocytosis (phagocytosis) by post-settlement polyps to acquire algal symbionts, symbiosome recognition to arrest phagosomal maturation for sustained organelle homeostasis, activation of symbiophagy or exocytosis to eliminate damaged symbionts [56, 57], and regulation of apoptotic or exocytotic pathways to remove excess or impaired populations, all of which have long been recognised as essential to preserve the stability of coral symbiosis [58]. Although these processes are poorly understood in corals, it has been realised from studies of the sea anemone Aiptasia pulchella, a related anthozoan also containing Symbiodinium sp. endosymbionts, that the persistence of algal-containing symbiosomes in Cnidaria relies on the exclusion or retention of small Rab GTPase family proteins that are key regulatory components of vesicular trafficking and membrane fusion in eukaryotic cells [59]. Significantly, ApRab3 and ApRab4 accumulate in the biogenesis of maturing symbiosomes of A. pulchella [60, 61], and mature symbiosomes enveloping healthy dinoflagellates have tethered ApRab5 [62], a checkpoint antagonist of downstream ApRab7 and ApRab11 proteins that would otherwise direct autophagy of the symbiont cargo [63, 64].

Our annotation of the A. digitifera genome reveals sequences encoding putative Rab homologues of the Ras superfamily of proteins (Table 1). In a comparison of cnidarian Rab proteins, eight proteins of A. digitifera matched homologues of Aiptasia pulchella, twenty-nine matched proteins encoded by the aposymbiotic freshwater H. magnipapillata and the aposymbiotic anemone N. vectensis genomes, while seven Rab and Rab-interacting proteins of A. digitifera did not match other cnidarian proteins (Table 2). Significantly, the eight homologues of A. digitifera that matched exclusively Rab proteins of A. pulchella included homologues of the aforementioned ApRab3, ApRab4 and ApRab5 proteins attributed to the maintenance of healthy symbiosomes in Aiptasia, while homologues of the autophagic ApRab7 and ApRab11 proteins are found also in N. vectensis. While Rab GTPases and their effector proteins coordinate consecutive stages of endocytic vesicular transport [65, 66], soluble N-ethylmaleimide-sensitive factor attachment receptor (SNARE) proteins are essential for Rab assembly to complete endosomal fusion of vesicle membranes [67], a process by which Rab proteins impart specificity by binding distinct Rab and SNARE partner proteins prior to membrane fusion [68]. Genes encoding syntaxin-like SNARE proteins have been unambiguously identified [69] from coral EST database libraries constructed from expressed mRNA isolated from various early life stages of Acropora aspera, A. millepora, A. palmata and Orbicella faveolata (= Montastraea faveolata), as well as from the genome of the sea anemone N. vectensis [70]. In metazoans, vacuolar r-SNARE receptor proteins comprise the syntaxin, synaptobrevin and VAMP family proteins, of which there are eight syntaxin and syntaxin-binding proteins (plus two plant-like syntaxins). Additionally, there is one t-SNARE target protein to direct vacuolar morphogenesis, two synaptosomal proteins, one synaptosomal complex ZIP1 protein (yeast homologue), one synaptobrevin membrane protein of secretory vesicles, ten vesicle-associated membrane proteins (VAMPs), a vacuolar protein-8 regulator of autophagy, four vacuolar-sorting proteins and two SEC22 vesicle trafficking proteins encoded in the genome of A. digitifera (Table 1), many of which may interact to provide metabolic transport between the endoplasmic reticulum and Golgi apparatus [71]. Included in this vast but as yet unexplored repertoire of vacuolar-acting proteins are the syntaxin-binding amisyn and tomosyn regulators of SNARE complex assembly and disassembly [72, 73], which may control membrane fusion in the phagocytic establishment and dissociation of coral symbiosis.

Table 1 Regulatory proteins of symbiosis in the predicted proteome of A. digitifera

Calcification and Ca²⁺-signalling proteins