Genomic and microarray approaches to coral reef conservation biology
New technologies based on DNA microarrays and comparative genomics hold great promise for providing the background biological information necessary for effective coral reef conservation and management. Microarray analysis has been used in a wide range of applications across the biological sciences, most frequently to examine simultaneous changes in the expression of large numbers of genes in response to experimental manipulation or environmental variation. Other applications of microarray methods include the assessment of divergence in gene sequences between species and the identification of fast-evolving genes. Arrays are presently available for only a limited range of species, but with appropriate controls they can be used for related species, thus avoiding the considerable costs associated with development of a system de novo. Arrays are in use or preparation to study stress responses, early development, and symbiosis in Acropora and Montastraea. Ongoing projects on several corals are making available large numbers of expressed gene sequences, enabling the identification of candidate genes for studies on gamete specificity, allorecognition and symbiont interactions. Over the next few years, microarray and comparative genomic approaches are likely to assume increasingly important and widespread use to study many aspects of the biology of coral reef organisms. Application of these genomic approaches to enhance our understanding of genetic and physiological correlates during stress, environmental disturbance and disease bears direct relevance to the conservation of coral reef ecosystems.
Keywords: Microarray, Genomics, EST, WGS, Acropora, Fast evolving genes
We are entering an era when genomic, or “systems biology”, approaches can be applied to study the biology of coral reef organisms. Genomic approaches differ significantly in scale from other types of molecular analyses; these are large-scale methods that allow the investigation of a substantial fraction of, or sometimes the entire, transcriptome or genome. The scale of systems biology, dealing as it does in thousands or tens of thousands of genes, introduces requirements for automation, the deployment of robotics and computational analyses of large datasets.
One outstanding advantage of genome-scale approaches is that these free us from some of the limitations imposed by candidate gene investigations; one no longer needs to propose simple hypotheses to test. Effectively, hypotheses can be broadened from something like “Are genes encoding homologs of metazoan heat shock proteins induced in corals during a coral bleaching event?” to “What changes in gene expression occur during a coral bleaching event?”. This shift to less constrained hypotheses and the exploratory nature of genomic approaches are likely to be particularly important in the case of corals, given the complex nature of their genomes and the presence of a substantial number of “non-metazoan” genes (Technau et al. 2005).
Sequencing a whole genome is an expensive undertaking (e.g., the human genome project as a whole cost an estimated $US3 billion) and inevitably this leads to questions about the ‘value’ or ‘value for money’ of such undertakings. What can we learn from having available to us one or more genomes from our favourite organisms? Or perhaps the question should be, “What can we learn from a genome above and beyond that which could come from more ‘targeted’ approaches?” In many cases, the more directed approach of sequencing large numbers of ESTs (expressed sequence tags; complete or incomplete sequences of cDNA clones representing expressed genes) is a cost-effective alternative (a “poor man’s genome project”; Rudd 2003). It has often been argued that ESTs provide ‘what you need to know’—information about the ability of an organism to make specific proteins, but one major advantage of genome sequences is that they allow non-coding DNA and regulatory sequence to be investigated. Moreover, evolutionary inferences based on limited numbers of loci can be compromised by population history and linkage effects, whereas whole genome data permit the identification of those areas of the genome that are or have been functionally important. Whole genome sequence data have also become enormously valuable resources for evolutionary biologists as well as those interested in the biology of particular lineages. Comparisons between closely related taxa at the whole genome level are a potentially powerful means of addressing the molecular mechanisms underlying the evolution of novel characteristics. For example, comparative genomic studies of primates permit investigations of the molecular changes associated with the expansion of the cerebral cortex in man (Popesco et al. 2006).
Although it has been predicted that the cost of sequencing individual human genomes may tumble to a few thousand dollars in the near future (e.g., Mardis 2006), at present the expense involved in sequencing entire genomes and printing microarrays of whole transcriptomes puts these technologies beyond the budget of single investigators or even moderate sized teams of investigators; and despite extensive lobbying, there is no immediate prospect of a complete coral genome sequence. However, whole genome sequence data for related animals (see below) can provide powerful comparative data. For example, comparison of existing coral EST data with the complete genome sequence for the sea anemone Nematostella vectensis should permit insights into the molecular bases of coral-specific biological processes, such as the deposition of the aragonite skeleton. EST collections can also be used to provide estimates of divergence rates and for the identification of fast-evolving genes, as outlined below.
Available resources—genome sequence data and EST collections
It is often said that we live in the “post-genomic” era, but the reality is that whole genome sequences are only available for a limited number of animal species, the selection being strongly biased towards vertebrates. Nevertheless, the sequence databases are still enjoying massive expansion, and comparison of coral genes with their homologs in a diversity of well studied species can provide insights into structure/function relationships that would not be available by comparison with other, less studied species. One purpose of this review is to summarise the available resources that are of direct relevance to those of us with interests in corals and other reef organisms.
Whilst no coral genome has been completely sequenced, pilot sequencing was carried out in 2005 on three species: Acropora millepora (25,152 whole genome shotgun reads), Acropora palmata (11,024 whole genome shotgun reads) and Porites lobata (13,824 whole genome shotgun reads). Our best guess as to the size of the A. millepora genome is 200 Mb, but we stress that this is a very rough estimate based on limited data. These pilot data have demonstrated the feasibility of sequencing an entire coral genome and are in the trace archive at NCBI (http://www.ncbi.nlm.nih.gov/Traces/). There are three substantial sets of coral ESTs presently available in GenBank. The species from which these ESTs were made and the numbers of ESTs as of mid-October 2006 are: A. millepora (10,247), A. palmata (4,017) and Montastraea faveolata (2,156) (see Schwarz et al. 2006 for background on the latter two collections). One important difference between these datasets is that the A. millepora ESTs specifically exclude the dinoflagellate symbionts, whereas for the latter two species no attempts were made to resolve the two partners (although it appears that symbiont representation is very low). The EST collections for A. palmata and M. faveolata are likely to be significantly expanded in the near future, with Department of Energy support via the Joint Genome Institute (see http://www.jgi.doe.gov/sequencing/why/CSP2007/coralalgal.html). The A. millepora ESTs that are presently publicly available represent three developmental stages: the pre-gastrulation “prawn chip” (3,285 ESTs), the post-gastrulation motile planula (3,754 ESTs), and the immediate post-settlement “crown” stage (3,208 ESTs) (see Ball et al. 2002 for descriptions of these stages). These ESTs yielded 6,021 predicted genes and 5,063 predicted peptides (Technau et al. 2005, Table S1).
On the basis of these ESTs and a collection of 16,598 ESTs from the sea anemone Nematostella it was estimated that the gene number for Acropora would be 20–25,000 (Technau et al. 2005). Two additional sets of A. millepora ESTs, from bleached adult colonies (4,838 ESTs), and eggs (1,088 ESTs), have been sequenced but are not yet public. The Marine Genomics Project (http://www.marinegenomics.org/) is home for two additional sets of coral ESTs from Montastraea annularis (5,066 ESTs) and Porites porites (247 ESTs). These coral resources are complemented by EST projects for Symbiodinium spp., the dinoflagellate symbionts of corals. Strains from A. palmata and M. faveolata are likely to be extensively represented in the near future (http://www.jgi.doe.gov/sequencing/why/CSP2007/coralalgal.html), and 5,161 ESTs from the clade C3 Symbiodinium (from Acropora aspera) have been sequenced (W Leggat, D Yellowlees personal communication) but are not yet public.
Extensive genomic and EST datasets are available for two other cnidarians, and these are enormously useful comparative resources for coral biology. The sea anemone, Nematostella vectensis, was selected as the first representative cnidarian for whole genome sequencing (WGS); the genome has been sequenced to >7.8-fold coverage and assembly version 1 is available via the JGI Nematostella pages (http://genome.jgi-psf.org/Nemve1/Nemve1.home.html). Sequence comparison algorithms such as BLAST, the Basic Local Alignment Search Tool (Altschul et al. 1997), which compares a given nucleotide or peptide sequence to appropriate databases, can be used to search both the genome and the large predicted protein and EST collections for Nematostella via the Stellabase site (www.stellabase.org). Somewhat more distantly related to corals, but also a useful comparator for many purposes, Hydra magnipapillata was the second cnidarian targeted for WGS; its genome is much larger than that of Nematostella, and has so far been sequenced to approximately 6-fold coverage (R. Steele personal communication). The J. Craig Venter Institute is conducting the genome sequencing, and hosts the latest data summaries on its website (https://research.venterinstitute.org). More than 150,000 ESTs have been sequenced for this species (H. Bode personal communication), and are searchable via Hydrabase (www.hydrabase.org).
Although there are currently no whole genome sequencing projects underway for coral reef fishes, other teleosts have been completely sequenced, e.g., the zebrafish Danio rerio, the pufferfishes Takifugu rubripes and Tetraodon nigroviridis, the medaka Oryzias latipes and the stickleback Gasterosteus aculeatus. These genome sequence data can be accessed via the Ensembl Genome Browser at http://www.ensembl.org/index.html. More detailed information regarding individual fish genomes can be obtained from the respective research communities, for example, the Zebrafish Information Network at http://zfin.org/. There are also significant sequencing efforts for Atlantic salmon Salmo salar, rainbow trout Oncorhynchus mykiss, and some cichlid species, e.g., Oreochromis niloticus. These teleost sequences and ESTs can be accessed via databases at NCBI (http://www.ncbi.nlm.nih.gov/) and provide useful resources for comparative genomic studies. Researchers interested in coral reef fish biology thus have a wealth of genomic resources available for comparative studies.
Next we turn to a technique that allows the visualisation of the simultaneous expression of large numbers of genes, the microarray. There are a number of excellent sources of information on microarrays. Among the most comprehensive are Bowtell and Sambrook (2003) and Kimmel and Oliver (2006a, b). Here we provide a brief description of the principles underlying microarray analysis and some considerations for planning a microarray experiment.
For non-commercial marine organisms, at least for the near future, most microarrays will consist of an array of spots of complementary DNA, made on an RNA template originating from the expressed genes, on a glass slide. The identity of the cDNAs spotted onto the array may be known or unknown and they may be redundant to varying degrees. Therefore the first step in the development of a new cDNA microarray is the isolation and cloning of a cDNA library, which can then be PCR amplified and spotted onto microarray slides.
Microarrays offer the possibility of identifying genes not previously known to be involved in a process, e.g., the response to a potential stressor such as heat. Previously unknown interactions between genes or gene products may also become apparent. Genes which share an expression pattern are frequently co-regulated, with the implication that they are operating in the same molecular pathway, leading to a co-ordinated response. This has been shown to be the case for genes involved in several developmental and metabolic pathways such as cell cycle genes (Cho et al. 1998; Iyer et al. 1999) and yeast mitochondrial genes (Eisen et al. 1998). There are various software packages available for taking the extremely large amount of data generated by a microarray experiment, and integrating it into a more easily interpretable representation. For example Cluster 3.0 (Eisen et al. 1998) allows hierarchical clustering of gene expression profiles and is available as free open source software.
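The idea that genes sharing an expression pattern can be grouped together may be illustrated with a minimal sketch. The code below is not the algorithm implemented in Cluster 3.0; it is a simple single-linkage grouping based on Pearson correlation, and the gene names, the log-ratio profiles and the correlation threshold of 0.9 are all hypothetical values chosen for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no information about co-expression
    return cov / (sx * sy)

def coexpression_groups(profiles, min_r=0.9):
    """Single-linkage grouping: two groups merge if any pair of genes has r >= min_r."""
    names = list(profiles)
    group_of = {g: {g} for g in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if group_of[a] is group_of[b]:
                continue  # already in the same group
            if pearson(profiles[a], profiles[b]) >= min_r:
                merged = group_of[a] | group_of[b]
                for g in merged:
                    group_of[g] = merged
    unique = {id(s): s for s in group_of.values()}
    return sorted(sorted(s) for s in unique.values())

# Hypothetical log-ratio profiles over four time points (illustrative values only)
profiles = {
    "geneA": [0.1, 1.0, 2.1, 3.0],
    "geneB": [0.0, 0.9, 2.0, 3.2],
    "geneC": [2.0, 1.0, 0.2, -1.0],
    "geneD": [1.9, 1.1, 0.0, -0.8],
}
groups = coexpression_groups(profiles)
print(groups)
```

With these illustrative profiles, the two induced genes fall into one group and the two repressed genes into another. A real analysis would use a dedicated package and would need to consider the choice of distance measure and linkage method.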
The design of a microarray experiment should be carefully considered in order to do the most informative experiment possible with the (sometimes limited) amount of biological material available. This involves determining how many biological replicates and technical replicates are needed, which samples should be labeled with which fluorophore and which samples should be directly compared on the same slide (Churchill 2002; Yang and Speed 2002; Bowtell and Sambrook 2003). This is particularly pertinent when doing microarrays with material from small animals, embryonic stages or larvae, as material from which to extract RNA can be in short supply and sometimes yields insufficient RNA to do an experiment without amplification. One solution to this problem is the use of commercially available RNA amplification kits. These kits synthesise large numbers of transcripts in a linear, rather than an exponential amplification, and ideally the transcripts will be represented in the same relative amounts as they were in the original sample. This may not always be the case with PCR (exponentially) amplified samples. The potential problem that some transcripts may be preferentially transcribed, skewing their relative abundance, remains, but comparisons between microarray experiments done with amplified and unamplified mRNA suggest that amplified mRNA can give accurate results (Stoyanova et al. 2004; Zhu et al. 2006). Adult hard coral material poses less of a problem in terms of quantities available than embryonic material, but the relatively small amount of living tissue per sample means that even large-scale extractions can sometimes yield little RNA. The situation is further complicated by the presence of dinoflagellate symbionts (zooxanthellae) in at least some life stages of most species. 
As more sequence data for both corals and symbionts become available this will be less of a consideration, but at present knowing the source of the genes that you are probing is of critical importance for interpretation of results.
It is particularly important to involve a statistician at the planning and experimental design stage of microarray experiments and to understand possible sources of variability. Microarray experiments are expensive and typically substantial time is spent performing and analysing the experiment, so it is very important to design well. Sometimes the statistically “best” experiment may be either practically or financially impossible, and then it becomes a matter of doing the best one can with the available time and resources. The topic of experimental design is too broad to be covered here, but excellent discussions can be found in Yang and Speed (2002, 2003).
The biological material used for microarray analysis also warrants some consideration. The most reliable results will be obtained using “fresh” tissue samples that have been transferred into liquid nitrogen immediately upon collection. Where this is impossible, RNAlater and similar products may be suitable. RNAlater has been used to successfully preserve total RNA from A. millepora embryos. However, problems may arise in using RNAlater to preserve adult coral tissue because of the relatively impermeable nature of the coral skeleton, although this has not been systematically tested. Direct comparisons of yields from different RNA preservation methods do not appear to be available. Once RNA is extracted from the sample it is used as a template for reverse transcription into cDNA. During the reverse transcription reaction the sample is fluorescently labelled with either red or green fluorescent dye. Fluorescently labelled cDNA samples are then competitively hybridised to microarray slides. For each gene represented on the microarray, the relative transcript abundance in both samples is inferred from the red and green signal intensities of hybridised transcripts. The brightness of all of the spots is measured by scanning the slides with lasers. A number of different types of scanners and software packages are available for converting the raw signal intensity data into relative transcript abundance. The statistical analysis of extracted cDNA microarray data includes three main steps: normalisation, transformation and testing for differential expression. Various statistical techniques for analysing microarrays are represented by an abundance of software packages available for purchase or for free on the web (www.bioconductor.org). A number of reviews and books have been published on the various packages available for microarray analysis (Kimmel and Oliver 2006b; Gentleman et al. 2005). However, the statistical analysis of microarray data is an active area of research. Hence it is prudent to employ the help of a statistician (Allison et al. 2006).
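The three analysis steps named above (normalisation, transformation and testing for differential expression) can be sketched in a few lines for a two-colour experiment. This is a deliberately simplified illustration and not a substitute for the dedicated packages cited: it assumes a log2 transformation of red/green ratios, a global median-centring normalisation, and a one-sample t statistic per gene across replicate slides. The intensities are invented, and in practice intensity-dependent normalisation and moderated test statistics are usually preferred.

```python
from math import log2, sqrt
from statistics import mean, median, stdev

def log_ratios(red, green):
    """Transformation: per-spot log2(red / green) for one slide."""
    return [log2(r / g) for r, g in zip(red, green)]

def median_centre(m_values):
    """Global normalisation: shift a slide so that its median log ratio is 0."""
    med = median(m_values)
    return [m - med for m in m_values]

def one_sample_t(values):
    """t statistic for mean log ratio != 0; None if the replicates do not vary."""
    sd = stdev(values)
    if sd == 0:
        return None  # statistic undefined without replicate-to-replicate variation
    return mean(values) / (sd / sqrt(len(values)))

# Hypothetical red/green intensities for 5 spots on 3 replicate slides
slides = [
    {"red": [800, 200, 100, 400, 50], "green": [100, 100, 50, 200, 25]},
    {"red": [200, 100, 200, 50, 100], "green": [100, 100, 100, 50, 200]},
    {"red": [3200, 400, 100, 200, 400], "green": [100, 100, 50, 50, 100]},
]

# Transform, then normalise, each slide separately
normalised = [median_centre(log_ratios(s["red"], s["green"])) for s in slides]

# Test each gene (spot) across the replicate slides
t_stats = [one_sample_t([slide[i] for slide in normalised]) for i in range(5)]
print(t_stats)
```

Median centring removes a slide-wide dye bias but assumes that most genes are not differentially expressed. With only three replicates the t statistics are unstable, which is one reason why the number of replicates should be settled at the design stage.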
Once statistical support for differential expression of a subset of genes has been achieved, any further analysis depends on how much information is available for these. Where the identity of the cDNA spots represented on the microarray is known, hierarchical clustering of gene responses (mentioned above) can identify the types of genes that are regulated in response to a certain treatment. The more that is known about the cDNA spots on the array, the more accurate and complete will be the testing of hypotheses and drawing of conclusions concerning possible inter-relationships between the genes represented on the microarray.
Model organisms and beyond—strategies for applying microarray technology to non-model species
Developing a microarray for a new species is usually time-consuming and expensive. It may be possible to avoid this process by using microarrays developed for a closely related species in heterologous array experiments. Human cDNA microarrays, for example, have been used to study bovine, pig and salmon gene regulation (Medhora et al. 2002; Tsoi et al. 2003; Adjaye et al. 2004). Oligonucleotide microarrays can also be successfully applied in heterologous microarray experiments (Ji et al. 2004; Kassahn et al. in press). Oligonucleotide microarrays may consist of either short (25 bp, e.g. Affymetrix GeneChip® arrays) or long (50–65 bp) oligomers deposited on the microarray. The success of such experiments depends on the ability of the heterologous DNA species to bind to the microarray and the level of sequence similarity between the two taxa. In general, microarray analysis is robust to some level of sequence divergence and genes with less than 25% sequence divergence have been shown to produce significant cross-hybridisation on long (50mer) oligonucleotide microarrays (Kane et al. 2000). With increasing sequence divergence, the ability to accurately measure gene regulation decreases due to poor cross-hybridisation (Renn et al. 2004). However, the level of sequence divergence differs across genes with some genes diverging more rapidly than others (Makalowski et al. 1996). Therefore, heterologous microarrays may be more successfully employed to measure gene expression responses at conserved gene loci, while genes that are rapidly diverging may fail to cross-hybridise. Furthermore, an important distinction has to be made between the use of heterologous microarray experiments to measure changes in gene expression within a species, e.g., comparison of two different treatment types within the same species, and the use of a single microarray to compare expression levels between two species. 
In the latter case, true biological differences in expression level between two species may be confounded with poor cross-hybridisation due to sequence divergence between the two species (Gilad et al. 2005). In contrast, when using heterologous microarrays for within-species comparisons, the potential effects of sequence divergence on cross-hybridisation are accounted for as they affect both treatment types similarly.
Inter-species genomic hybridisation experiments can be used to identify which genes are sufficiently conserved to significantly cross-hybridise in heterologous microarray experiments (Kassahn et al. in press). For this purpose, genomic DNA is extracted from the species of interest and the species for which the microarray was designed and both gDNA samples are competitively hybridised to the microarray. Genes which share significant sequence similarity between the two species will produce equivalent signal intensities, while genes with significant sequence divergence will show reduced signal intensities in the heterologous species. In general, cross-hybridisation on microarrays is positively correlated with sequence similarity (Wu et al. 2001; Hinchliffe et al. 2003; Brunelle et al. 2004). Hence, relative signal intensities can be used to infer the level of sequence similarity for each gene on the microarray and several approaches for analysis are available (Kim et al. 2002; Le Quere et al. 2006; Kassahn et al. in press). Using this technique it is possible to eliminate from consideration those genes that may not cross-hybridise in heterologous experiments. Differences in gene copy number can also potentially confound the results of such experiments (Pollack et al. 1999; Hinchliffe et al. 2003), but with increasing phylogenetic distance such differences will play a minor role (Brunelle et al. 2004). Researchers using cDNA microarrays may also need to account for differences arising from the different structure of genomic DNA and cDNA. Because genomic DNA includes introns and non-coding regions, spots on the microarray with sequences spanning an exon-intron boundary in the genomic DNA sample may not bind genomic DNA, but would successfully bind cDNA during gene expression analysis. 
In these instances, fluorescently labelled cDNA derived from the species of interest and the species for which the microarray was designed may be competitively hybridised to the cDNA microarray and provide information about the potential for cross-hybridisation (Renn et al. 2004). However, true biological differences in gene expression, gene copy number differences, and gene dosage effects can confound the results of competitive cDNA hybridisation experiments across species. When using cDNA microarrays, the best strategy for selecting those genes that can be successfully measured across species would thus consist of a combination of comparative genomic DNA and cDNA hybridisations.
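As a minimal sketch of how relative signal intensities from a competitive genomic DNA hybridisation might be used to select spots for a heterologous experiment, the following code keeps only those spots at which the heterologous species retains a sufficient fraction of the reference signal. The spot identifiers, the intensities and the cut-off (a log2 ratio of -1, i.e., at least half the reference signal) are hypothetical, and the signals are assumed to have been normalised already. The published analysis approaches cited above are more elaborate.

```python
from math import log2

def cross_hybridising_spots(reference_signal, heterologous_signal, min_log_ratio=-1.0):
    """Return spot IDs at which the heterologous gDNA signal is retained.

    A spot is kept when log2(heterologous / reference) >= min_log_ratio.
    The default threshold is arbitrary and for illustration only.
    """
    kept = []
    for spot, ref in reference_signal.items():
        het = heterologous_signal[spot]
        if ref > 0 and het > 0 and log2(het / ref) >= min_log_ratio:
            kept.append(spot)
    return kept

# Hypothetical normalised intensities from one competitive gDNA hybridisation
reference = {"spot1": 1000, "spot2": 800, "spot3": 1200, "spot4": 500}
heterologous = {"spot1": 900, "spot2": 150, "spot3": 700, "spot4": 100}

usable = cross_hybridising_spots(reference, heterologous)
print(usable)
```

Spots that fail the filter are candidates for exclusion from expression analysis, although, as noted above, copy number differences and exon-intron structure can also reduce the genomic DNA signal.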
Because of their high throughput and ability to screen thousands of gene transcripts at once, microarrays are an attractive tool for environmental genomic studies. Here, we have discussed strategies that can be applied in situations where there are no commercial microarrays available. In such cases, researchers can either develop a custom cDNA microarray or use a microarray developed for a close relative. We suggest that inter-species hybridisation experiments using genomic DNA and cDNA can be used to identify which genes show loss of signal in heterologous microarray experiments. Such experiments can thus guide the choice of which heterologous microarray is most suitable for a particular application. Heterologous microarray experiments are particularly suitable for gene discovery studies where the aim is to identify a small number of candidate genes with the most interesting gene regulation. In this context, loss of signal for some genes on the microarray would not compromise the aims of the study. Once a selection of candidate genes has been identified by means of heterologous microarray experiments, differential expression can be validated using an alternative method, such as a northern or virtual northern blot or quantitative real-time PCR.
The significance of fast evolving genes
One of the major goals of evolutionary biology is to understand the processes underlying the evolution of species and populations, and these have direct implications for conservation genetics. Genes with high rates of evolutionary change have been implicated in the process of speciation and species diversification (Turner et al. 2005; Harr 2006). Furthermore, high rates of evolutionary change may indicate strong directional selection and molecular adaptation, especially where gene function can be directly related to the ecological context of the species (Tautz and Schmid 1998; Lecompte et al. 2001; Matzkin 2005; Fairhead and Dujon 2006). The study of fast evolving genes may therefore allow the identification of unique adaptations of the organism to its ecological niche, and may help to pinpoint sources of stress when the ecosystem is perturbed.
Fast evolving genes are typically found in systems where equilibrium is never reached, such as the arms race between host and pathogen. In a wide variety of animals, genes in this category include those involved in immunity, in reproduction (especially in sperm competition and species recognition), and in various environmental interactions. In addition, there are cases of individual genes and classes of genes that show positive selection in specific lineages (e.g., Castillo-Davis et al. 2004; Yu et al. in press), and this is often associated with lineage-specific expansions of gene families. For example, many members of the DUF1220 gene family show hallmarks of positive selection in primates (Popesco et al. 2006); these genes have not been functionally characterised, but are predominantly expressed in brain regions associated with higher cognitive functions.
Amongst the numerous cases of rapidly evolving reproductive proteins (see Swanson and Vacquier 2002 for a review), one of the best-characterised examples is the ‘co-evolutionary chase’ between abalone sperm lysin and the vitelline envelope lysin receptor VERL. External reproduction via broadcast gametes is common amongst marine animals, necessitating gamete-specificity systems, and such systems appear to have evolved independently in various animal lineages. Abalone lysin and VERL appear to be a mollusc-specific solution to this problem. Lysin is a small (16 kD) and highly basic protein that is a major constituent of the abalone acrosomal vesicle. Lysin solubilizes the vitelline envelope on the outside of abalone eggs by non-enzymatic means, but its ability to do so is species-specific; lysin effectively creates a hole in the vitelline envelope that allows the tip of the sperm acrosomal process to fuse with the plasma membrane of the egg (reviewed in Kresge et al. 2001). The lysin protein is clearly under positive selection, and shows hallmarks of this process throughout the entire sequence. VERL, however, has a more complex evolutionary history; it has a repeat structure that evolves by concerted evolution in a neutral manner (Swanson et al. 2001), whereas the N-terminal end of the molecule is under positive selection (Galindo et al. 2003). Other animal groups appear to have solved the gamete-specificity problem in similar ways but using unrelated proteins; for example, the bindin protein and its receptor fulfil analogous functions in the sea urchin (Gao et al. 1986). For this reason it is unlikely that these specificity-conferring proteins have strict homologs in other animals.
Many genes with roles in immunity and allorecognition are also known to be subject to positive selection, the best-characterised system being the vertebrate major histocompatibility complex (MHC). Genes encoding other components of the immune repertoire, such as antimicrobial peptides, are also under moderate positive selection (Tennessen 2005). In populations of estuarine fish that experience high levels of pollution, genes of the major histocompatibility complex are under strong positive selection, significantly stronger than in populations living in less disturbed habitats (Cohen 2002). Olfactory and gustatory receptors are families of genes closely involved in the interaction with the environment. In various species, they have been found to evolve under positive selection (Thomas et al. 2005; Shi and Zhang 2006). Adaptation to physical environmental factors, such as cold temperatures or light, can also leave its signature in the genome. For instance, antifreeze proteins have been shown to evolve by positive selection in fish and in beetles (Swanson and Aquadro 2002), and the adaptive evolution observed in the opsin gene of cichlid species has been hypothesised to be an adaptation to different photic environments (Spady et al. 2005). Finally, positive selection has been reported in genes acquiring new functions, such as the sodium channel gene of electric organs in electric fishes (Zakon et al. 2006).
Bioinformatic and microarray approaches to the identification of fast evolving genes
Most examples of genes under positive selection have been identified by selecting candidate loci a priori, but the availability of whole genome sequences and large EST collections enables unbiased approaches which are potentially much more informative. With the availability of a substantial number of large datasets for primates, this is a particularly active area, but similar approaches are feasible for corals, fish and other reef animals.
Adaptive evolution can be detected amongst either paralogous or orthologous genes when there is a selection pressure (positive selection) for the diversification of the amino acid sequences of the proteins for which they code. Most methods aimed at detecting positive selection amongst paralogs or orthologs are based on the comparison of the number of synonymous (non-amino acid changing) and non-synonymous (amino acid changing) mutations. When a sequence has an excess of non-synonymous mutations, it may be under positive selection pressure. A number of programs are available that enable the detection of positive selection in datasets of orthologs or paralogs; for reviews, see Yang and Bielawski (2000), Suzuki and Gojobori (2003) and Nielsen (2005). Several of these programs also allow the identification of individual sites under positive selection (e.g., Suzuki et al. 2001; Bielawski and Yang 2004; Massingham and Goldman 2005), which may provide insights into the relationship between protein structure and function. Positive selection can be detected at the level of a whole gene family, but methods also exist to detect subfamilies that are under different selection pressures (Yang and Nielsen 2002; Zhang et al. 2005). However, as a caveat, considerable debate has occurred over the power and robustness of these tests for positive selection (e.g., Nei 2005; Nunney and Schuenzel 2006); great care should therefore be taken in the choice of test parameters, and a combination of tests is commonly used.
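The distinction between synonymous and non-synonymous changes can be made concrete with a short sketch that classifies codon differences between two aligned coding sequences. This is only the counting step, under simplifying assumptions: the standard genetic code is used, the sequences are in frame and gap-free, and codons that differ at more than one position are set aside rather than resolved. The programs reviewed above additionally normalise by the numbers of synonymous and non-synonymous sites and correct for multiple substitutions. The two sequences are invented.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def count_substitutions(seq1, seq2):
    """Count codon differences between two aligned, in-frame coding sequences.

    Returns (synonymous, non_synonymous, skipped), where 'skipped' counts
    codons that differ at more than one position and are not classified.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 in length")
    syn = nonsyn = skipped = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        n_diff = sum(a != b for a, b in zip(c1, c2))
        if n_diff == 0:
            continue
        if n_diff > 1:
            skipped += 1  # ambiguous mutational pathway; ignored in this sketch
        elif CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn, skipped

# Hypothetical orthologous fragments (M A K G L versus M A R G L)
seq_a = "ATGGCTAAAGGTCTG"
seq_b = "ATGGCCAGAGGTCTA"
counts = count_substitutions(seq_a, seq_b)
print(counts)
```

Raw counts such as these cannot be compared directly, because most random changes in a coding sequence are non-synonymous; this is why published methods express the counts per synonymous and per non-synonymous site before forming a ratio.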
When data are available on gene polymorphism within a species, population genetic techniques can be used for the detection of positively selected mutations. A large literature exists on these techniques, and an excellent review of the subject can be found in Nielsen (2005). Tests relying on population genetics data alone can be highly sensitive to assumptions about demographic factors, but combining comparative data with population genetic data can result in more robust tests, such as the McDonald-Kreitman test (McDonald and Kreitman 1991) or tests based on the ratio of non-synonymous to synonymous mutations (see above).
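As an illustration of how comparative and population genetic data are combined, the McDonald-Kreitman test can be sketched as a 2 x 2 contingency table of non-synonymous and synonymous changes, split into differences fixed between species and polymorphisms within a species. The sketch below is written in Python using only the standard library; the counts are hypothetical, and the function names and the use of a two-sided Fisher's exact test are choices made for this example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def mcdonald_kreitman(dn, ds, pn, ps):
    """Return the neutrality index and Fisher p-value.

    dn, ds: fixed non-synonymous / synonymous differences between species.
    pn, ps: non-synonymous / synonymous polymorphisms within a species.
    """
    neutrality_index = (pn / ps) / (dn / ds)
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    return neutrality_index, p_value


# Hypothetical counts for a single gene, for illustration only.
ni, p = mcdonald_kreitman(dn=12, ds=20, pn=3, ps=30)
print(f"neutrality index = {ni:.3f}, p = {p:.4f}")
```

A neutrality index below 1 indicates an excess of fixed non-synonymous differences relative to polymorphism, which is the pattern expected under positive selection; values above 1 point instead to an excess of non-synonymous polymorphism.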
In cases where direct bioinformatic approaches are not possible, microarray-based methods provide an alternative approach for the identification of fast evolving genes. This approach may actually be more useful for the many marine organisms for which few sequence data are available. This application of microarray technology follows the same principle as the inter-species genomic DNA hybridisation experiments outlined above: genomic DNA samples from two species are competitively hybridised to the microarray. Genes that have significantly diverged between the two species and which show accelerated rates of sequence evolution can be identified on the basis of differences in hybridisation signal between the two species (Kim et al. 2002; Le Quere et al. 2006). Thus far, this application of microarray technology has mainly been used to study the comparative genomics of microbes (Murray et al. 2001). However, microarray approaches for the identification of fast evolving genes are feasible for any organism and would be particularly useful for non-model species that lack the genome sequence data needed for bioinformatic approaches.
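The logic of such a competitive genomic hybridisation can be sketched in a few lines of Python. The example assumes that background-corrected signal intensities are already available for each array feature; the gene names, intensities, median centring and cutoff are all hypothetical choices for illustration rather than a published protocol.

```python
from math import log2
from statistics import median


def flag_diverged_genes(reference_signal, test_signal, cutoff=-1.0):
    """Return {gene: centred log2 ratio} for genes at or below the cutoff.

    reference_signal: intensities for the species the array was built from.
    test_signal: intensities for the heterologous species.
    """
    ratios = {}
    for gene, ref in reference_signal.items():
        test = test_signal.get(gene)
        if test is None or ref <= 0 or test <= 0:
            continue  # skip missing or unusable features
        ratios[gene] = log2(test / ref)
    # Centre on the median so that overall divergence between the two
    # genomes does not dominate the gene-by-gene comparison.
    centre = median(ratios.values())
    return {gene: r - centre for gene, r in ratios.items()
            if r - centre <= cutoff}


# Hypothetical intensities for five array features.
reference = {"gene_a": 1000, "gene_b": 1200, "gene_c": 900,
             "gene_d": 1100, "gene_e": 1000}
heterologous = {"gene_a": 800, "gene_b": 950, "gene_c": 200,
                "gene_d": 900, "gene_e": 780}

candidates = flag_diverged_genes(reference, heterologous)
print(sorted(candidates))
```

Features flagged in this way are only candidates: a weak heterologous signal can also reflect gene loss or copy number differences, so follow-up sequencing would be needed to confirm accelerated sequence evolution.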
Finally, it is worth noting that just as mutation of the coding sequence can be of adaptive significance, so can changes in expression profile. For example, northern and southern populations of the killifish Fundulus heteroclitus differ in the expression level of lactate dehydrogenase-B, resulting in important differences in physiological function between these populations (Powers and Schulte 1998). Microarrays have great potential in detecting the evolution of gene expression (Gilad et al. 2006).
Application to reef cnidarians
The application of systems biology approaches to study the biology of reef organisms is still in its infancy, but microarray methods have been applied to analyse stress responses in corals. For example, Edge et al. (2005) used a small array containing 32 genes from M. faveolata to examine the effects of increased temperature, salinity and UV light and found differential gene expression patterns for each condition. Studies are presently under way to document changes in gene expression in Acropora millepora associated with normal development (Grasso et al. unpublished) as well as in response to heat (Seneca et al. unpublished) or sediment (Klueter et al. unpublished) stress using A. millepora microarrays containing 13,000 or 17,000 ESTs. In addition, microarray studies are beginning on M. faveolata and A. palmata in order to study changes in gene expression associated with the initiation, maintenance and breakdown of the symbiosis of these corals with their zooxanthellate symbionts (Schwarz et al. 2006).
The next few years are likely to see much broader application of genomic and microarray approaches to study coral reef biology, and these approaches should considerably advance our understanding of how reefs function at levels from the individual to the ecosystem. Studies of the number of genes in individual gene families, using a whole genome or large EST libraries, have already revealed that cnidarian genomes encode a rich assortment of genes (Kortschak et al. 2003; Technau et al. 2005). The availability of the Nematostella and Hydra whole genome sequences will enable the systematic investigation of gene family expansions and reductions in these species. It is also anticipated that lineage-specific genes will be discovered at many levels within the Cnidaria, and that their identification will shed light on the genomic bases of the unique biology of these animals. Note that, while whole genome comparisons are necessary to be confident about the extent of reduction of a gene family, EST data may be sufficient to demonstrate significant expansions. Similarly, selection pressures acting on a group of paralogs can be estimated from EST data alone. The study of orthologs may, however, require whole genome sequences to reduce the possibility of confusion arising from paralogous relationships.
The availability of EST collections for the congeneric coral species A. palmata and A. millepora offers the opportunity to compare these data and identify individual genes under positive selection pressure. An unbiased application of this approach holds great promise for coral biology at every level, as it can deliver candidate genes for roles not only in generic processes such as gamete specificity and allorecognition, but also for coral-specific processes such as symbiont recognition and determination of colony morphology. Although significant progress can be made using existing EST resources, for comprehensive analyses a full genome sequence for a coral is an urgent requirement. In the face of ever-increasing anthropogenic pressures, systems biology approaches are likely to play increasingly important roles in coral reef management and conservation.
Relevance to reef conservation
Coral bleaching is a stress response with many possible causes including extremes of heat or cold, high irradiance, prolonged low light levels or darkness, exposure to heavy metals, herbicides, and pathogenic microbes (reviewed by Douglas 2003). At present, in the absence of an obvious cause, there is no way for marine conservation authorities to determine the cause and take remedial action. However, it seems quite likely that different molecules are upregulated in response to different stresses. Microarray studies are the most effective way of establishing this and identifying the relevant molecules for each stress. Once this has been accomplished, it is quite likely that simple and cheap diagnostic tests can be devised to allow marine managers to identify unknown stressors. Alterations in gene expression are likely to be part of the early response to stress and thus hold great promise as a sensitive tool to detect organismal stress. Similarly, in the case of disease, it may be possible to identify specific molecules that are upregulated in different diseases and use this information either in conjunction with or in place of culture techniques to identify the causative organism.
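The diagnostic idea can be made concrete with a small Python sketch. It assumes, purely for illustration, that earlier microarray studies have yielded a set of marker genes upregulated under each stressor; a sample from a bleached colony is then scored by the mean log2 fold change of each marker set. All marker names, stressor labels and expression values below are hypothetical.

```python
def score_stressors(log_fold_changes, marker_sets):
    """Mean log2 fold change of each stressor's marker genes in one sample."""
    scores = {}
    for stressor, genes in marker_sets.items():
        values = [log_fold_changes[g] for g in genes if g in log_fold_changes]
        if values:
            scores[stressor] = sum(values) / len(values)
    return scores


def most_likely_stressor(log_fold_changes, marker_sets):
    """Return the stressor whose markers are most strongly upregulated."""
    scores = score_stressors(log_fold_changes, marker_sets)
    return max(scores, key=scores.get)


# Hypothetical marker genes for three stressors.
markers = {
    "heat": ["marker_h1", "marker_h2"],
    "heavy_metal": ["marker_m1", "marker_m2"],
    "herbicide": ["marker_p1", "marker_p2"],
}

# Hypothetical log2 fold changes (stressed colony versus healthy control).
sample = {
    "marker_h1": 2.1, "marker_h2": 1.8,
    "marker_m1": 0.2, "marker_m2": -0.1,
    "marker_p1": 0.0, "marker_p2": 0.3,
}

print(most_likely_stressor(sample, markers))
```

In practice a handful of validated markers of this kind could be assayed with cheaper methods than a full array, which is the route to the simple diagnostic tests envisaged above.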
Another way in which microarrays may be useful is in aiding our understanding of the settlement and metamorphosis of coral planulae, a critical process in reef recolonisation. At present we know that extracts of coralline algae (Heyward and Negri 1999; Morse et al. 1988), a GLWamide neuropeptide (Iwao et al. 2002) and high concentrations of various ions such as lithium (Mueller and Leitz 2002) are effective in inducing settlement. However, it might be possible to devise more effective stimuli if we understood the stimulus-response pathways involved in this process at a biochemical level. By sampling larvae at various times after induction with the above substances, it may be possible to work out the molecules involved in the transduction-response pathway by looking for upregulated genes.
Once this process is understood it should be possible to devise more directed ways of manipulating the system. This would open the way to artificially recolonising areas devastated by storms or anthropogenic disturbances, by moving larvae into them and inducing settlement. Because not all coral species respond to the same settlement stimuli (Morse et al. 1988), it may also be possible to settle larvae of particular species selectively in the areas where they will grow best.
Although direct applications of ESTs and microarrays in conservation are still fairly limited, their applicability will grow with time. To quote Feder and Mitchell-Olds (2003) “genomics has enabled EEFG (evolutionary and ecological functional genomics), rather than initiating or shifting its paradigm, and this is essential for any field that attempts the challenging task of integrating genes, function, ecology and evolution in its research programmes.” Once genomics has permeated these areas, on which conservation policies are based, its use will almost certainly flow on to managers in the field.
Acknowledgments
We gratefully acknowledge the contributions of various members of our laboratories and external collaborators, and the support of the Australian Research Council (ARC) both directly to DJM and EEB (Grants A00105431, DP0209460 and DP0344483) and via the Centre for the Molecular Genetics of Development and the Centre of Excellence for Coral Reef Studies.
References
- Ball EE, Hayward DC, Reece-Hoyes JS, Hislop NR, Samuel G, Saint R, Harrison PL, Miller DJ (2002) Coral development: from classical embryology to molecular control. Int J Dev Biol 46:671–678
- Bowtell D, Sambrook J (2003) DNA microarrays: a molecular cloning manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor
- Cohen S (2002) Strong positive selection and habitat-specific amino acid substitution patterns in MHC from an estuarine fish under intense pollution stress. Mol Biol Evol 19:1870–1880
- Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S (2005) (eds) Bioinformatics and computational biology solutions using R and bioconductor. Springer, Heidelberg
- Iwao K, Fujisawa T, Hatta M (2002) A cnidarian neuropeptide of the GLWamide family induces metamorphosis of reef-building corals in the genus Acropora. Coral Reefs 21:127–129
- Kassahn KS, Caley MJ, Ward AC, Connolly AR, Stone G, Crozier RH (in press) Heterologous microarray experiments used to identify the early gene response to heat stress in a coral reef fish. Mol Ecol
- Kim C, Joyce E, Chan K, Falkow S (2002) Improved analytical methods for microarray-based genome-composition analysis. Genome Biol 3: research0065
- Kimmel AR, Oliver B (2006a) DNA Microarrays Part A: Array Platforms and Wet-Bench Protocols. Methods Enzymol, vol 410
- Kimmel AR, Oliver B (2006b) DNA Microarrays Part B: Databases and Statistics. Methods Enzymol, vol 411
- Krasnov A, Koskinen H, Pehkonen P, Rexroad CE, Afanasyev S, Molsa H (2005) Gene expression in the brain and kidney of rainbow trout in response to handling stress. BMC Genomics 6:3. doi:10.1186/1471-2164-6-3
- Mardis ER (2006) Anticipating the $1,000 genome. Genome Biol 7:112. doi:10.1186/gb-2006-7-7-112
- Medhora M, Bousamra M, Zhu DL, Somberg L, Jacobs ER (2002) Upregulation of collagens detected by gene array in a model of flow-induced pulmonary vascular remodeling. Am J Physiol Heart Circ Physiol 282:H414–H422
- Renn SC, Aubin-Horth N, Hofmann HA (2004) Biologically meaningful expression profiling across species using heterologous hybridization to a cDNA microarray. BMC Genomics 5:42. doi:10.1186/1471-2164-5-42
- Schwarz J, Brokstein P, Manohar C, Coffroth MA, Szmant A, Medina M (2006) Coral Reef Genomics: Developing tools for the functional genomics of coral symbiosis. Proc 10th Int Coral Reef Symp 274–281
- Suzuki Y, Gojobori T (2003) Analysis of coding sequence. In: Salemi M, Vandamme A-M (eds) The phylogenetic handbook. Cambridge University Press, Cambridge, pp 283–311
- Swanson WJ, Aquadro CF (2002) Positive darwinian selection promotes heterogeneity among members of the antifreeze protein multigene family. J Mol Evol 54:403–410
- Yang Z, Nielsen R (2002) Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol 19:908–917
- Yang YH, Speed T (2002) Design issues for cDNA microarray experiments. Nat Rev Genet 3:579–588
- Yang YH, Speed T (2003) Design of microarray expression experiments. In: Bowtell D, Sambrook J (eds) DNA microarrays: a molecular cloning manual. Cold Spring Harbor Laboratory Press, New York, pp 513–525
- Yu X-J, Zheng H-K, Wang J, Wang W, Su B (in press) Detecting lineage-specific adaptive evolution of brain-expressed genes in human using rhesus macaque as outgroup. Genomics. doi:10.1016/j.ygeno.2006.05.008