Metagenome-based analysis: A promising direction for plankton ecological studies

The plankton community plays an especially important role in the functioning of aquatic ecosystems and in biogeochemical cycles. Since the beginning of marine research expeditions in the 1870s, an enormous number of planktonic organisms have been described and studied. Plankton investigation has become one of the most important areas of aquatic ecological study, as well as a crucial component of aquatic environmental evaluation. Traditional investigations, however, have mainly focused on morphospecies composition, abundance and dynamics, which depend primarily on morphological identification and counting under microscopes. For many species and groups with few readily observable characteristics, morphological identification and counting have historically been difficult. Over the past decades, microbiologists have endeavored to apply and extend molecular techniques to address questions in microbial ecology. These culture-independent studies have generated new insights into microbial ecology. One such strategy, metagenome-based analysis, has also proved to be a powerful tool for plankton research. This mini-review presents a brief history of plankton research using morphological and metagenome-based approaches and discusses the potential applications and future directions of metagenomic analyses in plankton ecological studies. The use of metagenome-based approaches for plankton ecological study in aquatic ecosystems is encouraged.

Plankton, the key structural and functional component of aquatic ecosystems, is generally considered to be composed of both producers (phytoplankton) and consumers (zooplankton), but also includes bacterial decomposers (bacterioplankton) [1]. It therefore plays important roles in aquatic biogeochemical cycles [2]. Plankton investigation has been a major area of ecological study since the very start of marine research expeditions (e.g., Challenger Expedition, 1872-1876; German Plankton Expedition, 1889). The increasing awareness of the roles of plankton in the ecosystem has led to a rapid growth in descriptive, experimental and theoretical studies concerning plankton composition, abundance, biomass and activities. However, traditional plankton ecological studies have largely depended upon morphological methods, and morphological identification has historically been a difficult task because of the small size of the organisms and the paucity of readily observable characteristics of taxonomic value [3]. As a result, independent investigations may obtain different results from the same samples, sometimes with significant discrepancies. In addition, different groups of phytoplankton and zooplankton have always been investigated separately, and bacterioplankton was often excluded from morphological investigations because of the difficulties in identification. These and other factors have made it difficult to reach higher-order ecological conclusions about general plankton trends and patterns.
Morphological approaches can provide only a few clues for the identification and related functional study of most microbes. The need for nontraditional techniques to reveal the microbial world was highlighted in 1986 [4]. One year earlier, Pace et al. [5] had proposed the idea of cloning DNA directly from environmental samples, and the first such clone library was reported in 1991 [6]. Since then, natural microorganisms collected from different environments have been investigated at the DNA level, and progress has been made in microbial ecology and environmental microbiology [7]. In 1998, the term 'metagenomics' was coined [8] to describe microbial investigations based on natural habitats, which provided powerful tools to extract valuable biological information quickly and effectively from complex environments [7,9]. For plankton research, Giovannoni et al. [10] were the first to phylogenetically analyze clone libraries of 16S rRNA genes from a natural bacterioplankton community. To date, large numbers of metagenome-based studies on bacterioplankton, and more recently on eukaryotic picoplankton, have been undertaken (e.g., [11][12][13][14]). Targeting the plankton community as a system, Yan et al. [15][16][17][18] and Yu et al. [19] recently applied fingerprinting techniques to investigate the relationships between DNA polymorphisms and the species composition of plankton communities in different habitats and environments (i.e., lakes, rivers, reservoirs, ponds, enclosures, and artificial systems). As community composition is largely dictated by environmental factors, it has been a natural extension of this relationship to use plankton community fingerprints as an indication of environmental conditions [17][18][19][20][21].
In contrast to traditional morphological analyses in ecological study, multiple samples can be analyzed simultaneously with standardized sets of metagenome-based procedures, which makes it practical to perform comparative analyses aimed at elucidating general microbial ecological rules. Moreover, by comparing the expression of target genes in different environments, community functions can be explored using function-based metagenomics.

A brief history of early plankton research
The term 'plankton' was coined by the famous plankton ecologist Victor Hensen in 1887, but plankton research can be traced back to when Antonie van Leeuwenhoek observed 'small infusoria' with his own microscope. Johannes Müller pioneered the systematic study of small planktonic organisms with hand-held nets in the 1840s [22]. Plankton research initially focused on naming and classifying species according to their morphological characteristics (i.e., morphospecies). After a period of taxonomic investigation, numerous planktonic species had been described and more and more new species were being reported. During this period, interest in plankton research also extended to the relationships between community structure and the environment, which led directly to the ecological aspects of the field. Furthermore, Victor Hensen applied quantitative methods to gauge the distribution, abundance and productivity of microscopic plankton in the open sea [22].

Limitations of traditional plankton investigations
As the basis of biodiversity, species diversity was the first to be investigated and attracted the attention of many taxonomists and ecologists. A large number of species diversity indices have been proposed [23,24]. However, the mechanisms regulating diversity in most systems are not completely understood [25], and the questions of how to determine the real diversity of plankton communities and how to explore their potential ecological functions present tremendous challenges. These challenges include not only the need for extensive taxonomic identification, but also the problem of how to compare and summarize related studies to generate higher-order ecological conclusions. Since the very start of plankton ecological investigation in the 1870s, plankton research has largely depended upon traditional morphological methods. However, morphological classification of planktonic organisms has historically been a difficult task (even for a seasoned taxonomist) because of the lack of distinguishing features, especially for the multitudes of minute, nondescript organisms. Furthermore, planktonic bacteria and some small phytoplankton cannot be classified to the species level, or even to the genus level, using morphological methods alone. Therefore, it is not so much a lack of ideas as inadequate methodologies and instruments that have limited progress in understanding the diversity of plankton. Despite more than 100 years of plankton research, our knowledge of plankton diversity and its roles in the natural environment has increased only modestly. More powerful techniques are therefore urgently needed to reveal this diversity and to explore the ecological functions of plankton in ecosystems. Fortunately, recent technical developments in molecular biology have found extensive applications in the study of natural community ecology [26][27][28].
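
To make the notion of a species diversity index concrete, the following minimal sketch computes two widely known indices, the Shannon index and the Gini-Simpson index, together with Pielou's evenness, from a list of per-taxon counts. These indices are chosen here only as familiar examples; the text above does not specify which indices are meant, and the taxon names and counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one individual")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson_index(counts):
    """Gini-Simpson index 1 - sum(p_i^2): chance that two random draws differ."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one individual")
    return 1.0 - sum((c / total) ** 2 for c in counts)


def pielou_evenness(counts):
    """Pielou's evenness J' = H' / ln(S), where S is the number of taxa present."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        return 0.0  # evenness is not meaningful for fewer than two taxa
    return shannon_index(counts) / math.log(richness)


# Hypothetical microscope counts for one plankton sample (illustrative only).
sample_counts = {"taxon_A": 50, "taxon_B": 30, "taxon_C": 15, "taxon_D": 5}
values = list(sample_counts.values())
print("Shannon:", round(shannon_index(values), 3))
print("Gini-Simpson:", round(gini_simpson_index(values), 3))
print("Pielou evenness:", round(pielou_evenness(values), 3))
```

Such indices depend entirely on the quality of the underlying identification and counts, which is the limitation this section describes.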

Metagenome-based plankton study

Metagenome
The term 'metagenome' was first introduced to describe the combined genomes of natural microbes in the soil environment [8]. Since then, microbial communities from diverse niches (including oceans, lakes, soils, thermal vents, hot springs, the mouth and gastrointestinal tract) have been involved in metagenome-based studies over a very short period of time [9,29]. The term metagenome has now been extended to refer to all of the genetic material recovered directly from environmental samples, which can be analyzed in a way analogous to the study of a single genome [7,9]. The genomic study of single organisms has extended our understanding from single genes to the collective genes of an organism, whereas metagenomics goes beyond the genome of single organisms and enables us to study the genomes of the community [30]. More importantly, metagenomics transcends the limitations of classical genomics and microbiology and provides valuable strategies to study the relationships among genes, genomes, organisms and communities present in the natural world [7]. Since the term 'metagenome' was first published in 1998 [8], the number of annually published SCI publications in this field has increased exponentially and reached a level of approximately 400 in 2010 (according to searches using the online version of the ISI Web of Science).

Metagenomics and its applications in microbial ecology
Metagenomics is the study of genomic material collected directly from natural environmental samples. It is also termed environmental genomics, ecogenomics or community genomics, but metagenomics is the name most commonly used for this emerging field of study. Although the term 'metagenomics' was coined in 1998 [8], the concept that organisms could be identified without cultivation, by retrieving and sequencing their DNA directly from natural samples, dates from much earlier (e.g., [4,5]). Another breakthrough was the amplification of 16S rRNA genes from natural picoplankton and the use of amplified environmental genes to perform phylogenetic analysis [10].
All metagenomic studies begin with community DNA extraction from diverse members present in a particular environment. Metagenome information is then analyzed directly or cloned prior to investigations of structure, function and dynamics of natural microbial communities (Figure 1). Over the past two decades, metagenomic studies have provided valuable information in a number of fields including microbiology, medicine, energy, biotechnology, agriculture, environmental remediation and gut microbial ecology. In fact, metagenomics has bridged the gap between genetics and ecology, demonstrating that the genes of single organisms are related to the genes of other species or even to the entire community [7]. It also offers a powerful lens for viewing the microbial world which in turn presents potential to revolutionize understanding of the entire living world.

Figure 1
Schematic showing the major process of metagenome-based study. All metagenome-based studies involve the extraction of genomic DNA directly from environmental samples. The temporal or spatial heterogeneity of a community can be effectively compared by analyzing the fingerprinting profiles. After the genes are sequenced and compared with identified sequences, the functions of these genes can be determined.

Community fingerprinting analysis
Community fingerprinting, a quick way of viewing the spatiotemporal patterns and succession of target microbial communities, has been widely applied to study communities from a variety of environments (e.g., oceans, lakes, reservoirs, rivers, artificial niches, soils, thermal vents, hot springs, the mouth and gastrointestinal tract) [29]. If comparable samples collected from different locations in the same area harbor similar communities, their fingerprinting patterns are theoretically expected to be similar. To date, the fingerprinting techniques commonly used in aquatic habitats include denaturing/temperature gradient gel electrophoresis (DGGE/TGGE), single strand conformation polymorphism (SSCP), randomly amplified polymorphic DNA (RAPD), terminal-restriction fragment length polymorphism (T-RFLP), amplified ribosomal DNA restriction analysis (ARDRA), ribosomal intergenic spacer analysis (RISA) and automated ribosomal intergenic spacer analysis (ARISA) (see [31] for details). These methods provide a rapid means for screening microbial communities from different environments or for comparing community dynamics at different spatial/temporal scales. Among all of the available fingerprinting methods, PCR-DGGE is one of the most commonly used approaches for screening environmental microbial communities [32], although it cannot resolve all members of a complex microbial community.
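
The expectation that similar communities yield similar fingerprints can be illustrated with a short sketch that scores pairs of banding patterns by their shared bands. It assumes that each fingerprint has already been reduced to a set of band positions (presence/absence data) and uses the Dice (Sørensen) and Jaccard coefficients, two common choices for such data; the site names and band positions are hypothetical, and the source does not prescribe a particular similarity measure.

```python
from itertools import combinations


def dice_similarity(bands_a, bands_b):
    """Dice (Sorensen) coefficient: 2 * shared bands / total bands in both lanes."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        return 1.0  # two empty profiles are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard_similarity(bands_a, bands_b):
    """Jaccard coefficient: shared bands / bands present in either lane."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_similarity(profiles, metric=dice_similarity):
    """Return {(sample_x, sample_y): similarity} for every pair of samples."""
    return {
        (x, y): metric(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical band positions detected in three gel lanes (illustrative only).
profiles = {
    "site_1": {1, 3, 4, 7, 9},
    "site_2": {1, 3, 4, 8, 9},
    "site_3": {2, 5, 6},
}

for pair, score in pairwise_similarity(profiles).items():
    print(pair, round(score, 2))
```

A similarity table of this kind is typically the input to clustering or ordination when fingerprints from many sites or dates are compared.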
Metagenome-based analysis in plankton research has become very common, although studies and publications are primarily focused on the pico- and nanoplankton [12][13][14]. PCR-DGGE fingerprinting patterns of bacterioplankton communities in lakes with different nutrient content and water color have been found to be strongly correlated with the biomass of microzooplankton, cryptophytes, and chrysophytes [33]. With PCR-DGGE fingerprinting, Lindström [34] also found that the introduction of allochthonous bacteria and their subsequent interaction with other planktonic organisms affected the composition of bacterioplankton. Although eukarya-specific primers have also been developed and applied in aquatic environmental studies (e.g., [11]), comparatively few studies have targeted eukaryotic plankton. Moreover, prokaryotic and eukaryotic groups have always been investigated separately.
Yu et al. [35] targeted the entire plankton community as a system (including both planktonic prokaryotes and eukaryotes) and explored the feasibility of applying DNA fingerprinting to community-level plankton studies. Furthermore, Yan et al. [15][16][17][18][20][21] applied different fingerprinting approaches to investigate relationships among the genetic diversity of plankton communities, species composition and environmental factors in the aquatic environments of lakes, rivers, and reservoirs. The results indicated that the DNA fingerprints of target communities were generally correlated with the species composition of plankton and with their environments [15][16][17][18][20][21]. The concentration of total phosphorus (TP) was found to be the major factor determining plankton communities derived from eutrophic environments [19]. PCR-DGGE fingerprinting was further applied to address plankton succession in simulated niches: plankton succession was not significantly affected by cyanobacterial bloom removal using chitosan-modified local soils [18]. Additionally, specific planktonic cyanobacterial groups were explored using PCR-DGGE fingerprinting to monitor succession in situ (Taihu Lake, unpublished data). Community fingerprinting analyses have significantly enhanced our ability to understand the non-cultured plankton world [11,12], and have made it possible to elucidate mechanisms of ecosystem functions with molecular insights.

Sequence-based and function-based analysis
In addition to using fingerprint patterns as indicators of the diversity and succession of target communities, fingerprinting methods have often been used in combination with selectively excised, sequenced bands of interest (e.g., [3,12]). However, the information provided by the excised bands may be insufficient when bands are short (e.g., less than 500 bp for DGGE/TGGE and SSCP). Therefore, genomic DNA recovered from environmental samples also needs to be directly cloned into an appropriate vector and transformed into a bacterial host, and a metagenomic library constructed. The genetic information held within the metagenomic library is then studied using sequence-based or function-based screening (Figure 1) to examine the characteristics and functions of the microbial community. In principle, the clones of a metagenomic library represent the genetic complement derived from the target community in the investigated habitat.
In sequence-based analysis, the clone library is sequenced and analyzed to obtain information regarding the structure and organization of the metagenome; sequence information is then compared with known DNA sequences deposited in public databases (e.g., GenBank, http://www.ncbi.nlm.nih.gov; EMBL, http://www.ebi.ac.uk/embl; DDBJ, http://www.ddbj.nig.ac.jp). This can involve the complete sequencing of clones containing phylogenetic anchors that indicate taxonomic groups. Alternatively, random sequencing can be conducted to identify the gene of interest [7]. By using this strategy, more and more new groups of bacteria/archaea have been identified [10,36]. Moreover, Stein et al. [37] isolated a large genome fragment from an uncultured planktonic archaeon. After 1997, the culture-independent 16S rRNA gene sequences derived from environmental clones in GenBank began to exceed those from cultivated bacteria and archaea [38]. Since Giovannoni et al. [10] first phylogenetically analyzed 16S rRNA genes amplified from natural bacterioplankton, phylogenetic information directly derived from environmental samples has greatly enhanced our understanding of biodiversity in the biosphere. More recently, phylogenetic analyses of amplified 18S rDNA, internal transcribed spacer (ITS) and other regions, mimicking the 16S rDNA analysis in prokaryotes, have indicated a high diversity of small planktonic eukaryotes [13,14]. These monumental studies have provided new insights for characterizing planktonic organisms from natural assemblages. Moreover, with the revolution driven by the Roche 454, Illumina Genome Analyzer and ABI-SOLiD platforms, high-throughput next-generation sequencing will enable direct and cost-effective sequencing of complex samples at an unprecedented scale and speed [39,40]. In addition, the open-source metagenomics RAST (MG-RAST) server is available and is creating expanded opportunities for the annotation and analysis of metagenomic sequences [41].
As sequencing technology is further refined, sequence-based analyses will continue to enrich our understanding of the diversity and functions of non-cultivated plankton in the natural world.
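
The step of comparing an environmental sequence with known reference sequences can be sketched in a highly simplified form. In practice this comparison is performed with alignment-based search tools against the public databases named above; the sketch below deliberately swaps in a much cruder technique, the Jaccard similarity of shared k-mers, purely to show the idea of assigning a query to its closest reference. The reference names and sequences are hypothetical and far shorter than real marker genes.

```python
def kmers(sequence, k=4):
    """Return the set of overlapping k-mers in a DNA sequence (case-insensitive)."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def kmer_jaccard(seq_a, seq_b, k=4):
    """Jaccard similarity between the k-mer sets of two sequences."""
    kmers_a, kmers_b = kmers(seq_a, k), kmers(seq_b, k)
    if not kmers_a or not kmers_b:
        return 0.0  # at least one sequence is shorter than k
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)


def best_match(query, references, k=4):
    """Return (reference name, similarity) for the reference closest to the query."""
    scores = {name: kmer_jaccard(query, seq, k) for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical reference fragments standing in for database entries.
references = {
    "ref_taxon_1": "ACGTACGTGGCCTTAAGGCC",
    "ref_taxon_2": "TTGACCGATAGGCTAACGTT",
}
# Hypothetical environmental read differing from ref_taxon_1 at one position.
query = "ACGTACGTGGCCTTAAGGCA"

name, score = best_match(query, references)
print(name, round(score, 2))
```

As the Perspectives section notes for homology in general, a close match of this kind is only a clue to identity or function, not proof of it.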
In function-based analysis, the metagenomic library is screened to identify functions of interest, such as vitamin production and antibiotic resistance [42]. Function-based methods also enable us to directly extract and identify novel proteins and metabolites from microbial communities [42]. Thus, the capabilities of a community can be assessed without prior knowledge of the gene sequences, and researchers can identify entirely new classes of genes for target functions. To date, functional analysis has identified antibiotics, antibiotic resistance genes, lipases, chitinases, membrane proteins, degradative enzymes and genes encoding the biotin synthesis pathways (see [43] for details). However, current techniques cannot express multiple genes together in any particular host bacterium, and the frequency of metagenomic clones that express a given activity is generally very low because the host may not provide the correct microenvironment for target expression. With the development of microarray-based genomic technology, He et al. [44,45] developed the GeoChip for studying biogeochemical processes and functional activities of microbial communities. The current GeoChip contains probes for different genes involved in biogeochemical cycling, making it possible to use functional gene arrays to analyze complex environmental samples. Functional anchors, which are the functional analogs of phylogenetic anchors, are also emerging in metagenomic analysis [7]. However, most function-based studies have focused on soil environments, whereas planktonic organisms and aquatic ecosystems have rarely been involved. Bacterio-, phyto- and zooplankton are essential components of the aquatic food web and generally contribute the dominant portion of productivity in aquatic ecosystems [46]. Further efforts are therefore urgently needed to address functional expression in planktonic communities and to advance function-based plankton metagenomics.

Perspectives
Metagenomics is a burgeoning and exciting field that has generated enormous amounts of valuable biological information over a relatively short period. As a result of recent developments, future directions for plankton research at the metagenome level can be illustrated as in Figure 2. Although current efforts in microbial genomics have mainly focused on single organisms, from genome to transcriptome and also to proteome (as indicated by the shadows at organism level, Figure 2), some breakthroughs have also been achieved recently at the community level [14,47]. Metagenome-based techniques are now widely available for the study of biodiversity, agriculture, biotechnology and human health, and more generally for a deeper understanding of the biosphere. Current trends suggest that opportunities for further development are broad (Figure 2). In brief, expansion to the community level will lead to a comprehensive understanding of community functions; expansion from single organisms to ecosystems will advance understanding of biotic and abiotic interactions and ecosystem evolution. The continuing development of current techniques and the introduction of more refined techniques will deepen our understanding of plankton in the natural world (e.g., structure, abundance, biomass and functions in ecosystems). This will help us to address a variety of ecological questions. For example, high-throughput sequencing of metagenomic DNA derived from plankton samples will greatly advance our understanding of the genetic diversity, metabolic functions, ecology and evolution of the plankton community [39,40]. Another representative example is the newly developed GeoChip, which has proved to be a powerful, high-throughput metagenomic tool for analyzing microbial communities (including their diversity, metabolic capability and functional activity) [45]. This type of functional gene array can also be used to study plankton community functional structure and to link plankton communities to ecosystem processes and functioning.
Nonetheless, it should be acknowledged that some challenges lie ahead. The first is to define and elucidate gene functions: although sequence homology comparisons are useful, they provide only clues, because homology does not necessarily imply the same function. A further challenge for metagenomic studies is to understand the consequences of genetic capacity and interactions at the community/system level, beyond the individual/population level.