Marine Biotechnology, Volume 10, Issue 6, pp 692–700

Preparation and Analysis of an Expressed Sequence Tag Library from the Toxic Dinoflagellate Alexandrium catenella


  • Paulina Uribe
    • Fundación Ciencia para la Vida
  • Daniela Fuentes
    • Fundación Ciencia para la Vida
  • Jorge Valdés
    • Instituto MIFAB
    • Center for Bioinformatics and Genome Biology
  • Amir Shmaryahu
    • Instituto MIFAB
    • Center for Bioinformatics and Genome Biology
  • Alicia Zúñiga
    • Fundación Ciencia para la Vida
  • David Holmes
    • Instituto MIFAB
    • Center for Bioinformatics and Genome Biology
    • Fundación Ciencia para la Vida
    • Instituto MIFAB
Original Article

DOI: 10.1007/s10126-008-9107-8

Cite this article as:
Uribe, P., Fuentes, D., Valdés, J. et al. Mar Biotechnol (2008) 10: 692. doi:10.1007/s10126-008-9107-8


Dinoflagellates of the genus Alexandrium are photosynthetic microalgae of considerable importance because of the impact of some toxic species on the shellfish aquaculture industry. Alexandrium catenella is the species responsible for paralytic shellfish poisoning in Chile and other geographical areas. We have constructed a cDNA library from midexponential cells of A. catenella grown in culture free of associated bacteria and sequenced 10,850 expressed sequence tags (ESTs) that were assembled into 1,021 contigs and 5,475 singletons for a total of 6,496 unigenes. Approximately 41.6% of the unigenes showed similarity to genes with predicted function. A significant number of unigenes showed similarity with genes from other dinoflagellates, plants, and other protists. Among the identified genes, the most expressed correspond to those coding for proteins of luminescence, carbohydrate metabolism, and photosynthesis. The sequences of 9,847 ESTs have been deposited in GenBank (accession numbers EX454357–EX464203).


Keywords: Alexandrium catenella; cDNA sequencing; Red tide microalgae; Toxic dinoflagellate


Dinoflagellates represent a unique and important group of organisms in the marine environment in terms of their numbers and diversity, as well as their ecological and physiological significance. They commonly occur as free-living, photosynthetic, and marine unicellular algae, but also include endosymbiotic, parasitic, heterotrophic, and freshwater taxa. Some species are responsible for the production of potent toxins that can be accumulated by shellfish and affect humans and marine mammals. They form harmful blooms or “red tides,” in which cell numbers reach more than one million cells per liter of seawater, producing a significant economic impact and public health concern in different geographical areas throughout the world (Scholin et al. 1995; Hallegraeff 1993). Dinoflagellates are the only photosynthetic organisms capable of bioluminescence (Sweeney 1987). Dinoflagellates are also unique among eukaryotes in many other biological and morphological characteristics. Their DNA content is higher than that of other eukaryotes [from 3 to 250 pg/cell, or approximately 3,000–215,000 megabases (Spector 1984; Triplett et al. 1993; Santos and Coffroth 2003)]. At the upper end of this range, the genome is more than 80 times the size of the human genome (Lin 2006). Their chromosomes consist of permanently condensed, genetically inactive central regions with peripheral loops of B-DNA that protrude from this core and comprise the actively transcribed DNA (Sigee 1984; Anderson et al. 1992; Bhaud et al. 1999).

Dinoflagellates of the genus Alexandrium such as Alexandrium catenella cause paralytic shellfish poisoning through saxitoxin production in Chile and in many geographical areas of the world. In Chile, the first documented toxic bloom was reported in 1972 in Magallanes (Guzmán and Campodónico 1975). Since then, the dominant toxic dinoflagellate species in Chilean southern coastal areas and estuaries during toxic bloom events has been identified to be A. catenella (Guzmán and Campodónico 1978).

The study of the molecular mechanisms that regulate growth, toxicity, photosynthesis, luminescence, and the expression of other circadian-controlled genes of A. catenella is of critical importance for understanding the physiological mechanisms and bloom formation capacity of Alexandrium species. However, there have been few studies regarding the molecular biology of A. catenella. Sequencing of complementary DNA libraries to generate expressed sequence tags (ESTs) is a reasonable approach for discovering expressed genes. ESTs can be used as markers for genes expressed under specific conditions, for predicting protein families, and for the development of expression systems for new proteins and their functions. Here, we have developed an EST library of A. catenella strain ACC07, isolated from Chilean waters, and have carried out large-scale sequencing to yield an EST database containing 10,850 ESTs and 6,496 unique genes. This database provides an important genomic resource for scientists working on the genus Alexandrium and related dinoflagellates.

Materials and Methods

Strains and Media

Alexandrium catenella clone ACC07, isolated in Aysén, Chile, in 1994, was used. Cells were grown in f/2 medium (Guillard 1995) at 16°C in a 10:14 light/dark photoperiod. Axenic cultures were obtained according to Uribe and Espejo (2003). Briefly, cells were subjected to sequential washing and filtration through an 11-μm pore-size nylon mesh with 0.05 mg/ml gentamicin and 0.2 mg/ml penicillin G. Bacterial presence was determined by direct or epifluorescence microscopy after staining with acridine orange (Imai 1987; Kuwae and Hosokawa 1999). This procedure allowed the detection of less than one bacterium per dinoflagellate. The cell extracts of A. catenella strain ACC07 contained approximately 5–8 femtomoles of saxitoxin equivalents/cell.

cDNA library preparation

Approximately 6 × 10⁶ cells from an exponential phase culture were collected by centrifugation at 1,000×g for 5 min during the light phase and were broken by four successive cycles of freezing, grinding, and thawing. Approximately 400 μg of total RNA was extracted using Trizol (Gibco BRL, Life Technologies, Gaithersburg, MD, USA), according to the manufacturer’s directions, and quantified spectrophotometrically. Poly A+ mRNA was isolated with the Poly (A) Quick mRNA isolation kit (Stratagene, La Jolla, CA, USA). cDNA was prepared from approximately 5 μg of polyA+ mRNA and cloned using the vector pExpress 1, exploiting the Not I and Eco RV restriction endonuclease sites. Double-strand cDNA synthesis was performed according to the manufacturer’s directions and quantified spectrophotometrically. The cDNA libraries were not normalized. Sequencing reactions were carried out from the 5′ end of the cDNA insert using the universal primers M13FWD (5′-gtaaaacgacggccagt-3′) and M13REV (5′-caggaaacagctatgac-3′).

Computational Sequence Analysis and ESTs Assembly

Vector-derived, ribosomal, and ambiguous sequences were removed from the collected EST sequences. EST sequences were assembled in clusters with a minimum value of 95% identity for at least a 50-bp region of overlap using the CAP3 program (Huang and Madan 1999). Clusters and singletons generated were designated as unigenes and were then subjected to similarity searches against the National Center for Biotechnology Information nonredundant protein database, using the BLASTX algorithm (Altschul et al. 1990). Initially, sequence similarities were considered to be significant when the E value was below e−5 at the amino acid sequence level. However, a stricter criterion with a cut-off E value of e−20 or less was also used in the analysis. The InterProScan (Mulder et al. 2007), Gene Ontology (Gene Ontology Consortium 2007), and clusters of orthologous groups (COGs) (Tatusov et al. 2003) databases were used to infer the functional classification of the predicted proteins.
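The two significance thresholds described above can be illustrated with a short sketch. It assumes that the BLASTX results are available as tab-separated rows in the common 12-column tabular layout (query, subject, ..., E value, bit score). The sample rows, the query and subject names, and the helper `best_hits` are hypothetical and are not data or code from this study; the comparison is written as "at or below" the cut-off for both thresholds.

```python
import csv
import io

# Hypothetical BLASTX hits in 12-column tabular form: query, subject, % identity,
# alignment length, mismatches, gap opens, q.start, q.end, s.start, s.end,
# E value, bit score. Illustrative rows only, not data from the study.
SAMPLE = """\
unigene_1\tprotA\t76.0\t210\t50\t0\t1\t630\t1\t210\t1e-45\t180
unigene_1\tprotB\t40.0\t150\t90\t2\t1\t450\t5\t154\t2e-08\t60
unigene_2\tprotC\t35.0\t120\t78\t1\t10\t369\t3\t122\t3e-07\t52
unigene_3\tprotD\t30.0\t80\t56\t3\t1\t240\t1\t80\t0.5\t30
"""


def best_hits(tabular_text, max_evalue):
    """Return {query: (subject, evalue)}, keeping for each query the hit with
    the lowest E value among hits whose E value is at or below max_evalue."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


permissive = best_hits(SAMPLE, 1e-5)   # initial criterion
strict = best_hits(SAMPLE, 1e-20)      # stricter criterion
print(sorted(permissive))
print(sorted(strict))
```

In this toy input the stricter cut-off retains fewer unigenes, which mirrors the trade-off between sensitivity and specificity discussed in the Results.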

Results and Discussion

Characteristics of the A. catenella cDNA Library

The cDNA library obtained had a titer of 1.1 × 10⁶ colony forming units per milliliter for a total of 1 × 10⁷ primary recombinants. Blue/white plaque identification following plating of an aliquot of the library revealed 99% recombinant plaques. The quality of the library was assessed by examining the insert size of 768 (2 × 384 well plates) randomly selected recombinant plaques. The average insert size was 1.7 kb, a value similar to that of a recent cDNA library of Karenia brevis (Lidie et al. 2005). The average size of the sequenced clones was 763 base pairs, and about 83% of the sequenced cDNA clones contained inserts that were longer than the single sequence read. The global G + C content for these ESTs was 56.8%. This value is similar to that obtained for the coding regions of Alexandrium tamarense (60.8%) (Hackett et al. 2005) and in the range of the values obtained for other dinoflagellates such as K. brevis (51%), Lingulodinium polyedrum (59.0%), Amphidinium carterae (50.4%), and Crypthecodinium cohnii (50%) (Lidie et al. 2005 and references therein).
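A global G + C value of this kind is a pooled base count over all sequences. The following minimal sketch shows the calculation; the function name `gc_content` and the toy sequences are assumptions made for illustration, and excluding ambiguous bases from the denominator is a design choice of this sketch rather than a documented step of the study.

```python
def gc_content(sequences):
    """Pooled G+C percentage over an iterable of nucleotide strings.

    Ambiguous bases (for example N) are left out of the denominator.
    """
    gc = 0
    at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / total


# Toy sequences, not ESTs from the library.
toy_ests = ["ATGGCGCCGTGA", "GGCCNNAT"]
print(round(gc_content(toy_ests), 1))
```

Pooling the counts weights each EST by its length, which differs from averaging per-sequence percentages.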

Analysis of the codon usage revealed a major use of G (35.1%) and C (44.8%) at the third position similar to that obtained for A. tamarense (37.2% and 40.7%, respectively). The most frequent stop codon was TGA (72.7%), compared to TAA and TAG (6.5% and 20.7%, respectively).
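The third-position composition and stop codon frequencies reported above can be tallied as in the sketch below. It assumes in-frame coding sequences that end with a stop codon; the helper names and the three toy sequences are hypothetical and only illustrate the counting, not the pipeline used for the library.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def to_percent(counter):
    """Convert raw counts to percentages; an empty counter gives an empty dict."""
    total = sum(counter.values())
    return {key: 100.0 * n / total for key, n in counter.items()} if total else {}


def third_position_and_stops(coding_sequences):
    """Return (third-position base percentages over sense codons,
    percentages of each terminal stop codon)."""
    third = Counter()
    stops = Counter()
    for cds in coding_sequences:
        seq = cds.upper()
        codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
        if not codons:
            continue
        last = codons[-1]
        if last in STOP_CODONS:
            stops[last] += 1
            codons = codons[:-1]
        for codon in codons:
            # Skip codons with ambiguous bases and any internal stop codons.
            if set(codon) <= set("ACGT") and codon not in STOP_CODONS:
                third[codon[2]] += 1
    return to_percent(third), to_percent(stops)


# Toy coding sequences, not sequences from the library.
toy_cds = ["ATGGCCAAGTGA", "ATGGAGCTCTAG", "ATGTTCGGGTGA"]
third_pct, stop_pct = third_position_and_stops(toy_cds)
print(third_pct)
print(stop_pct)
```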

Generation and Annotation of Expressed Sequences Tags

EST sequences were produced from the cDNA library and scanned visually to confirm overall quality of peak shape and correspondence with base identification. After the cleaning process, the average length per EST of the remaining sequences (9,847) was 736 base pairs and the Phred quality value was larger than 20. The sequences were assembled into 1,021 contigs (clusters of assembled ESTs) and 5,475 singletons (sequences found only once) (Table 1). The sequences of 9,847 ESTs have been deposited in GenBank with accession numbers EX454357–EX464203.
Table 1
Overview of the results from the A. catenella genomic library

                              Number of sequences
Total ESTs sequenced          10,850
Total valid ESTs              9,847
Average length per EST        736 bp
Number of contigs             1,021
Number of singletons          5,475
Total unigenes                6,496
Percentage known unigenes     41.6%

Contigs comprised between 2 and 438 ESTs. The percentage of unigene sequences with similarity to sequences in the GenBank database was 41.6%. This EST collection constitutes one of the largest dinoflagellate libraries deposited (Lidie et al. 2005; Hackett et al. 2005; Tanikawa et al. 2004). The total number of unigenes was 6,496, corresponding to about 60% of the 10,850 ESTs sequenced (Table 1). The ratio of sequenced ESTs to the number of unigenes is similar to that reported for other dinoflagellate EST libraries.
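The counts reported in this section can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text; the variable names are chosen for illustration, and the mean number of ESTs per contig is a derived value that assumes every valid EST ended up in either a contig or a singleton.

```python
# Figures taken from the text of this article.
sequenced_ests = 10850
valid_ests = 9847
contigs = 1021
singletons = 5475
unigenes_with_hits_e5 = 2700   # unigenes with a hit at an E value of e-5 or less

# Unigenes are the sum of contigs and singletons.
unigenes = contigs + singletons

# Valid ESTs that are not singletons must sit in contigs (assumption of this sketch).
ests_in_contigs = valid_ests - singletons
mean_ests_per_contig = ests_in_contigs / contigs

ests_per_unigene = sequenced_ests / unigenes
percent_known = 100.0 * unigenes_with_hits_e5 / unigenes

print(unigenes)
print(ests_in_contigs, round(mean_ests_per_contig, 2))
print(round(ests_per_unigene, 2))
print(round(percent_known, 1))
```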

Using a cut-off E value of e−5 or less, a total of 5,443 ESTs, corresponding to 2,700 unigenes, were found to have similarity to previously identified genes from a wide variety of organisms. Alexandrium catenella sequences were classified according to the organism with the best protein sequence hit. A significant proportion of the ESTs show similarity to genes of dinoflagellates (32%), plants (15%), and other protists (protozoa, ciliates, and other microalgae, 13%). Different percentages were found in a recent EST library analysis of the dinoflagellate L. polyedrum, where the groups most frequently found were land plants and animals (21% and 16%, respectively), whereas similarities with prokaryotes, flagellates, and protozoa were 14%, 14%, and 13%, respectively (Tanikawa et al. 2004). A similar analysis was carried out using an E value of e−20 or less. In this case, a total of 3,460 ESTs corresponding to 1,546 unigenes were found to have similarity to previously identified genes. As shown in Fig. 1, a large proportion of the ESTs show a high level of similarity to genes of dinoflagellates (47%), plants (18.1%), and other protists (protozoa, ciliates, and other microalgae, 13.4%).
Fig. 1

Taxonomic group distribution of targets with the best hit by A. catenella ESTs considering an E value of less than 10⁻²⁰ to the National Center for Biotechnology Information protein nonredundant database
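The taxonomic distribution summarized in Fig. 1 amounts to assigning each sequence to the group of its best-hit organism and counting the groups. A minimal sketch of that tally follows; the lookup table `GROUP_OF`, the function name, and the toy list of best hits are hypothetical, and a real analysis would derive the group from the taxonomy of each subject sequence.

```python
from collections import Counter

# Hypothetical organism-to-group lookup used only for this illustration.
GROUP_OF = {
    "Alexandrium tamarense": "dinoflagellates",
    "Lingulodinium polyedrum": "dinoflagellates",
    "Arabidopsis thaliana": "plants",
    "Euglena gracilis": "other protists",
}


def group_percentages(best_hit_organisms):
    """Percentage of sequences per taxonomic group of the best-hit organism.

    Organisms missing from the lookup are counted as 'unclassified'.
    """
    counts = Counter(GROUP_OF.get(org, "unclassified") for org in best_hit_organisms)
    total = sum(counts.values())
    return {group: 100.0 * n / total for group, n in counts.items()} if total else {}


# Toy best-hit organisms, one entry per EST; not data from the study.
toy_best_hits = [
    "Alexandrium tamarense",
    "Alexandrium tamarense",
    "Lingulodinium polyedrum",
    "Arabidopsis thaliana",
    "Euglena gracilis",
]
shares = group_percentages(toy_best_hits)
print(shares)
```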

The unigenes could be assigned to known COGs. The most represented groups of proteins in ESTs corresponding to cellular processes are those related to luminescence (14.5%), carbohydrate metabolism (13.6%), amino acid metabolism (12.6%), protein modification (10.9%), and photosynthesis (8.3%). Using an E value of e−20 or less, the most represented groups of proteins in ESTs corresponding to cellular processes are those related to luminescence (18.4%), carbohydrate metabolism (14.5%), amino acid metabolism (15%), protein modification (11.7%), and photosynthesis (8.4%) (Fig. 2).
Fig. 2

Distribution of A. catenella ESTs into the GO categories of cellular processes

Among the first two categories, the majority of the predicted proteins corresponded to those from dinoflagellates. Proteins from plants were the most represented in categories such as amino acid and nucleic acid metabolism. On the other hand, proteins from protozoa were the most represented in the translation and cell cycle categories. Some categories of proteins, such as transport and those with noncharacterized function, were similarly distributed among different taxonomic groups of eukaryotes and prokaryotes. This distribution of categories of proteins among different taxonomic groups was similar when E values of e−30 or less were considered. In summary, by using an E value of e−20 as cut-off, a higher degree of specificity was obtained, resulting in an increased percentage of proteins from dinoflagellates, protozoa, plants, and other microalgae.

Highly Represented Genes

The contigs containing the highest number of ESTs (analyzed with an E value of e−20) are listed in Table 2. The sequence coding for luciferin-binding protein (LBP) was the most abundant transcript in the library with 80 unigenes (3%) representing 539 ESTs (15.6% of the total ESTs).
Table 2
Most highly represented ESTs in A. catenella cDNA library

S-adenosyl-l-homocysteine hydrolase
Glyceraldehyde-3-phosphate dehydrogenase isoform 2 (GPDH)
S-adenosylmethionine synthetase 2
EF-1 alpha-like protein
Fumarate reductase
Peridinin chl a binding protein
Phosphoglycerate kinase
Ribonucleoside-diphosphate reductase R2
Chloroplast phosphoribulokinase
Chloroplast light harvesting complex protein
Light-harvesting polyprotein precursor

This gene is also highly expressed in L. polyedrum (Machabée et al. 1994), where it was highly represented (4%) in a normalized EST library prepared during the night phase (Tanikawa et al. 2004). Similar results were reported in A. tamarense (Hackett et al. 2005). Recently, in a global transcriptional profiling of the toxic dinoflagellate Alexandrium fundyense, four of the 15 signature sequences matched the LBP gene (Erdner and Anderson 2006). In the present study, the sequences of the two luminescence proteins of A. catenella were subjected to a more detailed analysis.

Luciferin-binding Protein

The complete sequence of the LBP coding region of A. catenella ACC07 was obtained. It comprises 2,194 nucleotides, corresponding to 663 amino acids. Sequencing the genomic coding region indicated the lack of introns, and after expression in bacteria, an 80-kDa protein was obtained (data not shown). The LBP contains four domains with low identity (15%) between them. The highest similarity in the EST database was found with A. tamarense (Table 3). At the amino acid level, the highest similarity (76%) was found with L. polyedrum. As found previously in Lingulodinium, the amino terminal region of approximately 100 amino acids of LBP of A. catenella is similar (50%) to the equivalent region of luciferase (LCF). This is the first complete sequence of LBP reported in a toxic strain of the genus Alexandrium (accession number EU236684).
Table 3
Photosynthesis and light harvesting proteins of A. catenella EST library

Predicted protein                                           Best-hit organism
Chloroplast photosystem II 12 kDa extrinsic protein (PsbU)  Alexandrium tamarense
Photosystem II 23 kDa polypeptide (PsbP)                    Phakopsora pachyrhizi
PSII cytochrome c550 oxygen-evolving (PsbV)                 Alexandrium tamarense
Plastid oxygen evolving enhancer 1 precursor (PsbO)         Alexandrium tamarense
Chloroplast cytochrome f (PetA)                             Alexandrium tamarense
Chloroplast ferredoxin (PetF)                               Alexandrium tamarense
Chloroplast ferredoxin-NADP(+) reductase (PetH)             Heterocapsa triquetra
Rieske iron–sulfur protein precursor (PetC)                 Alexandrium tamarense
Photosystem I iron–sulfur center (PsaC)                     Alexandrium tamarense
Chloroplast photosystem I subunit XI (PsaL)                 Heterocapsa triquetra
PSI, ferredoxin-binding protein II (PsaD)                   Symbiodinium sp.
Chloroplast photosystem I, subunit III (PsaF)               Alexandrium tamarense
Chloroplast ATP synthase gamma subunit (AtpC)               Alexandrium tamarense
Chloroplast ATP synthase subunit C (AtpH)                   Alexandrium tamarense
Chloroplast light harvesting complex protein                Lingulodinium polyedrum
Peridinin-chlorophyll a-binding protein (PCP)               Lingulodinium polyedrum
Chloroplast phosphoribulokinase                             Amphidinium carterae
Chloroplast transketolase                                   Euglena gracilis
Cytosolic class II fructose bisphosphate aldolase           Heterocapsa triquetra
Glyceraldehyde-3-phosphate dehydrogenase isoform 2          Symbiodinium sp.
Phosphoglycerate kinase                                     Karenia brevis
Ribose-5-phosphate isomerase                                Phaeodactylum tricornutum
RuBisCO form II                                             Amphidinium carterae
Triose-phosphate isomerase                                  Isochrysis galbana

Chlorophyll synthesis
NADPH protochlorophyllide reductase                         Phaeodactylum tricornutum
Magnesium chelatase H-subunit                               Ostreococcus lucimarinus
Mg-protoporphyrin IX (ChlI)                                 Amphidinium carterae
NADPH-protochlorophyllide oxidoreductase                    Phaeodactylum tricornutum
Chloroplast geranylgeranyl reductase/hydrogenase            Heterocapsa triquetra
Glutamate 1-semialdehyde 2,1-aminomutase                    Amphidinium carterae
                                                            Lingulodinium polyedrum
                                                            Alexandrium tamarense

Light receptors
Cryptochrome dash                                           Euglena gracilis
                                                            Branchiostoma floridae


Another highly expressed luminescence protein was the LCF. Complete sequence analysis of the 3,476 nucleotides coding for the A. catenella enzyme showed that the most closely related sequences were those from A. tamarense and A. affine (94% identity) (Liu et al. 2004). The sequence contains no introns and presents three domains with an identity of 76% between them, a significantly lower value than the identity obtained when these domains are compared to those of other dinoflagellate species (Liu et al. 2004). Internal regions of each domain are the most conserved, corresponding to the probable catalytic site of this enzyme. Four conserved histidines are present at the following positions within each domain: first domain (D1): H138, H148, H163, and H169; second domain (D2): H512, H525, H540, and H546; and third domain (D3): H891, H901, H916, and H922. These histidines were previously reported in L. polyedrum and are probably related to the pH regulation of the activity of this enzyme (Li et al. 1997). The first and the third domains of the LCF were expressed in bacteria and the products were 60 and 45 kDa, respectively. The three domains of this protein have been shown to be functional in L. polyedrum (Li et al. 1997).

The synthesis of the two luminescence proteins LCF and LBP of Lingulodinium is regulated translationally; their mRNA and protein levels remain constant over the circadian cycle (Machabée et al. 1994). Remarkably, both LCF and LBP in L. polyedrum are destroyed at the end of the night phase and then resynthesized in the next cycle. Moreover, the scintillons themselves are broken down and reformed each day (Machabée et al. 1994).

Although the ecological function of luminescence in dinoflagellates has not been determined, it is probably related to predation avoidance and communication (Esaias and Curl 1972; Abrahams and Townsend 1993). Taking into account that the luminescence proteins are among the most expressed in A. catenella, a high proportion of the energy of this dinoflagellate is probably dedicated to this particular physiological response. Taken together, the specific characteristics of the luminescent proteins and their expression patterns in a paralytic shellfish poisoning-producing dinoflagellate such as A. catenella are of special relevance for unveiling the mechanisms of bloom formation in a toxin-producing species. These specific features could be useful for the development of new tools for the detection and localization of this toxic species using bio-optical instruments (Seliger et al. 1961; Widder et al. 1993).

Also highly expressed are transcripts that show a very high similarity to the enzymes S-adenosyl-l-homocysteine hydrolase and S-adenosylmethionine synthetase 2 (E values of e−129 and e−112, respectively). These enzymes are involved in methylation reactions that play a major role in the modification of a large variety of acceptor molecules, such as lipids, polysaccharides, nucleic acids, proteins, and secondary plant products (reviewed by Giovanelli 1987). In eukaryotes, DNA methylation has been implicated in the control of several cellular processes, including differentiation, gene regulation, and embryonic development (Cheng 1995). The high expression level of genes that matched the sequences of the two heat shock proteins HSP90 and HSP70 was also remarkable. These proteins participate in various cellular processes including signal transduction, protein folding, protein degradation, and morphological evolution (Lindquist and Craig 1988). HSP70 proteins can be found in different cellular compartments, have a role in the disassembly of clathrin cages, and also participate in the posttranslational transmembrane targeting of proteins to cellular organelles (Craig 1989). The sequences coding for these proteins have also been found at high frequency in the EST library of the dinoflagellate A. tamarense (Hackett et al. 2005).

Photosynthesis and Light Harvesting Genes

None of the 15 known plastid-encoded genes from peridinin-producing dinoflagellates were represented among the ESTs of the library (Zhang et al. 1999). Thus, the 30 photosynthesis unigenes represented in the A. catenella cDNA library are probably encoded in the nucleus (Table 3). All these plastid protein sequences contain tripartite N-terminal targeting signals that have been shown to direct the trafficking of these proteins through the different membranes of the dinoflagellate secondary plastids. The distribution of these signal elements in A. catenella plastid protein sequences was equivalent to that observed in the dinoflagellate Heterocapsa triquetra (Patron et al. 2005).

The origin of these nuclear-encoded plastid protein sequences is suggested by their relatively high similarity to those present in other peridinin-pigmented dinoflagellates (Table 3). The nuclear location of these genes can be verified by using the spliced leader sequence recently found in the nuclear-encoded mRNAs of dinoflagellates (Zhang et al. 2007). As expected, within this group, the majority (accession numbers: EX455192, EX455275, EX455467, EX455749, EX456053, EX456206, EX456301, EX456598, EX457854, EX458868, EX459236, EX462386, EX462908, and EX463406) are most similar to sequences from the related species A. tamarense (from the A. catenella–tamarense–fundyense species complex) (Scholin et al. 1995), followed by those from A. carterae, H. triquetra, L. polyedrum, and Symbiodinium sp. Only a few ESTs were similar to chloroplast sequences from the fucoxanthin-pigmented K. brevis, or from other organisms such as euglenoids, green algae, and stramenopiles, which have a different but parallel origin of the plastid proteins.

The most expressed transcripts with a high similarity to photosynthesis genes were those predicted to encode the light harvesting complex, composed of a chlorophyll a-/c- and peridinin-binding protein and those corresponding to a number of proteins of the light phase of the photosynthesis, such as photosystems I and II, cytochrome b6f, and ATP synthase (Patron et al. 2005) (Table 3). Highly expressed are the unigenes that are highly similar to the carbon fixation enzyme glyceraldehyde-3-phosphate dehydrogenase isoform 2 that was 86% identical to the one from L. polyedrum. This enzyme participates in the aldehyde formation during the Calvin cycle in the dark phase of photosynthesis. Sequences coding for this enzyme were also found among the highest expressed in other EST libraries of different dinoflagellates such as L. polyedrum (Bachvaroff et al. 2004), A. tamarense (Hackett et al. 2005), K. brevis (Lidie et al. 2005), and A. fundyense (Taroncher-Oldenburg and Anderson 2000). Other highly expressed genes similar to sequences encoding carbon fixation proteins were the phosphoglycerate kinase and the chloroplast phosphoribulokinase. The library contains the coding sequences of six enzymes related to chlorophyll synthesis and two enzymes involved in the synthesis of photoprotective pigments (Table 3).

We have also found sequences with high similarity to light receptors. One has 77% identity to the green light receptor (450 and 500 nm) type 1 rhodopsin described in Pyrocystis lunula (Okamoto and Hastings 2003) and lower identity to those from the marine cryptophyte Guillardia theta (Sineshchekov et al. 2005) and Cryptomonas spp. (29% and 26%, respectively). Type 1 rhodopsins have recently been described in the green alga Chlamydomonas reinhardtii, where they function as receptors for phototaxis responses (Sineshchekov et al. 2002). This photosensitive protein is similar to γ-proteobacterial rhodopsins and is more abundantly expressed during the early day hours (Okamoto and Hastings 2003). Sequences that correspond to a second photosensory receptor, the cryptochrome DASH protein, a blue light (400–500 nm) and UV-A (320–400 nm) receptor, were also found. This protein, which is involved in the light regulation of growth and development in plants and in other cellular processes such as growth and the induction of sexual reproduction in algae (Liscum et al. 2003), shows 30% identity to those from K. brevis and Arabidopsis thaliana. We consider that these light receptors are an interesting subject of study in relation to the high level of expression of blue light luminescence proteins in A. catenella, considering the probable role of luminescence in the cellular communication of dinoflagellates.

Other Proteins

Two A. catenella unigenes show 100% identity with a toxic strain-specific sequence of A. tamarense (AT-T1), previously identified as a biomarker of toxicity by Chan et al. (2006). Both A. catenella sequences also show similarity to unknown proteins of the nontoxic dinoflagellate H. triquetra. These sequences contain signal peptide sequences, suggestive of a plastid-targeted protein (Patron et al. 2005). We have also found an A. catenella unigene coding for a protein with a high level of similarity to two interesting conjugation-induced proteins, SPS19 from Saccharomyces cerevisiae and eIF-4A, a eukaryotic initiation factor that was found recently to be induced during conjugation in the dinoflagellates A. catenella and A. tamarense (Hosoi-Tanabe et al. 2005).

Two sequences code for a protein with a cysteine-rich region, which has similarity to the EhV_307 protein from the Emiliania huxleyi virus (Wilson et al. 2005). Other viral sequences from the Paramecium bursaria Chlorella virus were also found but with a lower similarity.

Genes predicted to encode a diversity of proteins involved in transport processes were detected; among them were Na, K, Ca, phosphate, and ammonium channels, and also antiporters; ABC-transporters; amino acid transporters; and the Sec61 and SecY translocases, involved in secretion pathways in eukaryotes. Thirteen sequences that correspond to transposable elements previously described by Armbrust et al. (2004) in the Thalassiosira pseudonana genome were found with a relatively low similarity.

Comparative Genomics

The A. catenella protein database was compared with the genome of the plant A. thaliana and with genomes of unicellular eukaryotes of the protista kingdom, such as T. pseudonana, Entamoeba histolytica, Cryptosporidium hominis, and the red alga Cyanidioschyzon merolae (Fig. 3). The Venn diagram shows the highest similarity with A. thaliana (19.3%), the diatom T. pseudonana (19.1%), and C. merolae (18.3%). We observed a similar distribution of functional groups (COGs) among the sequences in common with those five organisms (not shown).
Fig. 3

Venn diagram of the comparison between the A. catenella ESTs and the genomes of A. thaliana, T. pseudonana, C. merolae, Entamoeba histolytica, and Plasmodium falciparum. The number and percentage of the A. catenella sequences homologous to each organism are given in the intersection

When the unigenes of this library were compared with 10,886 ESTs of the closely related species A. tamarense (Hackett et al. 2005) present in the public database, we found 3,045 (46.9%) hits. From them, the 1,236 common unigenes were classified into COGs, and the most represented categories corresponded to carbohydrate metabolism (11.8%), posttranslational modification and chaperones (9.1%), energy production (7.4%), and luminescence (6.6%).


This research has been partially funded by CONICYT-FONDEF Project MR02I1003 and by a Microsoft Research Joint R&D Program.

Copyright information

© Springer Science+Business Media, LLC 2008