Abstract
In recent years, high-throughput techniques in genome and post-genome research have had a marked impact on the marine sciences. Today, massively parallel DNA sequencing and hybridization approaches allow the identification not only of the gene repertoire but also of the gene regulatory networks that function within an organism. The huge amounts of data acquired from such experiments can only be handled with intensive bioinformatics support, which has to provide an adequate infrastructure for storing and analysing these data. Bioinformatics has to deliver efficient data analysis algorithms, user-friendly tools and software applications, as well as extensive hardware infrastructure, to deal with these genome-scale analyses.
This chapter not only briefly introduces the most relevant bioinformatics topics for functional and structural genomics but also addresses practical aspects of other steps of a genome project, such as sequencing and data management. The chapter takes the reader through the different technical approaches that can be applied in marine genomics projects.
In the first part, we mainly focus on data generation, introducing classical genome sequencing approaches such as the Sanger method and the shotgun technique. We also give a short overview of the current status of next-generation sequencing techniques. In the second part, we briefly introduce the concept of data management for bioinformatics applications. In the third part, we describe the basic principles of genome sequence analysis and address topics such as EST clustering and genome assembly, gene prediction, gene function assignment and classification, and whole-genome annotation. In the fourth part of this chapter, we present an overview of transcriptome data analysis using microarray hybridization technology. After a brief introduction to microarray technology, we describe state-of-the-art methods for image processing, data normalization, significance testing and cluster analysis.
Keywords
- Protein Data Bank
- Basic Local Alignment Search Tool
- European Bioinformatics Institute
- European Molecular Biology Laboratory
- Tentative Consensus Sequence
Notes
1. Usually, the frequencies of oligonucleotides between 3 and 12 bp in length are modelled.
2. Specificity measures the reliability of the predictions. It is defined as the fraction of correct gene predictions, i.e. the fraction of predicted genes that correspond to known genes.
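Note 1 refers to modelling the frequencies of short oligonucleotides. As a minimal sketch, assuming a plain DNA string and simple overlapping-window counting (gene finders typically build statistical models, such as Markov models, on top of counts like these, which this sketch does not attempt), the relative frequencies of all oligonucleotides of a single length k could be tabulated as follows. The function name `oligo_frequencies` is hypothetical.

```python
from collections import Counter

VALID_BASES = frozenset("ACGT")


def oligo_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequencies of overlapping oligonucleotides (k-mers) of length k.

    Hypothetical helper for illustration: windows containing characters other
    than A, C, G or T (e.g. the ambiguity code N) are skipped, not counted.
    """
    seq = seq.upper()
    # Slide a window of length k along the sequence, one base at a time.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= VALID_BASES
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    # Normalise raw counts to relative frequencies that sum to 1.
    return {kmer: n / total for kmer, n in counts.items()}
```

For example, `oligo_frequencies("ATGATGA", 3)` yields a frequency of 0.4 for ATG and for TGA and 0.2 for GAT. In practice one would call it for each k in the range of interest (Note 1 mentions 3 to 12 bp); skipping ambiguous windows rather than counting them is a design choice of this sketch.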
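Note 2 defines specificity as the fraction of predicted genes that correspond to known genes, i.e. Sp = TP / (TP + FP), where TP counts correct predictions and FP incorrect ones. The sketch below assumes, purely for illustration, that genes are hypothetical `(start, end, strand)` tuples and that a prediction is correct only if it matches a known gene exactly; other matching criteria (for example, overlap-based ones) are also used in practice. The function name `specificity` is likewise hypothetical.

```python
from typing import Iterable

# Hypothetical gene representation: (start, end, strand), e.g. (100, 400, "+").
Gene = tuple[int, int, str]


def specificity(predicted: Iterable[Gene], known: Iterable[Gene]) -> float:
    """Fraction of predicted genes that correspond to a known gene.

    Assumes exact-coordinate matching: a prediction is a true positive only if
    its start, end and strand all equal those of some known gene.
    """
    predicted_set = set(predicted)  # duplicate predictions are counted once
    if not predicted_set:
        raise ValueError("no predictions to evaluate")
    true_positives = len(predicted_set & set(known))
    # TP + FP is simply the number of distinct predictions.
    return true_positives / len(predicted_set)
```

With four predictions of which two exactly match known genes, the function returns 0.5. Note that specificity alone says nothing about how many known genes were missed.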
Acknowledgments
We are grateful to the CeBiTec at Bielefeld University, the BMBF Competence Network GenoMik-Plus (grant 0313805A), the International NRW Graduate School in Bioinformatics and Genome Research, the EU FP6 Network of Excellence Marine Genomics Europe (contract No. COGE-CT-2004-505403) and the Nestlé Research Center for financial support of our work. Special thanks go to our native speaker, Sita Lange; the chapter would not have been the same without her efforts. The authors would also like to thank Guy Cochrane, Naryttza Diaz, Michele Magrane, Nicky Mulder, Kai Runte and Rafael Szczepanowski, who read sections of the chapter and provided valuable comments. Many thanks to our present and former colleagues from the Junior Group Computational Genomics for their patience while we were writing this chapter.
Copyright information
© 2010 Springer Science+Business Media B.V.
About this chapter
Cite this chapter
Mittard-Runte, V. et al. (2010). Practical Guide: Genomic Techniques and How to Apply Them to Marine Questions. In: Cock, J., Tessmar-Raible, K., Boyen, C., Viard, F. (eds) Introduction to Marine Genomics. Advances in Marine Genomics, vol 1. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-8639-6_9
DOI: https://doi.org/10.1007/978-90-481-8639-6_9
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-8616-7
Online ISBN: 978-90-481-8639-6
eBook Packages: Biomedical and Life Sciences; Biomedical and Life Sciences (R0)