Practical Guide: Genomic Techniques and How to Apply Them to Marine Questions

Part of the book series: Advances in Marine Genomics (AMGE, volume 1)

Abstract

In recent years, modern high-throughput techniques in genome and post-genome research have made a marked impact on the marine sciences. Today, massively parallel DNA sequencing and hybridization approaches allow the identification of not only the gene repertoire but also the gene regulatory networks that function within an organism. The huge amounts of data acquired from such experiments can only be handled with intensive bioinformatics support that has to provide an adequate infrastructure for storing and analysing these data. Bioinformatics has to deliver efficient data analysis algorithms, user-friendly tools and software applications, as well as extensive hardware infrastructure to deal with these genome-scale analyses.

This chapter not only briefly introduces the most relevant bioinformatics topics for functional and structural genomics but also addresses the practical aspects of other steps of a genome project, such as sequencing and data management. It takes the reader through the different technical approaches that can be applied in marine genomics projects.

In the first part, we focus mainly on data generation, introducing classical genome sequencing approaches such as the Sanger method and the shotgun technique, followed by a short overview of the current status of next-generation sequencing techniques. In the second part, we briefly introduce the concept of data management for bioinformatics applications. In the third part, we describe the basic principles of genome sequence analysis and address topics such as EST clustering and genome assembly, gene prediction, gene function assignment and classification, and whole-genome annotation. In the fourth part, we present an overview of transcriptome data analysis using microarray hybridization technology. After a brief introduction to microarray technology, we describe state-of-the-art methods for image processing, data normalization, significance testing and cluster analysis.

Notes

  1. Usually frequencies of oligonucleotides of length between 3 and 12 bp are modelled.

  2. Specificity measures the reliability of the predictions. It is defined as the fraction of correct gene predictions, i.e. the fraction of predicted genes that corresponds to known genes.
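
The two notes above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the method used in the chapter: the function names `oligo_frequencies` and `specificity` are hypothetical, genes are represented by made-up identifier strings, and a prediction counts as correct when its identifier appears among the known genes. Real evaluations usually compare gene coordinates rather than names.

```python
from collections import Counter


def oligo_frequencies(seq, k):
    """Relative frequencies of all overlapping oligonucleotides of length k (note 1)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {oligo: n / total for oligo, n in counts.items()}


def specificity(predicted, known):
    """Fraction of predicted genes that correspond to known genes (note 2)."""
    predicted = set(predicted)
    if not predicted:
        raise ValueError("no predictions to evaluate")
    return len(predicted & set(known)) / len(predicted)


# Toy sequence and hypothetical gene identifiers, for illustration only.
freqs = oligo_frequencies("ATGATGA", 3)
predicted_genes = ["geneA", "geneB", "geneC", "geneD"]
known_genes = ["geneA", "geneB", "geneD", "geneE"]

print(freqs)
print(specificity(predicted_genes, known_genes))
```

In this toy case, three of the four predictions match a known gene, so the specificity is 0.75. Following note 1, `k` would typically be chosen between 3 and 12.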


    CAS  PubMed  Google Scholar 

  • Shendure JA, Porreca GJ, Church GM (2008) Overview of DNA sequencing strategies. Curr Protoc Mol Biol Chapter 7: Unit 7:1

    PubMed  Google Scholar 

  • Skovgaard M, Jensen LJ, Brunak S et al (2001) On the total number of genes and their length distribution in complete microbial genomes. Trends Genet 17(8):425–428

    CAS  PubMed  Google Scholar 

  • Slater GS, Birney E (2005) Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6:31

    PubMed  Google Scholar 

  • Smith MW, Feng DF, Doolittle RF (1992) Evolution by acquisition: the case for horizontal gene transfers. Trends Biochem Sci 17(12):489–493

    CAS  PubMed  Google Scholar 

  • Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147(1):195–197

    CAS  PubMed  Google Scholar 

  • Sonnhammer EL, von Heijne G, Krogh A (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. Proc Int Conf Intell Syst Mol Biol 6:175–182

    CAS  PubMed  Google Scholar 

  • Spellman PT, Miller M, Stewart J et al (2002) Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol 3(9): RESEARCH0046.

    Google Scholar 

  • Stanke M, Waack S (2003) Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19(Suppl 2):ii215–ii225

    PubMed  Google Scholar 

  • Stanke M, Tzvetkova A, Morgenstern B (2006) AUGUSTUS at EGASP: using EST, protein and genomic alignments for improved gene prediction in the human genome. Genome Biol 7(Suppl 1):S11–S18

    PubMed  Google Scholar 

  • Sturn A, Quackenbush J, Trajanoski Z (2002) Genesis: cluster analysis of microarray data. Bioinformatics 18(1):207–208

    CAS  PubMed  Google Scholar 

  • Sugawara H, Ogasawara O, Okubo K et al (2008) DDBJ with new system and face. Nucleic Acids Res 36:D22–D24

    CAS  PubMed  Google Scholar 

  • Suzek BE, Ermolaeva MD, Schreiber M et al (2001) A probabilistic method for identifying start codons in bacterial genomes. Bioinformatics 17(12):1123–1130

    CAS  PubMed  Google Scholar 

  • Tamames J, Casari G, Ouzounis C et al (1997) Conserved clusters of functionally related genes in two bacterial genomes. Mol Evol 44:66–73

    CAS  Google Scholar 

  • Tatsuov RL, Mushegian AR, Bork P et al (1996) Metabolism and evolution of Haemophilus influenza deduced from a whole-genome comparison with Escherichia coli. Curr Biol 6(3):279–291

    Google Scholar 

  • Tatusov RL, Fedorova ND, Jackson JD et al (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4:41

    PubMed  Google Scholar 

  • Team RDC (2008) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

    Google Scholar 

  • Tech M, Meinicke P (2006) An unsupervised classification scheme for improving predictions of prokaryotic TIS. BMC Bioinformatics 7:121

    PubMed  Google Scholar 

  • Thieme F, Koebnik R, Bekel T et al (2005) Insights into genome plasticity and pathogenicity of the plant pathogenic bacterium Xanthomonas campestris pv. vesicatoria revealed by the complete genome sequence. J Bacteriol 187(21):7254–7266

    CAS  PubMed  Google Scholar 

  • Tusher VG, Tibshirani R, Chu G (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 98(9):5116–5121

    CAS  PubMed  Google Scholar 

  • Usuka J, Zhu W, Brendel V (2000) Optimal spliced alignment of homologous cDNA to a genomic DNA template. Bioinformatics 16(3):203–211

    CAS  PubMed  Google Scholar 

  • van Baren MJ, Brent MR (2006) Iterative gene prediction and pseudogene removal improves genome annotation. Genome Res 16(5):678–685

    PubMed  Google Scholar 

  • Vapnik VN (1999) The nature of statistical learning theory. Springer, New York.

    Google Scholar 

  • Velculescu VE, Zhang L, Vogelstein B et al (1995) Serial analysis of gene expression. Science 270(5235):484–487

    CAS  PubMed  Google Scholar 

  • von Mering C, Jensen LJ, Snel B et al (2005) STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res 33:433–437

    Google Scholar 

  • Vorhölter FJ, Schneiker S, Goesmann A et al (2008) The genome of Xanthomonas campestris pv. campestris B100 and its use for the reconstruction of metabolic pathways involved in xanthan biosynthesis. J Biotechnol 134(1–2):33–45

    PubMed  Google Scholar 

  • Wei C, Brent MR (2006) Using ESTs to improve the accuracy of de novo gene prediction. BMC Bioinformatics 7:327

    PubMed  Google Scholar 

  • Wilkinson MD, Links M (2002) BioMOBY: an open source biological web services proposal. Brief Bioinform 3(4):331–341

    PubMed  Google Scholar 

  • Wu J, Mao X, Cai T et al (2006) KOBAS server: a web-based platform for automated annotation and pathway identification. Nucleic Acids Res 34:W720–W724

    CAS  PubMed  Google Scholar 

  • Wu TD, Watanabe CK (2005) GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21(9):1859–1875

    CAS  PubMed  Google Scholar 

  • Wu W, Xing EP, Myers C et al (2005) Evaluation of normalization methods for cDNA microarray data by k-NN classification. BMC Bioinformatics 6:191

    PubMed  Google Scholar 

  • Yang YH, Speed T (2002) Design issues for cDNA microarray experiments. Nat Rev Genet 3(8):579–588

    CAS  PubMed  Google Scholar 

  • Yauk C, Berndt L, Williams A et al (2005) Automation of cDNA microarray hybridization and washing yields improved data quality. J Biochem Biophys Methods 64(1):69–75

    CAS  PubMed  Google Scholar 

  • Yauk CL, Berndt ML, Williams A et al (2004) Comprehensive comparison of six microarray technologies. Nucleic Acids Res 32(15):e124

    PubMed  Google Scholar 

  • Zhang MQ (2002) Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet 3(9):698–709

    CAS  PubMed  Google Scholar 

  • Zhang Z, Schwartz S, Wagner L et al (2000) A greedy algorithm for aligning DNA sequences. J Comput Biol 7(1–2):203–214

    CAS  PubMed  Google Scholar 

Download references

Acknowledgments

We are grateful to the CeBiTec at Bielefeld University, the BMBF Competence Network GenoMik-Plus (grant 0313805A), the International NRW Graduate School in Bioinformatics and Genome Research, the EU FP6 Network of Excellence Marine Genomics Europe (contract No. COGE-CT-2004-505403) and Nestlé Research Center for financial support of our work. Special thanks go to our native speaker Sita Lange; the chapter would not have been the same without her efforts. The authors would also like to thank Guy Cochrane, Naryttza Diaz, Michele Magrane, Nicky Mulder, Kai Runte and Rafael Szczepanowski, who read sections of the chapter and provided valuable comments. Many thanks also go to our present and former colleagues from the Junior Group Computational Genomics for their patience during the writing.

Author information

Corresponding author

Correspondence to Virginie Mittard-Runte.

Copyright information

© 2010 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Mittard-Runte, V. et al. (2010). Practical Guide: Genomic Techniques and How to Apply Them to Marine Questions. In: Cock, J., Tessmar-Raible, K., Boyen, C., Viard, F. (eds) Introduction to Marine Genomics. Advances in Marine Genomics, vol 1. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-8639-6_9
