Genome Annotation

  • Protocol
Bioinformatics

Part of the book series: Methods in Molecular Biology (MIMB, volume 1525)

Abstract

The dynamic structure and functions of genomes are being revealed in step with advances in genomic technologies and research. Evidence describing the characteristics of genomic regions (genome annotations in a broad sense) provides the basis for further analyses. Using such data, target listing and screening can be performed effectively in silico. Here, we describe the steps to obtain publicly available genome annotations or to construct new annotations from your own analyses, and we give an overview of the types of genome annotations available and the corresponding resources.

Acknowledgments

This work was supported by a Research Grant for the RIKEN Genome Exploration Research Project provided by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), a grant to the Genome Network Project from MEXT, a Research Grant from MEXT to the RIKEN Center for Life Science Technologies, and a Research Grant to the RIKEN Preventive Medicine and Diagnosis Innovation Program from MEXT to Yoshihide Hayashizaki.

Author information

Corresponding author

Correspondence to Hideya Kawaji.

Copyright information

© 2017 Springer Science+Business Media New York

About this protocol

Cite this protocol

Abugessaisa, I., Kasukawa, T., Kawaji, H. (2017). Genome Annotation. In: Keith, J. (eds) Bioinformatics. Methods in Molecular Biology, vol 1525. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6622-6_5

  • DOI: https://doi.org/10.1007/978-1-4939-6622-6_5

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6620-2

  • Online ISBN: 978-1-4939-6622-6

  • eBook Packages: Springer Protocols
