Publicly available sequence annotation data is a vital resource for researchers. Many types of information are available, including structural annotations (i.e., the locations and identities of genomic features) and functional annotations (e.g., gene expression and protein interactions). Annotation data is especially useful for interrogating next-generation sequencing (NGS) data, for example to identify genomic features associated with mapped reads. In addition, the vast amount of available data gives researchers the opportunity to mine existing data sets and make new discoveries. The ability to efficiently obtain, manipulate, and interrogate this data is therefore a valuable and empowering skill. In this chapter, we introduce several primary data repositories and describe the most commonly encountered file formats. To highlight some of the key concepts, operations, and utilities involved in working with annotation data, we provide a fully worked example that uses annotations to answer some basic questions about a particular ChIP-seq data set.
Keywords: Sequence annotation, Bioinformatics, BED format, UCSC genome browser, Genomic interval operations
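The sketch below illustrates the kind of genomic interval operation mentioned above: finding which annotated features are overlapped by mapped reads. It assumes tab-delimited BED input with 0-based, half-open coordinates and reads only the first four BED columns. The helper names (`parse_bed`, `overlaps`, `features_hit_by_reads`) and the toy records are hypothetical illustrations. They are not the chapter's worked example. The pairwise comparison is quadratic, so analyses at real scale would rely on a dedicated tool such as BEDTools or on a sorted or indexed approach.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    name: str = "."


def parse_bed(lines):
    """Hypothetical helper: parse BED columns 1-4, skipping blank, comment, track and browser lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[3] if len(fields) > 3 else "."
        intervals.append(Interval(fields[0], int(fields[1]), int(fields[2]), name))
    return intervals


def overlaps(a, b):
    # Half-open intervals overlap iff they are on the same chromosome
    # and each one starts before the other one ends.
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def features_hit_by_reads(features, reads):
    """Naive O(n*m) intersection: names of features overlapped by at least one read."""
    return [f.name for f in features if any(overlaps(f, r) for r in reads)]


# Toy data (invented for illustration only).
features = parse_bed([
    "chr1\t100\t200\texon1",
    "chr1\t300\t400\texon2",
    "chr2\t100\t200\texon3",
])
reads = parse_bed([
    "chr1\t150\t180\tread1",
    "chr1\t400\t450\tread2",  # starts exactly at exon2's end: no overlap (half-open)
    "chr2\t50\t101\tread3",
])
print(features_hit_by_reads(features, reads))  # ['exon1', 'exon3']
```

The half-open convention is the main design point here. A read that begins at a feature's `end` coordinate does not overlap it, which is why `read2` misses `exon2`.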