Abstract
Cap analysis of gene expression (CAGE) is an approach to identify transcription start sites (TSSs) at single base-pair resolution across the genome and to monitor their activity (transcription initiation frequency). It has been used effectively to identify active promoters and enhancers in cancer cells and may help identify key factors in immunotherapy. Here, we review a series of CAGE protocols and describe in detail the latest protocol, which is based on the Illumina sequencing platform. We cover both the experimental steps (see Subheadings 3.1–3.11) and the computational processing steps (see Subheadings 3.12–3.20).
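To make "transcription initiation frequency at single base-pair resolution" concrete, the following is a minimal Python sketch of how mapped CAGE reads can be collapsed into per-position counts of read 5′ ends (often called CAGE-defined TSSs, or CTSSs) and scaled to tags per million (TPM). It is not the chapter's pipeline. It assumes alignments are already available as hypothetical `(chrom, start, end, strand)` tuples in BED-style 0-based, half-open coordinates. The function names are illustrative. It also ignores steps such as mapping-quality filtering and handling of the extra 5′ G that reverse transcriptase often adds opposite the cap.

```python
from collections import Counter


def five_prime_position(chrom, start, end, strand):
    """Return (chrom, pos, strand) for the 5' end of one aligned read.

    Assumption: BED-style 0-based, half-open coordinates. On the + strand
    the 5' end is `start`; on the - strand it is the last base, `end - 1`.
    """
    if strand == "+":
        return (chrom, start, strand)
    if strand == "-":
        return (chrom, end - 1, strand)
    raise ValueError(f"unknown strand: {strand!r}")


def count_ctss(alignments):
    """Count reads per strand-specific 5' end position (hypothetical helper)."""
    return Counter(five_prime_position(*aln) for aln in alignments)


def to_tpm(counts):
    """Scale raw counts to tags per million of the library total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pos: n * 1_000_000 / total for pos, n in counts.items()}


if __name__ == "__main__":
    # Toy alignments: two + strand reads share a 5' end at chr1:100.
    reads = [
        ("chr1", 100, 127, "+"),
        ("chr1", 100, 130, "+"),
        ("chr1", 200, 227, "-"),
        ("chr1", 173, 200, "-"),
    ]
    counts = count_ctss(reads)
    print(counts)
    print(to_tpm(counts))
```

Counting only the 5′ end, rather than the whole read, is what gives single-nucleotide TSS resolution. Normalizing to TPM makes libraries of different sequencing depth comparable.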
© 2020 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Morioka, M.S. et al. (2020). Cap Analysis of Gene Expression (CAGE): A Quantitative and Genome-Wide Assay of Transcription Start Sites. In: Boegel, S. (ed.) Bioinformatics for Cancer Immunotherapy. Methods in Molecular Biology, vol 2120. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0327-7_20
DOI: https://doi.org/10.1007/978-1-0716-0327-7_20
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0326-0
Online ISBN: 978-1-0716-0327-7
eBook Packages: Springer Protocols