Cap Analysis of Gene Expression (CAGE): A Quantitative and Genome-Wide Assay of Transcription Start Sites

  • Masaki Suimye Morioka
  • Hideya Kawaji
  • Hiromi Nishiyori-Sueki
  • Mitsuyoshi Murata
  • Miki Kojima-Ishiyama
  • Piero Carninci
  • Masayoshi Itoh
Part of the Methods in Molecular Biology book series (MIMB, volume 2120)


Cap analysis of gene expression (CAGE) is an approach to identify transcription start sites (TSSs) and monitor their activity (transcription initiation frequency) at single base-pair resolution across the genome. It has been used effectively to identify active promoter and enhancer regions in cancer cells, and it may help identify key factors for immunotherapy. Here, we give an overview of a series of CAGE protocols and describe in detail the latest protocol, which is based on the Illumina sequencing platform; both the experimental steps (see Subheadings 3.1–3.11) and the computational processing steps (see Subheadings 3.12–3.20) are described.

Key words

CAGE; TSS; Transcription start site; Transcription initiation; Promoter-level expression analysis; Enhancer; eRNA; Promoter activity



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  • Masaki Suimye Morioka (1)
  • Hideya Kawaji (1, 2, 3)
  • Hiromi Nishiyori-Sueki (4)
  • Mitsuyoshi Murata (4)
  • Miki Kojima-Ishiyama (4)
  • Piero Carninci (4)
  • Masayoshi Itoh (2), corresponding author

  1. Preventive Medicine and Applied Genomics Unit, RIKEN Center for Integrative Medical Sciences (IMS), Yokohama, Japan
  2. RIKEN Preventive Medicine and Diagnosis Innovation Program (PMI), Yokohama, Japan
  3. Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
  4. Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences (IMS), Yokohama, Japan
