
Cap Analysis of Gene Expression (CAGE): A Quantitative and Genome-Wide Assay of Transcription Start Sites

  • Masaki Suimye Morioka
  • Hideya Kawaji
  • Hiromi Nishiyori-Sueki
  • Mitsuyoshi Murata
  • Miki Kojima-Ishiyama
  • Piero Carninci
  • Masayoshi Itoh (corresponding author)
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 2120)

Abstract

Cap analysis of gene expression (CAGE) is an approach to identify and monitor the activity (transcription initiation frequency) of transcription start sites (TSSs) at single base-pair resolution across the genome. It has been used effectively to identify active promoter and enhancer regions in cancer cells, with potential utility in identifying key factors for immunotherapy. Here, we give an overview of a series of CAGE protocols and describe in detail the latest protocol, which is based on the Illumina sequencing platform; both the experimental steps (see Subheadings 3.1–3.11) and the computational processing steps (see Subheadings 3.12–3.20) are described.
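The core idea behind the computational steps is that the 5′ end of each mapped CAGE read marks a TSS at single-base resolution, so counting read 5′ ends per genomic position and strand gives a measure of transcription initiation frequency. The sketch below illustrates only that idea and is not the chapter's pipeline. It assumes reads have already been aligned and converted to BED-style records (0-based, half-open coordinates); the `Alignment` record and the function names are hypothetical. It also ignores refinements such as handling the extra non-templated G often found at the 5′ end of CAGE reads, and it uses simple counts-per-million scaling as the normalization.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple

Site = Tuple[str, int, str]  # (chromosome, 0-based position, strand)


class Alignment(NamedTuple):
    """Hypothetical mapped-read record; BED-style 0-based, half-open [start, end)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def tss_position(aln: Alignment) -> int:
    """Return the 0-based coordinate of the read's 5' end (the putative TSS)."""
    # Plus-strand reads start at their leftmost base; minus-strand reads
    # start at their rightmost base, which is end - 1 in half-open coordinates.
    return aln.start if aln.strand == "+" else aln.end - 1


def count_ctss(alignments: Iterable[Alignment]) -> Counter:
    """Count read 5' ends per (chrom, position, strand)."""
    return Counter((a.chrom, tss_position(a), a.strand) for a in alignments)


def to_cpm(counts: Counter) -> Dict[Site, float]:
    """Scale raw 5'-end counts to counts per million counted reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {site: n * 1_000_000 / total for site, n in counts.items()}


if __name__ == "__main__":
    reads = [
        Alignment("chr1", 100, 127, "+"),
        Alignment("chr1", 100, 125, "+"),  # same 5' end, different length
        Alignment("chr1", 300, 327, "-"),  # 5' end is at position 326
    ]
    counts = count_ctss(reads)
    print(counts)
    print(to_cpm(counts))
```

Reads of different lengths that share a 5′ end are counted at the same site, which is why CAGE quantifies initiation rather than transcript coverage. When parsing SAM/BAM output directly instead of BED, note that SAM positions are 1-based and must be converted first.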

Key words

CAGE, TSS, Transcription start site, Transcription initiation, Promoter-level expression analysis, Enhancer, eRNA, Promoter activity


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  • Masaki Suimye Morioka (1)
  • Hideya Kawaji (1, 2, 3)
  • Hiromi Nishiyori-Sueki (4)
  • Mitsuyoshi Murata (4)
  • Miki Kojima-Ishiyama (4)
  • Piero Carninci (4)
  • Masayoshi Itoh (2; corresponding author)
  1. Preventive Medicine and Applied Genomics Unit, RIKEN Center for Integrative Medical Sciences (IMS), Yokohama, Japan
  2. RIKEN Preventive Medicine and Diagnosis Innovation Program (PMI), Yokohama, Japan
  3. Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
  4. Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences (IMS), Yokohama, Japan
