Improving mRNA 5′ coding sequence determination in the mouse genome

Mammalian Genome

Abstract

Incomplete determination of the mRNA 5′ end sequence may lead to incorrect assignment of the first AUG codon and to errors in predicting the encoded protein product. Given the importance of the mouse as a model organism in biomedical research, we performed a systematic identification of coding regions at the 5′ end of all known mouse mRNAs, using an automated expressed sequence tag (EST)-based approach that we have previously described. By parsing almost 4 million BLAT alignments, we found 351 mouse loci, out of 20,221 analyzed, in which an extension of the mRNA 5′ coding region was identified. Proof-of-concept confirmation was obtained by in vitro cloning and sequencing of the Apc2 and Mknk2 cDNAs. We also generated a list of 16,330 mouse mRNAs in which the presence of an in-frame stop codon upstream of the known start codon indicates that the coding sequence is complete at the 5′ end in its current form. Systematic searches of the main mouse genome databases and genome browsers showed that 82% of our results are original and had not been identified by the corresponding annotation pipelines. Moreover, the same information is not easily derived from RNA-Seq data, because of the short sequence length and the labor involved in building full-length transcript structures. In conclusion, our results improve the determination of full-length 5′ coding sequences and may help reduce errors when studying mouse gene structure and function in biomedical research.
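
The completeness criterion mentioned in the abstract (an in-frame stop codon upstream of the annotated start codon) can be illustrated with a minimal Python sketch. This is not the authors' 5′_ORF_Extender software, and it does not reproduce the EST and BLAT steps. The function names, the 0-based coordinate convention, and the example sequences are all assumptions made for illustration.

```python
from typing import Optional

# Stop codons of the standard genetic code, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def normalize(seq: str) -> str:
    """Upper-case the sequence and convert RNA (U) to DNA (T) letters."""
    return seq.upper().replace("U", "T")


def upstream_in_frame_stop(mrna: str, cds_start: int) -> Optional[int]:
    """Return the 0-based index of the nearest stop codon that lies upstream
    of cds_start and in the same reading frame, or None if there is none.

    cds_start is the (hypothetical, 0-based) index of the A of the
    annotated start codon.
    """
    seq = normalize(mrna)
    pos = cds_start - 3
    # Step backwards one codon at a time so that only in-frame triplets
    # are inspected; out-of-frame stop codons are ignored.
    while pos >= 0:
        if seq[pos:pos + 3] in STOP_CODONS:
            return pos
        pos -= 3
    return None


def classify_five_prime(mrna: str, cds_start: int) -> str:
    """Label an mRNA as 'complete' when an upstream in-frame stop exists,
    otherwise 'possibly_incomplete' (the frame stays open to the 5' end,
    so further upstream sequence could still extend the coding region)."""
    seq = normalize(mrna)
    if seq[cds_start:cds_start + 3] != "ATG":
        raise ValueError("cds_start does not point at an ATG start codon")
    if upstream_in_frame_stop(seq, cds_start) is not None:
        return "complete"
    return "possibly_incomplete"
```

For example, calling classify_five_prime("CCTGAGCCATGGCT", 8) labels the sequence as complete, because the in-frame TGA at index 2 closes the reading frame upstream of the ATG. A sequence whose frame stays open to its 5′ end is only flagged as possibly incomplete: under this sketch, confirming an extension would still require additional upstream sequence, such as the EST evidence the abstract describes.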



Acknowledgments

This work was funded by “RFO” Grants from Alma Mater Studiorum—University of Bologna to P.S. and L.V. M.C.’s fellowship is supported by a generous donation made by Illumia, Bologna, Italy. The 5′_ORF_Extender software was executed on the Apple Mac Pro “Multiprocessor Server” available at the Center for Research in Molecular Genetics “Fondazione CARISBO”, Bologna, and funded by “Fondazione CARISBO”. We are grateful to Gabriella Mattei and Michela Bonaguro for their excellent technical assistance with cDNA sequencing. We are grateful to Danielle Mitzman for her expert revision of the manuscript.

Author information

Corresponding author

Correspondence to Pierluigi Strippoli.

Additional information

Nucleotide sequence data reported are available in the DDBJ/EMBL/GenBank databases under the accession numbers KF481611 and KF612275.

Allison Piovesan and Maria Caracausi contributed equally to the work.

About this article

Cite this article

Piovesan, A., Caracausi, M., Pelleri, M.C. et al. Improving mRNA 5′ coding sequence determination in the mouse genome. Mamm Genome 25, 149–159 (2014). https://doi.org/10.1007/s00335-013-9498-3

  • DOI: https://doi.org/10.1007/s00335-013-9498-3
