Abstract
Annotation of protein-coding genes in sequenced genomes is routinely carried out using gene prediction programs guided by available transcript data. The advent of mass spectrometry has enabled high-throughput identification of proteins. In addition to being searched against proteins annotated in public databases, mass spectrometry data can be searched against a conceptually translated genome or transcriptome to identify novel protein-coding regions. This proteogenomic approach has identified novel protein-coding regions in both prokaryotic and eukaryotic genomes. Such studies have also revealed that some annotated noncoding RNAs and pseudogenes code for proteins. The approach is likely to become part of most genome annotation workflows in the future. Here we describe a general methodology that can be used for proteogenomics.
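To make the idea of a "conceptually translated genome" concrete, the following is a minimal sketch of six-frame translation in Python. It is an illustration only, not the chapter's pipeline: it assumes the standard genetic code (NCBI translation table 1), and the function names `six_frame_translate` and `candidate_orfs` and the `min_len` cutoff are hypothetical choices made for this example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), assumed here for illustration.
# Codons are enumerated in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(dna):
    """Translate a DNA sequence in three forward and three reverse frames.

    Keys are '+1', '+2', '+3' for the given strand and '-1', '-2', '-3'
    for its reverse complement. Stop codons appear as '*'.
    """
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def candidate_orfs(protein, min_len=2):
    """Split a translated frame at stop codons and keep stop-to-stop
    segments of at least min_len residues (hypothetical cutoff)."""
    return [segment for segment in protein.split("*") if len(segment) >= min_len]


if __name__ == "__main__":
    for frame, protein in six_frame_translate("ATGGCCTAA").items():
        print(frame, protein)
```

In such a sketch, the stop-to-stop segments from all six frames would be written out as a protein sequence database for the search engine. Real workflows typically use established tools for this step and record the genomic coordinates of each segment so that identified peptides can be mapped back to the genome.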
Acknowledgements
We thank the Department of Biotechnology (DBT), Government of India, for research support to the Institute of Bioinformatics. Keshava K. Datta is a recipient of a Research Fellowship from the University Grants Commission (UGC), Government of India. Anil K. Madugundu is a recipient of a BINC-Research Fellowship from DBT.
© 2016 Springer Science+Business Media New York
About this protocol
Cite this protocol
Datta, K.K., Madugundu, A.K., Gowda, H. (2016). Proteogenomic Methods to Improve Genome Annotation. In: Sechi, S. (eds) Quantitative Proteomics by Mass Spectrometry. Methods in Molecular Biology, vol 1410. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3524-6_5
DOI: https://doi.org/10.1007/978-1-4939-3524-6_5
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3522-2
Online ISBN: 978-1-4939-3524-6
eBook Packages: Springer Protocols