Proteogenomic Methods to Improve Genome Annotation

Datta, Keshava K.; Madugundu, Anil K.; Gowda, Harsha

doi:10.1007/978-1-4939-3524-6_5

Part of the book series: Methods in Molecular Biology (MIMB, volume 1410)

Abstract

Protein-coding genes in sequenced genomes are routinely annotated using gene prediction programs guided by available transcript data. The advent of mass spectrometry has enabled high-throughput identification of proteins. In addition to being searched against proteins annotated in public databases, mass spectrometry data can be searched against conceptually translated genome and transcriptome sequences to identify novel protein-coding regions. This proteogenomic approach has identified novel protein-coding regions in both prokaryotic and eukaryotic genomes. These studies have also revealed that some annotated noncoding RNAs and pseudogenes code for proteins. The approach is likely to become part of most genome annotation workflows in the future. Here we describe a general methodology that can be used for proteogenomics.
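
The conceptual translation step named in the abstract can be illustrated with a minimal sketch. It is not the chapter's own protocol: it assumes the standard genetic code, a plain DNA string as input, and hypothetical function names (`six_frame_translate`, `stop_to_stop_segments`) chosen only for this example. It translates both strands in all three reading frames and splits each translation at stop codons, giving stop-to-stop segments of the kind that could be written to a search database.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(genome):
    """Translate a DNA sequence in three frames on each strand.

    Keys are '+1', '+2', '+3' for the forward strand and
    '-1', '-2', '-3' for the reverse-complement strand.
    """
    genome = genome.upper()
    frames = {}
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def stop_to_stop_segments(protein, min_length=7):
    """Split a translation at stop codons ('*') and keep longer segments.

    The min_length default is an arbitrary illustrative cutoff,
    not a value taken from the chapter.
    """
    return [seg for seg in protein.split("*") if len(seg) >= min_length]


if __name__ == "__main__":
    for frame, protein in six_frame_translate("ATGGCCTAAGGC").items():
        print(frame, protein)
```

In practice, each retained segment would be stored with its frame and genomic coordinates so that identified peptides can be mapped back to the genome; that bookkeeping is omitted here for brevity.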

Acknowledgements

We thank the Department of Biotechnology (DBT), Government of India, for research support to the Institute of Bioinformatics. Keshava K. Datta is a recipient of a Research Fellowship from the University Grants Commission (UGC), Government of India. Anil K. Madugundu is a recipient of a BINC-Research Fellowship from DBT.

Author information

Authors and Affiliations

Institute of Bioinformatics, International Technology Park, Bangalore, 560066, India
Keshava K. Datta, Anil K. Madugundu & Harsha Gowda
School of Biotechnology, KIIT University, Bhubaneswar, 751024, Odisha, India
Keshava K. Datta & Harsha Gowda
Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Puducherry, 605014, India
Anil K. Madugundu

Corresponding author

Correspondence to Harsha Gowda.

Editor information

Editors and Affiliations

NIDDK, National Institutes of Health, Bethesda, Maryland, USA
Salvatore Sechi

About this protocol

Cite this protocol

Datta, K.K., Madugundu, A.K., Gowda, H. (2016). Proteogenomic Methods to Improve Genome Annotation. In: Sechi, S. (eds) Quantitative Proteomics by Mass Spectrometry. Methods in Molecular Biology, vol 1410. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3524-6_5

DOI: https://doi.org/10.1007/978-1-4939-3524-6_5
Published: 12 February 2016
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3522-2
Online ISBN: 978-1-4939-3524-6
eBook Packages: Springer Protocols
