Abstract
The integration of genomics and proteomics has given rise to proteogenomics, a field that has been applied successfully to the characterization of cancer samples. Identifying the mutations present in a patient's genome can improve cancer diagnosis, prognosis and prediction of response to therapy. High-throughput methods for genome-wide detection of somatic mutations have enabled projects such as The Cancer Genome Atlas (TCGA), in which hundreds of cancer genomes have been sequenced. These data can be used to generate protein sequence databases in which each entry corresponds to a mutated peptide associated with particular cancer types. In this chapter, we describe a bioinformatics workflow for creating these databases and for detecting mutated peptides in cancer samples from shotgun proteomics experiments. The performance of the proposed method was evaluated using publicly available datasets from four cancer cell lines.
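The idea of a database whose entries are mutated peptides can be illustrated with a minimal Python sketch. It is not the chapter's workflow, which is not reproduced in this abstract. It assumes trypsin digestion (cleavage after K or R, except before P), no missed cleavages, and a single amino acid substitution given as a 1-based protein position. The function names, the FASTA header layout and the example sequence are hypothetical and chosen only for illustration.

```python
import re


def tryptic_peptides(sequence):
    """Return (start_index, peptide) pairs for an in silico tryptic digest.

    Assumed rule: cleave after K or R unless the next residue is P,
    with no missed cleavages.
    """
    peptides = []
    start = 0
    # Zero-width matches mark the positions right after a cleavable K/R.
    for match in re.finditer(r"(?<=[KR])(?!P)", sequence):
        end = match.start()
        if end > start:
            peptides.append((start, sequence[start:end]))
            start = end
    if start < len(sequence):
        peptides.append((start, sequence[start:]))
    return peptides


def mutated_peptide(sequence, position, ref, alt):
    """Apply a substitution (1-based position) and return the tryptic
    peptide of the mutated protein that contains the changed residue."""
    index = position - 1
    if sequence[index] != ref:
        raise ValueError(
            f"expected {ref} at position {position}, found {sequence[index]}"
        )
    mutated = sequence[:index] + alt + sequence[index + 1:]
    # Digest the mutated protein, because a substitution can create or
    # destroy a cleavage site.
    for start, peptide in tryptic_peptides(mutated):
        if start <= index < start + len(peptide):
            return peptide
    raise RuntimeError("no peptide covers the mutated position")


def fasta_entry(protein_id, position, ref, alt, peptide):
    """Format one database entry; the header layout is an assumption."""
    return f">{protein_id}_{ref}{position}{alt}\n{peptide}\n"


if __name__ == "__main__":
    protein = "MKTAYIAKQRQISFVK"  # made-up example sequence
    peptide = mutated_peptide(protein, 5, "Y", "C")
    print(fasta_entry("EXAMPLE1", 5, "Y", "C", peptide), end="")
```

Digesting the mutated sequence rather than the reference matters when the substitution involves K, R or P, since the peptide boundaries themselves can change. A real pipeline would also need to handle missed cleavages, peptide length limits and peptides that are identical to reference sequences.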
Acknowledgments
All participating laboratories are members of ProteoRed-ISCIII. This work was supported by the Carlos III Health Institute of Spain (ISCIII, FIS PI11/02114 and FIS PI14/01538)-Fondos FEDER (EU) and by grant SAF2014-5478-R from the Ministerio de Economía y Competitividad. The CIMA Proteomics Unit belongs to ProteoRed, PRB2-ISCIII, supported by grant PT13/0001. We also thank the Proteomics, Genomics and Bioinformatics Core Facility of CIMA, especially Elizabeth Guruceaga, María Mora and Leticia Odriozola, for technical support.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Garin-Muga, A., Corrales, F.J., Segura, V. (2016). Proteogenomic Analysis of Single Amino Acid Polymorphisms in Cancer Research. In: Végvári, Á. (eds) Proteogenomics. Advances in Experimental Medicine and Biology, vol 926. Springer, Cham. https://doi.org/10.1007/978-3-319-42316-6_7
DOI: https://doi.org/10.1007/978-3-319-42316-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42314-2
Online ISBN: 978-3-319-42316-6
eBook Packages: Biomedical and Life Sciences (R0)