Abstract
Proteogenomics is a multi-omics research field that aims to integrate genomics, transcriptomics and proteomics efficiently. This approach makes it possible to identify new patient-specific proteoforms that may play a role in disease development, particularly in cancer. To assess effects at the proteome level, the impact of the large number of mutations detected at the genomic level must be understood. Compared to mainstream proteomics approaches, proteogenomics data integration helps identify molecular changes that persist across multiple molecular layers and enables better interpretation of molecular mechanisms of disease, such as the causal relationship between single nucleotide polymorphisms (SNPs) and the expression of transcripts and translation of proteins. Identifying patient-specific protein forms and gaining a clearer picture of the molecular mechanisms of disease opens an avenue toward precision and personalized medicine. Proteogenomics is, however, a challenging interdisciplinary science that requires an understanding of sample preparation, data acquisition and data processing for genomics, transcriptomics and proteomics. This chapter guides the reader through the technology and bioinformatics aspects of these multi-omics approaches, illustrated with proteogenomics applications of clinical or biological relevance.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Barbieri, R., Guryev, V., Brandsma, CA., Suits, F., Bischoff, R., Horvatovich, P. (2016). Proteogenomics: Key Driver for Clinical Discovery and Personalized Medicine. In: Végvári, Á. (ed.) Proteogenomics. Advances in Experimental Medicine and Biology, vol 926. Springer, Cham. https://doi.org/10.1007/978-3-319-42316-6_3
DOI: https://doi.org/10.1007/978-3-319-42316-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42314-2
Online ISBN: 978-3-319-42316-6
eBook Packages: Biomedical and Life Sciences (R0)