Integrative Analysis of Incongruous Cancer Genomics and Proteomics Datasets

Part of the Methods in Molecular Biology book series (MIMB, volume 2361)

Abstract

Cancer is a complex disease characterized by molecular heterogeneity and by the involvement of several cellular mechanisms throughout its evolution and pathogenesis. Despite great efforts to untangle these mechanisms, cancer pathophysiology remains far from clear. Panels of biomarkers have so far been reported from high-throughput data generated on different platforms. These biomarkers focus primarily on a single type of coding molecule, such as transcripts or proteins, mainly because the techniques specific to each molecular type produce heterogeneous output data. Hence, there is a major need to understand how these molecules interact and complement each other in order to explain the deregulated processes involved. The breadth of available large-scale data, together with the lack of in-depth analysis of publicly available data, has raised concerns and created opportunities for new strategies that analyze "Big data" more comprehensively. Here, a new protocol for performing integrative analysis based on a systems biology approach is described. The approach starts from groups of datasets from published studies, each compared within its originally described groups and organized in a designated format that allows integration and cross-comparison among different studies and platforms. This unbiased, hypothesis-free methodology will facilitate the identification of commonalities among different dataset sources and, ultimately, the mapping and characterization of specific molecular pathways using significantly deregulated molecules. This, in turn, will generate novel insights into the mechanisms deregulated in complex diseases such as cancer.
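The integration step outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the chapter's protocol: the long-format `Record` layout, the significance thresholds (Benjamini-Hochberg adjusted p < 0.05 and |log2FC| >= 1), the helper names, and the one-sided hypergeometric over-representation test are all hypothetical choices made for the example. It shows each study being corrected only within its own comparison groups, molecules deregulated in the same direction being counted across studies and platforms, and a pathway being tested for over-representation among those molecules.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import comb


@dataclass(frozen=True)
class Record:
    """One molecule from one study's own case-vs-control comparison (hypothetical layout)."""
    study: str      # study identifier, e.g. a repository accession
    platform: str   # e.g. "microarray", "RNA-seq", "MS proteomics"
    gene: str       # harmonised identifier; gene symbols are assumed here
    log2fc: float   # log2 fold change as reported for the original groups
    p_value: float  # unadjusted p-value for that comparison


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic and <= 1.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def deregulated_by_study(records, fdr=0.05, min_abs_lfc=1.0):
    """Return {study: {gene: "up" or "down"}} for significantly deregulated molecules.

    Correction is done per study, so each dataset is judged only within
    the comparison groups its authors defined.
    """
    per_study = defaultdict(list)
    for r in records:
        per_study[r.study].append(r)
    result = {}
    for study, recs in per_study.items():
        adjusted = bh_adjust([r.p_value for r in recs])
        result[study] = {
            r.gene.upper(): "up" if r.log2fc > 0 else "down"
            for r, q in zip(recs, adjusted)
            if q < fdr and abs(r.log2fc) >= min_abs_lfc
        }
    return result


def common_molecules(dereg, min_studies=2):
    """Sorted (gene, direction, n_studies) for molecules shared by >= min_studies studies."""
    counts = defaultdict(int)
    for genes in dereg.values():
        for gene, direction in genes.items():
            counts[(gene, direction)] += 1
    return sorted((g, d, n) for (g, d), n in counts.items() if n >= min_studies)


def enrichment_p(hits, pathway, universe):
    """One-sided hypergeometric P(X >= k): is the pathway over-represented among hits?"""
    universe = set(universe)
    hits, pathway = set(hits) & universe, set(pathway) & universe
    N, K, n = len(universe), len(pathway), len(hits)
    k = len(hits & pathway)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Toy data only: two hypothetical studies on different platforms.
    records = [
        Record("A", "microarray", "TP53", 2.0, 0.001),
        Record("A", "microarray", "MYC", 1.5, 0.01),
        Record("A", "microarray", "GAPDH", 0.1, 0.5),
        Record("B", "MS proteomics", "TP53", 1.2, 0.002),
        Record("B", "MS proteomics", "MYC", -1.3, 0.003),
        Record("B", "MS proteomics", "EGFR", 2.5, 0.04),
    ]
    dereg = deregulated_by_study(records)
    print(common_molecules(dereg))  # [('TP53', 'up', 2)]
```

In this toy run MYC is significant in both studies but in opposite directions, so it is not reported as a shared molecule; requiring direction agreement is one possible design choice, and a real analysis would also need identifier mapping (for example, protein accessions to gene symbols) before records from different platforms can be compared.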

Key words

  • Cancer
  • Systems biology
  • Integrative analysis
  • OMICS
  • High-throughput data
  • Molecular biomarkers

Acknowledgments

KCG is supported by a CONACYT Mexico scholarship (No. 2019-000021-01EXTF-00542). HH is supported by a grant from Highlands & Islands Enterprise.

Author information

Corresponding author

Correspondence to Holger Husi.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Cervantes-Gracia, K., Chahwan, R., Husi, H. (2021). Integrative Analysis of Incongruous Cancer Genomics and Proteomics Datasets. In: Cecconi, D. (eds) Proteomics Data Analysis. Methods in Molecular Biology, vol 2361. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1641-3_17

  • DOI: https://doi.org/10.1007/978-1-0716-1641-3_17

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1640-6

  • Online ISBN: 978-1-0716-1641-3

  • eBook Packages: Springer Protocols