Integrating proteomic and phosphoproteomic data for pathway analysis in breast cancer
Because proteins are the basic functional units of cells and biological pathways, shotgun proteomics, the large-scale analysis of proteins, contributes greatly to our understanding of disease mechanisms. Proteomic studies can detect changes in both protein expression and protein modification. With the release of large-scale cancer proteome studies, integrating proteomic and phosphoproteomic data into a more comprehensive pathway analysis has become feasible, but it remains challenging. Integrative pathway analysis at the proteome level provides systematic insight into how signaling networks adapt during the development of cancer.
Here we integrated proteomic and phosphoproteomic data to prioritize pathways in breast cancer. We manually collected and curated well-known breast cancer-related pathways from the literature to serve as target pathways (TPs), that is, as positive controls for method evaluation. Three strategies were applied and evaluated for integrating protein expression and phosphorylation: over-representation analysis based on the hypergeometric test, gene set analysis based on the Kolmogorov-Smirnov (K-S) test, and topology-based pathway analysis. For comparison, we also assessed the ranking performance of each strategy when protein expression or protein phosphorylation was used alone. Target pathways ranked closer to the top with the integrated data than with proteomic or phosphoproteomic data alone. Among the pathway analysis strategies, the topology-based method outperformed the others. The subtypes of breast cancer (Luminal A, Luminal B, Basal and HER2-enriched) vary greatly in prognosis and require distinct treatments. We therefore applied topology-based pathway analysis, integrating protein expression and phosphorylation profiles, to the four subtypes. The TPs were enriched in all subtypes, but their ranks differed significantly among the subtypes. For instance, the p53 pathway ranked at the top in the Basal-like subtype but not in the HER2-enriched subtype. The Focal adhesion pathway ranked closer to the top in HER2- subtypes than in HER2+ subtypes. These results are consistent with some previous studies.
The results demonstrate that the network topology-based method is more powerful when proteomic and phosphoproteomic data are integrated in the pathway analysis of proteomics studies. This integrative strategy can also be used to rank the pathways specific to disease subtypes.
Keywords: Proteomics, Phosphoproteomics, Integration, Pathway analysis, Breast cancer
Abbreviations
BPA: Bayesian Pathway Analysis
CPTAC: Clinical Proteomic Tumor Analysis Consortium
FAK: Focal adhesion kinase
FCS: Functional Class Score
GSA: Gene Set Analysis
GSEA: Gene Set Enrichment Analysis
SPIA: Signal pathway impact analysis
Following the rapid accumulation of large-scale genomic, transcriptomic and other omics data, several studies and approaches that integrate multiple omics data into pathway analysis have been reported [1, 2, 3, 4]. Mass-spectrometry-based proteomics provides insights into cell-type protein expression patterns, post-translational modifications (PTMs) and protein–protein interactions [5, 6, 7]. Phosphorylation is among the most common PTMs: up to 30% of all human proteins may be modified by kinase activity, and kinases are known to regulate the majority of cellular signaling pathways. To date, how to integrate information on protein expression, PTMs and protein interactions into pathway analysis remains a major challenge.
Signal pathways describe a group of molecules in a cell that work together to control one or more cell functions, such as cell division or cell death. Pathway analysis gives insight into the mechanism underlying a given condition and is easier to interpret than studies at the level of individual genes or proteins. Pathway analysis methods include gene set analysis and topology-based analysis. Gene set methods consider only the set of genes/proteins in a pathway, whereas topology-based methods use both the genes/proteins and the interactions among them. Gene set methods comprise Over-Representation Analysis (ORA), based on the hypergeometric test or Fisher's exact test [8, 9], and Functional Class Score (FCS), based on a ranked gene list and the Kolmogorov-Smirnov (K-S) test. ORA considers only the differentially expressed (DE) genes; representative ORA tools include DAVID, Onto-Express, GenMAPP, GoMiner and GOstat. FCS considers the position of all genes in a ranked list produced by a selected statistical test for differential expression; examples include Gene Set Enrichment Analysis (GSEA) and Gene Set Analysis (GSA). Topology-based pathway analysis integrates changes in expression level with the topology of the protein/gene interaction network; examples include Signal pathway impact analysis (SPIA) and Bayesian Pathway Analysis (BPA). In SPIA, the pathway score is based on an impact analysis that combines two types of evidence: the over-representation of DE genes in a given pathway, and the abnormal perturbation of that pathway, measured by propagating expression changes across the pathway topology.
In this work, we integrated proteomic and phosphoproteomic data for pathway analysis in breast cancer and its subtypes. The results showed that integrating differential protein expression and phosphorylation with the network topology-based method identifies the target pathways more accurately. In addition, we identified the top-ranked pathways specific to each of the four breast cancer subtypes.
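To make the ORA idea concrete, the following is a minimal sketch of a hypergeometric over-representation test written with the Python standard library only. It is an illustration, not the implementation used by any of the tools named above; the function names and the toy identifiers in the usage note are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n items are drawn without replacement from a
    population of N items, K of which belong to the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def ora_pvalue(de_genes, pathway_genes, background):
    """Over-representation p-value of one pathway among the DE genes.

    All three arguments are iterables of identifiers; identifiers outside
    the background are ignored (an assumption of this sketch)."""
    background = set(background)
    de = set(de_genes) & background
    pathway = set(pathway_genes) & background
    overlap = len(de & pathway)
    return hypergeom_upper_tail(overlap, len(pathway), len(de), len(background))
```

For example, with a background of 20 identifiers, a pathway of 5 and 4 DE identifiers of which 3 fall in the pathway, `ora_pvalue` returns 155/4845, roughly 0.032.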
Proteomics data and preprocessing
The proteomic and phosphoproteomic data of breast cancer used in this study comprised 77 tumor samples and 3 normal breast tissue samples, downloaded from the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Quality control and normalization of both the proteomic and phosphoproteomic data are described by Mertins et al. As a result, 12,553 proteins (10,062 genes) and 33,239 phosphosites, with relative abundances quantified across tumors, were used in this work. Missing values in the data matrix were filled with the minimum value.
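The minimum-value imputation can be sketched as follows. The text does not state whether the minimum is taken over the whole matrix or per protein; this sketch assumes the global minimum of all observed values, and the function name is hypothetical.

```python
import math


def impute_with_minimum(matrix):
    """Return a copy of a list-of-lists matrix in which missing entries
    (None or NaN) are replaced by the smallest observed value.

    Assumption: the minimum is global, over the whole matrix."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    observed = [v for row in matrix for v in row if not is_missing(v)]
    if not observed:
        raise ValueError("matrix has no observed values")
    floor = min(observed)
    return [[floor if is_missing(v) else v for v in row] for row in matrix]
```

A per-protein variant would compute `floor` inside a loop over rows instead; which of the two is appropriate depends on how the abundance ratios were normalized.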
Integrating proteomic and phosphoproteomic data
ORA, GSEA and SPIA are representative of the three kinds of pathway analysis (over-representation analysis, Functional Class Score and topology-based pathway analysis, respectively), so we used these three strategies. We used the R package ‘HTSanalyzeR’ for the ORA and GSEA pathway analyses and the R package ‘SPIA’ for the SPIA pathway analysis. P-values for pathway analysis resulting from the permutation (n = 2000) are provided in Additional File 1: Table S1.
Different pathway analysis methods require different input data. For ORA, the input is the list of DE proteins or DE phosphoproteins or, for the integrated analysis, the intersection of the two lists (Student’s t-test, BH-adjusted p < 0.05). For GSEA, the input in our study was the list of all proteins/phosphoproteins with their fold changes between case and control. For the integrated GSEA analysis, we summed the fold changes of the proteins present in both the protein expression and phosphorylation profiles and sorted the proteins by this sum. For SPIA, the input consisted of the pathway topologies downloaded from the KEGG database and the DE proteins with their fold changes; the topology changes of the pathways can be calculated by the ‘SPIA’ R package. For the integrated SPIA analysis, the input was the intersection of the DE proteins and DE phosphoproteins, each with the sum of its fold changes.
To evaluate the performance of pathway analysis, a widely used validation approach is to examine the ranks of target pathways that have been validated or curated in the literature for the disease; the closer a target pathway ranks to the top, the better. This approach was proposed in PADOG and has been used in other comparisons of pathway analysis methods [21, 22].
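As an illustration of a K-S-type gene set statistic with a permutation p-value, the sketch below computes an unweighted running-sum enrichment score and compares it with scores obtained after shuffling the ranked list. It is not the algorithm implemented in ‘HTSanalyzeR’, whose weighting and permutation scheme may differ; the function names are hypothetical, and identifiers in the ranked list are assumed to be unique.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted K-S-like running-sum statistic.

    Walking down the ranked list, the sum rises by 1/n_hit at a set
    member and falls by 1/n_miss otherwise; the score is the running
    value with the largest absolute deviation from zero."""
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_pvalue(ranked_genes, gene_set, n_perm=2000, seed=0):
    """Two-sided empirical p-value from shuffling the gene order."""
    rng = random.Random(seed)
    observed = abs(enrichment_score(ranked_genes, gene_set))
    shuffled = list(ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point rounding.
        if abs(enrichment_score(shuffled, gene_set)) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_perm + 1)
```

With 2000 permutations, as in the default above, the smallest attainable p-value is 1/2001.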
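The integration step described above can be sketched as follows, taking the t-test p-values as given. The sketch assumes that phosphosite-level values have already been summarized to one value per protein and that fold changes are on a log scale, so that summing them is meaningful; neither detail is specified in the text, and all function names are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the result is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def de_features(pvalues, fold_change, alpha=0.05):
    """Map each feature with BH-adjusted p < alpha to its fold change.

    Both arguments are dicts keyed by the same protein identifiers."""
    names = list(pvalues)
    adjusted = bh_adjust([pvalues[name] for name in names])
    return {n: fold_change[n] for n, q in zip(names, adjusted) if q < alpha}


def integrate(de_protein, de_phospho):
    """Intersection of the two DE sets, scored by summed fold change."""
    shared = de_protein.keys() & de_phospho.keys()
    return {name: de_protein[name] + de_phospho[name] for name in shared}
```

Replacing the intersection in `integrate` with a union would give the alternative integration that uses every protein that is DE at either level.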
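The rank-based evaluation can be sketched as below: pathways are ordered by ascending p-value, and the ranks of the target pathways are then summarized. The use of the mean and median as summaries, the tie-breaking by name and the function names are assumptions of this sketch rather than details taken from the text.

```python
from statistics import mean, median


def rank_pathways(pvalues):
    """Rank pathways by ascending p-value; rank 1 is the most significant.

    Ties are broken alphabetically by pathway name (an assumption)."""
    ordered = sorted(pvalues, key=lambda name: (pvalues[name], name))
    return {name: position + 1 for position, name in enumerate(ordered)}


def target_rank_summary(pvalues, targets):
    """Ranks of the target pathways plus their mean and median rank."""
    ranks = rank_pathways(pvalues)
    found = {t: ranks[t] for t in targets if t in ranks}
    if not found:
        raise ValueError("no target pathway is present in the results")
    values = list(found.values())
    return {"ranks": found, "mean": mean(values), "median": median(values)}
```

A method or input type whose target pathways have a smaller mean or median rank places the known breast cancer pathways closer to the top of its list.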
The target pathways for breast cancer
Ras signaling pathway
PI3K-Akt signaling pathway
MAPK signaling pathway
mTOR signaling pathway
Wnt signaling pathway
p53 signaling pathway
EGFR tyrosine kinase inhibitor resistance
ErbB signaling pathway
TGF-beta signaling pathway
Pathways in cancer
Performance evaluation of pathway analysis with protein expression and/or phosphorylation profiles
The overlap of top 50 ranking pathways in three methods with integrated information
Fanconi anemia pathway
Calcium signaling pathway
cAMP signaling pathway
Cytokine-cytokine receptor interaction
Adrenergic signaling in cardiomyocytes
Hedgehog signaling pathway
Retrograde endocannabinoid signaling
GnRH signaling pathway
Progesterone-mediated oocyte maturation
Basal cell carcinoma
Pathway rankings in subtypes of breast cancer
As shown in Fig. 3a, the p53 pathway had the lowest rank number (that is, it ranked nearest the top) in the Basal-like subtype, and its rank number was lower in Luminal A than in Luminal B. TP53 is reported to be among the most recurrently mutated genes in breast cancer, with a mutation frequency of 84% in Basal-like tumors, and the p53 pathway remains largely intact in Luminal A cancers but is often inactivated in the more aggressive Luminal B cancers.
According to previous research, expression levels of Focal adhesion kinase (FAK/PTK2) are strongly correlated with poor tumor differentiation and significantly associated with HER2 overexpression in breast cancer. The highest level of FAK (Y861) and the lowest level of human epidermal growth factor receptor 2 (HER2) activity were observed in MDA-361 cells (an ER+/HER2+ cell line). Because FAK plays an important role in the Focal adhesion pathway, we can infer that activation of the Focal adhesion pathway was negatively correlated with the expression of HER2. The rank of the Focal adhesion pathway was lower in HER2- subtypes (Luminal A and HER2) than in HER2+ subtypes (Luminal B and Basal), as shown in Fig. 3b.
The PI3K/AKT/mTOR pathway is a key intracellular signaling system that drives cellular growth and survival. Hyperactivation of this pathway is implicated in the tumorigenesis of ER+ breast cancer [36, 37, 38, 39, 40, 41, 42, 43, 44, 45]. The pathway is also important in triple-negative breast cancer and HER2-overexpressing breast cancer. Preclinical studies indicate that inhibitors of the pathway can act synergistically with trastuzumab in resistant cells.
Many studies have established that the mTOR pathway interacts closely with the PI3K-AKT and MAPK signaling pathways. Inhibition of mTORC1, an important component of the mTOR pathway, leads to MAPK pathway activation through a PI3K-dependent feedback loop in human cancer. This is consistent with the ranks of these pathways in the four breast cancer subtypes: a low rank of the mTOR pathway corresponded to a high rank of the PI3K-Akt signaling pathway (Fig. 3c and d). Luminal-type cells might use the MEK-ERK pathway to a lesser extent and seem to depend more on the PI3K pathway, as shown by the preferential occurrence of PI3K mutations in this subtype. As shown in Fig. 3d, the PI3K-Akt signaling pathway ranked higher in the Luminal subtypes than in the other two subtypes.
Expression and modification describe, from different perspectives, the in vivo changes of proteins in the cancer proteome. Pathway analysis based on a single level of information, such as protein expression or protein phosphorylation alone, often carries a high risk of both false positives and false negatives because of technological limitations. To the best of our knowledge, the integration of proteomic and phosphoproteomic data in pathway analysis in cancer has not previously been evaluated or reported. In this study, pathway analysis was performed and compared using the integration of proteomic and phosphoproteomic data from CPTAC’s breast cancer dataset. Moreover, we looked for different patterns in pathway ranking among the subtypes.
Our results suggest that differential expression of both proteins and phosphorylation is useful for identifying important pathways in cancer and cancer subtypes. Furthermore, integrating protein expression and modification profiles provides more comprehensive information and ranks TPs more accurately. Although the ranked lists from the three kinds of pathway analysis differed, some consistent results were observed, because the expression changes of proteins and phosphoproteins are used in all of the strategies. Because GSEA requires the fold changes of all proteins, it has more complete information about the expression profile. SPIA additionally needs the topology of the pathways, which captures how the nodes of a pathway influence one another.
We also tested the performance of pathway ranking using the union of DE proteins and DE phosphoproteins, but the accuracy was poor, possibly because of too much noise in the individual omics data. To control the risk of false positives, the intersection of the DE proteins and DE PTMs was used as input in this study, which might be too conservative. Because only one dataset was tested here, the new pathways in the top-ranked list need to be validated on additional independent cancer proteomics datasets in the future.
Integrative pathway analysis that combines information from protein expression, protein modification and the topology of the protein interaction network is a more efficient way to identify key pathways in breast cancer. Pathway ranking within a given subgroup of patients can provide insight into subtype-specific mechanisms and support precision medicine for each subtype.
We thank the High Performance Computing Center (HPCC) at Shanghai Jiao Tong University for the computation.
The work and the publication of this article were sponsored by grants from the National Natural Science Foundation of China (31271416).
Availability of data and materials
The datasets used in our study were downloaded from the publicly available databases mentioned in the text. The source code is available from the corresponding author on reasonable request.
About this supplement
This article has been published as part of BMC Systems Biology Volume 12 Supplement 8, 2018: Selected articles from the International Conference on Intelligent Biology and Medicine (ICIBM) 2018: systems biology. The full contents of the supplement are available online at https://bmcsystbiol.biomedcentral.com/articles/supplements/volume-12-supplement-8.
JR worked on the method, experiment and analyses. BW designed the Figures. JL contributed to the experiment and writing of the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
26. Moynahan ME, Cui TY, Jasin M. Homology-directed DNA repair, mitomycin-C resistance, and chromosome stability is restored with correction of a Brca1 mutation. Cancer Res. 2001;61(12):4842–50.
34. Schmitz KJ, Grabellus F, Callies R, Otterbach F, Wohlschlaeger J, Levkau B, et al. High expression of focal adhesion kinase (p125FAK) in node-negative breast cancer is related to overexpression of HER-2/neu and activated Akt kinase but does not predict outcome. Breast Cancer Res. 2005;7(2):R194–203.
36. Perez-Tenorio G, Alkhori L, Olsson B, Waltersson MA, Nordenskjold B, Rutqvist LE, et al. PIK3CA mutations and PTEN loss correlate with similar prognostic factors and are not mutually exclusive in breast cancer. Clin Cancer Res. 2007;13(12):3577–84. https://doi.org/10.1158/1078-0432.CCR-06-1609.
38. Ellis MJ, Lin L, Crowder R, Tao Y, Hoog J, Snider J, et al. Phosphatidyl-inositol-3-kinase alpha catalytic subunit mutation and response to neoadjuvant endocrine therapy for estrogen receptor positive breast cancer. Breast Cancer Res Treat. 2010;119(2):379–90. https://doi.org/10.1007/s10549-009-0575-y.
42. Fu P, Ibusuki M, Yamamoto Y, Hayashi M, Murakami K, Zheng S, et al. Insulin-like growth factor-1 receptor gene expression is associated with survival in breast cancer: a comprehensive analysis of gene copy number, mRNA and protein expression. Breast Cancer Res Treat. 2011;130(1):307–17. https://doi.org/10.1007/s10549-011-1605-0.
43. Law JH, Habibi G, Hu K, Masoudi H, Wang MY, Stratford AL, et al. Phosphorylated insulin-like growth factor-i/insulin receptor is present in all breast cancer subtypes and is related to poor survival. Cancer Res. 2008;68(24):10238–46. https://doi.org/10.1158/0008-5472.CAN-08-2755.
46. Lehmann BD, Bauer JA, Chen X, Sanders ME, Chakravarthy AB, Shyr Y, et al. Identification of human triple-negative breast cancer subtypes and preclinical models for selection of targeted therapies. J Clin Invest. 2011;121(7):2750–67. https://doi.org/10.1172/JCI45014.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.