Extending Biological Pathways by Utilizing Conditional Mutual Information Extracted from RNA-SEQ Gene Expression Data

  • Tham H. Hoang
  • Pujan Joshi
  • Seung-Hyun Hong
  • Dong-Guk Shin
Conference paper
Part of the IFMBE Proceedings book series (IFMBE, volume 63)


We propose a method for constructing a gene/protein regulatory network tailored to assessing the disease state of an individual patient. The method combines generally known gene/protein pathways with transcription-level changes obtained by comparing the patient’s data with the average gene expression of the disease population. It uses histogram-based estimation of conditional mutual information to identify genes/proteins that are more likely to interact with each other. We applied the method to publicly available data from The Cancer Genome Atlas (TCGA), specifically RNA-Seq gene expression data from 110 breast cancer, 141 colorectal cancer, 445 gastric cancer, and 105 rectal cancer samples. We focused on transcription factors such as SNAI1, SNAI2, ZEB2, and TWIST1 and their downstream targets in the epithelial-mesenchymal transition (EMT) pathway (e.g., OCLN, DSP, VIM, and CDH2). Although the biological entities participating in the EMT pathway are generally known, we found that our approach can extend the known regulatory relationships among them with newly identified ones. This approach could form a basis for constructing gene regulation pathways tailored to each individual cancer patient.
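To make the central quantity concrete, the following is a minimal sketch of a histogram (plug-in) estimate of conditional mutual information, using the identity I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z). It is not the authors' implementation: the function names, the equal-width binning, the choice of 4 bins, and the synthetic expression vectors are all assumptions made for illustration.

```python
import numpy as np


def entropy_from_counts(counts):
    """Shannon entropy (in nats) of a histogram given as an array of bin counts."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))


def conditional_mutual_information(x, y, z, bins=4):
    """Plug-in histogram estimate of I(X; Y | Z) in nats.

    x, y, z are 1-D arrays of expression values across samples.
    `bins` (equal-width bins per variable) is an illustrative choice,
    not a value taken from the paper.
    """
    data = np.column_stack([x, y, z])
    joint, _ = np.histogramdd(data, bins=bins)  # counts with axes (x, y, z)
    h_xyz = entropy_from_counts(joint)
    h_xz = entropy_from_counts(joint.sum(axis=1))    # marginalize out y
    h_yz = entropy_from_counts(joint.sum(axis=0))    # marginalize out x
    h_z = entropy_from_counts(joint.sum(axis=(0, 1)))
    return h_xz + h_yz - h_xyz - h_z


# Synthetic stand-ins for gene expression vectors (hypothetical data).
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)                          # conditioning gene
x = rng.normal(size=n)                          # candidate regulator
y_dependent = x + 0.1 * rng.normal(size=n)      # target driven by x
y_independent = rng.normal(size=n)              # target unrelated to x

cmi_dependent = conditional_mutual_information(x, y_dependent, z)
cmi_independent = conditional_mutual_information(x, y_independent, z)
print(cmi_dependent, cmi_independent)
```

A pair whose estimate stays high after conditioning on a third gene is a candidate direct interaction, whereas a pair whose estimate drops toward zero is more plausibly explained by the conditioning gene. Plug-in estimates are biased upward for small sample sizes, so in practice a threshold or permutation test would be needed to decide which values count as high.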


TCGA; Gene expression; Cancer pathways; EMT; Conditional mutual information





Work by THH was funded by a grant from the Vietnam Education Foundation (VEF). The opinions, findings, and conclusions stated herein are solely those of the authors and do not necessarily reflect the official view of VEF.



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Tham H. Hoang (1)
  • Pujan Joshi (1)
  • Seung-Hyun Hong (1)
  • Dong-Guk Shin (1)
  1. Computer Science and Engineering Department, University of Connecticut, Storrs, Connecticut, USA
