Gene Presence and Absence in Genomic Big Data for Precision Medicine

  • Mohamood Adhil
  • Mahima Agarwal
  • Krittika Ghosh
  • Manas Sule
  • Asoke K. Talukder
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 673)


Twenty-first-century precision medicine aims to use a systems-oriented approach, including molecular pathology tests, to find the root cause of disease specific to an individual. The challenges of genomic data analysis for precision medicine are manifold: they combine big data, high dimensionality, and often multimodal distributions. Advanced investigations use techniques such as Next-Generation Sequencing (NGS), whose analysis relies on complex statistical methods to extract useful insights. Analysis of exome and transcriptome data allows in-depth study of the roughly 22,000 genes of the human genome, many of which relate to phenotype and disease state. Not all genes are expressed in all tissues, and in a disease state some genes may even be deleted from the genome. Therefore, as part of knowledge discovery, exome and transcriptome big data need to be analyzed to determine whether a gene is actually absent (deleted or not expressed) or present. In this paper, we present a statistical technique that identifies the genes present or absent in exome or transcriptome big data, in order to improve the accuracy of precision medicine.
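The keywords name a Gaussian mixture model, and the abstract mentions multimodal distributions, but this excerpt gives no details of the authors' actual model. The sketch below is only an illustration of the general idea, under stated assumptions. It fits a two-component, one-dimensional Gaussian mixture by expectation-maximization (EM) to per-gene expression values, which are assumed here to be on a log2(TPM + 1) scale. It treats the lower-mean component as "absent" and the higher-mean one as "present", and calls a gene present when its posterior probability for the higher component exceeds 0.5. The function names, the 0.5 threshold, and the synthetic data are all hypothetical.

```python
import math
import random


def log_normal_pdf(x, mu, var):
    """Log density of a univariate normal N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def component_posteriors(x, weights, means, variances):
    """Posterior probability of each component for one value (log-sum-exp for stability)."""
    logs = [math.log(w) + log_normal_pdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    log_total = top + math.log(sum(math.exp(l - top) for l in logs))
    return [math.exp(l - log_total) for l in logs], log_total


def fit_two_component_gmm(xs, max_iter=500, tol=1e-9, var_floor=1e-6):
    """Fit a 1-D, two-component Gaussian mixture with expectation-maximization."""
    n = len(xs)
    mean = sum(xs) / n
    pooled_var = sum((x - mean) ** 2 for x in xs) / n
    # Start one component at each extreme of the data, both with the pooled variance.
    weights = [0.5, 0.5]
    means = [min(xs), max(xs)]
    variances = [pooled_var, pooled_var]
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each observation,
        # plus the log-likelihood under the current parameters.
        resp, ll = [], 0.0
        for x in xs:
            r, log_total = component_posteriors(x, weights, means, variances)
            resp.append(r)
            ll += log_total
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12  # guard against an empty component
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk,
                var_floor,  # keep a collapsing component from reaching zero variance
            )
        # EM never decreases the likelihood; stop once the gain is negligible.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, variances


def call_presence(expression, weights, means, variances, threshold=0.5):
    """Call a gene present when the higher-mean component's posterior exceeds threshold."""
    present_k = 0 if means[0] > means[1] else 1
    calls = {}
    for gene, x in expression.items():
        posts, _ = component_posteriors(x, weights, means, variances)
        calls[gene] = posts[present_k] > threshold
    return calls


# Hypothetical synthetic log2(TPM + 1) values: 200 unexpressed genes near zero
# and 800 expressed genes, giving a bimodal sample.
rng = random.Random(0)
sample = ([rng.gauss(0.5, 0.3) for _ in range(200)]
          + [rng.gauss(5.0, 1.2) for _ in range(800)])
weights, means, variances = fit_two_component_gmm(sample)
# GENE_A and GENE_B are hypothetical gene identifiers.
calls = call_presence({"GENE_A": 0.3, "GENE_B": 6.0}, weights, means, variances)
print(sorted(means), calls)
```

In this sketch the mixture places the present/absent boundary, rather than a fixed expression cut-off, so the boundary adapts to each sample's distribution. Real data may need more components or a different distribution family.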


Big data · Algorithms · Genomics · Multimodal distribution · Exome analysis · Transcriptomics analysis · Gaussian mixture model · Precision medicine


Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • All authors: Interpretomics, Bangalore, India
  • Corresponding author: Asoke K. Talukder