Statistically Supported Identification of Tumor Subtypes

  • Guoli Sun
  • Alexander Krasnitz (corresponding author)
Part of the Methods in Molecular Biology book series (MIMB, volume 1878)


Identifying biologically and clinically consequential subtypes within tumor types is a long-standing goal of cancer bioinformatics. Here we provide practical guidance on the use of a recently developed statistical subtyping tool, Tree Branches Evaluated Statistically for Tightness (TBEST), and its eponymous R implementation. TBEST employs hierarchical clustering to partition the data at a user-specified level of significance. The package's functionality is illustrated using a benchmark data set of mRNA expression levels in leukemia.
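
As a rough illustration of the idea of testing tree branches for tightness by permutation, the following Python sketch clusters a small synthetic data set and assigns each non-root branch a permutation p-value. It is not the TBEST implementation and does not use the TBEST package. The tightness statistic (relative gap between a branch's height and its parent's height), the null model (shuffling each feature independently across samples), and all function names are assumptions made for this example only.

```python
import math
import random


def average_linkage(points):
    """Naive average-linkage clustering; returns [(members, height), ...] in merge order."""
    active = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(active) > 1:
        best = None
        for a in range(len(active)):
            for b in range(a + 1, len(active)):
                # Average pairwise Euclidean distance between the two clusters.
                d = sum(math.dist(points[i], points[j])
                        for i in active[a] for j in active[b])
                d /= len(active[a]) * len(active[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = active[a] | active[b]
        active = [c for k, c in enumerate(active) if k not in (a, b)] + [merged]
        merges.append((merged, d))
    return merges


def tightness(merges):
    """Assumed statistic per non-root branch: (parent height - own height) / parent height."""
    scores = {}
    for k, (members, height) in enumerate(merges[:-1]):
        # The parent is the next merge that strictly contains this branch.
        parent_height = next(ph for pm, ph in merges[k + 1:] if members < pm)
        scores[members] = (parent_height - height) / parent_height if parent_height > 0 else 0.0
    return scores


def permutation_p_values(points, n_perm=99, seed=0):
    """Compare each observed branch with the tightest branch of each permuted tree."""
    rng = random.Random(seed)
    observed = tightness(average_linkage(points))
    null_max = []
    for _ in range(n_perm):
        # Assumed null model: shuffle every feature independently across samples.
        columns = [list(col) for col in zip(*points)]
        for col in columns:
            rng.shuffle(col)
        shuffled = [list(row) for row in zip(*columns)]
        null_max.append(max(tightness(average_linkage(shuffled)).values()))
    return {members: (1 + sum(s >= obs for s in null_max)) / (1 + n_perm)
            for members, obs in observed.items()}


# Synthetic data: two groups of four samples, 20 features each.
rng = random.Random(1)
samples = [[rng.gauss(center, 0.5) for _ in range(20)]
           for center in (0.0,) * 4 + (10.0,) * 4]
p_values = permutation_p_values(samples, n_perm=99, seed=2)

# Branches whose p-value falls below a chosen significance level would be kept.
significance = 0.05
for members, p in sorted(p_values.items(), key=lambda item: item[1]):
    flag = "significant" if p <= significance else "not significant"
    print(sorted(members), round(p, 3), flag)
```

In this sketch the significance threshold plays the role of the user-specified level mentioned above: branches with a p-value at or below it would define the partition. Comparing every branch with the maximum statistic of each permuted tree is a deliberately conservative choice for the example.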

Key words

  • Tumor subtypes
  • Unsupervised learning
  • Hierarchical clustering
  • Permutation tests



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Intuit Inc., Mountain View, USA
  2. Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, USA
