CSAX: Characterizing Systematic Anomalies in eXpression Data

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2014)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8394)

Abstract

Methods for translating gene expression signatures into clinically relevant information have typically relied upon having many samples from patients with similar molecular phenotypes. Here, we address the question of what can be done when it is relatively easy to obtain healthy patient samples, but when abnormalities corresponding to disease states may be rare and one-of-a-kind. The associated computational challenge, anomaly detection, is a well-studied machine learning problem. However, due to the dimensionality and variability of expression data, existing methods based on feature space analysis or individual anomalously-expressed genes are insufficient. We present a novel approach, CSAX, that identifies pathways in an individual sample in which the normal expression relationships are disrupted. To evaluate our approach, we have compiled and released a compendium of public microarray data sets, reformulated to create a testbed for anomaly detection. We demonstrate the accuracy of CSAX on the data sets in our compendium, compare it to other leading anomaly-detection methods, and show that CSAX aids both in identifying anomalies and in explaining their underlying biology. We note the potential for the use of such methods in identifying subclasses of disease. We also describe an approach to characterizing the difficulty of specific expression anomaly detection tasks and discuss how one can estimate the feasibility of a specific task. Our approach provides an important step towards identification of individual disease patterns in the era of personalized medicine.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Noto, K., Brodley, C., Majidi, S., Bianchi, D.W., Slonim, D.K. (2014). CSAX: Characterizing Systematic Anomalies in eXpression Data. In: Sharan, R. (ed.) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham. https://doi.org/10.1007/978-3-319-05269-4_18

  • DOI: https://doi.org/10.1007/978-3-319-05269-4_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05268-7

  • Online ISBN: 978-3-319-05269-4

  • eBook Packages: Computer Science, Computer Science (R0)
