Simultaneous Relevant Feature Identification and Classification in High-Dimensional Spaces

  • L. R. Grate
  • C. Bhattacharyya
  • M. I. Jordan
  • I. S. Mian
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2452)

Abstract

Molecular profiling technologies monitor thousands of transcripts, proteins, metabolites or other species concurrently in biological samples of interest. Given two-class, high-dimensional profiling data, nominal Liknon [4] is a specific implementation of a methodology for performing simultaneous relevant feature identification and classification. It exploits the well-known property that minimizing an ℓ1 norm (via linear programming) yields a sparse hyperplane [15],[26],[2],[8],[17]. This work (i) examines the computational, software and practical issues involved in realizing nominal Liknon, (ii) summarizes results from its application to five real-world data sets, (iii) outlines heuristic solutions to problems posed by domain experts when interpreting the results and (iv) defines some future directions of the research.
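
The sparsity property mentioned in the abstract can be illustrated with a small sketch. The code below is not the nominal Liknon implementation, whose exact formulation is not given on this page. It assumes one common soft-margin variant: minimize the ℓ1 norm of the weight vector plus C times the total slack, subject to every sample lying on the correct side of the hyperplane up to its slack. The function name `l1_sparse_hyperplane`, the trade-off parameter `C` and the use of SciPy's `linprog` are assumptions made for illustration only.

```python
import numpy as np
from scipy.optimize import linprog


def l1_sparse_hyperplane(X, y, C=1.0):
    """Fit a hyperplane (w, b) by linear programming (illustrative sketch).

    Assumed formulation:
        minimize   sum_j |w_j| + C * sum_i xi_i
        subject to y_i * (w . x_i + b) >= 1 - xi_i,  xi_i >= 0

    X is an (n, d) array of samples, y an (n,) array of labels in {-1, +1}.
    """
    n, d = X.shape
    # Split w = u - v with u, v >= 0 so that |w_j| becomes linear (u_j + v_j).
    # Variable layout: [u (d), v (d), b (1), xi (n)].
    c = np.concatenate([np.ones(2 * d), [0.0], C * np.ones(n)])

    # Rewrite y_i * (x_i . (u - v) + b) + xi_i >= 1 in "A_ub @ z <= b_ub" form.
    Yx = y[:, None] * X
    A_ub = np.hstack([-Yx, Yx, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)

    # u, v and xi are non-negative; the offset b is unconstrained.
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)

    w = res.x[:d] - res.x[d:2 * d]
    b = res.x[2 * d]
    return w, b


if __name__ == "__main__":
    # Synthetic two-class data: 40 samples, 200 features, one informative.
    rng = np.random.default_rng(0)
    y = np.where(np.arange(40) % 2 == 0, 1.0, -1.0)
    X = rng.normal(size=(40, 200))
    X[:, 0] += 5.0 * y
    w, b = l1_sparse_hyperplane(X, y)
    print("non-zero weights:", np.count_nonzero(np.abs(w) > 1e-8))
```

Features whose weight is exactly zero drop out of the decision rule sign(w · x + b), so in this sketch the non-zero entries of `w` play the role of the relevant features while the same hyperplane serves as the classifier.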

References

  1. S.V. Allander, N.N. Nupponen, M. Ringner, G. Hostetter, G.W. Maher, N. Goldberger, Y. Chen, J. Carpten, A.G. Elkahloun, and P.S. Meltzer. Gastrointestinal stromal tumors with KIT mutations exhibit a remarkably homogeneous gene expression profile. Cancer Research, 61:8624–8628, 2001.
  2. K. Bennett and A. Demiriz. Semi-supervised support vector machines. In Advances in Neural Information Processing Systems, volume 11. MIT Press, Cambridge MA, 1999.
  3. A. Bhattacharjee, W.G. Richards, J. Staunton, C. Li, S. Monti, P. Vasa, C. Ladd, J. Beheshti, R. Bueno, M. Gillette, M. Loda, G. Weber, E.J. Mark, E.S. Lander, W. Wong, B.E. Johnson, T.R. Golub, D.J. Sugarbaker, and M. Meyerson. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl. Acad. Sci., 98:13790–13795, 2001.
  4. C. Bhattacharyya, L.R. Grate, A. Rizki, D.C. Radisky, F.J. Molina, M.I. Jordan, M.J. Bissell, and I.S. Mian. Simultaneous relevant feature identification and classification in high-dimensional spaces: application to molecular profiling data. Submitted, Signal Processing, 2002.
  5. M.P. Brown, W.N. Grundy, D. Lin, N. Cristianini, C.W. Sugnet, T.S. Furey, M. Ares, Jr, and D. Haussler. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci., 97:262–267, 2000.
  6. P. Cheeseman and J. Stutz. Bayesian Classification (AutoClass): Theory and Results. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 153–180. AAAI Press/MIT Press, 1995. The software is available at the URL http://www.gnu.org/directory/autoclass.html.
  7. M.L. Chow, E.J. Moler, and I.S. Mian. Identifying marker genes in transcription profile data using a mixture of feature relevance experts. Physiological Genomics, 5:99–111, 2001.
  8. N. Cristianini and J. Shawe-Taylor. Support Vector Machines and other kernel-based learning methods. Cambridge University Press, Cambridge, England, 2000.
  9. S.M. Dhanasekaran, T.R. Barrette, R. Ghosh, D. Shah, S. Varambally, K. Kurachi, K.J. Pienta, M.J. Rubin, and A.M. Chinnaiyan. Delineation of prognostic biomarkers in prostate cancer. Nature, 432, 2001.
  10. D.L. Donoho and X. Huo. Uncertainty principles and ideal atomic decomposition. Technical Report, Statistics Department, Stanford University, 1999.
  11. R. Fletcher. Practical Methods of Optimization. John Wiley & Sons, New York, 2000.
  12. T. Furey, N. Cristianini, N. Duffy, D. Bednarski, M. Schummer, and D. Haussler. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16:906–914, 2000.
  13. M.E. Garber, O.G. Troyanskaya, K. Schluens, S. Petersen, Z. Thaesler, M. Pacyna-Gengelbach, M. van de Rijn, G.D. Rosen, C.M. Perou, R.I. Whyte, R.B. Altman, P.O. Brown, D. Botstein, and I. Petersen. Diversity of gene expression in adenocarcinoma of the lung. Proc. Natl. Acad. Sci., 98:13784–13789, 2001.
  14. T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999. The data are available at the URL http://waldo.wi.mit.edu/MPR/data_sets.html.
  15. T. Graepel, R. Herbrich, B. Schölkopf, A.J. Smola, P. Bartlett, K. Müller, K. Obermayer, and R.C. Williamson. Classification on proximity data with lp-machines. In Ninth International Conference on Artificial Neural Networks, volume 470, pages 304–309. IEE, London, 1999.
  16. L.R. Grate, C. Bhattacharyya, M.I. Jordan, and I.S. Mian. Integrated analysis of transcript profiling and protein sequence data. In press, Mechanisms of Ageing and Development, 2002.
  17. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York, 2000.
  18. I. Hedenfalk, D. Duggan, Y. Chen, M. Radmacher, M. Bittner, R. Simon, P. Meltzer, B. Gusterson, M. Esteller, M. Raffeld, Z. Yakhini, A. Ben-Dor, E. Dougherty, J. Kononen, L. Bubendorf, W. Fehrle, S. Pittaluga, S. Gruvberger, N. Loman, O. Johannsson, H. Olsson, B. Wilfond, G. Sauter, O.-P. Kallioniemi, A. Borg, and J. Trent. Gene-expression profiles in hereditary breast cancer. New England Journal of Medicine, 344:539–548, 2001.
  19. J. Khan, J.S. Wei, M. Ringner, L.H. Saal, M. Ladanyi, F. Westermann, F. Berthold, M. Schwab, C.R. Antonescu, C. Peterson, and P.S. Meltzer. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 7:673–679, 2001.
  20. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M.I. Jordan. Minimax probability machine. Advances in Neural Information Processing Systems, 14, 2001.
  21. L.A. Liotta, E.C. Kohn, and E.F. Petricoin. Clinical proteomics: personalized molecular medicine. JAMA, 14:2211–2214, 2001.
  22. E.J. Moler, M.L. Chow, and I.S. Mian. Analysis of molecular profile data using generative and discriminative methods. Physiological Genomics, 4:109–126, 2000.
  23. D.A. Notterman, U. Alon, A.J. Sierk, and A.J. Levine. Transcriptional gene expression profiles of colorectal adenoma, adenocarcinoma, and normal tissue examined by oligonucleotide arrays. Cancer Research, 61:3124–3130, 2001.
  24. E.F. Petricoin III, A.M. Ardekani, B.A. Hitt, P.J. Levine, V.A. Fusaro, S.M. Steinberg, G.B. Mills, C. Simone, D.A. Fishman, E.C. Kohn, and L.A. Liotta. Use of proteomic patterns in serum to identify ovarian cancer. The Lancet, 359:572–577, 2002.
  25. S. Ramaswamy, P. Tamayo, R. Rifkin, S. Mukherjee, C.-H. Yeang, M. Angelo, C. Ladd, M. Reich, E. Latulippe, J.P. Mesirov, T. Poggio, W. Gerald, M. Loda, E.S. Lander, and T.R. Golub. Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Natl. Acad. Sci., 98:15149–15154, 2001. The data are available from http://www-genome.wi.mit.edu/mpr/GCM.html.
  26. A. Smola, T.T. Friess, and B. Schölkopf. Semiparametric support vector and linear programming machines. In Advances in Neural Information Processing Systems, volume 11. MIT Press, Cambridge MA, 1999.
  27. T. Sorlie, C.M. Perou, R. Tibshirani, T. Aas, S. Geisler, H. Johnsen, T. Hastie, M.B. Eisen, M. van de Rijn, S.S. Jeffrey, T. Thorsen, H. Quist, J.C. Matese, P.O. Brown, D. Botstein, P.E. Lonning, and A.-L. Borresen-Dale. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci., 98:10869–10874, 2001.
  28. A.I. Su, J.B. Welsh, L.M. Sapinoso, S.G. Kern, P. Dimitrov, H. Lapp, P.G. Schultz, S.M. Powell, C.A. Moskaluk, H.F. Frierson Jr, and G.M. Hampton. Molecular classification of human carcinomas by use of gene expression signatures. Cancer Research, 61:7388–7393, 2001.
  29. L.J. van’t Veer, H. Dai, M.J. van de Vijver, Y.D. He, A.A. Hart, M. Mao, H.L. Peterse, K. van der Kooy, M.J. Marton, A.T. Witteveen, G.J. Schreiber, R.M. Kerkhoven, C. Roberts, P.S. Linsley, R. Bernards, and S.H. Friend. Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415:530–536, 2002.
  30. V. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
  31. J.B. Welsh, L.M. Sapinoso, A.I. Su, S.G. Kern, J. Wang-Rodriguez, C.A. Moskaluk, H.F. Frierson Jr, and G.M. Hampton. Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Research, 61:5974–5978, 2001.
  32. J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V. Vapnik. Feature Selection for SVMs. In Advances in Neural Information Processing Systems, volume 13, 2000.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • L. R. Grate (1)
  • C. Bhattacharyya (2, 3)
  • M. I. Jordan (2, 3)
  • I. S. Mian (1)

  1. Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley
  2. Department of EECS, University of California Berkeley, Berkeley
  3. Department of Statistics, University of California Berkeley, Berkeley