A Self-supervised Learning Framework for Classifying Microarray Gene Expression Data

  • Yijuan Lu
  • Qi Tian
  • Feng Liu
  • Maribel Sanchez
  • Yufeng Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3992)


It is important to develop computational methods that can effectively resolve two intrinsic problems in microarray data: high dimensionality and small sample size. In this paper, we propose a self-supervised learning framework for classifying microarray gene expression data using the Kernel Discriminant-EM (KDEM) algorithm. This framework applies self-supervised learning techniques in an optimal nonlinear discriminating subspace. It efficiently utilizes a large set of unlabeled data to compensate for the insufficiency of a small set of labeled data, and it extends the linear algorithm in DEM to a kernel algorithm to handle nonlinearly separable data in a lower dimensional space. Extensive experiments on the Plasmodium falciparum expression profiles show the promising performance of the approach.
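
The abstract does not spell out the KDEM procedure, so the following is only a rough sketch of the general idea (alternating between a kernel discriminant projection and an EM-style relabeling of unlabeled samples), not the authors' implementation. It assumes a two-class problem, an RBF kernel, a one-dimensional kernel Fisher discriminant, and Gaussian class models in the projected space. All function names, parameters, and the synthetic data are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # Gaussian (RBF) kernel from pairwise squared Euclidean distances.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def kernel_discriminant_direction(K, R, reg=1e-3):
    # Weighted two-class kernel Fisher discriminant (assumed simplification).
    # K: (n, n) kernel matrix; R: (n, 2) soft class memberships.
    n = K.shape[0]
    means = []
    N = np.zeros((n, n))
    for c in range(2):
        w = R[:, c]
        m = K @ w / w.sum()            # weighted class mean in kernel space
        D = K - m[:, None]             # column i holds k_i - m
        N += (D * w[None, :]) @ D.T    # weighted within-class scatter
        means.append(m)
    # Regularize the scatter so the linear system is well conditioned.
    return np.linalg.solve(N + reg * np.eye(n), means[1] - means[0])

def e_step(z, R, labeled_idx, Y, var_floor=1e-6):
    # Fit one 1-D Gaussian per class from the current soft memberships,
    # then recompute class posteriors for every sample.
    log_p = np.zeros((z.size, 2))
    for c in range(2):
        w = R[:, c]
        mu = (w * z).sum() / w.sum()
        var = (w * (z - mu) ** 2).sum() / w.sum() + var_floor
        prior = w.sum() / R.sum()
        log_p[:, c] = (np.log(prior) - 0.5 * np.log(2 * np.pi * var)
                       - 0.5 * (z - mu) ** 2 / var)
    log_p -= log_p.max(axis=1, keepdims=True)   # stabilize before exp
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)
    post[labeled_idx] = Y[labeled_idx]          # labeled samples stay fixed
    return post

def kdem_sketch(X, y, gamma=1.0, n_iter=10):
    # y holds 0 or 1 for labeled samples and -1 for unlabeled samples.
    labeled_idx = np.flatnonzero(y >= 0)
    Y = np.zeros((X.shape[0], 2))
    Y[labeled_idx, y[labeled_idx]] = 1.0
    R = Y.copy()                                # unlabeled rows start at zero weight
    K = rbf_kernel(X, X, gamma)
    for _ in range(n_iter):
        alpha = kernel_discriminant_direction(K, R)
        z = K @ alpha                           # 1-D projection of all samples
        z = (z - z.mean()) / (z.std() + 1e-12)  # fix the arbitrary scale
        R = e_step(z, R, labeled_idx, Y)
    return R

# Synthetic stand-in data (not microarray data): 60 samples, 6 of them labeled.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (30, 5)), rng.normal(2, 1, (30, 5))])
y = np.full(60, -1)
y[[0, 1, 2, 30, 31, 32]] = [0, 0, 0, 1, 1, 1]
post = kdem_sketch(X, y, gamma=0.1)
pred = post.argmax(axis=1)                      # predicted class per sample
```

In this sketch the unlabeled samples contribute to the discriminant direction only through their current posteriors, which is one simple way to let a large unlabeled set compensate for a small labeled set.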


Keywords (machine-generated, not supplied by the authors): Plasmodium Falciparum; Unlabeled Data; Lower Dimensional Space; Small Sample Size Problem; Malaria Parasite Plasmodium Falciparum



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yijuan Lu (1)
  • Qi Tian (1)
  • Feng Liu (2)
  • Maribel Sanchez (3)
  • Yufeng Wang (3)
  1. Department of Computer Science, University of Texas at San Antonio, USA
  2. Department of Pharmacology, University of Texas Health Science Center at San Antonio, USA
  3. Department of Biology, University of Texas at San Antonio, USA
