A Parallel Classification and Feature Reduction Method for Biomedical Applications

  • Mario R. Guarracino
  • Salvatore Cuciniello
  • Davide Feminiano
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4967)

Abstract

Classification is one of the most widely used methods in data mining, with numerous applications in biomedicine. The scale and resolution of the data involved in many real-life applications demand very efficient implementations of classification methods, designed to run on parallel or distributed computational systems. In this study we describe SVD-ReGEC, a fully parallel implementation, for distributed memory multicomputers, of a classification algorithm with a feature reduction preprocessing stage. Classification is performed by the Regularized Generalized Eigenvalue Classifier (ReGEC), and the preprocessing stage is a filter method based on the Singular Value Decomposition (SVD), which reduces the dimension of the space in which classification is accomplished. The implementation is tested on random datasets and the results are discussed using standard performance parameters.
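As a rough serial illustration of the two stages the abstract describes, SVD-based filter preprocessing followed by ReGEC classification, the following NumPy sketch may help. It is not the authors' parallel implementation: the regularization (a simple Tikhonov shift with parameter `delta`) and the helper names (`svd_reduce`, `regec_fit`, `regec_predict`) are illustrative assumptions.

```python
import numpy as np

def svd_reduce(X, k):
    """Filter-style feature reduction: project the data onto the
    subspace spanned by the top-k right singular vectors of X."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:k].T                                  # d x k projection matrix
    return X @ P, P

def regec_fit(A, B, delta=1e-3):
    """Generalized eigenvalue classifier (serial sketch).
    Finds two proximal hyperplanes w.x - gamma = 0: one close to
    class A and far from class B, and one the other way around."""
    Aa = np.hstack([A, -np.ones((len(A), 1))])    # extra column absorbs gamma
    Ba = np.hstack([B, -np.ones((len(B), 1))])
    G = Aa.T @ Aa + delta * np.eye(Aa.shape[1])   # Tikhonov shift keeps
    H = Ba.T @ Ba + delta * np.eye(Ba.shape[1])   # both matrices definite
    # Generalized eigenproblem G z = lambda H z, solved here via H^{-1} G.
    vals, vecs = np.linalg.eig(np.linalg.solve(H, G))
    order = np.argsort(vals.real)
    z_A = vecs[:, order[0]].real                  # min eigenvalue: plane for A
    z_B = vecs[:, order[-1]].real                 # max eigenvalue: plane for B
    return z_A, z_B

def regec_predict(X, z_A, z_B):
    """Assign each point to the class of its nearest hyperplane."""
    Xa = np.hstack([X, -np.ones((len(X), 1))])
    d_A = np.abs(Xa @ z_A) / np.linalg.norm(z_A[:-1])
    d_B = np.abs(Xa @ z_B) / np.linalg.norm(z_B[:-1])
    return np.where(d_A <= d_B, 0, 1)
```

In the parallel version described in the paper, the dense SVD and eigensolver steps sketched here would instead be distributed across processors; the cited ScaLAPACK and PBLAS libraries provide exactly such distributed-memory counterparts of these dense linear algebra kernels.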

Keywords

Binary classification · Generalized Eigenvalue Classifier · Feature transformation

References

  1. Cannataro, M., Talia, D., Srimani, P.: Parallel data intensive computing in scientific and commercial applications. Par. Comp. 28(5), 673–704 (2002)
  2. Oja, E.: A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology 15, 267–273 (1982)
  3. Wall, M., Dyck, P., Brettin, T.: SVDMAN – Singular Value Decomposition analysis of microarray data. Bioinformatics 17(6), 566–568 (2001)
  4. Golub, T., et al.: Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999)
  5. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Heidelberg (1995)
  6. Osuna, E., Freund, R., Girosi, F.: An improved training algorithm for support vector machines. In: IEEE Workshop on Neural Networks for Signal Processing, pp. 276–285 (1997)
  7. Platt, J.: Fast training of SVMs using sequential minimal optimization. In: Advances in Kernel Methods: Support Vector Learning, pp. 185–208. MIT Press, Cambridge (1999)
  8. Graf, H., Cosatto, E., Bottou, L., Dourdanovic, I., Vapnik, V.: Parallel support vector machines: the cascade SVM. In: Proc. of Neural Information Processing Systems (NIPS), vol. 17. MIT Press (2004)
  9. Mangasarian, O., Wild, E.: Multisurface proximal support vector classification via generalized eigenvalues. Technical Report 04-03, Data Mining Institute (September 2004)
  10. Guarracino, M.R., Cifarelli, C., Seref, O., Pardalos, P.M.: A classification algorithm based on generalized eigenvalue problems. Opt. Meth. Soft. 22(1), 73–81 (2007)
  11. Duda, R., Hart, P., Stork, D.: Pattern Classification. Wiley-Interscience, Chichester (2000)
  12. Yan, R.: A MATLAB package for classification algorithms (2006), http://finalfantasyxi.inf.cs.cmu.edu/tmp/MATLABArsenal.zip
  13. Hedenfalk, I., et al.: Gene-expression profiles in hereditary breast cancer. The New England Journal of Medicine 344, 539–548 (2001)
  14. Nutt, C., et al.: Oligonucleotide microarray for prediction of early intrahepatic recurrence of hepatocellular carcinoma after curative resection. The Lancet 63(7), 1602–1607 (2003)
  15. Blake, C., Merz, C.: UCI Repository of Machine Learning Databases (1998), www.ics.uci.edu/~mlearn/MLRepository.html
  16. Dongarra, J., Whaley, R.: A user's guide to the BLACS v1.1. Technical Report UT-CS-95-281, Dept. of CS, U. of Tennessee, Knoxville (1995)
  17. Gropp, W., Lusk, E., Skjellum, A.: Using MPI: Portable Parallel Programming with the Message Passing Interface, 2nd edn. The MIT Press, Cambridge (1999)
  18. Choi, J., Demmel, J., Dhillon, I., Dongarra, J., Ostrouchov, S., Petitet, A., Stanley, K., Walker, D., Whaley, R.: ScaLAPACK: A portable linear algebra library for distributed memory computers – design and performance. Comp. Phys. Comm. 97, 1–15 (1996)
  19. Choi, J., Dongarra, J., Ostrouchov, S., Petitet, A., Walker, D., Whaley, R.: A proposal for a set of parallel basic linear algebra subprograms. Technical Report UT-CS-95-292, Dept. of CS, U. of Tennessee, Knoxville (1995)
  20. Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., Ostrouchov, S., Sorensen, D.: LAPACK Users' Guide, 2nd edn. SIAM, Philadelphia (1995)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Mario R. Guarracino (1)
  • Salvatore Cuciniello (1)
  • Davide Feminiano (1)

  1. High Performance Computing and Networking Institute, Italian Research Council