Design Issues and Comparison of Methods for Microarray-Based Classification

  • Edward R. Dougherty
  • Sanju N. Attoor

9. Conclusion

Except in situations where the amount of data is large in comparison to the number of variables, classifier design and error estimation involve subtle issues. This is especially so in applications such as cancer classification where there is no prior knowledge concerning the vector-label distributions involved. It is clearly prudent to try to achieve classification using small numbers of genes and rules of low complexity (low VC dimension), and to use cross-validation when it is not possible to obtain large independent samples for testing. Even when one uses a cross-validation method such as leave-one-out estimation, one is still confronted by the high variance of the estimator. In many applications, large samples are impossible owing to either cost or availability. Therefore, it is unlikely that a statistical approach alone will provide satisfactory results. Rather, one can use the results of classification analysis to discover gene sets that potentially provide good discrimination, and then focus attention on these. In the same vein, one can utilize the common engineering approach of integrating data with human knowledge to arrive at satisfactory systems.
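The point about the high variance of leave-one-out estimation can be illustrated with a small simulation. The sketch below is not taken from the chapter: the one-dimensional Gaussian class model, the nearest-mean rule, and the names `draw_sample`, `nearest_mean_predict`, `loo_error`, and `loo_spread` are all assumptions chosen for illustration. It repeatedly draws a sample, computes the leave-one-out error estimate, and reports how much that estimate varies from sample to sample.

```python
import random
import statistics


def draw_sample(n_per_class, rng, shift=1.0):
    """Draw a 1-D two-class sample (assumed model): class 0 ~ N(0, 1), class 1 ~ N(shift, 1)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
    xs += [rng.gauss(shift, 1.0) for _ in range(n_per_class)]
    ys = [0] * n_per_class + [1] * n_per_class
    return xs, ys


def nearest_mean_predict(train_x, train_y, x):
    """Assign x to the class whose training mean is closer (a low-complexity rule)."""
    m0 = statistics.mean(v for v, y in zip(train_x, train_y) if y == 0)
    m1 = statistics.mean(v for v, y in zip(train_x, train_y) if y == 1)
    return 0 if abs(x - m0) <= abs(x - m1) else 1


def loo_error(xs, ys):
    """Leave-one-out estimate: hold out each point, train on the rest, count mistakes."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_mean_predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)


def loo_spread(n_per_class, n_repeats=200, seed=0):
    """Mean and standard deviation of the leave-one-out estimate over repeated samples."""
    rng = random.Random(seed)
    estimates = [loo_error(*draw_sample(n_per_class, rng)) for _ in range(n_repeats)]
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    for n in (5, 20, 50):
        mean_est, sd_est = loo_spread(n)
        print(f"n per class = {n:3d}: mean LOO error = {mean_est:.3f}, std = {sd_est:.3f}")
```

Under these assumptions, the standard deviation of the estimate should be noticeably larger for the smallest sample size than for the largest, which is the practical difficulty the paragraph above describes: a single leave-one-out figure from a small sample may sit far from the true error.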


Copyright information

© Springer Science+Business Media, Inc. 2006

Authors and Affiliations

  • Edward R. Dougherty (1)
  • Sanju N. Attoor (1)
  1. Department of Electrical Engineering, Texas A&M University, College Station, USA