On the Statistical Comparison of Inductive Learning Methods

  • A. Feelders
  • W. Verkooijen
Part of the Lecture Notes in Statistics book series (LNS, volume 112)


Experimental comparisons between statistical and machine learning methods appear with increasing frequency in the literature. However, there seems to be no consensus on how such comparisons should be performed in a methodologically sound way. In particular, the effect of testing multiple hypotheses on the probability of producing a "false alarm" is often ignored.

We transfer multiple comparison procedures from the statistical literature to the type of study discussed in this paper. These testing procedures take the number of tests performed into account, thereby controlling the probability of generating "false alarms". The selected multiple comparison procedures are illustrated on well-known regression and classification data sets.
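The multiplicity problem the abstract refers to can be sketched with the simplest such procedure, a Bonferroni adjustment: each raw p-value is multiplied by the number of tests performed before being compared to the significance level. This is only an illustrative sketch, not the specific procedures from the paper; the method pairs and p-values below are hypothetical.

```python
# Bonferroni adjustment for multiple pairwise comparisons of learning
# methods. Testing several pairs at level 0.05 each inflates the overall
# probability of a "false alarm"; multiplying each raw p-value by the
# number of tests (capped at 1) keeps the family-wise error rate at 0.05.

def bonferroni(p_values):
    """Adjust each p-value for the number of tests performed."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]

# Hypothetical raw p-values, e.g. from paired tests of cross-validated
# error rates for each pair of methods (names are illustrative only).
raw = {
    ("LDA", "neural net"): 0.030,
    ("LDA", "CART"): 0.012,
    ("neural net", "CART"): 0.200,
}

adjusted = dict(zip(raw, bonferroni(list(raw.values()))))
for pair, p in adjusted.items():
    # A pair counts as significantly different only after adjustment.
    print(pair, round(p, 3), "significant" if p < 0.05 else "n.s.")
```

Note that the pair (LDA, neural net), nominally significant at 0.030, is no longer significant once the adjustment accounts for the three tests performed.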







Copyright information

© Springer-Verlag New York, Inc. 1996

Authors and Affiliations

  • A. Feelders, Department of Computer Science, University of Twente, Enschede, The Netherlands
  • W. Verkooijen, Department of Economics, Tilburg University, Tilburg, The Netherlands
