Statistics and Computing, Volume 15, Issue 3, pp 155–166

Localized classification

Abstract

The main problem with localized discriminant techniques is the curse of dimensionality, which seems to restrict their use to problems with few variables. However, if localization is combined with a reduction of dimension, the initial number of variables is less restricted. In particular, it is shown that localization yields powerful classifiers even in higher dimensions if it is combined with locally adaptive selection of predictors. A robust localized logistic regression (LLR) method is developed for which all tuning parameters are chosen data-adaptively. In an extended simulation study we evaluate the potential of the proposed procedure for various types of data and compare it to other classification procedures. In addition, we demonstrate that the automatic choice of localization, predictor selection and penalty parameters based on cross validation works well. Finally, the method is applied to real data sets and its real-world performance is compared to alternative procedures.
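The localization idea behind LLR can be illustrated with a short sketch: each query point gets its own kernel-weighted, ridge-penalized logistic fit, so the decision rule adapts to the local structure of the data around that point. The code below is a minimal illustration, not the authors' implementation; it assumes a Gaussian kernel, uses scikit-learn's LogisticRegression as the weighted base learner, fixes the bandwidth h and the penalty C instead of choosing them data-adaptively by cross validation, and omits the locally adaptive predictor selection step described in the abstract.

```python
# Minimal sketch of localized logistic regression: for each query point a
# logistic model is fitted to the training data with kernel weights that
# decay with distance from the query point. Bandwidth h and penalty C are
# fixed here for brevity; the paper chooses such tuning parameters by
# cross validation and adds locally adaptive predictor selection.
import numpy as np
from sklearn.linear_model import LogisticRegression

def llr_predict(X_train, y_train, X_query, h=1.0, C=1.0):
    """Predict class labels for X_query via kernel-weighted logistic fits."""
    preds = []
    for x0 in X_query:
        # Gaussian kernel weights from squared Euclidean distance to x0
        d2 = np.sum((X_train - x0) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * h ** 2))
        # Ridge (L2) penalty keeps the local fit stable when only few
        # observations receive appreciable weight.
        clf = LogisticRegression(penalty="l2", C=C)
        clf.fit(X_train, y_train, sample_weight=w)
        preds.append(clf.predict(x0.reshape(1, -1))[0])
    return np.array(preds)

# Example usage on synthetic two-class data with a non-linear boundary
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(int)
    X_new = rng.normal(size=(5, 2))
    print(llr_predict(X, y, X_new, h=0.7, C=1.0))
```

Refitting a weighted model per query point is computationally heavier than a single global fit, which is why the bandwidth and penalty (and, in the paper, the set of predictors) have to be tuned carefully rather than set by hand as in this sketch.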

Keywords

Local logistic regression · Discrimination · Data adaptive tuning parameters · Selection of predictors · Localized discrimination

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  1. Institut für Statistik, Ludwig-Maximilians-Universität München, München, Germany
  2. Klinik und Poliklinik für Psychiatrie und Psychotherapie, Universität Regensburg, Germany