Abstract
The main problem with localized discriminant techniques is the curse of dimensionality, which seems to restrict their use to settings with few variables. However, if localization is combined with dimension reduction, the restriction on the initial number of variables is relaxed. In particular, it is shown that localization yields powerful classifiers even in higher dimensions when it is combined with locally adaptive selection of predictors. A robust localized logistic regression (LLR) method is developed in which all tuning parameters are chosen data-adaptively. In an extended simulation study we evaluate the potential of the proposed procedure for various types of data and compare it to other classification procedures. In addition, we demonstrate that the automatic choice of localization, predictor-selection and penalty parameters based on cross-validation works well. Finally, the method is applied to real data sets and its real-world performance is compared to that of alternative procedures.
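The core idea of localized logistic regression can be illustrated with a short sketch: to classify a query point, each training observation is weighted by a kernel of its distance to the query, and a penalized logistic regression is fitted on these weights. The function below is an illustrative reconstruction under stated assumptions (a Gaussian kernel, a fixed bandwidth, and an L2 ridge penalty via scikit-learn's `C`), not the authors' exact LLR procedure; in the paper, bandwidth, predictor selection and penalty are all chosen data-adaptively.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def local_logistic_predict(X, y, x0, bandwidth=1.0, C=1.0):
    """Estimate P(y=1 | x0) by kernel-weighted (localized) logistic regression.

    Illustrative sketch only: Gaussian kernel weights and a fixed ridge
    penalty stand in for the data-adaptive choices made in the paper.
    """
    # Gaussian kernel weight for each training point, centred at the query x0
    d2 = np.sum((X - x0) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Ridge-penalized logistic fit on the locally weighted sample
    clf = LogisticRegression(penalty="l2", C=C)
    clf.fit(X, y, sample_weight=w)
    return clf.predict_proba(x0.reshape(1, -1))[0, 1]

# Toy example: two Gaussian classes in two dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.repeat([0, 1], 50)
p = local_logistic_predict(X, y, np.array([1.0, 1.0]), bandwidth=2.0)
```

A separate local fit is performed for every query point, which is what makes the classifier adapt to local structure; the price is one model fit per prediction.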
Keywords
local logistic regression; discrimination; data-adaptive tuning parameters; selection of predictors; localized discrimination