The Loss Rank Principle for Model Selection

  • Marcus Hutter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4539)

Abstract

A key issue in statistics and machine learning is to automatically select the “right” model complexity, e.g. the number of neighbors to average over in k-nearest-neighbor (kNN) regression or the polynomial degree in polynomial regression. We suggest a novel principle, the Loss Rank Principle (LoRP), for model selection in regression and classification. It is based on the loss rank, which counts how many other (fictitious) data would be fitted better. LoRP selects the model with minimal loss rank. Unlike most penalized maximum-likelihood variants (AIC, BIC, MDL), LoRP depends only on the regression functions and the loss function. It works without a stochastic noise model and is directly applicable to any non-parametric regressor, such as kNN.
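To make the counting idea concrete, below is a minimal Monte-Carlo sketch for 1-D kNN regression: for each candidate k it estimates how many fictitious target vectors y' would incur a smaller empirical loss than the observed targets, and picks the k with the smallest such rank. This is only an illustration of the idea stated in the abstract, not the paper's closed-form computation (which, per the keywords, uses a trace formula for regressors that are linear in y); the helper names (knn_fit, loss_rank), the uniform reference range for fictitious targets, and the Monte-Carlo sample count are assumptions made here for the sketch.

```python
import numpy as np

def knn_fit(x, y, k):
    """1-D kNN regression: predict each target as the mean of its k nearest inputs."""
    n = len(x)
    yhat = np.empty(n)
    for i in range(n):
        idx = np.argsort(np.abs(x - x[i]))[:k]  # k nearest neighbours of x[i] (incl. itself)
        yhat[i] = y[idx].mean()
    return yhat

def empirical_loss(x, y, k):
    """Squared loss of the kNN regressor on the data it was fitted to."""
    return float(np.sum((y - knn_fit(x, y, k)) ** 2))

def loss_rank(x, y, k, n_samples=2000, seed=0):
    """Monte-Carlo estimate of the fraction of fictitious targets fitted better than y."""
    rng = np.random.default_rng(seed)
    observed = empirical_loss(x, y, k)
    # Assumption for this sketch: fictitious targets drawn uniformly from the observed range.
    fictitious = rng.uniform(y.min(), y.max(), size=(n_samples, len(y)))
    better = sum(empirical_loss(x, yp, k) < observed for yp in fictitious)
    return better / n_samples

# Usage: choose the k with the smallest estimated loss rank.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 40))
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=40)
ranks = {k: loss_rank(x, y, k) for k in range(1, 11)}
print(min(ranks, key=ranks.get))
```

Note that no noise model enters anywhere: the only ingredients are the regressor (here kNN) and the loss function, which is the point the abstract makes in contrasting LoRP with AIC, BIC, and MDL.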

Keywords

Loss Function; Bayesian Information Criterion; Trace Formula; Minimum Description Length; Kernel Regression
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. [Aka73] Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Proc. 2nd International Symposium on Information Theory, pp. 267–281. Akadémiai Kiadó, Budapest, Hungary (1973)
  2. [BFG96] Bai, Z., Fahey, M., Golub, G.: Some large-scale matrix computation problems. Journal of Computational and Applied Mathematics 74(1–2), 71–89 (1996)
  3. [Grü04] Grünwald, P.D.: Tutorial on minimum description length. In: Minimum Description Length: Recent Advances in Theory and Practice, chapters 1 and 2. MIT Press, Cambridge (2004), http://www.cwi.nl/~pdg/ftp/mdlintro.pdf
  4. [HTF01] Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning. Springer, Heidelberg (2001)
  5. [Mac92] MacKay, D.J.C.: Bayesian interpolation. Neural Computation 4(3), 415–447 (1992)
  6. [Reu02] Reusken, A.: Approximation of the determinant of large sparse symmetric positive definite matrices. SIAM Journal on Matrix Analysis and Applications 23(3), 799–818 (2002)
  7. [Ris78] Rissanen, J.J.: Modeling by shortest data description. Automatica 14(5), 465–471 (1978)
  8. [Sch78] Schwarz, G.: Estimating the dimension of a model. Annals of Statistics 6(2), 461–464 (1978)

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Marcus Hutter
  1. RSISE @ ANU and SML @ NICTA, Canberra, ACT 0200, Australia
