Abstract
A flexible nonparametric method is proposed for classifying high-dimensional data with a complex structure. The proposed method can be regarded as an extended version of linear logistic discriminant procedures, in which the linear predictor is replaced by a radial-basis-expansion predictor. Radial basis functions with a hyperparameter are used to take the information on covariates and class labels into account; this was nearly impossible within the previously proposed hybrid learning framework. Penalized maximum likelihood estimation is employed to obtain stable parameter estimates. A crucial issue in the model-construction process is the choice of a suitable model from among the candidates. This issue is examined from information-theoretic and Bayesian viewpoints, and we employ the model evaluation criteria of Ando et al. (Japanese Journal of Applied Statistics, 31, 123–139, 2002). The proposed method is applicable not only to high-dimensional data but also to the variable selection problem. Real data analyses and Monte Carlo experiments show that the proposed method classifies future observations well in practical situations. The simulation results also show that the use of the hyperparameter in the basis functions improves prediction performance.
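The idea sketched in the abstract — replacing the linear predictor of logistic discrimination with a Gaussian radial-basis expansion and fitting the coefficients by penalized maximum likelihood — can be illustrated with a minimal toy example. The center choice, width hyperparameter `nu`, ridge penalty `lam`, and gradient-ascent fitting below are illustrative assumptions, not the authors' actual estimation or model-selection procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class problem with a nonlinear (circular) decision boundary
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(float)

def rbf_design(X, centers, nu):
    """Gaussian radial-basis expansion; `nu` is the width hyperparameter."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.hstack([np.ones((len(X), 1)), np.exp(-d2 / (2 * nu ** 2))])

# Crude center choice: a random subsample of the covariates
centers = X[rng.choice(len(X), size=10, replace=False)]
B = rbf_design(X, centers, nu=0.5)          # shape (200, 11): intercept + 10 bases

# Penalized maximum likelihood (ridge penalty) by averaged gradient ascent
lam = 1e-2
w = np.zeros(B.shape[1])
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-B @ w))        # class-1 posterior probabilities
    w += 1.0 * (B.T @ (y - p) / len(y) - lam * w)

acc = (((1.0 / (1.0 + np.exp(-B @ w))) > 0.5) == y).mean()
```

A linear logistic model cannot separate this circular boundary, while the basis-expanded predictor `B @ w` can; the penalty `lam` stabilizes the estimates exactly as the penalized likelihood in the paper is meant to do, though here the smoothing and width hyperparameters are fixed by hand rather than chosen by a model evaluation criterion.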
References
Akaike H. (1973). Information theory and an extension of the maximum likelihood principle. In Petrov B.N., Csaki F. (eds). 2nd International Symposium on Information Theory. Budapest, Akademiai Kiado, pp. 267–281
Akaike H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control AC-19: 716–723
Alizadeh A.A., Eisen M.B., Davis R.E., Ma C., Lossos I.S., Rosenwald A., et al. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503–511
Alpaydin E., Kaynak C. (1998) Cascading classifiers. Kybernetika 34, 369–374
Ando T. (2003). Kernel flexible discriminant analysis for classifying high-dimensional data with nonlinear structure and its applications (in Japanese). Proceedings of the Institute of Statistical Mathematics 51, 389–406
Ando T., Imoto S., Konishi S. (2001). Estimating nonlinear regression models based on radial basis function networks (in Japanese). Japanese Journal of Applied Statistics 30, 19–35
Ando T., Simauchi J., Konishi S. (2002). Nonlinear pattern recognition using radial basis function networks and its application (in Japanese). Japanese Journal of Applied Statistics 31, 123–139
Ando T., Imoto S., Konishi S. (2004). Adaptive learning machines for nonlinear classification and Bayesian information criteria. Bulletin of Informatics and Cybernetics 36, 147–162
Ando, T., Imoto, S., Konishi, S. (2005). Nonlinear regression modeling via regularized radial basis function networks. Journal of Statistical Planning and Inference (to appear).
Bishop C.M. (1995). Neural networks for pattern recognition. Oxford, Oxford University Press
Blake C.L., Merz C.J. (1998). UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html, University of California, Department of Information and Computer Sciences.
Breiman L., Friedman J.H., Olshen R.A., Stone C.J. (1984). Classification and regression trees. Belmont, CA, Wadsworth
Broomhead D.S., Lowe D. (1988). Multivariable functional interpolation and adaptive networks. Complex Systems 2, 321–335
Dudoit S., Fridlyand J., Speed T. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 97, 77–87
Eilers P.H.C., Marx B.D. (1996). Flexible smoothing with B-splines and penalties (with discussion). Statistical Science 11, 89–121
Fujii T., Konishi S. (2006). Nonlinear regression modeling via regularized wavelets and smoothing parameter selection. Journal of Multivariate Analysis 97, 2023–2033
Girosi F., Jones M., Poggio T. (1995). Regularization theory and neural architectures. Neural Computation 7, 219–269
Green P.J., Silverman B.W. (1994). Nonparametric regression and generalized linear models. London, Chapman & Hall
Golub T.R., Slonim D.K., Tamayo P., Huard C., Gaasenbeek M., Mesirov J.P., et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537
Hastie T., Tibshirani R., Buja A. (1994). Flexible discriminant analysis by optimal scoring. Journal of the American Statistical Association 89, 1255–1270
Hastie T., Tibshirani R., Friedman J. (2001). The elements of statistical learning. New York, Springer
Hosmer D.W., Lemeshow S. (1989). Applied logistic regression. New York, Wiley
Imoto S., Konishi S. (2003). Selection of smoothing parameters in B-spline nonparametric regression models using information criteria. Annals of the Institute of Statistical Mathematics 55, 671–687
Karayiannis N.B., Mi G.W. (1997). Growing radial basis neural networks: merging supervised and unsupervised learning with network growth techniques. IEEE Transactions on Neural Networks 8, 1492–1506
Kass R.E., Tierney L., Kadane J.B. (1990). The validity of posterior expansions based on Laplace’s method. In Geisser S., Hodges J.S., Press S.J., Zellner A. (eds). Essays in honor of George Barnard. Amsterdam, North-Holland, pp. 473–488
Konishi S., Kitagawa G. (1996). Generalised information criteria in model selection. Biometrika 83, 875–890
Konishi S., Ando T., Imoto S. (2004). Bayesian information criteria and smoothing parameter selection in radial basis function networks. Biometrika 91, 27–43
Kullback S., Leibler R.A. (1951). On information and sufficiency. Annals of Mathematical Statistics 22, 79–86
MacQueen J. (1967). Some methods for classification and analysis of multivariate observations. In LeCam L.M., Neyman J. (eds). Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, University of California Press, p. 281
Moody J., Darken C.J. (1989). Fast learning in networks of locally-tuned processing units. Neural Computation 1, 281–294
Nabney I.T. (2002). NETLAB: algorithms for pattern recognition. London, Springer
Nonaka Y., Konishi S. (2005). Nonlinear regression modeling using regularized local likelihood method. Annals of the Institute of Statistical Mathematics 57, 617–635
Ranganath S., Arun K. (1997). Face recognition using transform features and neural networks. Pattern Recognition 30, 1615–1622
Ripley B.D. (1994). Neural networks and related methods for classification. Journal of the Royal Statistical Society Series B 56, 409–456
Ripley B.D. (1996). Pattern recognition and neural networks. Cambridge, UK, Cambridge University Press
Sato T. (1996). On artificial neural networks as a statistical model (in Japanese). Proceedings of the Institute of Statistical Mathematics 44, 85–98
Schwarz G. (1978). Estimating the dimension of a model. Annals of Statistics 6, 461–464
Seber G.A.F. (1984). Multivariate observations. New York, Wiley
Shaffer A.L., Rosenwald A., Staudt L.M. (2002). Lymphoid malignancies: the dark side of B-cell differentiation. Nature Reviews Immunology 2, 920–933
Tierney L., Kadane J.B. (1986). Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association 81, 82–86
Tierney L., Kass R.E., Kadane J.B. (1989). Fully exponential Laplace approximations to expectations and variances of nonpositive functions. Journal of the American Statistical Association 84, 710–716
Troyanskaya O.G., Garber M.E., Brown P.O., Botstein D., Altman R.B. (2002). Nonparametric methods for identifying differentially expressed genes in microarray data. Bioinformatics 18, 1454–1461
Webb A. (1999). Statistical pattern recognition. London, Arnold
Xu L. (1998). RBF nets, mixture experts, and Bayesian ying-yang learning. Neurocomputing 19, 223–257
Xu L., Jordan M.I., Hinton G.E. (1995). An alternative model for mixtures of experts. In Cowan J.D., et al. (eds). Advances in Neural Information Processing Systems 7. Cambridge, MA, MIT Press, pp. 633–640
Zhou P., Levy N.B., Xie H., Qian L., Lee C.Y., Gascoyne R.D., et al. (2001). MCL1 transgenic mice exhibit a high incidence of B-cell lymphoma manifested as a spectrum of histologic subtypes. Blood 97, 3902–3909
Cite this article
Ando, T., Konishi, S. Nonlinear logistic discrimination via regularized radial basis functions for classifying high-dimensional data. Ann Inst Stat Math 61, 331–353 (2009). https://doi.org/10.1007/s10463-007-0143-3