Machine Learning, Volume 65, Issue 1, pp 273–308

Training a reciprocal-sigmoid classifier by feature scaling-space

Abstract

This paper presents a reciprocal-sigmoid model for pattern classification. The proposed classifier can be considered a Φ-machine, since it preserves the theoretical advantage of linear machines that the weight parameters can be estimated in a single step. The model can also be considered an approximation to logistic regression under the framework of Generalized Linear Models. While inheriting the necessary classification capability from logistic regression, the proposed formulation is free of local minima and of tedious recursive search. To handle possible over-fitting when using high-order models, the classifier is trained on multiple samples of uniformly scaled pattern features. Empirically, the classifier is first evaluated on a benchmark synthetic data set over random sampling runs to provide initial statistical evidence regarding its classification accuracy and computational efficiency. Additional experiments based on ten runs of 10-fold cross-validation on 40 data sets further support the effectiveness of the reciprocal-sigmoid model, whose classification accuracy is comparable to that of several top classifiers in the literature. The good performance is attributed mainly to the effective use of the reciprocal sigmoid for embedding nonlinearities and of bundled feature sets for smoothing the training-error hypersurface.
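The single-step training idea described above can be sketched in code. The sketch below is a minimal illustration under stated assumptions: the exact reciprocal-sigmoid basis, the scale set, and the ridge term are hypothetical choices standing in for the paper's formulation. The key property it demonstrates is that a Φ-machine is linear in its parameters, so the weights follow from one closed-form least-squares solve, with no iterative search and hence no local minima; the concatenation of uniformly scaled feature copies stands in for the "feature scaling-space".

```python
import numpy as np

def phi(X, scales=(0.5, 1.0, 2.0)):
    """Map raw features into a Phi-machine basis.

    Illustrative reading of the idea: apply a sigmoid-type
    nonlinearity to several uniformly scaled copies of the
    features and concatenate them (the 'scaling-space' bundle).
    The basis form and scale set here are assumptions."""
    cols = [np.ones((X.shape[0], 1))]              # bias column
    for s in scales:
        cols.append(1.0 / (1.0 + np.exp(-s * X)))  # assumed sigmoid basis at scale s
    return np.hstack(cols)

def fit(X, y, ridge=1e-3):
    """Single-step weight estimate via regularised least squares.

    Because the model is linear in its parameters, the weights are
    the solution of one linear system (the normal equations)."""
    P = phi(X)
    A = P.T @ P + ridge * np.eye(P.shape[1])
    return np.linalg.solve(A, P.T @ y)

def predict(X, w):
    # Targets are coded 0/1, so threshold the regression output at 0.5.
    return (phi(X) @ w > 0.5).astype(int)

# Toy usage: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
w = fit(X, y)
acc = (predict(X, w) == y).mean()
```

Note the design point the abstract emphasises: `fit` performs no gradient descent. The entire estimation is the one call to `np.linalg.solve`, which is what distinguishes the Φ-machine route from iteratively trained logistic regression.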

Keywords

Machine learning · Pattern classification · Φ-machine · Polynomials · Parameter estimation


Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  1. Biometrics Engineering Research Center, School of Electrical & Electronic Engineering, Yonsei University, Seoul, Korea