
An efficient orientation distance–based discriminative feature extraction method for multi-classification

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Feature extraction is an important step before actual learning. Although many feature extraction methods have been proposed for clustering, classification and regression, very limited work has addressed multi-class classification problems. This paper proposes a novel feature extraction method, called orientation distance–based discriminative (ODD) feature extraction, designed specifically for multi-class classification. The proposed method works in two steps. In the first step, we extend the Fisher discriminant idea to determine an appropriate kernel function and map the input data, with all its classes, into a feature space in which the classes are well separated. In the second step, we put forward two variants of ODD features: one-vs-all-based and one-vs-one-based ODD features. We first construct SVM hyperplanes in the feature space under the one-vs-all or one-vs-one scheme; we then extract, for each sample, the one-vs-all-based or one-vs-one-based ODD features between that sample and each hyperplane. These newly extracted ODD features serve as the representative features and are used in the subsequent classification phase. Extensive experiments have been conducted to investigate the performance of one-vs-all-based and one-vs-one-based ODD features for multi-class classification. The statistical results show that classification based on ODD features achieves higher accuracy than state-of-the-art feature extraction methods.
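The two-step pipeline described in the abstract can be sketched in code. The following is a minimal illustration, not the paper's implementation: it uses scikit-learn, fixes an RBF kernel instead of performing the paper's Fisher-discriminant-based kernel selection, and uses each SVM's decision value as a stand-in for the orientation distance feature. The dataset, classifier choices, and the `odd_features` helper are all assumptions for demonstration.

```python
# Hypothetical sketch of one-vs-all ODD-style feature extraction.
# Assumptions: scikit-learn API, a fixed RBF kernel (the paper's kernel
# selection step is omitted), and SVM decision values as proxies for
# orientation distances.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: one binary SVM hyperplane per class (one-vs-all scheme),
# constructed in the kernel-induced feature space.
classes = np.unique(y_tr)
svms = [SVC(kernel="rbf", gamma="scale").fit(X_tr, (y_tr == c).astype(int))
        for c in classes]

# Step 2: the signed decision value of a sample w.r.t. each hyperplane
# becomes one feature, yielding a len(classes)-dimensional representation.
def odd_features(X):
    return np.column_stack([m.decision_function(X) for m in svms])

# The extracted features then feed an ordinary classifier.
clf = KNeighborsClassifier(n_neighbors=3).fit(odd_features(X_tr), y_tr)
acc = clf.score(odd_features(X_te), y_te)
print(f"accuracy on ODD-style features: {acc:.2f}")
```

The one-vs-one variant would instead train one SVM per pair of classes, producing k(k-1)/2 features for k classes rather than k.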



Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments. This work is supported in part by US NSF through grants IIS-0905215, CNS-1115234, IIS-0914934, DBI-0960443, and OISE-1129076, US Department of Army through grant W911NF-12-1-0066, Google Mobile 2014 Program, HUAWEI and KAU grants, Natural Science Foundation of China (61070033, 61203280, 61202270), Natural Science Foundation of Guangdong province (9251009001000005, S2011040004187, S2012040007078), Specialized Research Fund for the Doctoral Program of Higher Education (20124420120004), Australian Research Council Discovery Grant (DP1096218, DP130102691) and ARC Linkage Grant (LP100200774 and LP120100566).

Author information


Corresponding author

Correspondence to Yanshan Xiao.


About this article

Cite this article

Liu, B., Xiao, Y., Yu, P.S. et al. An efficient orientation distance–based discriminative feature extraction method for multi-classification. Knowl Inf Syst 39, 409–433 (2014). https://doi.org/10.1007/s10115-013-0613-2
