Knowledge and Information Systems, Volume 39, Issue 2, pp 409–433

An efficient orientation distance–based discriminative feature extraction method for multi-classification

  • Bo Liu
  • Yanshan Xiao
  • Philip S. Yu
  • Zhifeng Hao
  • Longbing Cao
Regular Paper


Feature extraction is an important step before actual learning. Although many feature extraction methods have been proposed for clustering, classification and regression, very limited work has been done on multi-class classification problems. This paper proposes a novel feature extraction method, called orientation distance–based discriminative (ODD) feature extraction, designed particularly for multi-class classification problems. Our proposed method works in two steps. In the first step, we extend the Fisher discriminant idea to determine an appropriate kernel function and map the input data of all classes into a feature space where the classes are well separated. In the second step, we put forward two variants of ODD features, namely one-vs-all-based and one-vs-one-based ODD features. We first construct hyperplanes (using SVMs) in the feature space under the one-vs-all or one-vs-one scheme; we then extract the one-vs-all-based or one-vs-one-based ODD features between a sample and each hyperplane. These newly extracted ODD features are treated as the representative features and are used in the subsequent classification phase. Extensive experiments have been conducted to investigate the performance of one-vs-all-based and one-vs-one-based ODD features for multi-class classification. The statistical results show that classification based on ODD features outperforms the state-of-the-art feature extraction methods in accuracy.
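The second step of the one-vs-all variant can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the per-class hyperplanes (w, b) have already been trained (e.g., by SVMs in the kernel-induced feature space), and it uses the signed point-to-hyperplane distance as a stand-in for the ODD feature; the function name `odd_features` is hypothetical.

```python
import math

def odd_features(x, hyperplanes):
    """Map a sample x to a vector of signed distances, one per
    one-vs-all hyperplane (w, b): d = (w . x + b) / ||w||.
    These distances form the new feature representation of x."""
    feats = []
    for w, b in hyperplanes:
        norm = math.sqrt(sum(wi * wi for wi in w))  # ||w||
        dot = sum(wi * xi for wi, xi in zip(w, x))  # w . x
        feats.append((dot + b) / norm)
    return feats

# Two toy hyperplanes in 2-D (one per class in a one-vs-all setup)
hps = [([1.0, 0.0], -1.0), ([0.0, 2.0], 0.0)]
print(odd_features([3.0, 1.0], hps))  # → [2.0, 1.0]
```

For a k-class problem the one-vs-all scheme yields k such distances per sample, which then feed the subsequent classifier; the one-vs-one variant would instead use one hyperplane per class pair.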


Keywords: Multi-class classification · Feature extraction · Support vector machine · One-against-all scheme · One-against-one scheme



Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments. This work is supported in part by US NSF through grants IIS-0905215, CNS-1115234, IIS-0914934, DBI-0960443, and OISE-1129076, US Department of Army through grant W911NF-12-1-0066, Google Mobile 2014 Program, HUAWEI and KAU grants, Natural Science Foundation of China (61070033, 61203280, 61202270), Natural Science Foundation of Guangdong province (9251009001000005, S2011040004187, S2012040007078), Specialized Research Fund for the Doctoral Program of Higher Education (20124420120004), Australian Research Council Discovery Grants (DP1096218, DP130102691) and ARC Linkage Grants (LP100200774 and LP120100566).



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Bo Liu (1, 2)
  • Yanshan Xiao (3)
  • Philip S. Yu (2, 4)
  • Zhifeng Hao (3)
  • Longbing Cao (5)

  1. Faculty of Automation, Guangdong University of Technology, Guangzhou, China
  2. Department of Computer Science, University of Illinois at Chicago, Chicago, USA
  3. Faculty of Computer Science, Guangdong University of Technology, Guangzhou, China
  4. Computer Science Department, King Abdulaziz University, Jeddah, Saudi Arabia
  5. Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia
