Knowledge and Information Systems, Volume 49, Issue 1, pp 1–59

Multi-view ensemble learning: an optimal feature set partitioning for high-dimensional data classification

Regular Paper

Abstract

Multi-view ensemble learning has the potential to address issues arising from the high dimensionality of data. It attempts to utilize all of the relevant features while discarding only the irrelevant ones. A view of a dataset is the sub-table of the training data restricted to a subset of the feature set. Discarding the irrelevant features and partitioning the relevant ones into subsets is useful both for dimension reduction and for dealing with the problem of having fewer training examples than even the reduced set of relevant features. However, an arbitrary feature set partitioning into blocks of relevant features may not yield multiple-view-based classifiers with good classification performance. In this work, an optimal feature set partitioning (OFSP) approach is proposed, in which ensemble learning from the resulting views aims to maximize classifier performance. The experiments compare random feature set partitioning, attribute bagging, view generation using attribute clustering, view construction using a genetic algorithm, and the proposed OFSP method. The blocks of relevant feature subsets are used to construct multi-view classifier ensembles using the k-nearest neighbor, naïve Bayes, and support vector machine algorithms, applied to sixteen high-dimensional datasets from the UCI Machine Learning Repository. The performance measures considered for comparison are classification accuracy, disagreement among the classifiers, execution time, and percentage reduction of attributes.
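To make the general scheme concrete, the sketch below partitions a feature set into disjoint blocks (views), trains one base classifier per view on the corresponding sub-table, combines predictions by majority vote, and reports the pairwise disagreement among the view classifiers. It is a minimal illustration in Python/scikit-learn, not the paper's OFSP algorithm: the random partition, the breast-cancer dataset, and the choice of k-nearest neighbor as the sole base learner are stand-in assumptions, whereas OFSP searches for an optimal partition and the paper also uses naïve Bayes and support vector machines.

```python
# Minimal multi-view ensemble sketch (illustration only, not the paper's OFSP:
# the partition here is random rather than optimized).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def random_feature_partition(n_features, n_views, rng):
    """Shuffle the feature indices and split them into disjoint blocks (views)."""
    return np.array_split(rng.permutation(n_features), n_views)

def fit_multiview_ensemble(X, y, views):
    """Train one base classifier per view, i.e. on the sub-table X[:, view]."""
    return [KNeighborsClassifier(n_neighbors=5).fit(X[:, v], y) for v in views]

def predict_majority(models, views, X):
    """Combine the view-specific predictions by unweighted majority vote."""
    votes = np.stack([m.predict(X[:, v]) for m, v in zip(models, views)])
    # Each column holds one test example's votes across all views.
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)            # 30 features, 2 classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

views = random_feature_partition(X.shape[1], 3, rng)  # 3 disjoint feature blocks
models = fit_multiview_ensemble(X_tr, y_tr, views)
print("ensemble accuracy:", accuracy_score(y_te, predict_majority(models, views, X_te)))

# Disagreement: fraction of test examples on which two view classifiers differ,
# averaged over all pairs of views (one of the comparison measures named above).
preds = [m.predict(X_te[:, v]) for m, v in zip(models, views)]
pairs = [(i, j) for i in range(len(preds)) for j in range(i + 1, len(preds))]
print("mean pairwise disagreement:", np.mean([np.mean(preds[i] != preds[j]) for i, j in pairs]))
```

Replacing `random_feature_partition` with a search over block assignments that maximizes ensemble performance (for instance, a genetic algorithm over candidate partitions) turns this baseline into the kind of optimized feature set partitioning the paper investigates.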

Keywords

Classification · Feature set partitioning · High dimensionality · Multi-view ensemble learning


Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India
