Pattern Analysis and Applications, Volume 9, Issue 2–3, pp 257–271

Decomposition methodology for classification tasks: a meta decomposer framework

  • Lior Rokach
Theoretical Advances

Abstract

The idea of decomposition methodology for classification tasks is to break down a complex classification task into several simpler and more manageable sub-tasks that are solvable by existing induction methods, and then to join their solutions together in order to solve the original problem. In this paper we provide an overview of popular but diverse decomposition methods and introduce a related taxonomy to categorize them. Subsequently, we suggest using this taxonomy to create a novel meta-decomposer framework that automatically selects the appropriate decomposition method for a given problem. The experimental study validates the effectiveness of the proposed meta-decomposer on a set of benchmark datasets.

Keywords

Decomposition Method, Target Attribute, Decomposition Structure, Function Decomposition, Good Decomposition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgment

The author would like to thank Professor Oded Maimon for his helpful and inspiring comments.

Copyright information

© Springer-Verlag London Limited 2006

Authors and Affiliations

  1. Department of Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel