Data Mining using Decomposition Methods

  • Lior Rokach
  • Oded Maimon


The idea of decomposition methodology is to break down a complex Data Mining task into several smaller, less complex, and more manageable sub-tasks that are solvable with existing tools, and then to join their solutions together in order to solve the original problem. In this chapter we provide an overview of decomposition methods in classification tasks, with emphasis on elementary decomposition methods. We present the main properties that characterize various decomposition frameworks and the advantages of using these frameworks. Finally, we discuss the uniqueness of decomposition methodology as opposed to other closely related fields, such as ensemble methods and distributed data mining.
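
As a rough illustration of the "split into sub-tasks, then join the solutions" idea, the sketch below decomposes a toy classification task by feature subsets, trains one simple model per subset, and joins the models by majority vote. It is not the chapter's algorithm: the nearest-centroid learner, the voting rule, the toy data, and all function names (`train_centroids`, `fit_decomposed`, `predict_decomposed`) are assumptions chosen only to keep the example self-contained.

```python
from collections import Counter


def train_centroids(rows, labels, features):
    """Fit a nearest-centroid model using only the given feature indices."""
    sums, counts = {}, Counter()
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += row[f]
        counts[label] += 1
    # One centroid (mean vector over the chosen features) per class label.
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(model, row, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((row[f] - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: dist(model[label]))


def fit_decomposed(rows, labels, feature_subsets):
    """Sub-task step: train one small model per (hypothetical) feature subset."""
    return [(subset, train_centroids(rows, labels, subset)) for subset in feature_subsets]


def predict_decomposed(models, row):
    """Join step: combine the sub-models' predictions by majority vote."""
    votes = Counter(predict_centroid(model, row, subset) for subset, model in models)
    return votes.most_common(1)[0][0]


# Toy data (illustrative only): four numeric features, two classes.
rows = [
    [1.0, 1.0, 1.0, 1.0],
    [1.2, 0.8, 1.1, 0.9],
    [5.0, 5.0, 5.0, 5.0],
    [4.8, 5.2, 4.9, 5.1],
]
labels = ["a", "a", "b", "b"]

# Three disjoint feature subsets; an odd count avoids tied votes for two classes.
models = fit_decomposed(rows, labels, [(0,), (1, 2), (3,)])
print(predict_decomposed(models, [1.0, 5.0, 5.0, 1.0]))
```

In the last call, the sub-model on features 1 and 2 disagrees with the other two, and the vote resolves the disagreement. Other joining rules (for example, combining class probabilities) would fit the same structure.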

Key words

Decomposition, Mixture-of-Experts, Elementary Decomposition Methodology, Function Decomposition, Distributed Data Mining, Parallel Data Mining





Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Information System Engineering, Ben-Gurion University, Beer-Sheba, Israel
  2. Department of Industrial Engineering, Tel-Aviv University, Ramat-Aviv, Israel