Journal of Intelligent Manufacturing, Volume 17, Issue 3, pp 285–299

Data Mining for Improving the Quality of Manufacturing: A Feature Set Decomposition Approach

  • Lior Rokach
  • Oded Maimon


Data mining tools can be very beneficial for discovering interesting and useful patterns in complicated manufacturing processes. These patterns can be used, for example, to improve manufacturing quality. However, data accumulated in manufacturing plants have unique characteristics, such as an unbalanced distribution of the target attribute and a small training set relative to the number of input features. As a result, conventional methods are inaccurate in quality improvement cases. Recent research shows, however, that a decomposition tactic may be appropriate here, and this paper presents a new feature set decomposition methodology that is capable of dealing with the data characteristics associated with quality improvement. To examine the idea, a new algorithm called BOW (Breadth-Oblivious-Wrapper) has been developed. This algorithm performs a breadth-first search while using a new F-measure splitting criterion for multiple oblivious trees. The new algorithm was tested on various real-world manufacturing datasets, specifically from the food processing industry and integrated circuit fabrication. The obtained results were compared with those of other methods and indicate the superiority of the proposed methodology.
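To illustrate why an F-measure is a natural fit for an unbalanced target attribute, the following minimal sketch compares accuracy with the standard F-measure (the weighted harmonic mean of precision and recall) on a hypothetical batch. The function names, the `beta` parameter, and the defect counts are assumptions made for illustration only; this is not the BOW splitting criterion itself, whose exact definition is given in the paper rather than in this abstract.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Standard F-measure for the rare (positive) class from confusion counts.

    beta > 1 weights recall more heavily, beta < 1 weights precision more.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def accuracy(tp, fp, fn, tn):
    """Fraction of all items classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical batch of 1000 items, 20 of which are defective.
# Classifier A labels everything "non-defective".
a = dict(tp=0, fp=0, fn=20, tn=980)
# Classifier B finds 15 of the 20 defects at the cost of 10 false alarms.
b = dict(tp=15, fp=10, fn=5, tn=970)

for name, c in (("A", a), ("B", b)):
    acc = accuracy(**c)
    f1 = f_measure(c["tp"], c["fp"], c["fn"])
    print(f"classifier {name}: accuracy={acc:.3f}, F1={f1:.3f}")
```

In this made-up example the two classifiers are nearly indistinguishable by accuracy, while the F-measure separates them clearly because it ignores the abundant true negatives.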


Data mining, Quality engineering, Feature set-decomposition, Splitting criterion, F-measure





Copyright information

© Springer Science+Business Media, LLC 2006

Authors and Affiliations

  1. Department of Information System Engineering, Ben-Gurion University of the Negev, Israel
