
Data Mining for Improving the Quality of Manufacturing: A Feature Set Decomposition Approach

Journal of Intelligent Manufacturing

Abstract

Data mining tools can be very beneficial for discovering interesting and useful patterns in complicated manufacturing processes. These patterns can be used, for example, to improve manufacturing quality. However, data accumulated in manufacturing plants have unique characteristics, such as an imbalanced distribution of the target attribute and a small training set relative to the number of input features. Consequently, conventional methods are inaccurate in quality improvement settings. Recent research shows, however, that a decomposition tactic may be appropriate here. This paper presents a new feature set decomposition methodology that is capable of dealing with the data characteristics associated with quality improvement. To examine the idea, a new algorithm called Breadth-Oblivious-Wrapper (BOW) has been developed. This algorithm performs a breadth-first search while using a new F-measure splitting criterion for multiple oblivious trees. The new algorithm was tested on various real-world manufacturing datasets, specifically from the food processing industry and from integrated circuit fabrication. The results were compared with those of other methods and indicate the superiority of the proposed methodology.
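The abstract does not spell out BOW's splitting criterion. As a rough illustration of why an F-measure is preferred over accuracy when the defective class is rare, the following Python sketch scores a single oblivious tree by van Rijsbergen's F-measure for the minority class and grows it greedily. In an oblivious tree every node at a given depth tests the same feature, so the leaves are simply the distinct value combinations of the chosen features. The leaf-labeling rule (predict the minority class when a leaf's minority rate reaches the overall prior), the function names, and the toy data are assumptions made for illustration. This is not the authors' BOW algorithm, which performs a breadth-first search over decompositions into multiple oblivious trees.

```python
from collections import Counter, defaultdict


def f_measure(tp, fp, fn, beta=1.0):
    """Van Rijsbergen's F-measure: (1 + b^2) P R / (b^2 P + R)."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def oblivious_tree_f(rows, labels, features, positive=1, threshold=None, beta=1.0):
    """Score an oblivious tree over `features` by the F-measure of `positive`.

    Assumption (hypothetical rule): a leaf predicts `positive` when its
    positive rate is at least `threshold`, which defaults to the class prior.
    """
    # Oblivious tree: a leaf is identified by the tuple of tested feature values.
    leaves = defaultdict(Counter)
    for row, y in zip(rows, labels):
        key = tuple(row[f] for f in features)
        leaves[key][y == positive] += 1
    if threshold is None:
        threshold = sum(1 for y in labels if y == positive) / len(labels)
    tp = fp = fn = 0
    for counts in leaves.values():
        pos, neg = counts[True], counts[False]
        if pos / (pos + neg) >= threshold:  # leaf labeled as the rare class
            tp += pos
            fp += neg
        else:  # leaf labeled as the majority class
            fn += pos
    return f_measure(tp, fp, fn, beta)


def grow_oblivious_tree(rows, labels, candidates, max_depth=3, **kw):
    """Greedily add the feature (one per level) that most improves the F-measure."""
    chosen = []
    best = oblivious_tree_f(rows, labels, chosen, **kw)
    for _ in range(max_depth):
        scores = {f: oblivious_tree_f(rows, labels, chosen + [f], **kw)
                  for f in candidates if f not in chosen}
        if not scores:
            break
        feature, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best:  # stop when no candidate improves the criterion
            break
        chosen.append(feature)
        best = score
    return chosen, best


# Toy data (hypothetical): feature 0 marks the rare defect class, feature 1 is noise.
rows = [(1, 0), (1, 1), (0, 0), (0, 1), (0, 0), (0, 1), (0, 0), (0, 1)]
labels = [1, 1, 0, 0, 0, 0, 0, 0]
print(grow_oblivious_tree(rows, labels, candidates=[0, 1]))  # -> ([0], 1.0)
```

On the toy data, a classifier that always predicts the majority (non-defective) class reaches 75% accuracy yet an F-measure of 0, which is the kind of imbalance problem the abstract describes; the F-measure criterion instead selects the informative feature 0.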

Author information

Correspondence to Lior Rokach.

Additional information

Received: September 2004 / Accepted: September 2005

About this article

Cite this article

Rokach, L., Maimon, O. Data Mining for Improving the Quality of Manufacturing: A Feature Set Decomposition Approach. J Intell Manuf 17, 285–299 (2006). https://doi.org/10.1007/s10845-005-0005-x

  • DOI: https://doi.org/10.1007/s10845-005-0005-x
