Data Mining for Improving the Quality of Manufacturing: A Feature Set Decomposition Approach

Rokach, Lior; Maimon, Oded

doi:10.1007/s10845-005-0005-x

Published: June 2006

Journal of Intelligent Manufacturing, Volume 17, pages 285–299 (2006)

Abstract

Data mining tools can be very beneficial for discovering interesting and useful patterns in complicated manufacturing processes. These patterns can be used, for example, to improve manufacturing quality. However, data accumulated in manufacturing plants have unique characteristics, such as an unbalanced distribution of the target attribute and a small training set relative to the number of input features. As a result, conventional methods tend to be inaccurate in quality improvement cases. Recent research shows, however, that a decomposition tactic may be appropriate here, and this paper presents a new feature set decomposition methodology that is capable of dealing with the data characteristics associated with quality improvement. To examine the idea, a new algorithm called BOW (Breadth-Oblivious-Wrapper) has been developed. This algorithm performs a breadth-first search while using a new F-measure splitting criterion for multiple oblivious trees. The new algorithm was tested on various real-world manufacturing datasets, specifically from the food processing industry and integrated circuit fabrication. The obtained results were compared with those of other methods and indicate the superiority of the proposed methodology.
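To illustrate why an F-measure may suit an unbalanced target attribute better than plain accuracy, the following minimal sketch computes the standard F-measure (as defined in information retrieval) on a made-up batch of 95 good and 5 defective units. It is not the BOW algorithm or its splitting criterion, whose details are not given in the abstract; the function names, the data, and the two hypothetical classifiers are assumptions chosen only for illustration.

```python
def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """Standard F-measure of the positive (rare) class; beta weights recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of units whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical batch: 95 good units (label 0) followed by 5 defective (label 1).
y_true = [0] * 95 + [1] * 5

# Hypothetical classifier A: always predicts "good".
always_good = [0] * 100

# Hypothetical classifier B: 4 false alarms, catches 4 of the 5 defects.
detector = [1] * 4 + [0] * 91 + [1] * 4 + [0] * 1

for name, pred in [("always_good", always_good), ("detector", detector)]:
    print(name, round(accuracy(y_true, pred), 3), round(f_measure(y_true, pred), 3))
```

Both hypothetical classifiers reach the same accuracy on this batch, yet only the second one finds any defective units, and the F-measure reflects that difference while accuracy does not.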

Author information

Authors and Affiliations

Department of Information System Engineering, Ben-Gurion University of the Negev, Israel
Lior Rokach & Oded Maimon

Corresponding author

Correspondence to Lior Rokach.

Additional information

Received: September 2004 / Accepted: September 2005

About this article

Cite this article

Rokach, L., Maimon, O. Data Mining for Improving the Quality of Manufacturing: A Feature Set Decomposition Approach. J Intell Manuf 17, 285–299 (2006). https://doi.org/10.1007/s10845-005-0005-x

Issue Date: June 2006
DOI: https://doi.org/10.1007/s10845-005-0005-x
