Mining manufacturing databases to discover the effect of operation sequence on the product quality
Data mining techniques can be used to discover interesting patterns in complicated manufacturing processes, and these patterns can then be used to improve manufacturing quality. Classical representations of quality data mining problems usually refer to the settings of the operations and not to their sequence. This paper examines the effect of the operation sequence on the quality of the product using data mining techniques. For this purpose, a novel decision tree framework for extracting sequence patterns is developed. The proposed method is capable of mining sequence patterns of any length, including patterns whose operations are not necessarily immediate predecessors of one another. The core induction framework consists of four main steps. In the first step, all manufacturing sequences are represented as strings of tokens. In the second step, a large set of regular-expression-based sequence patterns is induced. In the third step, feature selection methods are used to filter the initial set, leaving only the most useful patterns. In the fourth step, the quality problem is transformed into a classification problem and a decision tree induction algorithm is applied. A comparative study performed on benchmark databases illustrates the capabilities of the proposed framework.
Keywords: Data mining, Sequence mining, Decision trees, Quality engineering
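The four steps described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy operation names and quality labels are invented, the pattern family ("a occurs somewhere before b") is just one simple kind of regular-expression pattern with gaps, information gain stands in for the unspecified feature selection method, and a small ID3-style learner stands in for the decision tree inducer.

```python
import re
from collections import Counter
from itertools import permutations
from math import log2

# Toy data, invented for illustration: (operation sequence, quality label).
data = [
    (["cut", "drill", "polish"], "good"),
    (["cut", "polish", "drill"], "bad"),
    (["drill", "cut", "polish"], "good"),
    (["polish", "cut", "drill"], "bad"),
    (["cut", "drill", "wash", "polish"], "good"),
    (["polish", "wash", "drill", "cut"], "bad"),
]

def to_string(ops):
    """Step 1: represent a sequence as a string of tokens."""
    return " ".join(ops)

def candidate_patterns(sequences):
    """Step 2 (hypothetical pattern family): 'a ... b', i.e. a precedes b with any gap."""
    ops = sorted({op for seq in sequences for op in seq})
    return [rf"\b{re.escape(a)}\b.*\b{re.escape(b)}\b" for a, b in permutations(ops, 2)]

def featurize(ops, patterns):
    """Map a sequence to boolean features: does each pattern match?"""
    text = to_string(ops)
    return {p: re.search(p, text) is not None for p in patterns}

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(column, labels):
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for x, y in zip(column, labels) if x == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

def select_patterns(patterns, rows, labels, k):
    """Step 3: keep the k patterns with the highest information gain."""
    score = lambda p: info_gain([r[p] for r in rows], labels)
    return sorted(patterns, key=score, reverse=True)[:k]

def build_tree(rows, labels, features):
    """Step 4: ID3-style decision tree over the boolean pattern features."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: info_gain([r[f] for r in rows], labels))
    if info_gain([r[best] for r in rows], labels) <= 1e-12:
        return majority
    rest = [f for f in features if f != best]
    branches = {}
    for value in (True, False):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = (
            build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
            if idx else majority
        )
    return (best, branches)

def predict(tree, ops):
    """Walk the tree, testing one regex pattern per internal node."""
    while isinstance(tree, tuple):
        pattern, branches = tree
        tree = branches[re.search(pattern, to_string(ops)) is not None]
    return tree

sequences = [ops for ops, _ in data]
labels = [label for _, label in data]
patterns = candidate_patterns(sequences)
rows = [featurize(ops, patterns) for ops in sequences]
selected = select_patterns(patterns, rows, labels, k=3)
tree = build_tree(rows, labels, selected)
print(predict(tree, ["wash", "drill", "cut", "polish"]))
```

In this toy setting the learned tree tests whether one operation occurs anywhere before another, which is the kind of non-adjacent ordering effect the abstract refers to. A real application would need a richer pattern generator and a held-out evaluation.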