Journal of Intelligent Manufacturing

Volume 19, Issue 3, pp 313–325

Mining manufacturing databases to discover the effect of operation sequence on the product quality

  • Lior Rokach
  • Roni Romano
  • Oded Maimon


Data mining techniques can be used to discover interesting patterns in complicated manufacturing processes, and these patterns can then be used to improve manufacturing quality. Classical representations of quality data mining problems usually refer to the operation settings and not to their sequence. This paper uses data mining techniques to examine the effect of the operation sequence on product quality. For this purpose, a novel decision tree framework for extracting sequence patterns is developed. The proposed method is capable of mining sequence patterns of any length, in which the operations need not be immediate predecessors of one another. The core induction framework consists of four main steps. In the first step, all manufacturing sequences are represented as strings of tokens. In the second step, a large set of regular expression-based patterns is induced by a sequence pattern discovery procedure. In the third step, feature selection methods filter this initial set and retain only the most useful patterns. In the last step, the quality problem is transformed into a classification problem and a decision tree induction algorithm is employed. A comparative study performed on benchmark databases illustrates the capabilities of the proposed framework.
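The four steps above can be illustrated with a minimal, stdlib-only Python sketch. It is not the authors' implementation, and the abstract does not say which pattern-induction, feature-selection, or tree-induction algorithms the paper uses, so the following are assumptions: candidate patterns are all ordered sub-sequences of up to `max_len` operations observed in the data, each compiled into a regular expression that permits arbitrary gaps; information gain stands in as a simple feature-selection filter; and a small ID3-style tree on binary pattern features replaces a full decision-tree learner. All function names and the toy data are hypothetical.

```python
import itertools
import math
import re
from collections import Counter


# Step 1 (assumed encoding): a sequence becomes a space-padded token string,
# e.g. ["A", "B"] -> " A B ". Tokens are assumed to contain no whitespace.
def to_string(ops):
    return " " + " ".join(ops) + " "


# Step 2 helper: compile an ordered pattern (o1, ..., ok) into a regex that
# matches when the operations occur in that order, with any number of other
# operations (possibly none) between them.
def pattern_regex(pattern):
    gap = " (?:.* )?"
    return re.compile(" " + gap.join(re.escape(op) for op in pattern) + " ")


# Step 2 (hypothetical induction rule): every ordered, not necessarily
# adjacent sub-sequence of length 1..max_len seen in the data is a candidate.
def induce_patterns(sequences, max_len=2):
    candidates = set()
    for ops in sequences:
        for k in range(1, max_len + 1):
            for idx in itertools.combinations(range(len(ops)), k):
                candidates.add(tuple(ops[i] for i in idx))
    return {p: pattern_regex(p) for p in sorted(candidates)}


def featurize(ops, patterns):
    """Return the set of patterns that match this sequence (binary features)."""
    s = to_string(ops)
    return {p for p, rx in patterns.items() if rx.search(s)}


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, pattern):
    inside = [y for x, y in rows if pattern in x]
    outside = [y for x, y in rows if pattern not in x]
    n = len(rows)
    return (entropy([y for _, y in rows])
            - len(inside) / n * entropy(inside)
            - len(outside) / n * entropy(outside))


# Step 3 (swapped-in filter): rank patterns by information gain, keep top_k.
def select_features(rows, patterns, top_k=5):
    ranked = sorted(patterns, key=lambda p: (-info_gain(rows, p), p))
    return [p for p in ranked[:top_k] if info_gain(rows, p) > 0]


# Step 4 (simplified ID3-style tree): a leaf is a quality label (assumed not
# to be a tuple); an inner node is (pattern, if_present, if_absent).
def build_tree(rows, features, depth=0, max_depth=3):
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features or depth == max_depth:
        return majority
    best = max(features, key=lambda p: info_gain(rows, p))
    if info_gain(rows, best) <= 0:
        return majority
    rest = [p for p in features if p != best]
    present = [r for r in rows if best in r[0]]
    absent = [r for r in rows if best not in r[0]]
    return (best,
            build_tree(present, rest, depth + 1, max_depth),
            build_tree(absent, rest, depth + 1, max_depth))


def predict(tree, present):
    # Walk inner nodes until a label leaf is reached.
    while isinstance(tree, tuple):
        pattern, if_present, if_absent = tree
        tree = if_present if pattern in present else if_absent
    return tree


def fit(sequences, labels, max_len=2, top_k=5):
    patterns = induce_patterns(sequences, max_len)
    rows = [(featurize(ops, patterns), y) for ops, y in zip(sequences, labels)]
    selected = select_features(rows, list(patterns), top_k)
    return build_tree(rows, selected), {p: patterns[p] for p in selected}


if __name__ == "__main__":
    # Invented toy data: a unit is "bad" whenever C is performed before A.
    seqs = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"],
            ["C", "B", "A"], ["C", "A", "B"], ["B", "C", "A"]]
    quality = ["good", "good", "good", "bad", "bad", "bad"]
    tree, selected = fit(seqs, quality)
    print(predict(tree, featurize(["C", "B", "B", "A"], selected)))  # bad
```

Because each pattern tolerates gaps, the feature `('A', 'C')` fires for both `A C` and `A B B C`, matching the abstract's point that the operations in a pattern need not be immediate predecessors. Raising `max_len` admits longer patterns at a combinatorial cost, which is one reason a feature-selection step is useful before tree induction.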


Keywords: Data mining · Sequence mining · Decision trees · Quality engineering


Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Department of Information Systems Engineering, Ben Gurion University, Beer Sheva, Israel
  2. Department of Industrial Engineering, Tel-Aviv University, Tel-Aviv, Israel
