Pitfalls and protocols of data science in manufacturing practice

Journal of Intelligent Manufacturing

Abstract

Driven by the ongoing migration toward Industry 4.0, the increasing adoption of artificial intelligence, big data analytics, cloud computing, the Internet of Things, and robotics has empowered smart manufacturing and digital transformation. However, the growing application of machine learning and data science (DS) techniques raises a range of procedural issues, including those involving data, assumptions, methodologies, and applicable conditions. Each of these issues can make implementation in practice more difficult, especially where manufacturing characteristics and domain knowledge are concerned. Nevertheless, little research has examined and resolved these issues systematically. The gaps in existing studies can be traced to the lack of a framework within which the pitfalls in implementation procedures can be identified and appropriate procedures for employing effective methodologies can be suggested. This study aims to develop a five-phase analytics framework that facilitates the investigation of pitfalls in intelligent manufacturing and suggests protocols to empower practical applications of DS methodologies, from descriptive and predictive analytics to prescriptive and automating analytics, in various contexts.

Funding

Funding was provided by the Ministry of Science and Technology, Taiwan (Grant Nos. MOST 106-2218-E-031-001 and MOST 109-2634-F-007-019).

Author information

Corresponding author

Correspondence to Chia-Yen Lee.

About this article

Cite this article

Lee, CY., Chien, CF. Pitfalls and protocols of data science in manufacturing practice. J Intell Manuf 33, 1189–1207 (2022). https://doi.org/10.1007/s10845-020-01711-w

  • DOI: https://doi.org/10.1007/s10845-020-01711-w
