The operational data of advanced process systems have grown explosively, but their fluctuations are so slight that the number of representative samples that can be extracted is quite limited, making it difficult to reflect the nature of the process and to establish prediction models. In this study, inspired by the way fishermen repair nets, a Kriging-based virtual sample generation (VSG) method named Kriging-VSG is proposed to generate feasible virtual samples in data-sparse regions. The generated virtual samples are then used to further enhance the accuracy of prediction models. To find data-sparse regions in a principled way, a distance-based criterion is imposed on each dimension to identify important samples with large information gaps. As in net repair, a certain dimension is first fixed at different quantiles. A dimension-wise interpolation using Kriging is then performed at the center between important samples with large information gaps. To validate the performance of the proposed Kriging-VSG, two numerical simulations and a real-world application to a cascade reaction process for high-density polyethylene are carried out. The results indicate that the proposed Kriging-VSG outperforms the other methods it is compared against.
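The gap-finding and interpolation steps described above can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: the abstract does not state the exact distance criterion or the Kriging settings, so the rule "a gap is large if it exceeds `factor` times the mean gap", the Gaussian correlation model, and the parameters `factor`, `theta`, and `nugget` are all assumptions made for illustration. The quantile-fixing step for multi-dimensional data is omitted.

```python
import math

def find_gap_centers(xs, factor=1.5):
    """Midpoints of consecutive gaps wider than factor * mean gap (assumed criterion)."""
    s = sorted(xs)
    gaps = [b - a for a, b in zip(s, s[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return [(a + b) / 2 for a, b, g in zip(s, s[1:], gaps) if g > factor * mean_gap]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def kriging_predict(xs, ys, x_new, theta=0.5, nugget=1e-10):
    """Ordinary Kriging prediction at x_new with a Gaussian correlation model."""
    n = len(xs)
    corr = lambda a, b: math.exp(-theta * (a - b) ** 2)
    # Correlation matrix of the training inputs; a tiny nugget aids stability.
    R = [[corr(xs[i], xs[j]) + (nugget if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    r = [corr(x_new, xi) for xi in xs]
    # Generalized least-squares estimate of the constant mean.
    mu = sum(solve(R, ys)) / sum(solve(R, [1.0] * n))
    w = solve(R, [y - mu for y in ys])
    return mu + sum(ri * wi for ri, wi in zip(r, w))

# Toy data with a sparse region between x = 3 and x = 6 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [math.sin(0.5 * x) for x in xs]

centers = find_gap_centers(xs)  # [4.5]
virtual = [(c, kriging_predict(xs, ys, c)) for c in centers]
print(virtual)
```

Each pair in `virtual` is a candidate virtual sample placed at the center of a sparse interval. In a multi-dimensional setting, the same search would be repeated along each dimension in turn.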
This research was supported by the National Natural Science Foundation of China (Grant Nos. 61973022, 61973024, 61703027, 61533003, 61573051), the Fundamental Research Funds for the Central Universities (Grant No. JD1808), the China Scholarship Council State-Sponsored Scholarship Program (Grant Nos. 201806880024, 201806885004), and the Open Research Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University (Grant No. 18I01).
Conflict of interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants performed by any of the authors.
No individual participants are included in the study.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Communicated by V. Loia.
About this article
Cite this article
Zhu, Q., Chen, Z., Zhang, X. et al. Dealing with small sample size problems in process industry using virtual sample generation: a Kriging-based approach. Soft Comput (2019). https://doi.org/10.1007/s00500-019-04326-3
Keywords
- Small sample size problems
- Virtual sample generation
- Kriging interpolation
- Soft sensing modeling
- High-density polyethylene