
Dealing with small sample size problems in process industry using virtual sample generation: a Kriging-based approach


The operational data of advanced process systems have grown explosively, but their fluctuations are so slight that only a small number of representative samples can be extracted. This makes it difficult to reflect the nature of the process and to establish prediction models. In this study, inspired by the way fishermen repair nets, a Kriging-based virtual sample generation (VSG) method named Kriging-VSG is proposed to generate feasible virtual samples in data-sparse regions. The generated virtual samples are then used to further improve the accuracy of prediction models. To locate data-sparse regions reasonably, a distance-based criterion is imposed on each dimension to identify important samples with large information gaps. As in repairing a net, one dimension is first fixed at different quantiles. A dimension-wise Kriging interpolation is then performed at the center between important samples with large information gaps. To validate the performance of the proposed Kriging-VSG, two numerical simulations and a real-world application to a cascade reaction process for high-density polyethylene are carried out. The results indicate that the proposed Kriging-VSG outperforms the other methods compared.
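The general idea summarized above (scan each dimension for neighbouring samples separated by a large gap, place a candidate input at the center between them, and label it by Kriging interpolation) can be illustrated with a minimal NumPy sketch. This is not the authors' Kriging-VSG. The gap criterion (a consecutive gap along one sorted dimension exceeding the `q`-quantile of all gaps on that dimension), the fixed Gaussian covariance with a hand-chosen `length_scale`, the small `nugget`, and all function names are illustrative assumptions, and the step of fixing a dimension at different quantiles is omitted.

```python
import numpy as np


def gaussian_cov(A, B, length_scale):
    """Gaussian (squared-exponential) covariance between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def ordinary_kriging(X, y, X_new, length_scale=1.0, nugget=1e-10):
    """Ordinary Kriging with a FIXED covariance model (assumption: no hyperparameter fitting)."""
    n = X.shape[0]
    # Kriging system: [K 1; 1^T 0] [w; mu] = [k; 1]
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gaussian_cov(X, X, length_scale) + nugget * np.eye(n)
    A[n, n] = 0.0
    b = np.ones((n + 1, X_new.shape[0]))
    b[:n, :] = gaussian_cov(X, X_new, length_scale)
    sol = np.linalg.solve(A, b)
    weights = sol[:n, :]  # one weight column per new point; each column sums to 1
    return weights.T @ y


def gap_midpoints(X, dim, q=0.75):
    """Hypothetical gap criterion: midpoints of consecutive samples (sorted along `dim`)
    whose gap on that dimension exceeds the q-quantile of all gaps there."""
    Xs = X[np.argsort(X[:, dim])]
    gaps = np.diff(Xs[:, dim])
    if gaps.size == 0:
        return np.empty((0, X.shape[1]))
    mask = gaps > np.quantile(gaps, q)
    return 0.5 * (Xs[:-1][mask] + Xs[1:][mask])


def generate_virtual_samples(X, y, q=0.75, length_scale=1.0):
    """Collect candidate inputs dimension by dimension, then label them with Kriging.
    Assumes X is already scaled so that one length_scale suits every feature."""
    Xv = np.vstack([gap_midpoints(X, d, q) for d in range(X.shape[1])])
    if Xv.shape[0] == 0:
        return Xv, np.empty(0)
    Xv = np.unique(Xv, axis=0)  # the same pair may be flagged on several dimensions
    return Xv, ordinary_kriging(X, y, Xv, length_scale)


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [1.0, 5.0], [2.0, 1.0], [10.0, 2.0]])
    Xv, yv = generate_virtual_samples(X, X.sum(axis=1))
    print(Xv)  # candidates placed in the sparse regions of each dimension
    print(yv)
```

In this sketch the virtual pairs `(Xv, yv)` would simply be appended to the original training set before the prediction model is fitted. Because the ordinary Kriging weights are constrained to sum to one, the predictor reproduces a constant response exactly and, with a negligible nugget, interpolates the training samples; a practical implementation would instead estimate the covariance hyperparameters from the data, for example by maximum likelihood.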


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8




Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grant Nos. 61973022, 61973024, 61703027, 61533003, 61573051), the Fundamental Research Funds for the Central Universities (Grant No. JD1808), the China Scholarship Council State-Sponsored Scholarship Program (Grant Nos. 201806880024, 201806885004), and the Open Research Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University (Grant No. 18I01).

Author information

Correspondence to Abbas Rajabifard or Yuan Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

No individual participants are included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by V. Loia.


About this article


Cite this article

Zhu, Q., Chen, Z., Zhang, X. et al. Dealing with small sample size problems in process industry using virtual sample generation: a Kriging-based approach. Soft Comput (2019).



Keywords

  • Small sample size problems
  • Virtual sample generation
  • Kriging interpolation
  • Soft sensing modeling
  • High-density polyethylene