Volume 102, Issue 1, pp 19–42

A novel PCA-DC-Bagging algorithm on yield stress prediction of RAFM steel

  • Sifan Long
  • Ming Zhao (corresponding author)
  • Jieqiong Song


Bagging is a widely used ensemble learning technique for regression tasks. However, the traditional Bagging algorithm is susceptible to extreme values, which leads to high bias and high variance in prediction. This paper therefore proposes an improved Bagging algorithm based on an optimal decision committee model and on the idea of selecting base learners. A decision committee is used to filter the base learners: the committee is trained on the errors that the base learners make on the test set, and these errors are classified into evaluation levels. The levels are assigned by a mathematical model of the optimal interval separation factor, which is derived with the Lagrange multiplier method. The decision committee is trained according to the assigned evaluation levels, and the base learners are selected and assembled according to the decisions of the committee members. Our theoretical analysis distinguishes two cases, for which we build mathematical models using maximum likelihood estimation and stochastic process theory. Results on reduced activation ferritic/martensitic (RAFM) steel data sets show that the proposed algorithm can be applied to data sets that are high-dimensional, highly redundant, sparse, or rich in contradictory samples. We also give a rigorous theoretical framework for the algorithm model to support its further development and adoption.
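The general idea of bagging with a base-learner selection stage can be illustrated with a minimal sketch. This is not the authors' decision committee: here a simple median-error cutoff on a held-out set stands in for the committee's filtering decision, and one-dimensional least-squares lines stand in for the base learners. The names `fit_line`, `mse`, `make_data`, `selective_bagging` and `predict`, as well as the synthetic data, are all assumptions made for illustration.

```python
import random
import statistics


def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:  # degenerate bootstrap sample: fall back to a constant
        return 0.0, mean_y
    a = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return a, mean_y - a * mean_x


def mse(model, points):
    """Mean squared error of one base learner on a list of (x, y) pairs."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)


def make_data(n, rng, outlier_every=None):
    """Hypothetical data: y = 2x + 1 plus noise, with optional extreme values."""
    points = []
    for i in range(n):
        x = rng.uniform(0.0, 10.0)
        y = 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)
        if outlier_every and i % outlier_every == 0:
            y += 40.0  # inject an extreme value
        points.append((x, y))
    return points


def selective_bagging(train, valid, n_learners=25, seed=0):
    """Train bootstrap learners, then keep those at or below the median error.

    The median cutoff is an illustrative stand-in for a selection stage;
    it is not the decision committee described in the abstract.
    """
    rng = random.Random(seed)
    learners = []
    for _ in range(n_learners):
        sample = [rng.choice(train) for _ in train]  # bootstrap resample
        learners.append(fit_line(sample))
    errors = [mse(m, valid) for m in learners]
    cutoff = statistics.median(errors)
    kept = [m for m, e in zip(learners, errors) if e <= cutoff]
    return learners, kept


def predict(models, x):
    """Average the predictions of the given base learners."""
    return sum(a * x + b for a, b in models) / len(models)


rng = random.Random(1)
train = make_data(60, rng, outlier_every=10)  # training set with extreme values
valid = make_data(30, rng)                    # clean held-out set
learners, kept = selective_bagging(train, valid)
print("kept", len(kept), "of", len(learners), "base learners")
print("plain bagging at x=5:    ", predict(learners, 5.0))
print("selective bagging at x=5:", predict(kept, 5.0))
```

Filtering on a held-out set discards the bootstrap learners that were pulled furthest by the injected extreme values, which is the intuition behind selecting base learners before aggregation. The cutoff rule is the simplest possible choice and carries none of the paper's theoretical guarantees.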


Bagging algorithm · Decision Committee · Dimension · Optimization

Mathematics Subject Classification

68T04 Artificial intelligence 




Copyright information

© Springer-Verlag GmbH Austria, part of Springer Nature 2019

Authors and Affiliations

  1. School of Software, Central South University, Changsha, China
