Improving imbalanced industrial datasets to enhance the accuracy of mechanical property prediction and process optimization for strip steel

Li, Feifei; He, Anrui; Song, Yong; Shen, Chengzhe; Wang, Fenjia; Yuan, Tieheng; Zhang, Shiwei; Xu, Xiaoqing; Qiang, Yi; Liu, Chao; Liu, Pengfei; Zhao, Qiangguo

doi:10.1007/s10845-023-02275-1

Published: 24 December 2023

Journal of Intelligent Manufacturing (2023)

Feifei Li¹, Anrui He¹, Yong Song¹ (ORCID: orcid.org/0000-0003-2779-7075), Chengzhe Shen¹, Fenjia Wang¹, Tieheng Yuan¹, Shiwei Zhang², Xiaoqing Xu², Yi Qiang³, Chao Liu², Pengfei Liu⁴ & Qiangguo Zhao⁴

Abstract

Imbalanced regression is prevalent in intelligent manufacturing systems and significantly constrains the industrial application of machine learning models. Existing research has overlooked the impact of redundant data and has discarded valuable information contained in unlabeled data, which limits model effectiveness. To this end, we propose a novel model framework, sNN-ST (similarity-based nearest neighbor and self-training fusion), to address imbalanced regression in industrial big data. The approach comprises two main steps. First, we identify and remove redundant samples by analyzing the redundancy relationships among samples. Second, we pseudo-label unlabeled data and selectively incorporate reliable, non-redundant samples into the labeled dataset. We validate the proposed method on two imbalanced regression datasets. Removing redundant data and effectively utilizing unlabeled data optimizes the dataset's distribution and increases its information entropy; consequently, the processed dataset significantly improves overall model performance. We used the resulting model to conduct a Multi-Parameter Global Relative Sensitivity Analysis within a production system. This analysis optimized existing process parameters and improved product quality consistency. This research presents a promising approach to addressing imbalanced regression problems.
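
The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' sNN-ST implementation: the abstract does not specify the similarity measure, the reliability criterion, or the base learner, so the Euclidean distance, the `radius` and `max_spread` thresholds, the k-nearest-neighbor predictor, and all function names below are assumptions chosen only to make the idea concrete.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def remove_redundant(samples, radius):
    """Step 1 (assumed form): greedy redundancy filter.

    A labeled sample (features, target) is kept only if no already-kept
    sample lies within `radius` of it in the joint feature-target space.
    """
    kept = []
    for features, target in samples:
        point = (*features, target)
        if all(euclidean(point, (*kf, kt)) > radius for kf, kt in kept):
            kept.append((features, target))
    return kept


def knn_predict(labeled, features, k=3):
    """Mean and spread (max - min) of the targets of the k nearest labeled samples."""
    nearest = sorted(labeled, key=lambda s: euclidean(s[0], features))[:k]
    targets = [t for _, t in nearest]
    return sum(targets) / len(targets), max(targets) - min(targets)


def self_train(labeled, unlabeled, radius, max_spread, k=3):
    """Step 2 (assumed form): one self-training pass.

    Each unlabeled point receives a pseudo-label from the original labeled
    set. It is added only if the neighbors agree (spread <= max_spread,
    a stand-in for "reliable") and it is farther than `radius` in feature
    space from every sample already present (a stand-in for "non-redundant").
    """
    augmented = list(labeled)
    for features in unlabeled:
        pseudo, spread = knn_predict(labeled, features, k)
        reliable = spread <= max_spread
        novel = all(euclidean(features, kf) > radius for kf, _ in augmented)
        if reliable and novel:
            augmented.append((features, pseudo))
    return augmented
```

In use, `remove_redundant` would be applied to the labeled set first and its output passed to `self_train` together with the unlabeled feature vectors. In practice the thresholds would need tuning per dataset, and the pass could be repeated with a retrained model.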

Data availability

The data that support the findings of this study are openly available at https://gitee.com/superfeif/imbalanced-regression-datasets.git.

Acknowledgements

This research was supported by the National Natural Science Foundation of China (No. 52004029), the Fundamental Research Funds for the Central Universities (FRF-TT-20-06), and the Division 8, Xinjiang Production and Construction Corps Science and Technology Program (2022ZD02).

Author information

Authors and Affiliations

National Engineering Research Center of Flat Rolling Equipment, University of Science and Technology Beijing, Beijing, 100083, China
Feifei Li, Anrui He, Yong Song, Chengzhe Shen, Fenjia Wang & Tieheng Yuan
National Engineering Research Center for Advanced Rolling and Intelligent Manufacturing, University of Science and Technology Beijing, Beijing, 100083, China
Shiwei Zhang, Xiaoqing Xu & Chao Liu
China Academy of Machinery Science and Technology, No. 2 Shouti Nanlu, Haidian District, Beijing, 100044, China
Yi Qiang
Shihezi Zhonghe New Material Co., Ltd, Shihezi, 832000, China
Pengfei Liu & Qiangguo Zhao

Corresponding author

Correspondence to Yong Song.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, F., He, A., Song, Y. et al. Improving imbalanced industrial datasets to enhance the accuracy of mechanical property prediction and process optimization for strip steel. J Intell Manuf (2023). https://doi.org/10.1007/s10845-023-02275-1

Received: 06 July 2023
Accepted: 02 November 2023
Published: 24 December 2023
DOI: https://doi.org/10.1007/s10845-023-02275-1
