Mitigating the impact of mislabeled data on deep predictive models: an empirical study of learning with noise approaches in software engineering tasks

Shen, Jian; Li, Zhong; Lu, Yifei; Pan, Minxue; Li, Xuandong

doi:10.1007/s10515-024-00435-y


Published: 04 April 2024

Automated Software Engineering, Volume 31, article number 33 (2024)


Abstract

Deep predictive models have been widely adopted in software engineering (SE) tasks following their remarkable success in artificial intelligence (AI). Most of these models are trained in a supervised manner, so their performance depends heavily on the quality of the training data. Unfortunately, mislabeling (label noise) is common in SE datasets and can significantly undermine the validity of models trained on them. Learning with noise approaches based on deep learning (DL) have been proposed to handle mislabeling in AI datasets, but SE datasets differ in size and data quality, which raises questions about how effective these approaches are in the SE context. In this paper, we conduct a comprehensive study of how mislabeled samples occur in SE datasets, how they affect deep predictive models, and how well existing learning with noise approaches perform on SE datasets. Through an empirical evaluation on two representative datasets for the Bug Report Classification and Software Defect Prediction tasks, our study reveals that learning with noise approaches have the potential to handle mislabeled samples in SE tasks, but their effectiveness is not always consistent. Our results show that addressing mislabeled samples in SE tasks is crucial and that effective solutions must take the specific properties of each dataset into account. We also highlight the importance of addressing the class distribution changes that mislabeled samples can cause, and we present the limitations of existing approaches for handling mislabeled samples. We therefore call for more advanced techniques to improve the effectiveness and reliability of deep predictive models in SE tasks.
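Two effects mentioned in the abstract can be seen in miniature: mislabeling shifts the observed class distribution of an imbalanced dataset, and noise-robust training objectives limit how much a single mislabeled sample can dominate the loss. The Python sketch below is illustrative only and is not the authors' experimental code. It assumes binary labels, symmetric (class-independent) label flipping, and a hypothetical 10% positive rate; all helper names are hypothetical. As one example of a learning with noise objective, it uses the generalized cross entropy (GCE) loss of Zhang and Sabuncu (2018), listed in the references; it is not implied that this is the approach the study recommends.

```python
import math
import random


def flip_labels(labels, noise_rate, seed=0):
    """Flip each binary label (0/1) independently with probability noise_rate.

    Symmetric, class-independent noise is an illustrative assumption; real SE
    label noise (e.g. issues filed under the wrong type) is often class- or
    instance-dependent.
    """
    rng = random.Random(seed)
    return [1 - y if rng.random() < noise_rate else y for y in labels]


def expected_noisy_positive_rate(clean_rate, noise_rate):
    """Expected share of label-1 samples after symmetric flipping.

    Positives keep their label with probability (1 - r); negatives become
    positives with probability r.
    """
    return clean_rate * (1 - noise_rate) + (1 - clean_rate) * noise_rate


def cross_entropy(p_given):
    """Standard CE for the probability the model assigns to the given label."""
    return -math.log(p_given)


def generalized_cross_entropy(p_given, q=0.7):
    """GCE loss (Zhang and Sabuncu 2018): (1 - p^q) / q, bounded above by 1/q.

    As q approaches 0 it recovers CE; q = 1 gives an MAE-like loss that is
    more tolerant of noisy labels but can be slower to train.
    """
    return (1.0 - p_given ** q) / q


if __name__ == "__main__":
    # Hypothetical imbalanced dataset: 10% defective (label 1).
    clean = [1] * 100 + [0] * 900
    noisy = flip_labels(clean, noise_rate=0.2)
    print("clean positive rate:", sum(clean) / len(clean))
    print("noisy positive rate:", sum(noisy) / len(noisy))
    print("expected noisy rate:", expected_noisy_positive_rate(0.1, 0.2))

    # A mislabeled sample: the model gives only 1% probability to the
    # (incorrect) given label, so CE penalizes it heavily.
    p = 0.01
    print("CE :", cross_entropy(p))
    print("GCE:", generalized_cross_entropy(p))
```

Under these assumptions, flipping 20% of the labels raises the expected positive share from 0.10 to about 0.26 (0.1 × 0.8 + 0.9 × 0.2), so the minority class appears much larger than it is. For the mislabeled sample, CE is about 4.6 while GCE with q = 0.7 is about 1.37, and GCE can never exceed 1/q, which caps the gradient pressure a single wrong label can exert.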


Notes

The original dataset contains 7,401 issues from five projects (Herzig et al. 2013). However, prior work uses only the issues from Jackrabbit, Lucene, and HttpClient (Pandey et al. 2017; Qin and Sun 2018). We therefore follow this practice and use the issues from these three projects in our study.

References

Alhroob, A., Imam, A.T., Al-Heisa, R.: The use of artificial neural networks for extracting actions and actors from requirements document. Inf. Softw. Technol. 101, 1–15 (2018)
Allamanis, M., Barr, E.T., Devanbu, P.T., Sutton, C.: A survey of machine learning for big code and naturalness. ACM Comput. Surv. 51(4), 81:1–81:37 (2018)
Antoniol, G., Ayari, K., Penta, M.D., Khomh, F., Guéhéneuc, Y.: Is it a bug or an enhancement?: a text-based approach to classify change requests. In: Proceedings of the 28th Annual International Conference on Computer Science and Software Engineering, CASCON 2018, Markham, Ontario, Canada, October 29–31, 2018, pp. 2–16 (2018)
Cabral, G.G., Minku, L.L., Shihab, E., Mujahid, S.: Class imbalance evolution and verification latency in just-in-time software defect prediction. In: Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25–31, 2019, pp. 666–676 (2019)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Chen, P., Ye, J., Chen, G., Zhao, J., Heng, P.: Beyond class-conditional assumption: a primary attempt to combat instance-dependent label noise. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2–9, 2021, pp. 11442–11450 (2021)
Cheng, H., Zhu, Z., Li, X., Gong, Y., Sun, X., Liu, Y.: Learning with instance-dependent label noise: a sample sieve approach. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3–7, 2021 (2021)
Clark, P., Niblett, T.: The CN2 induction algorithm. Mach. Learn. 3, 261–283 (1989)
Cliff, N.: Ordinal methods for behavioral data analysis. (1996)
Cui, Y., Jia, M., Lin, T., Song, Y., Belongie, S.J.: Class-balanced loss based on effective number of samples. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, 2019, pp. 9268–9277 (2019)
Fan, Y., Xia, X., Costa, D.A., Lo, D., Hassan, A.E., Li, S.: The impact of mislabeled changes by SZZ on just-in-time defect prediction. IEEE Trans. Softw. Eng. 47(8), 1559–1586 (2021)
Feng, S., Keung, J., Yu, X., Xiao, Y., Zhang, M.: Investigation on the stability of smote-based oversampling techniques in software defect prediction. Inf. Softw. Technol. 139, 106662 (2021)
Ferreira, F., Silva, L.L., Valente, M.T.: Software engineering meets deep learning: a mapping study. In: SAC, pp. 1542–1549 (2021)
Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE Trans. Neural Networks Learn. Syst. 25(5), 845–869 (2014)
Fu, W., Menzies, T.: Revisiting unsupervised learning for defect prediction. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, Paderborn, Germany, September 4–8, 2017, pp. 72–83 (2017)
Fu, M., Tantithamthavorn, C.: LineVul: a transformer-based line-level vulnerability prediction. In: MSR, pp. 608–620 (2022)
Gamberger, D., Lavrac, N., Groselj, C.: Experiments with noise filtering in a medical domain. In: Proceedings of the Sixteenth International Conference on Machine Learning (ICML 1999), Bled, Slovenia, June 27–30, 1999, pp. 143–151 (1999)
Gong, L., Jiang, S., Wang, R., Jiang, L.: Empirical evaluation of the impact of class overlap on software defect prediction. In: 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019, San Diego, CA, USA, November 11–15, 2019, pp. 698–709 (2019)
Han, B., Yao, Q., Yu, X., Niu, G., Xu, M., Hu, W., Tsang, I.W., Sugiyama, M.: Co-teaching: Robust training of deep neural networks with extremely noisy labels. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3–8, 2018, Montréal, Canada, pp. 8536–8546 (2018)
Han, J., Huang, C., Sun, S., Liu, Z., Liu, J.: bjxnet: an improved bug localization model based on code property graph and attention mechanism. Autom. Softw. Eng. 30(1), 12 (2023)
He, S., Zhang, H., Tu, Z., Chu, D.: Personalized review recommendation without user interactive data. In: HPCC/DSS/SmartCity/DependSys, pp. 2062–2070 (2022)
Herbold, S., Trautsch, A., Grabowski, J.: A comparative study to benchmark cross-project defect prediction approaches. In: Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, May 27–June 03, 2018, p. 1063 (2018)
Herbold, S., Trautsch, A., Trautsch, F.: On the feasibility of automated issue type prediction. arXiv:2003.05357 (2020)
Herzig, K., Just, S., Zeller, A.: It’s not a bug, it’s a feature: how misclassification impacts bug prediction. In: ICSE, pp. 392–401 (2013)
Hindle, A., Ernst, N.A., Godfrey, M.W., Mylopoulos, J.: Automated topic naming to support cross-project analysis of software maintenance activities. In: MSR, pp. 163–172 (2011)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Huang, Q., Xia, X., Lo, D.: Supervised vs unsupervised models: a holistic look at effort-aware just-in-time defect prediction. In: 2017 IEEE International Conference on Software Maintenance and Evolution, ICSME 2017, Shanghai, China, September 17–22, 2017, pp. 159–170 (2017)
Huang, L., Zhang, C., Zhang, H.: Self-adaptive training: beyond empirical risk minimization. In: Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6–12, 2020, Virtual (2020)
Jain, P.K., Srivastava, G., Lin, J.C., Pamula, R.: Unscrambling customer recommendations: a novel LSTM ensemble approach in airline recommendation prediction using online reviews. IEEE Trans. Comput. Soc. Syst. 9(6), 1777–1784 (2022)
Jiang, T., Tan, L., Kim, S.: Personalized defect prediction. In: 2013 28th IEEE/ACM International Conference on Automated Software Engineering, ASE 2013, Silicon Valley, CA, USA, November 11–15, 2013, pp. 279–289 (2013)
Kallis, R., Sorbo, A.D., Canfora, G., Panichella, S.: Ticket tagger: machine learning driven issue classification. In: 2019 IEEE International Conference on Software Maintenance and Evolution, ICSME 2019, Cleveland, OH, USA, September 29 - October 4, 2019, pp. 406–409 (2019)
Kamei, Y., Shihab, E., Adams, B., Hassan, A.E., Mockus, A., Sinha, A., Ubayashi, N.: A large-scale empirical study of just-in-time quality assurance. IEEE Trans. Software Eng. 39(6), 757–773 (2013)
Kamei, Y., Fukushima, T., McIntosh, S., Yamashita, K., Ubayashi, N., Hassan, A.E.: Studying just-in-time defect prediction using cross-project models. Empir. Softw. Eng. 21(5), 2072–2106 (2016)
Khan, S.S., Niloy, N.T., Azmain, M.A., Kabir, A.: Impact of label noise and efficacy of noise filters in software defect prediction. In: The 32nd International Conference on Software Engineering and Knowledge Engineering, SEKE 2020, KSIR Virtual Conference Center, USA, July 9–19, 2020, pp. 347–352 (2020)
Khoshgoftaar, T.M., Rebours, P.: Generating multiple noise elimination filters with the ensemble-partitioning filter. In: Proceedings of the 2004 IEEE International Conference on Information Reuse and Integration, IRI—2004, November 8–10, 2004, Las Vegas Hilton, Las Vegas, NV, USA, pp. 369–375 (2004)
Khoshgoftaar, T.M., Rebours, P.: Improving software quality prediction by noise filtering techniques. J. Comput. Sci. Technol. 22(3), 387–396 (2007)
Kim, S., Zhang, H., Wu, R., Gong, L.: Dealing with noise in defect prediction. In: Proceedings of the 33rd International Conference on Software Engineering, ICSE 2011, Waikiki, Honolulu, HI, USA, May 21–28, 2011, pp. 481–490 (2011)
Kim, Y.: Convolutional neural networks for sentence classification. In: EMNLP, pp. 1746–1751 (2014)
Kochhar, P.S., Le, T.B., Lo, D.: It’s not a bug, it’s a feature: does misclassification affect bug localization? In: MSR, pp. 296–299 (2014)
Krause, J., Sapp, B., Howard, A., Zhou, H., Toshev, A., Duerig, T., Philbin, J., Fei-Fei, L.: The unreasonable effectiveness of noisy data for fine-grained recognition. In: ECCV (3). Lecture Notes in Computer Science, vol. 9907, pp. 301–320 (2016)
Laurikkala, J.: Improving identification of difficult small classes by balancing class distribution. In: Artificial Intelligence Medicine, 8th Conference on AI in Medicine in Europe, AIME 2001, Cascais, Portugal, July 1–4, 2001, Proceedings. Lecture Notes in Computer Science, vol. 2101, pp. 63–66 (2001)
Li, G., Liu, H., Jin, J., Umer, Q.: Deep learning based identification of suspicious return statements. In: SANER, pp. 480–491 (2020)
Li, Z., Jing, X., Zhu, X.: Progress on approaches to software defect prediction. IET Softw. 12(3), 161–175 (2018)
Lin, B., Zampetti, F., Bavota, G., Penta, M.D., Lanza, M.: Pattern-based mining of opinions in Q&A websites. In: Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25–31, 2019, pp. 548–559 (2019)
Lloyd, R.V., Erickson, L.A., Casey, M.B., Lam, K.Y., Lohse, C.M., Asa, S.L., Chan, J.K., DeLellis, R.A., Harach, H.R., Kakudo, K., et al.: Observer variation in the diagnosis of follicular variant of papillary thyroid carcinoma. Am. J. Surg. Pathol. 28(10), 1336–1340 (2004)
Ma, X., Huang, H., Wang, Y., Romano, S., Erfani, S.M., Bailey, J.: Normalized loss functions for deep learning with noisy labels. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 6543–6553 (2020)
Manwani, N., Sastry, P.S.: Noise tolerance under risk minimization. IEEE Trans. Cybern. 43(3), 1146–1151 (2013)
McIntosh, S., Kamei, Y.: Are fix-inducing changes a moving target? A longitudinal case study of just-in-time defect prediction. IEEE Trans. Software Eng. 44(5), 412–428 (2018)
Nafi, K.W., Kar, T.S., Roy, B., Roy, C.K., Schneider, K.A.: CLCDSA: cross language code clone detection using syntactical features and API documentation. In: ASE, pp. 1026–1037 (2019)
Pak, C., Wang, T., Su, X.: An empirical study on software defect prediction using over-sampling by SMOTE. Int. J. Softw. Eng. Knowl. Eng. 28(6), 811–830 (2018)
Palomba, F., Tamburri, D.A., Fontana, F.A., Oliveto, R., Zaidman, A., Serebrenik, A.: Beyond technical aspects: How do community smells influence the intensity of code smells? IEEE Trans. Software Eng. 47(1), 108–129 (2021)
Pan, C., Lu, M., Xu, B., Gao, H.: An improved CNN model for within-project software defect prediction. Appl. Sci. 9(10), 2138 (2019)
Pandey, N., Sanyal, D.K., Hudait, A., Sen, A.: Automated classification of software issue reports using machine learning techniques: an empirical study. Innov. Syst. Softw. Eng. 13(4), 279–297 (2017)
Project Homepage. https://github.com/RobustLearning/RobustLearning
Pudlitz, F., Brokhausen, F., Vogelsang, A.: Extraction of system states from natural language requirements. In: Damian, D.E., Perini, A., Lee, S. (Eds) 27th IEEE International Requirements Engineering Conference, RE 2019, Jeju Island, Korea (South), September 23–27, 2019, pp. 211–222 (2019). https://doi.org/10.1109/RE.2019.00031
Qin, H., Sun, X.: Classifying bug reports into bugs and non-bugs using LSTM. In: Proceedings of the Tenth Asia-Pacific Symposium on Internetware, Internetware 2018, Beijing, China, September 16, 2018, pp. 20:1–20:4 (2018)
Sabzevari, M., Martínez-Muñoz, G., Suárez, A.: A two-stage ensemble method for the detection of class-label noise. Neurocomputing 275, 2374–2383 (2018)
Sajnani, H.: Automatic software architecture recovery: a machine learning approach. In: ICPC, pp. 265–268 (2012)
Shafiq, S., Mashkoor, A., Mayr-Dorn, C., Egyed, A.: Machine learning for software engineering: a systematic mapping. CoRR arXiv:2005.13299 (2020)
Song, H., Kim, M., Park, D., Lee, J.: Learning from noisy labels with deep neural networks: a survey. CoRR arXiv:2007.08199 (2020)
Song, Q., Guo, Y., Shepperd, M.J.: A comprehensive investigation of the role of imbalanced learning for software defect prediction. IEEE Trans. Software Eng. 45(12), 1253–1269 (2019)
Tantithamthavorn, C., McIntosh, S., Hassan, A.E., Ihara, A., Matsumoto, K.: The impact of mislabelling on the performance and interpretation of defect prediction models. In: Bertolino, A., Canfora, G., Elbaum, S.G. (Eds) 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16–24, 2015, Volume 1, pp. 812–823 (2015). https://doi.org/10.1109/ICSE.2015.93
Tantithamthavorn, C., McIntosh, S., Hassan, A.E., Matsumoto, K.: Automated parameter optimization of classification techniques for defect prediction models. In: Proceedings of the 38th International Conference on Software Engineering, ICSE 2016, Austin, TX, USA, May 14–22, 2016, pp. 321–332 (2016)
Thongkam, J., Xu, G., Zhang, Y., Huang, F.: Support vector machine for outlier detection in breast cancer survivability prediction. In: Advanced Web and Network Technologies, and Applications, APWeb 2008 International Workshops: BIDM, IWHDM, and DeWeb, Shenyang, China, April 26–28, 2008. Revised Selected Papers. Lecture Notes in Computer Science, vol. 4977, pp. 99–109 (2008)
Wang, X., Guan, Z., Xin, W., Wang, J.: Multi-type source code defect detection based on textcnn. In: FCS Communications in Computer and Information Science, vol. 1286, pp. 95–103 (2020)
Wei, H., Feng, L., Chen, X., An, B.: Combating noisy labels by agreement: A joint training method with co-regularization. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020, pp. 13723–13732 (2020). https://doi.org/10.1109/CVPR42600.2020.01374
Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics (1945)
Wu, X., Zheng, W., Xia, X., Lo, D.: Data quality matters: a case study on data label correctness for security bug report prediction. IEEE Trans. Softw. Eng. 48(7), 2541–2556 (2022)
Xia, X., Liu, T., Han, B., Gong, C., Wang, N., Ge, Z., Chang, Y.: Robust early-learning: hindering the memorization of noisy labels. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3–7, 2021 (2021)
Xia, X., Liu, T., Han, B., Wang, N., Gong, M., Liu, H., Niu, G., Tao, D., Sugiyama, M.: Part-dependent label noise: towards instance-dependent label noise. In: Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6–12, 2020, Virtual (2020)
Xiao, T., Xia, T., Yang, Y., Huang, C., Wang, X.: Learning from massive noisy labeled data for image classification. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7–12, 2015, pp. 2691–2699 (2015)
Xu, B., Hoang, T., Sharma, A., Yang, C., Xia, X., Lo, D.: Post2vec: learning distributed representations of stack overflow posts. IEEE Trans. Software Eng. 48(9), 3423–3441 (2022)
Yang, Y., Xia, X., Lo, D., Grundy, J.C.: A survey on deep learning for software engineering. arXiv:2011.14597 (2020)
Yang, Y., Zhou, Y., Liu, J., Zhao, Y., Lu, H., Xu, L., Xu, B., Leung, H.: Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models. In: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016, Seattle, WA, USA, November 13–18, 2016, pp. 157–168 (2016)
Yang, Y., Xia, X., Lo, D., Bi, T., Grundy, J.C., Yang, X.: Predictive models in software engineering: challenges and opportunities. ACM Trans. Softw. Eng. Methodol. 31(3), 56:1–56:72 (2022)
Yatish, S., Jiarpakdee, J., Thongtanunam, P., Tantithamthavorn, C.: Mining software defects: should we consider affected releases? In: Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25–31, 2019, pp. 654–665 (2019)
Yu, X., Han, B., Yao, J., Niu, G., Tsang, I.W., Sugiyama, M.: How does disagreement help generalization against label corruption? In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9–15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 7164–7173 (2019)
Zhang, Z., Sabuncu, M.R.: Generalized cross entropy loss for training deep neural networks with noisy labels. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3–8, 2018, Montréal, Canada, pp. 8792–8802 (2018)
Zhang, X., Xu, Y., Lin, Q., Qiao, B., Zhang, H., Dang, Y., Xie, C., Yang, X., Cheng, Q., Li, Z., Chen, J., He, X., Yao, R., Lou, J., Chintalapati, M., Shen, F., Zhang, D.: Robust log-based anomaly detection on unstable log data. In: Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2019, Tallinn, Estonia, August 26–30, 2019, pp. 807–817 (2019)
Zhang, Y., Zheng, S., Wu, P., Goswami, M., Chen, C.: Learning with feature-dependent label noise: a progressive approach. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3–7, 2021 (2021)
Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning (still) requires rethinking generalization. Commun. ACM 64(3), 107–115 (2021)
Zheng, S., Wu, P., Goswami, A., Goswami, M., Metaxas, D.N., Chen, C.: Error-bounded correction of noisy labels. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13–18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 11447–11457 (2020)
Zhu, Z., Li, Y., Wang, Y., Wang, Y., Tong, H.: A deep multimodal model for bug localization. Data Min. Knowl. Discov. 35(4), 1369–1392 (2021)

Author information

Authors and Affiliations

Nanjing University, State Key Laboratory for Novel Software Technology, Nanjing, China
Jian Shen, Zhong Li, Yifei Lu, Minxue Pan & Xuandong Li

Contributions

J.S. and Z.L. carried out the primary research and were responsible for drafting the manuscript. Y.L. engaged in discussions and manuscript revisions. M.P. oversaw the research project and contributed to manuscript revisions. X.L. contributed to discussions and provided valuable suggestions for improving the study. All authors have reviewed and approved the manuscript.

Corresponding author

Correspondence to Minxue Pan.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shen, J., Li, Z., Lu, Y. et al. Mitigating the impact of mislabeled data on deep predictive models: an empirical study of learning with noise approaches in software engineering tasks. Autom Softw Eng 31, 33 (2024). https://doi.org/10.1007/s10515-024-00435-y

Received: 22 October 2023
Accepted: 18 March 2024
Published: 04 April 2024
DOI: https://doi.org/10.1007/s10515-024-00435-y
