Abstract
Software aging is a phenomenon in which long-running software systems exhibit an increasing failure rate, progressive performance degradation, or both. By their nature, Aging-Related Bugs (ARBs) are hard to discover during software testing and challenging to reproduce. Automatically predicting ARBs before software release can therefore help developers reduce their impact or avoid them altogether. Many bug prediction approaches have been proposed, and most are effective in within-project prediction settings. However, because ARBs are rare and difficult to reproduce, it is usually hard to collect enough training data to build an accurate prediction model. A recent work proposed a method named Transfer Learning based Aging-related bug Prediction (TLAP) for cross-project ARB prediction. Although this method considerably improves cross-project ARB prediction performance, its prediction results have been observed to depend on several key factors, such as the normalization method, the kernel function, and the machine learning classifier. This paper therefore presents the first empirical study of the impact of these factors on the effectiveness of cross-project ARB prediction, analyzed in terms of single-factor, bigram (two-factor), and triplet (three-factor) patterns, and validates the results with the Scott-Knott test. We find that kernel functions and classifiers are key factors affecting the effectiveness of cross-project ARB prediction, whereas normalization methods show no statistically significant influence. In addition, the order of values in the three single-factor patterns is largely maintained in the three bigram patterns and the one triplet pattern. Similarly, the order of values in the three bigram patterns is also maintained in the triplet pattern.
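The pattern structure described in the abstract (three single-factor patterns, three bigram patterns, and one triplet pattern) can be illustrated with a minimal sketch. The factor names come from the abstract; the level names (`norm_a`, `kernel_a`, `clf_a`, and so on) and the function `enumerate_patterns` are hypothetical placeholders, since the abstract does not list the concrete normalization methods, kernel functions, or classifiers that were studied. This only enumerates the experimental settings and does not implement TLAP itself.

```python
from itertools import combinations, product

# Hypothetical factor levels: the abstract names the three factors,
# but not the concrete values examined for each of them.
FACTORS = {
    "normalization": ["norm_a", "norm_b"],
    "kernel": ["kernel_a", "kernel_b", "kernel_c"],
    "classifier": ["clf_a", "clf_b"],
}


def enumerate_patterns(factors):
    """Map each pattern (a tuple of factor names) to all of its level combinations."""
    names = list(factors)
    patterns = {}
    # size 1 = single-factor, size 2 = bigram, size 3 = triplet
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            patterns[combo] = list(product(*(factors[name] for name in combo)))
    return patterns


patterns = enumerate_patterns(FACTORS)
for combo, settings in patterns.items():
    print(" x ".join(combo), "->", len(settings), "settings")
```

With three factors this yields seven patterns in total, which matches the counts stated in the abstract. Each setting inside a pattern would correspond to one group whose prediction performance is compared against the others, for example with a Scott-Knott style grouping.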
Funding
This work was supported by the State Key Laboratory of Software Development Environment under Grant SKLSDE-2018ZX-09, National Natural Science Foundation of China under Grant 61772055 and Grant 61872169, and the Technical Foundation Project of Ministry of Industry and Information Technology of China under Grant JSZL2016601B003.
Cite this article
Qin, F., Wan, X. & Yin, B. An empirical study of factors affecting cross-project aging-related bug prediction with TLAP. Software Qual J 28, 107–134 (2020). https://doi.org/10.1007/s11219-019-09460-7
DOI: https://doi.org/10.1007/s11219-019-09460-7