Predicting Vulnerable Software Components via Bellwethers

  • Patrick Kwaku Kudjo
  • Jinfu Chen
  • Solomon Mensah
  • Richard Amankwah
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 960)


Software vulnerabilities are weaknesses, flaws, or errors introduced during the life cycle of a software system. Although previous studies have demonstrated the practical significance of using software metrics to predict vulnerable software components, empirical evidence shows that these metrics are plagued with issues pertaining to their effectiveness and robustness. This paper investigates the feasibility of using Bellwethers (i.e., exemplary data) for predicting and classifying software vulnerabilities. We introduce a Bellwether method built on three operators: PARTITION, SAMPLE + TRAIN, and APPLY. The Bellwethers sampled by these operators are used to train a learner (i.e., a deep neural network) to predict whether a vulnerability is essential or non-essential. We evaluate the proposed Bellwether method on vulnerability reports for three popular web browsers extracted from CVE. In addition, the mean absolute error (MAE), Welch’s t-test, and Cliff’s δ effect size are used to evaluate prediction performance and to test for statistically and practically significant differences between the Bellwethers and the growing portfolio. We found that the studied datasets contain subsets of vulnerability records (Bellwethers) that can yield improved accuracy for software vulnerability prediction. Recall and precision from the text mining process ranged from 73.9% to 85.3% and from 67.9% to 81.8%, respectively, across the three studied datasets. The findings further show that using Bellwethers for predictive modelling is a promising research direction for helping software engineers and practitioners identify, prior to software release, the vulnerability records that demand close attention.
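The abstract names the three Bellwether operators but not their internals. The sketch below is one plausible reading, assuming that PARTITION groups records by a key such as release year, that SAMPLE + TRAIN trains a model on each candidate subset and scores it by MAE on all remaining subsets, and that APPLY reuses the best-scoring subset (the Bellwether) to classify new records. The grouping key, the toy features, and every function name here are hypothetical, and a nearest-centroid classifier stands in for the paper's deep neural network purely to keep the example dependency-free.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record layout: (group key, feature vector, label), where
# label 1 = "essential" and 0 = "non-essential" vulnerability.
Record = tuple[str, tuple[float, ...], int]


def partition(records: list[Record]) -> dict:
    """PARTITION: split the records into disjoint subsets by group key."""
    parts = defaultdict(list)
    for group, features, label in records:
        parts[group].append((features, label))
    return dict(parts)


def train(samples):
    """Stand-in learner (nearest centroid), NOT the paper's deep neural network."""
    centroids = {}
    for cls in (0, 1):
        rows = [f for f, y in samples if y == cls]
        if rows:
            # Column-wise mean of all feature vectors in this class.
            centroids[cls] = tuple(mean(col) for col in zip(*rows))

    def predict(features):
        # Return the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(features, centroids[c])),
        )

    return predict


def mae(model, samples):
    """Mean absolute error between predicted and true 0/1 labels."""
    return mean(abs(model(f) - y) for f, y in samples)


def find_bellwether(parts):
    """SAMPLE + TRAIN: train on each subset, score it on all the others,
    and keep the subset with the lowest MAE (assumes >= 2 subsets)."""
    scores = {}
    for name, samples in parts.items():
        model = train(samples)
        others = [s for other, rows in parts.items() if other != name for s in rows]
        scores[name] = mae(model, others)
    return min(scores, key=scores.get), scores


def apply(parts, bellwether, new_features):
    """APPLY: classify new records with a model trained on the Bellwether only."""
    model = train(parts[bellwether])
    return [model(f) for f in new_features]


# Toy usage with invented data; the 2017 subset has its labels "flipped".
records = [
    ("2015", (0.9, 0.8), 1), ("2015", (0.1, 0.2), 0),
    ("2016", (0.8, 0.9), 1), ("2016", (0.2, 0.1), 0),
    ("2017", (0.1, 0.9), 1), ("2017", (0.9, 0.1), 0),
]
parts = partition(records)
best, scores = find_bellwether(parts)  # best == "2016" for this toy data
print(best, scores)
print(apply(parts, best, [(0.85, 0.85), (0.15, 0.15)]))
```

For this toy data the 2016 subset transfers without error to the other two, so it is selected as the Bellwether. Swapping `train` for a neural network would change the learner but not the selection loop.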
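The statistical comparison in the abstract can be sketched as follows, assuming each approach (Bellwether vs. growing portfolio) yields a list of per-run absolute errors. Welch’s t-test compares the two means without assuming equal variances, and Cliff’s δ measures how often an error from one list exceeds an error from the other. The error values are invented for illustration, and the magnitude thresholds (0.147, 0.33, 0.474) are the commonly used ones, not values stated in the abstract. The p-value is omitted because the standard library has no t-distribution; `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test and also returns a p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cliffs_delta(a, b):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs x in a, y in b."""
    greater = sum(1 for x in a for y in b if x > y)
    less = sum(1 for x in a for y in b if x < y)
    return (greater - less) / (len(a) * len(b))


def magnitude(delta):
    """Commonly used thresholds for interpreting |delta|."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


# Hypothetical per-run absolute errors for the two approaches.
bellwether = [0.10, 0.12, 0.08, 0.11, 0.09]
portfolio = [0.20, 0.18, 0.22, 0.19, 0.21]

t, df = welch_t(bellwether, portfolio)
delta = cliffs_delta(bellwether, portfolio)  # -1.0: every Bellwether error is lower
print(f"t = {t:.2f}, df = {df:.2f}, delta = {delta}, {magnitude(delta)}")
```

A negative δ here means the first list (the Bellwether errors) tends to be smaller; because δ depends only on pairwise ordering, it complements the t-test when error distributions are skewed.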


Keywords: Software vulnerability · Bellwethers · Software metrics · Growing portfolio · Web browsers



This work is partly supported by the National Natural Science Foundation of China (NSFC grant numbers: 61202110 and 61502205), the project of Jiangsu provincial Six Talent Peaks (grant number: XYDXXJS-016), the Natural Science Foundation of Jiangsu Province (grant number: BK20170558), the University Science Research Project of Jiangsu Province (grant number: 16KJB520008), the Graduate Research Innovation Project of Jiangsu Province (grant number: KYCX17_1807), and the Postdoctoral Science Foundation of China (grant numbers: 2015M571687 and 2015M581739).



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Patrick Kwaku Kudjo (1)
  • Jinfu Chen (1), corresponding author
  • Solomon Mensah (2)
  • Richard Amankwah (1)

  1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, China
  2. Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
