Predicting Vulnerable Software Components via Bellwethers

  • Conference paper
Trusted Computing and Information Security (CTCIS 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 960)

Abstract

Software vulnerabilities are weaknesses, flaws or errors introduced during the life cycle of a software system. Although previous studies have demonstrated the practical significance of using software metrics to predict vulnerable software components, empirical evidence shows that these metrics suffer from limited effectiveness and robustness. This paper investigates the feasibility of using Bellwethers (i.e., exemplary data) for predicting and classifying software vulnerabilities. We introduce a Bellwether method built on three operators: PARTITION, SAMPLE + TRAIN and APPLY. The Bellwethers sampled by the three operators are used to train a learner (a deep neural network) with the aim of predicting essential or non-essential vulnerabilities. We evaluate the proposed Bellwether method using vulnerability reports for three popular web browsers obtained from CVE. In addition, the mean absolute error (MAE), Welch’s t-test and Cliff’s δ effect size are used to evaluate the prediction performance and the statistical and practical significance of the difference between the Bellwethers and the growing portfolio. We found that there exist subsets of vulnerability records (Bellwethers) in the studied datasets that can yield improved accuracy for software vulnerability prediction. The results show that recall and precision from the text mining process ranged from 73.9% to 85.3% and from 67.9% to 81.8%, respectively, across the three studied datasets. The findings further suggest that using Bellwethers for predictive modelling is a promising research direction for assisting software engineers and practitioners who need to identify the vulnerability records that demand the most attention prior to software release.
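The three evaluation measures named in the abstract (MAE, Welch's t-test and Cliff's δ) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names and the sample error values are hypothetical, and only the textbook definitions of the three measures are assumed. The sketch returns the Welch t statistic and its Welch-Satterthwaite degrees of freedom; obtaining a p-value would additionally require a t-distribution routine, which is left out here.

```python
from math import sqrt
from statistics import mean, variance


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two samples are not assumed to share
    a common variance.
    """
    vx = variance(x) / len(x)  # squared standard error of sample x
    vy = variance(y) / len(y)  # squared standard error of sample y
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def cliffs_delta(x, y):
    """Cliff's delta: (#pairs with x > y minus #pairs with x < y) / (n * m)."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


if __name__ == "__main__":
    # Hypothetical absolute errors from two models (illustrative values only,
    # not taken from the paper).
    bellwether_errors = [0.10, 0.12, 0.08, 0.11, 0.09]
    portfolio_errors = [0.20, 0.18, 0.25, 0.22, 0.19]
    t, df = welch_t(bellwether_errors, portfolio_errors)
    print("Welch t:", t, "df:", df)
    print("Cliff's delta:", cliffs_delta(bellwether_errors, portfolio_errors))
```

Cliff's δ ranges from -1 to 1; a value near 0 indicates heavily overlapping samples, while values near the extremes indicate that one sample almost always dominates the other.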

Notes

  1. https://nvd.nist.gov/, www.cvedetails.com.

Acknowledgments

This work is partly supported by the National Natural Science Foundation of China (NSFC grant numbers: 61202110 and 61502205), the project of Jiangsu Provincial Six Talent Peaks (grant number: XYDXXJS-016), the Natural Science Foundation of Jiangsu Province (grant number: BK20170558), the University Science Research Project of Jiangsu Province (grant number: 16KJB520008), the Graduate Research Innovation Project of Jiangsu Province (grant number: KYCX17_1807), and the Postdoctoral Science Foundation of China (grant numbers: 2015M571687 and 2015M581739).

Author information

Corresponding author

Correspondence to Jinfu Chen.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Kudjo, P.K., Chen, J., Mensah, S., Amankwah, R. (2019). Predicting Vulnerable Software Components via Bellwethers. In: Zhang, H., Zhao, B., Yan, F. (eds) Trusted Computing and Information Security. CTCIS 2018. Communications in Computer and Information Science, vol 960. Springer, Singapore. https://doi.org/10.1007/978-981-13-5913-2_24

  • DOI: https://doi.org/10.1007/978-981-13-5913-2_24

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-5912-5

  • Online ISBN: 978-981-13-5913-2

  • eBook Packages: Computer Science (R0)
