Predicting Vulnerable Software Components via Bellwethers

Kudjo, Patrick Kwaku; Chen, Jinfu; Mensah, Solomon; Amankwah, Richard

doi:10.1007/978-981-13-5913-2_24

Part of the book series: Communications in Computer and Information Science (CCIS, volume 960)

Included in the following conference series:

Chinese Conference on Trusted Computing and Information Security

Abstract

Software vulnerabilities are weaknesses, flaws or errors introduced during the life cycle of a software system. Although previous studies have demonstrated the practical significance of using software metrics to predict vulnerable software components, empirical evidence shows that these metrics are plagued with issues pertaining to their effectiveness and robustness. This paper investigates the feasibility of using Bellwethers (i.e., exemplary data) for predicting and classifying software vulnerabilities. We introduce a Bellwether method built on three operators: PARTITION, SAMPLE + TRAIN, and APPLY. The Bellwethers sampled by these operators are used to train a learner (a deep neural network) that classifies vulnerabilities as essential or non-essential. We evaluate the proposed Bellwether method using vulnerability reports for three popular web browsers, extracted from CVE. The mean absolute error (MAE), Welch’s t-test and Cliff’s δ effect size are used to further evaluate prediction performance and to assess the statistical and practical significance of the difference between the Bellwethers and the growing portfolio. We found that the studied datasets contain subsets of vulnerability records (Bellwethers) that yield improved accuracy for software vulnerability prediction. Recall and precision from the text mining process ranged from 73.9% to 85.3% and from 67.9% to 81.8% respectively across the three studied datasets. The findings further suggest that using Bellwethers for predictive modelling is a promising research direction for helping software engineers and practitioners identify the vulnerability records that demand the most attention prior to software release.
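
The three evaluation statistics named in the abstract have standard definitions, which the following minimal Python sketch illustrates. It is not the authors' implementation: the function names and the sample error values are hypothetical, chosen only to show how MAE, Welch's t statistic (with Welch-Satterthwaite degrees of freedom) and Cliff's δ could be computed for two sets of prediction errors.

```python
from math import sqrt
from statistics import mean, variance


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def welch_t(xs, ys):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(xs) / len(xs)  # squared standard error of mean(xs)
    vy = variance(ys) / len(ys)  # squared standard error of mean(ys)
    t = (mean(xs) - mean(ys)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(xs) - 1) + vy ** 2 / (len(ys) - 1))
    return t, df


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross pairs, in [-1, 1]."""
    greater = sum(x > y for x in xs for y in ys)
    less = sum(x < y for x in xs for y in ys)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical absolute errors, for illustration only (not from the paper).
bellwether_errors = [0.10, 0.20, 0.15, 0.12]
portfolio_errors = [0.30, 0.25, 0.40, 0.35]

t_stat, dof = welch_t(bellwether_errors, portfolio_errors)
delta = cliffs_delta(bellwether_errors, portfolio_errors)
```

Under this reading, a negative t statistic and a δ close to -1 would indicate that the first sample's errors are consistently smaller than the second's. Welch's test is the variant of the t-test that does not assume equal variances, and Cliff's δ is a non-parametric effect size, so neither depends on the errors being normally distributed with a shared variance.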

Notes

1. https://nvd.nist.gov/, www.cvedetails.com.

Acknowledgments

This work is partly supported by the National Natural Science Foundation of China (NSFC grant numbers 61202110 and 61502205), the Jiangsu Provincial Six Talent Peaks project (grant number XYDXXJS-016), the Natural Science Foundation of Jiangsu Province (grant number BK20170558), the University Science Research Project of Jiangsu Province (grant number 16KJB520008), the Graduate Research Innovation Project of Jiangsu Province (grant number KYCX17_1807), and the Postdoctoral Science Foundation of China (grant numbers 2015M571687 and 2015M581739).

Author information

Authors and Affiliations

School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, 202000, China
Patrick Kwaku Kudjo, Jinfu Chen & Richard Amankwah
Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
Solomon Mensah

Corresponding author

Correspondence to Jinfu Chen.

Editor information

Editors and Affiliations

School of Cyber Science and Engineering, Wuhan University, Wuhan, China
Huanguo Zhang
School of Cyber Science and Engineering, Wuhan University, Wuhan, China
Bo Zhao
School of Cyber Science and Engineering, Wuhan University, Wuhan, China
Fei Yan

About this paper

Cite this paper

Kudjo, P.K., Chen, J., Mensah, S., Amankwah, R. (2019). Predicting Vulnerable Software Components via Bellwethers. In: Zhang, H., Zhao, B., Yan, F. (eds) Trusted Computing and Information Security. CTCIS 2018. Communications in Computer and Information Science, vol 960. Springer, Singapore. https://doi.org/10.1007/978-981-13-5913-2_24

DOI: https://doi.org/10.1007/978-981-13-5913-2_24
Published: 09 January 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-5912-5
Online ISBN: 978-981-13-5913-2
eBook Packages: Computer Science, Computer Science (R0)
