Abstract
Software vulnerabilities are weaknesses, flaws or errors introduced during the life cycle of a software system. Although previous studies have demonstrated the practical significance of using software metrics to predict vulnerable software components, empirical evidence shows that these metrics are plagued with issues pertaining to their effectiveness and robustness. This paper investigates the feasibility of using Bellwethers (i.e., exemplary data) for predicting and classifying software vulnerabilities. We introduce a Bellwether method built on three operators: PARTITION, SAMPLE + TRAIN, and APPLY. The Bellwethers sampled by the three operators are used to train a learner (i.e., a deep neural network) with the aim of predicting essential or non-essential vulnerabilities. We evaluate the proposed Bellwether method using vulnerability reports for three popular web browsers, as provided by CVE. The mean absolute error (MAE), Welch's t-test and Cliff's δ effect size are used to further evaluate the prediction performance and the statistical and practical significance of the difference between the Bellwethers and the growing portfolio. We found that there exist subsets of vulnerability records (Bellwethers) in the studied datasets that can yield improved accuracy for software vulnerability prediction. The results show that recall and precision from the text mining process ranged from 73.9% to 85.3% and from 67.9% to 81.8%, respectively, across the three studied datasets. The findings further show that using Bellwethers for predictive modelling is a promising research direction for assisting software engineers and practitioners who seek to predict the vulnerability records that demand the most attention prior to software release.
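The three evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and the sample error values are hypothetical, and only the Python standard library is assumed. It computes the MAE of a set of predictions, Welch's t statistic with its Welch-Satterthwaite degrees of freedom, and Cliff's δ for two samples of errors.

```python
import math
from statistics import mean, variance


def mae(actual, predicted):
    """Mean absolute error between paired actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def welch_t(x, y):
    """Welch's t statistic and degrees of freedom for two samples
    that may have unequal variances and unequal sizes."""
    vx = variance(x) / len(x)  # squared standard error of sample x
    vy = variance(y) / len(y)  # squared standard error of sample y
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-sample pairs.
    Ranges from -1 (x always smaller) to +1 (x always larger)."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


if __name__ == "__main__":
    # Hypothetical absolute errors, for illustration only (not from the paper)
    bellwether_errors = [0.10, 0.12, 0.08, 0.11, 0.09]
    portfolio_errors = [0.20, 0.18, 0.25, 0.22, 0.19]
    t, df = welch_t(bellwether_errors, portfolio_errors)
    print("t =", t, "df =", df)
    print("delta =", cliffs_delta(bellwether_errors, portfolio_errors))
```

A negative δ here would indicate that the first sample's errors tend to be smaller than the second's. Turning the t statistic into a p-value requires the t distribution, which a statistics library would normally supply.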
Acknowledgments
This work is partly supported by the National Natural Science Foundation of China (NSFC grant numbers: 61202110 and 61502205), the project of Jiangsu provincial Six Talent Peaks (Grant number: XYDXXJS-016), the Natural Science Foundation of Jiangsu Province (Grant number: BK20170558), the University Science Research Project of Jiangsu Province (Grant number: 16KJB520008), the Graduate Research Innovation Project of Jiangsu Province (Grant number: KYCX17_1807), and the Postdoctoral Science Foundation of China (Grant numbers: 2015M571687 and 2015M581739).
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kudjo, P.K., Chen, J., Mensah, S., Amankwah, R. (2019). Predicting Vulnerable Software Components via Bellwethers. In: Zhang, H., Zhao, B., Yan, F. (eds) Trusted Computing and Information Security. CTCIS 2018. Communications in Computer and Information Science, vol 960. Springer, Singapore. https://doi.org/10.1007/978-981-13-5913-2_24
DOI: https://doi.org/10.1007/978-981-13-5913-2_24
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-5912-5
Online ISBN: 978-981-13-5913-2
eBook Packages: Computer Science, Computer Science (R0)