Using machine learning to assist with the selection of security controls during security assessment

Empirical Software Engineering

Abstract

Context

In many domains such as healthcare and banking, IT systems need to fulfill various requirements related to security. The elaboration of security requirements for a given system is in part guided by the controls envisaged by the applicable security standards and best practices. An important difficulty that analysts have to contend with during security requirements elaboration is sifting through a large number of security controls and determining which ones have a bearing on the security requirements for a given system. This challenge is often exacerbated by the scarce security expertise available in most organizations.

Objective

In this article, we develop automated decision support for the identification of security controls that are relevant to a specific system in a particular context.

Method and Results

Our approach, which is based on machine learning, leverages historical data from security assessments performed on past systems in order to recommend security controls for a new system. We operationalize and empirically evaluate our approach using real historical data from the banking domain. Our results show that, when one excludes security controls that are rare in the historical data, our approach has an average recall of ≈ 94% and an average precision of ≈ 63%. We further examine, through a survey, the perceptions of security analysts about the usefulness of the classification models derived from historical data.
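
As a rough illustration of the general idea of learning control recommendations from past assessments, the following Python sketch trains one tiny classifier per control on toy historical records and then recommends controls for a new system. Everything in it is an assumption made for illustration: the abstract does not specify the learning algorithm or the features, so the attribute names, the control names, the data, and the single-attribute learner (a deliberately simple stand-in) are hypothetical.

```python
# Hypothetical historical assessments: (system attributes, controls judged relevant).
# Attribute and control names are invented for illustration only.
HISTORY = [
    ({"handles_payments": True, "internet_facing": True}, {"encryption", "waf"}),
    ({"handles_payments": True, "internet_facing": False}, {"encryption"}),
    ({"handles_payments": False, "internet_facing": True}, {"waf"}),
    ({"handles_payments": False, "internet_facing": False}, set()),
]


def train(history):
    """Stand-in learner: for each control, keep the one boolean attribute
    whose value most often agrees with the control's relevance."""
    controls = set().union(*(ctrls for _, ctrls in history))
    attributes = sorted(history[0][0])
    model = {}
    for control in sorted(controls):
        best_attr, best_hits = None, -1
        for attr in attributes:
            # Count records where "attribute is true" matches "control was relevant".
            hits = sum(feats[attr] == (control in ctrls) for feats, ctrls in history)
            if hits > best_hits:
                best_attr, best_hits = attr, hits
        model[control] = best_attr
    return model


def recommend(model, system):
    """Recommend every control whose chosen attribute is true for the new system."""
    return {control for control, attr in model.items() if system[attr]}


model = train(HISTORY)
new_system = {"handles_payments": True, "internet_facing": False}
print(recommend(model, new_system))
```

A realistic setup would use far richer features and a proper learning algorithm; the sketch only shows the shape of the problem, in which past assessment outcomes serve as labeled training data.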

Conclusions

The high recall – indicating only a few relevant security controls are missed – combined with the reasonable level of precision – indicating that the effort required to confirm recommendations is not excessive – suggests that our approach is a useful aid to analysts for more efficiently identifying the relevant security controls, and also for decreasing the likelihood that important controls would be overlooked. Further, our survey results suggest that the generated classification models help provide a documented and explicit rationale for choosing the applicable security controls.
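
To make the two measures concrete, the sketch below computes precision and recall for a single hypothetical system. The control identifiers and counts are invented so that the result lands near the reported averages; they are not data from the study.

```python
def precision_recall(recommended, relevant):
    """Precision: share of recommended controls that are truly relevant.
    Recall: share of truly relevant controls that were recommended."""
    true_positives = len(recommended & relevant)
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical system: 16 controls are actually relevant.
relevant = {f"C{i}" for i in range(16)}
# The recommender proposes 24 controls: 15 relevant ones plus 9 irrelevant ones.
recommended = {f"C{i}" for i in range(15)} | {f"X{i}" for i in range(9)}

precision, recall = precision_recall(recommended, relevant)
print(f"precision={precision:.3f}, recall={recall:.4f}")
```

In this made-up case the analyst reviews 24 recommendations, discards 9 of them, and misses only 1 of the 16 relevant controls, which is the trade-off the conclusions describe.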

Acknowledgements

Financial support for this work was provided by the Alphonse Weicker Foundation.

Author information

Corresponding author

Correspondence to Seung Yeob Shin.

Additional information

Communicated by: Eric Knauss, Michael Goedicke and Paul Grünbacher

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Requirements Engineering for Software Quality (REFSQ)

About this article

Cite this article

Bettaieb, S., Shin, S.Y., Sabetzadeh, M. et al. Using machine learning to assist with the selection of security controls during security assessment. Empir Software Eng 25, 2550–2582 (2020). https://doi.org/10.1007/s10664-020-09814-x

  • DOI: https://doi.org/10.1007/s10664-020-09814-x
