
Çorba: crowdsourcing to obtain requirements from regulations and breaches

Empirical Software Engineering

Abstract

Context

Modern software systems are deployed in sociotechnical settings that combine social entities (humans and organizations) with technical entities (software and devices). In such settings, on top of the technical controls that implement a system's security features, regulations specify how users should behave in security-critical situations. No matter how carefully the software is designed and how well the regulations are enforced, such systems remain subject to breaches caused by social factors (user misuse) and technical factors (software vulnerabilities). Breach reports, which are often legally mandated, describe what went wrong during a breach and how it was remedied. However, breach reports are not formally investigated in current practice, so valuable lessons about past failures are lost.

Objective

Our research aim is to aid security analysts and software developers in obtaining a set of legal, security, and privacy requirements by developing a crowdsourcing methodology that extracts knowledge from regulations and breach reports.

Method

We present Çorba, a methodology that leverages human intelligence via crowdsourcing to extract requirements, in the form of regulatory norms, from textual artifacts. We evaluate Çorba on US healthcare regulations from the Health Insurance Portability and Accountability Act (HIPAA) and on breach reports published by the US Department of Health and Human Services (HHS). Following this methodology, we conducted a pilot study and a final study on the Amazon Mechanical Turk crowdsourcing platform.
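
To illustrate the kind of output the extraction targets, the sketch below models a regulatory norm with a deontic modality, the parties involved, and antecedent/consequent clauses. This is a minimal, assumed representation for clarity; the field names and categories are illustrative and are not the study's exact schema.

```python
from dataclasses import dataclass
from enum import Enum

class Modality(Enum):
    """Deontic modality of a norm (assumed categories)."""
    COMMITMENT = "commitment"        # an obligation of one party toward another
    AUTHORIZATION = "authorization"  # a permission granted under conditions
    PROHIBITION = "prohibition"      # an action that must not occur

@dataclass
class Norm:
    """A requirement extracted from a regulation or a breach report.

    Illustrative structure only; not the paper's exact schema.
    """
    modality: Modality
    subject: str      # party the norm applies to, e.g., "covered entity"
    beneficiary: str  # party protected or empowered, e.g., "patient"
    antecedent: str   # condition under which the norm is in force
    consequent: str   # behavior that is required, permitted, or forbidden
    source: str       # citation, e.g., a HIPAA clause or a breach report ID

# Example: a prohibition paraphrased from HIPAA's Privacy Rule (illustrative)
example = Norm(
    modality=Modality.PROHIBITION,
    subject="covered entity",
    beneficiary="patient",
    antecedent="the patient has not authorized the disclosure",
    consequent="disclose protected health information to third parties",
    source="HIPAA 45 CFR 164.502 (illustrative)",
)
print(example.modality.value, "-", example.consequent)
```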

Results

Çorba yields high-quality responses from crowd workers, which we analyze to identify requirements that complement the HIPAA regulations. We publish a curated dataset of the worker responses and the identified requirements.

Conclusions

The results show that the instructions and question formats presented to the crowd workers significantly affect the quality of their responses with respect to identifying requirements. By revising the instructions and question formats, we observed a significant improvement from the pilot study to the final study. Other factors, such as worker types, breach types, or the length of reports, do not have a notable effect on the workers' performance. Moreover, we discuss other potential improvements, such as restructuring breach reports and highlighting text with automated methods.
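
The text-highlighting improvement mentioned above could, for instance, be approximated by a simple cue-phrase pass over a breach report before it is shown to workers. The sketch below is only an assumed illustration of that idea; the cue-phrase list and the markup convention are hypothetical, and this is not a method used or evaluated in the study.

```python
import re

# Hypothetical cue phrases that often signal security-relevant events in
# breach reports; a real list would need to be curated or learned.
CUE_PHRASES = [
    "unauthorized access", "stolen laptop", "unencrypted",
    "disclosed", "phishing", "misconfigured", "notified",
]

def highlight(report: str, cues=CUE_PHRASES) -> str:
    """Wrap cue phrases in **...** so workers can spot them quickly."""
    for cue in cues:
        report = re.sub(
            re.escape(cue),
            lambda m: f"**{m.group(0)}**",
            report,
            flags=re.IGNORECASE,
        )
    return report

print(highlight("An unencrypted laptop was stolen; unauthorized access "
                "to patient records was later disclosed."))
```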


Notes

  1. Çorba is Turkish for "soup"; it serves as a memorable name for our methodology (close to an acronym) and reflects the mixture of multiple artifacts in our study.

  2. https://www.mturk.com/mturk/help


Acknowledgements

This research is supported by the US Department of Defense under the Science of Security Lablet (SoSL) grant to NC State University and by the National Science Foundation under the Research Experiences for Undergraduates (REU) program.

Author information


Corresponding author

Correspondence to Hui Guo.

Additional information

Communicated by: Daniel Amyot


Appendix

A.1 Breach Modifications

The original breach report for Breach #11 from Table 3 is shown below.

[Figure omitted: original breach report for Breach #11]

The modified breach report v2 for Breach #11 from Table 3 is shown below.

[Figure omitted: modified breach report v2 for Breach #11]

A.2 Survey Tutorial

Figure 7 shows the sample "correct" answer that workers see in the study tutorial for Task Malice (Task 1). Figure 8 shows the sample "correct" answers that workers see for some of the questions under Task Breach (Task 2).

Fig. 7 Survey tutorial: Task Malice (Task 1)

Fig. 8 Survey tutorial: Part of Task Breach (Task 2)

A.3 Worker Responses

Figure 9 shows answers from a sample worker for Task Regulation (Task 4).

Fig. 9 Worker Response: Task Regulation (Task 4)


Cite this article

Guo, H., Kafalı, Ö., Jeukeng, AL. et al. Çorba: crowdsourcing to obtain requirements from regulations and breaches. Empir Software Eng 25, 532–561 (2020). https://doi.org/10.1007/s10664-019-09753-2
