1 Introduction

Research on Computer Supported Cooperative Work (CSCW) has driven the field of crisis informatics, which has been described as a multidisciplinary field “concerned with the ways in which information systems are entangled with socio-behavioral phenomena connected to disasters” (Soden and Palen 2018, p. 2). In the context of crisis response, the gathering and analysis of social media data for emergency services has been studied, particularly its use by emergency operators in collaboration with informal response communities (Purohit et al. 2014), the mitigation of information overload (Kaufhold et al. 2020), and social media users’ expectations towards crisis communication (Petersen et al. 2017; Reuter et al. 2017). Similar to existing emergency services for natural disasters, Computer Emergency Response Teams (CERTs), also known as Computer Security Incident Response Teams (CSIRTs), serve as a central point of contact, advice, and coordination for government institutions and private actors in the event of cybersecurity incidents and threats (Kossakowski, 2001; Riebe et al. 2021a).

CERTs do not only respond to incidents reported to them; they also monitor various media sources for new vulnerabilities and other threats, verify different pieces of information, analyse threats, communicate with other CERTs, and are expected to support “stakeholder[s] with specific recommendations, to provide (daily) reports for selected stakeholders (e.g., a daily vulnerability report for ministries), or to issue a general warning for multiple stakeholders (in case larger-scaled ICT infrastructures are threatened)” (Riebe et al. 2021a, p. 11). The main challenge CERTs face when executing their tasks lies in ensuring adequate cyber situational awareness while evaluating information from numerous public and closed sources (Franke and Brynielsson, 2014; Riebe et al. 2021a). Relevant public sources such as social media, blogs, websites, and feeds can be included in this process as part of an open-source intelligence (OSINT) approach (Glassman and Kang, 2012). Considering the risk of information overload when evaluating public sources, especially in the case of serious security incidents with many potential civilian casualties, the use of technical systems utilising machine learning (ML) algorithms for information filtering and analysis has become common (Kaufhold et al. 2020). In such decision support systems, artificial intelligence (AI) agents are becoming increasingly relevant as assistants for decision-making (Chouldechova et al. 2018). CERT members have stated that they need ML-based (semi-)automated assistance for the gathering, (pre-)processing, analysis, and communication of cyber threat data (Riebe et al. 2021a; Van der Kleij et al. 2017). Accordingly, the use of OSINT systems within CERTs is increasing (Kassim et al. 2022). As OSINT, to be an effective tool for cybersecurity operators, mostly relies on private data from users of online media, the acceptance of such systems is decisive. Value conflicts may arise because different groups in society are directly or indirectly affected in different ways, depending on how OSINT technologies are applied. It is therefore imperative to investigate not only OSINT systems themselves, but also the value conflicts that arise with them. Research that focuses primarily on the values and value conflicts relevant to the development of OSINT systems for cybersecurity incident response has not yet been conducted extensively. This paper is guided by the collaborative Value Sensitive Design (VSD) method (Friedman, 1996) and contributes to answering the following research question: Which values and value conflicts emerge due to the application and development of ML-based open-source intelligence technologies in the context of cybersecurity incident response?

Our study is part of the CYWARN research project developing OSINT artefacts for CERTs (Kaufhold et al. 2021), and contributes to the CSCW discourse with (1) a systematic literature review of technical research on OSINT technologies for application in the domain of cybersecurity, (2) an empirically grounded elaboration of relevant stakeholder values and value conflicts in connection with the application and development of OSINT technologies for cybersecurity incident response, and (3) an outline of implications for the research on and the design of ML-based OSINT technologies for collaborative cybersecurity incident response.

This paper is structured as follows: In Section 2, related work on OSINT and VSD is presented and the research gap is outlined. Afterwards, Section 3 introduces the research design. We employ a triangulation of methods that combines an empirical case study consisting of a focus group (N = 7) and semi-structured expert interviews (N = 9) with a systematic literature review of technical research on OSINT technologies in the context of cybersecurity (N = 73). The results of both the literature review and the empirical case study are presented in Section 4. In Section 5, the insights obtained are synthesised into research and design implications, and it is discussed how value sensitivity can facilitate collaboration. Finally, the limitations of the study are indicated and possible starting points for further research are outlined, before Section 6 provides a brief summary of the work.

2 Background and Related Work

As the design and application of novel information and communication technology (ICT) artefacts interferes with existing social practices, it is necessary to engage with the practices and problems of professionals, institutional arrangements, and technical infrastructures of the respective application environment (Wulf et al. 2011). Approaches for participatory design, which aim to address this issue, have been part of the CSCW discourse (Randall et al. 2007), as they follow the objective of facilitating cooperation (Kensing and Blomberg, 1998). Extensive research has focused on the design for collaboration in crisis response to better understand the collaborative practices and, thus, design systems which support response teams (Büscher et al. 2016; Cobb et al. 2014; Liegl et al. 2016; Reuter et al. 2014). Here, collaboration can be described as the development of a set of common practices which could be adopted by newcomers without previous participation and explanation (Heath and Luff, 1992). With regard to CERTs, this includes monitoring of and responding to cyber threats and incidents, as well as evaluating and sharing relevant information with outside parties. OSINT systems can help collaborating distributed teams to gain a shared situational awareness due to their support of context awareness, thus facilitating the establishment of a meta-perspective (Jones et al. 2021).

This section first provides an overview of how OSINT systems as AI agents assist CERTs (Section 2.1), then introduces VSD as our participatory design approach and situates the paper in the context of previous research (Section 2.2), and finally outlines the research gap (Section 2.3).

2.1 OSINT Systems as AI-based Decision Support in Cybersecurity Incident Response

Central to OSINT is the idea that various pieces of publicly available information can be combined in unforeseen ways to gain innovative insights about the subject of interest (Glassman and Kang, 2012). OSINT can accordingly be defined as an activity that “involves the collection, analysis, and use of data from open sources for intelligence purposes” (Koops et al. 2013, p. 677). Approaches for cybersecurity incident response predominantly use social media as their main source (Riebe et al. 2021b), thereby taking advantage of crowdsourcing. Crowdsourcing for emergency response, however, depends on the quality and the trustworthiness of the information (Tapia and Moore, 2014).

ML algorithms are increasingly used for the automation of data gathering, pre-processing, and analysis (Williams and Blum, 2018). With the adoption of ML, challenges of explainability arise, as non-expert users are often unable to comprehend how an algorithm produces a certain output (Burrell, 2016). This is problematic as explainability is crucial to establishing users’ trust in a system (Dzindolet et al. 2003). Therefore, recent research focuses on possibilities of explainable artificial intelligence (XAI) (Longo et al. 2020; Wang et al. 2019).

As part of decision support systems, AI has gained importance in assisting teams with particular types of expertise (Bansal et al. 2019). In their study on human-AI interaction, Zhang et al. (2022, p. 1) examine “how people trust and rely on an AI assistant that performs with different levels of expertise relative to the person, ranging from completely overlapping expertise to perfectly complementary expertise”. In their experiments, they found that the “ideal partnership between humans and AI has been based on the premise of their complementary expertise” (Zhang et al. 2022, p. 20). In addition, they found that trust in AI was lowest when the expertise of AI and human operators overlapped completely. Thus, trust in an AI agent is associated with the perceived usefulness of the AI and its complementary expertise. The communication style of the AI agent has also been shown to be relevant for the human operator’s trust (Zhang et al. 2022). As shown in a study by Feng and Boyd-Graber (2019), in which human-computer teams played a trivia knowledge game, the skill level of human operators is crucial for the interpretation of the expertise of AI. This is supported by Schaffer et al. (2019), who found in their study (N = 529) that an AI agent was only effective at lower levels of self-assessed knowledge, whereas self-confident users often rejected the agent’s suggestions. In summary, for effective human-AI teaming in decision-making processes, the expertise of the users, the capabilities of the AI systems, e.g. managing large amounts of data in real time and identifying similarities, as well as the communication style of the AI agents towards the users are relevant.

For cybersecurity incident response, OSINT technologies leveraging ML are primarily used in three areas. First, they are used for investigative purposes, e.g. to support digital forensics (Quick and Choo, 2018), or cyberattack attribution (Layton, 2016). Second, they are utilised for gathering cyber threat intelligence (CTI), which can be understood as “threat-related information which allows cyber security experts to investigate on a certain threat, e.g. the name of a malware, adversary or vulnerability” (Tundis et al. 2020, p. 454). Third, they are also used for risk assessment and mitigation purposes, e.g. to assess the attack surface of organisations (Hayes and Cappa, 2018), or to expose social engineering attack opportunities (Edwards et al. 2017).

In a study comprising an online survey and semi-structured interviews with staff of 13 national CERTs from Asia, Europe, the Caribbean, and North America, Kassim et al. (2022) found that the use of OSINT tools in cybersecurity incident response is on the rise. In accordance with Riebe et al. (2021a), they found that CERTs lack the resources to manage the increasing amount of publicly available data, which requires further verification and risk analysis. In their study on the collaborative practices of German CERTs, Riebe et al. (2021a) identified the (semi-)automation of threat detection and analysis, as well as reporting interfaces, as useful improvements.

2.2 VSD Research on OSINT

VSD, as a theoretically grounded method, is particularly well suited to anticipate value conflicts that arise through technology use and to proactively address them during design (Friedman et al. 2013). As a central theoretical assumption, VSD takes an interactional position on the relationship between technology design and social context: design features support or undermine certain values, but ultimately only their interplay with users and the context of use determines how a technology influences society (Davis and Nathan, 2015). A value can be defined as “what a person or group of people consider important in life” (Friedman et al. 2013, p. 57). VSD strives to consider direct and indirect stakeholders and their values during design (Friedman et al. 2013). As the values considered important often differ, value conflicts may arise. A value conflict exists if competing values suggest incompatible choices as the best for the design of technical artefacts and no single value trumps all others (van de Poel and Royakkers, 2011).

In order to ensure that values are taken into account, VSD proposes a methodology that is composed of three interdependent and iteratively applied types of investigation (Friedman et al. 2013). In conceptual investigations, stakeholder groups affected by the envisaged technical artefacts are identified, and values expected to be important to them are elaborated as well as conceptualised (Friedman et al. 2013). In empirical investigations, social science methods are used to revise these findings with a focus on the opinions of stakeholders, as well as anticipated usage contexts (Manders-Huits, 2011). During both types of investigations, potential value conflicts may be identified (Friedman et al. 2013). Finally, in technical investigations, design choices that support identified and prioritised values are derived (Manders-Huits, 2011). Concerning value discovery, Le Dantec et al. (2009) argue that values should be identified during direct stakeholder engagement. In agreement with this, we utilise empirical investigations for value discovery in this work.

Several studies have specifically explored values and value conflicts in the cybersecurity domain. Among others, potential conflicts have been identified between the values of security and privacy, security and fairness (Christen et al. 2017; Domingo-Ferrer and Blanco-Justicia, 2020; van de Poel, 2020), as well as security and autonomy (Christen et al. 2017; Domingo-Ferrer and Blanco-Justicia, 2020). Further, privacy was found to potentially conflict with both fairness and accountability (van de Poel, 2020). However, the identified conflicts mostly involve either security or privacy, and altogether these works remained on a conceptual level, without reference to specific technical artefacts. Other publications referred to specific OSINT artefacts for other security purposes, but they were narrowly focused on safeguarding the value of privacy through regulatory Privacy by Design approaches (Casanovas, 2017; Casanovas et al. 2014; Casanovas, 2014; Cuijpers, 2013; Koops et al. 2013; Rajamäki, 2019; Rajamäki and Simola, 2019).

2.3 Research Gap

While values and value conflicts relevant to cybersecurity have been investigated conceptually (Christen et al. 2017; Domingo-Ferrer and Blanco-Justicia, 2020; van de Poel, 2020), to the best of the authors’ knowledge, there are no publications that primarily focus on the values relevant to ML-based OSINT technologies for cybersecurity incident response, despite their increasing significance. Moreover, the consideration of Privacy by Design principles (Casanovas, 2017; Casanovas et al. 2014; Casanovas, 2014; Cuijpers, 2013; Koops et al. 2013; Rajamäki, 2019; Rajamäki and Simola, 2019) has only been studied in connection to OSINT artefacts for other security related scenarios. Riebe et al. (2021b) have further shown that Privacy by Design principles are hardly taken into consideration in technical research on the development of OSINT artefacts for cybersecurity event detection. Accordingly, a research gap can be found with regard to the empirical investigation of relevant stakeholder values related to potential value conflicts resulting from the application and development of such technologies as ML-based decision support systems. The derived implications for design and research may be essential for the future development of OSINT systems for cybersecurity incident response in order to ensure their societal acceptance and stakeholder cooperation.

3 Methods

To elaborate which values and value conflicts emerge due to the application and development of ML-based OSINT technologies in the context of cybersecurity incident response, the research design uses a triangulation of methods (see Fig. 1). The empirical investigation of relevant values and value conflicts is based on a case study in which the results of a focus group (N = 7) and of semi-structured expert interviews (N = 9) are content-analysed, preceded by a conceptual investigation of direct and indirect stakeholder groups; in addition, a systematic literature review (N = 73) examines technical research on OSINT technologies for the domain of cybersecurity. Combining these approaches is reasonable because the elaboration of values and value conflicts can thus rest on an adequate empirical basis, while the insights gained can be complemented with perspectives from other OSINT artefacts and application scenarios. The methodological procedure of the literature review is described in Section 3.1, while the details of the case study are presented in Section 3.2.

Fig. 1: Illustration of the research design

3.1 Systematic Literature Review: OSINT in the Domain of Cybersecurity

To situate the findings of the empirical case study within the broader context of technical research on OSINT-technologies for application in the field of cybersecurity, the literature review section seeks to answer the following questions:

1. For which deployment scenarios in the cybersecurity domain are OSINT technologies being developed?

2. What technical features, techniques, and data sources are used?

3. Are ethical, legal, and social implications taken into consideration?

As this review follows an explicit and reproducible method to identify and evaluate the publications, it can be considered a systematic literature review (vom Brocke et al. 2015). Specifically, a sequential review approach is used in which literature search, analysis, and the writing of the review follow a step-by-step process (Levy and Ellis, 2006). As research conducted by private actors and state bodies in many cases is not accessible, only research published in academic publications is taken into account. For the review, a search in the literature databases ACM Digital Library, IEEE-Xplore, Science Direct, and Springer Link was conducted. As the review focuses on technical research, the selection of the databases was based on their coverage of computer science literature and the number of publications they contain. Moreover, to ensure the quality of the reviewed works, it seemed sensible to limit the search to publications in peer-reviewed journals and conference proceedings. Finally, only work published from the beginning of the databases’ coverage to the end of May 2021, the beginning of the literature research, was included. The full-text and metadata search in the databases was conducted with the following search expression using Boolean operators: (“cyber security” OR cybersecurity OR “information security” OR cybercrime) AND (OSINT OR SOCMINT OR WEBINT OR “open-source intelligence” OR “social media intelligence” OR “web intelligence”). The procedure of publication search and selection is illustrated in Table 1.
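As a minimal illustration of how such a Boolean search expression can be operationalised for abstract screening, the following sketch (hypothetical records and helper names, not part of the actual review tooling) checks whether a title or abstract satisfies both AND-connected clauses of the query:

```python
# The two AND-connected clauses of the search expression, each an OR-list of terms.
DOMAIN_TERMS = ["cyber security", "cybersecurity", "information security", "cybercrime"]
OSINT_TERMS = ["osint", "socmint", "webint", "open-source intelligence",
               "social media intelligence", "web intelligence"]

def matches_query(text: str) -> bool:
    """Return True if the text satisfies (domain term) AND (OSINT term)."""
    lowered = text.lower()
    has_domain = any(term in lowered for term in DOMAIN_TERMS)
    has_osint = any(term in lowered for term in OSINT_TERMS)
    return has_domain and has_osint

# Hypothetical records standing in for database search results.
records = [
    "An OSINT platform for cybersecurity event detection on Twitter",
    "A survey of routing protocols for wireless sensor networks",
]
print([r for r in records if matches_query(r)])
```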

Table 1 Procedure of the publication selection for the systematic literature review, differentiated by database

The search yielded 1,419 preliminary results, of which 945 were papers published in journals and conference proceedings. In a next step, the articles’ abstracts were screened to identify publications irrelevant to the goal of the review. First, publications not focused on the development of OSINT artefacts for the cybersecurity domain, including those related to cybercrime in a broader sense, were excluded. Second, publications in which the processing of publicly available data is only a secondary aspect of an artefact were excluded. Third, research published in languages other than English was excluded. Finally, inaccessible papers and duplicates were excluded. This resulted in the exclusion of 872 publications. The remaining 73 publications were quantitatively analysed with Excel. A table with the categories and subcategories of analysis can be found in Appendix A. The categories were compiled in response to the three guiding questions of the review. A structured examination of the usage scenarios, features, technical approaches, and data sources of available OSINT approaches, as well as their attention to ethical, legal, and social implications (ELSI), is crucial to derive design and research implications that extend beyond the individual case studied in depth in this paper. The subcategories were initially generated by screening review papers and chapters on OSINT (Pastor-Galindo et al. 2020; Simran et al. 2020; Tundis et al. 2020). They were then revised in light of a preliminary engagement with the selected publications before the final analysis was performed.

3.2 Conceptual and Empirical VSD Case Study

3.2.1 Conceptual Stakeholder Analysis

A conceptual investigation helped to identify the stakeholders directly and indirectly affected (Friedman et al. 2017). For this purpose, a structured workshop was conducted within the research project team, in which, building on potential use cases, it was asked which groups interact with or are affected by OSINT artefacts. The results are presented in Section 4.2. In a next step, the authors identified potential harms and benefits for stakeholders as well as potentially implicated values and established working definitions based on relevant literature (Friedman et al. 2013).

3.2.2 Data Collection: Focus Group and Semi-structured Interviews

In order to identify relevant values and value conflicts, we conducted a focus group within the team of developers and researchers and nine semi-structured interviews with key stakeholder groups. In designing the procedure for data collection, we adapted the approach of Mueller and Heger (2018). Table 2 summarises the interviews and the focus group conducted.

Table 2 Overview of the interviews and the focus group with the involved stakeholder groups and the respective types of organisations

The focus group (F1) involved seven participants from the fields of computer science, media and cognitive sciences, and software development, who were all part of the CYWARN research project, including one staff member of a German state-level CERT. The sample consisted of six male participants and one female participant. The design of the focus group followed the recommendations of Krueger and Casey (2015). The discussion was held digitally and was semi-structured by a moderation guideline. After an introductory input on VSD and a hypothetical usage scenario of the OSINT artefacts under development, which served as a stimulus to facilitate discussion, we asked the participants to brainstorm and write down ethical, legal, and social implications on a digital board. Afterwards, we went through the issues collected and asked the participants to discuss them with a focus on potentially implicated stakeholder values and value conflicts.

The semi-structured expert interviews (Gläser and Laudel, 2010; Kallio et al. 2016) were designed to gather empirical insights on the values important to key stakeholder groups. To collect the data, we followed a convenience sampling approach and sent interview requests to relevant organisations and individuals. When selecting the participants, we drew on the insights of the stakeholder analysis (see Subsection 4.2) and took care to involve stakeholders both directly and indirectly affected by technology development; however, since indirect effects may be experienced by a wide array of actors, we restricted the scope to stakeholders that might be most significantly affected (Friedman et al. 2017). Overall, we interviewed nine individuals from three stakeholder groups: (1) Five interviews were conducted with CERT employees, as they belong to the prospective user group of the developed artefacts. (2) Three interviews were conducted with further potential users, as it is intended to transfer the artefacts to other application domains as well (I6, I7, I8). Specifically, we interviewed information security officers of a state company (I6) and a humanitarian organisation active in disaster relief (I7), as well as the head of a virtual operations support team (VOST) (I8). (3) Finally, to consider the perspective of individuals potentially affected by OSINT gathering, we interviewed one individual who regularly disseminates cybersecurity information on social media and is active in cybersecurity-related civil society organisations (I9). After obtaining the interviewees’ informed consent, several blocks of questions were asked based on an interview guideline, which was slightly adapted to suit the particularities of the different stakeholder groups. The interview sessions were conducted online, recorded, and lasted 74 minutes on average.

3.2.3 Data Analysis: Qualitative Content Analysis

After the focus group and the interviews had been transcribed, a software-assisted, category-based structuring qualitative content analysis following Kuckartz (2016) was conducted. We worked with thematic categories that were developed deductively on the basis of existing literature on values, as well as inductively during the analysis of the empirical material. In this study, the main category Value with ten subcategories, as well as the main category Value Conflict, were used. The categories were defined in a codebook and supplemented with coding rules and examples. A shortened version of the codebook can be found in Appendix B. The transcripts were coded with the qualitative content analysis software MAXQDA. First, all the material was reviewed to select coding examples for each category. Then, the focus group and two interviews were coded to verify the intercoder agreement with MAXQDA. This resulted in a kappa coefficient after Brennan and Prediger (1981) of 0.69, which can be interpreted as a good result (Rädiker and Kuckartz, 2019). The codebook was later revised in order to further increase intercoder agreement. The text segments assigned to each category were then assembled and analysed together.
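The Brennan and Prediger (1981) coefficient relates the observed agreement to the chance agreement expected for k equally likely categories. The following minimal sketch (with hypothetical coder decisions; the reported value of 0.69 stems from the MAXQDA analysis, not from this example) illustrates the computation:

```python
def brennan_prediger_kappa(coder_a, coder_b, n_categories):
    """Kappa after Brennan and Prediger (1981): (p_o - 1/k) / (1 - 1/k)."""
    assert len(coder_a) == len(coder_b)
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)
    p_chance = 1 / n_categories
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codings of twelve text segments into eleven categories
# (ten value subcategories plus the value-conflict category).
coder_a = ["privacy", "accuracy", "trust", "efficiency", "privacy", "security",
           "conflict", "autonomy", "accuracy", "transparency", "trust", "bias"]
coder_b = ["privacy", "accuracy", "trust", "efficiency", "security", "security",
           "conflict", "autonomy", "accuracy", "transparency", "conflict", "bias"]
print(round(brennan_prediger_kappa(coder_a, coder_b, n_categories=11), 2))
```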

4 Results

In the following, the results of the literature study are presented in Section 4.1. Afterwards, Section 4.2 introduces the stakeholder groups identified and outlines the results of the content analysis of the empirical material.

4.1 OSINT-Technologies in the Domain of Cybersecurity

Of the 73 publications evaluated in the review, 10% named investigative purposes as the intended scenario of use for the systems. In 74% of the publications, systems were developed for primary use in the context of gathering CTI, in 12% for use in the area of risk assessment and mitigation, and in 4% for both investigative and CTI purposes. The temporal distribution of publications per year is shown in Fig. 2.

Fig. 2: Number of publications per year, differentiated by intended scenarios of use

The publications were also examined concerning respective features of the systems. In 44 publications, data gathering methods were either an integrated part of the artefacts, or new data sets were specifically created in the context of research. In 36 publications, approaches for the detection of cybersecurity events have been developed. This included models for the detection of emerging cybersecurity topics (Al-Ramahi et al. 2020; Dalton et al. 2017; Kawaguchi et al. 2017; Schäfer et al. 2019), the aggregation of individual pieces of information into security events (Alves et al. 2019; Alves et al. 2021; Azevedo et al. 2019; Vacas et al. 2018), the detection of distinct types of information (Behzadan et al. 2018; González-Granadillo et al. 2019; González-Granadillo et al. 2021; Liao et al. 2016; Syed, 2020), and the detection of threats related to specific infrastructures (Dionisio et al. 2019) or products (Kannavara et al. 2019; Neil et al. 2018; Nunes et al. 2018). Approaches to the classification or filtering of relevant information are presented in 26 publications, and 15 systems comprise data visualisation functions, including Social Network Analysis to explore relationships in hacker forums and marketplaces (Huang et al. 2019; Huang and Ban, 2019; Schäfer et al. 2019).

While twelve systems have the capacity to generate reports or structured pieces of information, e.g. Indicators of Compromise (IoCs), eleven systems aim to identify specific users or communities. This is related to the assessment of organisational attack surfaces or penetration testing (Chitkara et al. 2020; Edwards et al. 2017; Urban et al. 2020), the identification of individuals with insider threat potential (Kandias et al. 2013a; Kandias et al. 2013b; Kandias et al. 2013c; Kandias et al. 2017), and the investigation of hacker forums and marketplaces (Fallmann et al. 2010; Huang et al. 2019; Huang and Ban, 2019; Schäfer et al. 2019). Finally, five papers demonstrate techniques to analyse the quality or credibility of CTI (Ghazi et al. 2018; Gong et al. 2018; Jo et al. 2021; Khurana et al. 2019; Liu et al. 2017), and three propose methods to assess the quality or credibility of CTI sources (Gong et al. 2018; Liu et al. 2017; Tundis et al. 2020).

Additionally, the publications were analysed for the use of selected algorithmic approaches (see Fig. 3).

Fig. 3: Algorithmic approaches implemented in the artefacts developed

Most frequently, in 45 cases, algorithms for classification were implemented. Clustering, on the other hand, was only used ten times and regression only once. In addition, 13 papers used named-entity recognition, i.e. the classification of named entities in unstructured text into predefined categories for the purpose of information extraction, and seven papers used latent Dirichlet allocation for topic modelling, i.e. the discovery of previously undefined topics in a document corpus. Artificial neural networks were used in 27 systems. Concerning the use of ML, 46 systems used supervised ML, 28 unsupervised ML, one semi-supervised ML, and 19 none. In line with the features of the examined OSINT systems, the research focuses on ML algorithms that assist operators in managing the high volume, variety, and velocity of big data by using trained classifiers, self-learning neural networks, named-entity recognition, clustering, topic modelling, and regression to identify cybersecurity events, threats, and threat actors, as well as to assess the relevance, quality, or credibility of CTI and the respective sources.
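As a minimal sketch of the supervised classification pattern that dominates the reviewed artefacts (hypothetical training examples; scikit-learn assumed as the library), a TF-IDF representation is combined with a linear classifier to separate cybersecurity-relevant from irrelevant posts:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training posts labelled as relevant (1) or irrelevant (0) to CTI.
posts = [
    "New remote code execution vulnerability found in popular web server",
    "Ransomware campaign targets hospitals via phishing emails",
    "Patch released for critical authentication bypass in VPN appliance",
    "Our team had a great time at the company picnic today",
    "Check out these holiday photos from my trip",
    "The new coffee machine in the office is fantastic",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF features feed a logistic regression classifier, a common supervised setup.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(posts, labels)

print(model.predict(["Zero-day exploit for the mail server is being sold on a forum"]))
```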

A variety of information sources were used with the systems. Twitter was used 20 times, followed by cybersecurity blogs, forums, or websites, which were utilised eleven times. Information from hacker forums, as well as from CTI feeds and platforms, was accessed ten times each. Information from other social networks, e.g. Reddit, Facebook, and YouTube, was processed in nine instances, while seven systems made use of data gathered from dark web forums and marketplaces. Less common data sources can be found in Fig. 4.

Fig. 4: Publicly available information sources used for data gathering with the artefacts developed

Finally, it was examined whether ELSI of the respective systems were discussed. Of the 73 papers, only eleven considered such issues. While some authors argued that using only publicly available data circumvents ethical issues (Pournouri et al. 2019; Pournouri and Akhgar, 2015), Edwards et al. (2017) justified their decision not to list individuals in reports on organisations’ social engineering attack surface with the concern that this could cause disciplinary action. In addition, to increase algorithmic comprehensibility, they decided to use decision tree classifiers to identify employee profiles. In a similar study, Urban et al. (2020) emphasised strict compliance with data protection requirements and the avoidance of any legally or ethically questionable strategies for data acquisition. With regard to the investigation of dark web marketplaces, Lawrence et al. (2017) mitigated the risk of legal ramifications by restricting web scraping to cybercrime-related sections, textual data, and non-personal information. Ranade et al. (2018) motivated their development of a deep learning model for CTI translation partly by the premise that analysts are often not allowed to use third-party services due to privacy, security, and confidentiality policies. Beyond that, a trade-off between data protection and the demands of forensic investigators to have access to proactively collected data is discussed by Nisioti et al. (2021). The most extensive discussion of ELSI is found in the context of research on the identification of employees with insider threat potential. Negative effects on the personal and human rights of those affected, as well as dangers concerning algorithmic profiling, are discussed, and the recommendation that such screenings should be subject to strict preconditions is provided (Kandias et al. 2013a; Kandias et al. 2013b; Kandias et al. 2013c; Kandias et al. 2017).
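Edwards et al. (2017) favoured decision tree classifiers partly for their comprehensibility. The following sketch (entirely hypothetical features and labels, not the original study’s data) illustrates why such models are considered comprehensible: the learned rules can be exported as readable text and inspected by analysts.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical per-profile features: [public posts per week, mentions of employer, follower count]
features = [
    [25, 8, 1200],
    [2, 0, 150],
    [40, 15, 5000],
    [1, 1, 80],
    [30, 10, 900],
    [3, 0, 200],
]
# Hypothetical label: 1 = profile exposes organisational information, 0 = it does not.
labels = [1, 0, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(features, labels)

# The tree can be rendered as human-readable rules, which supports comprehensibility.
print(export_text(tree, feature_names=["posts_per_week", "employer_mentions", "followers"]))
```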

4.2 Stakeholder Values and Value Conflicts

During the preliminary conceptual investigation, six main stakeholder groups affected by the application and development of OSINT artefacts in the domain of cybersecurity incident response were identified. Figure 5 presents the stakeholder groups and their interaction with OSINT artefacts.

Fig. 5: Main stakeholder groups of the prospective OSINT framework and their interaction with OSINT artefacts

The first stakeholder group consists of the individuals that interact directly with OSINT systems. In the context of this case study, these are the employees of CERTs who are expected to use a demonstrator with OSINT components. The second stakeholder group comprises actors that are indirectly affected by the collection of publicly available data with OSINT systems. In this case study, these include, in particular, actors that disseminate information on threats on social media. While the third stakeholder group, the direct beneficiaries, is directly advised and supported by the direct users of OSINT artefacts - in the case of CERTs primarily public authorities, critical infrastructure operators, and enterprises - the fourth group, the indirect beneficiaries, only receives unidirectional communication about threats and best practices - in the case of CERTs, among others, citizens and other cybersecurity organisations. The fifth stakeholder group, the potential users, comprises actors that have an interest in using OSINT systems. In our case study, there may be both potential users in the field of cybersecurity and in other domains, e.g. law enforcement, civil protection, and emergency services. Finally, the developers and researchers concerned with OSINT artefacts comprise the sixth stakeholder group. In our case, this encompasses individuals from both academic research and private software engineering.

4.2.1 Stakeholder Values

During the content analysis, ten values were identified. Table 3 shows which values were discussed in the individual interviews and the focus group, and how often they were coded in total.

Table 3 Overview of the identified values and the number of coded sections

Accuracy can be defined as the correspondence or closeness of a statement or piece of information to the truth, the reality, or a differently defined standard (Hayes et al. 2020). Accuracy is particularly relevant in connection to ML algorithms and the quality of data. CERT staff, potential users, and developers emphasised the importance of the accuracy and quality of different types of data. The accuracy of data collected was considered very important (I1, I3, I4, I5, I7, I8, F1). Gathered information should not only be correct, but also structured consistently and have minimal redundancy to enable effective analysis (I1, I4). Since this requires repetitive and time-consuming activities, interviewees suggested drawing on the expertise of ML algorithms to harmonise information from heterogeneously structured texts and aggregate multiple pieces of information related to the same topic (I1, I4). Furthermore, the issue of disinformation was highlighted: “You also have to be cautious not to be fooled by people who try to make themselves important and publish something that is not true” (I1). The output data of algorithms needs to be relevant for OSINT analysis (I1, I4, I6, I8, F1), as well as for the information requirements of clients (I1, I2, I4, I5, I6, I7, I8, F1). For these reasons, specifically the accuracy of algorithmic decisions and the quality of training data for ML algorithms were highlighted (I5). Yet, it was argued that the application of ML should be limited to very specific tasks, as human expertise is crucial for creative or unstructured activities:

“ML-supported systems ... are built for pattern recognition and the patterns are trained. And you just have to get out of the pattern thinking, which is really thinking inside a box” (I5).

The interviewee potentially affected by data collection pleaded for a reduction of biases in algorithms (I9). This was also emphasised by the developers with a view on algorithms for prioritisation and credibility assessment of CTI (F1).
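Interviewees suggested drawing on ML to aggregate multiple pieces of information related to the same topic (I1, I4). A minimal sketch of such aggregation (hypothetical items; scikit-learn assumed) groups near-duplicate reports by the cosine similarity of their TF-IDF vectors:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical gathered items, two of which describe the same vulnerability.
items = [
    "Critical vulnerability CVE-2021-0001 allows remote code execution in FooServer",
    "FooServer affected by remote code execution flaw tracked as CVE-2021-0001",
    "Phishing campaign impersonates a parcel delivery service",
]

vectors = TfidfVectorizer().fit_transform(items)
similarity = cosine_similarity(vectors)

# Greedily merge items whose pairwise similarity exceeds a threshold.
THRESHOLD = 0.3
groups, assigned = [], set()
for i in range(len(items)):
    if i in assigned:
        continue
    group = [i] + [j for j in range(i + 1, len(items))
                   if j not in assigned and similarity[i, j] > THRESHOLD]
    assigned.update(group)
    groups.append(group)

print(groups)  # e.g. [[0, 1], [2]] -> items 0 and 1 are aggregated
```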

At a high level of abstraction, security can be conceptualised as “the state of being free from danger or threat” (van de Poel 2020, p. 50). CERT employees and potential users highlighted the importance of security in relation to the IT infrastructure of organisations in their area of responsibility and the data processed by clients (I1, I2, I3, I4, I5, I6, I7). To ensure security, OSINT is used to leverage the expertise of numerous cybersecurity experts (I1, I3, I5, I6, I7, I9). Their expertise lies in detailed and up-to-date knowledge of specific cyber threats (I1, I9), threat actors and their strategies (I1, I9), vulnerabilities (I1, I6, I7, I9), and protection and mitigation measures (I6, I7). The civil society representative called for OSINT tools to be operated in a secure environment (I9). Finally, the developers also addressed the security of the ML algorithms against poisoning attacks, especially if information about training data and algorithmic models used is publicly accessible:

“If a hacker notices something like this, that in some form [data] is merged and recommendations are derived from it, ... he can carry out a targeted attack based on it” (F1).

Efficiency describes the ability to accomplish specific tasks or outputs with minimal expenditure of resources (Cousins et al. 2019). In the interviews with CERT staff and potential users, efficiency considerations were cited as a key rationale for the intention to use OSINT tools (I1, I2, I3, I4, I5, I6, I7, I8). Furthermore, the efficiency gain may also improve the quality of certain services:

“If the data collection process is simplified, then it will be intensified on the other side. Because if I am relieved of the data collection, then the evaluation will probably be more intensive. Then I might take a much closer look at the reports, which I might have published before with the watering can principle” (I6).

Specifically, possible efficiency gains were identified through technical support in the acquisition and evaluation of security advisories (I6, I7, I8), the evaluation of cybersecurity websites and blogs (I1, I3, I6, I7), the search of Twitter and other social networks for cybersecurity-relevant information (I6, I7), and support of communication by providing target-specific cybersecurity reports or alerts (I1, I2, I3, I4, I6). Particularly for the extraction of information from unstructured texts, the use of ML algorithms was suggested (I1, I8, I9). Here, the expertise of ML-based information extraction techniques is to discover specific pieces of information in unstructured texts or to create summaries (I1, I9). The developers saw an interest in efficiency gains through OSINT tools also among the direct beneficiaries of CERT activities, who could receive faster support in case of incidents (F1). Finally, with a view to development, it was also suggested to keep in mind that it should be as easy as possible to adapt the artefacts to changing legal requirements (F1).
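One expectation voiced here is that extraction techniques discover specific pieces of information in unstructured texts (I1, I9). A purely illustrative sketch of this idea uses regular expressions to pull CVE identifiers and IP addresses out of a free-text advisory; productive systems would combine such rules with trained NLP models:

```python
import re

ADVISORY = """A new exploit targeting CVE-2021-34527 has been observed in the wild.
Affected hosts communicated with the command-and-control server at 203.0.113.42.
Administrators should apply the vendor patch immediately."""

# Simple patterns for two common indicator types; real systems combine such rules
# with trained named-entity recognition models.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,7}")
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

indicators = {
    "cves": CVE_PATTERN.findall(ADVISORY),
    "ips": IPV4_PATTERN.findall(ADVISORY),
}
print(indicators)  # {'cves': ['CVE-2021-34527'], 'ips': ['203.0.113.42']}
```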

Accountability can be seen as “the (moral) obligation to account for what you did or what happened (and your role in it happening)” (van de Poel 2011, p. 39). In contrast, responsibility is directed towards current actions and their prospective consequences, as it refers to the obligation to evaluate one’s own role and duties in relation to a situation or a context of action (van de Poel and Royakkers, 2011). CERT staff members pointed out that alerts and reports must be approved by superiors for reasons of political accountability (I2, I3, I4). In particular, a fixed approval process for alerts hinders automation: “There are too many sensitivities or responsibilities involved to automate something like this” (I2). With regard to disaster management, the importance of documenting verification steps and the analysts involved was also pointed out, in order to render the evaluation of information comprehensible for decision-makers (I8). Referring to CERTs’ use of OSINT tools, the interviewee from civil society pleaded for a responsible protection of the data infrastructures used (I9). It was also pointed out that when processing certain data, the design of OSINT tools should consider the obligations of CERTs to comply with reporting chains and guidelines (F1). In this context, the question was raised to what extent clear responsibilities for the consequences of incorrect predictions of ML algorithms can be ensured:

“So if security vulnerabilities are perhaps not taken seriously, even though they are announced on social media, because this relevance algorithm has perhaps decided that it is irrelevant for various reasons, there would also be the question of whether CERTs would perhaps even be legally liable in some way, because they should actually have acted” (F1).

With regard to ICT, autonomy can be understood as users’ ability to control technical systems in a context-appropriate manner, enabling decisions they deem suitable for achieving their objectives (Friedman and Kahn, 2002). The consideration of stakeholder autonomy was brought forward by direct users, potential users, and developers. One interviewee in particular placed the value at the centre of human-computer interaction:

“So really the point is that you don’t have to replace anyone in that sense, but you can support everyone. So I see the point with all technology that it should still be supportive, it should be a tool for people. But it should not determine people” (I5).

The complete automation of analytical OSINT processes with the help of ML is seen as particularly critical, as “artificial intelligence logic always trims someone down to blinkered thinking and an increasingly narrow focus” (I5), thus restricting the analysts’ evaluative capabilities. Furthermore, ensuring the autonomy of users was also discussed in the context of the adaptability of the selection of sources (I1) and the relevance assessment of information (I4, I5). For the latter, an evaluation by experienced analysts was considered crucial (I4, I5). Potential users also advocated for a prioritisation of information that could be individually adapted to the respective infrastructure (I7, I8).

Transparency can be best understood in relation to a situation in which it is beneficial for actors to make knowledge and information about a certain topic extensively available, accessible and comprehensible, without obscuring any information (Turilli and Floridi, 2009). A CERT employee advocated for the disclosure of contextual information on algorithmic decisions of OSINT artefacts to analysts (I5). Similarly, a potential user reported that the degree of transparency of algorithmic decisions should always depend on the expertise and task of the respective user group, as too much information can also be counterproductive, especially in time-sensitive situations (I8). The developers discussed the promises and pitfalls of open sourcing the code of the OSINT artefacts to be developed (F1), while our interviewee from civil society requested transparency on the part of the developers and, ideally, an involvement of the cybersecurity community in the development of OSINT artefacts:

“So of course I would be happy if the whole system is open source as far as possible, subject to this evaluation and the risks, and is also open development. So it’s not just open source, here’s the software. But open development” (I9).
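With regard to the disclosure of contextual information on algorithmic decisions to analysts (I5), the following minimal sketch (hypothetical model and data; scikit-learn assumed) indicates what such a disclosure could look like for a linear relevance classifier, reporting the terms that contributed most to a single decision alongside the prediction:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training data for a relevance classifier.
posts = [
    "critical vulnerability enables remote code execution",
    "ransomware encrypts hospital records after phishing attack",
    "team lunch scheduled for friday afternoon",
    "new office plants arrived this morning",
]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
clf = LogisticRegression().fit(vectorizer.fit_transform(posts), labels)

def explain(text, top_n=3):
    """Return the prediction plus the terms contributing most to it."""
    vector = vectorizer.transform([text])
    contributions = vector.toarray()[0] * clf.coef_[0]
    terms = np.array(vectorizer.get_feature_names_out())
    top = terms[np.argsort(contributions)[::-1][:top_n]]
    return clf.predict(vector)[0], list(top)

print(explain("phishing attack exploits critical vulnerability"))
```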

For this work, privacy can be defined as “the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others” (Westin 1967, p. 7). The importance of privacy was raised by CERT employees in conjunction with compliance with the legal requirements of data protection legislation (I1, I4, I5). In particular, the automated analysis of personal data is legally problematic and sometimes only granted with special permission (I1). Thus, “in the ideal case, the data ... is completely without personal reference” (I1). In the interviews with potential users it became clear that organisations are subject to very different regulations regarding privacy and data protection (I7, I8). The respondent from the group of those potentially affected by data collection considered the protection of private data a central principle: “Well, I would generally have a stomach-ache with it, if it was private data. So not publicly available data” (I9). The developers discussed privacy aspects of the development of the OSINT artefacts with a focus on the principles of data minimisation, the necessity of a justification for storing data, requirements on data deletion and anonymisation, as well as the adaptability of artefacts to changing legal requirements (F1).

According to Friedman and Kahn (2002), the value ownership and property is related to the rights of individuals or groups to possess, use, manage, derive profit from, or bequeath objects or pieces of information. For CERT employees and the developers, questions of ownership and property are important when it comes to legal requirements regarding the extent of data collection and the type of data to be collected (I1, I4). One CERT employee describes that the e-government law of the respective state strongly affects the processing of personal data, which should also be taken into account in the design of OSINT artefacts (I1). One potential user expressed the view that organisational policies on data processing may need to be changed before OSINT tools can be applied (I6). In addition, a part of the focus group discussion focused on the question of who should have the right to use the artefacts:

“Perhaps it would be conceivable for a government to somehow offer the tool ... to make it available as open source and that even the public can somehow co-develop it or use it” (F1).

The value freedom from bias is associated with the absence of systematic unfairness against individuals or groups (Friedman and Kahn, 2002). Both the CERT staff and other organisations’ employees stressed the importance of addressee-oriented communication that is free from any systematic bias (I3, I4, I7). Pre-formulated templates for alerts were mentioned as a possible solution to this issue, because “if you have different stakeholders with technical skill levels, you can relatively easily find the right tone” (I4). Furthermore, when distributing warnings for a broad target group, appropriate communication channels should be chosen (I7). Specifically with regard to the use of ML in OSINT systems, our interview partner from civil society warned against the tendency to systematically replicate a pre-existing bias in training data:

“The problem such systems always have is that, whatever framing or bias exists in the data and structures, machine learning ... will simply consider it as a relevant parameter” (I9).

During the focus group, it was raised that the algorithmic credibility assessment of information sources may have detrimental consequences if the labelling of an actor as an untrustworthy source became public or led to permanent non-inclusion in future analyses (F1).

For the purpose of this paper, trust may be understood as “expectations, assumptions or beliefs about the likelihood that another’s future actions will be beneficial, favorable or at least not detrimental to ones’ interests” (Robinson 1996, p. 576). For direct stakeholders, trust in the respective providers of information plays a major role in the verification of information from public sources (I1, I5). The developers, however, discussed trust in the context of the societal acceptance of the use of OSINT technologies (F1). The trust of citizens in those using such systems may be influenced by the transparency towards the public:

“But perhaps trust in general also depends very much on who operates the tool in the end, whether the whole thing is transparent, i.e. how much is communicated about the artificial intelligence to the outside world, what data is collected” (F1).

4.2.2 Value Conflicts

While engaging with the stakeholders, eight value conflicts were identified. These are illustrated in Fig. 6 together with the associated design issues.

Fig. 6: Value conflicts and associated design issues identified in the empirical material

Privacy conflicts first emerge between the privacy of actors affected by data collection and the value of ownership and property in terms of the requirements for CERT staff to be allowed to use non-anonymised data with reference to individuals (F1, I1, I4). While respect for the privacy of data subjects requires refraining from collecting personal data, it may be of interest for CERTs to collect such information. “So we’re pretty restricted there, and I think if you develop us a tool that we use in the CERT, it’s subject to those same regulations” (I1), stated a CERT employee. Thus, besides the ethical weighing of both values, the consideration of privacy and data protection requirements is central, e.g. when determining what data is collected or whether personal data is minimised, anonymised, or deleted (F1, I1, I4). Demands of safeguarding privacy and compliance with data protection regulations also partially conflict with the value of efficiency on the part of the CERT staff (I1). Semi-automated aggregation and analysis of public information is a key requirement of CERTs that would come with time savings, yet it was pointed out that data protection requirements might prohibit such functionalities: “This automated evaluation of public sources is not permitted to all CERTs, some of them are not allowed to do this for legal reasons” (I1).

Transparency conflicts arise, as the interviewee potentially affected by data collection, in particular, demands transparency about the specifications of the technical artefacts, the training data, and the ML algorithms used in OSINT tools, as well as about the scope of data collected with them (I9). Developers suggested that such transparency-motivated decisions could be counterproductive to the value of security, in terms of ensuring the reliable functioning of the ML models:

“If it is known from which sources learning has taken place, one has of course again... you obviously provide an attacker the opportunity to poison the models. To do this model poisoning” (F1).

In connection to a prospective open-source implementation, a possible conflict with the value of ownership and property on the part of the developers of OSINT tools was brought forward: “You might not want to disclose the training data or explain the algorithms in detail so that you can still earn money commercially with it” (F1).

The interviews and the discussion revealed three different efficiency conflicts. First, due to the use of ML to accelerate OSINT processes, a conflict with the values of accountability and responsibility might arise. Considering the stakeholders whose data is processed, and the actors who receive information from CERTs, it is imperative for information to be correct, guidelines to be adhered to during processing, and misconduct to be clearly attributed to responsible actors (I2, I3, I4). Decision-making based on ML algorithms could undermine accountability; conversely, the integration of manual control steps could imply higher resource consumption (I2, I3, I4, F1). Moreover, the question of to what extent liability for algorithmic errors may be allocated to CERT personnel remains unresolved:

“If vulnerabilities are not taken seriously, despite being announced on social media, because this relevance algorithm has perhaps decided that it is irrelevant, there is the question whether CERTs might somehow be liable” (F1).

Second, due to the utilisation of ML algorithms, a conflict could arise between efficiency and freedom from bias. This especially applies to the direct and indirect beneficiaries of generated alerts. Warning messages generated by algorithms should be adapted to the target group to avoid systematic discrimination (I4). This, however, “means that it takes a lot of effort to reach the right level of communication” (I4), thus coinciding with a higher consumption of resources during development and application. Since CERT members expressed concerns that the present state-of-the-art is not sufficient to automatically generate target group-specific alerts (I2, I4), it seems appropriate to split the communication step into two individual tasks, thereby leveraging the expertise of both ML-based natural language processing (NLP) techniques as well as CERT analysts. In a first step, efficiency in communication could be enhanced by using NLP models to generate text segments based on a set of threat scenario- and target group-related parameters (I2, I4). In a second step, the expertise of analysts is employed to adapt the text to ensure that it actually reflects the status, requirements, and expertise of the target audience (I2, I4, I5, I9). Thereby, it is ensured that bias in communication is limited.
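The two-step communication procedure could, for instance, start from parameterised draft generation, with analysts adapting the text in the second step. The following minimal illustration (hypothetical parameters and phrasing, not the CYWARN implementation) fills a template from threat-scenario and target-group parameters:

```python
# Hypothetical wording variants per target group; an NLP model could produce
# richer drafts, which analysts then adapt in the second step.
GROUP_STYLE = {
    "administrators": "Apply the vendor patch and review firewall rules for affected hosts.",
    "citizens": "Install the latest software updates on your devices as soon as possible.",
}

def draft_alert(threat: str, product: str, severity: str, target_group: str) -> str:
    action = GROUP_STYLE[target_group]
    return (f"[{severity.upper()}] A {threat} affecting {product} has been reported. "
            f"{action} This draft requires review by an analyst before publication.")

print(draft_alert("remote code execution vulnerability", "FooServer 2.3",
                  "high", "administrators"))
print(draft_alert("remote code execution vulnerability", "FooServer 2.3",
                  "high", "citizens"))
```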

Third, a conflict between efficiency and users’ autonomy emerges. It is particularly important for users of OSINT artefacts to remain in control over technical processes (I1, F1). However, it was highlighted that “many of the points that are aimed at, for example, additional manual control would significantly increase the time it takes for decisions to be made and solutions to be developed” (F1), thus resulting in a lower efficiency. Conversely, an exclusive focus on resource-saving optimisation may diminish operators’ autonomy. Trade-offs arise especially at the stages of the design process when it is determined which decisions should be handed over to ML algorithms and to what extent users should be able to supervise these decisions. An adequate balance between both values is particularly important for OSINT tasks, where the expertise of ML algorithms and CERT analysts complement each other and can thus yield advantages over exclusively manual or automated solutions. In our context, this is especially the case with the relevance and credibility assessment of CTI. While the strength of ML in relevance assessment lies in a rapid evaluation of large amounts of information using predefined relevance criteria (I1, I6, I7), analysts can draw on this to select actually relevant information using their contextual knowledge about serviced infrastructures, e.g. deployed software (I1, I4, I5, I6, I7). During credibility assessment, three types of expertise may interact. While the expertise of ML algorithms is to compute a credibility rating using features of information previously evaluated as credible or non-credible (I5, I8), analysts, taking into account the rating and underlying contextual information, supplementary research and personal experience, as well as, if necessary, the opinions of external experts, can ultimately verify a piece of information (I1, I3, I5, I7, I8, I9). Whereas for these two tasks the trade-off between autonomy and efficiency can be mitigated by a two-step procedure, interviewees advocated for a non-automated criticality evaluation of vulnerabilities, hence prioritising autonomy (I1, I4, I6). Here, analysts resort to the expertise of external experts, which lies in their ability to determine the general criticality on basis of detailed knowledge about affected hardware, software, or corresponding exploits (I1, I4, I5). This evaluation, which can be reflected in a rating according to the Common Vulnerability Scoring System, enables analysts to decide, on basis of knowledge of the serviced infrastructure, whether there is a necessity to prioritise the vulnerability (I1, I4, I5, I6).
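For the relevance assessment described above, the interplay of ML pre-filtering and analysts’ contextual knowledge could resemble the following minimal sketch (hypothetical scores, CVSS ratings, and asset inventory): the ML component pre-ranks incoming items, while knowledge of the serviced infrastructure determines what is ultimately prioritised.

```python
# Hypothetical ML relevance scores for incoming CTI items (0..1).
items = [
    {"title": "RCE in FooServer 2.x", "product": "FooServer", "ml_relevance": 0.91, "cvss": 9.8},
    {"title": "XSS in BarCMS plugin", "product": "BarCMS", "ml_relevance": 0.74, "cvss": 6.1},
    {"title": "DoS in BazRouter firmware", "product": "BazRouter", "ml_relevance": 0.65, "cvss": 7.5},
]

# Contextual knowledge of the analyst: products deployed in the serviced infrastructure.
SERVICED_PRODUCTS = {"FooServer", "BazRouter"}

# Step 1: ML pre-filter by relevance score. Step 2: analyst-style prioritisation
# restricted to deployed products, ordered by CVSS rating.
prefiltered = [i for i in items if i["ml_relevance"] >= 0.6]
prioritised = sorted((i for i in prefiltered if i["product"] in SERVICED_PRODUCTS),
                     key=lambda i: i["cvss"], reverse=True)

for item in prioritised:
    print(f'{item["title"]} (CVSS {item["cvss"]})')
```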

A conflict between accuracy and the value of trust became apparent in the context of the credibility assessment of CTI. Both the interviewed CERT staff and potential users emphasised the importance of trust in the respective information providers for the selection and verification of information (I1, I5, I7, I8). In this context, a source’s trustworthiness is determined based on its past reliability (I1, I5). However, it was pointed out “that only the trustworthy position of a communication partner does not of course ensure that he does not publish nonsense anyway” (I1). Thus, in the development of ML algorithms for credibility assessment, an exclusive consideration of characteristics of trustworthy sources could compromise the accuracy of the output.
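The design consequence of this observation can be illustrated with a small, purely hypothetical sketch: a credibility model whose feature vector combines source reputation with content-level evidence, so that a trusted source publishing an unsupported claim is not automatically rated as credible. The features, toy data, and the choice of a logistic regression classifier are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [source_past_reliability, has_verifiable_references, corroborated_by_other_sources]
# Labels: 1 = credible, 0 = not credible (toy data for illustration only).
X = np.array([
    [0.9, 1, 1],
    [0.9, 0, 0],  # trusted source, but unsupported claim
    [0.3, 1, 1],
    [0.2, 0, 0],
])
y = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X, y)

# Inspecting the coefficients shows whether the model leans solely on source reputation
# or also weighs content-level evidence, which is what the accuracy concern calls for.
print(dict(zip(["source_reliability", "references", "corroboration"], model.coef_[0].round(2))))
```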

5 Discussion and Implications

To answer the research question of which values and value conflicts emerge due to the application and development of ML-based open-source intelligence technologies in the context of cybersecurity incident response, this paper has investigated the state of technical research on OSINT technologies for cybersecurity, as well as the stakeholders, values, and value conflicts relevant to their application in the field of cybersecurity incident response. In this section, implications for the design of OSINT systems in this domain and for research are elaborated (Section 5.1). This is followed by a discussion of how sensitivity to the uncovered values and value conflicts can facilitate collaboration (Section 5.2), as well as an outline of the study’s limitations and opportunities for future work (Section 5.3).

5.1 Research and Design Implications

The use of OSINT is increasing in many domains (Pastor-Galindo et al. 2020). In the area of emergency management, OSINT is used for crisis response, shared situational awareness and collaboration (Akhgar et al. 2013; Backfried et al. 2012; Bernard et al. 2018), data sharing (Skopik et al. 2016; Mtsweni et al. 2016), and collective sense-making (Büscher et al. 2018). This has led to an increased discussion of participatory design and technology assessment methods that account for the specific organisational and legal characteristics and technology use of emergency management organisations (Büscher et al. 2018; Liegl et al. 2016). OSINT is not a single technology, but a framework in which individual steps can be performed with various technical approaches. ML algorithms can be used in all three steps envisioned in Fig. 7. While they can support the extraction, deduplication, and harmonisation of cybersecurity information during data gathering and pre-processing, they can also contribute to the relevance and credibility assessment of CTI in the subsequent analysis phase. Finally, in terms of communication, they can be used to pre-formulate warning messages as a foundation for their customisation by CERT staff to fit the respective target groups. In the described steps, human and ML expertise can be complementary and, in interaction, increase the effectiveness of CERTs. However, this study also identified values and value conflicts that need to be considered when designing OSINT technologies for cybersecurity incident response. In the following, implications for the individual OSINT steps are discussed, taking the findings of the literature survey and the identified value conflicts into account.

Fig. 7 Human and ML expertise in the cybersecurity OSINT process
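A compact, runnable sketch of this division of labour is shown below; all functions are stand-ins for trained models and interactive analyst decisions, and the keyword-based relevance heuristic merely marks where an ML component would sit in a real system.

```python
from dataclasses import dataclass

@dataclass
class Item:
    text: str
    relevance_hint: float = 0.0
    relevant: bool = False

# --- Stubbed ML components (placeholders for trained models) -----------------
def ml_extract(raw_sources):                 # (1)/(2) extraction and harmonisation
    return [Item(text=t.strip()) for t in raw_sources]

def ml_relevance_score(item):                # (3) algorithmic pre-assessment
    return 1.0 if "vulnerability" in item.text.lower() else 0.0

def ml_draft_warning(item):                  # (4) pre-formulated warning segment
    return f"[DRAFT] Advisory: {item.text}"

# --- Human checkpoints (in a real system: interactive analyst decisions) -----
def analyst_confirms(item):
    return item.relevance_hint > 0.5

def analyst_adapts(draft):
    return draft.replace("[DRAFT] ", "")

def osint_pipeline(raw_sources):
    items = ml_extract(raw_sources)                              # ML expertise
    for item in items:
        item.relevance_hint = ml_relevance_score(item)           # ML expertise
        item.relevant = analyst_confirms(item)                   # human expertise
    drafts = [ml_draft_warning(i) for i in items if i.relevant]  # ML expertise
    return [analyst_adapts(d) for d in drafts]                   # human expertise

print(osint_pipeline(["Critical vulnerability in ExampleSoft 2.1", "Vendor press release"]))
```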

The systematic literature review revealed that information gathering (1) is mostly conducted using publicly available data from social media platforms. As personal information is shared on such platforms, the surveyed stakeholders indicated that challenges arise in connection with privacy protection and compliance with data protection regulations. Therefore, Privacy Impact Assessments specifically for OSINT in the context of cybersecurity are needed (Liegl et al. 2016; Wright and Friedewald, 2013). The extent to which privacy infringements can be prevented by exclusively using sources specialised in the distribution of cybersecurity-related information should be analysed further (Riebe et al. 2021b). In an evaluation of the Cyber Threat Observatory dashboard with CERT employees, Kaufhold et al. (2022) identified the modular and customisable integration of different data sources and feeds as a crucial feature. With regard to information extraction from long unstructured texts, ML approaches offer clear advantages over the performance of human analysts; specifically, their expertise lies in topic discovery and information summarisation. Since interviewees emphasised that the use of such ML techniques would increase efficiency and consequently enable the gathering of a larger amount of data for subsequent steps, their use can be recommended. For a summary of the observations and the derived implications, see Table 4.

Table 4 Observations and design implications for information gathering
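One conceivable way to realise the modular and customisable integration of sources highlighted above is a small connector interface into which RSS feeds, vendor advisories, or social media APIs can be plugged per CERT configuration. The sketch below is an assumption about how such an interface could look; it does not reflect the actual architecture of the Cyber Threat Observatory.

```python
from abc import ABC, abstractmethod

class SourceConnector(ABC):
    """Common interface so that sources can be added or removed per CERT configuration."""

    @abstractmethod
    def fetch(self) -> list[str]:
        """Return raw text items from the source."""

class RssConnector(SourceConnector):
    def __init__(self, url: str):
        self.url = url

    def fetch(self) -> list[str]:
        # A real implementation would parse the feed, e.g. with the 'feedparser' library.
        return [f"(stub) latest entries from {self.url}"]

class VendorAdvisoryConnector(SourceConnector):
    def __init__(self, vendor: str):
        self.vendor = vendor

    def fetch(self) -> list[str]:
        return [f"(stub) advisories published by {self.vendor}"]

def gather(connectors: list[SourceConnector]) -> list[str]:
    """Information gathering step: aggregate items from all configured sources."""
    items: list[str] = []
    for connector in connectors:
        items.extend(connector.fetch())
    return items

print(gather([RssConnector("https://example.org/security.rss"), VendorAdvisoryConnector("ExampleSoft")]))
```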

The preprocessing (2) of gathered information is a sensitive part of the system, as biases in the data used to train algorithms might be detected at this stage (see Table 5). Serious consequences may occur if an artefact’s objective is to infer human characteristics and relationships or to profile individuals, as was the case with some of the artefacts described in the publications of the literature review. However, none of these publications discussed issues of bias and potential countermeasures. Stakeholders’ demands to minimise bias in training datasets for ML algorithms as part of OSINT systems should therefore be addressed in research through studies on the creation and evaluation of appropriate datasets, the development of guidelines for the inclusive annotation of training data, and the establishment of guidelines for the evaluation and documentation of training datasets (Friedman and Hendry, 2019). With regard to cyber threat data, such guidelines would need to reflect the cybersecurity context, the respective data source, and potential bias related to human as well as other characteristics. Another challenge, according to our interviewees, is to structure collected data in a consistent way and to reduce redundancies prior to further analysis. Since the associated tasks are repetitive and time-consuming, they suggested drawing on the expertise of ML algorithms, which lies in harmonising information from heterogeneously structured texts (e.g. named entity recognition) and in grouping multiple pieces of information on the same topic together (e.g. clustering), thus reducing the amount of redundant information.

Table 5 Observations and design implications for preprocessing
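As a minimal sketch of these two pre-processing tasks, the snippet below harmonises heterogeneous reports by extracting CVE identifiers with a simple pattern (a stand-in for full named entity recognition) and groups near-duplicates via TF-IDF cosine similarity. The similarity threshold and the use of scikit-learn are illustrative assumptions.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reports = [
    "CVE-2024-12345: remote code execution in ExampleSoft 2.1",
    "ExampleSoft 2.1 affected by remote code execution (CVE-2024-12345)",
    "Phishing campaign targeting municipal administrations observed",
]

# Harmonisation: extract structured identifiers from heterogeneously structured texts
# (a regular expression stands in for full named entity recognition here).
cve_pattern = re.compile(r"CVE-\d{4}-\d{4,}")
extracted = [{"text": r, "cves": cve_pattern.findall(r)} for r in reports]

# Redundancy reduction: group near-duplicates by TF-IDF cosine similarity.
tfidf = TfidfVectorizer().fit_transform(reports)
similarity = cosine_similarity(tfidf)

THRESHOLD = 0.5  # illustrative; would need tuning on real CTI data
groups, assigned = [], set()
for i in range(len(reports)):
    if i in assigned:
        continue
    group = [j for j in range(len(reports)) if similarity[i, j] >= THRESHOLD]
    assigned.update(group)
    groups.append(group)

print(extracted)
print(groups)  # e.g. [[0, 1], [2]]: the two CVE reports are grouped together
```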

Implications for the development of ML algorithms arise in connection with the analysis (3) of OSINT information (see Table 6). While the literature review showed that algorithms are used for a variety of tasks, algorithm selection was rarely reflected from an ethical or social point of view, with the exception of one publication that justifies the selection of a decision tree classifier with the improved comprehensibility of algorithmic decisions (Edwards et al. 2017). The empirical investigation showed that value conflicts can occur when algorithm selection disregards operators’ needs regarding the comprehensibility, traceability, and influenceability of algorithmic decisions. With respect to the selection and development of algorithms that meet end-users’ requirements, there is a need for further research on the applicability of XAI and white-boxing approaches for OSINT and on the evaluation of different algorithmic solutions with end-users, e.g. by considering the recommendations for XAI by Wang et al. (2019), which include support for reasoning and hypothesis generation as well as access to source and situational data. During the interviews, it became apparent that ML can support analysts primarily in relevance and credibility assessment. As shown by Zhang et al. (2022), ML algorithms with complementary expertise are most useful to human operators. However, since in ML-assisted relevance and credibility assessment human and algorithmic expertise overlap on a specific task, it is particularly important to ensure that human and algorithmic steps are clearly delineated by design so that the advantages and limitations of both can surface. This can be achieved by implementing a two-step procedure in which the analyst always makes the final decision on the basis of an algorithmic pre-assessment, with the relevant decision parameters disclosed. In addition, as indicated by research on human-AI interaction experiments, understanding the parameters of algorithmic decisions will be crucial to establish system operators’ trust (Feng and Boyd-Graber, 2019; Schaffer et al. 2019).

Table 6 Observations and design implications for analysis
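In line with the comprehensibility argument of Edwards et al. (2017) and the two-step procedure described above, the following sketch shows an interpretable pre-assessment whose decision rules can be disclosed to the analyst alongside the prediction; the features, labels, and depth limit are toy assumptions for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy feature matrix: [mentions_serviced_software, mentions_exploit, source_is_security_feed]
X = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
    [0, 1, 0],
]
y = [1, 1, 0, 0]  # 1 = relevant pre-assessment, 0 = not relevant (illustrative labels)

feature_names = ["mentions_serviced_software", "mentions_exploit", "source_is_security_feed"]
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Step 1: algorithmic pre-assessment with disclosed decision parameters.
print(export_text(clf, feature_names=feature_names))

# Step 2: the analyst sees the rules and the prediction and makes the final call.
print("pre-assessment:", clf.predict([[1, 0, 0]])[0])
```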

In the literature review, we found that NLP techniques are used in many systems, but with regard to the generation of alerts and text (4), this is limited to the creation of pre-structured texts such as IoCs (see Table 7). It seems worth investigating whether NLP approaches can also be used for the generation of target group-specific alerts and notifications. Advances in fundamental NLP research, especially in conjunction with the development of large pre-trained language models, might be leveraged for the development and training of models for these specific cases. However, the use of such models must be seen in light of the tension between the values of efficiency and freedom from bias. In order to streamline communication while ensuring that warnings and notifications do not disadvantage relevant target groups, it is advisable to implement a two-step process. In the first step, large pre-trained language models can swiftly generate text segments based on a few parameters. In the second step, analysts can draw on their knowledge and experience of the needs and proficiency of target groups to adapt the texts accordingly. This mitigates the tension and leverages the complementary expertise of NLP models and CERT analysts, potentially increasing confidence in the system (Zhang et al. 2022).

Table 7 Observations and design implications for communication
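A sketch of the first, model-supported step is given below, assuming a locally hosted pre-trained language model accessed through the Hugging Face transformers text-generation pipeline; the model name, prompt structure, and generation parameters are placeholders, and any generated draft would always be passed to an analyst for target group-specific adaptation in the second step.

```python
from transformers import pipeline

# Placeholder model; a CERT would use a vetted, locally hosted model instead.
generator = pipeline("text-generation", model="distilgpt2")

def draft_warning(vulnerability: str, affected_products: list[str], target_group: str) -> str:
    """Step 1: generate a draft from threat scenario and target group parameters."""
    prompt = (
        f"Write a short security advisory for {target_group}. "
        f"Vulnerability: {vulnerability}. "
        f"Affected products: {', '.join(affected_products)}. Advisory:"
    )
    result = generator(prompt, max_new_tokens=80, num_return_sequences=1)
    return result[0]["generated_text"]

draft = draft_warning(
    "remote code execution in ExampleSoft 2.1",
    ["ExampleSoft 2.1"],
    "municipal administrations",
)
# Step 2 (not shown): a CERT analyst revises the draft before any release.
print(draft)
```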

With regard to the implementation of OSINT systems into the context of cybersecurity incident response (5) (see Table 8), some of the reviewed studies raised questions of accountability and responsibility in connection with the consequences of processing illegal material (Lawrence et al. 2017) or compliance with organisational secrecy and security regulations (Ranade et al. 2018). However, the challenge that state actors are often subject to enhanced requirements in terms of safeguarding accountability and compliance with different standards and responsibilities, which was also emphasised by the consulted stakeholders, remained unaddressed. Since considering such requirements results in higher resource consumption and may prevent the utilisation of particular ML algorithms, a trade-off with the value of efficiency occurs. Thus, the challenge lies in developing OSINT systems that support the documentation of operators’ decisions without disproportionately impairing efficiency and usability. It is advisable to conduct case studies on the specific requirements of the respective governmental user groups with regard to ensuring accountability, clear responsibilities, and reporting chains, and, based on this, to derive concrete guidelines for the legitimate application of OSINT systems as well as requirements for their design. Finally, in the empirical investigation, stakeholders voiced a demand for transparency regarding the training data used for ML algorithms and the OSINT system specifications, which, in turn, may increase opportunities for model poisoning and thus conflict with safeguarding the security of ML models. While initial studies have proposed solutions to mitigate this threat (Khurana et al. 2019; Longo et al. 2020), there is a need for continued research on the magnitude of the problem and on technical countermeasures. With regard to the reconciliation of transparency and security, the involvement of stakeholders in a scrutiny committee that reviews algorithm design could be a reasonable solution.

Table 8 Observations and design implications for the implementation of the OSINT system into the CERT context
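To illustrate how operators’ decisions could be documented without adding noticeable manual effort, the sketch below appends an audit record as a side effect of the normal decision workflow; the record fields, the JSON Lines format, and the idea of logging the version of the algorithmic pre-assessment are assumptions rather than requirements expressed by the consulted stakeholders.

```python
import json
import getpass
from datetime import datetime, timezone
from pathlib import Path

AUDIT_LOG = Path("osint_audit_log.jsonl")

def record_decision(item_id: str, decision: str, rationale: str, model_version: str = "") -> None:
    """Append an audit record; called automatically whenever an analyst confirms or rejects an item."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operator": getpass.getuser(),
        "item_id": item_id,
        "decision": decision,           # e.g. "relevant", "not_relevant", "released"
        "rationale": rationale,         # free-text justification for reporting chains
        "model_version": model_version, # which algorithmic pre-assessment was shown
    }
    with AUDIT_LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

record_decision("CVE-2024-12345", "relevant", "affects serviced Exchange infrastructure", "relevance-model-0.3")
```

Because the record is written as part of the decision itself, accountability and reporting obligations are supported without requiring additional manual documentation steps.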

5.2 Value Sensitivity as a Facilitator of Collaboration

Understanding value conflicts is not an end in itself, but offers starting points for value-sensitive technology design and detailed evaluations of conflicts in complex socio-political systems. From a CSCW perspective, three implications for supporting multi-actor collaboration emerge from the findings of this paper. First, as the work of CERTs strongly relies on collaboration with other CERTs, authorities, and organisations (Riebe et al. 2021a), a tool for shared situational awareness needs to be trustworthy and support the operators’ reasoning and sense-making (Ley et al. 2014; Lukosch et al. 2015). Trust can be achieved by supporting the operators’ alignment with legal provisions and social norms. As OSINT systems work with different ML algorithms, research on the explainability of these systems and on solutions for maintaining the autonomy of operators is crucial in all application domains. Second, with regard to the communication of cyber threats, CERTs need to collaborate with different stakeholders to improve their situational awareness and provide risk mitigation strategies. It became apparent that bias-free and addressee-specific communication is pivotal to fulfilling these objectives, a factor that also has to be taken into account in the design of systems with communication functionalities. Additionally, the spread of social media, in particular, has opened up opportunities for CERTs to leverage novel resources. However, this paper also highlights the challenges and concerns regarding how this information is used and processed in such a demanding and time-sensitive collaborative environment. Therefore, the results of this study can be of use for the field of control room research, e.g. in the context of traffic management (Jones et al. 2021) or other emergency services (Normark and Randall, 2005). Third, OSINT, especially when using social media as a source, depends on information provided by the respective medium’s users. It therefore involves the use of crowdsourcing, which is collaborative in nature (Liu, 2014). Social media users need to trust the OSINT operators using their data (Tapia and Moore, 2014), which can be achieved by ensuring transparency and accountability, e.g. through organisational oversight infrastructures, as well as data minimisation through Privacy by Design approaches.

5.3 Limitations and Future Work

The findings of this work must be considered in light of some limitations, which, at the same time, provide impetus for future research. First, the empirical investigations in this study were limited to selected stakeholder groups. In addition, only one individual potentially affected by data collection was interviewed. Thus, to consolidate the findings, further qualitative interviews and focus groups are necessary. For enquiries about citizens’ attitudes, however, quantitative surveys appear to be more suitable. Our future research will therefore also include a representative survey on the attitudes of the German population towards the use of OSINT technologies. Second, the generalisability of the results is limited due to the case study design of the VSD approach. However, the design implications can be utilised for similar cases of ML-based OSINT systems for cybersecurity. Within this limitation, this work pursued the goal of elaborating values and value conflicts as abstractly as possible. Nevertheless, as the interviews and the discussion were strongly focused on the design of OSINT systems for aggregating CTI in the CERT context, the results are primarily relevant with regard to artefacts for this application field. Accordingly, studies focusing on systems for investigation as well as risk assessment and mitigation purposes represent promising avenues for further research. Third, this work only includes conceptual and empirical VSD investigations. In the further course of our project, we therefore intend to conduct technical VSD investigations to derive concrete design requirements and to find technical solutions through which value conflicts are minimised and preferred stakeholder values are supported as adequately as possible.

6 Conclusion

In this paper, we employed a triangulation of methods to investigate which values and value conflicts are relevant to the application and development of ML-based OSINT technologies in the context of cybersecurity incident response. In order to situate our empirical findings in the broader research and application context, we first systematically reviewed the technical research literature on the development of OSINT artefacts for the cybersecurity domain (N = 73). Then, an empirical VSD case study, comprising semi-structured interviews (N = 9) and a focus group (N = 7) for data collection as well as a subsequent qualitative content analysis of the gathered material, was undertaken to identify the values of key stakeholders and to systematise potential value conflicts. The results of the literature review underlined the identified research gap: although research activities on OSINT for cybersecurity have increased, stakeholder values and other ethical, legal, and social issues have only been addressed in a minority of publications. In the empirical investigation, we identified ten values and eight value conflicts, particularly involving privacy, transparency, efficiency, and accuracy, that are relevant to the application and development of OSINT artefacts for cybersecurity incident response. Drawing on our findings, we derived implications for the design of and research on ML-based OSINT technologies for this application domain and discussed how sensitivity to the uncovered value conflicts and the division of tasks between human operators and ML algorithms can facilitate collaboration. Though certain limitations remain, this paper offers a systematic review of the technical research literature on the development of OSINT technologies for cybersecurity (C1), an empirically grounded elaboration of values and value conflicts related to the development and application of OSINT technologies for cybersecurity incident response (C2), and an elaboration of research and design implications for ML-based OSINT technologies for collaborative cybersecurity incident response (C3).