1 Introduction

With the growth of the digital world, the scale of data that is stored, transmitted and processed is increasing at an exponential rate. The International Data Corporation estimates that 64.2 zettabytes of data were created or copied in 2020 [87] and in 2019 just over 50 per cent of the world’s population used the internet [91]. This includes a large amount of publicly available information related to individuals, states and organisations. This open data can be utilised in intelligence operations for a wide range of applications and can be especially useful in security contexts such as national security, law enforcement, defence, cyber security and threat intelligence operations. Open-source intelligence (OSINT) is created when this publicly available and open data is processed into actionable intelligence and distributed to the relevant parties [273]. This process is known as the intelligence cycle and consists of planning, collection, processing, analysis, dissemination and evaluation [90]. However, the size and scope of online open data sources create problems for OSINT investigations as well as opportunities.

Utilising artificial intelligence (AI) can help overcome some of the challenges presented by this vast source of information by automating certain components of the intelligence cycle. For example, the paper [44] investigates the automated identification of national security incidents from data gathered on social media, which is a frequently used data source for online OSINT. The paper proposes the use of a machine learning technique known as clustering to group similar pieces of information. In the intelligence cycle, this relates to the processing stage, as the data is grouped and classified before more thorough analysis. Using AI techniques in this manner can help reduce the time and workload of intelligence analysis by grouping data into relevant categories. A further example can be found in [260], in which the researchers construct a malicious domain name detection system using recurrent neural network (RNN) and long short-term memory (LSTM) machine learning algorithms. To train their system, various OSINT sources are used, such as OSINT feeds for malicious domain names and the Domain Name System (DNS) for benign domain names. The system supports both the processing and analysis stages of the intelligence process, as it classifies domain names and provides a conclusive output on which domain names are malicious. A third example can be found in the article [17], which details the use of machine learning for the processing of OSINT. The article compares the Naive Bayes and K-nearest Neighbour machine learning algorithms to process and classify data for information retrieval. This was performed using online news articles as an OSINT source, with the performance of each model being compared across various categories for information retrieval such as people, topics and places.
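To make the classification step described above more concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier written with the Python standard library only. It is not the implementation used in [17]; the class name, the whitespace tokeniser, the toy headlines and the two category labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into whitespace-separated tokens."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # token counts per class
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                # Add the smoothed log likelihood of each token given the class.
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy headlines labelled by information retrieval category.
train_texts = [
    "minister announces new security policy",
    "president meets foreign minister",
    "flooding reported in coastal city",
    "earthquake strikes mountain region",
]
train_labels = ["people", "people", "places", "places"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("minister visits president"))
```

A real evaluation would use a much larger labelled corpus and a held-out test set; the sketch only shows how word frequencies per category drive the classification decision.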

Despite the extensive research combining OSINT practices and sources with AI, there are few systematic reviews concerning the topic. Two examples are the paper [46], which directly reviews the application of OSINT to machine learning and AI practices, and [42], which covers the use of data mining technology for law enforcement and includes articles covering the use of OSINT in combination with AI. However, these earlier systematic reviews lack discussion about sources of OSINT, OSINT tools and types of AI techniques used in each phase of the intelligence process. Therefore, this paper contributes by focusing research questions on these areas and identifying research limitations relating to AI-based OSINT applications. The paper is outlined as follows: Sect. 2 provides the background knowledge related to OSINT and the review methodology is presented in Sect. 3. The answers to the research questions are presented in Sect. 4, and Sect. 5 presents identified limitations, future research directions and a summary of findings. Finally, the paper concludes with Sect. 6.

2 Background

OSINT refers to intelligence that has been gathered from freely available and non-classified sources and can be found online and offline. Various processes and tools can be applied to assist with various stages of the intelligence cycle. AI provides a mechanism that can assist these processes as AI mimics the cognitive ability of humans and can reduce the workload of tasks requiring the processing and analysis of significant amounts of data. There are a variety of non-AI-based OSINT techniques and tools that can be used to assist in OSINT and intelligence processes. Combining AI with OSINT tools, processes and practices can provide a solution for intelligence operations to efficiently navigate the substantial amount of open data available online that can be used for OSINT purposes.

2.1 Open source intelligence (OSINT)

OSINT is not a new area of interest for intelligence operations. Before the existence of the world wide web, intelligence services utilised offline OSINT by monitoring newspapers and radio broadcasts to gain vital intelligence. One of the first attempts at methodically collecting OSINT was during World War Two, when the US established the Foreign Broadcast Information Service to monitor public broadcasting originating from adversarial nations [221]. The introduction and growth of the internet have increased the scope of potential sources of OSINT. Figure 1 outlines some of these OSINT categories for both offline and online OSINT and demonstrates the extensive scope of OSINT, particularly in the digital domain.

OSINT can be collected from multiple online sources, including social media, search engines, websites, online forums, company directories and online databases [75]. When focusing on online OSINT sources, OSINT operations must use and develop techniques to identify, collect, categorise and analyse the relevant information. There are a variety of tools available to OSINT practitioners and researchers to assist with these tasks. Even when utilising these tools the amount of information gathered can be substantial, leading to difficulties concerning the classification and analysis of the collected data. AI and machine learning can provide a mechanism to assist OSINT operations in the collection, processing and analysis of online OSINT.

Fig. 1

Sources of online and offline OSINT. Digital growth has resulted in an increase in the number of online data sources available for OSINT operations

Fig. 2

The RAND Corporation’s OSINT operations cycle [273]. The four-step OSINT process fits within the larger intelligence cycle. Collection and processing are included directly, while exploitation and production fit within analysis and dissemination

2.2 The OSINT process and intelligence cycle

As OSINT has been a source of information for intelligence operations for many years, pre-existing high-level intelligence processes can be used in the modern digital landscape. There are multiple versions of this process used by different organisations. The RAND Corporation proposes a simplified four-step process designed to fit into the intelligence cycle [273]. The steps outlined by RAND are collection, processing, exploitation and production. Figure 2 outlines this high-level four-step process. This four-step process would be used within the intelligence cycle as defined by the US-based Office of the Director of National Intelligence [90] and combined with other intelligence disciplines. The intelligence cycle is shown in Fig. 3 and includes planning, collection, processing, analysis, dissemination and evaluation. AI-based OSINT systems could be categorised based on which step in this higher-level process they are assisting.

According to the RAND Corporation, a highly respected US-based research organisation that specialises in subjects concerning national security [217], metrics alone are not sufficient to gauge the effectiveness of an AI-based system employed for intelligence purposes. The RAND Corporation proposes that the performance of such systems must be evaluated based on the value they provide to the next stage of the intelligence cycle [90]. It is also important to note that intelligence operations are not reserved exclusively for intelligence agencies. This is particularly true in the area of OSINT. For example, cyber security professionals can leverage data collected from Twitter to improve their threat intelligence capabilities [220] and law enforcement can monitor online marketplaces to identify the sale of illegal and controlled narcotics [141]. Business intelligence is a growing area of interest and is driven by the growth of available digital data, usually referred to as big data. As such, businesses have had to create processes similar to the intelligence cycle for their intelligence operations [15].

Fig. 3

The Intelligence Cycle as defined by the Office of the Director of National Intelligence [90]. The process is step-by-step and reiterative. The evaluation stage informs the parameters implemented during the planning of subsequent investigations

2.3 OSINT techniques and tools

There is a wide variety of tools available to support OSINT operations throughout the intelligence cycle. These include comprehensive tools that can manage entire investigations, such as Spiderfoot [159] and Maltego [147], and tools that provide single functions to assist OSINT operations, for example, TheHarvester, which provides email scraping functions [155]. Broadly speaking, OSINT tools and techniques can fit into the different stages of the intelligence cycle. TheHarvester's scraping functions fit into the collection stage of the intelligence cycle, whereas Spiderfoot and Maltego provide processing and analysis in addition to the collection of open-source data. Tables 1 and 2 outline a number of tools and techniques employed in OSINT processes.
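The mapping between tools and intelligence cycle stages described above can be written down as a small lookup structure. The sketch below is illustrative only: the `Stage` enumeration, the `TOOL_STAGES` dictionary and the `tools_for` helper are hypothetical names, and the stage assignments simply restate the description in the preceding paragraph.

```python
from enum import Enum


class Stage(Enum):
    """The six stages of the intelligence cycle."""
    PLANNING = "planning"
    COLLECTION = "collection"
    PROCESSING = "processing"
    ANALYSIS = "analysis"
    DISSEMINATION = "dissemination"
    EVALUATION = "evaluation"


# Stage assignments follow the description in the text, not vendor documentation.
TOOL_STAGES = {
    "TheHarvester": {Stage.COLLECTION},
    "Spiderfoot": {Stage.COLLECTION, Stage.PROCESSING, Stage.ANALYSIS},
    "Maltego": {Stage.COLLECTION, Stage.PROCESSING, Stage.ANALYSIS},
}


def tools_for(stage):
    """Return the names of tools that support the given stage, sorted alphabetically."""
    return sorted(name for name, stages in TOOL_STAGES.items() if stage in stages)


print(tools_for(Stage.ANALYSIS))
```

The same structure could be used to categorise AI-based systems by the stage they assist.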

Effective tools are essential for any OSINT operation to reduce the investigation workload. An example of this during the collection phase of the intelligence cycle is the tool known as Sublist3r [2]. Sublist3r makes use of various OSINT sources such as search engines and DNS to efficiently enumerate the subdomains of a target organisation. An OSINT operation would waste significant amounts of time attempting to replicate this process manually, especially if the target is a large organisation with numerous subdomains. Tables 1 and 2 list commonly used tools and techniques. The OSINT framework can also be used to obtain a comprehensive list of OSINT tools [51]. It is important to note that most of these tools do not currently leverage AI algorithms.
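As a rough illustration of the DNS part of subdomain enumeration, the sketch below checks a list of candidate labels against a domain and keeps those that resolve. This is not how Sublist3r is implemented; it is a simplified stand-in that ignores search engine sources entirely. The function names, the candidate labels and the `example.org` records are assumptions, and a fake resolver is used so that the example runs without network access.

```python
import socket


def resolve(hostname):
    """Return the IPv4 address for a hostname, or None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        # socket.gaierror (name resolution failure) is a subclass of OSError.
        return None


def enumerate_subdomains(domain, candidates, resolver=resolve):
    """Return a dict mapping each resolvable candidate subdomain to its address."""
    found = {}
    for label in candidates:
        hostname = f"{label}.{domain}"
        address = resolver(hostname)
        if address is not None:
            found[hostname] = address
    return found


# Hypothetical records standing in for live DNS responses.
fake_records = {"www.example.org": "192.0.2.10", "mail.example.org": "192.0.2.20"}

result = enumerate_subdomains("example.org", ["www", "mail", "vpn"], resolver=fake_records.get)
print(result)
```

Passing the resolver as a parameter keeps the enumeration logic testable and allows the default DNS lookup to be replaced by other OSINT sources.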

Table 1 These examples of OSINT tools are provided for background purposes. For a comprehensive list of OSINT tools, sources and frameworks please see the OSINT framework [51]
Table 2 Popular OSINT techniques. These techniques can be implemented through a variety of manual methods and tools

2.4 Artificial intelligence and machine learning

AI is the emulation of human intelligence by machines. This is usually implemented in the form of code being processed and executed by computers [98]. AI in computing usually consists of various algorithms which attempt to mimic aspects of human thinking and cognition. One particular domain of AI is machine learning. Machine learning imitates the way humans learn in terms of inferring, categorising and analysing data. The benefit is that, for many tasks, a machine learning algorithm can do this with greater speed and efficiency than a human, and in some cases with greater accuracy. One example of this is the use of convolutional neural networks (CNN) in computer vision tasks. CNN algorithms can analyse images piece by piece to identify features common to a particular set of images and can be used to categorise images or identify objects in images [171]. To an extent, CNN algorithms can be used to mimic the human ability of sight and provide an example of how machine learning algorithms can be used to complete tasks usually performed by a human operator. A further example comes from the application of machine learning algorithms for natural language processing (NLP). NLP techniques can be employed for a large range of tasks that in the past would have been reserved for a human analyst. NLP can be used to derive meaning from human speech and text and can be used for sentiment analysis, to understand people’s feelings or opinions on a subject. NLP can also be used to efficiently extract knowledge and information from bodies of text [119]. NLP and CNN algorithms are only two examples of how AI and machine learning can perform tasks usually conducted by humans. The full range of available AI and machine learning techniques is extensive, but these examples show the potential for machine learning algorithms to reduce the workload of human analysts with specific tasks.

2.5 Combining AI, machine learning and OSINT

The ability of AI to perform human tasks provides a significant opportunity to improve the speed and efficiency of systems utilised in OSINT operations. Recently academic researchers have been utilising AI and machine learning in combination with OSINT sources and practices. Some use cases include the detection of malicious domain names [102], misinformation detection [94] and threat intelligence through monitoring online hacker forums [57].

These examples show that the combination of AI and OSINT offers a powerful new tool for OSINT operations. Incorporating AI and machine learning with OSINT enables the processing of an otherwise unmanageable amount of data and provides a means of accelerating investigations and bolstering intelligence gathering and analysis. In addition, open data available online can be utilised as training sets for machine learning algorithms to assist in various security and intelligence applications and generate actionable OSINT. This is frequently the case in models used for the detection of maliciously generated domain names, which are trained on data gathered from DNS [276].
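To illustrate how open data can be turned into a training set for a domain name classifier, the sketch below labels domains from two sources and encodes each name as a fixed-length sequence of character indices, a common input format for sequence models. The encoding scheme, the alphabet, the maximum length and the example domains are assumptions for illustration and are not taken from [276] or any other cited system.

```python
import string

# Assumed alphabet: lowercase letters, digits, hyphen and dot.
# Index 0 is reserved for padding and is also used for unknown characters.
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_INDEX = {char: i + 1 for i, char in enumerate(ALPHABET)}


def encode_domain(domain, max_length=32):
    """Map a domain name to a fixed-length list of integer character indices."""
    indices = [CHAR_TO_INDEX.get(char, 0) for char in domain.lower()[:max_length]]
    # Right-pad with zeros so every sample has the same length.
    return indices + [0] * (max_length - len(indices))


def build_training_set(malicious, benign, max_length=32):
    """Label domains from a malicious feed as 1 and benign domains as 0."""
    samples = [(encode_domain(d, max_length), 1) for d in malicious]
    samples += [(encode_domain(d, max_length), 0) for d in benign]
    return samples


# Hypothetical inputs: a malicious-domain feed and benign names observed in DNS.
training_set = build_training_set(["xk2vq9a.example"], ["www.example.org"], max_length=16)
print(len(training_set), training_set[0][1], training_set[1][1])
```

The resulting pairs of encoded sequence and label could then be passed to whichever model is being trained.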

Combining AI and machine learning with OSINT presents many challenges. The RAND Corporation has recently identified that judging the performance of AI use in the intelligence process does not rest solely on metrics such as how accurate the algorithm’s results are. The AI needs to be judged on the outcomes it provides to the intelligence process [90]. Put another way, even if an AI implemented in the intelligence process is correct 99 per cent of the time, it will still fail if the outcomes it provides to the intelligence process are insufficient, or worse still, result in catastrophic errors. The RAND Corporation believes this concept should be applied when evaluating any AI or machine learning algorithms that are being applied in combination with OSINT or other intelligence operations.
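One way a highly accurate system can still provide poor outcomes to the next stage of the intelligence cycle is when relevant items are rare. The sketch below works through this with purely hypothetical numbers: one million collected items, of which 0.1 per cent are relevant, and a classifier that is correct 99 per cent of the time on both relevant and irrelevant items. None of these figures come from the RAND Corporation or the reviewed papers.

```python
def expected_outcomes(total_items, prevalence, sensitivity, specificity):
    """Return expected counts and summary metrics for a binary classifier."""
    relevant = total_items * prevalence
    irrelevant = total_items - relevant
    true_positives = relevant * sensitivity
    false_positives = irrelevant * (1 - specificity)
    true_negatives = irrelevant * specificity
    flagged = true_positives + false_positives
    return {
        "flagged": flagged,
        "true_positives": true_positives,
        "false_positives": false_positives,
        # Share of all items that were classified correctly.
        "accuracy": (true_positives + true_negatives) / total_items,
        # Share of flagged items that are actually relevant.
        "precision": true_positives / flagged if flagged else 0.0,
    }


# Hypothetical scenario: rare relevant items and a 99 per cent accurate classifier.
outcome = expected_outcomes(1_000_000, prevalence=0.001, sensitivity=0.99, specificity=0.99)
print(round(outcome["accuracy"], 3), round(outcome["precision"], 3))
```

Under these assumptions the classifier is 99 per cent accurate overall, yet only about 9 per cent of the items it passes to an analyst are relevant, which shows why accuracy alone does not describe the value delivered to the analysis stage.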

3 Methodology

This study makes use of the PRISMA guidelines for conducting systematic literature reviews. Two researchers conducted the review and evaluated each other’s results. The methodology utilised to identify, include, exclude, evaluate and review literature for this paper consists of the following steps:

  1. Search academic databases using the selected keyword set.
  2. Initial evaluation of search results to exclude non-relevant articles by appraisal of the article keywords, abstract and title.
  3. Remove all duplicate articles.
  4. Include or exclude identified articles with the defined criteria through appraisal of the full text.
  5. Use quality questions to identify high-quality articles and exclude low-quality articles.
  6. Use research questions to collect the appropriate data from the included research.
  7. Record collected data.
  8. Synthesise collected data through graphical display and description of findings.

The number of articles was recorded at each stage of the process to show aggregated results throughout the review process. Articles written in languages other than English were translated so that data from these articles could be included in the results.
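The practice of recording the number of articles at each stage can be sketched as a small pipeline that applies each step in order and logs how many articles remain. This is an illustrative sketch only: the function name, the record fields and the two example stages are hypothetical and do not reproduce the spreadsheet-based process used in this review.

```python
def run_review_pipeline(articles, stages):
    """Apply each named stage in order and record how many articles remain after it."""
    counts = [("identified", len(articles))]
    remaining = list(articles)
    for name, stage in stages:
        # Each stage takes a list of articles and returns the articles that survive.
        remaining = stage(remaining)
        counts.append((name, len(remaining)))
    return remaining, counts


# Hypothetical records; a real review would load these from database search exports.
articles = [
    {"title": "OSINT with machine learning", "relevant": True, "peer_reviewed": True},
    {"title": "Unrelated robotics paper", "relevant": False, "peer_reviewed": True},
    {"title": "OSINT blog post", "relevant": True, "peer_reviewed": False},
]

stages = [
    ("screened", lambda items: [a for a in items if a["relevant"]]),
    ("quality assessed", lambda items: [a for a in items if a["peer_reviewed"]]),
]

included, counts = run_review_pipeline(articles, stages)
print(counts)
```

The list of counts produced this way maps directly onto the boxes of a PRISMA-style flow diagram.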

3.1 Academic database search

To initially identify potential articles for review, several online academic databases were queried. The following databases and academic search engines were used:

  1. IEEE [88]
  2. Scopus [222]
  3. ACM [26]
  4. Science Direct [35]
  5. Springer Link [238]
  6. Google Scholar [66]

The keywords identified for use in this study include OSINT, machine learning, artificial intelligence and cyber reconnaissance. An initial search was run using the “OR” operator to identify the total number of articles relating to these keywords available on the selected databases and academic search engines. This was refined by searching for exact matches of the specific keywords, to identify the total number of articles available for each term. Once this was completed, the “AND” operator was used to identify articles that relate to either OSINT and machine learning/artificial intelligence, or cyber reconnaissance and machine learning/artificial intelligence. An example search would be “OSINT” AND “Machine Learning”. Total numbers and search dates were recorded for each search to construct flow charts that break down the total number of articles related to these keywords. All keyword searches are included below:

  1. OSINT OR cyber reconnaissance OR artificial intelligence OR machine learning
  2. OSINT OR artificial intelligence
  3. OSINT OR machine learning
  4. Cyber reconnaissance OR artificial intelligence
  5. Cyber reconnaissance OR machine learning
  6. OSINT
  7. Cyber reconnaissance
  8. Artificial intelligence
  9. Machine learning
  10. OSINT AND artificial intelligence
  11. OSINT AND machine learning
  12. Cyber reconnaissance AND artificial intelligence
  13. Cyber reconnaissance AND machine learning

Only articles returned by the searches using the AND operator were considered for the initial evaluation. Results for the other searches will be included in the appendix of this paper for further review.
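The four AND searches can be generated programmatically by pairing each subject keyword with each technique keyword. The sketch below is a minimal illustration; the function and variable names are hypothetical, and the quoted exact-match syntax is an assumption, since each database has its own query format.

```python
from itertools import product

SUBJECT_TERMS = ["OSINT", "cyber reconnaissance"]
TECHNIQUE_TERMS = ["artificial intelligence", "machine learning"]


def quote(term):
    """Wrap a term in double quotes to request an exact match."""
    return f'"{term}"'


def build_and_queries(subjects=SUBJECT_TERMS, techniques=TECHNIQUE_TERMS):
    """Pair every subject term with every technique term using the AND operator."""
    return [f"{quote(s)} AND {quote(t)}" for s, t in product(subjects, techniques)]


for query in build_and_queries():
    print(query)
```

Generating the queries from two keyword lists makes it straightforward to rerun the same searches on each database and to extend the keyword set later.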

3.2 Initial evaluation

When each search was entered the number of articles was compared between the reviewers to assess if the search had been run correctly. Once this was done an initial evaluation was conducted to remove unrelated articles. This evaluation was completed by reviewing the keywords provided by the articles, the abstract and the title to assess if the article matches the required subject matter. If an article matched the required keywords of the search it proceeded to the next stage of the review. The review of the search results was terminated once the search results started returning significant amounts of non-relevant material.

Once articles had been identified through the initial evaluation, they were added to a spreadsheet, and any duplicate papers were identified and removed from the list. These articles then proceeded to be selected based on our quality questions and article criteria.
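Duplicate removal of this kind can also be automated. The sketch below treats two records as duplicates when their titles match after normalisation, which is an assumption made for illustration; the source does not state how duplicates were identified in the spreadsheet, and the function names and example records are hypothetical.

```python
import re


def normalise_title(title):
    """Lowercase a title, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def remove_duplicates(articles):
    """Keep the first occurrence of each article, comparing normalised titles."""
    seen = set()
    unique = []
    for article in articles:
        key = normalise_title(article["title"])
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique


# Hypothetical search results returned by two different databases.
results = [
    {"title": "Machine Learning for OSINT: A Survey", "source": "Scopus"},
    {"title": "Machine learning for OSINT - a survey", "source": "IEEE"},
    {"title": "Detecting Malicious Domains", "source": "ACM"},
]

unique_results = remove_duplicates(results)
print(len(unique_results))
```

Title matching alone can miss duplicates with differing titles, so a manual check of the remaining list is still advisable.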

3.3 Article criteria

Articles that proceed from the initial evaluation will be assessed for inclusion using the following criteria:

  1. Article must post-date the year 2011.
  2. Article must be a journal article, conference proceeding, technical report or academic archive.
  3. The article must include the subject matter of using machine learning or AI in combination with OSINT sources or processes in a security context. A security context could relate to either national security, cyber security, law enforcement or intelligence operations. Systems that are general but could be used in a security context will also be included in this study.
  4. The article must cover the use of a machine learning or AI algorithm, proposed model or framework through:
    a. Real-world use.
    b. Experimental use.
    c. Proposed or theoretical use.
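The inclusion criteria above can be expressed as a single predicate over an article record. In the sketch below the `Article` fields and the `meets_criteria` function are hypothetical; in particular, the two boolean fields stand for judgements made by a reviewer after reading the full text and cannot be derived automatically from metadata.

```python
from dataclasses import dataclass

ALLOWED_TYPES = {
    "journal article",
    "conference proceeding",
    "technical report",
    "academic archive",
}
ALLOWED_USES = {"real-world", "experimental", "proposed"}


@dataclass
class Article:
    title: str
    year: int
    publication_type: str
    combines_ai_with_osint: bool  # reviewer judgement (criterion 3)
    security_context: bool        # reviewer judgement, includes general systems usable in security
    use: str                      # "real-world", "experimental" or "proposed" (criterion 4)


def meets_criteria(article):
    """Return True if the article satisfies all four inclusion criteria."""
    return (
        article.year > 2011  # must post-date 2011
        and article.publication_type in ALLOWED_TYPES
        and article.combines_ai_with_osint
        and article.security_context
        and article.use in ALLOWED_USES
    )


# Hypothetical example records.
candidate = Article("Clustering tweets for incident detection", 2018,
                    "conference proceeding", True, True, "experimental")
too_old = Article("Early OSINT classifier", 2011,
                  "journal article", True, True, "experimental")
print(meets_criteria(candidate), meets_criteria(too_old))
```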

3.4 Quality questions

Once articles were identified to adhere to the selection criteria they were assessed on their level of academic quality. The following questions were considered when determining the quality of a paper:

  1. Is the article peer-reviewed?
  2. Does the article include a repeatable methodology?
  3. Does the paper provide a statistical analysis of results?
  4. Does the paper contain appropriate references?
  5. Are details of any proposed models or frameworks provided?

3.5 Research questions

Once the articles were identified for inclusion in this study the following research questions were asked to extract data and information from the selected articles. If multiple categorisation occurs, both categories will be included in the total figures. For example, if researchers collaborate from multiple countries, both countries are recorded in the final results. The same process is followed for multiple data sources, professionals and organisations, algorithms, intelligence cycle phases, metrics, OSINT sources and tools:

  1. What is the trend in AI and machine-learning-based OSINT?
  2. What geographical regions are contributing the most to this area of study?
  3. What professions and organisations could benefit from AI and machine-learning-based OSINT applications?
  4. What machine learning algorithms, techniques or tools are being used in OSINT?
  5. What phases of the intelligence cycle does the included research apply to?
  6. What metrics and analysis are provided to evaluate system performance in the included research?
  7. What are the sources of OSINT used in the included research?
  8. What OSINT tools are applied in the included research?
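The multiple-categorisation rule described for these research questions can be sketched as a tally in which a paper contributes to every category it belongs to. The function name and the example records below are hypothetical, and the sketch assumes that each category is counted at most once per paper, which the text does not state explicitly.

```python
from collections import Counter


def count_categories(papers, field):
    """Count how many papers fall into each category of the given field."""
    counts = Counter()
    for paper in papers:
        # A set ensures each category is counted once per paper (an assumption).
        counts.update(set(paper[field]))
    return counts


# Hypothetical extraction records for three papers.
papers = [
    {"countries": ["United States", "India"], "stages": ["collection", "processing"]},
    {"countries": ["India"], "stages": ["processing"]},
    {"countries": ["United Kingdom", "United Kingdom"], "stages": ["analysis"]},
]

country_counts = count_categories(papers, "countries")
print(country_counts.most_common())
```

Because a paper can appear under several categories, the category totals can exceed the number of included papers.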

3.5.1 What is the trend in AI and machine-learning-based OSINT?

This question is asked to identify growth patterns in AI-based OSINT research. Given the current need to analyse vast amounts of open data, it is important to identify if the research community is contributing sufficiently to this field of research and if any events have impacted research output. The research community must be sufficiently covering this topic to assist both research and industry in their OSINT efforts. To answer this question the publication date field from the article metadata will be utilised.

3.5.2 What geographical regions are contributing the most to this area of study?

This question will be answered based on each author’s affiliated institution. Multiple institutions could be recorded for each paper. The question seeks to address whether there is sufficient linguistic and cultural contextual spread in AI-based OSINT research. This is important because elements such as regional slang and dialect could influence results when AI models are employed in text-based tasks, for example in the previously mentioned monitoring of hacker forums. This could have implications for criminal, cyber threat intelligence and general intelligence investigations.

3.5.3 What professions and organisations could benefit from AI and machine-learning-based OSINT applications?

This question is qualitative as it diverges from the analysis of the article’s metadata. Answering this question will help identify the variance in industry end use for the proposed models. Ultimately it will identify if there are any limitations or gaps in the assessed research articles concerning the domain of final use. This can be used to identify future directions for which there is limited research material.

3.5.4 What machine learning algorithms, techniques or tools are being used in OSINT?

This question seeks to understand what algorithms are being applied to OSINT use cases. It also seeks to identify other tools and techniques that are being used by researchers. This could provide other researchers with potential starting points in terms of tools, techniques and algorithms for their research. The question will be answered by recording identified tools, algorithms and techniques used in models or experimentation within the included articles.

3.5.5 What phases of the intelligence cycle does the included research apply to?

Taking into consideration the advice provided by the RAND Corporation, it is important to understand which phase of the intelligence cycle AI models are being applied to. This provides context to establish their performance in terms of how they support the next phase of the intelligence cycle. Understanding this would be beneficial in appraising models for use in both industry and government settings. This question also supports academic research in identifying if there are research gaps in the context of the intelligence cycle. To answer this question, each paper was analysed to identify which intelligence stage is most appropriate. If the research model included multiple stages, each stage was included in the results.

3.5.6 What metrics and analysis are provided to evaluate system performance in the included research?

This question seeks to understand how researchers are currently evaluating their proposed experimental models. Presentation of results that demonstrate how models fit into the intelligence cycle would be beneficial. Identifying any research gaps in this area can provide potential future directions in terms of model evaluation. The question is answered through analysis of the metrics provided in the results sections of the papers and by identifying any further evaluation analysis.

3.5.7 What OSINT tools are applied in the included research?

This question is of particular interest to industry and practitioners as it can be used to evaluate the extent to which AI models can be integrated with pre-existing non-AI OSINT tools. The question is answered by identifying the OSINT tools described in the included papers.

4 Results

The following section details the results of the methodology described in Sect. 3. It is broken into two sections, the first of which details the results of the systematic review using the outlined keywords and shows the results obtained at each stage of the review process. The second section addresses the research questions that were defined at the start of the review process and provides graphical representations of the data collected from the included papers.

4.1 Systematic review results and paper selection

The initial database search returned a total of 54,138 articles. Screening of the title, keywords and abstract reduced this number to 415, or 213 with duplicates removed. 84 additional articles were identified through references, providing a total of 297 articles. 27 articles were removed after assessing article quality and 123 were removed after comparing criteria to the full text. A further 11 were removed after comparison to the quality questions. A total of 163 articles have been included in this review. Figure 4 shows the PRISMA flow diagram for the number of articles identified at each stage of the review and is divided into search, screening, exclusion and inclusion sections in line with the PRISMA framework.

Fig. 4

The PRISMA flow diagram outlining the steps followed for this systematic review. A total of 163 articles passed the selection criteria and were included in the study

4.2 Research questions

This section reports the analysis of the systematic literature review results framed along the eight research questions presented in Sect. 3.5.

4.2.1 What is the trend in AI and machine-learning-based OSINT?

There has been significant growth in the number of research articles that fit the criteria for this study. This positive trend can be seen in Fig. 5. The volume of research in this area peaked in 2019 at 33 articles, but this was followed by a decline of about 21.2 per cent in 2020.
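The year-on-year decline quoted above follows from a simple percentage change calculation. In the sketch below the 2019 count of 33 comes from the text, while the 2020 count of 26 is not stated in the text; it is the value implied by the 21.2 per cent figure and should be read as an assumption.

```python
def percent_change(previous, current):
    """Return the change from previous to current as a percentage of previous."""
    return (current - previous) / previous * 100


# 33 articles in 2019 (from the text); 26 in 2020 (implied, assumed here).
decline = percent_change(33, 26)
print(round(decline, 1))  # -21.2
```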

It is possible that the COVID-19 pandemic slowed the growth of research in this area post-2019. The initial stages of the pandemic correlate with a decline in time spent on scientific research and a reduction in the number of new scientific research projects [53]. Concerning the assessed articles of this study, there appears to be a more significant decrease in conference papers between 2019 and 2020, with conference publications decreasing by about 22 per cent, while journal articles remain relatively stable. This could be due to disruption in academic conferences during the initial pandemic stages.

Computer science conferences appear to have adapted well in this period but a study into this question showed that 23.5 per cent of computer science conferences were postponed or cancelled during this period [169]. This appears to be a similar level of decline when compared to the decline observed in this study between the years 2019 and 2020.

This may provide some insight into the reduced growth in research for this study in 2020, but the impacts of COVID-19 on academic research are ultimately outside the scope of this systematic review and a larger data set would be required to offer definite conclusions. The polynomial trend line in Fig. 5 shows that this period represented a slowing of growth, as the trend is still positive over this period based on past data.

Fig. 5

Number of research articles combining AI with OSINT by year. There are a significant number of articles that post-date previous systematic reviews. The positive trend demonstrates significant growth in this area of study

4.2.2 What geographical regions are contributing the most to this area of study?

A substantial portion of the research that has been included in this paper has occurred in either the United States or India. The United States and India combined account for about 44 per cent of the research included in this study. The United Kingdom and China are also driving research in this area. The top four countries that are contributing to this area of research account for about 57 per cent of the included research papers. The heat map provided in Fig. 6 demonstrates the global distribution of the research included in this study.

Figure 6 additionally shows the total number of articles from the top fifteen countries producing research that fits the criteria of this study. These countries account for 150 of the total 163 included articles, or about 92 per cent of all the included research articles. Countries from the continents of North America, Asia and Europe are included on this list. The United States is the largest producer of included research in North America, with Canada being second in that region. India is the highest-ranked country in Asia for research combining AI with OSINT, with China being the second. In Europe, the United Kingdom is the highest-producing country for research in this field, with Italy being the second highest. Limited research that meets the criteria of this systematic literature review is being undertaken in the continents of Africa and South America. Only Argentina, Colombia, Egypt and South Africa contributed papers from these regions.

Overall there seems to be a fair distribution of papers in terms of geographic location. However, there are concentrations of papers in the United States, India and the United Kingdom, two of which are predominantly English-speaking with cultural similarities. The reduced number of papers in areas such as Africa, Central Asia, Eastern Europe and South America could indicate limitations based on cultural context. This would be particularly important for OSINT applications leveraging natural language processing that are concerned with law enforcement or intelligence operations. Contributions from academics in these areas could help produce truly universal AI-based OSINT systems dedicated to these forms of investigation.

Fig. 6

Geographic distribution and number of articles produced on a per-country basis for research that incorporates OSINT with AI, out of a total of 163 articles included in this review. A large share of AI-driven OSINT research has been undertaken in the United States and India, which together account for about 44 % of the research included in this review

4.2.3 What professions and organisations could benefit from AI and machine-learning-based OSINT applications?

There is less variation in the domains that could apply the included research when compared to the specific applications. The majority of the included research can be applied to either security operations, law enforcement, intelligence services or cyber threat intelligence operations. Figure 7 breaks down the included research by the potential domain of use. There are few papers leveraging AI for OSINT purposes that apply to the domain of penetration testing. This presents a research opportunity as an important aspect of penetration testing is collecting publicly available data about the target so that the penetration tester can gain knowledge of the potential attack surface [210].

One specific piece of research that could assist penetration testers in their projects is presented in [187]. This conference paper utilises OSINT collected from the National Vulnerability Database to train several classifiers to identify Structured Query Language (SQL) injection vulnerabilities from online texts. The model can be used to scan OSINT sources such as tweets and websites to identify SQL injection-related information. There are also few papers that are specific to the defence domain, although this could be due to the classified and sensitive nature of such research, as some defence research may not be available through normal academic sources.

Fig. 7

Types and functions of organisations that could benefit from the included research that combines AI with OSINT applications. Some research papers could be used in multiple organisation types

Much research that would be well suited for the defence sector is also well suited for other domains such as intelligence services and government departments. The journal article [58] provides a demonstration of this by proposing a tweet classification system based on various machine learning algorithms that identify the exposure of sensitive or classified information on social media. This system has obvious applications to government, defence and the intelligence sector but such a system could also be utilised by security operations conducted in the private sector. Ultimately the results of this systematic review show that many systems employing AI methods with OSINT can be utilised across multiple domains and are not confined solely to government, defence or intelligence service projects.

Figure 8 shows the main applications and functions of research that combines machine learning with OSINT. The uses of the systems in the included research are varied, but a substantial portion covers either sentiment analysis, cyber threat intelligence, domain generation algorithm (DGA) detection or OSINT investigations. There are also several other uses such as cyber attack prediction, natural disaster management and human trafficking investigations.

The paper [13] provides an interesting example of how a particular application leveraging OSINT with AI can be applied to specific domains. This research utilises Twitter as an OSINT source and leverages machine learning algorithms to identify hate speech using sentiment analysis. The experiments in this paper show that deep learning methods outperform n-gram techniques in the identification of hate speech. Such systems could be incorporated into law enforcement investigations to identify individuals or groups engaging in such behaviour online. This intelligence could be utilised to monitor these people or groups and to assess whether a potentially violent act is about to occur, or whether the law has been broken regarding the harassment of particular groups in society.

Fig. 8

Core functions and objectives of the proposed AI-based OSINT systems found in the included research articles

Another major application of AI systems utilising OSINT is for DGA detection. In [263] the researchers demonstrate how automated detection of malicious domains can save significant time for security operations by removing the need to manually blacklist malicious domains. The proposed machine learning model has been trained on OSINT collected from DNS and ultimately provides analysis of domain names to confirm if they are malicious or benign. This can be used to prevent malware from attempting to communicate with command and control servers. Such systems are an interesting application of OSINT as the open data is used for initial training but the final intelligence analysis is provided by the trained machine learning algorithm. The conference paper [8] applies OSINT with machine learning algorithms to improve organisational threat monitoring capabilities.

An example of an OSINT application leveraging AI that can provide proactive threat monitoring can be found in SYNAPSE. SYNAPSE can monitor Twitter for information on emerging threats concerning particular information technology hardware or software. The benefit of this is that information on new threats can be efficiently and automatically obtained before the information is made available on threat and vulnerability databases.

Fig. 9

The machine learning algorithms used in the included research articles. There is a high degree of variety in the algorithms used. This variation is due to the number of different tasks that can be performed when conducting OSINT operations

4.2.4 What machine learning algorithms, techniques or tools are being used in OSINT?

Figure 9 shows the types of AI algorithms and techniques being used for machine-learning-based OSINT research. As with the results concerning the domains and applications in which the research could be employed, there is a large variety of algorithms in the included research. The five most commonly employed algorithms are Support Vector Machines, Naive Bayes, CNN, Random Forest and LSTM. The covered research includes models that combine different algorithms; for example, the research paper by Choudhary, Sivaguru, Pereira, Yu, Nascimento and Cock into Domain Generation Algorithm and malware detection uses a CNN and LSTM combination [24]. Another example is the paper by Ravi, Kp and Poornachandran, which also uses a CNN and LSTM combination in their research into the categorisation of malicious domain names. There are also lesser-known algorithms being used in the included research, such as LASSO, which was used in the research undertaken by Kumar and Babu into preprocessing for sentiment analysis [190], and Annotated Probabilistic Temporal Logic (APT), which was used by Marin, Almukaynizi and Shakarian in their work into the prediction of cyber-attacks [151].

Support Vector Machines are commonly employed for tasks that involve the classification of text. Researchers based in Canada have demonstrated this by using Support Vector Machines and Linear Support Vector Machines for the classification of fake news [4]. The proposed model works by analysing information such as article text, type, title and date and uses Term Frequency-Inverse Document Frequency (TF-IDF) for information extraction, which provides a measure of how critical particular terms are to a particular text. The Support Vector Machine algorithms are then employed to perform the final classification. This form of system leverages OSINT found in news articles and aids in upholding national security. This is particularly true in democracies where fake news is being used to undermine faith in democratic institutions and manipulate elections.
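The term-weighting step described above can be illustrated with a minimal sketch. It assumes the plain textbook definition (term frequency multiplied by the logarithm of the inverse document frequency) and a naive whitespace tokenizer; the headlines and function names are hypothetical, and the sketch is not the implementation used in [4].

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case a text and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dictionary per document.

    tf(t, d) = occurrences of t in d / number of tokens in d
    idf(t)   = log(N / number of documents containing t)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical headlines, invented purely for illustration.
headlines = [
    "election results confirmed by officials",
    "shocking election fraud secret exposed",
    "officials publish election turnout figures",
]
scores = tf_idf(headlines)
```

In this toy corpus the word "election" appears in every headline and therefore receives a weight of zero, while a word that appears in only one headline, such as "shocking", receives the highest weight in its document. In a full pipeline the resulting weights would be passed to a classifier such as a Support Vector Machine. Library implementations typically apply smoothing to the inverse document frequency, so their values may differ from this sketch.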

CNN can be employed for OSINT tasks that involve the classification and identification of images. PicHunt is a system that can be used to analyse images to identify evidence and location from pictures obtained from social media [62]. The results showed that CNN outperformed other state-of-the-art processes for this image classification task. The model is particularly well suited to law enforcement to identify evidence and the location of where specific events have occurred. The results of this systematic review show that a wide variety of algorithms are used to analyse OSINT sources or perform OSINT tasks. This is mainly due to the large variety of tasks that can be performed concerning OSINT and particular algorithms are better suited for different tasks.

A small number of machine learning tools, frameworks and libraries were also identified in the included research. Some of these include Apache OpenNLP [7], TensorFlow [6], SentiSAIL [12], CoreNLP [117] and NLTK [16, 94, 139, 176, 189]. The Natural Language Toolkit (NLTK) [18] was fairly prevalent, with five of the included research articles using this tool. More information concerning the tools used would be beneficial to further research and industry, as it would improve the ability of others to recreate processes, procedures and models that utilise OSINT in combination with AI.

4.2.5 What phases of the intelligence cycle does the included research apply to?

The vast majority of machine learning research included in this study applies to either the processing or analysis phases of the intelligence cycle. From the reviewed material, no research covers the planning or evaluation phases of the intelligence cycle. There is a sizeable sample of research covered in this study that covers the collection phase, and only one paper was identified that includes dissemination as part of a proposed model. The distribution of papers with regard to the stage of the intelligence cycle they can be applied to is represented in Fig. 10.

The machine learning-based system in [44] is the sole piece of research that could be applied to the dissemination stage of the intelligence cycle. The proposed model covers the stages of collection, processing, analysis and dissemination with a focus on social media data to identify and prevent national security incidents. The dissemination part of the intelligence cycle occurs when the system generates alerts and distributes them to the relevant audience based on jurisdiction and classification.
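The idea of distributing alerts by jurisdiction and classification can be sketched as a simple filtering rule. The following is an illustrative sketch only: the class names, fields, numeric clearance levels and recipients are all hypothetical and are not taken from the model proposed in [44].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    summary: str
    jurisdiction: str    # hypothetical region code
    classification: int  # assumed scale: higher number = more sensitive


@dataclass(frozen=True)
class Recipient:
    name: str
    jurisdiction: str
    clearance: int       # assumed to use the same scale as Alert.classification


def route_alert(alert, recipients):
    """Return names of recipients in the alert's jurisdiction with sufficient clearance."""
    return [
        r.name
        for r in recipients
        if r.jurisdiction == alert.jurisdiction and r.clearance >= alert.classification
    ]


# Hypothetical recipients and alert, invented purely for illustration.
recipients = [
    Recipient("local_police", "region-a", 2),
    Recipient("federal_agency", "region-a", 3),
    Recipient("neighbouring_police", "region-b", 3),
]
alert = Alert("Possible coordinated threat detected on social media", "region-a", 3)
audience = route_alert(alert, recipients)
```

Here only the recipient that shares the alert's jurisdiction and holds a clearance at or above the alert's classification is selected. A real system would need a richer model of jurisdictions and need-to-know rules than this single equality test.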

Fig. 10

Stages of the intelligence cycle to which the AI-based OSINT applications research applies. Currently, OSINT AI research is focusing on the processing and analysis of open data. Two models in the included research included methods for dissemination of the gathered intelligence by sending the information to the appropriate parties. Many of the models that collect data also process data by only collecting data the algorithm deems to be relevant

An example of a paper that provides support for the processing stage of the intelligence cycle can be found in [277]. The system makes use of data originating from automated threat intelligence applications, as well as OSINT sources such as cyber security blogs, websites and malware databases. The automated system collected over 25,092 cyber threat intelligence reports that were available from open sources. These reports are then labelled by the system so that the security operations team can efficiently analyse the data and decide if further mitigation is necessary. This paper appears to provide a useful system for intelligence operations as it clearly defines which phase of the intelligence cycle it applies to and how it assists operations conducted at the subsequent phase of the process.

One reason for this predominance of the processing and analysis phases can be identified through the algorithms utilised in the research. Most are well suited to classification-based tasks, using either binary or categorical cross-entropy. These are well suited to either the processing or analysis of data and also could provide dissemination support by classifying on a need-to-know basis. They are however not well suited to tasks such as planning or evaluation.
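For readers less familiar with the two loss functions mentioned above, the following minimal sketch computes both from their standard definitions. It uses plain Python rather than a deep learning framework, and the function names and example values are hypothetical.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy for labels in {0, 1} and predicted P(label = 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def categorical_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean categorical cross-entropy for one-hot labels and per-class probabilities."""
    total = 0.0
    for one_hot, probs in zip(y_true, p_pred):
        total += -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))
    return total / len(y_true)


# Binary case, e.g. malicious (1) versus benign (0): two hypothetical predictions.
bce = binary_cross_entropy([1, 0], [0.9, 0.2])

# Categorical case with three hypothetical classes; the true class is the second one.
cce = categorical_cross_entropy([[0, 1, 0]], [[0.2, 0.7, 0.1]])
```

Both losses reward confident correct predictions and penalise confident incorrect ones. This makes them a natural fit for tasks that sort data into a fixed set of labels, as in the processing and analysis phases, and a poor fit for open-ended tasks such as planning.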

Developments since 2021 include the increased use of large language models such as GPT-4 [1] and AI-based scheduling tools such as Motion [167]. There are also references within academic literature to utilising AI for efficient scheduling-based tasks. In [110], heuristic-based algorithms and large neighbourhood search are utilised for maintenance scheduling for offshore oil and gas platforms. Similar methods could be used for the planning and scheduling of OSINT investigations or operations. An interesting avenue of research could be provided by identifying how these tools and models could be incorporated into the planning phase of the intelligence cycle and integrated with classification models for processing, analysis and dissemination.

4.2.6 What metrics and analysis are provided to evaluate system performance in the included research?

Figure 11 shows the metrics used for evaluation in the included research. Nearly half the metrics identified for the evaluation of machine learning systems included in this study are accuracy, precision, F1-score, recall and area under curve (AUC). These five metrics are some of the most commonly used for the evaluation of machine learning systems [280]. There are also a smaller number of papers providing totals for metrics such as true positive rate, false positives and receiver operating characteristics. It should also be noted that some papers fail to provide a full accounting of the performance of the algorithm in terms of these metrics, which makes it harder to conclude how suitable the system is to complete its intended task. For example, in the paper [9] an analysis is given comparing the true positive rate to the true negative rate, but no precise figures are given for precision, recall or F1-score. Including these results would make it easier to assess the suitability of their system for processing cyber threat intelligence. A further example of this is in the conference paper [57], which only provides figures for accuracy and precision but fails to account for recall and F1-score.
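A short worked example shows why reporting only some of these metrics can be misleading. The sketch below computes accuracy, precision, recall and F1-score from their standard definitions for a hypothetical, imbalanced set of ten labels; the data and function names are invented for illustration and do not come from any of the included papers.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def safe_div(numerator, denominator):
        # Return 0.0 instead of raising when a metric is undefined.
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical example: ten threat reports, of which only four are truly malicious.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

In this example the classifier achieves an accuracy of 0.7 and a precision of 1.0, yet its recall is only 0.25 and its F1-score 0.4, because it misses three of the four malicious reports. A paper reporting only accuracy and precision would hide this weakness.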

Fig. 11

Metrics used for evaluation of machine learning-based systems in OSINT applications research

There are some examples in the included research that provide results related to the intended use of the system or system objectives. In [33], cyber threat intelligence is performed using data from Twitter, and the researchers provide information relating to vulnerabilities that were discovered before their inclusion in vulnerability databases. A further example provides measurements on whether an investigated individual is engaging in cyberstalking. Providing measurements such as these can help potential users evaluate the usefulness of a system in terms of real-world outcomes and is especially useful information to non-technical stakeholders or team members. These are, however, limited cases; researchers should endeavour to evaluate their systems based on the objectives of a particular phase of the intelligence cycle. This would better place the systems for use in real-world OSINT operations.

Fig. 12

Types of OSINT utilised in the included research articles. Social media is the largest direct source of data used in the included research. Significant amounts of the included research make use of OSINT data sets

4.2.7 What are the sources of OSINT used in the included research?

Figure 12 shows the types of OSINT used in machine learning and AI-based OSINT applications. A significant proportion of the papers included in this study obtain OSINT sources through either online data sets or social media. Online news, websites, blogs and the dark web are also being utilised to some degree in the included research. A model that makes use of a social media data source is the TwitterOSINT application proposed in [83]. TwitterOSINT uses Twitter data to create real-time visual representations; this feature supports intelligence analysts by presenting the data clearly while it is current and relevant. The proposed system leverages NLP to annotate the collected data so it can be further processed for search and final visualisation. This tool is particularly well suited to provide real-time threat intelligence to cyber security operations. An example of research using online datasets as the OSINT data source is the bidirectional LSTM models for DGA classification proposed in [11]. The datasets used consisted of OSINT data collected from DNS that has been flagged as malicious; this data was collected from a data set provided by netlab-360. The obtained data was used as the malicious domain component of the model’s training set.

Fig. 13

The sources of OSINT utilised in the included research articles. Social media includes significant amounts of data gathered from Twitter. Online datasets include data retrieved from OpenDNS, OSINT feeds and the National Vulnerability Database

Figure 13 breaks down the direct sources of the OSINT data from the included articles. For social media, Twitter is the most prominent source of OSINT, followed to a lesser extent by Facebook and Reddit. Video-based social media is also present, with YouTube being utilised in some research included in this study. Online data sets collected from services such as Alexa are also prominent, especially in DGA detection. Twitter is used predominantly for cyber threat intelligence and sentiment analysis. An example of Twitter being used for cyber threat intelligence is found in [220], which covers automating the collection and analysis of Twitter data for threat intelligence purposes. The system they propose is still in the concept phase. The researchers’ proposal includes the collection, processing and analysis of Twitter data to identify cyber threats before they appear in common vulnerability and exposure databases. The conference paper [5] provides an example of Twitter data being used for sentiment analysis; this is also an example of a system that is general in nature. The model is directed to the analysis of people’s opinions but could also be repurposed to identify posts that concern security incidents or users with malicious intent.

The results of this systematic review show that Twitter is over-represented in the collected data, and this is most likely due to the ease of access to posts on the platform. This information can easily be obtained by researchers using the Twitter API [255] or a service such as Tweepy that leverages the API provided by Twitter [10]. The downside of this ease of access is the potential for researchers to overlook other social media sources for OSINT data, which may cause the underrepresentation of services such as YouTube and Facebook in the included research. Online datasets feature prominently in systems being used for the detection of DGA. These systems use OSINT to train their machine learning systems to correctly identify malicious domain names. The systems will generally include separate data sets for benign domains and malicious domains, with benign domains being sourced from DNS and online data sets such as Alexa. Malicious domain data sets are then constructed from security databases such as 360netlab [208].
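The construction of a training set from separate benign and malicious domain lists can be sketched as follows. The character-level integer encoding with zero padding shown here is an assumption: it is a common way to prepare domain names for neural models such as LSTM or CNN classifiers, but it is not confirmed as the preprocessing used in any specific included paper. The vocabulary, function names and example domains are hypothetical.

```python
import string

# Assumed character vocabulary: lower-case letters, digits, hyphen and dot.
# Index 0 is reserved for padding; unknown characters also map to 0 in this sketch.
VOCAB = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase + string.digits + "-.")}


def encode_domain(domain, max_len=20):
    """Map a domain name to a fixed-length list of integer character indices."""
    indices = [VOCAB.get(ch, 0) for ch in domain.lower()[:max_len]]
    return indices + [0] * (max_len - len(indices))  # right-pad with zeros


def build_training_set(benign, malicious, max_len=20):
    """Combine separate benign and malicious lists into (features, labels)."""
    features = [encode_domain(d, max_len) for d in benign + malicious]
    labels = [0] * len(benign) + [1] * len(malicious)  # 0 = benign, 1 = malicious
    return features, labels


# Hypothetical domains, invented for illustration only.
benign_domains = ["example.com", "wikipedia.org"]
malicious_domains = ["xj4k2qpl9z.net"]
X, y = build_training_set(benign_domains, malicious_domains)
```

Each domain becomes a list of twenty integers and each label records which source list the domain came from. In practice the examples would also be shuffled and split into training and test sets before a model is fitted.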

4.2.8 What OSINT tools are applied in the included research?

There were relatively few examples of pre-existing non-AI-based OSINT tools being incorporated into AI models identified in this study. Many of the researchers have elected to collect data manually, or through application programming interfaces such as the Twitter API [138, 190, 196], or by downloading preexisting data sets, such as UNSW-NB15 [103] or AmritaDGA [264]. There are some examples of OSINT tools being used in the included research in combination with AI, such as Tor Crawler [108], Shodan [267], ReKognition boto3 [251], SAIL LABS Media Mining System [12] and the Tweepy API [94]. In [136] Maltego is used in the prototype Twitter cyberbullying detection system to graph relationships between the various accounts in the collected data. This provides useful information on coordination between stalkers.

There are also a few OSINT tools being developed by researchers that integrate AI directly into a proposed OSINT tool. For example, the previously mentioned TwitterOSINT [83, 220] provides a complete system that collects data from the Twitter API and then uses NLP to annotate the unstructured text. This then allows intelligence analysts to easily search and create a visualisation of the collected data. Additional research that either creates new OSINT tools utilising AI or that leverages the abilities of current OSINT tools would be beneficial to intelligence or security analysts. This is because these systems would link directly with their existing tool kits or add new applications that they can use in their existing processes.

5 Limitations, future directions and summary

The following sections outline identified limitations with current research utilising OSINT in combination with AI. First, current research limitations are presented, followed by future directions that specific researchers are undertaking to reduce the limitations of their research. A summary of findings is then presented and, finally, the identified limitations of this study.

5.1 Research limitations

The following section outlines limitations that have been identified from the papers included in the systematic review.

5.1.1 OSINT tools

Very few studies use pre-existing OSINT tools in their machine learning research. While there are a few examples of OSINT tools being used in the included research, the significant majority did not include the use of these tools in their AI-based OSINT research. A possible reason for this is the large number of research papers that are dedicated to the processing and analysis phases of the intelligence cycle, rather than collection. Many OSINT tools, for example theHarvester, are used mainly for the collection phase, which could provide a reason for their exclusion from the research [155]. There are OSINT tools that include functionality for processing and analysis, such as Maltego [147], but due to the nature of the research in this paper, the processing and analysis phases are being undertaken by machine learning algorithms, which are essentially replacing the functions undertaken by non-machine-learning-based OSINT tools. However, it appears researchers are ignoring the opportunity to preprocess their data using these tools, which could refine data sets and enhance the capabilities of their final AI models. An opportunity to demonstrate how their systems can integrate with current tools is also being lost; such a demonstration would be of great benefit to current users of OSINT tools.

There are sparse examples where researchers have created new AI-based OSINT applications that could be employed in research and industry. One such example is NoRegINT, a machine learning-based OSINT tool [105]; its authors place it alongside current OSINT tools that are used for cyber reconnaissance such as Spiderfoot [159] and Maltego [147]. Currently, the proposed framework for NoRegINT uses Twitter, Reddit and Tumblr as OSINT sources and can perform sentiment analysis on the collected data. The tool is however far from complete and does not compare to the scope of data sources and analysis that is currently available in Spiderfoot or Maltego. Another example is TwitterOSINT outlined in [83, 220], although the AI-based functionality is limited to annotations and does not provide further analysis. Future research could focus on the development of an AI-based OSINT suite that can perform multiple functions across a variety of OSINT sources. Another potential research direction is to focus on demonstrating how current OSINT tools can be used in tandem with proposed AI systems. This would be especially beneficial to individuals utilising these OSINT tools in industry or intelligence circles.

5.1.2 Penetration testing

Only a small amount of the included research could be applied to the domain of penetration testing. Some examples of papers included in this study that could be applied to this domain include work by Layton, Perez, Birregah, Watters and Lemercier that links profile ownership between different social media accounts [121] and could provide useful information to test organisational resilience in the face of social engineering attacks. The mentioned research is of particular interest as it could circumvent an organisation’s stakeholders’ efforts to keep particular social media accounts anonymous. A further example is FastEmbed, which determines the possibility of the exploitation of vulnerabilities using the LightGBM algorithm [48]. The proposed model uses OSINT collected from an exploit database for its training set and can make predictions about real-world exploits, which it did with an accuracy of 83 per cent. These two papers provide examples of how AI-based OSINT applications could be beneficial to penetration testing.

Despite these examples, the volume of research is limited in comparison to other domains such as cyber threat intelligence. Another paper of interest for this domain is presented in [205]. The researchers propose a model that generates fake cyber threat intelligence reports to distribute to threat intelligence vendors. This is of interest as such an attack could cause data poisoning of defensive cyber AI systems, or cause confusion within a security operations team leading to sets of actions that benefit the attacker. This paper was ultimately not included in the aggregated results for penetration testing as such a system would impact multiple threat intelligence vendors and organisations. This indiscriminate nature would lead to issues concerning the scope of the penetration testing exercise. However, penetration testers may want to consider how to test organisational resilience against such an attack.

5.1.3 Underutilised data sources

Research that uses data from social media appears to be heavily reliant on Twitter. This seems to be due to the ease of access to the Twitter API [255]. This appears to be true for various forms of applications, including cyber threat intelligence. To demonstrate there are multiple papers covering the collection of threat intelligence from Twitter including [16, 209, 220, 233, 254], with a total number of 80 papers obtaining data from Twitter. The number of papers using data obtained from the dark web is severely limited in comparison. It appears that the ease of access to data from Twitter may be driving an overreliance on this data source in the included research to the detriment of other potential sources.

In [57], a threat intelligence system is described that uses the dark web and offers similar functionality to the Twitter-based threat intelligence systems mentioned above. The system looks specifically at hacker forums and could potentially provide more valuable and current data than Twitter. A similar imbalance is apparent when comparing the number of papers using Twitter with those using data from other social media sources such as Facebook.

There also appears to be no research into using data from newer social platforms such as Discord [37]. Discord could provide an interesting source of OSINT data for AI-based applications due to its community-based nature. Cyber security-related servers on Discord can be found using services such as Disboard [36]. Researchers could also make attempts to join private Discord groups to gather data, although this might raise ethical concerns.

Essentially the over-utilisation of a single data source due to its ease of access may skew research results and forfeit the ability to gain access to a wider variety and more complete set of data. In the case of OSINT for cyber threat intelligence this would include building AI-based systems that work with the dark web, community forums and newer social platforms. The same reasoning could also be applied to other applications such as behaviour profiling and hate speech detection. Future AI-based OSINT research should endeavour to include these data sources.

5.1.4 Dissemination

From the included papers there are limited examples of systems that include the dissemination phase of the intelligence cycle. An example of research that includes the dissemination phase of the intelligence cycle in a machine learning-based system is a paper by Nnaemeka Ekwunife of Marymount University [44]. The paper proposes a model that gathers intelligence from social media and includes alert generation when intelligence relevant to national security is uncovered. This is the sole example of the included research that covers this aspect of the intelligence cycle. Further research could include this component as part of future AI-based models. Adding dissemination functionality to AI-based OSINT models would enable researchers to create fully automated systems that can work across multiple jurisdictions.

5.2 Future directions

The following sections detail future directions being undertaken by researchers in this area of study. Some avenues present potential opportunities to overcome some of the previously identified limitations.

5.2.1 Multi-lingual capabilities

Some studies have identified the inclusion of multi-lingual functionality as an avenue for future research. AI-enabled OSINT applications with this ability could potentially operate at a global scale across multiple geographic regions. This potential future research is not isolated to a particular domain of research or application. For example, the proposed model in [14] conducts sentiment analysis on social media and the researchers plan to continue the research by training their model on larger datasets containing data from multiple languages. A further example of this can be found in the human trafficking detection model proposed in [22]. The authors of this paper also seek to add multi-lingual functionality to their system in the future. The avenue they propose for this future research is to add automatic translation to the model. Support for multiple languages is also a planned feature for future research for the natural disaster monitoring system proposed in [226] and in the area of cyber threat intelligence, the authors of [277] have included multilingual capabilities into their future research plans. Incorporation of multi-language support will be especially beneficial in domains that tackle problems at the global level, such as monitoring the cyber threat landscape and human trafficking.

5.2.2 Incorporation of additional data sources

Some papers have identified the limitations of relying on a single source of data and plan to incorporate multiple data sources in future research. In [233], the researchers plan to include data originating from Facebook to train their cyber threat intelligence model. The authors of [94] also plan to modify their misinformation detection model to include data sources in addition to Twitter, and a further example can be found in [237], where the authors plan to increase the diversity of their dataset to improve their fake news and misinformation detection model. In [209, 220], the researchers plan to include additional social media data sources in their cyber intelligence system, and the authors of [85] note that their use of data contained in the MITRE ATT&CK [27] framework required expansion due to the nature of the framework. The framework relies on expert contribution, so there is a time delay between when the threat materialises in the real world and when the threat is reported. The researchers plan to include additional sources of OSINT to circumvent this data source issue. This demonstrates that each OSINT data source presents unique strengths and weaknesses, and that incorporating multiple sources in future research will provide researchers with a means to mitigate any potential open data source shortcomings. Being able to include additional data sources will increase the monitoring surface of AI-based OSINT applications and assist in the creation of general OSINT suites that incorporate AI algorithms.

5.2.3 Robustness against data poisoning and misinformation

An interesting future direction of research is presented in [209] for the authors' cyber event detection system. The proposed avenue is to check the ability of the system to perform robustly in the event of data poisoning attacks, or when dealing with fake accounts. This is vital for all areas of OSINT as the collected data is publicly available. Any AI system that relies on OSINT could be subject to incorrect or misleading information corrupting the normal functionality of the system. This could be unintentional exposure to misleading data or, in the case of a data poisoning attack, part of a purposeful and targeted cyber event that aims to corrupt the output of the AI system. As mentioned in the limitations section, the paper [205] provides an example of an AI-based system generating fake cyber threat intelligence reports. Such a system in the hands of a cyber adversary could manipulate multiple organisations into performing a set of desired actions. An interesting avenue for future AI-based OSINT research could focus on how penetration testers can test the robustness of systems against such an attack. These tests could be based on current systems or systems that utilise AI for cyber threat intelligence or security operations.

5.2.4 Real world use

Several research teams intend to apply their OSINT-based AI research to real-world applications or are taking steps to integrate it with currently used cyber security applications. In the cyber threat intelligence domain, the authors of [116] plan to improve the integration of their AI model into currently used intrusion detection and prevention systems. A large-scale operational evaluation of the platform will also be performed to assess any additional requirements that must be met before real-world use. This is a common future research aspiration for many of the studies included in this paper. A further example in the threat intelligence domain can be found in [220], whose authors plan to trial their TwitterOSINT system in a security operations environment. Some researchers do not have plans for an operational implementation but are making improvements to their system so that it is suitable for a real-world context [97].

5.2.5 Alert generation and dissemination

As previously stated, a limited number of papers include the dissemination of intelligence generated by AI systems utilising OSINT. However, some researchers have included developing this capability in their future research. This includes the future directions outlined for the model proposed in [141], which uses Twitter data to identify the online sale of narcotics. The researchers wish to add script automation that generates reports communicating the findings of their models to the Food and Drug Administration (FDA) and the Drug Enforcement Administration (DEA) using the required reporting templates of those organisations. Another example of this direction is the research team that created CyberDect. These researchers are seeking to improve their cyberbullying detection AI model by incorporating an alert system to notify the appropriate parties when bullying is occurring online [136].

5.2.6 The planning phase of the intelligence cycle

The planning phase of the intelligence cycle may also provide a future avenue for research. The majority of the models found in this study perform classification and are well suited to the processing and analysis phases of the intelligence cycle. With regard to planning, there are some examples of algorithms dedicated to this purpose, such as the aforementioned research conducted in [110]. A potential future research direction would be to use a similar system for OSINT planning and integrate it with subsequent classifiers for collection, processing, analysis, and perhaps even dissemination.

5.3 Summary of findings

The results of the research questions presented in this systematic review show that there has been significant growth in research using AI for OSINT applications from 2011 to 2021. Despite a slowing of growth between the years 2020 and 2021 during the COVID-19 pandemic, the positive trend is maintained. Continued growth in this area of study is beneficial to business and industry as it becomes more vital to convert open data into actionable intelligence.

The research covered by this systematic review has a fairly even geographic spread. However, two of the three highest-producing geographic areas have similar lingual and cultural backgrounds, and some areas, including South America, Central Asia, Africa and Eastern Europe, are producing limited research. This is particularly important for systems employing natural language processing, as cultural and lingual context will be vital to ensure system effectiveness.

This is also important because this systematic review identified a large number of papers with law enforcement and intelligence applications. Universal OSINT applications in these areas will need to be employed in areas with varying cultural contexts. The lack of papers focusing on the domains of penetration testing or offensive security should also be noted, as both domains may provide an interesting opportunity for researchers investigating the use of AI with OSINT.

The study identified a wide range of different AI models and algorithms, which can be related to the various tasks needed for OSINT operations and investigations. However, most of these algorithms are used for classification tasks using binary or categorical cross-entropy loss. This can explain why the study also identified that the majority of the included research was focused on the processing or analysis phases of the intelligence cycle. These models are not well suited to planning and evaluation tasks, or at the very least need to be adapted if they are to be successfully implemented for them. Researchers could also consider other models if they wish to investigate how these intelligence cycle stages can be served by AI. Algorithms suited to planning tasks could potentially be integrated with the subsequent models employed in classification tasks.
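For reference, the two loss functions named above can be written out as follows. This is the standard textbook formulation rather than the notation of any particular included paper; the symbols N (number of samples), C (number of classes), y (true label) and p (predicted probability) are introduced here for illustration only.

```latex
% Binary cross-entropy: y_i \in \{0, 1\}, p_i is the predicted probability of class 1
\[
\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N} \sum_{i=1}^{N}
  \bigl[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\bigr]
\]

% Categorical cross-entropy: y_{i,c} is the one-hot true label, p_{i,c} the predicted probability of class c
\[
\mathcal{L}_{\mathrm{CCE}} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C}
  y_{i,c} \log p_{i,c}
\]
```

Both losses score a model purely on how well it assigns items to predefined labels, which is consistent with the observation that these models map most naturally onto the processing and analysis phases.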

This systematic review also identified a limited degree of integration between pre-existing OSINT tools and AI models. Greater integration could benefit practitioners and industry by allowing AI models to work directly with their pre-existing tool sets. There was also only a small amount of research detailing work towards a fully AI-enabled OSINT tool. To create such a tool that is universal, or general, there will need to be contributions from various geographic regions, continued growth in this area of research, and a deliberate effort to include the planning and evaluation stages of the intelligence cycle. This is most likely achievable given the already large variety of OSINT tasks being performed by AI.

5.4 Limitations of this study

Several limitations can be outlined specifically concerning this systematic review. These can be highlighted by the domains of use and the scope of the study.

The domain research questions contain some limitations. One such limitation is that the assessment of papers for their domain suitability can be subjective. This is particularly true when there is a degree of cross-over between domains. For example, some systems used for domain name generation algorithm detection have been designed for use within security operations centres, but the systems could also be well placed for cyber threat intelligence operations. There is no way to directly quantify this cross-over between domains of application, so the assessment is qualitative and the usefulness of an application to a particular domain could be open to debate. The same could be said for systems that are to be used in law enforcement investigations: intelligence services and government departments may also find these systems useful, but once again there is no direct way to quantify this and it is open to some discussion.

The scope of the study is also wide. While this has uncovered a high number of interesting papers, specific domains might be better served by a narrower review that concentrates on a particular area. For example, a similar review could be conducted that focuses purely on law enforcement or the operations of intelligence agencies. The same could be done for AI-based OSINT applications used solely in the cyber threat intelligence domain.

6 Conclusion

This study completed a systematic review of current research combining AI algorithms with OSINT applications using the PRISMA framework. A total of 163 articles were identified for inclusion in the study. It was found that research that combines the fields of OSINT and AI is a growing area, and currently India and the United States are the main geographic centres engaging in this area of study. The existing systematic reviews identified in this area do not incorporate much of this growth as they do not include papers that post-date the year 2019.

OSINT combined with AI provides a method to reduce the workload of intelligence analysts across multiple domains working with large amounts of data. Various security-related domains could utilise this form of research including security operations, law enforcement and intelligence services. This variety of potential use is due to the wide range of potential applications that can leverage benefits from AI-based OSINT systems. This is made possible by the number and variety of available AI algorithms that are each suited to specific and different tasks.

There is limited research that covers how AI can be used in conjunction with pre-existing OSINT tools, and it appears there has been little significant effort to produce a multi-faceted AI-based OSINT tool that could combine the various models available. The majority of the included research fits into either the collection, processing or analysis phases of the intelligence cycle. There has been limited research concerning the dissemination phase, so it would be beneficial to include this in the form of alerts to relevant parties. Future research could help with dissemination by categorising pre-analysed OSINT based on which persons need to be notified. This could be based on jurisdiction or security clearance.
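The dissemination idea above can be illustrated with a minimal sketch. It is not drawn from any of the reviewed papers: the `Report` and `Recipient` types, the `route` function and the integer clearance scale are all hypothetical names and assumptions chosen for illustration. The sketch simply selects the recipients whose jurisdiction covers a report and whose clearance is at least the report's classification level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """A pre-analysed OSINT item (hypothetical structure)."""
    summary: str
    jurisdiction: str
    classification: int  # assumed scale: higher means more sensitive


@dataclass(frozen=True)
class Recipient:
    """A party that may need to be notified (hypothetical structure)."""
    name: str
    jurisdictions: frozenset
    clearance: int  # assumed to use the same scale as Report.classification


def route(report, recipients):
    """Return the names of recipients eligible to receive the report.

    A recipient is eligible when the report's jurisdiction is one they
    cover and their clearance is at least the report's classification.
    """
    return [
        r.name
        for r in recipients
        if report.jurisdiction in r.jurisdictions
        and r.clearance >= report.classification
    ]
```

In practice the categorisation step would probably be more involved, for example a classifier predicting the relevant jurisdiction from the report text, but the final filtering could remain a simple rule of this kind.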

Social media and online datasets form a significant portion of the types of OSINT that are used in the included research. Research that uses social media appears to be heavily focused on Twitter and this is especially true of threat intelligence systems. This could be due to the ease of access to Twitter as a data source. Future research should seek to engage with alternate data sources such as the dark web or hacker forums. Newer platforms such as Discord appear to be completely absent and could provide researchers with novel sources of data in the future.

It was also found that there is a limited volume of research relating to combined OSINT AI applications that could be utilised for penetration testing purposes. OSINT is a valuable source of information for penetration testers and the domain could benefit from research that combines AI with OSINT in this context. Future research in this area is progressing in several directions that could either address some of these limitations or provide novel directions. These future research pathways include adding multi-lingual support to OSINT AI models, incorporating additional data sources, improving model robustness against misinformation and data poisoning, testing platforms in real-world situations and finally adding alert generation and dissemination functions.