A systematic review on research utilising artificial intelligence for open source intelligence (OSINT) applications

Browne, Thomas Oakley; Abedin, Mohammad; Chowdhury, Mohammad Jabed Morshed

doi:10.1007/s10207-024-00868-2

Regular Contribution
Open access
Published: 05 June 2024

Volume 23, pages 2911–2938, (2024)

International Journal of Information Security

Abstract

This paper presents a systematic review to identify research combining artificial intelligence (AI) algorithms with open source intelligence (OSINT) applications and practices. Currently, there is a lack of compilations of these approaches in the research domain, and similar systematic reviews do not include research that post-dates 2019. This systematic review attempts to fill this gap by identifying recent research. The review used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and identified 163 research articles focusing on OSINT applications leveraging AI algorithms. This systematic review outlines several research questions concerning meta-analysis of the included research and seeks to identify research limitations and future directions in this area. The review identifies that research gaps exist in the following areas: incorporation of pre-existing OSINT tools with AI, the creation of AI-based OSINT models that apply to penetration testing, underutilisation of alternate data sources and the incorporation of dissemination functionality. The review additionally identifies future research directions in AI-based OSINT research in the following areas: multi-lingual support, incorporation of additional data sources, improved model robustness against data poisoning, integration with live applications, real-world use, the addition of alert generation for dissemination purposes and incorporation of algorithms for use in planning.

1 Introduction

With the growth of the digital world, the scale of data that is stored, transmitted and processed is increasing at an exponential rate. The International Data Corporation estimates that 64.2 zettabytes of data were created or copied in 2020 [87], and in 2019 just over 50 per cent of the world’s population used the internet [91]. This includes a large amount of publicly available information relating to individuals, states and organisations. This open data can be utilised in intelligence operations for a wide range of applications and can be especially useful in security contexts such as national security, law enforcement, defence, cyber security and threat intelligence operations. Open-source intelligence (OSINT) is created when this publicly available and open data is processed into actionable intelligence and distributed to the relevant parties [273]. This process is known as the intelligence cycle and consists of planning, collection, processing, analysis, dissemination and evaluation [90]. The size and scope of online open data sources can cause problems for OSINT investigations in addition to the opportunities presented.

Utilising artificial intelligence (AI) can provide methods to overcome some of the challenges presented by this vast source of information by automating certain components of the intelligence cycle. For example, the paper [44] investigates the automation of identifying national security incidents from data gathered on social media, which is a frequently used data source for online OSINT. The paper proposes the use of a machine learning technique known as clustering to group similar pieces of information. In the intelligence cycle, this relates to the processing stage, as the data is grouped and classified before more thorough analysis. Using AI techniques in this manner can help reduce the time and workload of intelligence analysis by grouping data into relevant categories. A further example can be found in [260]. In this paper, the researchers construct a malicious domain name detection system using recurrent neural network (RNN) and long short-term memory (LSTM) machine learning algorithms. To train their system, various OSINT sources are used, such as OSINT feeds for malicious domain names and the Domain Name System (DNS) for benign domain names. The system provides processing and analysis for the intelligence process, as it classifies domain names and ultimately provides a conclusive output on which domain names are malicious. A third example can be found in the article [17], which details the use of machine learning for the processing of OSINT. The article compares the Naive Bayesian and K-nearest Neighbour machine learning algorithms to process and classify data for information retrieval. This was performed using online news articles as an OSINT source, with the performance of each model being compared against various categories for information retrieval such as people, topics and places.
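
As a rough illustration of the kind of text classification described for [17], the following minimal sketch trains a multinomial Naive Bayes classifier with add-one smoothing on a handful of invented headlines. The training data, category labels and function names are hypothetical and are not taken from the cited article, which may use different features and preprocessing.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of (headline, category) pairs; not data from any cited study.
TRAINING = [
    ("minister announces new election date", "politics"),
    ("parliament votes on election reform", "politics"),
    ("ransomware attack hits hospital network", "cyber"),
    ("phishing campaign targets bank customers", "cyber"),
]

def tokenize(text):
    """Split a headline into lower-case word tokens."""
    return text.lower().split()

def train(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary

def classify(text, model):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAINING)
print(classify("new phishing attack on bank", model))  # prints "cyber"
```

In an OSINT setting the same structure would sit in the processing stage: collected articles are assigned to categories before an analyst reviews them.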

Despite the extensive research combining OSINT practices and sources with AI, there are limited systematic reviews concerning the topic. Two examples that cover this form of academic material are the paper [46], which conducted a review that directly covers the application of OSINT to machine learning and AI practices, and [42], which covers the use of data mining technology for law enforcement and includes articles covering the use of OSINT concurrent with AI. However, these earlier systematic reviews lack discussion about sources of OSINT, OSINT tools and types of AI techniques used in each phase of the intelligence process. Therefore, this paper contributes by focusing research questions in these areas and identifies research limitations relating to AI-based OSINT applications. The paper is outlined as follows: Sect. 2 provides the background knowledge related to OSINT and the review methodology is presented in Sect. 3. The answers to the research questions are presented in Sect. 4, and Sect. 5 presents identified limitations, future research directions and a summary of findings. Finally, the paper concludes with Sect. 6.

2 Background

OSINT refers to intelligence that has been gathered from freely available and non-classified sources and can be found online and offline. Various processes and tools can be applied to assist with various stages of the intelligence cycle. AI provides a mechanism that can assist these processes as AI mimics the cognitive ability of humans and can reduce the workload of tasks requiring the processing and analysis of significant amounts of data. There are a variety of non-AI-based OSINT techniques and tools that can be used to assist in OSINT and intelligence processes. Combining AI with OSINT tools, processes and practices can provide a solution for intelligence operations to efficiently navigate the substantial amount of open data available online that can be used for OSINT purposes.

2.1 Open source intelligence (OSINT)

OSINT is not a new area of interest for intelligence operations. Before the existence of the world wide web, intelligence services utilised offline OSINT by monitoring newspapers and radio broadcasts to gain vital intelligence. One of the first attempts at methodically collecting OSINT was during World War Two, when the US established the Foreign Broadcast Information Service to monitor public broadcasting originating from adversarial nations [221]. The introduction and growth of the internet have increased the scope of potential sources of OSINT. Figure 1 outlines some of these OSINT categories for both offline and online OSINT and demonstrates the extensive scope of OSINT, particularly in the digital domain.

OSINT can be collected from multiple online sources, including social media, search engines, websites, online forums, company directories and online databases [75]. When focusing on online OSINT sources, OSINT operations must use and develop techniques to identify, collect, categorise and analyse the relevant information. There are a variety of tools available to OSINT practitioners and researchers to assist with these tasks. Even when utilising these tools the amount of information gathered can be substantial, leading to difficulties concerning the classification and analysis of the collected data. AI and machine learning can provide a mechanism to assist OSINT operations in the collection, processing and analysis of online OSINT.

2.2 The OSINT process and intelligence cycle

As OSINT has been a source of information for intelligence operations for many years, pre-existing high-level intelligence processes can be used in the modern digital landscape. There are multiple versions of this process used by different organisations. The RAND Corporation proposes a simplified four-step process designed to fit into the intelligence cycle [273]. The steps outlined by RAND are collection, processing, exploitation and production. Figure 2 outlines this high-level four-step process. This four-step process would be used within the intelligence cycle as defined by the US-based Office of the Director of National Intelligence [90] and combined with other intelligence disciplines. The intelligence cycle is depicted in Fig. 3 and includes planning, collection, processing, analysis, dissemination and evaluation. AI-based OSINT systems could be categorised based on which step in this higher-level process they are assisting.

According to the RAND Corporation, a highly respected US-based research organisation that specialises in subjects concerning national security [217], metrics alone are not sufficient to gauge the effectiveness of an AI-based system employed for intelligence purposes. The RAND Corporation proposes that the performance of such systems must be evaluated based on the value they provide to the next stage of the intelligence cycle [90]. It is also important to note that intelligence operations are not reserved exclusively for intelligence agencies. This is particularly true in the area of OSINT. For example, cyber security professionals can leverage data collected from Twitter to improve their threat intelligence capabilities [220], and law enforcement can monitor online marketplaces to identify the sale of illegal and controlled narcotics [141]. Business intelligence is a growing area of interest driven by the growth of available digital data, usually referred to as big data. As such, businesses have had to create processes similar to the intelligence cycle for their intelligence operations [15].

2.3 OSINT techniques and tools

There is a wide variety of tools available to support OSINT operations throughout the intelligence cycle. These include comprehensive tools that can manage entire investigations, such as Spiderfoot [159] and Maltego [147], and tools that provide single functions to assist OSINT operations, for example TheHarvester, which provides email scraping functions [155]. Broadly speaking, OSINT tools and techniques can fit into the different stages of the intelligence cycle. TheHarvester's scraping functions would fit into the collection stage of the intelligence cycle, while Spiderfoot and Maltego also provide processing and analysis in addition to the collection of open-source data. Tables 1 and 2 outline a number of tools and techniques employed in OSINT processes.

Effective tools are essential for any OSINT operation to reduce the investigation workload. An example of this during the collection phase of the intelligence cycle would be employing the tool known as Sublist3r [2]. Sublist3r makes use of various OSINT sources such as search engines and DNS to efficiently enumerate the subdomains of a target organisation. An OSINT operation would waste significant amounts of time attempting to replicate this process manually, especially if the target is a large organisation with numerous subdomains. Tables 1 and 2 provide a demonstration of commonly used tools and techniques. The OSINT framework can also be used to obtain a comprehensive list of OSINT tools [51]. It is important to note that most of these tools do not currently leverage AI algorithms.

Table 1 These examples of OSINT tools are provided for background purposes. For a comprehensive list of OSINT tools, sources and frameworks please see the OSINT framework [51]

Table 2 Popular OSINT techniques. These techniques can be implemented through a variety of manual methods and tools

2.4 Artificial intelligence and machine learning

AI is the emulation of human intelligence by machines. This is usually implemented in the form of code being processed and executed by computers [98]. AI in computing usually consists of various algorithms which attempt to mimic aspects of human thinking and cognition. One particular domain of AI is machine learning. Machine learning imitates the way humans learn in terms of inferring, categorising and analysing data. The benefit is that a machine learning algorithm can do this with greater speed, efficiency and accuracy than a human. One example of this is the use of convolutional neural networks (CNN) in computer vision tasks. CNN algorithms can analyse images piece by piece to identify features common to a particular set of images and be used to categorise images or identify objects in images [171]. To an extent, CNN algorithms can be used to mimic the human ability of sight and provide an example of how machine learning algorithms can be used to complete tasks usually performed by a human operator. A further example of this comes from the application of machine learning algorithms for natural language processing (NLP). NLP techniques can be employed for a large range of tasks that in the past would have been reserved for a human analyst. NLP can be used to derive meaning from human speech and text and can be used for sentiment analysis, to understand people’s feelings or opinions on a subject. NLP can also be used to efficiently extract knowledge and information from bodies of text [119]. NLP and CNN algorithms are only two examples of how AI and machine learning can perform tasks usually conducted by humans. The full range of available AI and machine learning techniques is extensive, but these examples show the potential for machine learning algorithms to reduce the workload of human analysts with specific tasks.

2.5 Combining AI, machine learning and OSINT

The ability of AI to perform human tasks provides a significant opportunity to improve the speed and efficiency of systems utilised in OSINT operations. Recently academic researchers have been utilising AI and machine learning in combination with OSINT sources and practices. Some use cases include the detection of malicious domain names [102], misinformation detection [94] and threat intelligence through monitoring online hacker forums [57].

These examples show that this combination of AI and OSINT opens a powerful new tool for OSINT operations. Incorporating AI and machine learning with OSINT is enabling the dissemination of an otherwise insurmountable amount of data and provides a means of accelerating investigations and bolstering intelligence gathering and analysis. In addition to this, open data available online can also be utilised as training sets for machine learning algorithms to assist in various security and intelligence applications and generate actionable OSINT. This is frequently the case in models used for the detection of maliciously generated domain names which are trained on data gathered from DNS [276].

Combining AI and machine learning with OSINT presents many challenges. The RAND Corporation has recently identified that judging the performance of AI use in the intelligence process does not rest solely on metrics such as how accurate the algorithm’s results are. The AI needs to be judged on the outcomes it provides to the intelligence process [90]. Put another way, even if an AI implemented in the intelligence process is correct 99 per cent of the time, it will still fail if the outcomes it provides to the intelligence process are insufficient, or worse still, result in catastrophic errors. The RAND Corporation believes this concept should be applied when evaluating any AI or machine learning algorithms that are being applied in combination with OSINT or other intelligence operations.

3 Methodology

This study makes use of the PRISMA guidelines for conducting systematic literature reviews. Two researchers conducted the review and evaluated each other’s results. The methodology utilised to identify, include, exclude, evaluate and review literature for this paper consists of the following steps:

1. Search academic databases using selected keyword set.
2. Initial evaluation of search results to exclude non-relevant articles by appraisal of the article keywords, abstract and title.
3. Remove all duplicate articles.
4. Include or exclude identified articles with the defined criteria through appraisal of the full text.
5. Use quality questions to identify high-quality articles and exclude low-quality articles.
6. Use research questions to collect the appropriate data from the included research.
7. Record collected data.
8. Synthesise collected data through graphical display and description of findings.

The number of articles was recorded at each stage of the process to show aggregated results throughout the review process. Articles in foreign languages were translated so that data from these articles could be included in the results.

3.1 Academic database search

To initially identify potential articles for review, several online academic databases were queried. The following databases and academic search engines were used:

1. IEEE [88]
2. Scopus [222]
3. ACM [26]
4. Science Direct [35]
5. Springer Link [238]
6. Google Scholar [66]

The keywords identified for use in this study include OSINT, machine learning, artificial intelligence and cyber reconnaissance. An initial search was run using the “OR” operator to identify the total number of articles relating to these keywords available on the selected databases and academic search engines. This was refined by searching for exact matches of the specific keywords, to identify the total number of articles available for each term. Once this was completed, the “AND” operator was used to identify articles that relate to either OSINT and machine learning/artificial intelligence, or cyber reconnaissance and machine learning/artificial intelligence. An example search would be “OSINT” AND “Machine Learning”. Total numbers and search dates were recorded for each search to construct flow charts that break down the total number of articles related to these keywords. All keyword searches are included below:

1. OSINT OR cyber reconnaissance OR artificial intelligence OR machine learning
2. OSINT OR artificial intelligence
3. OSINT OR machine learning
4. Cyber reconnaissance OR artificial intelligence
5. Cyber reconnaissance OR machine learning
6. OSINT
7. Cyber reconnaissance
8. Artificial intelligence
9. Machine learning
10. OSINT AND artificial intelligence
11. OSINT AND machine learning
12. Cyber reconnaissance AND artificial intelligence
13. Cyber reconnaissance AND machine learning

Only articles using the AND operator were considered for the initial evaluation. Results for the other searches will be included in the appendix of this paper for further review.
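
Because the combined searches follow a regular pattern, they can be generated rather than typed by hand. The sketch below is illustrative only: the split of the four keywords into two groups is our reading of the text, the function names are hypothetical, and each database's own query syntax (field tags, date filters) is not reproduced.

```python
from itertools import product

# Keywords as stated in Sect. 3.1; grouping them into two sets is an assumption.
TOPIC_TERMS = ["OSINT", "cyber reconnaissance"]
AI_TERMS = ["artificial intelligence", "machine learning"]

def quote(term):
    """Wrap a term in double quotes to request an exact-phrase match."""
    return f'"{term}"'

def combined_queries(operator):
    """Pair every topic term with every AI term using the given Boolean operator."""
    return [
        f"{quote(topic)} {operator} {quote(ai)}"
        for topic, ai in product(TOPIC_TERMS, AI_TERMS)
    ]

# These correspond to searches 10 to 13 in the list above,
# the only ones carried forward to the initial evaluation.
and_queries = combined_queries("AND")
for query in and_queries:
    print(query)
```

Recording the generated string alongside the search date and hit count for each database keeps the search log reproducible.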

3.2 Initial evaluation

When each search was entered the number of articles was compared between the reviewers to assess if the search had been run correctly. Once this was done an initial evaluation was conducted to remove unrelated articles. This evaluation was completed by reviewing the keywords provided by the articles, the abstract and the title to assess if the article matches the required subject matter. If an article matched the required keywords of the search it proceeded to the next stage of the review. The review of the search results was terminated once the search results started returning significant amounts of non-relevant material.

Once articles had been identified through the initial evaluation, they were added to a spreadsheet, and any duplicate papers were identified and removed from the list. These articles then proceeded to be selected based on our quality questions and article criteria.
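
The duplicate removal step was carried out in a spreadsheet. An equivalent scripted approach might look like the following sketch, which assumes that two records are duplicates when their titles match after normalising case and punctuation. The record fields and example titles are hypothetical.

```python
import re

def normalise_title(title):
    """Lower-case a title and collapse punctuation and whitespace for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def remove_duplicates(records):
    """Keep the first record seen for each normalised title."""
    seen = set()
    unique = []
    for record in records:
        key = normalise_title(record["title"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical search results returned by different databases.
records = [
    {"title": "Detecting Malicious Domains with LSTM", "source": "IEEE"},
    {"title": "Detecting malicious domains with LSTM.", "source": "Scopus"},
    {"title": "Clustering Social Media Posts", "source": "ACM"},
]
unique = remove_duplicates(records)
print(len(unique))  # 2
```

Title matching of this kind will miss duplicates whose titles differ between databases, so a manual check of the remaining list is still advisable.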

3.3 Article criteria

Articles that proceed from the initial evaluation will be assessed for inclusion using the following criteria:

1. Article must post-date the year 2011.
2. Article must be a journal article, conference proceeding, technical report or academic archive.
3. The article must include the subject matter of using machine learning or AI in combination with OSINT sources or processes in a security context. A security context could relate to either national security, cyber security, law enforcement or intelligence operations. Systems that are general but could be used in a security context will also be included in this study.
4. The article must cover the use of a machine learning or AI algorithm, proposed model or framework through:
   a. Real-world use.
   b. Experimental use.
   c. Proposed or theoretical use.
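
The criteria above can be read as a single conjunctive filter. The following sketch expresses them as a predicate over a record that a reviewer has already filled in from the full text. The `Article` fields and the label strings are hypothetical, and criterion 3 remains a human judgement that the code only stores.

```python
from dataclasses import dataclass

ALLOWED_TYPES = {
    "journal article",
    "conference proceeding",
    "technical report",
    "academic archive",
}
ALLOWED_USE = {"real-world", "experimental", "proposed"}

@dataclass
class Article:
    year: int
    publication_type: str
    combines_ai_with_osint: bool  # criterion 3, judged by a reviewer
    security_context: bool        # criterion 3, includes general systems usable for security
    use_type: str                 # criterion 4

def meets_criteria(article):
    """Return True only if all four inclusion criteria are satisfied."""
    return (
        article.year > 2011  # "post-date the year 2011"
        and article.publication_type in ALLOWED_TYPES
        and article.combines_ai_with_osint
        and article.security_context
        and article.use_type in ALLOWED_USE
    )

print(meets_criteria(Article(2019, "journal article", True, True, "experimental")))  # True
print(meets_criteria(Article(2011, "journal article", True, True, "experimental")))  # False
```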

3.4 Quality questions

Once articles were identified to adhere to the selection criteria they were assessed on their level of academic quality. The following questions were considered when determining the quality of a paper:

1. Is the article peer-reviewed?
2. Does the article include a repeatable methodology?
3. Does the paper provide a statistical analysis of results?
4. Does the paper contain appropriate references?
5. Are details of any proposed models or frameworks provided?

3.5 Research questions

Once the articles were identified for inclusion in this study the following research questions were asked to extract data and information from the selected articles. If multiple categorisation occurs, both categories will be included in the total figures. For example, if researchers collaborate from multiple countries, both countries are recorded in the final results. The same process is followed for multiple data sources, professionals and organisations, algorithms, intelligence cycle phases, metrics, OSINT sources and tools:

1. What is the trend in AI and machine-learning-based OSINT?
2. What geographical regions are contributing the most to this area of study?
3. What professions and organisations could benefit from AI and machine-learning-based OSINT applications?
4. What machine learning algorithms, techniques or tools are being used in OSINT?
5. What phases of the intelligence cycle does the included research apply to?
6. What metrics and analysis are provided to evaluate system performance in the included research?
7. What are the sources of OSINT used in the included research?
8. What OSINT tools are applied in the included research?
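
The multiple-categorisation rule stated before the list of questions has a practical consequence: category totals can exceed the number of papers. The sketch below illustrates this with invented records. Counting a category at most once per paper is our assumption, and the field names are hypothetical.

```python
from collections import Counter

# Hypothetical extraction records: one entry per paper, listing author countries.
papers = [
    {"id": "A", "countries": ["United States", "India"]},
    {"id": "B", "countries": ["United States"]},
    {"id": "C", "countries": ["India", "India", "United Kingdom"]},
]

def tally(papers, field):
    """Count each category once per paper, even if a paper lists it several times."""
    counts = Counter()
    for paper in papers:
        counts.update(set(paper[field]))
    return counts

country_counts = tally(papers, "countries")
print(country_counts["United States"], country_counts["India"])  # 2 2
print(sum(country_counts.values()), len(papers))                 # 5 3
```

The same function could be applied to other multi-valued fields such as algorithms, intelligence cycle phases or OSINT sources.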

3.5.1 What is the trend in AI and machine-learning-based OSINT?

This question is asked to identify growth patterns in AI-based OSINT research. Given the current need to analyse vast amounts of open data, it is important to identify if the research community is contributing sufficiently to this field of research and if any events have impacted research output. The research community must be sufficiently covering this topic to assist both research and industry in their OSINT efforts. To answer this question the publication date field from the article metadata will be utilised.

3.5.2 What geographical regions are contributing the most to this area of study?

This question will be answered based on each author’s affiliated institution. Multiple institutions could be recorded for each paper. The question seeks to address whether there is sufficient linguistic and cultural contextual spread in AI-based OSINT research. This matters because elements such as regional slang and dialect could influence results when AI models are employed in text-based tasks, for example the previously mentioned monitoring of hacker forums. This could have implications for criminal, cyber threat intelligence and general intelligence investigations.

3.5.3 What professions and organisations could benefit from AI and machine-learning-based OSINT applications?

This question is qualitative as it diverges from the analysis of the article’s metadata. Answering this question will help identify the variance in industry end use for the proposed models. Ultimately it will identify if there are any limitations or gaps in the assessed research articles concerning the domain of final use. This can be used to identify future directions for which there is limited research material.

3.5.4 What machine learning algorithms, techniques or tools are being used in OSINT?

This question seeks to understand what algorithms are being applied to OSINT use cases. It also seeks to identify other tools and techniques that are being used by researchers. This could provide other researchers with potential starting points in terms of tools, techniques and algorithms for their research. The question will be answered by recording identified tools, algorithms and techniques used in models or experimentation within the included articles.

3.5.5 What phases of the intelligence cycle does the included research apply to?

Taking into consideration the advice provided by the RAND Corporation, it is important to understand which phase of the intelligence cycle AI models are being applied to. This provides context to establish their performance in terms of how they support the next phase of the intelligence cycle. Understanding this would be beneficial in appraising models for use in both industry and government settings. This question also supports academic research in identifying if there are research gaps in the context of the intelligence cycle. To answer this question, each paper was analysed to identify which intelligence stage is most appropriate. If the research model included multiple stages, each stage was included in the results.

3.5.6 What metrics and analysis are provided to evaluate system performance in the included research?

This question seeks to understand how researchers are currently evaluating their proposed experimental models. Presentation of results that demonstrate how models fit into the intelligence cycle would be beneficial. Identifying any research gaps in this area can provide potential future directions in terms of model evaluation. The question is answered through analysis of the metrics provided in the results sections of papers and identification of any further evaluation analysis.

3.5.7 What OSINT tools are applied in the included research?

This question is of particular interest to industry and practitioners as it can be used to evaluate the extent to which AI models can be integrated with pre-existing non-AI OSINT tools. The question is answered by identifying the OSINT tools described in the included papers.

4 Results

The following section details the results of the methodology described in Sect. 3. It is broken into two sections, the first of which details the results of the systematic review using the outlined keywords and shows the results obtained at each stage of the review process. The second section addresses the research questions that were defined at the start of the review process and provides graphical representations of the data collected from the included papers.

4.1 Systematic review results and paper selection

The initial database search returned a total of 54,138 articles. Screening of the title, keywords and abstract reduced this number to 415, or 213 with duplicates removed. 84 additional articles were identified through references, providing a total of 297 articles. 27 articles were removed after assessing article quality and 123 were removed after comparing criteria to the full text. A further 11 were removed after comparison to the quality questions. A total of 163 articles have been included in this review. Figure 4 shows the PRISMA flow diagram for the number of articles identified at each stage of the review and is divided into search, screening, exclusion and inclusion sections in line with the PRISMA framework.
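
The counts in this paragraph can be checked with simple arithmetic. The sketch below uses only the figures reported above; the variable names are ours. The reported total of 163 is reproduced when the 123 full-text exclusions and the 11 quality-question exclusions are subtracted from the 297 assessed articles. The separately reported figure of 27 articles is not part of this chain, and how it relates to the other exclusions is best read from Fig. 4.

```python
# Counts as reported in the paragraph above.
after_screening = 415
after_deduplication = 213
from_references = 84
removed_full_text = 123
removed_quality_questions = 11

duplicates_removed = after_screening - after_deduplication
assessed = after_deduplication + from_references
included = assessed - removed_full_text - removed_quality_questions

print(duplicates_removed)  # 202
print(assessed)            # 297
print(included)            # 163
```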

4.2 Research questions

This section reports the analysis of the systematic literature review results framed along the eight research questions presented in Sect. 3.5.

4.2.1 What is the trend in AI and machine-learning-based OSINT?

There has been significant growth in the number of research articles that fit the criteria for this study; this positive trend can be seen in Fig. 5. The volume of research in this area peaked in 2019 at 33 articles, but this was followed by a decline of about 21.2 per cent in 2020.
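
For readers who want to relate the percentage to article counts, the short sketch below works backwards from the two stated figures. The 2020 count it prints is an inference from the peak of 33 and the 21.2 per cent decline, not a value reported in the text, and the helper name is hypothetical.

```python
def percent_change(old, new):
    """Relative change from old to new, expressed in per cent."""
    return (new - old) / old * 100

peak_2019 = 33      # stated in the text
decline_pct = 21.2  # stated in the text

# Implied 2020 count: an inference from the two figures above, not a reported value.
implied_2020 = round(peak_2019 * (1 - decline_pct / 100))

print(implied_2020)                                       # 26
print(round(percent_change(peak_2019, implied_2020), 1))  # -21.2
```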

It is possible that the COVID-19 pandemic slowed the growth of research in this area post-2019. The initial stages of the pandemic correlate with a decline in time spent on scientific research and a reduction in the number of new scientific research projects [53]. Concerning the assessed articles of this study, there appears to be a more significant decrease in conference papers between 2019 and 2020, with conference publications decreasing by about 22 per cent, while journal articles remain relatively stable. This could be due to disruption in academic conferences during the initial pandemic stages.

Computer science conferences appear to have adapted well in this period but a study into this question showed that 23.5 per cent of computer science conferences were postponed or cancelled during this period [169]. This appears to be a similar level of decline when compared to the decline observed in this study between the years 2019 and 2020.

This may provide some insight into the reduced growth in research for this study in 2020, but the impacts of COVID-19 on academic research are ultimately outside the scope of this systematic review, and a larger data set would be required to offer definite conclusions. The polynomial trend line in Fig. 5 shows that this period represented a slowing of growth, as the trend is still positive over this period based on past data.

4.2.2 What geographical regions are contributing the most to this area of study?

A substantial portion of the research that has been included in this paper has occurred in either the United States or India. The United States and India combined account for about 44 per cent of the research included in this study. The United Kingdom and China are also driving research in this area. The top four countries that are contributing to this area of research account for about 57 per cent of the included research papers. The heat map provided in Fig. 6 demonstrates the global distribution of the research included in this study.

Figure 6 additionally shows the total number of articles from the top fifteen countries producing research that fits the criteria of this study. These countries account for 150 of the total 163 included articles, or about 92 per cent of all the included research articles. Countries from the continents of North America, Asia and Europe are included on this list. The United States is the largest producer of included research in North America, with Canada being second in that region. India is the highest-ranked country in Asia for research combining AI with OSINT, with China being the second. In Europe, the United Kingdom is the highest-producing country for research in this field, with Italy being the second highest. Limited research that meets the criteria of this systematic literature review is being undertaken in the continents of Africa and South America. Only Argentina, Colombia, Egypt and South Africa contributed papers from these regions.

Overall there seems to be a fair distribution of papers in terms of geographic location. However, there are concentrations of papers in the United States, India and the United Kingdom, two of which are predominantly English-speaking with cultural similarities. The reduced number of papers in areas such as Africa, Central Asia, Eastern Europe and South America could indicate limitations based on cultural context. This would be particularly important for OSINT applications leveraging natural language processing that are concerned with law enforcement or intelligence operations. Contributions from academics in these areas could help produce truly universal AI-based OSINT systems dedicated to these forms of investigation.

4.2.3 What professions and organisations could benefit from AI and machine-learning-based OSINT applications?

There is less variation in the domains that could apply the included research when compared to the specific applications. The majority of the included research can be applied to either security operations, law enforcement, intelligence services or cyber threat intelligence operations. Figure 7 breaks down the included research by the potential domain of use. There are few papers leveraging AI for OSINT purposes that apply to the domain of penetration testing. This presents a research opportunity as an important aspect of penetration testing is collecting publicly available data about the target so that the penetration tester can gain knowledge of the potential attack surface [210].

One specific piece of research that could assist penetration testers in their projects is presented in [187]. This conference paper utilises OSINT collected from the National Vulnerability Database to train several classifiers to identify Structured Query Language (SQL) injection vulnerabilities from online texts. The model can be used to scan OSINT sources such as tweets and websites to identify SQL injection-related information. There are also only a few papers that are specific to the defence domain, although this could be due to the classified and sensitive nature of such research, as some defence research may not be available through normal academic sources.

Much research that would be well suited for the defence sector is also well suited for other domains such as intelligence services and government departments. The journal article [58] provides a demonstration of this by proposing a tweet classification system based on various machine learning algorithms that identify the exposure of sensitive or classified information on social media. This system has obvious applications to government, defence and the intelligence sector but such a system could also be utilised by security operations conducted in the private sector. Ultimately the results of this systematic review show that many systems employing AI methods with OSINT can be utilised across multiple domains and are not confined solely to government, defence or intelligence service projects.

Figure 8 shows the main applications and functions of research that combines machine learning with OSINT. The uses for the systems being used in the research are varied but a substantial portion covers either sentiment analysis, cyber threat intelligence, domain name generation algorithm (DGA) detection or OSINT investigations. There are also several other uses such as cyber attack prediction, natural disaster management and human trafficking investigations.

The paper [13] provides an interesting example of how a particular application leveraging OSINT with AI can be applied to specific domains. This research utilises Twitter as an OSINT source and leverages machine learning algorithms to identify hate speech using sentiment analysis. The experiments in this paper show that deep learning methods outperform n-gram techniques in the identification of hate speech. Such systems could be incorporated into law enforcement investigations to identify individuals or groups engaging in such behaviour online. This intelligence could be utilised to monitor these people or groups and to assess whether a potentially violent act is about to occur, or whether the law has been broken regarding the harassment of particular groups in society.

Another major application of AI systems utilising OSINT is for DGA detection. In [263] the researchers demonstrate how automated detection of malicious domains can save significant time for security operations by removing the need to manually blacklist malicious domains. The proposed machine learning model has been trained on OSINT collected from DNS and ultimately provides analysis of domain names to confirm if they are malicious or benign. This can be used to prevent malware from attempting to communicate with command and control servers. Such systems are an interesting application of OSINT as the open data is used for initial training but the final intelligence analysis is provided by the trained machine learning algorithm. The conference paper [8] applies OSINT with machine learning algorithms to improve organisational threat monitoring capabilities.

An example of an OSINT application leveraging AI that can provide proactive threat monitoring can be found in SYNAPSE. SYNAPSE can monitor Twitter for information on emerging threats concerning particular information technology hardware or software. The benefit of this is that information on new threats can be efficiently and automatically obtained before the information is made available on threat and vulnerability databases.

4.2.4 What machine learning algorithms, techniques or tools are being used in OSINT?

Figure 9 shows the types of AI algorithms and techniques being used for machine-learning-based OSINT research. As with the results concerning the domains and applications in which the research could be employed, there is a large variety of algorithms in the included research. The five most commonly employed algorithms are Support Vector Machines, Naive Bayes, CNN, Random Forest and LSTM. The covered research includes models that are a combination of different algorithms. For example, the research paper by Choudhary, Sivaguru, Pereira, Yu, Nascimento and De Cock into Domain Generation Algorithm and malware detection uses a CNN and LSTM combination [24]. Another example of this is the paper by Ravi, Kp and Poornachandran, which also uses a CNN and LSTM combination in their research into the categorisation of malicious domain names. There are also lesser-known algorithms being used in the included research, such as LASSO, which was used in the research undertaken by Kumar and Babu into preprocessing for sentiment analysis [190], and Annotated Probabilistic Temporal Logic (APT), which was used by Marin, Almukaynizi and Shakarian in their work into the prediction of cyber-attacks [151].

Support Vector Machines are commonly employed for tasks that involve the classification of text. Researchers based in Canada have demonstrated this by using Support Vector Machines and Linear Support Vector Machines for the classification of fake news [4]. The proposed model works by analysing information such as article text, type, title and date and uses Term Frequency-Inverse Document Frequency for information extraction, which provides a measure of how important particular terms are to a particular text. The Support Vector Machine algorithms are then employed to perform the final classification. This form of system leverages OSINT found in news articles and aids in upholding national security. This is particularly true in democracies where fake news is being used to undermine faith in democratic institutions and manipulate elections.
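The term weighting described above can be illustrated with a minimal sketch. It is not the implementation used in [4]: the function name `tf_idf`, the whitespace tokeniser, the unsmoothed logarithm and the three example headlines are all assumptions made for illustration, and library implementations typically use smoothed variants of the formula.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dictionary per document (hypothetical helper)."""
    # Naive tokenisation: lower-case and split on whitespace.
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term frequency multiplied by (unsmoothed) inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Made-up example headlines, not data from the reviewed papers.
docs = [
    "election fraud claim shared widely",
    "election results confirmed by officials",
    "officials confirm election schedule",
]
scores = tf_idf(docs)
```

In this sketch a term that appears in every document, such as "election", receives a weight of zero, while a term confined to one document, such as "fraud", receives a positive weight. The resulting vectors are the kind of input a classifier such as a Support Vector Machine would then consume.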

CNN can be employed for OSINT tasks that involve the classification and identification of images. PicHunt is a system that can be used to analyse images to identify evidence and location from pictures obtained from social media [62]. The results showed that CNN outperformed other state-of-the-art processes for this image classification task. The model is particularly well suited to law enforcement to identify evidence and the location of where specific events have occurred. The results of this systematic review show that a wide variety of algorithms are used to analyse OSINT sources or perform OSINT tasks. This is mainly due to the large variety of tasks that can be performed concerning OSINT and particular algorithms are better suited for different tasks.

A small number of machine learning tools, frameworks and libraries were also identified in the included research. Some of these include Apache OpenNLP [7], TensorFlow [6], SentiSAIL [12], CoreNLP [117] and NLTK [16, 94, 139, 176, 189]. The Natural Language Toolkit (NLTK) [18] was fairly prevalent, with about five included research articles using this tool. More information concerning the tools used would be beneficial to further research and industry, as it would improve the ability of others to recreate processes, procedures and models that utilise OSINT in combination with AI.

4.2.5 What phases of the intelligence cycle does the included research apply to?

The vast majority of machine learning research included in this study applies to either the processing or analysis phases of the intelligence cycle. From the reviewed material, no research covers the planning or evaluation phases of the intelligence cycle. A sizeable sample of the research covers the collection phase, and only one paper was identified that includes dissemination as part of a proposed model. The distribution of papers by the stage of the intelligence cycle they can be applied to is represented in Fig. 10.

The machine learning-based system in [44] is the sole piece of research that could be applied to the intelligence stage of dissemination. The proposed model covers the stages of collection, processing, analysis and dissemination with a focus on social media data to identify and prevent national security incidents. The dissemination part of the intelligence cycle occurs when the system generates alerts and distributes them to the relevant audience based on jurisdiction and classification.

An example of a paper that provides support for the processing stage of the intelligence cycle can be found in [277]. The system makes use of data originating from automated threat intelligence applications, as well as OSINT sources such as cyber security blogs, websites and malware databases. The automated system collected over 25,092 cyber threat intelligence reports that were available from open sources. These reports are then labelled by the system so that the security operations team can efficiently analyse the data and decide if further mitigation is necessary. This paper appears to provide a useful system for intelligence operations as it clearly defines which phase of the intelligence cycle it applies to and how it assists operations conducted at the subsequent phase of the process.

One reason for this predominance of the processing and analysis phases can be identified through the algorithms utilised in the research. Most are well suited to classification-based tasks, using either binary or categorical cross-entropy. These are well suited to either the processing or analysis of data and also could provide dissemination support by classifying on a need-to-know basis. They are however not well suited to tasks such as planning or evaluation.
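For readers less familiar with the loss functions named above, the following is a minimal sketch of binary and categorical cross-entropy. The function names, the clipping constant `eps` and the input layout (class indices paired with predicted probability lists) are assumptions chosen for illustration and are not taken from any of the reviewed papers.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy for labels in {0, 1} and predicted P(label = 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip the probability so that log(0) is never evaluated.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def categorical_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean categorical cross-entropy for integer class labels and probability lists."""
    total = 0.0
    for label, probs in zip(y_true, p_pred):
        # Only the probability assigned to the correct class contributes.
        total += -math.log(max(probs[label], eps))
    return total / len(y_true)

# A maximally uncertain binary prediction costs log(2) per sample.
uncertain_loss = binary_cross_entropy([1, 0], [0.5, 0.5])
# A confident, correct three-class prediction costs close to zero.
confident_loss = categorical_cross_entropy([2], [[0.01, 0.01, 0.98]])
```

Both losses reward assigning high probability to the correct label, which is why they suit tasks such as labelling or filtering collected data. Neither provides a natural way to express an objective such as ordering investigative tasks over time.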

Developments since 2021 include the increased use of large language models such as GPT-4 [1] and AI-based scheduling tools such as Motion [167]. There are also references within academic literature to utilising AI for efficient scheduling-based tasks. In [110], heuristic-based algorithms and large neighbourhood search are utilised for maintenance scheduling for offshore oil and gas platforms. Similar methods could be used for the planning and scheduling of OSINT investigations or operations. An interesting avenue of research could be provided by identifying how these tools and models could be incorporated into the planning phase of the intelligence cycle and integrated with classification models for processing, analysis and dissemination.

4.2.6 What metrics and analysis are provided to evaluate system performance in the included research?

Figure 11 shows the metrics used for evaluation in the included research. Nearly half the metrics identified for the evaluation of machine learning systems included in this study are accuracy, precision, F1-score, recall and area under curve (AUC). These five metrics are some of the most commonly used for the evaluation of machine learning systems [280]. A smaller number of papers also provide totals for metrics such as true positive rate, false positives and receiver operating characteristics. It should also be noted that some papers fail to provide a full accounting of the performance of the algorithm in terms of these metrics, which makes it harder to conclude how suitable the system is for its intended task. For example, in the paper [9] an analysis is given comparing the true positive rate to the true negative rate, but no precise figures are given for precision, recall or F1-score. Including these results would make it easier to assess the suitability of their system for processing cyber threat intelligence. A further example of this is the conference paper [57], which only provides figures for accuracy and precision but fails to account for recall and F1-score.
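The relationship between these metrics can be shown with a minimal sketch. The function name `classification_metrics` and the toy labels are assumptions for illustration. The example is constructed so that accuracy and precision both look strong while recall is poor, which is why reporting only a subset of the metrics can be misleading.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1-score for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy data: three real threats among eight items, only one of which is detected.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Here accuracy is 0.75 and precision is 1.0, yet recall is only one third and the F1-score is 0.5, meaning two of the three threats would be missed.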

There are some examples in the included research that provide results related to the intended use of the system or system objectives. In [33], cyber threat intelligence is performed using data from Twitter, and the researchers provide information relating to vulnerabilities that were discovered before their inclusion in vulnerability databases. A further example provides measurements on whether an investigated individual is engaging in cyberstalking. Providing measurements such as these can help potential users evaluate the usefulness of a system in terms of real-world outcomes and is especially useful information for non-technical stakeholders or team members. These are, however, limited cases. Researchers should endeavour to evaluate their systems based on the objectives of a particular phase of the intelligence cycle. This would better place the systems for use in real-world OSINT operations.

4.2.7 What are the sources of OSINT used in the included research?

Figure 12 shows the types of OSINT used in machine learning and AI-based OSINT applications. A significant proportion of the papers included in this study obtain OSINT through either online data sets or social media. Online news, websites, blogs and the dark web are also being utilised to some degree in the included research. A model that makes use of a social media data source is the TwitterOSINT application proposed in [83]. TwitterOSINT uses Twitter data to create real-time visual representations. This feature supports intelligence analysts by presenting the data in an accessible form while it is current and relevant. The proposed system leverages NLP to annotate the collected data so it can be further processed for search and final visualisation. This tool is particularly well suited to provide real-time threat intelligence to cyber security operations. An example of research using online datasets as the OSINT data source is the bidirectional LSTM models for DGA classification proposed in [11]. The datasets used consisted of OSINT data collected from DNS that has been flagged as malicious. This data was collected from a data set provided by netlab-360 and was used as the malicious domain component of the model’s training set.

Figure 13 breaks down the direct sources of the OSINT data from the included articles. For social media, Twitter is the most prominent source of OSINT, followed to a lesser extent by Facebook and Reddit. Video-based social media is also present, with YouTube being utilised in some research included in this study. Online data sets collected from services such as Alexa are also prominent, especially in DGA detection. Twitter is used predominantly for cyber threat intelligence and sentiment analysis. An example of Twitter being used for cyber threat intelligence is found in [220], which covers automating the collection and analysis of Twitter data for threat intelligence purposes. The system they propose is still in the concept phase. The researchers' proposal includes the collection, processing and analysis of Twitter data to identify cyber threats before they appear in common vulnerability and exposure databases. The conference paper [5] provides an example of Twitter data being used for sentiment analysis. This is also an example of a system that is general in nature. The model is directed to the analysis of people’s opinions but could also be repurposed to identify posts that concern security incidents or users with malicious intent.

The results of this systematic review show that Twitter is over-represented in the collected data, and this is most likely due to the ease of access to posts on the platform. This information can easily be obtained by researchers using the Twitter API [255] or a library such as Tweepy that leverages the API provided by Twitter [10]. The downside of this ease of access is the potential for researchers to overlook other social media sources for OSINT data, which may cause the underrepresentation of services such as YouTube and Facebook in the included research. Online datasets feature prominently in systems being used for the detection of DGA. These systems use OSINT to train their machine learning systems to correctly identify malicious domain names. The systems will generally include separate data sets for benign domains and malicious domains, with benign domains being sourced from DNS and online data sets such as Alexa. Malicious domain data sets are then constructed from security databases such as 360netlab [208].
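The construction of a labelled training set from separate benign and malicious domain lists can be sketched as follows. This is an illustrative assumption rather than the procedure of any reviewed paper: the function name `build_dga_dataset`, the 25 per cent test split, the rule for resolving conflicting labels and the placeholder domains are all hypothetical, and real lists would be loaded from sources such as those named above.

```python
import random

def build_dga_dataset(benign_domains, malicious_domains, test_fraction=0.25, seed=0):
    """Label, merge, shuffle and split two domain lists (0 = benign, 1 = malicious)."""
    malicious = {d.strip().lower() for d in malicious_domains}
    # Normalise and deduplicate; drop benign entries that are also flagged as
    # malicious so that no domain carries conflicting labels.
    benign = {d.strip().lower() for d in benign_domains} - malicious
    rows = [(d, 0) for d in sorted(benign)] + [(d, 1) for d in sorted(malicious)]
    # A seeded shuffle keeps the split reproducible.
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

# Placeholder domains invented for this sketch.
benign = ["example.com", "example.org", "example.net", "Example.com"]
malicious = ["qx7vbd2kfjw.example", "zzk1pq9rmt.example"]
train, test = build_dga_dataset(benign, malicious)
```

Keeping the two sources separate until the labelling step makes it straightforward to swap in a different benign or malicious list without changing the rest of the pipeline.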

4.2.8 What OSINT tools are applied in the included research?

There were relatively few examples of pre-existing non-AI-based OSINT tools being incorporated into AI models identified in this study. Many of the researchers have elected to collect data manually, or through application programming interfaces such as the Twitter API [138, 190, 196], or by downloading preexisting data sets, such as UNSW-NB15 [103] or AmritaDGA [264]. There are some examples of OSINT tools being used in the included research in combination with AI, such as Tor Crawler [108], Shodan [267], ReKognition boto3 [251], SAIL LABS Media Mining System [12] and the Tweepy API [94]. In [136] Maltego is used in the prototype Twitter cyberbullying detection system to graph relationships between the various accounts in the collected data. This provides useful information on coordination between stalkers.

There are also a few OSINT tools being developed by researchers that combine AI directly into a proposed OSINT tool, for example, the previously mentioned TwitterOSINT [83, 220] provides a complete system that collects data from the Twitter API and then uses NLP to annotate the unstructured text. This then allows intelligence analysts to easily search and create a visualisation of the collected data. Additional research that either creates new OSINT tools utilising AI or that leverages the abilities of current OSINT tools would be beneficial to intelligence or security analysts. This is because these systems would link directly with their existing tool kits or add new applications that they can use in their existing processes.

5 Limitations, future directions and summary

The following sections outline identified limitations with current research utilising OSINT in combination with AI. First, current research limitations are presented, followed by future directions that specific researchers are undertaking to reduce the limitations of their research. A summary of findings is then presented, followed by the identified limitations of this study.

5.1 Research limitations

The following section outlines limitations that have been identified from the papers included in the systematic review.

5.1.1 OSINT tools

Very few studies use pre-existing OSINT tools in their machine learning research. While there are a few examples of OSINT tools being used, the significant majority of the included studies did not use these tools in their AI-based OSINT research. A possible reason for this is the large number of research papers that are dedicated to the processing and analysis phases of the intelligence cycle, rather than collection. Many OSINT tools, for example theHarvester, are used mainly for the collection phase, which could provide a reason for their exclusion from the research [155]. There are OSINT tools that include functionality for processing and analysis, such as Maltego [147], but due to the nature of the research in this paper, the processing and analysis phases are being undertaken by machine learning algorithms, which are essentially replacing the functions undertaken by non-machine-learning-based OSINT tools. However, it appears researchers are ignoring the opportunity to preprocess their data using these tools, which could refine data sets and enhance the capabilities of their final AI models. An opportunity to demonstrate how their systems can integrate with current tools is also being lost; such a demonstration would be of great benefit to current users of OSINT tools.

There are only sparse examples where researchers have created new AI-based OSINT applications that could be employed in research and industry. One is NoRegINT, a machine learning-based OSINT tool [105]. Its authors place it alongside current OSINT tools that are used for cyber reconnaissance, such as Spiderfoot [159] and Maltego [147]. Currently, the proposed framework for NoRegINT uses Twitter, Reddit and Tumblr as OSINT sources and can perform sentiment analysis on the collected data. The tool is, however, far from complete and does not compare to the scope of data sources and analysis that is currently available in Spiderfoot or Maltego. Another example is TwitterOSINT, outlined in [83, 220], although its AI-based functionality is limited to annotations and does not provide further analysis. Future research could focus on the development of an AI-based OSINT suite that can perform multiple functions across a variety of OSINT sources. Another potential research direction is to focus on demonstrating how current OSINT tools can be used in tandem with proposed AI systems. This would be especially beneficial to individuals utilising these OSINT tools in industry or intelligence circles.

5.1.2 Penetration testing

Only a small portion of the included research could be applied to the domain of penetration testing. Examples of papers included in this study that could be applied to this domain include work by Layton, Perez, Birregah, Watters and Lemercier that links profile ownership between different social media accounts [121] and could provide useful information to test organisational resilience in the face of social engineering attacks. This research is of particular interest as it could circumvent efforts by an organisation’s stakeholders to keep particular social media accounts anonymous. A further example is FastEmbed, which determines the possibility of the exploitation of vulnerabilities using the LightGBM algorithm [48]. The proposed model uses OSINT collected from an exploit database for its training set and can complete its predictions when considering real-world exploits. This was done with an accuracy of 83 per cent. These two papers provide examples of how AI-based OSINT applications could be beneficial to penetration testing.

Despite these examples, the volume of research is limited in comparison to other domains such as cyber threat intelligence. Another paper of interest for this domain is presented in [205]. The researchers propose a model that generates fake cyber threat intelligence reports to distribute to threat intelligence vendors. This is of interest as such an attack could cause data poisoning of defensive cyber AI systems, or cause confusion within a security operations team leading to sets of actions that benefit the attacker. This paper was ultimately not included in the aggregated results for penetration testing as such a system would impact multiple threat intelligence vendors and organisations. This indiscriminate nature would lead to issues concerning the scope of the penetration testing exercise. However, penetration testers may want to consider how to test organisational resilience against such an attack.

5.1.3 Underutilised data sources

Research that uses data from social media appears to be heavily reliant on Twitter. This seems to be due to the ease of access to the Twitter API [255]. This appears to be true for various forms of applications, including cyber threat intelligence. To demonstrate there are multiple papers covering the collection of threat intelligence from Twitter including [16, 209, 220, 233, 254], with a total number of 80 papers obtaining data from Twitter. The number of papers using data obtained from the dark web is severely limited in comparison. It appears that the ease of access to data from Twitter may be driving an overreliance on this data source in the included research to the detriment of other potential sources.

In [57] a similar threat intelligence system is described that uses the dark web with similar functionality to the stated Twitter-based threat intelligence systems. The system looks specifically at hacker forums and potentially could provide more valuable and current data than Twitter. The same is true when comparing the number of papers using data from other social media sources such as Facebook.

There also appears to be no research into using data from newer social platforms such as Discord [37]. Discord could provide an interesting source of OSINT data for AI-based applications due to its community-based nature. Cyber security-related servers on Discord can be found using services such as Disboard [36]. Researchers could also make attempts to join private Discord groups to gather data, although this might raise ethical concerns.

Essentially the over-utilisation of a single data source due to its ease of access may skew research results and forfeit the ability to gain access to a wider variety and more complete set of data. In the case of OSINT for cyber threat intelligence this would include building AI-based systems that work with the dark web, community forums and newer social platforms. The same reasoning could also be applied to other applications such as behaviour profiling and hate speech detection. Future AI-based OSINT research should endeavour to include these data sources.

5.1.4 Dissemination

From the included papers there are limited examples of systems that include the dissemination phase of the intelligence cycle. An example of research that includes the dissemination phase of the intelligence cycle in their machine learning-based system is a paper by Nnaemeka Ekwunife of Marymount University [44]. The paper is a proposed model that gathers intelligence from social media and includes alert generation when intelligence relevant to national security is uncovered. This is the sole example of the included research that covers this aspect of the intelligence cycle. Further research could include this component as part of future AI-based models. Adding dissemination functionality to AI-based OSINT models would enable researchers to create fully automated systems that can work across multiple jurisdictions.

5.2 Future directions

The following sections detail future directions being undertaken by researchers in this area of study. Some avenues present potential opportunities to overcome some of the previously identified limitations.

5.2.1 Multi-lingual capabilities

Some studies have identified the inclusion of multi-lingual functionality as an avenue for future research. AI-enabled OSINT applications with this ability could potentially operate at a global scale across multiple geographic regions. This potential future research is not isolated to a particular domain of research or application. For example, the proposed model in [14] conducts sentiment analysis on social media and the researchers plan to continue the research by training their model on larger datasets containing data from multiple languages. A further example of this can be found in the human trafficking detection model proposed in [22]. The authors of this paper also seek to add multi-lingual functionality to their system in the future. The avenue they propose for this future research is to add automatic translation to the model. Support for multiple languages is also a planned feature for future research for the natural disaster monitoring system proposed in [226] and in the area of cyber threat intelligence, the authors of [277] have included multilingual capabilities into their future research plans. Incorporation of multi-language support will be especially beneficial in domains that tackle problems at the global level, such as monitoring the cyber threat landscape and human trafficking.

5.2.2 Incorporation of additional data sources

Some papers have identified the limitations of relying on a single source of data and plan to incorporate multiple data sources in future research. In [233], the researchers plan to include data originating from Facebook to train their cyber threat intelligence model. The authors of [94] also plan to modify their misinformation detection model to include data sources in addition to Twitter, and a further example can be found in [237], where the authors plan to increase the diversity of their dataset to improve their fake news and misinformation detection model. In [209, 220], the researchers plan to include additional social media data sources in their cyber intelligence system, and the authors of [85] note that their use of data contained in the MITRE ATT&CK [27] framework required expansion due to the nature of the framework. The framework relies on expert contribution, so there is a time delay between when the threat materialises in the real world and when the threat is reported. The researchers plan to include additional sources of OSINT to circumvent this data source issue. This illustrates that each OSINT data source presents unique strengths and weaknesses, and incorporating multiple sources in future research will provide researchers with a means to mitigate any potential open data source shortcomings. Being able to include additional data sources will increase the monitoring surface of AI-based OSINT applications and assist in the creation of general OSINT suites that incorporate AI algorithms.

5.2.3 Robustness against data poisoning and misinformation

An interesting future direction of research is presented in [209] for their cyber event detection system. The avenue for future research is to check the ability of their system to perform robustly in the event of data poisoning attacks, or when dealing with fake accounts. This is vital for all areas of OSINT as the collected data is publicly available. Any AI system that relies on OSINT could be subject to incorrect or misleading information corrupting the normal functionality of the system. This could be unintentional exposure to misleading data or, in the case of a data poisoning attack, part of a purposeful and targeted cyber event that aims to corrupt the output of the AI system. As mentioned in the limitations section, the paper [205] provides an example of an AI-based system generating fake cyber threat intelligence reports. Such a system in the hands of a cyber adversary could manipulate multiple organisations into performing a set of desired actions. An interesting avenue for future AI-based OSINT research could focus on how penetration testers can test robustness against such an attack. These tests could be based on current systems or systems that utilise AI for cyber threat intelligence or security operations.

5.2.4 Real world use

Several research teams intend to apply their OSINT-based AI research to real-world applications or are taking steps to integrate with currently used cyber security applications. In the cyber threat intelligence domain, the authors of [116] plan to improve the integration of their AI model into currently used intrusion detection and prevention systems. A large-scale operations evaluation of the platform will also be performed to assess any additional requirements before real-world use. This is a common future research aspiration for many of the studies included in this paper. A further example in the threat intelligence domain can be found in [220], whose authors plan to trial their TwitterOSINT system in a security operations environment. Some researchers do not have plans for an operational implementation but are making improvements to their system so it is suitable for a real-world context [97].

5.2.5 Alert generation and dissemination

As previously stated, a limited number of papers include the dissemination of intelligence generated by AI systems utilising OSINT. However, some researchers plan to develop this capability in future research. This includes future directions outlined for the model proposed in [141], which uses Twitter data to identify the online sale of narcotics. The researchers wish to add script automation that generates reports conveying the findings of their models to the Food and Drug Administration (FDA) and the Drug Enforcement Administration (DEA) using the required reporting templates of those organisations. Another example of this direction is the research team that created CyberDect. These researchers are seeking to improve their cyberbullying detection AI model by incorporating an alert system to notify the appropriate parties when bullying is occurring online [136].

5.2.6 The planning phase of the intelligence cycle

The planning phase of the intelligence cycle may also provide a future avenue for research. The majority of the models found in this study perform classification and are well suited to the processing and analysis phases of the intelligence cycle. With regard to planning, there are some examples of algorithms dedicated to this purpose, such as the aforementioned research conducted in [110]. A potential future research direction would be to use a similar system for OSINT planning and integrate it with subsequent classifiers for collection, processing, analysis, and perhaps even dissemination.

5.3 Summary of findings

The results of the research questions presented in this systematic review show that there has been significant growth in research using AI for OSINT applications from 2011 to 2021. Despite a slowing of growth between the years 2020 and 2021 during the COVID-19 pandemic, the positive trend is maintained. Continued growth in this area of study is beneficial to business and industry as it becomes more vital to convert open data into actionable intelligence.

There is a fairly even geographic spread for the research covered by this systematic review. However, of the three highest-producing geographic areas, two have similar lingual and cultural backgrounds, and some areas, including South America, Central Asia, Africa and Eastern Europe, are producing limited research in this area. This is particularly important for systems employing natural language processing, as cultural and lingual context will be vital to ensure system effectiveness.

This is important as this systematic review identified a large number of papers with applications in law enforcement and intelligence. Universal OSINT applications in these areas will need to be employed in areas with varying cultural contexts. The lack of papers focusing on the domains of penetration testing or offensive security should also be noted, as both may provide an interesting opportunity for researchers investigating the use of AI with OSINT.

The study identified a wide range of different AI models and algorithms, which can be related to the various tasks needed for OSINT operations and investigations. However, most of these algorithms are used for classification tasks using binary or categorical cross-entropy. This can explain why the study also identified that the majority of the included research was focused on the processing or analysis phases of the intelligence cycle. These models are not well suited to planning and evaluation tasks, or at the very least would need to be adapted before they could be successfully applied to them. Researchers could also consider other models if they wish to investigate how these intelligence cycle stages can be served by AI models. Algorithms suited to planning tasks could potentially be integrated with subsequent models employed in classification tasks.
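For readers less familiar with these loss functions, the following minimal Python sketch shows how binary and categorical cross-entropy are computed for a single prediction. The function names, the clamping constant and the example probabilities are illustrative assumptions and are not drawn from any of the included studies.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Loss for one binary label (0 or 1), given the predicted probability of class 1."""
    p = min(max(p_pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for one sample, given a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

# Hypothetical example: a post labelled "threat" (1) that a classifier scores at 0.9
print(binary_cross_entropy(1, 0.9))

# Hypothetical three-class example: the true class is the second one
print(categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))
```

In both cases the loss is the negative log of the probability assigned to the correct class, so confident correct predictions are penalised least. This makes the losses a natural fit for classification, but not for tasks such as planning, where there is no fixed set of labels to predict.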

This systematic review also identified that there was a limited degree of integration of pre-existing OSINT tools with AI models. Such integration could be beneficial to practitioners and industry by enabling them to integrate AI models directly with their pre-existing tool set. There was also a small amount of research detailing work towards a fully AI-enabled OSINT tool. To create such a tool that is universal, or general, there will need to be contributions from various geographic regions, continued growth in this area of research, and direction taken to include the planning and evaluation stages of the intelligence cycle. This is most likely possible given the already large variety of OSINT tasks being performed by AI.

5.4 Limitations of this study

Several limitations can be outlined specifically concerning this systematic review. These relate to the domains of use and the scope of the study.

The domain research questions contain some limitations. One such limitation is that the assessment of papers for their domain suitability can be subjective. This is particularly true when there is a degree of cross-over between domains. Some systems used for domain name generation algorithm detection have been designed for use within security operations centres, but the systems could also be well placed for cyber threat intelligence operations. There is no way to directly quantify this cross-over between domains of application, so the assessment is qualitative and could be open to debate in terms of the usefulness of the application to a particular domain. The same could be said for systems that are to be used in law enforcement investigations: intelligence services and government departments may also find the systems useful, but once again there is no direct way to quantify this and it is open to some discussion.

The scope of the study is also wide and, while this has uncovered a high number of interesting papers in this area, specific domains might be better served by a more focused review of a particular area. For example, a similar review could be conducted that focuses purely on law enforcement or the operations of intelligence agencies. The same could be done for AI-based OSINT applications used solely in the cyber threat intelligence domain.

6 Conclusion

This study completed a systematic review of current research combining AI algorithms with OSINT applications using the PRISMA framework. A total of 163 articles were identified for inclusion in the study. It was found that research combining the fields of OSINT and AI is a growing area, and that India and the United States are currently the main geographic centres engaging in this area of study. The identified systematic reviews in this area of study do not incorporate much of this growth as they do not include papers that post-date the year 2019.

OSINT combined with AI provides a method to reduce the workload of intelligence analysts across multiple domains working with large amounts of data. Various security-related domains could utilise this form of research including security operations, law enforcement and intelligence services. This variety of potential use is due to the wide range of potential applications that can leverage benefits from AI-based OSINT systems. This is made possible by the number and variety of available AI algorithms that are each suited to specific and different tasks.

There is limited research that covers how AI can be used in conjunction with pre-existing OSINT tools and it appears there has been little significant effort to produce a multi-faceted AI-based OSINT tool that could combine the various models available. The majority of the included research fits into either the collection, processing or analysis phases of the intelligence cycle. There has been limited research concerning the dissemination phase, so it would be beneficial to include this in the form of alerts to relevant parties. Future research could help with dissemination by categorising pre-analysed OSINT based on which persons need to be notified. This could be based on jurisdiction or security clearance.
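The routing idea described above can be illustrated with a short sketch. It assumes that each analysed OSINT finding carries a jurisdiction tag and a minimum clearance level, and that each recipient has a jurisdiction and a clearance level. All field names, clearance levels and example records are hypothetical and serve only to make the idea concrete.

```python
from dataclasses import dataclass

# Hypothetical ordered clearance levels (higher number = more access)
CLEARANCE = {"public": 0, "restricted": 1, "secret": 2}

@dataclass(frozen=True)
class Finding:
    summary: str
    jurisdiction: str
    min_clearance: str

@dataclass(frozen=True)
class Recipient:
    name: str
    jurisdiction: str
    clearance: str

def recipients_for(finding, recipients):
    """Return names of recipients in the finding's jurisdiction with sufficient clearance."""
    needed = CLEARANCE[finding.min_clearance]
    return [
        r.name
        for r in recipients
        if r.jurisdiction == finding.jurisdiction
        and CLEARANCE[r.clearance] >= needed
    ]

analysts = [
    Recipient("analyst_a", "AU", "secret"),
    Recipient("analyst_b", "AU", "public"),
    Recipient("analyst_c", "US", "secret"),
]
finding = Finding("Suspected credential dump", "AU", "restricted")

# Only recipients in the same jurisdiction with adequate clearance are notified
print(recipients_for(finding, analysts))
```

A practical system would also need to handle findings that span several jurisdictions and to log who was notified, but the filtering step itself can remain this simple.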

Social media and online datasets form a significant portion of the types of OSINT that are used in the included research. Research that uses social media appears to be heavily focused on Twitter and this is especially true of threat intelligence systems. This could be due to the ease of access to Twitter as a data source. Future research should seek to engage with alternate data sources such as the dark web or hacker forums. Newer platforms such as Discord appear to be completely absent and could provide researchers with novel sources of data in the future.

It was also found that there is a limited volume of research relating to combined OSINT AI applications that could be utilised for penetration testing purposes. OSINT is a valuable source of information for penetration testers and the domain could benefit from research that combines AI with OSINT in this context. Future research in this area is progressing in several directions that could either address some of these limitations or provide novel directions. These future research pathways include adding multi-lingual support to OSINT AI models, incorporating additional data sources, improving model robustness against misinformation and data poisoning, testing platforms in real-world situations and finally adding alert generation and dissemination functions.

Data availability

Data in relation to this study is available on figshare

Abbreviations

PRISMA: Preferred reporting items for systematic reviews and meta-analyses
OSINT: Open-source intelligence
AI: Artificial intelligence
RNN: Recurrent neural networks
LSTM: Long short-term memory
DNS: Domain name system
CNN: Convolutional neural network
NLP: Natural language processing
SQL: Structured query language
DGA: Domain name generation algorithm
AUC: Area under curve
APT: Annotated probabilistic temporal logic

References

AI, O.: GPT-4 (2024). https://openai.com/gpt-4
Aboul-Ela, A.: Sublist3r. Github (2020). https://github.com/aboul3la/Sublist3r
Aggarwal, K.: DataSploit. https://github.com/DataSploit/datasploit/commits?author=KunalAggarwal (2023). Accessed 17 July 2023
Ahmed, H., Traore, I., Saad, S.: Detection of online fake news using n-gram analysis and machine learning techniques. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 10618 LNCS (December), pp. 127–138 (2017). https://doi.org/10.1007/978-3-319-69155-8_9
Alamsyah, A., Rizkika, W., Nugroho, D.D.A., Renaldi, F., Saadah, S.: Dynamic large scale data on twitter using sentiment analysis and topic modeling. In: 2018 6th International Conference on Information and Communication Technology (ICoICT), pp. 254–258. IEEE (2018). https://doi.org/10.1109/ICoICT.2018.8528776. https://ieeexplore.ieee.org/document/8528776/
Alguliyev, R.M., Aliguliyev, R.M., Abdullayeva, F.J.: The improved lstm and cnn models for ddos attacks prediction in social media. Int. J. Cyber Warfare Terror. 9(1), 1–18 (2019). https://doi.org/10.4018/IJCWT.2019010101
Aliprandi, C., De Luca, A.E., Di Pietro, G., Raffaelli, M., Gazzè, D., La Polla, M.N., Marchetti, A., Tesconi, M.: Caper: Crawling and analysing facebook for intelligence purposes. In: ASONAM 2014 - Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 665–669. IEEE (2014). https://doi.org/10.1109/ASONAM.2014.6921656
Alves, F., Ferreira, P.M., Bessani, A.: Design of a classification model for a twitter-based streaming threat monitor. In: Proceedings - 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshop, DSN-W 2019, pp. 9–14 (2019). https://doi.org/10.1109/DSN-W.2019.00010
Alves, F., Bettini, A., Ferreira, P.M., Bessani, A.: Processing tweets for cybersecurity threat awareness. Inf. Syst. 95, 101586 (2021). https://doi.org/10.1016/j.is.2020.101586
API, T.: Tweepy API. https://github.com/tweepy/tweepy/blob/cc8dd493b1ed04b48a6dd4e47eeb9a9064f83024/docs/api.rst (2022). Accessed 05 December 2022
Attardi, G., Sartiano, D.: Bidirectional lstm models for dga classification. In: International Symposium on Security in Computing and Communication - SSCC 2018: Security in Computing and Communications, pp. 687–694. Springer (2019). https://doi.org/10.1007/978-981-13-5826-5_54
Backfried, G., Shalunts, G.: Sentiment analysis of media in German on the refugee crisis in Europe. In: Díaz, P., Bellamine Ben Saoud, N., Dugdale, J., Hanachi, C. (eds.) Third International Conference, ISCRAM-med 2016, Madrid, Spain, October 26-28, 2016, Proceedings. Lecture Notes in Business Information Processing, vol. 265, pp. 234–241. Springer, Cham (2016)
Badjatiya, P., Gupta, S., Gupta, M., Varma, V.: Deep learning for hate speech detection in tweets. In: Proceedings of the 26th International Conference on World Wide Web Companion - WWW ’17 Companion, pp. 759–760. ACM Press, New York, New York (2017). https://doi.org/10.1145/3041021.3054223 . http://dl.acm.org/citation.cfm?doid=3041021.3054223
Badr, E.M., Salam, M.A., Ali, M., Ahmed, H.: Social media sentiment analysis using machine learning and optimization techniques. Int. J. Comput. Appl. 178(41), 31–36 (2019). https://doi.org/10.5120/ijca2019919306
Batarseh, F.A.: In: Schintler, L.A., McNeely, C.L. (eds.) Business Intelligence Analytics, pp. 141–145. Springer, Cham (2022). https://doi.org/10.1007/978-3-319-32010-6_253
Behzadan, V., Aguirre, C., Bose, A., Hsu, W.: Corpus and deep learning classifier for collection of cyber threat indicators in twitter stream. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 5002–5007. IEEE (2018). https://doi.org/10.1109/BigData.2018.8622506. https://ieeexplore.ieee.org/document/8622506/
Bijalwan, V., Kumar, V., Kumari, P., Pascual, J.: Knn based machine learning approach for text and document mining. Int. J. Datab. Theory Appl. 7(1), 61–70 (2014)
Bird, S.: Natural language toolkit. NLTK Team (2022). https://www.nltk.org/
Blindfuzzy: Low Hanging Fruit. https://github.com/blindfuzzy/LHF (2023). Accessed 17 July 2023
Blocklist: Block List. http://www.blocklist.de/en/index.html (2023). Accessed 17 July 2023
BuiltWith: BuiltWith. https://builtwith.com/ (2022). Accessed 05 December 2022
Catania, C., García, S., Torres, P.: Deep convolutional neural networks for dga detection. In: Argentine Congress of Computer Science - CACIC 2018: Computer Science – CACIC 2018, pp. 327–340. Springer (2019). https://doi.org/10.1007/978-3-030-20787-8_23
Chauhan, S., Panda, N.K.: Osint tools and techniques. In: Hacking Web Intelligence, pp. 101–131. Elsevier, Amsterdam (2015). https://doi.org/10.1016/B978-0-12-801867-5.00006-9
Choudhary, C., Sivaguru, R., Pereira, M., Yu, B., Nascimento, A.C., De Cock, M.: Algorithmically generated domain detection and malware family classification. In: International Symposium on Security in Computing and Communication SSCC 2018: Security in Computing and Communications, pp. 640–655. Springer (2019). https://doi.org/10.1007/978-981-13-5826-5_50
CMS, W.: What CMS. https://whatcms.org/ (2022). Accessed 05 December 2022
Computing Machinery (ACM), A.: ACM Digital Library. https://dl.acm.org/ (2021). Accessed 08 November 2021
Corporation, M.: ATT&CK. https://attack.mitre.org/ (2023). Accessed 17 July 2023
Danda, M.: Open Source Intelligence and Cybersecurity. PhD thesis, Webster University (2019). https://mattdanda.com/wp-content/uploads/2019/05/Paper-OSINT.pdf
Das Bhattacharjee, S., Talukder, A., Balantrapu, B.V.: Active learning based news veracity detection with feature weighting and deep-shallow fusion. In: 2017 IEEE International Conference on Big Data (Big Data), vol. 2018-January, pp. 556–565. IEEE (2017). https://doi.org/10.1109/BigData.2017.8257971. http://ieeexplore.ieee.org/document/8257971/
De Smedt, T., De Pauw, G., Van Ostaeyen, P.: Automatic Detection of Online Jihadist Hate Speech. Technical Report February, University of Antwerp, Antwerp (Mar 2018). https://doi.org/10.13140/rg.2.2.28155.41767. arXiv:1803.04596
Di Pietro, G., Aliprandi, C., De Luca, A.E., Raffaelli, M., Soru, T.: Semantic crawling: an approach based on named entity recognition. In: ASONAM 2014 - Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (2014). https://doi.org/10.1109/ASONAM.2014.6921661. https://ieeexplore.ieee.org/document/6921661
DiBona, P., Ho, S.-S.: Automated information foraging for sensemaking. In: Pham, T. (ed.) Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, p. 12. SPIE (2019). https://doi.org/10.1117/12.2518893. https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11006/110060D/Automated-information-foraging-for-sensemaking/10.1117/12.2518893.short https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11006/2518893/Automated-information-f
Dionisio, N., Alves, F., Ferreira, P.M., Bessani, A.: Cyberthreat detection from twitter using deep neural networks. In: 2019 International Joint Conference on Neural Networks (IJCNN), vol. 2019-July, pp. 1–8. IEEE (2019). https://doi.org/10.1109/IJCNN.2019.8852475. https://ieeexplore.ieee.org/document/8852475/
Dionisio, N., Alves, F., Ferreira, P.M., Bessani, A.: Towards end-to-end cyberthreat detection from twitter using multi-task learning. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2020). https://doi.org/10.1109/IJCNN48605.2020.9207159. https://ieeexplore.ieee.org/document/9207159/
Direct, S.: Science Direct. https://www.sciencedirect.com/ (2021). Accessed 15 November 2021
Disboard: Disboard - Cyber Security. https://disboard.org/search?keyword=cyber+security (2023). Accessed 27 January 2023
Discord: Discord. https://discord.com/ (2023). Accessed 27 January 2023
Drichel, A., Drury, V., Brandt, J., Meyer, U.: Finding phish in a haystack: a pipeline for phishing classification on certificate transparency logs. In: The 16th International Conference on Availability, Reliability and Security, pp. 1–12. ACM, New York, NY (2021). https://doi.org/10.1145/3465481.3470111. https://dl.acm.org/doi/10.1145/3465481.3470111
Drus, Z., Khalid, H.: Sentiment analysis in social media and its application: systematic literature review. Procedia Comput. Sci. 161, 707–714 (2019). https://doi.org/10.1016/j.procs.2019.11.174
Dughyala, N., Potluri, S., Sumesh, K.J., Pavithran, V.: Automating the detection of cyberstalking. In: Proceedings of the 2nd International Conference on Electronics and Sustainable Communication Systems, ICESC 2021, pp. 887–892 (2021). https://doi.org/10.1109/ICESC51422.2021.9532858
Ebrahimi, M., Suen, C.Y., Ormandjieva, O.: Detecting predatory conversations in social media by deep convolutional neural networks. Digit. Investig. 18, 33–49 (2016). https://doi.org/10.1016/j.diin.2016.07.001
Edwards, M., Rashid, A., Rayson, P.: A systematic survey of online data mining technology intended for law enforcement. ACM Comput. Surv. (2015). https://doi.org/10.1145/2811403
Eiji Aramaki, Sachiko Maskawa, M.M.: Twitter catches the flu: detecting influenza epidemics using twitter. In: EMNLP 11: proceedings of the conference on empirical methods in natural language processing, pp. 1568–1576 (2011). https://doi.org/10.5555/2145432.2145600. https://dl.acm.org/doi/10.5555/2145432.2145600
Ekwunife, N., Ekwunife, N.: National security intelligence through social network data mining. In: Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020, pp. 2270–2273 (2020). https://doi.org/10.1109/BigData50022.2020.9377940
Eldridge, C., Hobbs, C., Moran, M.: Fusing algorithms and analysts: open-source intelligence in the age of ‘big data’. Intell. Natl. Secur. 33(3), 391–406 (2018). https://doi.org/10.1080/02684527.2017.1406677
Evangelista, J.R.G., Sassi, R.J., Romero, M., Napolitano, D.: Systematic literature review to investigate the application of open source intelligence (osint) with artificial intelligence. J. Appl. Secur. Res. 16(3), 345–369 (2021). https://doi.org/10.1080/19361610.2020.1761737
Exploit-db: Google Hacking Database. https://www.exploit-db.com/google-hacking-database (2022). Accessed 13 December 2022
Fang, Y., Liu, Y., Huang, C., Liu, L.: Fastembed: predicting vulnerability exploitation possibility based on ensemble machine learning algorithm. PLoS ONE 15(2), 0228439 (2020). https://doi.org/10.1371/journal.pone.0228439
Farina, A., Ortenzi, L., Ristic, B., Skvortsov, A.: Integrated sensor systems and data fusion for homeland protection. In: Academic Press Library in Signal Processing: Volume 2 Communications and Radar Signal Processing vol. 2, pp. 1245–1320. Elsevier Masson SAS (2014). https://doi.org/10.1016/B978-0-12-396500-4.00022-3. https://linkinghub.elsevier.com/retrieve/pii/B9780123965004000223
Fenza, G., Gallo, M., Loia, V., Volpe, A.: Cognitive name-face association through context-aware graph neural network. Neural Comput. Appl. (2021). https://doi.org/10.1007/s00521-021-06617-z
Framework, O.: OSINT Framework (2023)
Galán-GarcÍa, P., Puerta, J.G.D.L., Gómez, C.L., Santos, I., Bringas, P.G.: Supervised machine learning for the detection of troll profiles in twitter social network: application to a real case of cyberbullying. Log. J. IGPL 24(1), 048 (2015). https://doi.org/10.1093/jigpal/jzv048
Gao, J., Yin, Y., Myers, K.R., Lakhani, K.R., Wang, D.: Potentially long-lasting effects of the pandemic on scientists. Nat. Commun. 12, 6188 (2021). https://doi.org/10.1038/s41467-021-26428-z
García Lozano, M., Schreiber, J., Brynielsson, J.: Tracking geographical locations using a geo-aware topic model for analyzing social media data. Decis. Support Syst. 99, 18–29 (2017). https://doi.org/10.1016/j.dss.2017.05.006
García Lozano, M., Brynielsson, J., Franke, U., Rosell, M., Tjörnhammar, E., Varga, S., Vlassov, V.: Veracity assessment of online data. Decis. Support Syst. 129(July 2019), 113132 (2020). https://doi.org/10.1016/j.dss.2019.113132
Garzia, F., Cusani, R., Borghini, F., Saltini, B., Lombardi, M., Ramalingam, S.: Perceived risk assessment through open-source intelligent techniques for opinion mining and sentiment analysis: the case study of the papal basilica and sacred convent of saint francis in assisi, italy. In: 2018 International Carnahan Conference on Security Technology (ICCST), vol. 2018-Octob, pp. 1–5. IEEE (2018). https://doi.org/10.1109/CCST.2018.8585519. https://ieeexplore.ieee.org/document/8585519/
Gautam, A.S., Gahlot, Y., Kamat, P.: Hacker forum exploit and classification for proactive cyber threat intelligence. In: Lecture Notes in Networks and Systems, vol. 98, pp. 279–285. Springer (2020). https://doi.org/10.1007/978-3-030-33846-6_32
Geetha, R., Karthika, S., Kumaraguru, P.: Tweet-scan-post: a system for analysis of sensitive private data disclosure in online social media. Knowl. Inf. Syst. 63(9), 2365–2404 (2021). https://doi.org/10.1007/s10115-021-01592-2
Ghazi, Y., Anwar, Z., Mumtaz, R., Saleem, S., Tahir, A.: A supervised machine learning based approach for automatically extracting high-level threat intelligence from unstructured sources. In: 2018 International Conference on Frontiers of Information Technology, FIT 2018, pp. 129–134. IEEE (2018). https://doi.org/10.1109/FIT.2018.00030 . https://ieeexplore.ieee.org/document/8616979
Giacalone, M., Buondonno, A., Romano, A., Santarcangelo, V.: Innovative methods for the development of a notoriety system. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 10147 LNAI, 218–225 (2017). https://doi.org/10.1007/978-3-319-52962-2_19
Giachanou, A., Crestani, F.: Like it or not - a survey of twitter sentiment analysis methods. ACM Comput. Surv. 49(2), 1–41 (2016). https://doi.org/10.1145/2938640
Goel, S., Sachdeva, N., Kumaraguru, P., Subramanyam, A.V., Gupta, D.: Pichunt: Social Media Image Retrieval for Improved Law Enforcement. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 10046 LNCS, pp. 206–223 (2016). https://doi.org/10.1007/978-3-319-47880-7_13. arXiv:1608.00905
Goldszmidt, M., Najork, M., Paparizos, S.: Boot-strapping language identifiers for short colloquial postings. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) vol. 8189, LNAI, pp. 95–111 (2013). https://doi.org/10.1007/978-3-642-40991-2_7
Gong, S., Lee, C.: Efficient Data Noise-Reduction for Cyber Threat Intelligence System, vol. 715, pp. 591–597. Springer, New York (2021)
Google: Google Data Studio. Google. https://datastudio.google.com/ (2022)
Google: Google Scholar (2021)
Google: Google Sheets. Google (2022)
Grandi, R., Neri, F.: Sentiment analysis and city branding. In: Catania, B., Cerquitelli, T., Chiusano, S., Guerrini, G., Kämpf, M., Kemper, A., Novikov, B., Palpanas, T., Pokorný, J., Vakali, A. (eds.) Advances in Intelligent Systems and Computing. Advances in Intelligent Systems and Computing, vol. 241, pp. 339–349. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-01863-8_36. http://link.springer.com/10.1007/978-3-319-01863-8
Grant, T.: Building an ontology for planning attacks that minimize collateral damage: literature survey. In: 14th International Conference on Cyber Warfare and Security, ICCWS 2019, pp. 78–86 (2019)
Grepp, A.: Grepp.app. https://grep.app/ (2022). Accessed 0 December 2022
Gupta, A., Pruthi, J., Sahu, N.: Sentiment analysis of tweets using machine learning approach. Int. J. Comput. Sci. Mob. Comput. 6(4), 444–458 (2017)
Haddi, E., Liu, X., Shi, Y.: The role of text pre-processing in sentiment analysis. Procedia Comput. Sci. 17, 26–32 (2013). https://doi.org/10.1016/j.procs.2013.05.005
Haoxiang, D.W.: Emotional analysis of bogus statistics in social media. J. Ubiquitous Comput. Commun. Technol. 2(3), 178–186 (2020). https://doi.org/10.36548/jucct.2020.3.006
Hasan, A., Moin, S., Karim, A., Shamshirband, S.: Machine learning-based sentiment analysis for twitter accounts. Math. Comput. Appl. 23(1), 11 (2018). https://doi.org/10.3390/mca23010011
Hassan, N.A.: Open Source Intelligence Methods and Tools: A Practical Guide to Online Intelligence. Apress, New York (2018)
Hassan, N.A., Hijazi, R.: Search engine techniques. In: Open Source Intelligence Methods and Tools, pp. 127–201. Apress, Berkeley, CA (2018)
Hernandez Mediná, M.J., Pinzón Hernández, C.C., Díaz López, D.O., Garcia Ruiz, J.C., Pinto Rico, R.A.: Open source intelligence (osint) in a colombian context and sentiment analysis. Revista vínculos 15(2), 195–214 (2018). https://doi.org/10.14483/2322939X.13504
Hernandez-Suarez, A., Sanchez-Perez, G., Toscano-Medina, K., Martinez-Hernandez, V., Perez-Meana, H., Olivares-Mercado, J., Sanchez, V.: Social sentiment sensor in twitter for predicting cyber-attacks using \(\ell_1\) regularization. Sensors 18(5), 1380 (2018). https://doi.org/10.3390/s18051380
Herrera-Cubides, J.F., Gaona-García, P.A., Sánchez-Alonso, S.: Open-source intelligence educational resources: a visual perspective analysis. Appl. Sci. (Switzerland) 10(21), 1–25 (2020). https://doi.org/10.3390/app10217617
Hiransha, M.E.A.G., Gopalakrishnan, E.A., Menon, V.K., Soman, K.P.: Nse stock market prediction using deep-learning models. Procedia Comput. Sci. 132(Iccids), 1351–1362 (2018). https://doi.org/10.1016/j.procs.2018.05.050
Holder, E., Wang, N.: Correction to: Explainable artificial intelligence (xai) interactively working with humans as a junior cyber analyst. Hum. Intell. Syst. Integr. (2021). https://doi.org/10.1007/s42454-021-00039-x
Holt, T.J., Bossler, A.M.: The palgrave handbook of international cybercrime and cyberdeviance. In: The Palgrave Handbook of International Cybercrime and Cyberdeviance, pp. 1–1489 (2020). Chap. 7. https://doi.org/10.1007/978-3-319-78440-3
Hoppa, M.A., Debb, S.M., Hsieh, G., KC, B.: Twitterosint: automated open source intelligence collection, analysis & visualization tool. In: Annual Review of CyberTherapy and Telemedicine, pp. 121–128 (2019). https://www.proquest.com/docview/2153621548?pq-origsite=gscholar&fromopenview=true
Howells, K., Ertugan, A.: Applying fuzzy logic for sentiment analysis of social media network data in marketing. Procedia Comput. Sci. 120(January), 664–670 (2017). https://doi.org/10.1016/j.procs.2017.11.293
Huang, Y.T., Lin, C.Y., Guo, Y.R., Lo, K.C., Sun, Y.S., Chen, M.C.: Open source intelligence for malicious behavior discovery and interpretation. IEEE Trans. Dependable Secure Comput. 5971(c), 1–14 (2021). https://doi.org/10.1109/TDSC.2021.3119008
Hutto, C.J. and Gilbert, E.: Vader: A parsimonious rule-based model for sentiment analysis of social media text. In: Eighth International AAAI Conference on Weblogs and Social Media, p. 18 (2014)
IDC: Data Creation and Replication Will Grow at a Faster Rate than Installed Storage Capacity, According to the IDC Global DataSphere and StorageSphere Forecast (2021). https://www.idc.com/getdoc.jsp?containerId=prUS47560321 Accessed 02 November 2021
IEEE: IEEE Xplore. https://ieeexplore.ieee.org/ (2021). Accessed 07 November 2021
Iorga, D., Corlatescu, D., Grigorescu, O., Sandescu, C., Dascalu, M., Rughinis, R.: Early detection of vulnerabilities from news websites using machine learning models. In: Proceedings - RoEduNet IEEE International Conference, vol. 2020-December (2020). https://doi.org/10.1109/RoEduNet51892.2020.9324852
Ish, D., Ettinger, J., Ferris, C.: Evaluating the effectiveness of artificial intelligence systems in intelligence. Analysis (2021). https://doi.org/10.7249/rr-a464-1
ITU: Measuring Digital Development Facts And Figures 2020, pp. 1–15. ITU Publications (2020)
Iwona Chomiak-Orsa, Artur Rot, B.B.: Artificial intelligence in cybersecurity: The use of ai along the cyber kill chain. In: International Conference on Computational Collective Intelligence, pp. 406–416. Springer (2019). https://doi.org/10.1007/978-3-030-28374-2. https://doi.org/10.1007/978-3-030-28374-2_35
Jain, P., Bendapudi, H., Rao, S.: Eequest: an event extraction and query system. In: Proceedings of the 9th Annual ACM India Conference, pp. 59–66. ACM, New York, NY, USA (2016). https://doi.org/10.1145/2998476.2998482. https://dl.acm.org/doi/10.1145/2998476.2998482
Jain, S., Sharma, V., Kaushal, R.: Towards automated real-time detection of misinformation on twitter. In: 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2015–2020. IEEE (2016). https://doi.org/10.1109/ICACCI.2016.7732347. http://ieeexplore.ieee.org/document/7732347/
Jeon, S., Moon, J.: Malware-detection method with a convolutional recurrent neural network using opcode sequences. Inf. Sci. 535, 1–15 (2020). https://doi.org/10.1016/j.ins.2020.05.026
Jin, Z., Cao, J., Guo, H., Zhang, Y., Luo, J.: Multimodal fusion with recurrent neural networks for rumor detection on microblogs. In: Proceedings of the 25th ACM International Conference on Multimedia, pp. 795–816. ACM, New York, NY, USA (2017). https://doi.org/10.1145/3123266.3123454 . https://dl.acm.org/doi/10.1145/3123266.3123454
Johnsen, J.W., Franke, K.: The impact of preprocessing in natural language for open source intelligence and criminal investigation. In: 2019 IEEE International Conference on Big Data (Big Data), pp. 4248–4254. IEEE (2019). https://doi.org/10.1109/BigData47090.2019.9006006. https://ieeexplore.ieee.org/document/9006006/
Johnson, D.O.: Overview of artificial intelligence. In: Medical Applications of Artificial Intelligence, pp. 27–46. CRC Press, Boca Raton (2013). https://doi.org/10.1201/b15618-6
Josan, G.S., Kaur, J.: Lstm network based malicious domain name detection. Int. J. Eng. Adv. Technol. 8(6), 3187–3191 (2019). https://doi.org/10.35940/ijeat.F8809.088619
Ju, Y., Li, Q., Liu, H.Y., Cui, X.M., Wang, Z.H.: Study on application of open source intelligence from social media in the military. J. Phys. Conf. Ser. (2020). https://doi.org/10.1088/1742-6596/1507/5/052017
Jung, D., Tuan, V.T., Tran, D.Q., Park, M., Park, S.: Conceptual framework of an intelligent decision support system for smart city disaster management. Appl. Sci. (Switzerland) (2020). https://doi.org/10.3390/app10020666
Jyothsna, P.V., Prabha, G., Shahina, K.K., Vazhayil, A.: Detecting dga using deep neural networks (dnns). In: International Symposium on Security in Computing and Communication - SSCC 2018: Security in Computing and Communications, pp. 695–706. Springer (2019). https://doi.org/10.1007/978-981-13-5826-5_55
Kaiser, S., Ferens, K.: Variance fractal dimension feature selection for detection of cyber security attacks. In: Transactions on Computational Science and Computational Intelligence, pp. 1029–1045 (2021). https://doi.org/10.1007/978-3-030-70296-0_82
Kallus, N.: On the predictive power of web intelligence and social media the best way to predict the future is to tweet it. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9546, pp. 26–45 (2016). https://doi.org/10.1007/978-3-319-29009-6_2
Karthika, S., Bhalaji, N., Chithra, S., Sri Harikarthick, N., Bhattacharya, D.: Noregint-a tool for performing osint and analysis from social media. In: Lecture Notes in Networks and Systems, vol. 173 LNNS, pp. 971–980. Springer (2021). https://doi.org/10.1007/978-981-33-4305-4_71
Kashyap, G.S., Malik, K., Wazir, S., Khan, R.: Using machine learning to quantify the multimedia risk due to fuzzing. Multimedia Tools and Applications (2021). https://doi.org/10.1007/s11042-021-11558-9
Katz, B.: The analytic edge: Leveraging Emerging Technologies to Transform Intelligence Analysis. Technical Report, Center for Strategic and International Studies (CSIS) (2020). https://www.jstor.org/stable/resrep26414?seq=1#metadata_info_tab_contents
Kawaguchi, Y., Yamada, A., Ozawa, S.: Ai web-contents analyzer for monitoring underground marketplace. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10638 LNCS, pp. 888–896 (2017). https://doi.org/10.1007/978-3-319-70139-4_90
Kelly, J., Delaus, M., Hemberg, E., O’Reilly, U.M.: Adversarially adapting deceptive views and reconnaissance scans on a software defined network. In: 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), pp. 49–54 (2019)
Khalid, W., Soleymani, I., Mortensen, N.H., Sigsgaard, K.V.: Ai-Based Maintenance Scheduling for Offshore Oil and Gas Platforms, vol. 2021-May, pp. 1–6. IEEE (2021). https://doi.org/10.1109/RAMS48097.2021.9605794 . https://ieeexplore.ieee.org/document/9605794/
Khan, M., Rehman, O., Rahman, I.M.H., Ali, S.: Lightweight testbed for cybersecurity experiments in scada-based systems. In: 2020 International Conference on Computing and Information Technology (ICCIT-1441), pp. 1–5. IEEE (2020). https://doi.org/10.1109/ICCIT-144147971.2020.9213791. https://ieeexplore.ieee.org/document/9213791/
Khan, J.Y., Khondaker, M.T.I., Afroz, S., Uddin, G., Iqbal, A.: A benchmark study of machine learning models for online fake news detection. Mach. Learn. Appl. 4(May), 100032 (2021). https://doi.org/10.1016/j.mlwa.2021.100032
Khurana, N., Mittal, S., Piplai, A., Joshi, A.: Preventing poisoning attacks on ai based threat intelligence systems. IEEE Int. Workshop Mach. Learn. Signal Process. MLSP (2019). https://doi.org/10.1109/MLSP.2019.8918803
Kleissner, P.: Intelligence X (2022). https://intelx.io/ Accessed 15 May 2022
Køien, G.M.: Initial reflections on the use of augmented cognition in derailing the kill chain. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12776 LNAI, pp. 433–451. Springer (2021). https://doi.org/10.1007/978-3-030-78114-9_30
Koloveas, P., Chantzios, T., Alevizopoulou, S., Skiadopoulos, S., Tryfonopoulos, C.: Intime: a machine learning-based framework for gathering and leveraging web data to cyber-threat intelligence. Electronics (Switzerland) (2021). https://doi.org/10.3390/electronics10070818
Kotinas, I., Fakotakis, N.: Text analysis for decision making under adversarial environments. In: ACM International Conference Proceeding Series (2018). https://doi.org/10.1145/3200947.3201018
Kotzé, E., Senekal, B.A., Daelemans, W.: Automatic classification of social media reports on violent incidents in South Africa using machine learning. S. Afr. J. Sci. 116(3–4), 4–11 (2020). https://doi.org/10.17159/sajs.2020/6557
Kuiler, E.W.: Natural language processing (nlp). In: Encyclopedia of Big Data, pp. 679–682. Springer, Cham (2022). https://doi.org/10.1007/978-3-319-32010-6_250
Kumar, M.S., Ben-Othman, J., Srinivasagan, K.G., Krishnan, G.U.: Artificial intelligence managed network defense system against port scanning outbreaks. In: Proceedings - International Conference on Vision Towards Emerging Trends in Communication and Networking, ViTECoN 2019, pp. 1–5. IEEE (2019). https://doi.org/10.1109/ViTECoN.2019.8899380. https://ieeexplore.ieee.org/document/8899380
Layton, R., Perez, C., Birregah, B., Watters, P., Lemercier, M.: Indirect Information Linkage for Osint Through Authorship Analysis of Aliases. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7867 LNAI, pp. 36–46 (2013). https://doi.org/10.1007/978-3-642-40319-4_4
Le, B.-D., Wang, G., Nasim, M., Babar, M.A.: Gathering cyber threat intelligence from twitter using novelty classification. In: 2019 International Conference on Cyberworlds (CW), pp. 316–323. IEEE (2019). https://doi.org/10.1109/CW.2019.00058. https://ieeexplore.ieee.org/document/8919107/
Lee, J., Moon, M., Shin, K., Kang, S.: Cyber threats prediction model based on artificial neural networks using quantification of open source intelligence (osint). J. Inf. Secur. 20(3), 115–123 (2020). https://doi.org/10.33778/kcsa.2020.20.3.115
Leibowicz, C.R., McGregor, S., Ovadya, A.: The deepfake detection dilemma: A multistakeholder exploration of adversarial dynamics in synthetic media. In: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, vol. 1, pp. 736–744. ACM, New York, NY, USA (2021). https://doi.org/10.1145/3461702.3462584 . https://dl.acm.org/doi/10.1145/3461702.3462584
Levchuk, G.M., Fouse, A., Pattipati, K., Serfaty, D., McCormack, R.: Active learning and structure adaptation in teams of heterogeneous agents: designing organizations of the future. In: Llinas, J., Hanratty, T.P. (eds.) Next-Generation Analyst VI, vol. 1065305, p. 4. SPIE, Orlando FL (2018). https://doi.org/10.1117/12.2305875 . https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10653/1065305/Active-learning-and-structure-adaptation-in-teams-of-heterogeneous-agents/10.1117/12.2305875.short. https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10653/2305
Levchuk, G., Pattipati, K., Fouse, A., Serfaty, D.: Application of free energy minimization to the design of adaptive multi-agent teams. In: Hall, R.D., Blowers, M., Williams, J. (eds.) Disruptive Technologies in Sensors and Sensor Systems, vol. 10206, p. 102060 (2017). https://doi.org/10.1117/12.2263542. https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10206/102060E/Application-of-free-energy-minimization-to-the-design-of-adaptive/10.1117/12.2263542.short http://proceedings.spiedigitallibrary.org/proceeding.aspx?doi=10.1117/12.2263542
Levchuk, G., Shabarekh, C.: Using soft-hard fusion for misinformation detection and pattern of life analysis in osint. In: Hanratty, T.P., Llinas, J. (eds.) SPIE Defense + Security, vol. 1020704, p. 1020704 (2017). https://doi.org/10.1117/12.2263546. https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10207/1020704/Using-soft-hard-fusion-for-misinformation-detection-and-pattern-of/10.1117/12.2263546.short http://proceedings.spiedigitallibrary.org/proceeding.aspx?doi=10.1117/12.2263546
Lewis, S.J.: OnionScan. https://github.com/s-rah/onionscan (2017). Accessed 17 July 2023
Liao, X., Yuan, K., Wang, X., Li, Z., Xing, L., Beyah, R.: Acing the ioc game. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 755–766. ACM, New York, NY, USA (2016). https://doi.org/10.1145/2976749.2978315. https://dl.acm.org/doi/10.1145/2976749.2978315
Lison, P., Mavroeidis, V.: Automatic Detection of Malware-Generated Domains with Recurrent Neural Models (2017). arXiv:1709.07102
Liu, X., Keliris, A., Konstantinou, C., Sazos, M., Maniatakos, M.: Assessment of low-budget targeted cyberattacks against power systems. In: IFIP/IEEE International Conference on Very Large Scale Integration - System on a Chip, pp. 232–256. Springer (2019). https://doi.org/10.1007/978-3-030-23425-6_12
Liu, X., Nourbakhsh, A., Li, Q., Shah, S., Martin, R., Duprey, J.: Reuters tracer: Toward automated news production using large scale social media data. In: 2017 IEEE International Conference on Big Data (Big Data), vol. 2018-Janua, pp. 1483–1493. IEEE (2017). https://doi.org/10.1109/BigData.2017.8258082. http://ieeexplore.ieee.org/document/8258082/
Liu, S., Wang, Y., Zhang, J., Chen, C., Xiang, Y.: Addressing the class imbalance problem in twitter spam detection using ensemble learning. Comput. Secur. 69(September 2014), 35–49 (2017). https://doi.org/10.1016/j.cose.2016.12.004
Jiang, L., Yu, M., Zhou, M., Liu, X., Zhao, T.: Target-dependent twitter sentiment classification. In: HLT ’11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 151–160. ACM (2011). https://doi.org/10.5555/2002472.2002492
Longo, L., Goebel, R., Lecue, F., Kieseberg, P., Holzinger, A.: Explainable artificial intelligence: Concepts, applications, research challenges and visions. In: Machine Learning and Knowledge Extraction. CD-MAKE 2020. Lecture Notes in Computer Science, vol. 12279, pp. 1–16 (2020). https://doi.org/10.1007/978-3-030-57321-8_1
López-Martínez, A., García-Díaz, J.A., Valencia-García, R., Ruiz-Martínez, A.: Cyberdect. A novel approach for cyberbullying detection on twitter. In: International Conference on Technologies and Innovation, pp. 109–121 (2019). https://doi.org/10.1007/978-3-030-34989-9_9
Lozano, M.G., Franke, U., Rosell, M., Vlassov, V.: Towards automatic veracity assessment of open source information. In: 2015 IEEE International Congress on Big Data, pp. 199–206. IEEE (2015). https://doi.org/10.1109/BigDataCongress.2015.36. http://ieeexplore.ieee.org/document/7207220/
Luber, M., Weisser, C., Säfken, B., Silbersdorff, A., Kneib, T., Kis-Katos, K.: Identifying topical shifts in twitter streams: an integration of non-negative matrix factorisation, sentiment analysis and structural break models for large scale data. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 12887 LNCS, pp. 33–49 (2021). https://doi.org/10.1007/978-3-030-87031-7_3
Luo, Y., Ao, S., Luo, N., Su, C., Yang, P., Jiang, Z.: Extracting threat intelligence relations using distant supervision and neural networks. In: Peterson, G., Shenoi, S. (eds.) IFIP Advances in Information and Communication Technology, vol. 306, pp. 193–211. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-88381-2_10. https://link.springer.com/10.1007/978-3-030-88381-2
Maciolek, P., Dobrowolski, G.: Cluo: web-scale text mining system for open source intelligence purposes. Comput. Sci. 14(1), 45 (2013). https://doi.org/10.7494/csci.2013.14.1.45
Mackey, T., Kalyanam, J., Klugman, J., Kuzmenko, E., Gupta, R.: Solution to detect, classify, and report illicit online marketing and sales of controlled substances via twitter: using machine learning and web forensics to combat digital opioid access. J. Med. Internet Res. 20(4), 10029 (2018). https://doi.org/10.2196/10029
Madakam, S., Holmukhe, R.M., Kumar Jaiswal, D.: The future digital work force: robotic process automation (rpa). J. Inf. Syst. Technol. Manag. 16, 1–17 (2019). https://doi.org/10.4301/S1807-1775201916001
Mahaini, M.I., Li, S.: Detecting cyber security related twitter accounts and different sub-groups: a multi-classifier approach. In: ASONAM 2021: The 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 1–11. ACM, Netherlands (2021). https://doi.org/10.1145/3487351.3492716. https://kar.kent.ac.uk/90995/
Major, M., Fugate, S., Mauger, J., Ferguson-Walter, K.: Creating cyber deception games. In: Proceedings - 2019 IEEE 1st International Conference on Cognitive Machine Intelligence, CogMI, pp. 102–111 (2019). https://doi.org/10.1109/CogMI48466.2019.00023
Maksimova, E.A., Sadovnikova, N.P., Baranov, V.V., Gromov, Y.Y., Lauta, O.S., Tret’yakova, L.V.: Robot technological system of analysis of cybersecurity information systems and communication networks. In: Journal of Physics: Conference Series, vol. 1661 (2020). https://doi.org/10.1088/1742-6596/1661/1/012119
Malik, J., Akhunzada, A., Bibi, I., Imran, M., Musaddiq, A., Kim, S.W.: Hybrid deep learning: an efficient reconnaissance and surveillance detection mechanism in sdn. IEEE Access 8, 134695–134706 (2020). https://doi.org/10.1109/ACCESS.2020.3009849
Maltego: Maltego. Maltego (2022). https://www.maltego.com/
Mani, G.S.: Data Processing and Analytics for National Security Intelligence: An Overview, vol. 71, pp. 293–315. Springer, New York (2022). https://doi.org/10.1007/978-981-16-2937-2_20
Mantere, M., Sailio, M., Noponen, S.: Detecting anomalies in printed intelligence factory network. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8924, pp. 1–16 (2015). https://doi.org/10.1007/978-3-319-17127-2_1
Pennacchiotti, M., Popescu, A.-M.: A machine learning approach to twitter user classification. In: Proceedings of the International AAAI Conference on Web and Social Media, pp. 281–288 (2021). https://ojs.aaai.org/index.php/ICWSM/article/view/14139
Marin, E., Almukaynizi, M., Shakarian, P.: Inductive and deductive reasoning to assist in cyber-attack prediction. In: 2020 10th Annual Computing and Communication Workshop and Conference, CCWC 2020, pp. 262–268 (2020) https://doi.org/10.1109/CCWC47524.2020.9031154
Marlin, T.J.: Detecting Fake News by Combining Cybersecurity, Open-source Intelligence, and Data Science. PhD Thesis, Utica College (2019). https://search.proquest.com/docview/2346618330?accountid=14478
Marques, C., Malta, S., Magalhães, J.P.: Dns dataset for malicious domains detection. Data Brief 38, 107342 (2021). https://doi.org/10.1016/j.dib.2021.107342
Martinez Monterrubio, S.M., Noain-Sánchez, A., Verdú Pérez, E., González Crespo, R.: Coronavirus fake news detection via medosint check in health care official bulletins with cbr explanation: The way to find the real information source through osint, the verifier tool for official journals. Inf. Sci. 574, 210–237 (2021). https://doi.org/10.1016/j.ins.2021.05.074
Martorella, C.: theHarvester. Edge Security Research (2019). https://github.com/laramies/theharvester
Masombuka, M., Grobler, M., Watson, B.: Towards an Artificial Intelligence Framework to Actively Defend Cyberspace. PhD thesis, University of Stellenbosch (2018)
Medenou, R.D., Mayo, V.M.C., Balufo, M.G., Castrillo, M.P., Garrido, F.J.G., Martinez, A.L., Catalán, D.N., Hu, A., Rodríguez-Bermejo, D.S., Vidal, J.M., De Riquelme, G.R.P., Berardi, A., De Santis, P., Torelli, F., Sanchez, S.L.: Cysas-s3: A novel dataset for validating cyber situational awareness related tools for supporting military operations. In: Proceedings of the 15th International Conference on Availability, Reliability and Security (2020). https://doi.org/10.1145/3407023.3409222. https://dl.acm.org/doi/10.1145/3407023.3409222
Mensikova, A., Mattmann, C.A., Gov, C.A.N.: Ensemble sentiment analysis to identify human trafficking in web data. ACM 1(February), 5 (2018)
Micallef, S.: Spiderfoot. Spiderfoot (2021). https://github.com/smicallef/spiderfoot/releases
Microsoft: Microsoft Excel - Mac Edition. Microsoft (2022)
Miehling, E., Dong, R., Langbort, C., Basar, T.: Strategic inference with a single private sample. In: 2019 IEEE 58th Conference on Decision and Control (CDC), vol. December, pp. 2188–2193. IEEE (2019). https://doi.org/10.1109/CDC40024.2019.9029544 . https://ieeexplore.ieee.org/document/9029544/
Mittal, S., Joshi, A., Finin, T.: Cyber-all-intel: an ai for security related threat intelligence. arXiv preprint, 1–13 (2019) arXiv:1905.02895
Mohan, V.S., R, V., KP, S., Poornachandran, P.: S.p.o.o.f net: syntactic patterns for identification of ominous online factors. In: 2018 IEEE Security and Privacy Workshops (SPW), pp. 258–263. IEEE (2018). https://doi.org/10.1109/SPW.2018.00041. https://ieeexplore.ieee.org/document/8424657/
Momtazi, S.: Fine-grained german sentiment analysis on social media. In: Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, pp. 1215–1220 (2012)
Morel, B.: Artificial intelligence a key to the future of cybersecurity. In: Proceedings of the ACM Conference on Computer and Communications Security, pp. 93–97. ACM (2011). https://doi.org/10.1145/2046684.2046699. https://dl.acm.org/doi/10.1145/2046684.2046699
Morgenstern, M.: Search is back. https://searchisback.com/ (2023). Accessed 17 July 2023
Motion: Motion (2024). https://www.usemotion.com/
Mubarak, S., et al.: Industrial datasets with ICS testbed and attack detection using machine learning techniques. Intell. Autom. Soft Comput. 31(3), 1345–1360 (2022)
Mubin, O., Alnajjar, F., Shamail, A., Shahid, S., Simoff, S.: The new norm: computer science conferences respond to covid-19. Scientometrics 126, 1813–1827 (2021). https://doi.org/10.1007/s11192-020-03788-9
Wirkuttis, N., Klein, H.: Artificial intelligence in cybersecurity. Cyber Intell. Secur. 1(1), 103–118 (2017)
Nagapawan, Y.V.R., Prakash, K.B., Kanagachidambaresan, G.R.: Convolutional neural network. In: EAI/Springer Innovations in Communication and Computing, pp. 45–51 (2021). https://doi.org/10.1007/978-3-030-57077-4_6
Naiknaware, B., Kushwaha, B., Kawathekar, S.: Social media sentiment analysis using machine learning classifiers. Int. J. Comput. Sci. Mob. Comput. 6(6), 465–472 (2017)
Nakov, P., Ritter, A., Rosenthal, S., Sebastiani, F., Stoyanov, V.: Semeval-2016 task 4: Sentiment analysis in twitter. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp. 1–18. Association for Computational Linguistics, Stroudsburg, PA, USA (2016). https://doi.org/10.18653/v1/S16-1001. http://aclweb.org/anthology/S16-1001
Namihira, Y., Segawa, N., Ikegami, Y., Kawai, K., Kawabe, T., Tsuruta, S.: High precision credibility analysis of information on twitter. In: 2013 International Conference on Signal-Image Technology & Internet-Based Systems, pp. 909–915. IEEE (2013). https://doi.org/10.1109/SITIS.2013.148. http://ieeexplore.ieee.org/document/6727298/
Neri, F., Aliprandi, C., Capeci, F., Cuadros, M., By, T.: Sentiment analysis on social media. In: Proceedings of the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2012, pp. 919–926. IEEE (2012). https://doi.org/10.1109/ASONAM.2012.164
Neuman, Y., Lev-Ran, Y., Erez, E.S.: Screening for potential school shooters through the weight of evidence. Heliyon 6(10), 05066 (2020). https://doi.org/10.1016/j.heliyon.2020.e05066
Nicart, E., Zanuttini, B., Gilbert, H., Grilheres, B., Praca, F.: Building document treatment chains using reinforcement learning and intuitive feedback. In: 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 635–639. IEEE (2016). https://doi.org/10.1109/ICTAI.2016.0102. http://ieeexplore.ieee.org/document/7814662/
Nicart, E., Zanuttini, B., Grilhères, B., Giroux, P., Saval, A.: Amélioration continue d’une chaîne de traitement de documents avec l’apprentissage par renforcement. Revue d’intelligence Artificielle 31(6), 619–648 (2017). https://doi.org/10.3166/ria.31.619-648
Nila, C., Preda, M., Apostol, I., Patriciu, V.V.: Reactive wifi honeypot. In: Proceedings of the 13th International Conference on Electronics, Computers and Artificial Intelligence, ECAI 2021 (2021). https://doi.org/10.1109/ECAI52376.2021.9515048
NIST: National Vulnerability Database. https://nvd.nist.gov/ (2022). Accessed 22 March 2022
Noel, L.: Redai: A Machine Learning Approach to Cyber Threat Intelligence. PhD thesis, James Madison University (2021)
Noubours, S., Pritzkau, A., Schade, U.: Nlp as an essential ingredient of effective osint frameworks. In: 2013 Military Communications and Information Systems Conference, MCC 2013. Military University of Technology (2013)
Pagolu, V.S., Reddy, K.N., Panda, G., Majhi, B.: Sentiment analysis of twitter data for predicting stock market movements. In: 2016 International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES), pp. 1345–1350. IEEE (2016). https://doi.org/10.1109/SCOPES.2016.7955659. http://ieeexplore.ieee.org/document/7955659/
Pai, U., et al.: Open source intelligence and its applications in next generation cyber security - a literature review. Int. J. Appl. Eng. Manage. Lett. 5(2), 1–25 (2021). https://doi.org/10.47992/IJAEML.2581.7000.0100
Palmieri, R., Orabona, V., Cinque, N., Tangorra, S., Cappetta, D.: Reputation analysis towards discovery. In: Proceedings of the 6th International Conference on Data Science, Technology and Applications, pp. 321–330. SCITEPRESS - Science and Technology Publications (2017). https://doi.org/10.5220/0006487303210330. https://www.scitepress.org/papers/2017/64873/64873.pdf. http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0006487303210330
Panagiotou, A., Ghita, B., Shiaeles, S., Bendiab, K.: FaceWallGraph: Using Machine Learning for Profiling User Behaviour from Facebook Wall, vol. 11660 LNCS, pp. 125–134. Springer (2019). https://doi.org/10.1007/978-3-030-30859-9_11
Parashar, D., Sanagavarapu, L.M., Reddy, Y.R.: Sql injection vulnerability identification from text. In: ACM International Conference Proceeding Series (2021). https://doi.org/10.1145/3452383.3452405
Pastor-Galindo, J., Nespoli, P., Gomez Marmol, F., Martinez Perez, G.: The not yet exploited goldmine of osint: opportunities, open challenges and future trends. IEEE Access 8, 10282–10304 (2020). https://doi.org/10.1109/ACCESS.2020.2965257
Patil, P.P., Phansalkar, S., Kryssanov, V.V.: Topic modelling for aspect-level sentiment analysis. In: Advances in Intelligent Systems and Computing, vol. 828, pp. 221–229. Springer (2019). https://doi.org/10.1007/978-981-13-1610-4_23
Pavan Kumar, C.S., Dhinesh Babu, L.D.: Novel text preprocessing framework for sentiment analysis. In: Smart Innovation, Systems and Technologies, vol. 105, pp. 309–317. Springer (2019). https://doi.org/10.1007/978-981-13-1927-3_33
Pellet, H., Shiaeles, S., Stavrou, S.: Localising social network users and profiling their movement. Comput. Secur. 81, 49–57 (2019). https://doi.org/10.1016/j.cose.2018.10.009
Pelzer, R.: Policing of terrorism using data from social media. Eur J Secur Res 3(2), 163–179 (2018). https://doi.org/10.1007/s41125-018-0029-9
Perera, I., Hwang, J., Bayas, K., Dorr, B., Wilks, Y.: Cyberattack prediction through public text analysis and mini-theories. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 3001–3010. IEEE (2018). https://doi.org/10.1109/BigData.2018.8622106. https://ieeexplore.ieee.org/document/8622106/
Pingle, A., Piplai, A., Mittal, S., Joshi, A., Holt, J., Zak, R.: Relext: relation extraction using deep learning approaches for cybersecurity knowledge graph improvement. In: Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 879–886. ACM, New York, NY, USA (2019). https://doi.org/10.1145/3341161.3343519. https://dl.acm.org/doi/10.1145/3341161.3343519
Pratama, M.O., Satyawan, W., Jannati, R., Pamungkas, B., Raspiani, Syahputra, M.E., Neforawati, I.: The sentiment analysis of indonesia commuter line using machine learning based on twitter data. In: Journal of Physics: Conference Series, vol. 1193, p. 012029 (2019). https://doi.org/10.1088/1742-6596/1193/1/012029. https://iopscience.iop.org/article/10.1088/1742-6596/1193/1/012029
Queiroz, A., Keegan, B., Mtenzi, F.: Predicting software vulnerability using security discussion in social media. In: European Conference on Information Warfare and Security, ECCWS, pp. 628–634 (2017). https://www.semanticscholar.org/paper/Predicting-Software-Vulnerability-Using-Security-in-Queiroz-Keegan/3bcb4df05336060443638e71e9ee99c190c9109f
Rachman, F.F., Nooraeni, R., Yuliana, L.: Public opinion of transportation integrated (jak lingko), in dki jakarta, indonesia. Procedia Comput. Sci. 179(2020), 696–703 (2021). https://doi.org/10.1016/j.procs.2021.01.057
Radanliev, P., De Roure, D., Maple, C., Ani, U.: Methodology for integrating artificial intelligence in healthcare systems: learning from covid-19 to prepare for disease x. AI and Ethics (2021). https://doi.org/10.1007/s43681-021-00111-x
Radanliev, P., Roure, D.C.D., Walton, R., Van Kleek, M., Montalvo, R.M., Santos, O., Maddox, L., Cannady, S.: Covid-19 what have we learned? The rise of social machines and connected devices in pandemic management following the concepts of predictive, preventive and personalised medicine. EPMA J 2020(11), 311–332 (2020). https://doi.org/10.2139/ssrn.3692585
Rahul, K., Jindal, B.R., Singh, K., Meel, P.: Analysing public sentiments regarding covid-19 vaccine on twitter. In: 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 488–493. IEEE (2021). https://doi.org/10.1109/ICACCS51430.2021.9441693. https://ieeexplore.ieee.org/document/9441693/
Rahul, L., Meetei, L.S., Jayanna, H.S.: Statistical and Neural Machine Translation for Manipuri-English on Intelligence Domain, vol. 736 LNEE, pp. 249–257. Springer (2021). https://doi.org/10.1007/978-981-33-6987-0_21
Rajalakshmi, E., Asik Ibrahim, N., Subramaniyaswamy, V.: A survey of machine learning techniques used to combat against the advanced persistent threat. In: Communications in Computer and Information Science, pp. 159–172. Springer (2019). https://doi.org/10.1007/978-981-15-0871-4_12
Rajalakshmi, R., Ramraj, S., Ramesh Kannan, R.: Transfer learning approach for identification of malicious domain names. In: SSCC 2018: Security in Computing and Communications, pp. 656–666 (2019). https://doi.org/10.1007/978-981-13-5826-5_51
Ramraj, S., Sivakumar, V., Ramnath G., K.: Real-time resume classification system using linkedin profile descriptions. In: 2020 International Conference on Computational Intelligence for Smart Power System and Sustainable Energy (CISPSSE), pp. 1–4. IEEE (2020). https://doi.org/10.1109/CISPSSE49931.2020.9212209. https://ieeexplore.ieee.org/document/9212209/
Ranade, P., Piplai, A., Mittal, S., Joshi, A., Finin, T.: Generating fake cyber threat intelligence using transformer-based models. In: Proceedings of the International Joint Conference on Neural Networks 2021-July, 1–9 (2021) https://doi.org/10.1109/IJCNN52387.2021.9534192. arXiv:2102.04351
Recon-NG: Recon-NG. https://github.com/lanmaster53/recon-ng (2022). Accessed 05 December 2022
Reddy, D.M., Reddy, D.N.V.S., Reddy, D.N.V.S.: Twitter sentiment analysis using distributed word and sentence representation (2019). arXiv:1904.12580
Ren, F., Jiang, Z., Wang, X., Liu, J.: A dga domain names detection modeling method based on integrating an attention mechanism and deep neural network. Cybersecurity (2020). https://doi.org/10.1186/s42400-020-00046-6
Riebe, T., Wirth, T., Bayer, M., Kühn, P., Kaufhold, M.A., Knauthe, V., Guthe, S., Reuter, C.: Cysecalert: An alert generation system for cyber security events using open source intelligence data. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12918 LNCS, pp. 429–446 (2021). https://doi.org/10.1007/978-3-030-86890-1_24
Roberts, A.: The importance of osint. In: Cyber Threat Intelligence, pp. 131–152. Apress, Berkeley, CA (2021)
Rodriguez, A., Okamura, K.: Cybersecurity text data classification and optimization for CTI systems. Adv. Intell. Syst. Comput. (2020). https://doi.org/10.1007/978-3-030-44038-1_37
Rodriguez, A., Okamura, K.: Enhancing data quality in real-time threat intelligence systems using machine learning. Soc. Netw. Anal. Min. 10(1), 1–22 (2020). https://doi.org/10.1007/s13278-020-00707-x
Bakshi, R.K., Kaur, N., Kaur, R., Kaur, G.: Opinion mining and sentiment analysis. In: 2016 3rd International Conference on Computing for Sustainable Global Development, pp. 452–455. IEEE (2016). https://ieeexplore.ieee.org/document/7724305/
Sahoo, D., Liu, C., Hoi, S.C.H.: Malicious URL detection using machine learning: a survey (2017). arXiv:1701.07179
Sakiyama, K., de Souza Rodrigues, L., Matsubara, E.T.: Can Twitter Data Estimate Reality Show Outcomes? vol. 12319 LNAI, pp. 466–482. Springer (2020). https://doi.org/10.1007/978-3-030-61377-8_32
Sakiyama, K.M., Silva, A.Q.B., Matsubara, E.T.: Twitter breaking news detector in the 2018 Brazilian presidential election using word embeddings and convolutional neural networks. Proc. Int. Jt. Conf. Neural Netw. 2019(July), 1–8 (2019). https://doi.org/10.1109/IJCNN.2019.8852394
Samaan, J.-L.: The RAND Corporation (1989–2009). Palgrave Macmillan US, New York (2012). https://doi.org/10.1057/9781137057358
Sarker, I.H., Kayes, A.S.M., Badsha, S., Alqahtani, H., Watters, P., Ng, A.: Cybersecurity data science: an overview from machine learning perspective. J. Big Data (2020). https://doi.org/10.1186/s40537-020-00318-5
Sarma, N., Singh, S.R., Goswami, D.: Influence of social conversational features on language identification in highly multilingual online conversations. Inf. Process. Manag. 56(1), 151–166 (2019). https://doi.org/10.1016/j.ipm.2018.09.009
Satyanarayan Raju Vadapalli, George Hsieh, K.S.N.: Twitterosint: Automated cybersecurity threat intelligence collection and analysis using twitter data. In: Proceedings of the International Conference on Security and Management (SAM), p. 60132. The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), Athens (2018). https://www.proquest.com/docview/2153621548?pq-origsite=gscholar&fromopenview=true
Schaurer, F., Storger, J.: The evolution of open source intelligence (osint). J. U.S. Intell. Stud. 19(3), 53–56 (2013)
Scopus: Scopus (2021). https://www.scopus.com/ Accessed 08 November 2021
Searchcode: Searchcode. https://searchcode.com/ (2022). Accessed 05 December 2022
Senekal, B., Kotzé, E.: Open source intelligence (osint) for conflict monitoring in contemporary South Africa: challenges and opportunities in a big data context. Afr. Secur. Rev. 28(1), 19–37 (2019). https://doi.org/10.1080/10246029.2019.1644357
Severyn, A., Moschitti, A.: Twitter sentiment analysis with deep convolutional neural networks, pp. 959–962. ACM (2015). https://doi.org/10.1145/2766462.2767830. https://dl.acm.org/doi/10.1145/2766462.2767830
Shalunts, G., Backfried, G., Prinz, K.: Sentiment analysis of german social media data for natural disasters. In: ISCRAM 2014 Conference Proceedings - 11th International Conference on Information Systems for Crisis Response and Management, pp. 752–756 (2014). http://idl.iscram.org/files/shalunts/2014/940_Shalunts_etal2014.pdf
Shen, A., Chow, K.P.: Time and location topic model for analyzing lihkg forum data. In: 2020 13th International Conference on Systematic Approaches to Digital Forensic Engineering (SADFE), pp. 32–37. IEEE (2020). https://doi.org/10.1109/SADFE51007.2020.00009. https://ieeexplore.ieee.org/document/9133703/
Shin, G., Yooun, H., Shin, D., Shin, D.: Incremental learning method for cyber intelligence, surveillance, and reconnaissance in closed military network using converged it techniques. Soft. Comput. 22(20), 6835–6844 (2018). https://doi.org/10.1007/s00500-018-3433-1
Shin, H.S., Kwon, H.Y., Ryu, S.J.: A new text classification model based on contrastive word embedding for detecting cybersecurity intelligence in twitter. Electronics (Switzerland) 9(9), 1–21 (2020). https://doi.org/10.3390/electronics9091527
Shire, R., Shiaeles, S., Bendiab, K., Ghita, B., Kolokotronis, N.: Machine learning in detecting user’s suspicious behaviour through facebook wall. In: arXiv Preprint, pp. 65–76 (2019). arXiv:1910.14417
Shodan: Shodan. https://www.shodan.io/ (2022). Accessed 05 December 2022
Simonov, M., Bertone, F., Goga, K., Terzo, O.: Cyber Kill Chain Defender for Smart Meters, vol. 772, pp. 386–397. Springer (2019). https://doi.org/10.1007/978-3-319-93659-8_34
Simran, K., Balakrishna, P., Vinayakumar, R., Soman, K.P.: Deep Learning Approach for Enhanced Cyber Threat Indicators in Twitter Stream, vol. 1208 CCIS, pp. 135–145. Springer (2020). https://doi.org/10.1007/978-981-15-4825-3_11
Singh, S., Fernandes, S.V., Padmanabha, V., Rubini, P.E.: Mcids-multi classifier intrusion detection system for iot cyber attack using deep learning algorithm. In: Proceedings of the 3rd International Conference on Intelligent Communication Technologies and Virtual Mobile Networks, ICICV 2021 (March), pp. 354–360 (2021). https://doi.org/10.1109/ICICV50876.2021.9388579
Smadi, M., Qawasmeh, O.: A supervised machine learning approach for events extraction out of arabic tweets. In: 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), pp. 114–119. IEEE (2018). https://doi.org/10.1109/SNAMS.2018.8554560. https://ieeexplore.ieee.org/document/8554560/
Smailović, J., Grčar, M., Lavrač, N., Žnidaršič, M.: Predictive sentiment analysis of tweets: A stock market application. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 77–88 (2013). https://doi.org/10.1007/978-3-642-39146-0_8
Sotirakou, C., Karampela, A., Mourlas, C.: Evaluating the role of news content and social media interactions for fake news detection. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12887 LNCS, pp. 128–141. Springer (2021). https://doi.org/10.1007/978-3-030-87031-7_9
Springer: Springer Link (2021)
Spyse: Spyse. https://spyse-dev.readme.io/reference/quick-start (2022). Accessed 05 December 2022
Strohmeier, M., Smith, M., Lenders, V., Martinovic, I.: Classi-fly: inferring aircraft categories from open data using machine learning. arXiv preprint (2019) arXiv:1908.01061
Stumptner, M., Mayer, W., Grossmann, G., Liu, J., Li, W., Casanovas, P., De Koker, L., Mendelson, D., Watts, D., Bainbridge, B.: An architecture for establishing legal semantic workflows in the context of integrated law enforcement. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10791, pp. 124–139. Springer (2018). https://doi.org/10.1007/978-3-030-00178-0_8
Susnea, E.: A real-time social media monitoring system as an open source intelligence (osint) platform for early warning in crisis situations. In: International Conference KNOWLEDGE-BASED ORGANIZATION, vol. 24, pp. 427–431 (2018). https://doi.org/10.1515/kbo-2018-0127. https://sciendo.com/pdf/10.1515/kbo-2018-0127 https://www.sciendo.com/article/10.1515/kbo-2018-0127
Szakonyi, A., Chellasamy, H., Vassilakos, A., Dawson, M.: Using technologies to uncover patterns in human trafficking. Adv. Intell. Syst. Comput. (2021). https://doi.org/10.1007/978-3-030-70416-2_64
Tan, C., Lee, L., Tang, J., Jiang, L., Zhou, M., Li, P.: User-level sentiment analysis incorporating social networks. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’11, p. 1397. ACM Press, New York, New York, USA (2011). https://doi.org/10.1145/2020408.2020614. http://dl.acm.org/citation.cfm?doid=2020408.2020614
Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T., Qin, B.: Learning sentiment-specific word embedding for twitter sentiment classification. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 44, pp. 1555–1565. Association for Computational Linguistics, Stroudsburg, PA, USA (2014). https://doi.org/10.3115/v1/P14-1146
Tariq, I., Sindhu, M.A., Abbasi, R.A., Khattak, A.S., Maqbool, O., Siddiqui, G.F.: Resolving cross-site scripting attacks through genetic algorithm and reinforcement learning. Expert Syst. Appl. 168(August 2020), 114386 (2021). https://doi.org/10.1016/j.eswa.2020.114386
Tavarez, D.: PwnDB. https://github.com/davidtavarez/pwndb (2022). Accessed 05 December 2022
Terán, L., Mancera, J.: Dynamic profiles using sentiment analysis and twitter data for voting advice applications. Gov. Inf. Q. 36(3), 520–535 (2019). https://doi.org/10.1016/j.giq.2019.03.003
Tewari, A.: Decoding the black box: interpretable methods for post-incident counter-terrorism investigations. In: ACM web science conference. Websci, Southampton (2020). https://www.southampton.ac.uk/~sem03/STAIDCC20_tewari_paper_07_07_2020.pdf
Theron, P., Kott, A.: When autonomous intelligent goodware will fight autonomous intelligent malware: A possible future of cyber defense. Proceedings - IEEE Military Communications Conference MILCOM 2019-Novem, 1–7 (2019) https://doi.org/10.1109/MILCOM47813.2019.9021038 arXiv:1912.01959
Tiwari, S., Verma, R., Jaiswal, J., Rai, B.K.: Open Source Intelligence Initiating Efficient Investigation and Reliable Web Searching, vol. 1244 CCIS, pp. 151–163. Springer (2020). https://doi.org/10.1007/978-981-15-6634-9_15
Translator, O.D.: DocTranslator (2021). https://www.onlinedoctranslator.com/en/ Accessed 6 December 2021
Truvé, S.: Threats of tomorrow: using artificial intelligence to predict malicious infrastructure activity. Record. Future 2016, 204–212 (2016)
Tundis, A., Ruppert, S., Mühlhäuser, M.: On the Automated Assessment of open-source cyber threat intelligence sources. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 12138 LNCS, pp. 453–467 (2020) https://doi.org/10.1007/978-3-030-50417-5_34
Twitter: Twitter API. https://developer.twitter.com/en/docs/twitter-api (2022). Accessed 15 December 2022
Tyagi, H., Kumar, R.: Attack and anomaly detection in IoT networks using supervised machine learning approaches. Revue d’Intelligence Artificielle 35(1), 11–21 (2021). https://doi.org/10.18280/ria.350102
Uehara, K., Nishikawa, H., Yamamoto, T., Kawauchi, K., Nishigaki, M.: Analysis of the relationship between psychological manipulation techniques and personality factors in targeted emails. In: Advances in Intelligent Systems and Computing, vol. 1151 AISC, pp. 338–351. Springer (2020). https://doi.org/10.1007/978-3-030-33506-9_30
Upadhayay, B., Lodhia, Z.A.M., Behzadan, V., Haven, W.: Combating human trafficking via automatic osint collection, validation and fusion. In: 15th International AAAI Conference on Web and Social Media. Association for the Advancement of Artificial Intelligence, Connecticut (2020). http://workshop-proceedings.icwsm.org/pdf/2021_17.pdf
Verdejo, D.P., Mercier-Laurent, E.: Video intelligence as a component of a global security system. In: IFIP International Workshop on Artificial Intelligence for Knowledge Management, pp. 132–145. Springer (2019). https://doi.org/10.1007/978-3-030-29904-0_10
Vinayakumar, R., Soman, K.P., Poornachandran, P.: Detecting malicious domain names using deep learning approaches at scale. J. Intell. Fuzzy Syst. 34(3), 1355–1367 (2018). https://doi.org/10.3233/JIFS-169431
Vinayakumar, R., Soman, K.P., Poornachandran, P.: Evaluating deep learning approaches to characterize and classify malicious url’s. J. Intell. Fuzzy Syst. 34(3), 1333–1343 (2018). https://doi.org/10.3233/JIFS-169429
Vinayakumar, R., Soman, K.P., Poornachandran, P., Sachin Kumar, S.: Evaluating deep learning approaches to characterize and classify the dgas at scale. J. Intell. Fuzzy Syst. 34(3), 1265–1276 (2018). https://doi.org/10.3233/JIFS-169423
Vinayakumar, R., Soman, K.P., Prabaharan Poornachandran, A.S., Elhoseny, M.: Improved dga domain names detection and categorization using deep learning architectures with classical machine learning algorithms. Adv. Sci. Technol. Secur. Appl. (2019). https://doi.org/10.1007/978-3-030-16837-7_8
Vinayakumar, R., Alazab, M., Srinivasan, S., Pham, Q.V., Padannayil, S.K., Simran, K.: A visualized botnet detection system based deep learning for the internet of things networks of smart cities. IEEE Trans. Ind. Appl. 56(4), 4436–4456 (2020). https://doi.org/10.1109/TIA.2020.2971952
WalletExplorer: WalletExplorer (2023)
Wan, Y., Gao, Q.: An ensemble sentiment classification system of twitter data for airline services analysis. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW), pp. 1318–1325. IEEE (2015). https://doi.org/10.1109/ICDMW.2015.7. http://ieeexplore.ieee.org/document/7395820/
Wang, M.-H., Tsai, M.-H., Yang, W.-C., Lei, C.-L.: Infection categorization using deep autoencoder. In: IEEE INFOCOM 2018 - IEEE conference on computer communications workshops (INFOCOM WKSHPS), pp. 1–2. IEEE (2018). https://doi.org/10.1109/INFCOMW.2018.8406878. https://ieeexplore.ieee.org/document/8406878/
Wang, T., Chen, L.-C., Genc, Y.: A dictionary-based method for detecting machine-generated domains. Inform. Secur. J. Glob. Perspect. 30(4), 205–218 (2021). https://doi.org/10.1080/19393555.2020.1834650
Warfare, G.: Grayhat Warfare. https://grayhatwarfare.com/ (2022). Accessed 27 May 2022
Wei, Y., Zou, F.: Automatic generation of malware threat intelligence from unstructured malware traces. In: Garcia-Alfaro, J., Li, S., Poovendran, R., Debar, H., Yung, M. (eds.) Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 398 LNICST, pp. 44–61. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-90019-9_3
Whois: Whois. https://www.whois.com/whois/ (2022). Accessed 05 December 2022
Wilkinson, G., Legg, P.: “what did you say?”: Extracting unintentional secrets from predictive text learning systems. In: 2020 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), pp. 1–8. IEEE (2020). https://doi.org/10.1109/CyberSecurity49315.2020.9138882. https://ieeexplore.ieee.org/document/9138882/
Williams, H., Blum, I.: Defining Second Generation Open Source Intelligence (OSINT) for the Defense Enterprise. Rand Corporation, Santa Monica (2018). https://doi.org/10.7249/rr1964
Xu, H., Dong, M., Zhu, D., Kotov, A., Carcone, A.I., Naar-King, S.: Text classification with topic-based word embedding and convolutional neural networks. In: Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 88–97. ACM, New York, NY, USA (2016). https://doi.org/10.1145/2975167.2975176. https://dl.acm.org/doi/10.1145/2975167.2975176
Yaacoub, J.P.A., Noura, H.N., Salman, O., Chehab, A.: Robotics cyber security: vulnerabilities, attacks, countermeasures, and recommendations. Int. J. Inf. Secur. (2021). https://doi.org/10.1007/s10207-021-00545-8
Yadav, S., Reddy, A.K.K., Reddy, A.L.N., Ranjan, S.: Detecting algorithmically generated malicious domain names. In: Proceedings of the 10th Annual Conference on Internet Measurement - IMC ’10, p. 48. ACM Press, New York, New York, USA (2010). https://doi.org/10.1145/1879141.1879148. http://portal.acm.org/citation.cfm?doid=1879141.1879148
Yang, W., Lam, K.Y.: Automated Cyber Threat Intelligence Reports Classification for Early Warning of Cyber Attacks in Next Generation SOC, vol. 11999 LNCS, pp. 145–164. Springer (2020). https://doi.org/10.1007/978-3-030-41579-2_9
Yu, B., Pan, J., Gray, D., Hu, J., Choudhary, C., Nascimento, A.C.A., De Cock, M.: Weakly supervised deep learning for the detection of domain generation algorithms. IEEE Access 7, 51542–51556 (2019). https://doi.org/10.1109/ACCESS.2019.2911522
Yue, L., Chen, W., Li, X., Zuo, W., Yin, M.: A survey of sentiment analysis in social media. Knowl. Inf. Syst. 60(2), 617–663 (2019). https://doi.org/10.1007/s10115-018-1236-4
Zhou, Z.-H.: Machine Learning, pp. 181–182. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-1967-3
Zhuk, D., Tretiakov, A., Gordeichuk, A., Puchkovskaia, A.: Methods to identify fake news in social media using artificial intelligence technologies. In: International Conference on Digital Transformation and Global Society DTGS 2018: Digital Transformation and Global Society, pp. 446–454. Springer (2018). https://doi.org/10.1007/978-3-030-02843-5_36
Zhuk, D., Tretiakov, A., Gordeichuk, A.: Methods to identify fake news in social media using machine learning. In: Proceedings of the 22nd Conference of Open Innovations Association FRUCT, pp. 59–40159404 (2018). http://dl.acm.org/citation.cfm?id=3266365.3266424
Zimbra, D., Abbasi, A., Zeng, D., Chen, H.: The state-of-the-art in twitter sentiment analysis. ACM Trans. Manag. Inf. Syst. 9(2), 1–29 (2018). https://doi.org/10.1145/3185045
Zizzo, G., Hankin, C., Maffeis, S., Jones, K.: Adversarial machine learning beyond the image domain. In: Proceedings of the 56th Annual Design Automation Conference 2019, pp. 1–4. ACM, New York, NY, USA (2019). https://doi.org/10.1145/3316781.3323470. https://ieeexplore.ieee.org/document/8806924 https://dl.acm.org/doi/10.1145/3316781.3323470
Zunino, R., Bisio, F., Peretti, C., Surlinelli, R., Scillia, E., Ottaviano, A., Sangiacomo, F.: An analyst-adaptive approach to focused crawlers. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2013, pp. 1073–1077. ACM and IEEE (2013). https://doi.org/10.1145/2492517.2500328. https://ieeexplore.ieee.org/document/6785835

Funding

Open Access funding enabled and organized by CAUL and its Member Institutions.

Author information

Authors and Affiliations

Computer Science and IT, La Trobe University, Plenty Road and Kingsbury Dr, Bundoora, VIC, 3086, Australia
Thomas Oakley Browne, Mohammad Abedin & Mohammad Jabed Morshed Chowdhury

Authors

Thomas Oakley Browne
Mohammad Abedin
Mohammad Jabed Morshed Chowdhury

Corresponding author

Correspondence to Thomas Oakley Browne.

Ethics declarations

Conflict of interest

The authors have no conflict of interest to declare concerning this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Browne, T.O., Abedin, M. & Chowdhury, M.J.M. A systematic review on research utilising artificial intelligence for open source intelligence (OSINT) applications. Int. J. Inf. Secur. 23, 2911–2938 (2024). https://doi.org/10.1007/s10207-024-00868-2

Published: 05 June 2024
Issue Date: August 2024
DOI: https://doi.org/10.1007/s10207-024-00868-2
