Abstract
Nowadays, literature review is a necessary task when trying to solve a given problem. However, an exhaustive literature review is very time-consuming in today’s vast literature landscape. It can take weeks, even when looking only at abstracts or surveys. Moreover, choosing a method among others, and targeting searches within relevant problem and solution domains, are not easy tasks. This is especially true for young researchers or engineers starting to work in their field. Even if surveys that provide methods used to solve a specific problem already exist, an automatic way to do it for any use case is missing, especially for those who don’t know the existing literature. Our proposed tool, SARBOLD-LLM, allows discovering and choosing among methods related to a given problem, providing additional information about their uses in the literature to derive decision-making insights, in only a few hours. The SARBOLD-LLM comprises three modules: (1: Scopus search) paper selection using a keyword selection scheme to query Scopus API; (2: Scoring and method extraction) relevancy and popularity scores calculation and solution method extraction in papers utilizing OpenAI API (GPT 3.5); (3: Analyzes) sensitivity analysis and post-analyzes which reveal trends, relevant papers and methods. Comparing the SARBOLD-LLM to manual ground truth using precision, recall, and F1-score metrics, the performance results of AI in the oncology case study are 0.68, 0.9, and 0.77, respectively. SARBOLD-LLM demonstrates successful outcomes across various domains, showcasing its robustness and effectiveness. The SARBOLD-LLM addresses engineers more than researchers, as it proposes methods and trends without adding pros and cons. It is a useful tool to select which methods to investigate first and comes as a complement to surveys. This can limit the global search and accumulation of knowledge for the end user. However, it can be used as a director or recommender for future implementation to solve a problem.
Highlights
-
Automated support for literature choice and solution selection for any use case.
-
A generalized keyword selection scheme for literature database queries.
-
Trends in literature: detecting AI methods for a case study using Scopus and OpenAI.
-
A better understanding of the tool by sensitivity analyzes for Scopus and OpenAI.
-
Robust tool for different domains with promising OpenAI performance results.
1 Introduction
Over the past decade, artificial intelligence (AI) and machine learning (ML) have gained significant attention in the fields of information technology and computer science, accompanying significant advancements and benefits across diverse industries and sectors [1, 2]. There are numerous AI/ML taxonomies presented in the literature [3, 4] that can be used to select a collection of AI strategies to address a specific challenge (Footnote 1). Figure 1 illustrates an example taxonomy of the extensive AI/ML domain, encompassing multiple problem types and branches. However, to search for AI methods specific to a given use case, it is not only necessary to select a fitting branch in the taxonomy, but one also has to refine the search by comparing it to the standing knowledge base of the literature on the use case.
The increasing amount of literature presents a challenge for decision-makers seeking to employ AI/ML methodology in their specific problem domains. Manual review is time-consuming [5], often resulting in incomplete information without targeted searches. The current way to reduce the time spent in choosing a method consists of reading reviews or surveys that consider and explain the pros and cons of several methods belonging to a given field. This has, however, some limitations: a review generally considers a family of comparable and well-known methods but cannot provide an exhaustive list. Moreover, reviews do not consider the uses of, or the interest in, these methods over time. A tool that rapidly generates trend findings and examines solution methods for any use case would be extremely beneficial in various situations.
This research proposes a semi-automated tool developed to generate results on solution approaches for any use case. The tool is named SARBOLD-LLM, where SARBOLD stands for Solution Approach Recommender Based On Literature Database and LLM stands for Large Language Model. Considering all the constraints and preferences needed in the use case, the objective is to provide the end user with a list of methods able to solve a given problem (the use case), to sort them so that the end user can easily choose a method to investigate first, and to supply the existing literature regarding the chosen method.
The study presents results on multiple problem domains on AI with a focus on the case study for AI/ML in oncology. The SARBOLD-LLM has three modules called “Module 1: Scopus search”, “Module 2: Scoring and method extraction”, and “Module 3: Analyzes”, respectively. It broadly contains the following steps:
-
Determining keywords systematically from the use case by a two-domain, three-level setup. (Module 1)
-
Automated literature extraction using selected keywords via Scopus Search application programming interface (API) [6]. (Module 1)
-
Extracting AI methods automatically from Scopus search results by using OpenAI API (text-davinci-003, GPT 3.5). (Module 2)
-
Sensitivity analyzes for both Scopus and OpenAI. (Module 3)
-
Post-analyzes based on results. (Module 3)
Some of the existing studies, which work on solution approach suggestions, have worked on the reduction of time and effort spent and automation. In addition to these, SARBOLD-LLM uses a keyword selection strategy to make sure the search is inside the relevant problem and solution domains. Additionally, it uses trend, popularity, and relevancy analyzes to draw decision-making conclusions about the solution techniques used for any given use case. Furthermore, the sensitivity analyzes performed for Scopus and OpenAI and the performance results obtained for OpenAI are among the positive values of this research.
The SARBOLD-LLM can be used iteratively for the decision makers to augment their understanding of the problem and similarly align the keywords better with the desired use case and specificity level, consequently obtaining better results.
The remainder of this article is structured as follows: Section 2 reviews the use of AI methods and the literature on model selection approaches. Section 3 presents the SARBOLD-LLM, and Section 4 showcases the performance, sensitivity, and post-analysis of the method. In Section 5, a discussion is given. Finally, a conclusion and suggestions for future works are provided in Section 6.
2 Literature Review
In the literature, there are reviews and surveys on which AI approaches or applications are used for different problem domains such as building and construction 4.0 [7], architecture, engineering and construction (AEC) [8], agriculture [9], watermarking [10], healthcare [11, 12], oil and gas sector [13], supply chain management [14], pathology [15], banking [16], finance [17], food adulteration detection [18], engineering and manufacturing [19], renewable energy-driven desalination system [20], path planning in the unmanned aerial vehicle (UAV) swarms [21], military [22], cybersecurity management [23], engineering design [24], vehicular ad-hoc networks [25], dentistry [26], green building [27], e-commerce [28], drug discovery [29], marketing [30], electricity supply chain automation [31], monitoring fetus via ultrasound images [32], internet of things (IoT) security [33]. In Table 1, AI approaches utilized in different problem domains are illustrated.
As can be seen from the aforementioned references, some of the problem domains in the example review and surveys are low-level, while some are high-level. The abstraction level is difficult to integrate for the solution domain while considering the reviews and surveys. Even if the same problem domain is considered, it will be an issue to depend on reviews or surveys in the literature as there may be an unlimited number of use case scenarios and levels of specificity. In addition, AI approaches specified in reviews or surveys can sometimes be very general. In this case, it may be necessary to make article reviews manually, but it causes labor and time loss [52]. Based on this idea, one can search for an automated way to minimize the time spent on manual review to get an AI method applied to a given use case.
The last decade saw significant steps toward a fully automatic model selection scheme with tools that select models for specialized use cases, generally referred to as model determination, parameter estimation, or hyperparameter selection tools. For forecasting time series in R, the popular forecast package by R. Hyndman et al. was presented, showcasing great initial results [53]. For regression models, the investigated selection procedures are generally based on the evaluation of smaller pre-defined sets of alternative methods, e.g., by information criteria (AIC, BIC), shrinkage methods (Lasso), stepwise regression, and/or cross-validation schemes [54]. For ML-based model schemes, B. Komer et al. [55] introduce the hyperopt package for hyper-parameter selection accompanying the scikit-learn ML library, J. Snoek et al. [56] present a Bayesian optimization scheme to identify the hyper-parameter configuration efficiently, and J. Bergstra et al. [57] identify hyper-parameter configurations for training neural networks and DBNs by using a random search algorithm and two greedy sequential methods based on the expected improvement criterion. There also exist smaller frameworks, e.g., hyper-parameter tuning based on problem features with MATE [58], modeling and fitting of autoregressive-to-anything processes in Java [59], or extensions to general-purpose optimization frameworks [60].
On the other hand, Dinter et al. [5] present a systematic literature review on the automation of systematic literature reviews with a concentration on all systematic literature review procedures as well as NLP and ML approaches. They stated that the main objective of automating a systematic literature review is to reduce time because human execution is costly, time-consuming, and prone to mistakes. Furthermore, the title and abstract are mostly used as features for several steps in the systematic review process proposed by Kitchenham et al. [61]. Even though our research does not stick to these procedures since our study was not a pure systematic literature review, the title and abstract are included for the OpenAI part. Additionally, they found the majority of systematic literature reviews to be automated using SVM and Bayesian networks, such as Naive Bayes classifiers, and there appears to be a distinct lack of evidence regarding the effectiveness of deep learning approaches in this regard.
The work of H. Chen et al. [62] produces a written section of background material relevant to a solution approach, in the form of a research paper, through a BERT-based semantic classification model. Similarly, K. Heffernan et al. [63] utilize a series of machine learning algorithms as automatic classifiers to distinguish solutions and problems from non-solutions and non-problems in scientific sentences, with good results. These findings suggest that ML-based language models can be utilized successfully in the automation of literature review.
Consequently, literature that explains the procedure of manually and automatically reviewing the literature is determined. Also, automated tuning frameworks for different modeling schemes are identified.
In Table 2, the benefit and function features of the methods used in the solution approach selection are compared. Features include saving time and effort (a), ensuring that the required problem and solution domains are searched for using a systematic keyword selection scheme (b), automating tasks (c), drawing conclusions about the relevance, popularity, and trend of the solution approaches used for the use case (d), and the pros and cons information of selected methods (e). The effectiveness of the methods used in a research paper for a specific use case is determined by relevancy metrics, which indicate how well the methods align with the specificity of the use case. The popularity metric is used to assess the research interest of a paper and the methods used in the paper. It is calculated by considering the number of citations and the age of the publication. On the other hand, trend analysis, based on the total number of papers that use a specific solution approach, examines trends and behaviors over time and thereby provides insights that support knowledgeable decisions and planning.
It is seen that there is a gap in the methods of solution approach selection in terms of satisfying the specified features. This article aims to investigate and address this gap to obtain a tool with all the features listed in Table 2.
3 Methodology
SARBOLD-LLM has three main modules illustrated by the flowchart in Figure 2. They are called “Module 1: Scopus search”, “Module 2: Scoring and method extraction”, and “Module 3: Analyzes”, respectively. Red ellipses are the start and end points, green parallelograms are inputs and outputs, blue rectangles are processes, orange diamonds are decisions, and the purple cylinder is the database. The red dashed line indicates the flow that is performed automatically. The first module, named “Scopus search”, covers selecting keywords and getting results via Scopus Search (Footnote 2). The advanced search query then returns the results, whose fields are explained by Scopus Search Views (Footnote 3). In the second module, named “scoring and method extraction”, the solution methods used in each article are searched for using the OpenAI API. In the third module, named “analyzes”, sensitivity and post-analyzes are performed.
The current version of SARBOLD-LLM is given in Algorithms 1 and 2. It takes as input the table of keywords used to build the Scopus query (It could also take a query as input to allow more liberty after some iteration) and the prompt used for the OpenAI API. It provides at the end a list of methods, with associated scores (number of papers, citations, relevancy, which can be viewed year by year to detect trends) and papers. To help the end user to select and filter methods, an automated clustering has been made. It allows filtering methods used only one time and regroups similar names such as YOLO-v3 and YOLO-v4.
Algorithm 1: SARBOLD-LLM
Algorithm 2: clustering
SARBOLD-LLM scheme is appropriate for any problem and solution domain. It can be used for use cases in many different fields. Although the second module of this study focuses on AI methods, this module can also evolve into other topics, such as which hardware to be used and which scientific applications to be employed. However, as the SARBOLD-LLM relies on the OpenAI framework, ground truth data is created manually (by writing AI methods from the title and abstract of papers) to check the performance.
3.1 Module 1: Scopus Search
The goal of the first module is to search for a relevant pool of papers concerning the given problem a user is dealing with. To do so, a keyword selection scheme has been made to facilitate the user’s work. This scheme is then used to make a Scopus query, but also to score each paper.
To determine keywords, three specification levels (a general, an expanded, and a detailed one) are applied to the given problem and the searched solutions. This work is done manually as it involves eliciting user information on the use case. That means both classification and order are specified by the user. However, this stage is critical in recommending more appropriate solution approaches because these keywords are the first inputs to the SARBOLD-LLM and determine the pool of papers used in module 2, named “scoring and method extraction”. The levels are as follows:
-
Level 1 The general and necessary keywords. The keyword must be a part of the research paper for the paper to be in the selected pool of papers.
-
Level 2 The expanding keywords. Here only one of the keywords in the field is necessary for the paper to be selected.
-
Level 3 A further specification, use case-specific keywords. It is only used in the later stage to rank the identified solution methods with the relevancy metric.
Figure 3 gives an example of the proposed keyword selection scheme. The problem domain block covers the specific area or subject matter that a problem or project deals with. The scope and context of the issues that need to be addressed are defined by the problem domain. On the other hand, the solution approach, also known as the solution space, is the strategy or method used to address and solve the problems identified within the problem domain. It outlines how you plan to design, implement, and deliver a solution to the issues at hand.
These two blocks contain keywords according to the levels explained above. In the database search, adding some version of a keyword will only search for that specific keyword. Consequently, other versions of that word will also have to be added, but with the logic operator OR to indicate that either version can be used in the paper, for instance, “AI OR artificial intelligence”.
Notice that it is possible, but not necessary, to add keywords in each field, where a field refers to the specific level in the block. Leaving some fields empty will lead to a less specified pool of solution approaches, which consequently risks not fitting the use case. At the same time, adding too many keywords can lead either to a too restricted pool of papers (e.g., if one uses too many general keywords, and fulfills each field) or, if too many expanding keywords are given, to a less specific pool of paper as if the field was left empty.
After keyword selection, a query is created for Scopus Search API. Information is searched in titles, abstracts, and keywords of recent articles or conference papers, for the words defined in levels 1 and 2. The query can be, for example:
‘TITLE-ABS-KEY(("oncology") AND ("artificial intelligence" OR "AI") AND ("image processing")) AND DOCTYPE(ar OR cp) AND PUBYEAR > 2013’
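As an illustration, a query of this form could be assembled from the level 1 and level 2 keywords roughly as follows. This is a minimal sketch, not the tool's actual code: the function names `or_group` and `build_scopus_query` are hypothetical, and the way keywords are grouped (alternatives of one keyword joined with OR, groups joined with AND) is an assumption drawn from the description of the levels.

```python
def or_group(alternatives):
    """Join alternative spellings of one keyword with OR, each in double quotes."""
    return "(" + " OR ".join(f'"{term}"' for term in alternatives) + ")"


def build_scopus_query(keyword_groups, min_year, doc_types=("ar", "cp")):
    """Build an advanced-search string from level 1 and level 2 keyword groups.

    keyword_groups is a list of lists: every group must match (AND), and any
    one alternative inside a group is enough (OR). This grouping is an
    assumption made for the sketch.
    """
    keywords = " AND ".join(or_group(group) for group in keyword_groups)
    types = " OR ".join(doc_types)
    return f"TITLE-ABS-KEY({keywords}) AND DOCTYPE({types}) AND PUBYEAR > {min_year}"


# Keywords of the example query: one problem keyword, one solution keyword
# with its abbreviation, and one expanding (level 2) keyword.
query = build_scopus_query(
    [["oncology"], ["artificial intelligence", "AI"], ["image processing"]],
    min_year=2013,
)
print(query)
```

With these inputs the sketch reproduces the example query string given above; sending it to the Scopus Search API is left out here.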
Note that an expert can directly enter a query instead of using the keyword selection scheme. It is useful in some cases, for example: when it is difficult to find a good pool of papers using the query built by the keyword selection scheme, or when one wants to search in a specific field or a specific range of years, or for a first try if one wants to search only for reviews to get more appropriate academic keywords. However, it is still advantageous to follow this scheme as it helps to find, classify, and order the use case keywords, but also to specify what is important for scoring the paper.
Another way to help the end user get the query could be to extract keywords directly from a summary of the given use case. However, it is difficult to automate a keyword extraction scheme for several reasons. First, one needs to distinguish keywords used for establishing the search space from keywords used for scoring papers (or methods). Second, the query is sensitive to each keyword, and it is often necessary to change the list of keywords to get a more appropriate pool of papers, using synonyms or re-ordering some keywords. Third, the scheme helps end users understand which features and constraints of their use case are the most important, and which are recommended but not necessary.
The publication year, the number of citations, the title, and the abstract information of all articles returned by the Scopus query are saved. After all the results are obtained, the title and abstract information of all the articles are examined manually, and articles that are irrelevant and have not applied/mentioned any AI method are eliminated.
3.2 Module 2: Scoring and Method Extraction
In this module, the relevancy and popularity metrics for the Scopus search results are computed, and solution methods are extracted from the title and abstract of each paper.
The relevancy metrics count the number of unique level 2 and 3 keywords appearing at least once in the title, abstract, or keywords. Ultimately, the metric represents how well the methods fit the specificity of the use case. For example, a paper named “Hybrid learning method for melanoma detection” yields in the abstract “image recognition (5 times), deep learning (2 times), real-time”; it will therefore have a relevancy metric of 3, taking into account Fig. 3.
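The counting rule can be sketched as below. This is an illustrative sketch only: the function name `relevancy`, the abstract text, and the keyword list are hypothetical stand-ins (the actual keywords of Fig. 3 are not reproduced here), and case-insensitive substring matching is an assumption.

```python
def relevancy(text_fields, keywords):
    """Count the distinct keywords that occur at least once in any text field.

    Matching is a case-insensitive substring test (an assumption); very short
    keywords such as "AI" could also match inside longer words.
    """
    haystack = " ".join(text_fields).lower()
    unique_keywords = {keyword.lower() for keyword in keywords}
    return sum(1 for keyword in unique_keywords if keyword in haystack)


title = "Hybrid learning method for melanoma detection"
# Hypothetical abstract: repeated occurrences of a keyword count only once.
abstract = (
    "Image recognition is central here. Our image recognition pipeline "
    "uses deep learning and runs in real-time."
)
# Hypothetical level 2 and 3 keywords.
level_2_and_3 = ["image recognition", "deep learning", "real-time", "edge computing"]

score = relevancy([title, abstract], level_2_and_3)
print(score)  # 3
```

Three of the four hypothetical keywords occur in the text, so the score is 3, regardless of how often each one is repeated.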
The popularity metric is used to know the research interest of a paper and its methods. It is computed by
\(Popularity = \frac{Citation\ number}{Publication\ age\ (in\ years) + 1}\)
where 1 is added in the denominator to avoid zero divisions, and the citation number is obtained from the Scopus database.
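A small sketch of this metric follows. It assumes that popularity is the citation number divided by the publication age in whole years plus one, which is how the surrounding text describes it; the function name `popularity` and the example numbers are hypothetical.

```python
def popularity(citations, publication_year, current_year):
    """Citations per year of age, with 1 added to the age to avoid dividing by zero.

    The exact form (citations / (age + 1)) is an assumption based on the
    textual description of the metric.
    """
    age = max(current_year - publication_year, 0)
    return citations / (age + 1)


# Hypothetical papers: an older, well-cited one and one from the current year.
print(popularity(citations=30, publication_year=2019, current_year=2023))  # 6.0
print(popularity(citations=4, publication_year=2023, current_year=2023))   # 4.0
```

The second call shows why the extra 1 is needed: a paper published in the current year has an age of zero.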
After calculating the relevancy and popularity metrics, the SARBOLD-LLM inputs the title and abstract information to OpenAI and outputs the AI approaches used in each article.
When a text prompt is provided to the OpenAI API, the model produces a text completion that tries to match the given context or pattern. The essential generative pre-trained transformer (GPT)-3 models, which generate natural language, are Davinci, Curie, Babbage, and Ada. In this article, “text-davinci-003” is used, which is the most potent GPT-3 model and one of the models referred to as “GPT 3.5” (Footnote 4). Some issues to consider when preparing prompts are as follows (Footnote 5):
-
It is advised to place instructions at the start of the prompt and to use ### or """ to demarcate the context from the instruction.
-
Stating what to do is preferable to stating what not to do.
The prompt can then be the following:
"Extract the names of the artificial intelligence approaches used from the following text. ###{" + str(document_text) + "}### \nA:"
where ‘document_text’ includes the title and abstract information of a paper.
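For illustration, the prompt string could be assembled per paper as sketched below. The helper name `build_prompt` is hypothetical, and joining title and abstract with a single space is an assumption; the call to the OpenAI API itself is deliberately left out.

```python
INSTRUCTION = (
    "Extract the names of the artificial intelligence approaches used "
    "from the following text."
)


def build_prompt(title, abstract):
    """Place the instruction first and wrap the paper text in ### delimiters."""
    # How title and abstract are combined is an assumption of this sketch.
    document_text = f"{title} {abstract}"
    return INSTRUCTION + " ###{" + str(document_text) + "}### \nA:"


# Hypothetical paper text, used only to show the resulting layout.
prompt = build_prompt("A title", "An abstract.")
print(prompt)
```

The instruction comes first and the delimiters separate it from the context, in line with the two guidelines listed above.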
To evaluate OpenAI’s performance, the ground truth AI methods are manually produced for non-filtered papers, regarding each paper's title and abstract information. Some high-level tags, such as “artificial intelligence” and “machine learning” are not included. In other words, the keywords used in Scopus search as a method are not involved. Precision, recall, and F1-measure are calculated for performance analysis.
3.3 Module 3: Analyzes
Firstly, sensitivity analyzes are done regarding Scopus and OpenAI in this module. Different combinations of level 1 and 2 keywords in the Scopus query are tried and the initial prompt is compared with other prompts for OpenAI.
For the selected use case, post-analyzes are performed by investigating which AI methods are used more often and which have higher relevancy or popularity metrics, and by comparing the results over different periods. This can be done manually, or, if too many methods are listed, a clustering algorithm can first be used to help this investigation. Currently, DBSCAN [64] with (1 − the normalized Indel similarity) as the distance performs well enough to support post-analysis. In a controlled comparison, the clusters made manually and with DBSCAN are very close to each other for the same observed data.
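The distance mentioned above can be sketched in plain Python. This is an illustrative sketch, not the tool's implementation: it assumes the usual definition of the Indel distance (the number of insertions and deletions needed to turn one string into the other, obtained from the longest common subsequence) normalized by the sum of the two lengths. The function names and the method list are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def indel_distance(a, b):
    """Normalized Indel distance in [0, 1], i.e. 1 - normalized Indel similarity."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0  # two empty strings are treated as identical
    return (total - 2 * lcs_length(a, b)) / total


# Hypothetical extracted method names; lowercased before comparison.
methods = ["YOLO-v3", "YOLO-v4", "random forest"]
matrix = [[indel_distance(x.lower(), y.lower()) for y in methods] for x in methods]

print(round(matrix[0][1], 3))  # 0.143: the two YOLO variants are close
print(round(matrix[0][2], 3))  # 0.8: unrelated names are far apart
```

A pairwise matrix of this kind can be passed to a DBSCAN implementation that accepts precomputed distances, for example scikit-learn's `DBSCAN` with `metric="precomputed"`. The neighborhood radius and minimum cluster size are not stated in the text and would have to be chosen by the user.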
4 Experiments
4.1 Use Case Definition
The use case example given in Figure 4 is tackled for the initial experiment. Here, AI is employed on the dataset of images to detect cancer.
4.2 Keywords From the Use Case Scenario
Using Fig. 3, the following keywords are defined: “oncology” as problem level 1, “artificial intelligence” and “AI” as solution level 1. Only “image processing” is used as solution level 2. By using only one level 2 keyword, the experiment stays rather general in the expected results.
For simplicity, level 3 keywords are not used in this example. Level 3 keywords do not affect the pool of papers but enable the user to elicit relevancy to papers that match their use case better. Because the computation of the relevancy metric is trivial, it is omitted in this example.
4.3 Scopus API Search and Manual Article Cleaning
According to the selected keywords, our initial query of the Scopus API (Footnote 6) is given below.
‘TITLE-ABS-KEY(("oncology") AND ("artificial intelligence" OR "AI") AND ("image processing")) AND DOCTYPE(ar OR cp) AND PUBYEAR > 2013’
That means the keywords are searched in the title, abstract, and keyword parts. In addition, to limit the size of the results, the publications published after 2013 are selected, and to be more specific, the document type is restricted to “Article” or “Conference Paper”.
Digital object identifier (DOI), electronic identifier (EID), year, and citation number results that Scopus API returns are given in Table 6. The relevancy and popularity values are calculated as stated in Section 3.2. Currently, some papers can have a relevancy of 0, but by manually checking them, they stay relevant. It happens when keywords only appear in “INDEXTERMS” provided by Scopus but are absent from the title, abstract, and author keywords. Moreover, this is also due to a total absence of keyword level 3. It can be fixed by taking these automatic keywords for the OpenAI analysis.
The query returns 92 results. Among them, 25 publications (irrelevant, not technical, survey only, etc.), indicated in red in Table 6, are manually filtered out. The remaining 67 articles are the results related to the domains and keywords of the use case. However, 12 of them, highlighted in orange, apply an AI method successfully but do not mention particular methods in the title and abstract (they mention only highly general, level 1 and 2 ones); they will therefore be missed by the OpenAI extraction part described in Section 4.4. This is not critical, as trends are explored. Thus, 55 papers remain to be analyzed. Note that the 37 eliminated articles could have been marked as such if the level 3 keywords had been implemented.
4.4 OpenAI
The initial prompt for the OpenAI API is stated below.
"Extract the names of the artificial intelligence approaches used from the following text. ###{" + str(document_text) + "}### \nA:"
where ‘document_text’ includes the title and abstract information of a paper.
After finding methods using OpenAI and manual work, the performance values are calculated. It is assumed that manual findings are the actual methods. On the other hand, the results coming from OpenAI are the predicted ones.
4.4.1 OpenAI Performance
To analyze the results, the methods found by OpenAI are compared to the ones found by manual investigation (considered ground truths) for each paper. There are different performance determinants:
-
“true found” is the number of methods found by OpenAI that belong to the ground truths,
-
“false found” is the number of methods found by OpenAI that do not belong to the ground truths,
-
“true general found” is the number of methods found by OpenAI and the manual search but belonging to level 1 or 2 keywords or high-level keywords like “machine learning”,
-
“total manual” is the number of ground truths,
-
“missing” = “total manual” − “true found”.
With these data, precision, recall (or sensitivity or true positive rate), and F1-score can be calculated for performance analysis. To do that, the following metrics are employed:
-
True Positive (TP) = “true found”,
-
False Positive (FP) = “false found” + “true general found”,
-
False Negative (FN) = “missing”.
The “true general found” results are counted as FP since they are terms that are entered into the Scopus search or they are high-level keywords for our solution domain interest like “machine learning, artificial intelligence-based approach” as mentioned above.
For each paper that is not filtered, the performance metrics are calculated as follows.
-
\(Precision = TP/(TP + FP)\)
-
\(Recall = TP/(TP + FN)\)
-
\(F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall}\)
The F1-score assesses the trade-off between precision and recall [65]. When the F1-score is high, it indicates that both precision and recall are high. A lower F1-score indicates a larger imbalance in precision and recall.
Let’s check the following example, coming from [66]: “Transfer learning with different modified convolutional neural network models for classifying digital mammograms utilizing Local Dataset”.
“ [...] accuracy of different machine learning algorithms in diagnostic mammograms [...] Image processing included filtering, contrast limited adaptive histogram equalization (CLAHE), then [...] Data augmentation was also applied [...] Transfer learning of many models trained on the Imagenet dataset was used with fine-tuning. [...] NASNetLarge model achieved the highest accuracy [...] The least performance was achieved using DenseNet169 and InceptionResNetV2. [...]”
Manually, “transfer learning”, “convolutional neural network”, “NASNetLarge”, “DenseNet169”, “InceptionResNetV2”, “data augmentation”, and “fine-tuning” are found as AI methods. What OpenAI has found is indicated as well. Firstly, “transfer learning”, “convolutional neural network”, “data augmentation”, “NASNetLarge”, “DenseNet169”, and “InceptionResNetV2” are “true found”, so TP = 6. Secondly, “machine learning algorithms” is a “true general found”, and “contrast limited adaptive histogram equalization (CLAHE)” is a “false found”, so FP = 2. Finally, “fine-tuning” is a “missing”, so FN = 1. With these, one can compute Precision = 6/(6 + 2) = 0.75, Recall = 6/(6 + 1) = 0.86, and F1-score = (2 × 0.75 × 0.86)/(0.75 + 0.86) = 0.8.
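The worked example can be reproduced with a few lines of set arithmetic. This is a sketch under simplifying assumptions: the function name `paper_metrics` is hypothetical, names are matched exactly after lowercasing (the manual comparison may be more tolerant), and every found item that is not a ground truth is counted as a false positive, which covers both “false found” and “true general found”.

```python
def paper_metrics(ground_truth, found):
    """Precision, recall, and F1-score for one paper from two lists of method names."""
    truth = {name.lower() for name in ground_truth}
    predicted = {name.lower() for name in found}
    tp = len(predicted & truth)   # "true found"
    fp = len(predicted - truth)   # "false found" + "true general found"
    fn = len(truth - predicted)   # "missing"
    # Returning 0.0 for empty denominators is a convention chosen for this sketch.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


manual = [
    "transfer learning", "convolutional neural network", "NASNetLarge",
    "DenseNet169", "InceptionResNetV2", "data augmentation", "fine-tuning",
]
openai_found = [
    "transfer learning", "convolutional neural network", "data augmentation",
    "NASNetLarge", "DenseNet169", "InceptionResNetV2",
    "machine learning algorithms",
    "contrast limited adaptive histogram equalization (CLAHE)",
]

precision, recall, f1 = paper_metrics(manual, openai_found)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.75 0.86 0.8
```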
In our studied case (see Appendix 2), the average scores are good, with an average precision of 0.7111, recall of 0.9226, and F1-score of 0.7775. There are 108 TPs, 51 FPs, and 12 FNs if all 55 results are grouped into a single result pool. Then the values of the precision, recall, and F1-score are 0.6793, 0.9, and 0.7742, respectively. All ground truths and OpenAI findings are presented in Table 7.
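The two sets of figures differ because the first averages the per-paper scores while the second pools all counts before computing the scores. The sketch below illustrates both ways of aggregating; the function names and the three per-paper count triples are hypothetical, and only the pooled counts (108, 51, 12) come from the text.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1-score from raw counts (0.0 on empty denominators)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def averaged(per_paper_counts):
    """Average of the per-paper scores: every paper weighs the same."""
    scores = [prf(*counts) for counts in per_paper_counts]
    return tuple(sum(column) / len(scores) for column in zip(*scores))


def pooled(per_paper_counts):
    """Scores of the summed counts: every extracted method weighs the same."""
    tp, fp, fn = (sum(column) for column in zip(*per_paper_counts))
    return prf(tp, fp, fn)


# Hypothetical (TP, FP, FN) triples for three papers, for illustration only.
example_counts = [(6, 2, 1), (1, 0, 0), (2, 3, 0)]
print(averaged(example_counts))
print(pooled(example_counts))

# Pooled counts reported in the text for the 55 analyzed papers.
pooled_precision, pooled_recall, pooled_f1 = prf(108, 51, 12)
print(round(pooled_precision, 2), round(pooled_recall, 2), round(pooled_f1, 2))
# 0.68 0.9 0.77
```

Papers with few extracted methods influence the averaged scores as much as papers with many, which is one reason the two aggregates need not coincide.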
A manual literature review takes a week to complete, whereas the SARBOLD-LLM completes the whole task (selecting AI approaches from the title and abstract of the 92 unfiltered publications) in a few hours.
4.5 Sensitivity Analyzes
4.5.1 Scopus API Sensitivity
For the Scopus sensitivity analysis, different combinations of level 1 keywords are tried in the query. The initial query can be seen in Section 4.3.
Table 3 shows the impact of changing keywords in level 1. The first query in Table 3 is the initial one, given for comparison. Changing a problem domain keyword with another that could be seen as a synonym can greatly impact the papers found. Using the more specific keyword “machine learning” in the solution domain instead of “artificial intelligence” has an impact on the publications found. Similarly, in the problem domain using “cancer” instead of “oncology” has a great impact on the number of papers found. On the other hand, changing double quotes to braces does not have that much effect. Moreover, it seems that using only an abbreviation instead of an open form can change the number of results found. Using only the abbreviation has resulted in a poor paper pool.
However, despite the different pool of papers, the methods found by OpenAI are pretty much the same, both for the second and the third query. This means that using synonyms changes the pool of papers but not the methods used to solve the same kind of problem, which means that the method is robust to the keyword selection scheme.
4.5.2 OpenAI Sensitivity
To analyze the sensitivity of OpenAI, different prompts are tested, and the differences between proposed AI methods are checked. Results are summarized in Table 4, and details are provided in Table 8, Table 9, and Table 10 in Appendix 3.
The below prompts are used for analysis.
"Extract the names of the artificial intelligence approaches used from the following text. ###{" + str(document text) + "}### \nA:"
- Prompt 1: "Just write the names of used artificial intelligence or machine learning methods in the following text. ###{" + str(document text) + "}### \nA:"
- Prompt 2: "Just write the names of used artificial intelligence methods in the following text. ###{" + str(document text) + "}### \nA:"
- Prompt 3: "Just write the names of artificial intelligence approaches used in the following text. ###{" + str(document text) + "}### \nA:"
- Prompt 4: "Extract names of the used artificial intelligence approaches from the following text. ###{" + str(document text) + "}### \nA:"
- Prompt 5: "Write the names of successfully applied artificial intelligence approaches in the following text. ###{" + str(document text) + "}### \nA:"
- Prompt 6: "Extract the names of artificial intelligence approaches employed in the following text. ###{" + str(document text) + "}### \nA:"
In Table 4, the number in the last column is an enriched ratio: if two prompts are identical, it takes an infinite value, and it decreases as the prompts differ. The ratio accounts both for the papers on which the two prompts do not return the same set of words and for how many words differ between the prompts themselves.
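The prompts listed above all follow one template: an instruction sentence, the document text wrapped between `###{` and `}###`, and a closing `\nA:` cue. The sketch below only illustrates that string concatenation; the function name `build_prompt` and the variable name `document_text` are our assumptions (the listing writes `document text`), and no API call is made.

```python
def build_prompt(instruction: str, document_text: str) -> str:
    """Wrap the document text in the ###{...}### markers used by the prompts above."""
    return instruction + " ###{" + str(document_text) + "}### \nA:"


instruction = (
    "Extract the names of the artificial intelligence approaches used "
    "from the following text."
)
# Placeholder text standing in for the title and abstract of one paper.
prompt = build_prompt(instruction, "Sample abstract text.")
print(prompt)
```

Keeping the instruction separate from the template makes it straightforward to test several instruction sentences on the same documents.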
The original prompt has a higher F1-score than the other six prompts. Even with these few prompts, it is clear that OpenAI is sensitive to the wording used. However, it generally adds words relative to the manual search, so extracting the most common words across these results should be enough to find what the user is searching for. Moreover, changing a word’s position has less impact than changing the word itself, and the more words the user changes, the more differences appear. It also seems that more common words give more generic results, closer to the ones being searched for, whereas very specific instructions, notably in the action verbs, generally give less relevant results.
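Extracting the most common words across prompts can be sketched as a simple vote. The example below is a minimal illustration under our own assumptions: the function name `common_methods`, the 50% threshold, and the per-prompt results are hypothetical and are not taken from Tables 8 to 10.

```python
from collections import Counter


def common_methods(results_per_prompt: list[set[str]], min_share: float = 0.5) -> list[str]:
    """Keep the methods returned by at least `min_share` of the prompts."""
    counts = Counter(m for result in results_per_prompt for m in result)
    threshold = min_share * len(results_per_prompt)
    return sorted(m for m, c in counts.items() if c >= threshold)


# Hypothetical outputs of three prompts for the same paper.
results = [
    {"cnn", "svm", "deep learning"},
    {"cnn", "svm"},
    {"cnn", "artificial intelligence"},
]
print(common_methods(results))  # ['cnn', 'svm']
```

With such a vote, words added by a single prompt are dropped, while methods found by most prompts are kept.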
4.6 Post-Analyses
The extracted AI methods for the use case described in Section 4.1 are presented in Appendix 4. The total number of appearances of the methods, together with their relevancy and popularity metrics, is shown by year in Table 11. Methods that were selected from articles not highlighted in Table 6 and that appear in at least two papers are discussed.
Figure 5 summarizes Table 11. The figure shows that many different methods have been investigated for our example use case, but some are much more used or popular than others. These methods (e.g., class 2 (deep learning methods) and class 1 (artificial neural networks)) are the ones that the user should investigate first for the given use case. More specifically, until 2018, different types of neural networks, logistic regression, SVM, and random forest were popular methods. After 2018, SVM and neural networks are still used, and the extra trees classifier appears popular in 2022, but the trend is dominated by deep learning methods. Among the deep learning algorithms, CNN, U-Net, and AlexNet are the three most used and most popular methods.
AI methods can be examined without any classification, but there would then be too many methods to compare. To simplify the analysis, the methods are divided into classes. Appendix 4 provides the specifics of the method classification and detailed information on the AI methods in each class. Moreover, a more detailed decision can be made using the relevancy and popularity metrics, for example when the user is undecided between two AI methods.
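The trend reading described above amounts to counting class appearances before and after a given year. The sketch below is an illustration only: the records are hypothetical, the function name `counts_by_period` is ours, treating "until" as inclusive of the split year is an assumption, and the relevancy and popularity metrics are not reproduced here.

```python
from collections import Counter


def counts_by_period(records, split_year):
    """Count method-class appearances up to (inclusive) and after `split_year`."""
    until, after = Counter(), Counter()
    for year, method_class in records:
        target = until if year <= split_year else after
        target[method_class] += 1
    return until, after


# Hypothetical (publication year, method class) pairs, one per extracted method.
records = [
    (2016, "svm"), (2017, "artificial neural networks"), (2018, "svm"),
    (2019, "deep learning"), (2020, "deep learning"), (2021, "svm"),
    (2022, "deep learning"),
]
until_2018, after_2018 = counts_by_period(records, split_year=2018)
print(until_2018.most_common())
print(after_2018.most_common())
```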
4.7 Experiments for Different Problem Domains
To check the robustness of the SARBOLD-LLM, different problem domains and solution approaches are also considered for the Scopus search. The same initial prompt given in Section 4.4 is used for all use cases to extract AI methods through the OpenAI API.
First, the same problem domain is kept, and the level 2 solution approach is changed as given in the query below.
‘TITLE-ABS-KEY(("oncology") AND ("artificial intelligence" OR "AI") AND ("natural language processing" OR "NLP")) AND DOCTYPE(ar OR cp) AND PUBYEAR > 2013’
The aforementioned search yields 35 documents. Of these, 5 effectively use an AI approach but do not mention any particular method in the title or abstract, and 15 are irrelevant or merely surveys. Consequently, the remaining 15 are selected in the manner described in Section 4.3. Figure 6 shows the AI methods employed in the selected papers. Until 2019, SVM seemed to be a popular method, and from 2019 the trend has shifted to deep learning algorithms. RNN, CNN, and BERT are among the deep learning methods used more after 2019. In addition, some of the most popular methods are BERT, LSTM, and GPT.
Secondly, the solution approach components are kept the same while the problem domain is changed. The query for the “traffic control” problem domain is presented below.
‘TITLE-ABS-KEY(("traffic control") AND ("artificial intelligence" OR "AI") AND ("image processing")) AND DOCTYPE(ar OR cp) AND PUBYEAR > 2013’
The query returns 52 results, of which nine are irrelevant or just surveys, and 20 use an AI method successfully but do not mention specific methods in the title or abstract. Therefore, the remaining 23 are selected. Fig. 7 shows that until 2020, classical methods such as SIFT, SURF, KNN, and decision trees were popular. After 2020, the deep learning methods class (which contains R-CNN, fast R-CNN, faster R-CNN, YOLO, deep simple online real-time tracking (DeepSORT), CNN, U-Net, etc.) is on the rise in terms of both number of uses and popularity.
Another query, given below, uses “satellite imagery” as the problem domain. It returns 66 results, 37 of which are selected for analysis.
‘TITLE-ABS-KEY(("satellite imagery") AND ("artificial intelligence" OR "AI") AND ("image processing")) AND DOCTYPE(ar OR cp) AND PUBYEAR > 2013’
Figure 8 illustrates the summary of extracted AI methods for the “satellite imagery” problem domain. Class 1 includes CNN, DNN, DeepLabv3+, FCN, U-Net, U-Net++, encoder-decoder, attention mechanism, Res2Net, ResNet, LSTM, SegNet, V-Net, U2Net, AttuNet, LinkNet, Mask R-CNN, and cloud attention intelligent network (CAI-Net). On the other hand, class 2 covers ant colony optimization (ACO), genetic algorithm, particle swarm optimization (PSO), bat algorithm, and artificial bee colony (ABC). Until 2020, SVM, ANN, and ACO were frequently used and popular methods. After 2020, the use and popularity of class 1 and PSO appear to be increasing. In class 1, the top three most used and most popular methods are CNN, U-Net, and DNN. As can be seen from the trend, the first methods to be considered in this problem domain may be the deep learning methods given above.
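The queries used in these experiments share one structure: keyword levels joined by AND inside TITLE-ABS-KEY, alternatives within a level joined by OR, followed by the document type and publication year filters. The sketch below only assembles such a query string; the function names `or_group` and `build_query` are ours, and no request is sent to the Scopus API.

```python
def or_group(keywords):
    """Join the keywords of one level with OR, each in double quotes."""
    return "(" + " OR ".join('"' + k + '"' for k in keywords) + ")"


def build_query(levels, doctypes=("ar", "cp"), after_year=2013):
    """Combine keyword levels with AND inside TITLE-ABS-KEY, then add the filters."""
    keyword_part = " AND ".join(or_group(level) for level in levels)
    doctype_part = " OR ".join(doctypes)
    return (
        "TITLE-ABS-KEY(" + keyword_part + ")"
        + " AND DOCTYPE(" + doctype_part + ")"
        + " AND PUBYEAR > " + str(after_year)
    )


query = build_query([
    ["traffic control"],                 # problem domain
    ["artificial intelligence", "AI"],   # solution domain
    ["image processing"],                # solution approach
])
print(query)
```

Changing only the first list reproduces the switch from one problem domain to another while the solution components stay fixed.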
Table 5 gives the OpenAI performance results for all experiments, where the TP, FP, and FN values are treated as a single pool, i.e., the performance metrics are not averages over the individual article results. It should also be noted that if the “true general found” words (i.e., machine learning, artificial intelligence, image processing) had not been counted as FPs, higher precision and F1-score values would have been obtained. Although the problem domain and solution approach change, similar performance results are attained, which is promising for the robustness of the SARBOLD-LLM.
5 Discussion
A major issue with automatic solution method selection schemes is trust in the fit, relevancy, and popularity of the suggested methods. The fit to the actual use case depends on the ability of the human operator to interact with the tool and on whether they understand the intricacies of the approach. With the SARBOLD-LLM tool, the human operator can validate the suggested methods against the accompanying pool of research papers, and because the tool is simple, responsive, and intuitive, it is relatively straightforward for the operator to adapt its usage to the overall goal of solving a problem. Additionally, to improve the tool’s performance with respect to operational requirements (e.g., explainability, trustworthiness) and resources (e.g., hardware), the necessary features or extra resources for AI methods can be added and expanded later, provided the detailed requirements and current resources are stated clearly.
For example, if explainability is required, many different methods exist for obtaining explainable AI (XAI) [67,68,69,70,71,72]. If trustworthiness is required, several alternative criteria for trustworthiness may be specified, depending on the system, environment, goals, and other parameters of the setting in which AI will be used [73, 74].
Details or requirements such as explainability and trustworthiness can be included in the keyword selection scheme in Fig. 3. Alternatively, after AI methods are found by the SARBOLD-LLM, post hoc analyses can be made with the requirements not used in the SARBOLD-LLM. In some use cases, such requirements or details may not be specified at the beginning of the AI system life cycle and, therefore, may not be included in the keyword selection phase.
Due to the specificity of certain use cases, there is a considerable risk that no research has been conducted on the specifics of the use case. In that case, the proposed solution methods will likely not score highly on the relevancy metric. Therefore, the literature pool must be investigated after the results are identified.
Ultimately, the SARBOLD-LLM’s applicability comes down to the objective of the application. It will comfortably propose methods already explained in the literature, which is why it is very useful for identifying trends in the research communities. However, as the method identification is based on historical data that train the tool to determine what words within a research paper can be classified as a method, the tool will not fare well when dealing with entirely new solution approach schemes.
It is noteworthy that the relevancy explained in Section 3.2 is computed and saved at the same time as the other data. It could be useful in the future if one wants an automatic filter. On the other hand, if the pool of papers is too big to be manually filtered, it is possible to filter at the end of the process, when one is checking for the methods to be used. The main disadvantage of filtering after the whole process is that it can allow a lot of irrelevant papers to be analyzed by OpenAI, and this will modify the perception of the trends of research for the studied use case. However, note that SARBOLD-LLM is used to get trends in research about a given use case to support the selection of solution methods, and does not directly select a method for the user. It means that having some irrelevant papers analyzed in the whole process will not lead to a completely different result. Moreover, no information is lost, so the trends can be recomputed after filtering if necessary.
6 Conclusion
When the experiments are examined, the SARBOLD-LLM produces robust results concerning OpenAI performance for different problem and solution domains in its current state. In terms of the trend, up-to-date usage, and popularity of solution methods, SARBOLD-LLM quickly produces rich and advantageous information for the user. In addition, the recommended keyword selection scheme offers a very flexible structure in choosing the problem domain and solution approach for any use case.
The SARBOLD-LLM completes in a few hours work that takes a week or more with a manual literature review (selecting AI methods from the titles and abstracts of 92 papers). It is more suitable for engineers, as it proposes methods and trends without adding pros and cons. This limits knowledge accumulation, but the tool can be used as a guide for future implementation.
Several prior studies that propose solution approaches aim to decrease the time and effort spent, emphasizing automation. In alignment with these objectives, SARBOLD-LLM employs a keyword selection strategy to ensure targeted searches within relevant problem and solution domains. Moreover, it incorporates trend, popularity, and relevancy analyses to derive decision-making insights regarding the optimal solution techniques for various use cases. The research also highlights other outcomes, including the sensitivity analyses conducted for Scopus and OpenAI, and the performance results obtained for OpenAI.
6.1 Future Work
Due to the nature of the underlying problem, certain processes are technically more difficult to automate than others [5]. In its current form, the SARBOLD-LLM still needs a human to perform the keyword selection, check the results given by the query, classify the found methods, and validate the robustness of the solution. For future work, it would be of high value to remove the need for human intervention while presenting results that signify the trade-off for the different automated decisions. Our study towards automating these tasks is currently underway.
Simultaneously, employing versions from the updated suite of large language models, such as OpenAI’s GPT-4, and exploring other databases (like Web of Science, PubMed, IEEE Xplore, etc.) are also future works. Besides, open-source alternatives to GPT-3 or GPT-4, such as GPT-NeoX-20B [75] and GPT-J [76], will be implemented to help in cutting costs.
The sensitivity analysis is split into two parts: queries and prompts. Queries depend heavily on the keyword selection scheme, and the two should be studied together. Nevertheless, an automatic sensitivity analysis can reasonably be made using some variants of the initial query, such as using quotation marks instead of braces or using several forms of the same words. Later, it could be interesting to study the sensitivity to synonyms. Prompts, on the other hand, can be analyzed more easily: several sentences could be automatically generated from the initial one and then tested. Taking the common pool of solutions, or using a score such as the number of occurrences, could provide a robust solution.
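Generating a query variant that differs only in the phrase delimiters can be done with a single substitution. The sketch below is a minimal illustration under our own assumptions (the function name `quotes_to_braces` is hypothetical); it rewrites double-quoted keywords as braced keywords and does not query Scopus.

```python
import re


def quotes_to_braces(query: str) -> str:
    """Rewrite every "double-quoted" keyword as a {braced} keyword."""
    return re.sub(r'"([^"]+)"', r"{\1}", query)


query = (
    'TITLE-ABS-KEY(("traffic control") AND ("artificial intelligence" OR "AI") '
    'AND ("image processing")) AND DOCTYPE(ar OR cp) AND PUBYEAR > 2013'
)
variants = [query, quotes_to_braces(query)]
for variant in variants:
    print(variant)
```

Each variant could then be submitted separately and the resulting paper pools and extracted methods compared.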
Classifying methods is not easy, as we want to keep a stratification from general methods to specific ones. However, since deep learning is already used for classification, e.g., image classification, which is gaining attention in cancer research [77], a deep learning method could pool related methods together and reduce the number of distinct methods (e.g., YOLO-v2, YOLOv4-tiny, etc.). Without any logical pooling, a simple text-based clustering approach, such as DBSCAN, can be used to pool methods automatically when the set of extracted methods is sufficiently large. However, automatically matching a specific taxonomy will require another method.
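To make the text-based pooling idea concrete, the sketch below clusters method names with a small pure-Python DBSCAN. It is an illustration under our own assumptions, not an implemented part of the SARBOLD-LLM: the character-bigram Jaccard distance, the values of `eps` and `min_samples`, and the list of names are all chosen for the example. In practice, a library implementation such as the one in scikit-learn would normally be used.

```python
def bigrams(name: str) -> set[str]:
    """Character bigrams of the lowercased name, ignoring punctuation."""
    s = "".join(ch for ch in name.lower() if ch.isalnum())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def distance(a: str, b: str) -> float:
    """Jaccard distance between the bigram sets of two names."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga and not gb:
        return 0.0
    return 1.0 - len(ga & gb) / len(ga | gb)


def dbscan(items: list[str], eps: float, min_samples: int) -> list[int]:
    """Return one cluster label per item; -1 marks noise."""
    labels: dict[int, int] = {}

    def neighbors(i: int) -> list[int]:
        return [j for j in range(len(items)) if distance(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(len(items)):
        if i in labels:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # provisional noise, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels.get(j) == -1:
                labels[j] = cluster  # noise reached from a core point becomes border
            if j in labels:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:  # j is a core point: keep expanding
                queue.extend(j_nbrs)
    return [labels[i] for i in range(len(items))]


names = ["YOLO-v2", "YOLOv4-tiny", "YOLOv3", "U-Net", "U-Net++", "SVM"]
labels = dbscan(names, eps=0.65, min_samples=2)
for name, label in zip(names, labels):
    print(label, name)
```

In this toy example the YOLO variants fall into one cluster, the U-Net variants into another, and SVM is left as noise. The same example also shows a limit of purely textual pooling: U-Net and U-Net++ become indistinguishable, which may or may not be desired for a given taxonomy.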
Currently, the SARBOLD-LLM only checks the title, abstract, and keywords for the solution approach determination. For certain papers, the specifics of the method are only introduced later in the paper, e.g., for hybrid methods. Consequently, an important extension will be to determine the applied method of a paper from the entirety of a paper.
As we are only providing trends about uses in the literature, the end user still needs to look at surveys to understand the pros and cons of the proposed methods. A future enhancement would be to add a module explaining the pros and cons of each recommended method.
Furthermore, it is planned to convert Python codes to a graphical user interface (GUI) to present automated applications to end users (especially, engineers). Automatic solution approach suggestion simplifies decision-making processes, enhances efficiency, and ensures that decision-makers (end users) have access to relevant, timely, and diverse information to address complex challenges.
Finally, the SARBOLD-LLM can essentially investigate any arbitrary characteristic of the literature rather than only the solution approaches, e.g., identifying problem formulations and varieties therein. Therefore, exploring how to do this manually will greatly benefit the research community.
Data Availability
The data presented in this study are available upon request from the corresponding author.
Abbreviations
- ABC: Artificial bee colony
- ACO: Ant colony optimization
- AEC: Architecture, engineering and construction
- AI: Artificial intelligence
- ANN: Artificial neural network
- API: Application programming interface
- BERT: Bidirectional encoder representations from transformers
- BiLSTM: Bidirectional long short-term memory
- BoW: Bag of words
- BPNN: Back propagation neural network
- BTM: Bagging tree model
- CAI-Net: Cloud attention intelligent network
- cGAN: Conditional generative adversarial network
- CLAHE: Contrast limited adaptive histogram equalization
- CNN: Convolutional neural network
- DBSCAN: Density-based spatial clustering of applications with noise
- DBN: Deep belief network
- DCNN: Deep convolutional neural network
- DeepSORT: Deep simple online real-time tracking
- DNN: Deep neural network
- DOI: Digital object identifier
- DRL: Deep reinforcement learning
- DSN: Deeply supervised nets
- EID: Electronic identifier
- ELM: Extreme learning machine
- FCN: Fully convolutional network
- FN: False negative
- FP: False positive
- GAN: Generative adversarial network
- GCN: Graph convolutional network
- GLCM: Gray level co-occurrence matrix
- GNN: Graph neural network
- GPT: Generative pre-trained transformers
- GRU: Gated recurrent unit
- GUI: Graphical user interface
- IoT: Internet of things
- KCNet: Kernel-based canonicalization network
- KNN: K-nearest neighbors
- LDA: Latent Dirichlet allocation
- LSTM: Long short-term memory
- MFFNN: Multi-layer feed-forward neural network
- ML: Machine learning
- MLP: Multilayer perceptron
- NLP: Natural language processing
- ORB: Oriented FAST and rotated BRIEF
- PANN: Paraconsistent artificial neural network
- PCA: Principal component analysis
- PNN: Probabilistic neural network
- PSO: Particle swarm optimization
- RBM: Restricted Boltzmann machine
- R-CNN: Region-based convolutional neural network
- RNN: Recurrent neural network
- SARBOLD-LLM: Solution approach recommender based on literature database-large language model
- SIFT: Scale-invariant feature transform
- SMO: Sequential minimal optimization
- SSD: Single shot detector
- SURF: Speeded up robust features
- SVC: Support vector classifier
- SVM: Support vector machine
- SVR: Support vector regression
- TF-IDF: Term frequency-inverse document frequency
- TP: True positive
- UAV: Unmanned aerial vehicle
- XAI: Explainable artificial intelligence
- XGBoost: EXtreme Gradient Boosting
- YOLO: You only look once
References
Devagiri JS, Paheding S, Niyaz Q, Yang X, Smith S. Augmented reality and artificial intelligence in industry: Trends, tools, and future challenges. Expert Syst Appl. 2022;207:118002. https://doi.org/10.1016/j.eswa.2022.118002.
Jan Z, Ahamed F, Mayer W, Patel N, Grossmann G, Stumptner M, Kuusk A. Artificial intelligence for industry 4.0: Systematic review of applications, challenges, and opportunities. Expert Syst Appl. 2023;216:119456. https://doi.org/10.1016/j.eswa.2022.119456.
von Rueden L, Mayer S, Beckh K, Georgiev B, Giesselbach S, Heese R, Kirsch B, Pfrommer J, Pick A, Ramamurthy R, Walczak M, Garcke J, Bauckhage C, Schuecker J. Informed machine learning – a taxonomy and survey of integrating prior knowledge into learning systems. IEEE Trans Knowl Data Eng. 2023;35(1):614–33. https://doi.org/10.1109/TKDE.2021.3079836.
European Commission, Joint Research Centre, Samoili S, López Cobo M, Delipetrev B et al (2021) AI watch, defining artificial intelligence 2.0 – Towards an operational definition and taxonomy for the AI landscape. Publications Office of the European Union. https://doi.org/10.2760/019901.
van Dinter R, Tekinerdogan B, Catal C. Automation of systematic literature reviews: A systematic literature review. Inf Softw Technol. 2021;136:106589. https://doi.org/10.1016/j.infsof.2021.106589.
Rose ME, Kitchin JR. pybliometrics: Scriptable bibliometrics using a python interface to scopus. SoftwareX. 2019;10:100263. https://doi.org/10.1016/j.softx.2019.100263.
Baduge SK, Thilakarathna S, Perera JS, Arashpour M, Sharafi P, Teodosio B, Shringi A, Mendis P. Artificial intelligence and smart vision for building and construction 4.0: Machine and deep learning methods and applications. Autom Constr. 2022;141:104440. https://doi.org/10.1016/j.autcon.2022.104440.
Darko A, Chan AP, Adabre MA, Edwards DJ, Hosseini MR, Ameyaw EE. Artificial intelligence in the AEC industry: Scientometric analysis and visualization of research activities. Autom Constr. 2020;112:103081. https://doi.org/10.1016/j.autcon.2020.103081.
Elbasi E, Mostafa N, AlArnaout Z, Zreikat AI, Cina E, Varghese G, Shdefat A, Topcu AE, Abdelbaki W, Mathew S, Zaki C. Artificial intelligence technology in the agricultural sector: A systematic literature review. IEEE Access. 2023;11:171–202. https://doi.org/10.1109/ACCESS.2022.3232485.
Amrit P, Singh AK. Survey on watermarking methods in the artificial intelligence domain and beyond. Comput Commun. 2022;188:52–65. https://doi.org/10.1016/j.comcom.2022.02.023.
Ali O, Abdelbaki W, Shrestha A, Elbasi E, Alryalat MAA, Dwivedi YK. A systematic literature review of artificial intelligence in the healthcare sector: Benefits, challenges, methodologies, and functionalities. J Innov Knowl. 2023;8(1): 100333. https://doi.org/10.1016/j.jik.2023.100333.
Doughty E, Kertesz-Farkas A, Bodenreider O, Thompson G, Adadey A, Peterson T, Kann MG. Toward an automatic method for extracting cancer-and other disease-related point mutations from the biomedical literature. Bioinformatics. 2011;27(3):408–15. https://doi.org/10.1093/bioinformatics/btq667.
Gupta D, Shah M. A comprehensive study on artificial intelligence in oil and gas sector. Environ Sci Pollut Res. 2022;29:50984–97. https://doi.org/10.1007/s11356-021-15379-z.
Pournader M, Ghaderi H, Hassanzadegan A, Fahimnia B. Artificial intelligence applications in supply chain management. Int J Prod Econ. 2021;241: 108250. https://doi.org/10.1016/j.ijpe.2021.108250.
Rashidi HH, Tran NK, Betts EV, Howell LP, Green R. Artificial intelligence and machine learning in pathology: The present landscape of supervised methods. Acad Pathol. 2019;6:2374289519873088. https://doi.org/10.1177/2374289519873088.
Doumpos M, Zopounidis C, Gounopoulos D, Platanakis E, Zhang W. Operational research and artificial intelligence methods in banking. Eur J Oper Res. 2023;306(1):1–16. https://doi.org/10.1016/j.ejor.2022.04.027.
Ahmed S, Alshater MM, Ammari AE, Hammami H. Artificial intelligence and machine learning in finance: A bibliometric review. Res Int Bus Financ. 2022;61:101646. https://doi.org/10.1016/j.ribaf.2022.101646.
Goyal K, Kumar P, Verma K. Food adulteration detection using artificial intelligence: A systematic review. Arch Computat Methods Eng. 2022;29:397–426. https://doi.org/10.1007/s11831-021-09600-y.
Nti IK, Adekoya AF, Weyori BA, Nyarko-Boateng O. Applications of artificial intelligence in engineering and manufacturing: a systematic review. J Intell Manuf. 2022;33:1581–601. https://doi.org/10.1007/s10845-021-01771-6.
He Q, Zheng H, Ma X, Wang L, Kong H, Zhu Z. Artificial intelligence application in a renewable energy-driven desalination system: A critical review. Energy AI. 2022;7:100123. https://doi.org/10.1016/j.egyai.2021.100123.
Puente-Castro A, Rivero D, Pazos A, Fernandez-Blanco E. A review of artificial intelligence applied to path planning in UAV swarms. Neural Comput Applic. 2022;34:153–70. https://doi.org/10.1007/s00521-021-06569-4.
Galán JJ, Carrasco RA, LaTorre A. Military applications of machine learning: A bibliometric perspective. Mathematics. 2022;10(9):1397. https://doi.org/10.3390/math10091397.
Dapel ME, Asante M, Uba CD, Agyeman MO. Artificial intelligence techniques in cybersecurity management. In: Jahankhani H, editor. Cybersecurity in the Age of Smart Societies. Cham: Springer International Publishing; 2023. p. 241–55. https://doi.org/10.1007/978-3-031-20160-8_14.
Yüksel N, Börklü HR, Sezer HK, Canyurt OE. Review of artificial intelligence applications in engineering design perspective. Eng Appl Artif Intell. 2023;118:105697. https://doi.org/10.1016/j.engappai.2022.105697.
Mchergui A, Moulahi T, Zeadally S. Survey on artificial intelligence (ai) techniques for vehicular ad-hoc networks (vanets). Veh Commun. 2022;34:100403. https://doi.org/10.1016/j.vehcom.2021.100403.
Carrillo-Perez F, Pecho OE, Morales JC, Paravina RD, Della Bona A, Ghinea R, Pulgar R, Pérez MDM, Herrera LJ. Applications of artificial intelligence in dentistry: A comprehensive review. J Esthet Restor Dent. 2022;34(1):259–80. https://doi.org/10.1111/jerd.12844.
Debrah C, Chan AP, Darko A. Artificial intelligence in green building. Autom Constr. 2022;137:104192. https://doi.org/10.1016/j.autcon.2022.104192.
Bawack RE, Wamba SF, Carillo KDA, Akter S. Artificial intelligence in e-commerce: a bibliometric study and literature review. Electron Markets. 2022;32:297–338. https://doi.org/10.1007/s12525-022-00537-z.
Deng J, Yang Z, Ojima I, Samaras D, Wang F. Artificial intelligence in drug discovery: applications and techniques. Brief Bioinform. 2021;23(1):bbab430. https://doi.org/10.1093/bib/bbab430.
Chintalapati S, Pandey SK. Artificial intelligence in marketing: A systematic literature review. Int J Mark Res. 2022;64(1):38–68. https://doi.org/10.1177/14707853211018428.
Richter L, Lehna M, Marchand S, Scholz C, Dreher A, Klaiber S, Lenk S. Artificial intelligence for electricity supply chain automation. Renew Sustain Energy Rev. 2022;163:112459. https://doi.org/10.1016/j.rser.2022.112459.
Alzubaidi M, Agus M, Alyafei K, Althelaya KA, Shah U, AbdAlrazaq A, Anbar M, Makhlouf M, Househ M. Toward deep observation: A systematic survey on artificial intelligence techniques to monitor fetus via ultrasound images. iScience. 2022;25(8):104713. https://doi.org/10.1016/j.isci.2022.104713.
Ahanger TA, Aljumah A, Atiquzzaman M. State-of-the-art survey of artificial intelligent techniques for iot security. Comput Netw. 2022;206:108771. https://doi.org/10.1016/j.comnet.2022.108771.
Kareem A, Liu Liu, Sant P. Review on Pneumonia Image Detection: A Machine Learning Approach. Hum-Cent Intell Syst. 2022;2:31–43. https://doi.org/10.1007/s44230-022-00002-2.
He L, Wang X, Chen H, et al. Online Spam Review Detection: A Survey of Literature. Hum-Cent Intell Syst. 2022;2:14–30. https://doi.org/10.1007/s44230-022-00001-3.
Bin Sulaiman R, Schetinin V, Sant P. Review of Machine Learning Approach on Credit Card Fraud Detection. Hum-Cent Intell Syst. 2022;2:55–68. https://doi.org/10.1007/s44230-022-00004-0.
Islam L, Islam MR, Akter S, et al. Identifying Heterogeneity of Diabetics Mellitus Based on the Demographical and Clinical Characteristics. Hum-Cent Intell Syst. 2022;2:44–54. https://doi.org/10.1007/s44230-022-00003-1.
Munawar HS, Hammad AWA, Waller ST, et al. Modern Crack Detection for Bridge Infrastructure Maintenance Using Machine Learning. Hum-Cent Intell Syst. 2022;2:95–112. https://doi.org/10.1007/s44230-022-00009-9.
Islam MT, Hasib KM, Rahman MM, et al. Convolutional Auto-Encoder and Independent Component Analysis Based Automatic Place Recognition for Moving Robot in Invariant Season Condition. Hum-Cent Intell Syst. 2023;3:13–24. https://doi.org/10.1007/s44230-022-00013-z.
Munawar HS, Hammad AWA, Waller ST, et al. Road Network Detection from Aerial Imagery of Urban Areas Using Deep ResUNet in Combination with the B-snake Algorithm. Hum-Cent Intell Syst. 2023;3:37–46. https://doi.org/10.1007/s44230-023-00015-5.
Zhang S, Zheng Y, Li T. Social Relationship Link Inference Based on Graph Convolutional Networks. Hum-Cent Intell Syst. 2023;3:47–55. https://doi.org/10.1007/s44230-023-00016-4.
Hassan MM, Hassan MM, Mollick S, et al. A Comparative Study, Prediction and Development of Chronic Kidney Disease Using Machine Learning on Patients Clinical Records. Hum-Cent Intell Syst. 2023;3:92–104. https://doi.org/10.1007/s44230-023-00017-3.
Inamdar S, Chapekar R, Gite S, et al. Machine Learning Driven Mental Stress Detection on Reddit Posts Using Natural Language Processing. Hum-Cent Intell Syst. 2023;3:80–91. https://doi.org/10.1007/s44230-023-00020-8.
Wahid A, Breslin JG, Intizar MA. TCRSCANet: Harnessing Temporal Convolutions and Recurrent Skip Component for Enhanced RUL Estimation in Mechanical Systems. Hum-Cent Intell Syst. 2024. https://doi.org/10.1007/s44230-023-00060-0.
Abiyev R, Adepoju J. Automatic Food Recognition Using Deep Convolutional Neural Networks with Self-attention Mechanism. Hum-Cent Intell Syst. 2024. https://doi.org/10.1007/s44230-023-00057-9.
Khalil A, Jarrah M, Aldwairi M. Hybrid Neural Network Models for Detecting Fake News Articles. Hum-Cent Intell Syst. 2023. https://doi.org/10.1007/s44230-023-00055-x.
Apostolidis K, Kokkotis C, Moustakidis S, et al. Machine Learning Algorithms for the Prediction of Language and Cognition Rehabilitation Outcomes of Post-stroke Patients: A Scoping Review. Hum-Cent Intell Syst. 2023. https://doi.org/10.1007/s44230-023-00051-1.
Abuhoureyah F, Wong YC, Isira ASBM, et al. CSI-Based Location Independent Human Activity Recognition Using Deep Learning. Hum-Cent Intell Syst. 2023;3:537–57. https://doi.org/10.1007/s44230-023-00047-x.
Baswaraju S, Maheswari VU, Chennam KK, et al. Future Food Production Prediction Using AROA Based Hybrid Deep Learning Model in Agri-Sector. Hum-Cent Intell Syst. 2023;3:521–36. https://doi.org/10.1007/s44230-023-00046-y.
Goswami P, Hossain ABMA. Street Object Detection from Synthesized and Processed Semantic Image: A Deep Learning Based Study. Hum-Cent Intell Syst. 2023;3:487–507. https://doi.org/10.1007/s44230-023-00043-1.
Yaqoob A, Musheer-Aziz R, Verma NK. Applications and Techniques of Machine Learning in Cancer Classification: A Systematic Review. Hum-Cent Intell Syst. 2023;3:588–615. https://doi.org/10.1007/s44230-023-00041-3.
Asmussen CB, Møller C. Smart literature review: a practical topic modelling approach to exploratory literature review. J Big Data. 2019;6(1):1–18. https://doi.org/10.1186/s40537-019-0255-7.
Hyndman RJ, Khandakar Y. Automatic time series forecasting: the forecast package for r. J Stat Softw. 2008;27:1–22.
Castle JL, Doornik JA, Hendry DF (2011) Evaluating automatic model selection. J Time Ser Econom 3(1). https://doi.org/10.2202/1941-1928.1097.
Komer B, Bergstra J, Eliasmith C (2014) Hyperopt-sklearn: automatic hyperparameter configuration for scikit-learn. In: Scipy, Proceedings of the 13th Python in science conference. ICML workshop on AutoML, vol 9, pp 32–37. https://doi.org/10.25080/Majora-14bd3278-006.
Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. In: Pereira F, Burges CJ, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems, p 25.
Bergstra J, Bardenet R, Bengio Y, Kégl B (2011) Algorithms for hyper-parameter optimization. In: Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger KQ (eds) Advances in neural information processing systems, p 24.
El Yafrani M, Scoczynski M, Sung I, Wagner M, Doerr C, Nielsen P (2021) MATE: a model-based algorithm tuning engine. In: Zarges C, Verel S (eds) Evolutionary computation in combinatorial optimization. EvoCOP 2021. Lecture notes in computer science(), vol 12692. Springer, Cham, pp 51–67. https://doi.org/10.1007/978-3-030-72904-2_4.
Uhlig T, Rose O, Rank S. JARTA—a java library to model and fit autoregressive-to-anything processes. In: 2013 Winter Simulations Conference (WSC). IEEE; 2013. p. 1203–11. https://doi.org/10.1109/WSC.2013.6721508.
Mayer T, Uhlig T, Rose O. An open-source discrete event simulator for rich vehicle routing problems. In: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC). IEEE; 2016. p. 1305–10. https://doi.org/10.1109/ITSC.2016.7795725.
Kitchenham B, Charters S et al. Guidelines for performing systematic literature reviews in software engineering. Technical report EBSE 2007-001, Keele University and Durham University Joint Report, 2007.
Chen J, Zhuge H. Automatic generation of related work through summarizing citations. Concurr Comput: Pract Experience. 2019;31(3):e4261. https://doi.org/10.1002/cpe.4261.
Heffernan K, Teufel S. Identifying problems and solutions in scientific text. Scientometrics. 2018;116:1367–82. https://doi.org/10.1007/s11192-0182718-6.
Kathiria P, Pandya V, Arolkar H, Patel U (2023) Performance analysis of document similarity-based DBSCAN and k-means clustering on text datasets. In: Singh Y, Singh PK, Kolekar MH, Kar AK, Gonçalves PJS (eds) Proceedings of international conference on recent innovations in computing. Lecture notes in electrical engineering, vol 1001. Springer, Singapore, pp 57–69. https://doi.org/10.1007/978-981-19-9876-8_5.
Fränti P, Mariescu-Istodor R. Soft precision and recall. Pattern Recogn Lett. 2023;167:115–21. https://doi.org/10.1016/j.patrec.2023.02.005.
Mutar MT, Majid M, Ibrahim MJ, Obaid A, Alsammarraie AZ, Altameemi E, Kareem TF. Transfer learning with different modified convolutional neural network models for classifying digital mammograms utilizing local dataset. Gulf J Oncol. 2023;1(41):66–71.
BarredoArrieta A, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, Garcia S, Gil-Lopez S, Molina D, Benjamins R, Chatila R, Herrera F. Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai. Inf Fusion. 2020;58:82–115. https://doi.org/10.1016/j.inffus.2019.12.012.
Alicioglu G, Sun B. A survey of visual analytics for explainable artificial intelligence methods. Comput Graph. 2022;102:502–20. https://doi.org/10.1016/j.cag.2021.09.002.
Dazeley R, Vamplew P, Foale C, Young C, Aryal S, Cruz F. Levels of explainable artificial intelligence for human-aligned conversational explanations. Artif Intell. 2021;299: 103525. https://doi.org/10.1016/j.artint.2021.103525.
AI, HLEG. High-Level Expert Group on Artificial Intelligence. Ethics guidelines for trustworthy ai. European Commision, 2019: 6. https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai. Accessed 15.04.2023.
Adadi A, Berrada M. Peeking inside the black-box: A survey on explainable artificial intelligence (xai). IEEE Access. 2018;6:52138–60. https://doi.org/10.1109/ACCESS.2018.2870052.
Islam MR, Ahmed MU, Barua S, Begum S. A systematic review of explainable artificial intelligence in terms of different application domains and tasks. Appl Sci. 2022;12(3):1353. https://doi.org/10.3390/app12031353.
Li B, Qi P, Liu B, Di S, Liu J, Pei J, Yi J, Zhou B. Trustworthy ai: From principles to practices. ACM Comput Surv. 2023;55(9):1–46. https://doi.org/10.1145/3555803.
Rojat T, Puget R, Filliat D, Del Ser J, Gelin R, Díaz-Rodríguez N (2021) Explainable artificial intelligence (XAI) on time series data: A survey. CoRR. arXiv preprint. https://doi.org/10.48550/ARXIV.2104.00950.
Black S, Biderman S, Hallahan E, Anthony Q, Gao L, Golding L, He H, Leahy C, McDonell K, Phang J, Pieler M, Prashanth US, Purohit S, Reynolds L, Tow J, Wang B, Weinbach S (2022) Gpt-neox20b: An open-source autoregressive language model. arXiv preprint. https://doi.org/10.48550/arXiv.2204.06745.
Wang B, Komatsuzaki A. GPT-J-6B: A 6 billion parameter autoregressive language model. 2021.https://github.com/kingoflolz/meshtransformer-jax. Accessed 10.11.2023.
Cai L, Gao J, Zhao D (2020) A review of the application of deep learning in medical image classification and segmentation. Ann Transl Med 8(11):713. https://doi.org/10.21037/atm.2020.02.44.
Acknowledgements
All authors read and agreed to the published version of the manuscript.
Funding
No funding was received for this work.
Author information
Contributions
Deniz Kenan Kılıç: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing, Visualization. Alex Elkjær Vasegaard: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing, Visualization. Aurélien Desoeuvres: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing, Visualization. Peter Nielsen: Conceptualization, Methodology, Validation, Investigation, Data curation, Writing – review & editing, Supervision.
Ethics declarations
Ethics Approval and Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix 1 Scopus and OpenAI Results
Table 6 shows the Scopus results for the initial query stated in Section 4.3. The first column gives the title and DOI. The second and third columns give the electronic identifier and the publication year, respectively. In the last three columns, C., R., and P. stand for citation number, relevancy value, and popularity value, respectively. As mentioned in Section 4.3, articles highlighted in red are manually deleted, and articles highlighted in orange use an AI method and are related to the use case but do not specify this in the title and abstract.
Table 7 shows the OpenAI results for the initial prompt alongside the manually extracted ground-truth methods, together with the performance determinants. False-found methods are highlighted in red, and true-found methods are highlighted in green. These performance determinants are used to calculate the performance metrics stated in Section 4.4.1.
Appendix 2 OpenAI Performance Results
Below, OpenAI performance results for 55 articles are listed in the same order as Table 7.
- TP = [1, 3, 2, 3, 1, 2, 1, 2, 2, 2, 3, 1, 1, 3, 1, 1, 2, 1, 4, 2, 0, 2, 1, 3, 5, 1, 1, 3, 1, 1, 1, 2, 0, 6, 2, 1, 2, 3, 1, 1, 1, 2, 1, 2, 2, 3, 2, 6, 2, 2, 2, 4, 2, 1, 1]
- FP = [0, 1, 0, 0, 0, 2, 0, 2, 1, 0, 0, 1, 2, 2, 0, 1, 3, 1, 0, 0, 3, 0, 1, 2, 3, 1, 0, 0, 0, 0, 1, 2, 0, 0, 1, 2, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 2, 0, 1, 1, 1, 1, 5, 2]
- FN = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0]
- Precisions = [1, 0.75, 1, 1, 1, 0.5, 1, 0.5, 0.6667, 1, 1, 0.5, 0.3334, 0.6, 1, 0.5, 0.4, 0.5, 1, 1, 0, 1, 0.5, 0.6, 0.625, 0.5, 1, 1, 1, 1, 0.5, 0.5, 0, 1, 0.6667, 0.3334, 1, 1, 0.5, 0.5, 1, 1, 0.5, 1, 0.6667, 0.75, 0.6667, 0.75, 1, 0.6667, 0.6667, 0.8, 0.6667, 0.1667, 0.3334] and Average(Precisions) = 0.7111
- Recalls = [1, 1, 1, 1, 1, 1, 1, 0.6667, 1, 1, 1, 1, 0.5, 1, 1, 1, 1, 1, 0.8, 1, 0, 1, 1, 1, 0.8334, 1, 1, 1, 1, 1, 1, 1, 0, 0.75, 1, 1, 1, 1, 1, 1, 1, 0.6667, 1, 1, 1, 1, 1, 0.8571, 1, 1, 1, 0.6667, 1, 1, 1] and Average(Recalls) = 0.9226
- F1-score = [1, 0.8571, 1, 1, 1, 0.6667, 1, 0.5714, 0.8, 1, 1, 0.6667, 0.4, 0.75, 1, 0.6667, 0.5714, 0.6667, 0.8889, 1, 0, 1, 0.6667, 0.75, 0.7143, 0.6667, 1, 1, 1, 1, 0.6667, 0.6667, 0, 0.8571, 0.8, 0.5, 1, 1, 0.6667, 0.6667, 1, 0.8, 0.6667, 1, 0.8, 0.8571, 0.8, 0.8, 1, 0.8, 0.8, 0.7273, 0.8, 0.2857, 0.5] and Average(F1-score) = 0.7775
If all 55 results are considered as a single result pool, then there are 108 TPs, 51 FPs, and 12 FNs. Then precision, recall, and F1-score values are 0.6793, 0.9, and 0.7742, respectively.
These metrics indicate that the OpenAI extraction performs well against the manually generated ground truths: recall is high, while precision is lower because of the false-found methods.
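The averaged and pooled values reported above can be recomputed from the TP, FP, and FN lists. The following is a minimal sketch, not the authors' implementation: the helper names `safe_div` and `prf` are hypothetical, and the convention that a zero denominator yields 0 is an assumption inferred from the listed values (for example, the article with TP = 0 and FP = 0 is listed with precision 0).

```python
# Per-article counts transcribed from the lists above (55 articles).
TP = [1, 3, 2, 3, 1, 2, 1, 2, 2, 2, 3, 1, 1, 3, 1, 1, 2, 1, 4, 2, 0, 2, 1, 3,
      5, 1, 1, 3, 1, 1, 1, 2, 0, 6, 2, 1, 2, 3, 1, 1, 1, 2, 1, 2, 2, 3, 2, 6,
      2, 2, 2, 4, 2, 1, 1]
FP = [0, 1, 0, 0, 0, 2, 0, 2, 1, 0, 0, 1, 2, 2, 0, 1, 3, 1, 0, 0, 3, 0, 1, 2,
      3, 1, 0, 0, 0, 0, 1, 2, 0, 0, 1, 2, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 2,
      0, 1, 1, 1, 1, 5, 2]
FN = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
      1, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
      0, 0, 0, 2, 0, 0, 0]


def safe_div(num, den):
    """Return num / den, or 0.0 when den is zero (assumed convention)."""
    return num / den if den else 0.0


def prf(tp, fp, fn):
    """Precision, recall, and F1-score for one set of counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * tp, 2 * tp + fp + fn)
    return precision, recall, f1


# Per-article metrics, then their unweighted means (the "Average(...)" values).
per_article = [prf(tp, fp, fn) for tp, fp, fn in zip(TP, FP, FN)]
macro = tuple(sum(column) / len(per_article) for column in zip(*per_article))

# Pooled metrics: sum the counts over all articles first, then compute once.
pooled = prf(sum(TP), sum(FP), sum(FN))

print("averaged:", [round(v, 4) for v in macro])
print("pooled:  ", [round(v, 4) for v in pooled])
```

The two aggregations answer different questions: the averaged values weight every article equally, whereas the pooled values weight every extracted method equally, so articles with many methods influence the pooled result more.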
Appendix 3 OpenAI Sensitivity Results
Tables 8, 9 and 10 list the missing and the extra/different methods relative to the initial prompt. Where there is no missing or extra/different method name, the entry is marked with “X”.
Appendix 4 Extracted AI Methods and Post-Analyses
Table 11 reports, by year, how many articles mention each method; this count is given in the “Papers” column. The relevancy and popularity sums are written next to the “Papers” column, where “Rel.” and “Pop.” stand for relevancy and popularity, respectively. In total, 55 articles are used, namely those in Table 6 that are neither filtered nor general. Methods are classified by their number of occurrences and grouped with similar methods, as described below. The classification could of course be done in different ways and at different levels; it is used here to give a more compact overview of the results. The “true general found” results are not included. The methods that are “true found” and mentioned in at least 2 articles are shown.
In the classes listed below, each method is followed by the total number of papers that employ it and then, for each year, the number of uses in that year (written as “× n” when greater than one) and the total relevancy and popularity values for that year.
- Class 1 (Artificial neural networks): Paraconsistent Artificial Neural Network (PANN) (× 1; 2014, 0, 0.6), Artificial Neural Network (ANN) (× 6; 2014, 1, 2.7; 2015, 1, 1; 2016, 0, 6; 2017, 1, 0.7143; 2021, 0, 4.6667; 2023, 0, 2), Probabilistic Neural Network (PNN) (× 2; 2015, 0, 0.4444; 2017, 0, 3.2857), Multi-Layer Feed-forward Neural Network (MFFNN) (× 1; 2016, 0, 1.125), Neural Networks (× 6; 2017 × 2, 1, 3.8572; 2018, 0, 4; 2019, 0, 0.2; 2020, 1, 0.25; 2023, 0, 0), Perceptron (× 1; 2020, 1, 3.75), Back-Propagation Perceptron (× 1; 2020, 1, 3.75), Fully Connected Network (FCN) (× 1; 2022, 0, 1.5).
- Class 2 (Deep learning methods): Deep learning (× 15; 2019, 0, 0.2; 2020 × 3, 1, 27.75; 2021 × 3, 1, 4.3334; 2022 × 3, 1, 3.5; 2023 × 5, 1, 2), Generative Adversarial Network (GAN) (× 2; 2019, 0, 0.2; 2020, 1, 3.75), ResNet (× 1; 2020, 1, 3.75), ResNet50 (× 1; 2021, 0, 4.6667), AlexNet (× 2; 2020, 1, 3.75; 2021, 0, 4.6667), U-Net (× 2; 2021, 1, 0; 2022, 0, 1.5), Convolutional Neural Network (CNN) (× 4; 2021, 0, 4.6667; 2022, 0, 2; 2023 × 2, 1, 0), 2D U-Net (× 1; 2021, 0, 2.3333), 3D U-Net (× 1; 2021, 0, 2.3333), Deep Reinforcement Learning (DRL) (× 1; 2022, 0, 1), Convolutional Encoder-Decoder Architecture (× 1; 2022, 0, 1), Convolution algorithm (× 1; 2022, 0, 0), Deep Convolutional Neural Network (DCNN) (× 1; 2023, 0, 0), NASNetLarge (× 1; 2023, 1, 0), DenseNet169 (× 1; 2023, 1, 0), InceptionResNetV2 (× 1; 2023, 1, 0), EfficientNets (× 1; 2023, 0, 2), Conditional Generative Adversarial Network (cGAN) (× 1; 2023, 0, 0).
- Class 3 (Tree-based methods): Random Forest (× 2; 2016, 1, 2.125; 2018, 0, 4), Decision Trees (× 1; 2016, 0, 4.125), Extra Trees Classifier (× 1; 2022, 0, 8.5).
- Class 4 (Optimization methods): Genetic Algorithm (× 1; 2014, 1, 2.7), Sequential Minimal Optimization (SMO) (× 1; 2016, 0, 1.75), Ant Colony Optimization (ACO) (× 1; 2023, 1, 1).
The number of cases in which the same method is used is counted for 2014–2023 and for all time. Relevancy and popularity sums are calculated for a specific method over the related articles. In other words, the first column (“Papers”) states how many articles use the method in total. The second and third columns show the sum of relevancy and popularity values for these articles, respectively.
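The summation described here can be sketched for the Class 3 and Class 4 entries listed above. This is an illustration under stated assumptions rather than the authors' code: the record layout and the helpers `totals_by` and `ranking` are hypothetical, and class totals are assumed to be plain sums over the methods in the class, so an article using two methods of the same class would be counted twice.

```python
from collections import defaultdict

# (class, method, year, papers in that year, relevancy sum, popularity sum),
# transcribed from the Class 3 and Class 4 entries above.
RECORDS = [
    ("class 3", "Random Forest", 2016, 1, 1, 2.125),
    ("class 3", "Random Forest", 2018, 1, 0, 4.0),
    ("class 3", "Decision Trees", 2016, 1, 0, 4.125),
    ("class 3", "Extra Trees Classifier", 2022, 1, 0, 8.5),
    ("class 4", "Genetic Algorithm", 2014, 1, 1, 2.7),
    ("class 4", "Sequential Minimal Optimization (SMO)", 2016, 1, 0, 1.75),
    ("class 4", "Ant Colony Optimization (ACO)", 2023, 1, 1, 1.0),
]


def totals_by(records, key_index):
    """Sum papers, relevancy, and popularity, grouped by one record field."""
    totals = defaultdict(lambda: [0, 0.0, 0.0])
    for record in records:
        entry = totals[record[key_index]]
        entry[0] += record[3]  # number of papers
        entry[1] += record[4]  # relevancy
        entry[2] += record[5]  # popularity
    return {name: tuple(values) for name, values in totals.items()}


def ranking(totals, column):
    """Names sorted from largest to smallest on one totals column."""
    return sorted(totals, key=lambda name: totals[name][column], reverse=True)


by_method = totals_by(RECORDS, key_index=1)  # all-time totals per method
by_class = totals_by(RECORDS, key_index=0)   # all-time totals per class

for label, column in (("Papers", 0), ("Relevancy", 1), ("Popularity", 2)):
    print(label, ranking(by_class, column))
```

Under these assumptions, class 3 ranks ahead of class 4 on papers and popularity and behind it on relevancy, which agrees with the relative order of these two classes in the all-time orderings reported in this appendix.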
Over all time, class 1, class 2, class 3, class 4, “K-nearest neighbors (KNN)”, “support vector machine (SVM)”, “K-means”, “grey level co-occurrence matrix (GLCM)” and “logistic regression” are the ones mentioned in at least 2 articles. Sorted from largest to smallest by the total number of papers using them, they rank as follows:
- Papers: class 2 > class 1 > “SVM” > class 3 > class 4 = “KNN” > “K-means” = “logistic regression” = “GLCM”.
The relevancy values for all time are sorted as
- Relevancy: class 2 > class 1 > class 4 = “SVM” = “KNN” > class 3 = “GLCM” > “K-means” = “logistic regression”.
The popularity values for all time are sorted as given below; the highest value belongs to class 2.
- Popularity: class 2 > class 1 > class 3 > “logistic regression” > “SVM” > class 4 > “GLCM” > “KNN” > “K-means”.
The results above show that the number of implementations and the popularity of class 1 and class 2 methods have been increasing over the years. For this reason, when working on a similar problem domain, testing can start with the AI methods in these classes.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kılıç, D.K., Vasegaard, A.E., Desoeuvres, A. et al. A Semi-Automated Solution Approach Recommender for a Given Use Case: a Case Study for AI/ML in Oncology via Scopus and OpenAI. Hum-Cent Intell Syst (2024). https://doi.org/10.1007/s44230-024-00070-6
DOI: https://doi.org/10.1007/s44230-024-00070-6