Entity linking systems for literature reviews

Marrone, Mauricio; Lemke, Sascha; Kolbe, Lutz M.

doi:10.1007/s11192-022-04423-5

Open access
Published: 28 June 2022

Scientometrics, Volume 127, pages 3857–3878 (2022)

Abstract

Computer-assisted methods and tools can help researchers automate the coding process of literature reviews and accelerate the literature review process. However, existing approaches for coding textual data do not account for lexical ambiguity; that is, instances in which individual words have multiple meanings. To counter this, we developed a method to conduct rapid and comprehensive analyses of diverse literature types. Our method uses entity linking and keyword analysis and is embedded into a literature review framework. Next, we apply the framework to review the literature on digital disruption and digital transformation. We outline the method’s advantages and its applicability to any research topic.

Introduction

In the last decade, there has been a rise in the availability of digital data from academic publications, due to a continuous growth in the number of publications (Fortunato et al., 2018). Many academics find it increasingly challenging to stay up to date with the latest research, especially in disciplines where hundreds or thousands of new publications are released yearly (Nakagawa et al., 2019). Hence, there has been an increased focus on novel methods and tools that help researchers automate the coding process within literature reviews and accelerate the literature review process (Westgate, 2019).

Several computer-assisted methods and tools have emerged to help researchers automate and accelerate the content analysis process, such as Leximancer (Smith, 2003), topic modelling (statistical modelling for discovering abstract “topics” that occur in a collection of documents) and Bibliometrix (Aria & Cuccurullo, 2017), an R package for bibliometric and keyword analysis. These methods and tools aim to increase researchers’ ability to study relationships in the literature and to support a better conceptual and theoretical understanding (Barry, 1998).

Automated tools commonly used in literature reviews analyse textual data at a lexical level (see Smith & Humphreys, 2006; Blei et al., 2003; Marrone & Hammerle, 2016; Bonaccorsi et al., 2021), meaning that they disregard the semantic relations between words. The concern with carrying out any analysis at the lexical level is that homographs (i.e., words with the same spelling but different meanings) are not disambiguated. For example, the word ‘mercury’ could refer to the planet Mercury, the Roman god Mercury, the chemical element mercury, the singer Freddie Mercury, or other instances of ‘mercury’. Further concerns with lexical-level analysis are that equivalent terms are not merged (e.g., IoT and “Internet of Things”), language or naming variations over time are not recognized (e.g., Leningrad and Saint Petersburg), and spelling variations are not accounted for (e.g., visualisation and visualization). Consequently, the results of existing computer-assisted methods and tools are often ambiguous and must be modified manually, which can lead to subjective interpretation (Indulska et al., 2012; Sotiriadou et al., 2014). Researchers have therefore called for solutions that address what is referred to as “lexical ambiguity” (e.g., Zupic & Čater, 2015).
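A small Python sketch makes the problem concrete. The two abstracts are invented for illustration, and the regular-expression tokenizer is an assumed stand-in for the preprocessing of a lexical-level tool, not code taken from any of the tools cited above. Counting word tokens keeps the acronym and the full name, and the two spellings, as unrelated entries.

```python
import re
from collections import Counter

def lexical_counts(texts):
    """Count lower-cased word tokens, as a lexical-level tool would."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

# Invented example abstracts.
abstracts = [
    "IoT platforms support data visualisation.",
    "The Internet of Things enables new visualization tools.",
]
counts = lexical_counts(abstracts)
print(counts["iot"], counts["things"])                   # 1 1: one concept, two unrelated counts
print(counts["visualisation"], counts["visualization"])  # 1 1: spelling variants kept apart
```

A researcher would have to merge these entries by hand, which is exactly the manual, potentially subjective step described above.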

In this article, we respond to the call by proposing the entity linking approach to literature reviews, which offers a way to overcome issues associated with lexical ambiguity. Entity linking systems disambiguate words and phrases, allowing for the analysis of literature that uses varying terminologies. Unlike common content analysis tools, entity linking systems use a knowledge base (which can be thought of as a standardized taxonomy) to assign human-readable labels to concepts (Shen et al., 2014), which may lead to more accurate, reproducible and reliable results.

Literature reviews often seek to identify topic clusters and their evolution, analyse research trends and hotspots, and map interactions between different disciplines or stakeholders (e.g., Kolev et al., 2019; Meuser et al., 2016; Patriotta, 2020). Such endeavours would benefit from entity linking systems, as these provide a more precise identification of the themes mentioned in the literature. Additionally, entity linking may enable researchers to identify unobserved connections (Preiss & Stevenson, 2016) and may substantially improve their ability to explore grey or interdisciplinary literature, or literature comprising large numbers of documents (Navigli, 2009).

The article is structured as follows. In the next section, we propose entity linking systems as a way to overcome the limitations of existing computer-assisted methods and tools. We then embed our approach, based on entity linking and keyword analysis, into a literature review framework and develop guidelines for the automated identification of themes, which scholars can apply to their topic of interest. The Findings section highlights the advantages of our approach, and the Discussion section elaborates on additional applications of the approach.

Theoretical background

Computer-assisted automated content analysis tools and techniques help researchers with storing (bringing data together), indexing (searching for themes or categories in the data), retrieving (identifying links or concordances between disparate sections of data) (Maclaran & Catterall, 2002) and visualizing data (Paulus et al., 2013). In literature reviews, tools such as Leximancer (Smith, 2003), MaxQDA (Kuckartz & Rädiker, 2019), QDAMiner (Lewis & Maas, 2007), as well as bibliometric tools such as CitNetExplorer (Van Eck & Waltman, 2017) and Bibliometrix (Aria & Cuccurullo, 2017), are commonly used in content extraction and analysis (Bandara et al., 2015). Additionally, Natural Language Processing (NLP) techniques, such as topic modelling using Latent Dirichlet Allocation (LDA, e.g., Antons et al., 2019; Hannigan et al., 2019) and Latent Semantic Analysis (LSA, e.g., George et al., 2016; Larsen & Bong, 2016), have been used to review literature. Such approaches provide automated analyses based on the statistical properties of texts (Garrett-Jones et al., 2010) and allow the exploration of vast quantities of textual data (Sridhar, 2015). However, these approaches require researchers to arbitrarily specify the number of categories or topics the algorithm should identify (Alghamdi & Alfalqi, 2015) and require the texts to be on only one topic, as well as written in the same style (Papadimitriou et al., 2000).

In general, the approaches listed above rely on the frequency with which words and phrases appear in texts rather than on the context in which they appear, which can lead to unsatisfactory results. As stated by Zupic and Čater (2015, p. 432), “words can appear in different forms and can have different meanings”. According to Hobolt and Klemmensen (2005), lexical ambiguity, whereby individual words may have multiple literal meanings, and context, whereby the meaning of a word depends intrinsically on the presence or absence of the words that surround it, are significant problems for existing computer-aided methods. Examples include homographs, which are mentions that share a name but can be linked to different entities (i.e., they have different meanings; for example, ‘bank’, which could refer to a financial institution, a river bank, or a memory bank); synonyms, which are mentions with different names that are linked to the same entity (i.e., they have the same meaning; for example, ‘car’, ‘auto’, ‘automobile’); acronyms (e.g., CRM and Customer Relationship Management); aliases (e.g., resource-based theory and resource-based view); and name variations over time (e.g., Astana and Nur-Sultan). Literature reviews and content analysis would benefit strongly from methods and tools that reduce different representations of a concept to one expression, thereby decreasing lexical ambiguity.

Given the limitations of existing computer-assisted methods in addressing lexical ambiguity, disciplines such as biology and medicine have begun to apply another NLP technique, entity linking (e.g., Crichton et al., 2017; Giorgi & Bader, 2018), to overcome this issue. Entity linking has been used to extract entities from academic papers (e.g., Cifariello et al., 2019; Cai et al., 2019), yet no guidelines have been established for applying an entity linking approach to literature reviews.

In this paper, we propose using entity linking systems to support the content analysis of literature reviews. Our approach combines entity linking and keyword analysis. Entity linking enables researchers to address challenges involving homographs and synonyms, allowing a deeper level of textual analysis (Wu et al., 2018). It links single terms, phrases, acronyms or known aliases in a text (mentions) to the corresponding entries (entities) in a knowledge base (Cornolti et al., 2013), thereby reducing different representations of a concept to one expression.
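The mention-to-entity mapping can be illustrated with a deliberately tiny, hypothetical linker in Python. The knowledge base `KB`, its context words and the overlap-based choice of sense are assumptions made for illustration; this is not how TAGME or other production systems disambiguate, as they draw on large knowledge bases such as Wikipedia. The sketch only shows the two behaviours described above: different surface forms collapse to one entity, and a homograph is resolved by its context.

```python
import re

# Hypothetical toy knowledge base: mention -> {candidate entity: context words for that sense}.
KB = {
    "iot": {"Internet of things": {"sensor", "device", "network"}},
    "internet of things": {"Internet of things": {"sensor", "device", "network"}},
    "mercury": {
        "Mercury (planet)": {"orbit", "planet", "sun"},
        "Mercury (element)": {"metal", "toxic", "chemical"},
    },
}

def link(text):
    """Return (mention, entity) pairs for every KB mention found in text."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    links = []
    for mention, candidates in KB.items():
        if re.search(r"\b" + re.escape(mention) + r"\b", lowered):
            # Homographs: pick the candidate sharing the most context words with the text.
            best = max(candidates, key=lambda entity: len(candidates[entity] & words))
            links.append((mention, best))
    return links

print(link("Mercury is a toxic metal."))
# [('mercury', 'Mercury (element)')]
print({entity for _, entity in link("IoT and the Internet of Things connect every device.")})
# {'Internet of things'}
```

Because both "IoT" and "Internet of Things" resolve to the same entity label, downstream counts are made per entity rather than per surface form.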

While entity linking reveals the entities in a text, we subsequently apply keyword analysis to determine which entities are statistically salient in one dataset compared with another (Scott & Tribble, 2006). Essentially, we test which entities occur unusually often in a dataset of interest compared with a reference dataset. Keyword analysis assesses the statistical significance of these differences and adjusts for differences in dataset size (Crawford et al., 2006). Through this process, the researcher can identify similarities and differences in the entities between the datasets.
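For reference, the log-likelihood keyness statistic commonly used in corpus linguistics (following Dunning, 1993) can be written as below, where a and b are the frequencies of an entity in the dataset of interest and in the reference dataset, and c and d are the total sizes of the two datasets. This is the standard textbook form; the implementation inside any particular tool may differ in detail. Scaling the expected frequencies by c and d is what adjusts for datasets of different sizes.

```latex
\begin{aligned}
E_1 &= \frac{c\,(a+b)}{c+d}, \qquad E_2 = \frac{d\,(a+b)}{c+d},\\
\mathrm{LL} &= 2\left(a \ln\frac{a}{E_1} + b \ln\frac{b}{E_2}\right)
\end{aligned}
```

A term with a zero count contributes nothing to the sum, and under the null hypothesis of equal relative frequency LL approximately follows a chi-square distribution with one degree of freedom.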

Workflow and guidelines

This article aims to propose entity linking as an approach to automate coding for literature reviews. To do this, we embed our contribution within a well-structured literature review framework, as presented by Templier and Paré (2015).

We use these guidelines to showcase and highlight our contribution. Their framework is well structured, and its phases are similar to those recommended elsewhere for conducting a literature review. For example, Snyder (2019) suggests that literature reviews be carried out in five phases: (1) design, (2) conduct, (3) analysis, (4) structuring and (5) writing the review. Linnenluecke et al. (2020) suggest the following steps: (1) identification of literature for inclusion, (2) data cleaning, (3) analysis, (4) synthesis and (5) presentation of results. Brendel et al. (2020) compare ten literature review methodology articles and summarise the literature review process in six steps: (1) preparation, (2) defining scope, (3) literature search, (4) analysis, (5) synthesis and (6) discussion. Similarly, Templier and Paré (2015) outline six steps for conducting a literature review, namely (1) formulating the problem, (2) searching the literature, (3) screening for inclusion, (4) assessing quality, (5) extracting data and (6) analysing and synthesizing data. These phases are presented in Fig. 1. On this basis, Table 1 details the steps for carrying out the entity linking approach, to guide researchers who intend to apply the method to their area of interest.

Table 1 Step-by-step instructions for literature reviews

To illustrate this approach, in the following section, we apply the approach to the analysis of academic and practitioner literature from two emerging fields: digital disruption and digital transformation.

Illustrative example

We examine the literature containing the terms “digital disruption” and “digital transformation” to illustrate how the entity linking approach is carried out. We recognize that these terms overlap extensively; they are often used as synonyms, yet they denote distinct topics (see Appendix B for examples of how the terms are used). First, definitions of digital disruption and digital transformation share common elements, such as digital technologies, change, and broad consequences for, among others, individuals and companies. Hence, the drivers, impacts and affected groups are highly congruent in both fields. Second, literature reviews in each field incorporate wording from the other. Among digital disruption reviews, Vesti et al. (2017) listed digital transformation as a keyword, and Molla et al. (2016) used “transform” as a verb to describe the action of digital disruption. Research gaps may be characterized by overlooked elements or ambiguity in a field of study (Boell & Cecez-Kecmanovic, 2014), which makes this overlap worth examining. Below, we present the steps we took when carrying out the entity linking approach.

Step 1: formulating the problem

This step requires the author to define the review’s objective(s) and justify the need for the review. There are two sub-steps: specifying the primary goal of the review and defining its key concepts and boundaries (Templier & Paré, 2015).

As explained above, the primary goal of our illustrative study is to clarify how digital disruption and digital transformation differ from one another. The review covers both academic and practitioner literature.

Step 2: searching the literature

Step 2 involves developing guidelines to identify relevant studies. The sub-steps include specifying the search procedure, using a combination of data sources and search approaches, avoiding restrictions not based on the research question(s), and adopting strategies to minimize publication bias.

According to Levac et al. (2010), researchers should identify the sources and queries used in their search. We, therefore, provide our search procedure in Appendix Table C1, summarizing the academic and practitioner data sources used. The following paragraphs present our justification for our data sources and search approach.

After consulting Schryen (2015), we decided to source literature from Scopus, Factiva and the ABI/Inform collection from ProQuest. These databases all cover a wide range of literature, which allowed us to collect different types of literature and assemble a large dataset. The data corpus of the study consisted of academic and practitioner articles. Academic literature was downloaded from Scopus and deliberately included conference proceedings; given the recency of the key topics, conference proceedings might reveal ideas not yet published elsewhere (González-Albo & Bordons, 2011). The practitioner datasets, by contrast, included trade publications and news articles. Trade publications were retrieved from ProQuest’s ABI/Inform collection and Factiva. Factiva sources were limited to CIO Australia, CIO New Zealand, Forbes, Wired, and Computerworld USA, Australia and New Zealand. We also sourced news articles from the Financial Times, Wall Street Journal, New York Times and Washington Post. We included these because practitioners inform themselves by reading news media (Tienari et al., 2003).

News articles may also offer insights beyond those found in trade journals (Vaara & Tienari, 2002). By including a variety of literature, we aim to enhance practical relevance and reflect the whole picture of the topics in question (Banks et al., 2016). Moreover, a limitation often expressed regarding bibliometric methods is their inability to include grey literature (Nakagawa et al., 2019); including grey literature therefore allows us to test the proposed approach on material that bibliometric tools cannot handle. Finally, practitioners are often ahead of scholars in discussing emerging topics (Benbasat et al., 1987) and are currently highly active in conversations around digital disruption and transformation (Vesti et al., 2017).

Researchers may find that their topics of interest are intertwined with other, established concepts. In our case, both topics are commonly mixed with the theory of disruptive innovation or the concept of organizational transformation. The search strings therefore aimed at mutual exclusivity in two ways. First, the queries for both academic datasets excluded trade publications, while those for the practitioner datasets excluded academic publications. Second, the query for digital disruption excluded publications that contained the term ‘digital transformation’, and vice versa. This reduced the overlap between the two topics, so that themes could be developed separately for each term. Themes are recurring patterns within datasets (Braun & Clarke, 2006) and play a crucial role in our method, as we explain in the following steps.
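As an illustration of the second exclusion rule, the following Python sketch builds a pair of mutually exclusive Boolean queries in Scopus-style advanced-search syntax (`TITLE-ABS-KEY`, `AND NOT`). The helper name `exclusive_query` is hypothetical, the field code is an assumption about how such a search could be phrased, and the source-type exclusions are omitted; the queries actually used are listed in Appendix Table C1. Factiva and ProQuest use their own query syntax, so the strings would need adapting for those databases.

```python
def exclusive_query(include: str, exclude: str) -> str:
    """Hypothetical helper: keep records on one topic and drop those mentioning the other."""
    return f'TITLE-ABS-KEY("{include}") AND NOT TITLE-ABS-KEY("{exclude}")'

# One query per topic, each excluding the other term.
dd_query = exclusive_query("digital disruption", "digital transformation")
dt_query = exclusive_query("digital transformation", "digital disruption")
print(dd_query)
print(dt_query)
```

Generating both queries from one helper keeps the two searches symmetric, which is the point of the mutual-exclusion design.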

Step 3: screening for inclusion

The main objective is to evaluate the applicability of the studies identified in step 2, to include or exclude them from the analysis. The sub-steps include specifying the screening and selection procedures in detail; conducting parallel independent assessments of the studies for inclusion; using an inclusion criterion that reflects the research questions; being explicit about duplicate studies; and including studies from reputable resources (Templier & Paré, 2015).

Ideally, researchers should evaluate every academic article to determine whether it contains content that answers the research question (Templier & Paré, 2018). To avoid subjectivity, we suggest parallel independent assessment by multiple researchers at this stage. Subsequently, the researchers should compare their results and remove articles that do not meet the inclusion criteria. Kitchenham and Charters (2007) recommend that two researchers independently select academic articles that fulfil the quality criteria. Additionally, duplicate studies should be managed appropriately. Overall, to increase transparency, the researcher should state which sources are included and why (Bandara et al., 2015).

In our application of the approach, the screening and selection criterion was that academic abstracts had to refer to a digital, technological context describing change. Two researchers independently carried out this screening. We recommend spending additional time evaluating the literature if the dataset is small, because results could be skewed by a small number of off-topic articles. Conversely, we assume that relevant themes will emerge from a larger dataset even when it is not closely evaluated, because repeated patterns in the data should outweigh the contribution of off-topic articles.

Before downloading the selected articles, we checked for duplicates. Skipping this step can distort the results, because duplicate articles inflate code frequencies. For the practitioner and academic datasets, identical and fuzzy duplicates were removed using an Excel add-in by Ablebits, followed by manual cross-checking. In total, the number of publications was reduced from 4475 to 3613. The remaining articles were then retrieved as follows: for the academic content from Scopus, we downloaded titles and abstracts.

In contrast, the practitioner content from ProQuest and Factiva was downloaded as full text, because practitioner sources often do not provide an abstract. The number of articles downloaded from each source is shown in Appendix C, Table C1. The four downloaded datasets were then cleaned thoroughly to avoid the automatic coding of article-specific information, such as author or publication details and the name of the practitioner journal, which is often included in the full text of an article. Because we aimed to incorporate as many publications as possible, we did not exclude articles based on their publication outlet.
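The study removed identical and fuzzy duplicates with an Excel add-in followed by manual cross-checking. As a swapped-in alternative for readers working in Python, the sketch below flags near-duplicate titles with the standard library's `difflib.SequenceMatcher`. The normalisation step, the example titles and the 0.9 similarity threshold are assumptions rather than values from the study, and flagged pairs should still be checked by hand.

```python
import difflib
import re

def normalise(title: str) -> str:
    """Lower-case and keep only letters/digits so punctuation and case do not matter."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))

def deduplicate(titles, threshold=0.9):
    """Keep the first of each group of identical or near-identical titles."""
    kept, kept_norm = [], []
    for title in titles:
        norm = normalise(title)
        is_dup = any(difflib.SequenceMatcher(None, norm, other).ratio() >= threshold
                     for other in kept_norm)
        if not is_dup:
            kept.append(title)
            kept_norm.append(norm)
    return kept

titles = [
    "Digital transformation: A review",
    "Digital Transformation - a review",  # identical after normalisation
    "Digital transformation: a reveiw",   # fuzzy duplicate (typo)
    "Digital disruption in retail",
]
print(deduplicate(titles))
```

The pairwise comparison is quadratic in the number of records, which is acceptable for a few thousand titles such as the 4475 records mentioned above.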
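Cleaning removes article-specific text before entity linking, so that by-lines, rights statements and outlet names are not coded as entities. The Python sketch below drops lines that match a list of regular expressions. The patterns, the sample text and the by-line name are hypothetical, and the outlet names are simply taken from the sources listed in Step 2. Line-based rules like these are crude, so the output should be spot-checked.

```python
import re

# Hypothetical patterns; a real project would derive its own list by inspecting its downloads.
METADATA_PATTERNS = [
    r"^\s*by\s+\w+(\s+\w+)?\s*$",            # stand-alone by-lines such as "By Jane Doe"
    r"copyright|all rights reserved",        # rights statements
    r"^\s*(forbes|wired|computerworld)\b",   # outlet names repeated inside full texts
]

def clean(text: str) -> str:
    """Drop every line that matches one of the metadata patterns (case-insensitive)."""
    kept = [line for line in text.splitlines()
            if not any(re.search(p, line, re.IGNORECASE) for p in METADATA_PATTERNS)]
    return "\n".join(kept)

raw = ("Forbes\n"
       "By Jane Doe\n"
       "By 2025, cloud computing will reshape retail.\n"
       "Copyright 2020 Forbes Media. All rights reserved.")
print(clean(raw))  # By 2025, cloud computing will reshape retail.
```

The by-line pattern is anchored to whole short lines so that ordinary sentences starting with "By" are kept.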

By the end of this task, we had created four text documents: academic digital disruption (DDA), practitioner digital disruption (DDP), academic digital transformation (DTA) and practitioner digital transformation (DTP).

Step 4: assessing quality

This step involves assessing the methodological quality of the studies. Its sub-steps are using recognized quality assessment tools and taking the quality assessment into account when selecting studies or interpreting findings. Because our study included both academic and practitioner literature, we did not assess the rigor of the included papers.

Step 5: extracting data

This step involves gathering information from each included paper. The three sub-steps are specifying the type of information to be extracted, establishing and using a procedure for data extraction, and extracting data in parallel and independently (Templier & Paré, 2015).

Extraction is the heart of the method outlined here. In our example, we wished to extract the entities mentioned in the titles and abstracts of academic articles and in the titles and full texts of practitioner articles. After cleaning, the data were automatically coded using the entity linking system TAGME. TAGME is recommended in particular for comparing datasets of different lengths (Marrone & Hammerle, 2017; Piccinno & Ferragina, 2014). It detects and disambiguates the codes in a text and links them to Wikipedia entries (Hasibi et al., 2016; Piccinno & Ferragina, 2014).

Researchers may use TAGME’s configuration options, which provide flexibility in how it operates; Cuzzola et al. (2015) provide an easy-to-follow guide. Following this guide, we set the parameters to the values reported for the area-under-the-curve F-measure: long_text = 10, ε = 0.427 and ρ = 0.1613. These values define the annotation process. The value of long_text specifies the shifting window (the number of surrounding codes used to annotate a particular mention in the text); the value of ε determines whether the annotation process favours the context (lower values) or the most common meaning of a mention (higher values); and the value of ρ sets the confidence-score threshold separating annotations that are kept from those that are discarded. These confidence scores are assigned by TAGME and represent the likelihood that an annotation is appropriate given its context in the input text (Cuzzola et al., 2015). At the end of this phase, the researcher receives one list of annotations per dataset. In our case, we processed each of the four datasets separately, resulting in four lists of entities.
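The configuration can be expressed as a request to TAGME's public REST service. The Python sketch below only builds the request URL and filters annotations; it does not call the service. The endpoint, the parameter names (`text`, `gcube-token`, `lang`, `long_text`, `epsilon`) and the annotation fields (`title`, `rho`) are assumed to match TAGME's published API and should be checked against its current documentation. Applying ρ on the client side, by discarding annotations below the threshold, is likewise an assumption about how the threshold is used.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public TAGME REST service; verify against current documentation.
TAGME_ENDPOINT = "https://tagme.d4science.org/tagme/tag"

def tagme_request_url(text: str, token: str) -> str:
    """Build a GET URL using the parameter values reported in this section."""
    params = {
        "text": text,
        "gcube-token": token,  # personal access token (placeholder)
        "lang": "en",
        "long_text": 10,
        "epsilon": 0.427,
    }
    return f"{TAGME_ENDPOINT}?{urlencode(params)}"

def keep_confident(annotations, rho_threshold=0.1613):
    """Discard annotations whose rho (TAGME's confidence score) is below the threshold."""
    return [a for a in annotations if a["rho"] >= rho_threshold]

# Hypothetical annotations shaped like TAGME's JSON output (values invented).
sample = [
    {"spot": "IoT", "title": "Internet of things", "rho": 0.42},
    {"spot": "platform", "title": "Computing platform", "rho": 0.08},
]
url = tagme_request_url("IoT platforms", "YOUR-TOKEN")
print([a["title"] for a in keep_confident(sample)])  # ['Internet of things']
```

Keeping the threshold as a parameter makes it easy to rerun the filter with other values suggested in the TAGME documentation.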

To evaluate the validity of our results, we sent 50 articles from each of the four datasets, together with the extracted entities, to two independent reviewers for examination. In 93% of cases, the reviewers agreed with the extracted entities.
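The reported agreement can be computed as the share of extracted entities that reviewers judged correct. The source does not state how the two reviewers' judgements were combined, so the Python sketch below assumes that an entity counts as confirmed only when both reviewers accept it; the function name and the sample judgements are hypothetical.

```python
def confirmed_share(judgements_a, judgements_b):
    """Share of extracted entities that both reviewers judged correct.

    Each list holds one boolean per extracted entity (True = judged correct).
    Requiring both reviewers to accept an entity is an assumption of this sketch.
    """
    if len(judgements_a) != len(judgements_b) or not judgements_a:
        raise ValueError("need two equally long, non-empty judgement lists")
    confirmed = sum(a and b for a, b in zip(judgements_a, judgements_b))
    return confirmed / len(judgements_a)

print(confirmed_share([True, True, False, True], [True, True, True, True]))  # 0.75
```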

Step 6: analysing and synthesizing data

The final step is analysing and synthesizing data. The three final sub-steps are reporting the appropriate standards for synthesising the results; describing the logical reasoning and justification behind the findings; and summarizing the studies in detail.

Our application of this approach was intended to identify the differences between digital disruption and digital transformation. To do this, we applied keyword analysis, which has previously been used in bibliometric studies (e.g., Li et al., 2017; Xu et al., 2021). Dunning (1993) argues that the log-likelihood ratio is more robust than the chi-square test when accounting for differences in size. Because the digital transformation datasets were larger than those for digital disruption, we used AntConc, a program that conducts keyword analysis using the log-likelihood measure (Anthony, 2005).

We first conducted two keyword analyses for the academic datasets. ‘Digital disruption by academics’ (DDA) served as the reference dataset for ‘digital transformation by academics’ (DTA) and vice versa; in other words, each dataset was treated once as the normative dataset and once as the reference dataset. The same logic was then applied to the practitioner datasets. As a result, four tables of salient themes emerged, sorted by log-likelihood ratio. The higher the log-likelihood value, the larger the difference between the frequency of mentions in the normative dataset and in the reference dataset. For example, the theme “Silicon Valley” received the second-highest log-likelihood value (39.10) in the academic digital disruption dataset, from which we infer that it is key in the context of disruption but not of transformation. The results for academic and practitioner literature are given in Appendix D, Tables D1 and D2, respectively.

Next, the sorted theme lists were examined more closely in terms of significance and meaning. First, the critical value of 3.84 was chosen as the lower limit for the log-likelihood values; this corresponds to the commonly used significance level of p < 0.05 (Rayson et al., 2004). Moreover, each theme had to appear at least five times in the dataset to be included in further analysis. This additional rule excluded themes in the academic digital disruption dataset that were mentioned repeatedly in only one article. In total, 23 and 29 significant themes remained in the academic digital disruption and digital transformation datasets, respectively. For both practitioner datasets, however, so many themes exceeded the critical value that the 5% threshold did not yield a manageable number of themes. We therefore considered only the 50 themes with the highest log-likelihood values for practitioners; all 50 were significant at p < 0.0001 in both datasets. We cut the lists at 50 because more themes would overwhelm the reader without providing substantial additional information in our example. Researchers should retain themes if they are significant and provide novel insights regarding the research question.
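The study used AntConc for this step. The Python sketch below re-implements the standard log-likelihood keyness statistic to show the bidirectional design, in which each dataset is scored once as the target and once as the reference. The entity counts are invented, dataset size is measured here as total entity mentions (an assumption), and AntConc's own output may differ in normalisation or rounding.

```python
import math
from collections import Counter

def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (G2) keyness of a term seen a times in a target dataset of
    c entity mentions and b times in a reference dataset of d entity mentions."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target dataset
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference dataset
    ll = 0.0
    if a > 0:  # a zero observed count contributes nothing (0 * log 0 taken as 0)
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keyness(target: Counter, reference: Counter) -> list:
    """Entities relatively more frequent in target than in reference, highest LL first."""
    c, d = sum(target.values()), sum(reference.values())
    scores = {}
    for entity, a in target.items():
        b = reference.get(entity, 0)
        if a / c > b / d:  # keep only over-represented (positive) keywords
            scores[entity] = log_likelihood(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Invented entity counts for two toy datasets (not the study's data).
dda = Counter({"Silicon Valley": 12, "Start-up company": 20, "Cloud computing": 8})
dta = Counter({"Silicon Valley": 1, "Manufacturing": 30, "Cloud computing": 9})

# Each dataset is used once as the target and once as the reference.
print(keyness(dda, dta)[0][0])  # Start-up company
print(keyness(dta, dda)[0][0])  # Manufacturing
```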
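The cut-offs described above can be sketched as a filter over a ranked keyword list. The thresholds 3.84 and 5 and the cap of 50 come from the text; the function name and the sample themes are hypothetical. Whether the five-occurrence rule also applied to the practitioner lists is not stated, so this sketch applies it in both cases.

```python
CRITICAL_LL = 3.84  # LL cut-off used in the text (p < 0.05, one degree of freedom)
MIN_FREQ = 5        # minimum number of occurrences used in the text

def significant_themes(ranked, freqs, top_n=None):
    """Filter (entity, LL) pairs sorted by descending LL.

    freqs maps each entity to its frequency in the target dataset;
    top_n caps the list (50 for the practitioner datasets), None keeps all.
    """
    kept = [(entity, ll) for entity, ll in ranked
            if ll >= CRITICAL_LL and freqs.get(entity, 0) >= MIN_FREQ]
    return kept[:top_n]  # slicing with None returns the whole list

ranked = [("Theme A", 27.7), ("Theme B", 10.2), ("Theme C", 5.1), ("Theme D", 3.0)]
freqs = {"Theme A": 20, "Theme B": 3, "Theme C": 7, "Theme D": 9}
print(significant_themes(ranked, freqs))           # academic-style filtering
print(significant_themes(ranked, freqs, top_n=1))  # practitioner-style cap
```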

Afterwards, we considered which of the remaining themes were most relevant for distinguishing between digital disruption and transformation. First, names of countries or individuals were not considered valuable themes for our example, whereas company names were retained because we expected them to yield insights into the relationship between digital disruption and transformation. Second, a few false-positive themes resulting from insufficient dataset cleaning before applying TAGME were removed. In total, 34 themes were eliminated; Appendix D, Table D3 lists them for transparency.

The aim of this phase is to arrive at a precise understanding of each theme and to arrange the themes into an overarching structure, such as a thematic map, so that the reader can quickly distinguish themes salient in the digital disruption literature from those salient in the digital transformation literature. We constructed four main themes: “Company context”, “Technology”, “Industry” and “Company names”. “Company context” follows the distinction drawn in SWOT analysis between internal strengths and weaknesses and external opportunities and threats (Dyson, 2004). Within “Technology”, we separated traditional from non-traditional technology, because some themes could also be found in 1990s information technology literature, whereas others are common digital technology themes, such as the “Internet of Things”. Next, we used the main theme “Industry” to show that services were dominant in the disruption literature, while manufacturing prevailed in the transformation dataset. The last main theme, “Company names”, distinguishes between companies established before and after the year 2000.

Findings

The use of this approach provides many benefits sought by researchers, including enhanced coding consistency (e.g., Weber, 1990), exhaustive coding (e.g., Maclaran & Catterall, 2002), greater transparency of the logic behind the research method (e.g., Wickham & Woods, 2005) and a more rigorous analytical process (e.g., Smith & Short, 2001). The entity linking approach enables the analysis of larger datasets than may be feasible with manual procedures. It also allows diverse datasets to be used, such as grey literature, which cannot be analysed with existing bibliometric tools.

Tools such as LDA and LSA have their own limitations, such as their inability to deal with lexical ambiguity. This issue is amplified when analysing a variety of literatures (e.g., Marrone, 2020). Moreover, the proposed approach enables themes such as “Industry 4.0” to be identified automatically as key. In contrast, tools such as LDA and LSA would include the word “industry” but eliminate the “4.0”, making the results difficult to interpret. In our example, “knowledge management” was identified as an entity; with current approaches, it would be split into two words (“knowledge” and “management”) that are treated as independent, which may cause confusion when interpreting the results. Researchers may merge such terms manually, but doing so makes reproducibility harder to achieve and introduces researcher bias and subjectivity. Our method automatically disambiguates the mentions in the text and includes a single entity (e.g., “Industry 4.0” or “knowledge management”) in the resulting keyword list. Other disambiguated terms included company names (e.g., Walt Disney Company, often referred to as Disney, or EMC, now known as Dell EMC), acronyms (e.g., ICT for Information and Communication Technology, ERP for Enterprise Resource Planning, IoT for Internet of Things) and technology terms (e.g., big data, cloud computing, machine learning).

Finally, the results of our method are based on a measure of statistical significance. Tools such as LDA and LSA can identify that entities are present but cannot assess their salience. With a tool such as Leximancer, researchers need to interpret the results, which introduces the risk of researcher bias (Campbell et al., 2011; Indulska et al., 2012). Additionally, Leximancer does not report statistical significance, so different researchers may arrive at differing conclusions and understandings (Bal et al., 2010). For example, researchers might find it difficult to compile a list of the most important themes for each literature, whereas our method automates this process.
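The contrast can be reproduced with a typical bag-of-words preprocessing step that keeps only alphabetic tokens; this is an assumption about how LDA or LSA input is often prepared, not a property of the algorithms themselves. The entity-level view is written by hand here, standing in for the output of an entity linker.

```python
import re
from collections import Counter

text = "Industry 4.0 changes knowledge management in manufacturing firms."

# Bag-of-words preprocessing: lower-case and keep alphabetic tokens only.
tokens = re.findall(r"[a-z]+", text.lower())
print(tokens)
# ['industry', 'changes', 'knowledge', 'management', 'in', 'manufacturing', 'firms']

# Entity-level view: hand-written (mention, entity) pairs standing in for a linker's output.
linked = [("Industry 4.0", "Industry 4.0"),
          ("knowledge management", "Knowledge management")]
entity_counts = Counter(entity for _, entity in linked)
print(entity_counts)
```

The token list has lost "4.0" and split the compound term, while each linked entity survives as a single countable unit.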

The application of this method in the context of digital disruption and digital transformation illustrates its usefulness. The approach helped researchers form themes and facilitated the creation of a thematic map. Summarising our results, Fig. 2 presents salient themes often used to refer to one term (e.g., digital transformation) and not the other (e.g., digital disruption). Text inside the bubbles represents main themes, while the salient themes extracted from the four literatures are in boxes. The four main findings of the paper are: (1) Digital Disruption focuses on Company External Factors, while Digital Transformation focuses on Company Internal Factors; (2) Digital Disruption starts digital, whereas Digital Transformation goes digital; (3) Digital Disruption is dominant in service industries, while Digital Transformation dominates manufacturing industries; and (4) start-ups are associated with Digital Disruption, but incumbents with Digital Transformation. A brief explanation of these results can be found in Appendix F.

Discussion

The growing availability of text sources in a wide range of formats magnifies the benefits of using computer-assisted tools to analyse literature. However, current tools and methods do not account for lexical ambiguity. By applying entity linking, which allows disambiguation, we provide a countermeasure.

In this article, we take advantage of recent developments in text mining and introduce a new method that allows for contextualization and reduces the need for manual intervention in the coding and interpretation process (Zhai et al., 2015). We outline the steps required to apply this approach to understand different literature types. Rather than simply identifying words that appear frequently, the approach groups words or phrases with the same meaning under the same annotation. Combined with keyword analysis, it allows the researcher to quickly assess which entities are statistically salient. While there is commonly a trade-off between time and comprehensiveness in reviews (Paré et al., 2015), our method saves time through automation; the time saved can instead be spent on improving appraisal quality or reviewing entities.

There are numerous ways in which the findings of this research can be extended. One area of interest is the visualization of the results and how graphs and other visual displays may assist. Researchers should explore how dimension-reduction and clustering techniques, such as t-distributed stochastic neighbour embedding (t-SNE; Maaten & Hinton, 2008), Uniform Manifold Approximation and Projection (UMAP; McInnes et al., 2018), Principal Component Analysis (PCA) and Hierarchical Cluster Analysis (HCA; e.g., Granato et al., 2018), can help visualize the results by placing similar terms close to each other. Such techniques may also help researchers identify the different streams of literature and the terms that connect them.
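
As one illustration of the clustering idea, hierarchical cluster analysis can group entities whose frequency profiles across literatures are similar. The sketch below assumes that each entity is represented by a vector of its counts in each of the four literatures; the entity names `E1` to `E4` and their counts are invented toy values, and average linkage with cosine distance is only one of several reasonable choices (single, complete or Ward linkage would also qualify as HCA).

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(profiles, n_clusters):
    """Agglomerative (hierarchical) clustering with average linkage.

    profiles maps each entity to a vector, e.g., its counts in each
    literature. Merging stops once n_clusters clusters remain.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(cosine_distance(profiles[a], profiles[b])
                    for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Toy profiles: counts of four hypothetical entities in four literatures.
PROFILES = {
    "E1": [9, 8, 1, 0],
    "E2": [8, 9, 0, 1],
    "E3": [1, 0, 9, 8],
    "E4": [0, 1, 8, 9],
}
print(average_linkage(PROFILES, 2))  # -> [['E1', 'E2'], ['E3', 'E4']]
```

Cosine distance compares the shape of each profile rather than its size, so a frequent and a rare entity that occur in the same literatures still end up close together.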

Baskerville and Myers (2009) modelled the relative strength and duration of four particular fashion waves in practitioner and academic discourses. However, they explored only a small number of topics and did not attempt to predict which topics might gain popularity in the future. By surfacing the salient themes of practitioner and academic literatures, the method proposed in this article could give academics an opportunity to adopt such themes early in their research, potentially making their research more relevant for practitioners.

Another suggested application might be to compare how different academic fields view and use specific terms. In his “Two Cultures” lecture, Snow (1961) spoke of a “gulf of mutual incomprehension” between scholars of the humanities and scientists, resulting from their disparate perspectives on the world. This approach might help identify such incomprehension by looking at how different fields approach the same topic (Kemp, 2009).

This method can be used to examine similar terms, for example, one with a positive connotation and one with a negative connotation, to understand how they differ. One example is gender equality (also known as sexual equality or equality of the sexes) and gender inequality (also referred to as gender imbalance). A researcher might wish to examine the use of these two terms (and their aliases) within academic research to understand in which scenarios one term is selected over the other.

Limitations

This work has several limitations. First, a digitized approach to thematic analysis is at odds with the original intention of the method as defined by Braun and Clarke (2006), who introduced it to support manual coding and theme development by researchers. Nevertheless, automated approaches have already been applied successfully to the thematic analysis of social media (Abaho et al., 2018; George et al., 2016). Moreover, although we derived the topics automatically, we actively reviewed the themes and built the thematic map ourselves. Therefore, the crucial interpretive part of the thematic analysis remained the researchers’ responsibility.

Second, Scopus did not allow us to download the full texts of multiple articles in bulk, so our data collection relied on abstracts. However, Crawford et al. (2006) concluded that relying on the full text of academic articles offered no advantage over using their abstracts. Nevertheless, future research could compare how effectively our method tags abstracts versus full texts.

Third, our method depends on the quality of the tool used for entity linking. In turn, the tool, in our case TAGME, depends on the catalogue used for annotation. Still, TAGME and its catalogue, Wikipedia, have been repeatedly endorsed by various researchers (Cornolti et al., 2013; Hasibi et al., 2016). As entity linking systems improve further (Piccinno & Ferragina, 2014), the quality of automated coding should improve as well. To reduce difficulties when interpreting the results, we recommend examining the themes manually and iteratively and deleting any false positives identified. When working in a team, researchers may analyse the derived themes independently and discuss them after every iteration. Okoli and Schabram (2010) described two researchers independently assessing and then comparing their judgements, and, in the case of contradictory views, consulting a third researcher to make the final decision. By applying this approach, researchers reduce personal bias and gain a more holistic view of the results. The iterative process of making sense of the data and eliminating inconclusive themes can be considered complete when the remaining themes sufficiently address the research question (Braun & Clarke, 2006).
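
The two-researcher procedure described by Okoli and Schabram (2010) can be expressed as a small reconciliation step over the themes produced by the entity linker. The sketch below is a hypothetical illustration, not part of the authors’ tooling: the names `ReviewOutcome`, `reconcile` and `resolve` are invented, each coder’s decision is recorded as `True` (keep) or `False` (delete as a false positive), agreed deletions are removed, and only contradictory decisions go to a third researcher. Judging whether the remaining themes address the research question still requires the researchers, so the step would simply be repeated in each iteration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReviewOutcome:
    keep: List[str] = field(default_factory=list)      # both coders keep
    drop: List[str] = field(default_factory=list)      # agreed false positives
    escalate: List[str] = field(default_factory=list)  # contradictory views


def reconcile(themes: List[str], coder_a: Dict[str, bool],
              coder_b: Dict[str, bool]) -> ReviewOutcome:
    """Compare two independent keep (True) / delete (False) decisions."""
    outcome = ReviewOutcome()
    for theme in themes:
        a, b = coder_a[theme], coder_b[theme]
        if a and b:
            outcome.keep.append(theme)
        elif not a and not b:
            outcome.drop.append(theme)
        else:
            outcome.escalate.append(theme)
    return outcome


def resolve(outcome: ReviewOutcome, coder_c: Dict[str, bool]) -> List[str]:
    """A third researcher decides only the contested themes."""
    return outcome.keep + [t for t in outcome.escalate if coder_c[t]]


# Hypothetical decisions for one review iteration.
themes = ["theme_a", "theme_b", "theme_c", "theme_d"]
coder_a = {"theme_a": True, "theme_b": False, "theme_c": True, "theme_d": False}
coder_b = {"theme_a": True, "theme_b": False, "theme_c": False, "theme_d": True}
outcome = reconcile(themes, coder_a, coder_b)
print(outcome.escalate)  # -> ['theme_c', 'theme_d']
print(resolve(outcome, {"theme_c": True, "theme_d": False}))  # -> ['theme_a', 'theme_c']
```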

Fourth, our quality assessment of which articles to include in the datasets was based only on reading the abstracts and judging their connection to technological change, which may introduce bias. Researchers may instead be more selective and define multiple inclusion and exclusion criteria, possibly with the help of a panel of scholars (Hoon, 2013). Still, we believe that the themes that emerged represent appropriate patterns in the datasets for tackling our research question.

Finally, our validation was limited by the fact that digital disruption and digital transformation are emerging terms that still lack sound definitions. Validity traditionally concerns whether a method measures what the researcher intends to measure. We based our assessment of validity on the assumption that the literature provides accurate depictions of both topics. However, because the topics are still being conceptualized, we cannot be sure that the knowledge captured reflects what digital disruption and digital transformation constitute in reality.

Conclusion

This article presents an entity linking approach to overcome lexical ambiguity, an issue generally present in existing automated approaches to examining textual data. The main advantage of entity linking systems is their ability to map different representations of a concept onto a single expression. By incorporating keyword analysis, the method ensures that any entity identified as key for a given dataset is significantly over-represented compared with a reference dataset.

Using entity linking and keyword analysis together makes it possible to compare vast datasets in a short period of time. Although the coding is automated, the researcher still actively carries out the final interpretive phase on the list of themes. The method also offers considerable flexibility: researchers can use it to identify differences and similarities across datasets, which could, for example, characterize the perspectives of multiple stakeholders. To illustrate our method, we compared academic and practitioner literatures on digital disruption and digital transformation, generating fruitful insights. By revealing where these literatures overlap, our analysis sharpens knowledge about both topics and guides researchers towards refining the concepts.

We provide a step-by-step guideline for those who wish to carry out this type of analysis. We hope it will help researchers who are new to a field to grasp its structure quickly while achieving quantitative rigor. We encourage fellow researchers to replicate the method in a variety of settings.

References

Alghamdi, R., & Alfalqi, K. (2015). A survey of topic modeling in text mining. International Journal of Advanced Computer Science and Applications. https://doi.org/10.14569/IJACSA.2015.060121
Anthony, L. (2005). AntConc: Design and development of a freeware corpus analysis toolkit for the technical writing classroom. In International professional communication conference (IPCC). IEEE.
Antons, D., Joshi, A. M., & Salge, T. O. (2019). Content, contribution, and knowledge consumption: Uncovering hidden topic structure and rhetorical signals in scientific texts. Journal of Management, 45(7), 3035–3076.
Aria, M., & Cuccurullo, C. (2017). Bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975.
Bal, A. S., Campbell, C. L., Payne, N. J., & Pitt, L. (2010). Political ad portraits: A visual analysis of viewer reaction to online political spoof advertisements. Journal of Public Affairs, 10(4), 313–328.
Bandara, W., Furtmueller, E., Gorbacheva, E., Miskon, S., & Beekhuyzen, J. (2015). Achieving rigor in literature reviews: Insights from qualitative data analysis and tool-support. Communications of the Association for Information Systems, 37, 154–204.
Banks, G. C., Pollack, J. M., Bochantin, J. E., Kirkman, B. L., Whelpley, C. E., & O’Boyle, E. H. (2016). Management’s science–practice gap: A grand challenge for all stakeholders. Academy of Management Journal, 59(6), 2205–2231.
Barry, C. A. (1998). Choosing qualitative data analysis software: Atlas.ti and nudist compared. Sociological Research Online, 3(3), 16–28.
Baskerville, R. L., & Myers, M. D. (2009). Fashion waves in information systems research and practice. MIS Quarterly, 33(4), 647–662.
Benbasat, I., Goldstein, D. K., & Mead, M. (1987). The case research strategy in studies of information systems. MIS Quarterly, 11(3), 369–386.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993–1022.
Boell, S. K., & Cecez-Kecmanovic, D. (2014). A hermeneutic approach for conducting literature reviews and literature searches. Communications of the Association for Information Systems, 34(1), 12.
Bonaccorsi, A., Chiarello, F., & Fantoni, G. (2021). Impact for whom? Mapping the users of public research with lexicon-based text mining. Scientometrics, 126(2), 1745–1774.
Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101.
Brendel, A. B., Trang, S., Marrone, M., Lichtenberg, S., & Kolbe, L. M. (2020). What to do for a Literature Review?–A Synthesis of Literature Review Practices.
Cai, C. W., Linnenluecke, M. K., Marrone, M., & Singh, A. K. (2019). Machine learning and expert judgement: Analyzing emerging topics in accounting and finance research in the Asia-Pacific. Abacus, 55(4), 709–733.
Callon, M. (1986). Pinpointing industrial invention: An exploration of quantitative methods for the analysis of patents. In M. Callon, J. Law, & A. Rip (Eds.), Mapping the dynamics of science and technology (pp. 163–188). The Macmillan Press Ltd.
Campbell, C., Pitt, L. F., Parent, M., & Berthon, P. R. (2011). Understanding consumer conversations around ads in a Web 2.0 World. Journal of Advertising, 40(1), 87–102. https://doi.org/10.2753/JOA0091-3367400106
Cifariello, P., Ferragina, P., & Ponza, M. (2019). WISER: A semantic approach for expert finding in academia based on entity linking. Information Systems, 82, 1–16.
Cornolti, M., Ferragina, P., & Ciaramita, M. (Eds.) (2013). A framework for benchmarking entity–annotation systems. ACM.
Crawford, L., Pollack, J., & England, D. (2006). Uncovering the trends in project management: Journal emphases over the last 10 years. International Journal of Project Management, 24(2), 175–184.
Crichton, G., Pyysalo, S., Chiu, B., & Korhonen, A. (2017). A neural network multi-task learning approach to biomedical named entity recognition. BMC Bioinformatics, 18(1), 368.
Cuzzola, J., Jovanović, J., Bagheri, E., & Gašević, D. (2015). Evolutionary fine-tuning of automated semantic annotation systems. Expert Systems with Applications, 42(20), 6864–6877.
Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61–74.
Dyson, R. G. (2004). Strategic development and SWOT analysis at the University of Warwick. European Journal of Operational Research, 152(3), 631–640.
Fortunato, S., Bergstrom, C. T., Börner, K., Evans, J. A., Helbing, D., Milojević, S., ... & Vespignani, A. (2018). Science of science. Science, 359(6379). https://doi.org/10.1126/science.aao0185
Gallon, J. R. (1991). Nitrogen fixing organisms: Pure and applied aspects. Chapman and Hall.
Garrett-Jones, S., Turpin, T., & Diment, K. (2010). Managing competition between individual and organizational goals in cross-sector research and development centres. The Journal of Technology Transfer, 35(5), 527–546.
George, G., Osinga, E. C., Lavie, D., & Scott, B. A. (2016). Big data and data science methods for management research. Academy of Management Journal, 59(5), 1493–1507.
Giorgi, J. M., & Bader, G. D. (2018). Transfer learning for biomedical named entity recognition with neural networks. Bioinformatics, 34(23), 4087–4094.
González-Albo, B., & Bordons, M. (2011). Articles vs. proceedings papers: Do they differ in research relevance and impact? A case study in the Library and Information Science field. Journal of Informetrics, 5(3), 369–381.
Granato, D., Santos, J. S., Escher, G. B., Ferreira, B. L., & Maggio, R. M. (2018). Use of principal component analysis (PCA) and hierarchical cluster analysis (HCA) for multivariate association between bioactive compounds and functional properties in foods: A critical perspective. Trends in Food Science and Technology, 72, 83–90.
Hannigan, T. R., Haans, R. F., Vakili, K., Tchalian, H., Glaser, V. L., Wang, M. S., ... & Jennings, P. D. (2019). Topic modelling in management research: Rendering new theory from textual data. Academy of Management Annals, 13(2), 586–632.
Hasibi, F., Balog, K., & Bratsberg, S. E. (2016). On the reproducibility of the TAGME entity linking system. In 38th European conference on information retrieval (ECIR).
Hobolt, S. B., & Klemmensen, R. (2005). Why labour didn’t listen: Party competition and issue responsiveness in the recent British and US elections.
Hoon, C. (2013). Meta-synthesis of qualitative case studies: An approach to theory building. Organizational Research Methods, 16(4), 522–556.
Indulska, M., Hovorka, D. S., & Recker, J. (2012). Quantitative approaches to content analysis: Identifying conceptual drift across publication outlets. European Journal of Information Systems, 21(1), 49–69.
Kemp, M. (2009). Dissecting the two cultures. Nature, 459(7243), 32–33.
Kitchenham, B., & Charters, S. (2007). Guidelines for performing systematic literature reviews in software engineering. Technical Report.
Kolev, K. D., Wangrow, D. B., Barker, V. L., III, & Schepker, D. J. (2019). Board committees in corporate governance: A cross-disciplinary review and agenda for the future. Journal of Management Studies, 56(6), 1138–1193.
Kuckartz, U., & Rädiker, S. (2019). Analyzing qualitative data with MAXQDA. Springer.
Larsen, K. R., & Bong, C. H. (2016). A tool for addressing construct identity in literature reviews and meta-analyses. MIS Quarterly, 40(3), 529–551.
Levac, D., Colquhoun, H., & O’Brien, K. K. (2010). Scoping studies: Advancing the methodology. Implementation Science, 5(1), 69.
Lewis, R. B., & Maas, S. M. (2007). QDA Miner 2.0: Mixed-model qualitative data analysis software. Field Methods, 19(1), 87–108.
Li, J., Reniers, G., Cozzani, V., & Khan, F. (2017). A bibliometric analysis of peer-reviewed publications on domino effects in the process industry. Journal of Loss Prevention in the Process Industries, 49, 103–110.
Linnenluecke, M. K., Marrone, M., & Singh, A. K. (2020). Conducting systematic literature reviews and bibliometric analyses. Australian Journal of Management, 45(2), 175–194.
Maaten, L. V. D., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2579–2605.
Maclaran, P., & Catterall, M. (2002). Researching the social web: Marketing information from virtual communities. Marketing Intelligence and Planning, 20(6), 319–326.
Marrone, M. (2020). Application of entity linking to identify research fronts and trends. Scientometrics, 122, 1–23.
Marrone, M., & Hammerle, M. (2016). An integrated literature review: Establishing relevance for practitioners. In 2016 International conference on information systems, ICIS 2016 (pp. 1–21). Association for Information Systems.
Marrone, M., & Hammerle, M. (2017). Relevant research areas in IT service management: An examination of academic and practitioner literatures. Communications of the Association for Information Systems, 41, 517–543.
McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint. arXiv:1802.03426.
Meuser, J. D., Gardner, W. L., Dinh, J. E., Hu, J., Liden, R. C., & Lord, R. G. (2016). A network analysis of leadership theory: The infancy of integration. Journal of Management, 42(5), 1374–1403.
Molla, A., Cooper, V., & Karpathiou, V. (2016). IT managers' perception and response to digital disruption: An exploratory study. arXiv preprint. arXiv:1606.03534.
Nakagawa, S., Samarasinghe, G., Haddaway, N. R., Westgate, M. J., O’Dea, R. E., Noble, D. W., & Lagisz, M. (2019). Research weaving: Visualizing the future of research synthesis. Trends in Ecology and Evolution, 34(3), 224–238.
Navigli, R. (2009). Word sense disambiguation: A survey. ACM Computing Surveys, 41(2), 1–69.
Papadimitriou, C. H., Raghavan, P., Tamaki, H., & Vempala, S. (2000). Latent semantic indexing: A probabilistic analysis. Journal of Computer and System Sciences, 61(2), 217–235.
Paré, G., Trudel, M.-C., Jaana, M., & Kitsiou, S. (2015). Synthesizing information systems knowledge: A typology of literature reviews. Information and Management, 52(2), 183–199.
Patriotta, G. (2020). Writing impactful review articles. Journal of Management Studies, 57(6), 1272–1276.
Paulus, T., Lester, J., & Dempster, P. (2013). Digital tools for qualitative research. SAGE.
Piccinno, F., & Ferragina, P. (2014). From TAGME to WAT: A new entity annotator. In 37th Annual international ACM SIGIR conference.
Preiss, J., & Stevenson, M. (2016). The effect of word sense disambiguation accuracy on literature based discovery. BMC Medical Informatics and Decision Making, 16(1), 57.
Rayson, P., Berridge, D., & Francis, B. (2004). Extending the Cochran rule for the comparison of word frequencies between corpora. In 7th International conference on statistical analysis of textual data (JADT).
Schryen, G. (2015). Writing qualitative IS literature reviews: Guidelines for synthesis, interpretation and guidance of research. Communications of the Association for Information Systems, 37(12), 286–325.
Scott, M., & Tribble, C. (2006). Textual patterns: Key words and corpus analysis in language education (Vol. 22). John Benjamins Publishing.
Shen, W., Wang, J., & Han, J. (2014). Entity linking with a knowledge base: Issues, techniques, and solutions. IEEE Transactions on Knowledge and Data Engineering, 27(2), 443–460.
Smith, A. (2003). Automatic extraction of semantic networks from text using Leximancer. In Companion volume of the proceedings of HLT-NAACL 2003—Demonstrations (pp. 23–24).
Smith, A. E., & Humphreys, M. S. (2006). Evaluation of unsupervised semantic mapping of natural language with Leximancer concept mapping. Behavior Research Methods, 38(2), 262–279.
Smith, C., & Short, P. M. (2001). Integrating technology to improve the efficiency of qualitative data analysis—A note on methods. Qualitative Sociology, 24(3), 401–407.
Snow, C. P. (1961). The two cultures and the scientific revolution. Cambridge University Press.
Snyder, H. (2019). Literature review as a research methodology: An overview and guidelines. Journal of Business Research, 104, 333–339.
Sotiriadou, P., Brouwers, J., & Le, T. A. (2014). Choosing a qualitative data analysis tool: A comparison of NVivo and Leximancer. Annals of Leisure Research, 17(2), 218–234.
Sridhar, V. K. R. (2015, June). Unsupervised topic modeling for short texts using distributed representations of words. In Proceedings of the 1st workshop on vector space modeling for natural language processing (pp. 192–200).
Templier, M., & Paré, G. (2015). A framework for guiding and evaluating literature reviews. Communications of the Association for Information Systems, 37(1), 6.
Templier, M., & Paré, G. (2018). Transparency in literature reviews: An assessment of reporting practices across review types and genres in top IS journals. European Journal of Information Systems, 27(5), 503–550.
Tienari, J., Vaara, E., & Björkman, I. (2003). Global capitalism meets national spirit: Discourses in media texts on a cross-border acquisition. Journal of Management Inquiry, 12(4), 377–393.
Vaara, E., & Tienari, J. (2002). Justification, legitimization and naturalization of mergers and acquisitions: A critical discourse analysis of media texts. Organization, 9(2), 275–304.
Van Eck, N. J., & Waltman, L. (2017). Citation-based clustering of publications using CitNetExplorer and VOSviewer. Scientometrics, 111(2), 1053–1070.
Vesti, H., Nielsen, C., Rosenstand, C. A. F., Massaro, M., & Lund, M. (2017). Structured Literature Review of disruptive innovation theory within the digital domain. In The ISPIM innovation summit.
Weber, R. P. (1990). Basic content analysis (No. 49). SAGE.
Westgate, M. J. (2019). revtools: An R package to support article screening for evidence synthesis. Research Synthesis Methods. https://doi.org/10.1002/jrsm.1374
Whittaker, J. (1989). Creativity and conformity in science: Titles, keywords and co-word analysis. Social Studies of Science, 19(3), 473–496. https://doi.org/10.1177/030631289019003004
Wickham, M., & Woods, M. (2005). Reflecting on the strategic use of CAQDAS to manage and report on the qualitative research process. The Qualitative Report, 10(4), 687–702.
Wu, G., He, Y., & Hu, X. (2018). Entity linking: An issue to extract corresponding entity with knowledge base. IEEE Access, 6, 6220–6231.
Xu, Z., Ge, Z., Wang, X., & Skare, M. (2021). Bibliometric analysis of technology adoption literature published from 1997 to 2020. Technological Forecasting and Social Change, 170, 120896.
Zhai, X., Li, Z., Gao, K., Huang, Y., Lin, L., & Wang, L. (2015). Research status and trend analysis of global biomedical text mining studies in recent 10 years. Scientometrics, 105(1), 509–523.
Zupic, I., & Čater, T. (2015). Bibliometric methods in management and organization. Organizational Research Methods, 18(3), 429–472.

Funding

Open Access funding enabled and organized by CAUL and its Member Institutions.

Author information

Authors and Affiliations

Department of Accounting and Corporate Governance, Macquarie University, Sydney, 2109, Australia
Mauricio Marrone
Chair of Information Systems, University of Goettingen, Göttingen, Germany
Sascha Lemke & Lutz M. Kolbe

Corresponding author

Correspondence to Mauricio Marrone.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Supplementary Information

Electronic supplementary material (DOCX, 117 kb) is available with the online version of this article.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Marrone, M., Lemke, S. & Kolbe, L.M. Entity linking systems for literature reviews. Scientometrics 127, 3857–3878 (2022). https://doi.org/10.1007/s11192-022-04423-5

Received: 20 June 2021
Accepted: 24 May 2022
Published: 28 June 2022
Issue Date: July 2022
DOI: https://doi.org/10.1007/s11192-022-04423-5
