Keyword occurrences and journal specialization

Since the borders of disciplines change over time and vary across communities and geographies, they can be expressed at different levels of granularity, making it challenging to find a broad consensus about the measurement of interdisciplinarity. This study contributes to this debate by proposing a journal specialization index based on the level of repetitiveness of the keywords appearing in a journal's articles. Keywords represent one of the most essential items for filtering the vast amount of research available. If chosen correctly, they can help to identify the central concept of the paper and, consequently, to couple it with manuscripts related to the same field or subfield of research. Based on these widely recognized features of article keywords, the study proposes measuring the specialization of a journal by counting the number of times that a keyword is repeated in a journal on average (Sj). The basic assumption underlying the proposed journal specialization index is that keywords approximate an article's topic and that the higher the number of papers in a journal on the same topic, the higher the level of specialization of that journal. The proposed specialization metric is subject to several limitations, among which the most relevant seems to be the lack of a standard practice regarding the number and consistency of keywords appearing in each article.


Introduction
This paper aims to contribute to the debate on the specialization and compartmentalization of academic journals and, indirectly, on their degree of interdisciplinarity. An interdisciplinary journal is an academic publication that publishes articles and reviews covering themes and approaches from different disciplines (Augsburg, 2016). These journals seek to overcome the boundaries of individual disciplines and promote collaboration and integration of research approaches from different fields. Articles published in interdisciplinary journals often address complex issues that require understanding and solutions that cannot be provided by a single discipline. Interdisciplinary journals focus on promoting research and academic debate that goes beyond the limits of individual disciplines and promotes a more holistic approach to analysing and understanding problems. These journals are often sought out by those looking for a broader perspective on a topic or issue and are used as a source of information for interdisciplinary research and academic projects. In contrast to interdisciplinary journals, there are journals with a strong degree of disciplinarity, which we identify here with the concept of specialization. A specialized journal is an academic source whose contributions are strongly concentrated in a single body of knowledge or research practice. The specialization level and interdisciplinarity of journals may be considered complementary terms: the higher the interdisciplinarity of a journal is, the lower the level of specialization and vice versa.
This paper intends to contribute to developing a debate about the disciplinarity level of journals by introducing an index of journal specialization (Sj index).
The literature on the degree of disciplinarity of journals is scant in formal terms, but it is extensive in substantive terms. In fact, if we consider specialization as a complementary term to interdisciplinarity, it is straightforward to count the bulk of studies on interdisciplinarity as indirect references to the topic of journal specialization.
From the analysis of the literature on interdisciplinarity, it emerges that while the attention on interdisciplinary research has grown in recent decades (Leydesdorff, 2007; Glänzel & Debackere, 2022), the academic literature on this topic has shown a lack of consensus on the most appropriate measurement approach for interdisciplinary research (Leydesdorff & Rafols, 2011; Wang & Schneider, 2020). Moreover, the categorization of scientific articles and journals by topic is one of the most difficult and essential problems of information science.
Specifically, and in a quantitative context, there are at least two different methods to classify the interdisciplinarity of research: journal-level and paper-level categorization. The debate on which of these two methods of categorization provides the most appropriate solution is still open (Abramo et al., 2018; Milojević, 2020). The most common research categorization approach is the journal-level approach, based on the assumption that journal subject categories (SCs) identify disciplines (Carpenter & Narin, 1973; Zwanenburg et al., 2022).
However, using an a priori discipline structure is not immune to criticism. The unambiguous categorization of journals through subject categories appears to be rare due to the fuzziness of journal sets (Bensman, 2001). Further, journal-level categorization cannot assign a single or few categories to interdisciplinary journals (such as Nature or Science), and other concerns about the risk of disambiguation have been raised.¹ To overcome these limitations, it seems reasonable to introduce a paper-level categorization approach based on one of the most granular items in an article, such as the title words, as suggested by Eck and Waltman (2012) in their final remarks, or keywords, as proposed by this study.
This paper tries to produce a measure of specialization (and therefore indirectly of interdisciplinarity) of journals through a paper-level categorization that uses research keywords as the element of analysis. Article keywords are generally considered a crucial element in understanding the contents of an academic paper (Callon et al., 1986; Choi et al., 2011; Hartley & Kostoff, 2003; He, 1999; Whittaker, 1989).
Because keywords summarize the paper's topic, they are carefully selected terms that are deemed essential and significant. Scholars can therefore quickly identify relevant papers by scanning article keywords, which are known to consist of the most important terms that represent papers. Additionally, research keywords can reveal related terms that may have previously been unknown and expand search queries.
The use of keywords seems to be one of the most effective ways for authors and editors to enhance the chances of potential readers finding their articles. Since keywords are not limited to predetermined categories, as they can be freely selected by the authors, they allow readers to check that a research article is relevant and help editors and research databases to enhance the relatedness of articles on a specific topic (Hartley & Kostoff, 2003). Furthermore, keywords selected by authors have been shown to be useful for understanding a discipline (Li, 2018; Onyancha, 2018; Xu et al., 2018), even though Tsai et al. (2011) have shown that expert authors tend to select keywords that better represent the content of their article compared to those chosen by novices.
Moreover, it has also been found that some statistical characteristics of keywords (e.g., number of keywords and percentage of new keywords) have significant relations with citation counts (Uddin et al., 2016).
Based on the assumption that keywords are able to identify the content of the article, the approach proposed in this study identifies the frequency of keywords in a journal as a possible indicator of its specialization: the higher (the lower) the occurrences of the keyword among articles published by a single journal, the higher its discipline specialization (its interdisciplinarity).
By using a dataset consisting of 88,583 articles published in 50 journals, we propose the use of a specialization index, Sj, interpreted as the average frequency at which a keyword appears in the j-th journal.
The main peculiarity of this index is its extreme simplicity and replicability due to the availability of freely accessible bibliometric software (i.e., VOSviewer, Eck & Waltman, 2010) that quite easily collects keywords from a dataset of papers.
However, analysing the keyword dynamics requires caution in several respects. The number of keywords collected in a dataset of articles can be correlated with many factors, such as the number of articles collected, the presence or absence of a publisher's policy that requires the inclusion of a minimum or a maximum number of keywords, the degree of specialization of the article, and the degree of attention given by a researcher to the selection of their keywords. The analysis of the dynamics of keywords still seems to be an unexplored theme, while their use as a possible key factor to screen the specialization of journal research is a novelty.
The essential contribution of this study is to enhance the literature related to publication-level classification systems by introducing a new method of investigating the specialization of academic journals, based on the use of keywords in a journal, which attempts to approximate the level of concentration of articles on a specific topic. By assuming that keywords approximate the article's topic, this study suggests that the higher the occurrences of keywords, the higher the level of specialization of that journal.
The remainder of the paper is structured as follows: Sect. "Related literature" provides context for the proposed measure by comparing it to other journal disciplinarity/interdisciplinarity measures. Sect. "Methodology" reports the methodology of the specialization index. Sect. "The dataset" describes the way in which the sample of journals was composed. Sect. "Keyword dynamics across journals" describes some previously unobserved dynamics of journal keywords, while Sect. "Results" reports the calculation of the metrics for the identified sample. Finally, Sect. "Conclusions and limitation" concludes and outlines the implications of this study.

Related literature
To our knowledge, this is the first study that attempts to propose a metric of specialization by using author keywords. At the same time, it is not the first study that proposes an index of specialization. Boyack and Klavans (2011) propose to measure journal specialization (indicated with the term "journal specificity") by introducing more than one quantity, including a textual coherence indicator that is the closest to the index presented in this manuscript. By leveraging the title and abstract information of papers, the authors employed word probability vector techniques to generate clusters and structural insights. Specifically, the proposed index of journal specificity is based on the Jensen-Shannon divergence, since it measures the similarity (and hence the divergence) between two probability distributions: the likelihood of occurrence of a word in a document (an article) and the likelihood of occurrence of the same word in a journal (where the article has been published). This approach makes it possible to capture meaningful data about the scientific domain of an academic article, and it follows other studies in which quite similar text analyses have been used (Boyack & Klavans, 2010; Braam et al., 1991; Glänzel & Czerwon, 1996; Jarneving, 2001).
By comparing the Boyack and Klavans (2011) metric with the Sj index, it becomes apparent that while Sj offers insights into the frequency of specific keywords used within a specific journal, the other index determines the textual coherence of individual journals by comparing word probability distributions. In summary, these two metrics differ both in terms of the data used (keywords versus title and abstract) and the methodological approach (probability of occurrence versus actual occurrence).
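To make the contrast concrete, the following is a minimal sketch of the Jensen-Shannon divergence between two word probability distributions. It is not Boyack and Klavans' implementation: the function names and the four-word vocabulary are hypothetical, and the two distributions are assumed to be defined over the same ordered vocabulary.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence between two distributions over the same vocabulary."""
    # Mixture distribution; m_i > 0 whenever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical word probabilities over a four-word vocabulary.
article = [0.5, 0.3, 0.2, 0.0]
journal = [0.4, 0.3, 0.2, 0.1]
print(jensen_shannon_divergence(article, journal))
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (distributions with no word in common), whereas Sj is a plain ratio of counts.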
Other studies discuss journal specialization, in some cases only partially, without proposing a metric. The concept of journal specialization has been discussed by Glänzel et al. (1999), who recognize that the demarcation of subject areas through journal assignment is inherently less precise than using subject headings from individual publications. In these circumstances, keywords can be considered a flexible means of tracing the dissemination and trajectories of knowledge, as they are highly indicative of the concepts and subjects covered in articles (Xu et al., 2018). By using keywords extracted through an automatic method (Frantzi et al., 2000),² rather than author keywords as in our study, Xu et al. (2018) examine the formation of interdisciplinary knowledge through the lens of keyword evolution. By doing so, it is possible to gain insights into the specific developmental characteristics of interdisciplinarity, which provide a potential timeline for the interdisciplinary formation process. Specifically, the formation of an interdisciplinary approach around a specific research topic progresses through some significant phases (a latent phase, an embryo phase, and a mature phase), and this evolution may be captured through the evolution of keywords related to a topic in different domains of science. In summary, while keywords may be considered a valuable means of analysing the concentration of subjects and concepts inside a single journal, if collected in different journals and referred to a specific subject, they can also be considered tools for analysing the interdisciplinary attitude of that specific subject and for predicting knowledge evolution (Choi et al., 2011).
Griffiths and Steyvers (2004) introduce a statistical inference algorithm for Latent Dirichlet Allocation (LDA), a generative model that considers documents as a combination of topics. The implementation of this algorithm allows them to explore the topic dynamics and to identify the significance of words in the semantic content of documents. This result is in part related to our study to the extent that it represents a valuable example of how to calculate the frequency of a topic in a cluster of documents (including academic journals). In any case, the most distinctive characteristic of the Sj index when compared with other explicit or implicit indicators of journal specialization is that it relies neither on probability calculations nor on a more complex algorithm; it is a metric obtained through a deterministic process.

Methodology
The content of a paper may be described by many features, such as title, abstract, keywords, and discipline classification coding. Keywords represent a "subject heading" that should help readers understand the central concept of the paper and its fields of concern (Hartley & Kostoff, 2003).
On this basis, the underlying rationale for using keywords to analyse the topic specialization of a journal is that the more times a keyword is replicated in a journal, the higher the number of papers related to the same or similar concepts.
To formalize this idea, we indicate with Kj (with Kj > 0) the total number of unique keywords contained in a set of m articles of journal j, and with OCCij (with OCCij > 0 and where i = 1, ..., Kj) the occurrences of the i-th unique keyword appearing in journal j. The term "unique" means that, for example, if Kj = 10, the m articles of journal j contain 10 distinct keywords, each of which is replicated OCCij times.
Since OCCij represents the number of times a unique keyword is found in a single journal j and since keywords are items unique to each paper, OCCij should be positively related to the number of articles focused on a specific field of research. In other words, it is possible to consider OCCij as the number of papers that use the i-th keyword in the j-th journal.
Based on this specification, we propose measuring the specialization of the j-th journal through the level of density of keywords Sj, which can be written in the following form:

$$S_j = \frac{\sum_{i=1}^{K_j} OCC_{ij}}{K_j} = \frac{OCC_j}{K_j} \qquad (1)$$

where OCCij and Kj have already been defined, while OCCj represents, by construction, the total number of keywords (including duplications) selected by all authors of journal j.³

³ Suppose that a journal has only the following three author keywords: red, white, green. Then, suppose that the keyword red has an occurrence of 4 (i.e., there are four papers that use this keyword), white has an occurrence of 5, and green has an occurrence of 6. In these conditions OCCj is equal to 15 (4 + 5 + 6), and Sj = 5 (15/3), meaning that, on average, a keyword is repeated 5 times in the journal. The higher this number, the higher the number of papers in the journal on the same topic. To clarify further the concepts of OCCj and Sj, suppose that the journal now also has a fourth keyword, blue, and that this keyword has an occurrence of 1 (i.e., the keyword blue is present in only one paper). In these new conditions, OCCj is equal to 16 (4 + 5 + 6 + 1), while Sj is equal to 4 (16/4, where the denominator is the number of unique keywords, which passes from 3 to 4).
By expressing the index in this way, Sj can be immediately interpreted as the number of times that a generic keyword appears on average in the j-th journal.
The simplicity of the index and its immediate comprehensibility represent its primary properties.
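The worked example with the keywords red, white and green can be reproduced with a few lines of code. The sketch below is only illustrative: the function names are hypothetical, and it assumes that each keyword is counted at most once per article, in line with the reading of OCCij as a number of papers.

```python
from collections import Counter


def keyword_occurrences(articles):
    """Count, for each unique keyword, the number of articles that list it.

    `articles` is an iterable of keyword lists, one list per article.
    A keyword is counted at most once per article.
    """
    occurrences = Counter()
    for keywords in articles:
        occurrences.update(set(keywords))
    return occurrences


def specialization_index(occurrences):
    """Return S_j = OCC_j / K_j for a mapping keyword -> OCC_ij."""
    k_j = len(occurrences)             # number of unique keywords
    if k_j == 0:
        raise ValueError("the journal has no keywords")
    occ_j = sum(occurrences.values())  # total keywords, duplicates included
    return occ_j / k_j


# Toy journal from the worked example: red x4, white x5, green x6.
toy = {"red": 4, "white": 5, "green": 6}
print(specialization_index(toy))                 # 5.0  (15 / 3)
print(specialization_index({**toy, "blue": 1}))  # 4.0  (16 / 4)
```

Adding a keyword that appears in a single paper lowers the index, because the denominator grows faster than the numerator.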
However, as discussed in the remainder of the paper, the level of Sj may be affected by a special link between OCCj and Kj. Basically, when one compares the number of unique keywords appearing in journal j (Kj) with the total number (including duplicates) of keywords selected by authors of journal j (OCCj), it is reasonable to expect that OCCj increases in Kj. This relationship can be explained by probability theory. In particular, assume that selecting a keyword is a mechanism analogous to drawing a card from a deck of W cards with replacement, where W is the (unknown⁴) size of the vocabulary from which researchers choose their keywords and where any drawing is independent of the others. Under these circumstances, the number of times a researcher selects a keyword (i.e., takes a card from the deck) already chosen by another researcher should theoretically be approximated by the binomial distribution X ~ B(n, p) with parameters n ∈ ℕ and p ∈ [0,1], where the probability of success (drawing a keyword that has already been selected) is positively related to the number of trials. In other words, the positive link between the number of keywords in a journal (Kj) and the number of times a keyword may be selected more than once in that journal (OCCij) arises because, with a larger number of keywords selected, there is a higher chance that two or more researchers will select the same keyword by chance.⁵ Consequently, since journals with a higher number of keywords (Kj) tend to show a higher number of occurrences (OCCj), these journals may show a higher value of the specialization index Sj. In other words, if we calculate the Sj of two journals with a very different number of keywords, the Sj of the journal with the higher number of keywords tends to be overestimated.
The positive linkage between Kj and OCCj then makes a correction of Sj necessary for those cases where Kj is higher, to avoid overestimating Sj.
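The card-drawing analogy can be explored with a small simulation. The sketch below is a simplification under stated assumptions: keywords are drawn uniformly and independently from a vocabulary whose size (5,000 here) is chosen arbitrarily, and the function name is hypothetical. Real keyword choices are neither uniform nor independent.

```python
import random
from collections import Counter


def simulate_sj(n_draws, vocabulary_size, seed=0):
    """Draw `n_draws` keywords uniformly, with replacement, from a vocabulary
    of `vocabulary_size` terms and return (K, OCC, S) for the simulated journal."""
    rng = random.Random(seed)
    draws = Counter(rng.randrange(vocabulary_size) for _ in range(n_draws))
    k = len(draws)   # unique keywords actually drawn
    occ = n_draws    # total keywords, duplicates included
    return k, occ, occ / k


# W = 5000 is an arbitrary illustrative vocabulary size.
for n in (100, 1_000, 10_000, 50_000):
    k, occ, s = simulate_sj(n, vocabulary_size=5_000)
    print(f"draws={n:>6}  K={k:>5}  OCC={occ:>6}  S={s:.2f}")
```

Even though no simulated journal is more "specialized" than another, the ratio OCC/K rises with the number of draws, which is the size effect that motivates a correction.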
Thus, we propose an adjusted version of Sj, Sj-adj (equation [2]), which normalizes Sj using the natural logarithm of Kj. Using the logarithm represents an attempt to reduce the impact of outliers and extreme values that may skew the analysis results. While the correction proposed by [2] should mitigate the bias of the K-OCC linkage, it has the drawback of losing the immediate meaning of Sj (i.e., the number of times a keyword is on average repeated in a journal) and is only one of the possible corrections. In practice, one could also skip such a correction if the Sj calculation is limited to a set of journals with approximately the same number of keywords.

The dataset
The data used to calculate the journal specialization measure proposed in this study (Sj) were based on keyword frequencies obtained from 50 journals selected from the Scopus source title list (updated in March 2023). More specifically, we selected, in alphabetical order, 50 active journals from the Scopus journal list containing a minimum number of unique keywords.⁶ For each journal, we retrieved all its articles from the Scopus database. The maximum threshold of 20,000 documents per export imposed by Scopus did not pose a problem, since none of the 50 journals has more than 20,000 documents to export. Each journal is classified according to the Scopus All Science Journal Classification, ASJC (Franceschini et al., 2016). The keywords were collected through the function available in VOSviewer (version 1.6.18), a freely available bibliometric software explicitly developed to create and display bibliometric maps (Eck & Waltman, 2010, 2014). This function permits the collection of unique keywords and their occurrences at the article level. By keywords, this study means only author keywords, i.e., the terms selected by authors to accurately represent the content of their document. The indexed keywords (keywords selected and standardized by Scopus for indexing purposes using vocabularies derived from Elsevier-owned or licenced thesauri) are excluded from the analysis, even though they represent a separate vector of data that deserves further investigation in future research.⁷ The number of keywords and occurrences of each journal and some additional details are reported in Table 1. We collected 119,775 unique keywords⁸ in total. At a single journal level, the number of keywords varied from a minimum of 43 (for the journal AAO Journal) to a maximum of 13,112 (for the journal Accident Analysis and Prevention).
Using a data cleaning function able to merge different keyword variants into unique forms (a VOSviewer thesaurus file), the keywords collected in this study were filtered to delete keywords that are too generic or ambiguous to specify in detail the content of an article. The reasons for the ambiguity range from the extreme generality of keywords (e.g., genetics, health, chemistry) to an incorrect specification of the article content (e.g., case report, note) up to confounding terms (e.g., united states, adult, procedures).⁹ In making this correction, it is necessary to clarify that it is not meant to label these keywords as inappropriate. Rather, the purpose of the correction is to highlight the importance of correcting keywords during estimation exercises to prevent overestimation of the specialization index. Through the function provided by Bibliometrix (Aria & Cuccurullo, 2017), we also counted the average number of coauthors per document for each journal (see NC in Table 1). The number of papers retrieved from Scopus (NPj) was calculated with Bibliometrix; it differed across journals, ranging from 111 (for A e C-Revista de Direito Administrativo e Constitucional and for AAO Journal) to 14,803 (for Academic Medicine). The period of collection also differed: the shortest time span was 4 years, from 2019 to 2022 (for A e C-Revista de Direito Administrativo e Constitucional), and the longest was 1922-2023 (for Abhandlungen aus dem Mathematischen Seminar der Universitat Hamburg).

Table 1 Keywords and occurrences of journals analysed.

⁶ The minimum number has been arbitrarily set at 40. It should be noted that not all journals have retrievable keywords. In some cases, there may be a lack of author keywords, index keywords, or both. Since the topic of missing keywords remains unexplored, it could be a worthwhile area for future research.
⁷ More specifically, index keywords are keywords not entered by authors but assigned by the reference database (e.g., Scopus or WoS). Scopus, for example, has index keywords that use a controlled vocabulary to describe the contents of a study, such as MeSH (medical subject headings), Emtree (life sciences & health science), or Compendex (engineering).
⁸ In this study the term "unique" refers to the single journal j and not to the set of 50 journals, since it is not possible to exclude the possibility that a unique keyword in one journal also appears in another journal.
⁹ The list of the keywords qualified as ambiguous and then deleted from our data is the following: covid-19, Covid19, COVID/19, covid_19, adult, aged, animal, animals, article, case report, chemistry, clinical article, clinical trial, conference paper, controlled study, diagnosis, diseases, editorial, female, genetics, health, human, humans, letter, major clinical study, male, methodology, nonhuman, note, priority journal, procedures, review, short survey, therapy, united kingdom, united states, europe. We also removed the keyword "covid-19" or similar terms (i.e., Covid19, COVID/19, covid_19) due to the massive impact of the COVID-19 pandemic on scientific production (Riccaboni and Verginer, 2022) and the related possible bias in keyword distribution.
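The cleaning step described in this section (merging variants, then dropping ambiguous terms) can be sketched as follows. This is not the VOSviewer procedure itself: the `THESAURUS` mapping is a hypothetical stand-in for a thesaurus file, `AMBIGUOUS` holds only a subset of the terms listed in footnote 9, and the sample articles are invented.

```python
import re
from collections import Counter

# Subset of the ambiguous terms listed in footnote 9 (not the full list).
AMBIGUOUS = {"adult", "article", "case report", "health", "human", "humans",
             "review", "united states", "covid-19"}

# Hypothetical variant -> preferred form mapping, standing in for a thesaurus.
THESAURUS = {"covid19": "covid-19", "covid/19": "covid-19", "covid_19": "covid-19"}


def normalize(keyword):
    """Lower-case, trim, collapse inner whitespace, then apply the thesaurus."""
    cleaned = re.sub(r"\s+", " ", keyword.strip().lower())
    return THESAURUS.get(cleaned, cleaned)


def clean_occurrences(articles):
    """Count keyword occurrences after merging variants and dropping ambiguous terms."""
    occurrences = Counter()
    for keywords in articles:
        kept = {normalize(k) for k in keywords} - AMBIGUOUS
        occurrences.update(kept)
    return occurrences


# Invented author keywords for three articles.
articles = [
    ["Road Safety", "COVID19", "Human"],
    ["road  safety", "Crash Severity"],
    ["Review", "crash severity", "Covid-19"],
]
print(clean_occurrences(articles))
```

Variants are merged before the filter is applied, so a term such as "COVID19" is first mapped to "covid-19" and then removed together with it.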

Keyword dynamics across journals
This section analyses the dynamics of the number of occurrences of keywords in a journal. At first glance, the occurrence of keywords (OCCij, the number of times the i-th keyword appears in the j-th journal) is a function of the number of keywords. Specifically, by using the analogy of a deck of cards (i.e., an actual finite population from which samples can be randomly drawn) and considering that the topics of articles published in a volume of a journal should be random, we can approximate the choice of a keyword as a card taken from a deck of Wj cards with replacement (where Wj is the unknown, large number of possible keywords), while an occurrence can be approximated as a success in selecting a keyword already extracted by another author. Under these circumstances, the number of occurrences is increasing in the number of keywords, because, when drawing with replacement, the chance of re-drawing an already selected keyword grows with the number of trials.
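The analogy can be made slightly more explicit with a short derivation. It rests on an assumption that goes beyond the text: the n keyword selections of a journal are treated as independent, uniform draws with replacement from a vocabulary of W keywords, so that OCC = n. The last line compares OCC with the expected number of unique keywords, which is only an approximation of the expected ratio.

```latex
% Assumption (for illustration only): n independent, uniform draws with
% replacement from a vocabulary of W keywords, so that OCC = n.
\begin{align*}
\Pr(\text{a given keyword is never drawn}) &= \left(1 - \tfrac{1}{W}\right)^{n}, \\
\mathbb{E}[K] &= W\left[1 - \left(1 - \tfrac{1}{W}\right)^{n}\right], \\
\frac{OCC}{\mathbb{E}[K]} &= \frac{n}{W\left[1 - \left(1 - \tfrac{1}{W}\right)^{n}\right]}
  \approx 1 + \frac{n-1}{2W} \quad (n \ll W).
\end{align*}
```

Under these assumptions the ratio increases with n: it stays close to 1 while n is small relative to W and grows roughly like n/W once most of the vocabulary has been drawn.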
The positive linkage between the number of occurrences and the number of keywords is empirically tested in Fig. 1. The special connection between keywords and occurrences is observed in all journals, independent of the ASJC macrodiscipline. Since the number of keywords seems, at this point, to reasonably act as a factor that affects the number of occurrences, one wonders what the main factors are that determine the number of keywords appearing in a set of papers. The first possible answer is to connect the number of keywords to the size of the set of papers analysed, since one should expect that a high number of papers analysed corresponds to a high number of topics investigated and hence a high number of keywords. As shown in Fig. 2a, this relationship is observed in our sample.
However, analysing the dynamics of keywords is more complex, and it seems to represent a still underexplored subject. For example, the keyword dynamics should be a function not only of the number of collected papers but also of the journal submission settings, to the extent that some journals may require a minimum and/or a maximum number of keywords.
An additional factor that could explain a high number of keywords for a single paper is the number of article pages, as the number of pages could reasonably be correlated with the article's depth (and hence with the number of keywords selected). Not being able to directly correlate the number of pages with the number of keywords for each article, this study considered the average number of authors for each journal, since a relationship between article length and the number of authors was found, albeit weak and limited to a single discipline (Papatheodorou et al., 2008). Fig. 2b supports this idea and suggests considering this relationship in future research on keyword dynamics across academic journals.

Results
This section shows the results of calculating the level of concentration of keywords in a set of articles of journal j, with the Sj index exhibited in [1] that we use as a proxy of journal specialization.
Recalling that the Sj index indicates the number of times a keyword appears on average in the j-th journal, the value of the Sj index for each of the 50 journals included in the sample is shown in Table 2. The highest level of Sj is 4.19, which is attributed to Academic Medicine, while the lowest level (Sj = 1) is related to AAO Journal.
Since a pattern of dependence of keywords on the number of papers was observed and discussed in the previous section, the journals with the highest Sj are, not surprisingly, almost coincident with the journals with the highest number of papers collected.
In particular, if we qualify as high-frequency journals (hfj) the journals with a number of collected papers above the 75th percentile of NPj (equal to 1850), we observe that 7 out of the 10 journals with the highest Sj are in fact hfj. Moreover, the correlation between Sj and NPj is 0.74 (statistically significant at the 1% level), indicating a positive relationship that can be graphically observed in Fig. 3a.
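The two descriptive steps above (flagging high-frequency journals and correlating Sj with NPj) can be sketched as follows. The numbers are hypothetical and are not taken from Table 2. The percentile is computed with the "inclusive" method of Python's `statistics.quantiles`, which is an assumption, since the text does not state which percentile definition was used.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def high_frequency_flags(n_papers):
    """Flag journals whose paper count exceeds the 75th percentile of the sample."""
    q3 = statistics.quantiles(n_papers, n=4, method="inclusive")[2]
    return q3, [n > q3 for n in n_papers]


# Hypothetical values for five journals (not taken from Table 2).
np_j = [100, 200, 300, 400, 500]
s_j = [1.1, 1.3, 1.2, 1.8, 2.4]

q3, hfj = high_frequency_flags(np_j)
print(q3, hfj)  # 400.0 [False, False, False, False, True]
print(round(pearson(np_j, s_j), 2))
```

A significance test for the coefficient is left out of the sketch; it would require an additional statistical routine.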
The positive relationship between the number of papers and the specialization index also emerges if we observe the dynamics of Sj inside the same journal. Specifically, if we analyse the Sj of a single journal, we observe a tendency of the Sj index of that journal to be positively correlated with the number of papers in the sample from which the keywords are collected. Figure 3b reports the case of the two journals 2D Materials and 3 Biotech (the first and the second journals of the sample in alphabetical order), and it is possible to observe that Sj is also positively correlated with the number of papers in this case. The positive relationship between the index of specialization of a journal and the number of articles retrieved to collect the keywords raises concerns about the appropriateness of comparing the Sj index among journals with article collections of different sizes.
One possible solution could be to limit the comparison of any keyword-based metric, such as Sj, only to journals for which the same amount of research output (i.e., NP j ) was collected.
An alternative approach could be to normalize the Sj index for the natural logarithm of Kj, as shown by equation [2].
The adjusted version of the keyword density level, Sj-adj and the relative rank were then calculated for each journal, and the results are presented in columns 5 and 6 of Table 2, respectively.
The adoption of the correction changes the rank of 40 out of 50 journals, while the other 10 (20.0% of the sample) remain unaltered, including the journal positioned in the top rank (i.e., Academic Medicine). However, the amplitude of the change in ranks after the adoption of the Sj-adj measure is minimal. Specifically, 64% of journals show a change of at most ± 3 ranks, while approximately 16% of journals show a change of more than 3 positions with respect to the Sj-based ranking. A large change is observed for 3 Biotech, which dropped 7 positions in the ranking. Taken together, these findings suggest that using Sj-adj does not radically alter the previous Sj-based results, as is also revealed by Fig. 4, which compares the rank of each journal according to the two measures discussed above.
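A comparison of this kind can be sketched with a small rank-shift computation. The scores below are hypothetical and are not the values of Table 2. The function names are illustrative, and the adjusted scores are simply given as input rather than derived from a formula.

```python
def ranks(scores):
    """Rank journals by descending score (1 = most specialized); ties keep input order."""
    order = sorted(scores, key=lambda name: -scores[name])
    return {name: position for position, name in enumerate(order, start=1)}


def rank_shifts(base, adjusted):
    """Return, per journal, rank under `adjusted` minus rank under `base`."""
    r_base, r_adj = ranks(base), ranks(adjusted)
    return {name: r_adj[name] - r_base[name] for name in base}


# Hypothetical scores for four journals; not the values of Table 2.
s = {"A": 4.2, "B": 2.0, "C": 1.5, "D": 1.0}
s_adj = {"A": 0.45, "B": 0.25, "C": 0.30, "D": 0.20}

shifts = rank_shifts(s, s_adj)
unchanged = sum(1 for d in shifts.values() if d == 0) / len(shifts)
print(shifts)     # {'A': 0, 'B': 1, 'C': -1, 'D': 0}
print(unchanged)  # 0.5
```

A positive shift means that a journal moves down the ranking after the adjustment. The shares of journals within ± 3 ranks or beyond can be obtained from `shifts` in the same way as `unchanged`.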

Conclusions and limitation
The number of academic journal titles has grown progressively during the last century, and the scholarly publication system is experiencing new growth because of Internet technology and open access options (Gu & Blackmore, 2016).This study introduces a new methodology for constructing an index of journal specialization.Based on the collection of keywords appearing in a predefined collection of articles published by a journal, the study proposes to measure journal specialization through the number of occurrences of keywords appearing in a journal.In this section, we discuss the strengths and limitations of the proposed methodology.Important strengths are the extreme simplicity of the methodology and its immediate comprehensibility, as the Sj index could be interpreted as the number of times that a keyword appears on average among the articles published by a specific journal.The higher this number is, the higher the fraction of papers about a specific research subject.The underlying assumption is that the keywords are chosen carefully by researchers to adequately represent the content of manuscripts and to be specific to a field or subfield of research.
The methodology is fully documented in this paper and accompanied by an analysis of keyword dynamics that, to our knowledge, has not been explored before. To determine journal specialization, the methodology relies exclusively on unique keywords and occurrences that can be easily retrieved through VOSviewer (Eck & Waltman, 2010). The data observed for our sample of 50 journals suggest considering the number of papers (NPj) collected for the jth journal as a crucial variable, since the number of keywords Kj depends on the amount of research analysed. Furthermore, Kj positively affects the number of occurrences OCCj, which in turn affects the Sj index itself (since it is calculated as OCCj/Kj).
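As a minimal sketch of the ratio OCCj/Kj, the following example computes the index from a small, hypothetical list of article keywords. It assumes that a keyword is counted at most once per article; the function name and the sample keywords are illustrative and are not part of the original analysis.

```python
from collections import Counter


def specialization_index(articles_keywords):
    """Return (K_j, OCC_j, S_j) for one journal.

    articles_keywords: a list with one list of keywords per article.
    Assumption: each keyword is counted at most once per article.
    """
    counts = Counter()
    for keywords in articles_keywords:
        counts.update(set(keywords))  # set() removes repeats within an article
    k_j = len(counts)                 # number of unique keywords
    occ_j = sum(counts.values())      # total occurrences across articles
    s_j = occ_j / k_j if k_j else 0.0
    return k_j, occ_j, s_j


# Hypothetical journal with three articles.
articles = [
    ["interdisciplinarity", "bibliometrics"],
    ["bibliometrics", "keywords"],
    ["bibliometrics", "journal ranking"],
]
k_j, occ_j, s_j = specialization_index(articles)
```

Here four unique keywords account for six occurrences, so each keyword appears 1.5 times on average; a journal whose articles shared more keywords would obtain a higher value.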
To avoid any overestimation of the Sj index, the study proposes an adjustment of Sj for the number of keywords. Overall, the corrected version of the index does not substantially change the results, to the extent that more than 80% of the journals collected do not show a significant variation in specialization ranks. Nevertheless, the findings of this study must be seen in light of some limitations. First, the Sj index does not allow one to conclude definitively, without caution, that a journal is specialized in a specific discipline. The index only indicates the degree of overlap among articles' contents, assuming that they are reasonably approximated by the keywords. From an abstract perspective, a journal could show a high level of Sj while its articles cover different disciplines. As a basic example involving recent or structural events, a journal could have published many papers with the keyword "COVID-19" or "climate" even though their contents cover various research fields (medicine, economics, engineering, sport science, etc.).
A second limitation is that the Sj index could be strongly affected by an underestimation of the number of occurrences OCCj, as the matching of keywords could fail because two authors can write the same keyword in different ways. The possible cases are countless, as two similar keywords may fail to match because of the use of the singular or the plural, different uses of special characters (e.g., fibrillin 1 rather than fibrillin-1), or synonyms.
Third, to draw conclusions about the reliability of the Sj index, one should consider how it is affected by the size and quality of the set of keywords retrieved, which may depend on many factors, such as (i) the attitude of the researchers towards inserting detailed article-specific keywords, (ii) the choice of the journal to impose a minimum (and/or a maximum) number of keywords per article (Golub et al., 2020) and (iii) the lack of a standard for the length of keywords.
The ranking's high sensitivity to multiple factors highlights that the purpose of this work is not to determine the ranking of specialized journals but rather to contribute to the ongoing discussion on specialization and interdisciplinarity metrics. Importantly, the sample size used in this study is arbitrary and limited to only 50 journals out of a total of approximately 44,000 journals listed in Scopus as of March 2023 (active and inactive journals). This study could also help orient future research that intends to use keywords (and lemmatization of keywords) to enhance the journal specialization debate. For example, an issue that needs to be addressed in future works is the introduction of measures that also consider the characterization of the distribution of keywords (e.g., skewness and kurtosis). In this regard, there is room for improvement in the approach proposed here by adapting metrics such as the Herfindahl-Hirschman index (HHI) to the use of keywords and occurrences.
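A minimal sketch of two points raised above is given here: a simple normalization that matches keyword variants differing only in case, hyphens or spacing, and one possible adaptation of the HHI to keyword occurrences. Both functions are assumptions introduced for illustration and are not part of the methodology of this study. The normalization does not handle singular/plural forms or synonyms, and the HHI is computed here as the sum of squared occurrence shares.

```python
import re
from collections import Counter


def normalize_keyword(keyword):
    """Lowercase and map runs of hyphens, underscores or spaces to one space."""
    lowered = keyword.strip().lower()
    return re.sub(r"[\s\-_]+", " ", lowered)


def keyword_hhi(keywords):
    """Sum of squared occurrence shares of the normalized keywords.

    Values close to 1 indicate that a few keywords dominate; values close
    to 1/K indicate that occurrences are spread evenly over K keywords.
    """
    counts = Counter(normalize_keyword(k) for k in keywords)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((n / total) ** 2 for n in counts.values())


# Hypothetical keyword list in which two spellings denote the same concept.
keywords = ["Fibrillin-1", "fibrillin 1", "Marfan syndrome", "aorta"]
hhi = keyword_hhi(keywords)
```

With the normalization step, the two spellings of fibrillin-1 are counted as one keyword with two occurrences, which raises the measured concentration compared with treating them as distinct keywords.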
Funding Open access funding provided by Università Parthenope di Napoli within the CRUI-CARE Agreement.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Fig. 1 Keyword and occurrences relationship. This figure displays the connection between the number of unique keywords and their occurrences among the 50 journals constituting the sample: subpicture a) reports the case of all journals; subpicture b) groups journals according to the macrodiscipline categorization reported by the Scopus All Science Journal Classification (ASJC). The grey area indicates the 95% confidence interval for the predicted values.

Fig. 2 Keywords, number of papers, and number of coauthors. Subpicture a) reports the scatter plot of the connection between the number of unique keywords and the average number of papers collected. The long-dashed line indicates the border of the area containing the 95% confidence interval for the predicted values. Subpicture b) reports the scatter plot between the number of unique keywords and the average number of coauthors per document; each circle represents a journal, while the grey area indicates the 95% confidence interval for the predicted values.

Fig. 3 Index of specialization and number of papers retrieved

Fig. 4 This figure reports in graphic form the values reported in Table 2. Note: the left dot graph, reported in subpicture a), shows the Sj and Sj-adj values, while the right scatter plot, reported in subpicture b), shows the correlation between the ranks obtained with Sj and Sj-adj for each journal.

Table 1 Keywords and occurrences of journals analysed
This table reports the number of keywords (variable Kj, column 3) retrieved from Scopus for the 50 journals listed in column 1. The number of articles collected for each journal is indicated in column 2 by NP. The variable OCC (column 4) is the total number of occurrences of keywords appearing in the journal. The variable NC, reported in column 5, indicates the average number of coauthors per document. Column 6 reports the time span of the articles collected for each journal. Data were retrieved from Scopus on May 1st, 2023.

Table 2 Journal ranking by specialization
Sj is the number of times that a keyword appears on average in the jth journal. Sj-adj is a version of Sj corrected for the number of keywords. Rank Sj and Rank Sj-adj represent the position of a journal in the ranking order for Sj and Sj-adj, respectively. Δ Rank is the difference between Rank Sj and Rank Sj-adj. Column (2) marks high-frequency journals (hfj) and low-frequency journals (lfj): journals indicated with (hfj) have a number of articles equal to or higher than 1850 (the 75th percentile of column 2 of Table 1); journals indicated with (lfj) have fewer than 1850 articles.