
1 Introduction

The open access landscape keeps growing, making it harder and harder to choose a suitable journal for publishing research findings. The number of journals keeps increasing: the Directory of Open Access Journals (DOAJ), an online directory of peer-reviewed open access journals, added over 5,000 journals in the last three years. All of these journals offer a variety of publishing conditions, including different peer-review schemes, publication costs and waivers, and copyright and rights retention clauses. At the same time, support for researchers in open access publishing is also growing: recent years have brought various agreements between academic institutions and publishers that settle publication costs and conditions, and scientific libraries increasingly offer support services. Moreover, more funding agencies expect scientists to use open access options and specify clear conditions on how funded work is to be published. All of this adds to the overall workload of researchers: they have to assess the open access landscape while taking into account factors such as newly established journals, predatory publishing schemes, quality measures and, finally, individual publication costs.

In this paper, we present B!SON, a web-based recommendation system which aims to alleviate these problems. Open access journals are recommended based on content similarity (using title, abstract and keywords) and the cited references. User requirements were systematically collected in a survey [9], focusing primarily on researchers, but also addressing libraries, publishers and editors of scholar-led journals. The findings from this survey were discussed in depth with a user community and incorporated into the system specification accordingly. The quality of B!SON's recommendations is evaluated on a large test set of 10,000 articles. The rest of the paper describes the B!SON prototype by first reviewing existing work on scientific recommendation (Sect. 2), then describing the B!SON service, its data sources, algorithm and functionality (Sect. 3), and concluding in Sect. 4.

2 Related Work

Scientific recommendation tasks span the search for potential collaborators [17] and reviewers [18], as well as for papers to read [1] and to cite [2, 5]. With ever more ways of publishing scientific articles, the recommendation of scientific publication outlets (journals/conferences) is a task of growing importance (e.g. [22, 25]).

Prototypical approaches explore diverse data sources to provide recommendations. A major source of information is the article to be published: the text's title, abstract or keywords are compared against papers that previously appeared in an outlet [14, 25]. Other systems exploit the literature cited by the article and try to determine the best publication venue using bibliometric measures [7, 20]. An alternative stream of research focuses more on the article's authors, exploring their publication history [24] and co-publication networks [16, 23].

While there are a number of active journal recommender sites, they all come with limitations. Several publishers offer services limited to their own journals, like Elsevier's Journalfinder [10] or Springer's Journal suggester. Others, like Journal Guide, are closed-source and do not provide transparent information on their recommendation approach. Several collect user and usage data, e.g. Web of Science's manuscript matcher. Notably, some open recommenders exist, e.g. Open Journal Matcher, Pubmender [4] and Jane [21], the latter two being limited to medical journals. All of these open services provide little information on journals, are limited to abstract input and do not offer advanced filter options.

3 B!SON – The Open Access Journal Recommender

B!SON is the abbreviation for Bibliometric and Semantic Open Access Recommender Network. It combines several available data sources (see Sect. 3.1 for details) to provide authors and publication support services at libraries with recommendations of suitable open access journals, based on the entered title, abstract and reference list of the paper to be published.

3.1 Data Sources

The B!SON service is built on top of several open data sources with strong reputation in the open access community:

DOAJ: The Directory of Open Access Journals (DOAJ) collects information on open access journals which fulfill a set of quality criteria (available full text, dedicated article URLs, at least one ISSN, etc.). The dataset includes basic information on the journal itself, but also metadata of the published articles (title, abstract, year, DOI, ISSN of journal, etc.). The DOAJ currently contains 17,669 journals and 7,489,975 articles. The data is available for download in JSON format, under CC0 for article data and CC BY-SA for journal data [3].

OpenCitations: The OpenCitations initiative maintains, amongst other datasets, the CC0-licensed COCI dataset of citation data. It is based on Crossref data and contains 72,268,850 publications and 1,294,283,603 citations [15]. The information is available in the form of DOI-to-DOI relations and covers 44% of the citations in Scopus and 51% of the citations in Dimensions [13]. COCI thus lacks citations in comparison to commercial products, but it can be used to check which articles published in DOAJ journals cite the references given by the user (details in Sect. 3.2). As open access journals are incentivized to submit their articles' metadata, we can assume that COCI's coverage of these journals is better.

Journal Checker Tool: The cOAlition S initiative (a group of funding agencies that agreed on a set of principles for the transition to open access) provides the Journal Checker Tool. A user can enter a journal's ISSN, a funder and an institution to check whether the journal is open access according to Plan S, whether it is a transformative journal or has a transformative agreement with the user's institution, or whether there is a self-archiving option [8]. An API allows this information to be fetched automatically. Since B!SON does not retrieve data on funder or institution, we use the funder information of the European Commission to check whether a journal is Plan-S compliant.
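As an illustration of how such a lookup could work, the following sketch queries the Journal Checker Tool API for the European Commission funder. Note that the endpoint URL, parameter names and response shape shown here are assumptions for illustration, not the documented interface:

```python
# Sketch of a Plan-S compliance check via the Journal Checker Tool API.
# NOTE: endpoint, parameters and response shape are assumptions for
# illustration only; consult the JCT API documentation for the real interface.
import requests

JCT_API = "https://api.journalcheckertool.org/calculate"  # assumed endpoint
EC_FUNDER = "European Commission Horizon Europe Framework Programme"

def is_plan_s_compliant(issn: str) -> bool:
    """Report whether any compliance route exists for the given journal."""
    response = requests.get(JCT_API, params={"issn": issn, "funder": EC_FUNDER},
                            timeout=10)
    response.raise_for_status()
    results = response.json().get("results", [])  # assumed response shape
    return any(route.get("compliant") == "yes" for route in results)
```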

Other data sources: There are many other projects whose data might be used in the future to supplement B!SON's current datasets. Crossref would allow us to extend the DOAJ's article data, which are occasionally incomplete. OpenAlex (by OurResearch) could add, for example, author information.

3.2 Technology

B!SON consists of a Django backend and a Vue.js frontend.

Table 1. Top@N accuracy for the different search methods.

Data Integration: PostgreSQL and Elasticsearch are used as databases. Data from the DOAJ and OpenCitations' COCI index are bulk downloaded and inserted into PostgreSQL and Elasticsearch. The information on Plan-S compliance stems from the Journal Checker Tool and is fetched from its API using "European Commission Horizon Europe Framework Programme" as the funder.
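To illustrate the ingest step, the following sketch bulk-loads DOAJ article metadata into Elasticsearch with the official Python client. The index name, document fields and dump structure are illustrative assumptions, not B!SON's actual schema:

```python
# Sketch: bulk-index DOAJ article metadata into Elasticsearch.
# Index name, field layout and dump structure are illustrative assumptions.
import json
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch("http://localhost:9200")

def doaj_actions(dump_path: str):
    """Yield one indexing action per article in a DOAJ JSON dump file."""
    with open(dump_path) as f:
        for record in json.load(f):       # assumes each file holds a JSON array
            bib = record.get("bibjson", {})
            yield {
                "_index": "doaj_articles",
                "_id": record["id"],
                "_source": {
                    "title": bib.get("title", ""),
                    "abstract": bib.get("abstract", ""),
                    "keywords": bib.get("keywords", []),
                    "issns": [i["id"] for i in bib.get("identifier", [])
                              if i.get("type") in ("pissn", "eissn")],
                },
            }

successes, _ = bulk(es, doaj_actions("doaj_article_batch.json"))
print(f"Indexed {successes} articles")
```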

All developed software will be published open source in the upcoming weeks.

Recommendation: The recommendation is based on similarity measures with regard to the entered text data (title, abstract and keywords) and reference list.

Text similarity: Elasticsearch has built-in functionality for text similarity search based on the Okapi BM25 algorithm [19]. This is used to determine those articles already indexed from the DOAJ which are similar to the entered information. Stop word removal is performed as a pre-processing step; as the DOAJ contains articles in several languages, we combine the stop word lists from Apache Lucene for this purpose.
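One way to issue such a similarity query with the Elasticsearch Python client is a multi_match over the manuscript's title, abstract and keywords. The field names follow the indexing sketch above, and the title boost is our assumption, not B!SON's tuned configuration:

```python
# Sketch: BM25-based retrieval of DOAJ articles similar to the manuscript.
# Elasticsearch scores multi_match queries with BM25 by default.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def similar_articles(title: str, abstract: str, keywords: list[str],
                     size: int = 100):
    query = {
        "multi_match": {
            "query": " ".join([title, abstract, *keywords]),
            "fields": ["title^2", "abstract", "keywords"],  # illustrative boost
        }
    }
    hits = es.search(index="doaj_articles", query=query, size=size)["hits"]["hits"]
    # Each hit carries a BM25 score and the journal ISSNs for later aggregation.
    return [(hit["_id"], hit["_score"], hit["_source"]["issns"]) for hit in hits]
```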

Bibliographic coupling: In addition to textual data, the user can enter the list of cited articles, allowing journals to be matched via bibliographic coupling [11]. For this, we extract the DOIs from the input list using regular expressions, then rely on OpenCitations' COCI index to find existing articles citing the same sources. The current solution is in a prototypical state: the number of matching citations is divided by the highest number of matching citations among the compared articles; if the normalized value is higher than a threshold (currently defined manually), the article is considered similar. We are working on integrating more sophisticated normalisation methods (e.g. [12]) and exploring options for defining the threshold value dynamically.
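A minimal sketch of this coupling step, assuming the candidates' cited DOIs have already been looked up in the local COCI tables (the threshold value is illustrative):

```python
# Sketch of the bibliographic-coupling similarity described above.
import re

# Common pattern for modern Crossref DOIs.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')

def extract_dois(reference_text: str) -> set[str]:
    """Pull DOIs out of a free-form reference list via regular expressions."""
    return set(DOI_PATTERN.findall(reference_text))

def coupling_scores(user_refs: set[str],
                    candidate_refs: dict[str, set[str]],
                    threshold: float = 0.2) -> dict[str, float]:
    """Normalised overlap of cited DOIs between manuscript and candidates.

    candidate_refs maps a candidate article's DOI to the set of DOIs it
    cites (as found in COCI); threshold is the manually chosen cut-off.
    """
    overlaps = {doi: len(user_refs & refs) for doi, refs in candidate_refs.items()}
    max_overlap = max(overlaps.values(), default=0)
    if max_overlap == 0:
        return {}
    return {doi: count / max_overlap for doi, count in overlaps.items()
            if count / max_overlap > threshold}
```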

Combination of text-based and bibliographic similarity: Similar articles are matched with their journal, and the total score per journal is calculated as the sum of the article scores. Refined aggregation methods are currently being explored and will be available soon.
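Under the current scheme, per-journal aggregation amounts to summing article-level scores; a rough sketch, with `journal_of` standing in for the article-to-journal mapping:

```python
# Sketch: aggregate article-level similarities into a per-journal ranking.
# The plain sum mirrors the current prototype, not a final weighting scheme.
from collections import defaultdict

def journal_ranking(text_hits: dict, coupling_hits: dict, journal_of: dict):
    """text_hits / coupling_hits map article IDs to scores;
    journal_of maps an article ID to its journal's ISSN."""
    totals = defaultdict(float)
    for hits in (text_hits, coupling_hits):
        for article, score in hits.items():
            totals[journal_of[article]] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```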

Recommendation Evaluation: The algorithm is evaluated on a separate test set of 10,000 DOAJ articles. To ensure realistic input data, all articles in the test set have a minimum abstract length of 100 characters and a minimum title length of 30 characters. As references are not part of the DOAJ data, the COCI index was used to fetch them via the article DOI; only articles with at least one reference were included. We assume that each article was published in a suitable journal to begin with, counting a positive result if the originating journal appears in the top n results of the recommendation. While this may not hold for each individual article, we rely on the assumption that a journal's overall scope is defined by the articles published in it. On this test set, the current recommendation algorithm reaches the top@N accuracy shown in Table 1.
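The evaluation protocol itself is compact: for each held-out article, check whether its originating journal appears among the top n recommendations. A sketch, with `recommend` standing in for the full pipeline:

```python
# Sketch of the top@N evaluation described above; recommend() is assumed
# to return a ranked list of journal ISSNs for the given inputs.
def top_at_n_accuracy(test_articles: list[dict], recommend, n: int) -> float:
    hits = 0
    for article in test_articles:
        ranked = recommend(article["title"], article["abstract"],
                           article["references"])
        if article["journal_issn"] in ranked[:n]:
            hits += 1
    return hits / len(test_articles)
```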

3.3 User Interface and Functionality

The current state of the B!SON prototype is available for testing. The user interface has been kept deliberately simple; a screenshot is shown in Fig. 1.

Fig. 1. Screenshot of the B!SON prototype with an exemplary query.

Data entry: The start page allows the user to enter title, abstract and references directly, or to fill them in automatically by fetching the information from Crossref via a DOI, so that open access publication venues can also be found based on previously published research.
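The DOI-based autofill can be approximated with Crossref's public REST API, as in the following sketch (not B!SON's actual implementation; note that Crossref abstracts are optional and JATS-encoded):

```python
# Sketch: prefill title, abstract and references from Crossref by DOI.
import requests

def fetch_metadata(doi: str) -> dict:
    response = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    response.raise_for_status()
    msg = response.json()["message"]
    return {
        "title": (msg.get("title") or [""])[0],
        "abstract": msg.get("abstract", ""),       # JATS XML when present
        "references": [ref["DOI"] for ref in msg.get("reference", [])
                       if "DOI" in ref],
    }

print(fetch_metadata("10.1038/s41586-020-2649-2"))  # e.g. the NumPy paper
```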

Result page: To inspect the search results, the user can choose between a simple list and a table which offers a structured account of additional details, enabling easy comparison of the journals. Article processing charges (APCs) are displayed based on the information available in the DOAJ and automatically converted to Euro if necessary.

Currently, the displayed similarity score is calculated as the simple sum of the Elasticsearch similarity score and the bibliometric similarity score. By clicking on the score field, the user can display a pop-over with explanatory information: B!SON then lists the articles previously published in that journal which the recommendation engine determined to be similar. Clicking on a journal name leads the user to a separate detail page with further information, including keywords, publishing charges, license, Plan-S compliance, and more.

Data export and transparency: Search results can be exported as CSV for further use and analysis, and a public API is available for programmatic access. It is also planned to provide the recommendation functionality in a form that can easily be integrated into and adapted to local library systems. For transparency on data sources, the date of the last data update is shown on the "About" page.

4 Conclusion

In this paper, we have presented a novel prototypical recommendation system for open access journals. The system combines semantic and bibliometric information to calculate a similarity score against journals' existing content, and provides the user with a ranked list of candidate venues.

The B!SON prototype is available online for beta-testing. Based on the community feedback received so far, we are working on further optimisations. This concerns, for instance, the computation of the similarity score, which is, to date, a simple addition of the semantic and bibliometric similarity results. More sophisticated aggregation methods will allow optimised weighting of both components and, in the longer term, better interpretability of the resulting score. Furthermore, we are exploring embedding-based text representations (used, e.g., by Pubmender [4]) and citation graph representations [6] to further improve the recommendation results. Several remarks also concern the user interface: the community's wishlist includes more sophisticated methods for exploring the result list (e.g. graph-based visualisations), extended filtering options and an improved representation of the similarity score.

Going beyond the scope of the B!SON project, it could be interesting to extend recommendations to other venues which offer open access publication, such as conferences. Moreover, the integration of person-centred information, such as prior publication history and frequent co-authors, seems promising.