B!SON stands for Bibliometric and Semantic Open Access Recommender Network. It combines several openly available data sources (see Sect. 3.1 for details) to provide authors and library publication support services with recommendations of suitable open access journals, based on the title, abstract and reference list of the paper to be published.
3.1 Data Sources
The B!SON service is built on top of several open data sources with a strong reputation in the open access community:
DOAJ: The Directory of Open Access Journals (DOAJ) collects information on open access journals that fulfill a set of quality criteria (available full text, dedicated article URLs, at least one ISSN, etc.). The dataset includes basic information on the journals themselves as well as metadata of the published articles (title, abstract, year, DOI, ISSN of the journal, etc.). The DOAJ currently contains 17,669 journals and 7,489,975 articles. The data is available for download in JSON format, under CC0 for article data and CC BY-SA for journal data.
OpenCitations: Among other datasets, the OpenCitations initiative provides COCI, a CC0-licensed citation index. It is based on Crossref data and contains 72,268,850 publications and 1,294,283,603 citations. The information is available in the form of DOI-to-DOI citation links, and it covers 44% of the citations in Scopus and 51% of the citations in Dimensions. Although COCI lacks citations in comparison to commercial products, it can be used to check which articles published in DOAJ journals cite the references given by the user (details in Sect. 3.2). As open access journals are incentivized to deposit their articles' metadata with Crossref, we can assume that COCI's coverage of these articles is better than the overall figures suggest.
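To illustrate how DOI-to-DOI relations support this lookup, the following Python sketch builds an inverted index from cited to citing DOIs and then collects all indexed articles that cite at least one of the user's references. It assumes a COCI-style CSV export with `citing` and `cited` columns; the function names and sample DOIs are hypothetical and not taken from the B!SON code base.

```python
import csv
import io
from collections import defaultdict


def build_citation_index(csv_file):
    """Map each cited DOI to the set of DOIs citing it (COCI-style CSV assumed)."""
    cited_by = defaultdict(set)
    for row in csv.DictReader(csv_file):
        # DOIs are case-insensitive, so they are lower-cased before indexing.
        citing = row["citing"].strip().lower()
        cited = row["cited"].strip().lower()
        cited_by[cited].add(citing)
    return cited_by


def citing_articles(cited_by, reference_dois):
    """Return the DOIs of indexed articles that cite any of the given references."""
    result = set()
    for doi in reference_dois:
        result |= cited_by.get(doi.lower(), set())
    return result


# Hypothetical sample data in the assumed column layout.
sample = io.StringIO(
    "citing,cited\n"
    "10.1000/a,10.1000/ref1\n"
    "10.1000/b,10.1000/ref1\n"
    "10.1000/b,10.1000/REF2\n"
)
index = build_citation_index(sample)
print(sorted(citing_articles(index, ["10.1000/ref2"])))  # ['10.1000/b']
```

An inverted index keyed by the cited DOI keeps the lookup proportional to the number of user references rather than to the size of COCI.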
Journal Checker Tool: The cOAlition S initiative (a group of funding agencies that agreed on a set of principles for the transition to open access) provides the Journal Checker Tool. Given a journal ISSN, a funder and an institution, the tool checks whether the journal is open access according to Plan S, whether it is a transformative journal or covered by a transformative agreement with the user's institution, and whether a self-archiving option exists. An API allows this information to be retrieved automatically. Since B!SON does not collect information on the user's funder or institution, we use the European Commission as the funder when checking whether a journal is Plan-S compliant.
Other data sources: Data from many other projects could supplement the currently used datasets in future versions of B!SON. Crossref, for example, would allow us to extend the DOAJ article metadata, which is occasionally incomplete, and OpenAlex (by OurResearch) could add, e.g., author information.
B!SON consists of a Django backend and a Vue.js frontend.
Data Integration: PostgreSQL and Elasticsearch are used as databases. Data from the DOAJ and the OpenCitations COCI index is bulk downloaded and inserted into PostgreSQL and Elasticsearch. The information on Plan-S compliance stems from the Journal Checker Tool and is fetched from its API using “European Commission Horizon Europe Framework Programme” as the funder.
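A minimal sketch of the Elasticsearch side of such a bulk import, assuming DOAJ article records with a `bibjson` object holding `title`, `abstract` and an `identifier` list (this field layout and the index name `articles` are assumptions). The generator produces action dictionaries in the format accepted by the `helpers.bulk` function of the official Python client; the client call itself is omitted so that the sketch runs without a cluster.

```python
def to_bulk_actions(records, index_name="articles"):
    """Turn DOAJ-style article records (assumed layout) into bulk index actions."""
    for record in records:
        bibjson = record.get("bibjson", {})
        identifiers = bibjson.get("identifier", [])
        # Take the first identifier of type "doi", if present.
        doi = next(
            (i["id"].lower() for i in identifiers if i.get("type", "").lower() == "doi"),
            None,
        )
        # Collect print and electronic ISSNs to link the article to its journal.
        issns = [i["id"] for i in identifiers if i.get("type", "").lower() in ("pissn", "eissn")]
        yield {
            "_index": index_name,
            "_id": record["id"],
            "_source": {
                "title": bibjson.get("title", ""),
                "abstract": bibjson.get("abstract", ""),
                "doi": doi,
                "issns": issns,
            },
        }


# Hypothetical record in the assumed layout.
sample_record = {
    "id": "abc123",
    "bibjson": {
        "title": "An example article",
        "abstract": "A short abstract.",
        "identifier": [
            {"type": "doi", "id": "10.1000/XYZ"},
            {"type": "eissn", "id": "1234-5678"},
        ],
    },
}
actions = list(to_bulk_actions([sample_record]))
print(actions[0]["_source"]["doi"])  # 10.1000/xyz
```

With a connected client, `elasticsearch.helpers.bulk(client, to_bulk_actions(records))` would send the actions in batches; using a generator avoids holding the full dump in memory.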
All developed software will be published open source in the upcoming weeks.
Recommendation: Recommendations are based on the similarity of the entered text data (title, abstract and keywords) and reference list to articles already published in DOAJ journals.
Text similarity: Elasticsearch has built-in support for text similarity search based on the Okapi BM25 algorithm. We use it to find those articles already indexed from the DOAJ that are similar to the entered information. Stop words are removed in a pre-processing step; as the DOAJ contains articles in several languages, we combine the stop word lists from Apache Lucene for this purpose.
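For readers unfamiliar with BM25, the following self-contained Python sketch computes classic Okapi BM25 scores with the usual default parameters k1 = 1.2 and b = 0.75 after removing stop words from a combined multilingual list. The stop word sets are tiny hypothetical excerpts, not Lucene's lists, and Elasticsearch's internal implementation differs in minor details, so absolute scores will not match those returned by the search engine.

```python
import math
import re
from collections import Counter

# Tiny illustrative excerpts of English, German and French stop word lists;
# the described system combines the full Apache Lucene lists instead.
STOP_WORDS = {"a", "and", "of", "the"} | {"der", "die", "und"} | {"et", "la", "le"}


def tokenize(text):
    """Lower-case the text, split it into word tokens and drop stop words."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score every document against the query with the classic Okapi BM25 formula."""
    if not documents:
        return []
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Length normalization penalizes documents longer than average.
            norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * freq * (k1 + 1) / (freq + norm)
        scores.append(score)
    return scores


articles = [
    "Open access journal recommendation",
    "Citation analysis of the physics journal landscape",
    "Deep learning for image segmentation",
]
scores = bm25_scores("recommendation of the open access journal", articles)
print([round(s, 3) for s in scores])  # the first article scores highest
```

Because terms that occur in many documents receive a low IDF, the shared word "journal" contributes little to the second article, while the first article matches several rarer terms.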
Bibliographic coupling: In addition to the textual data, the user can enter the list of cited articles, which allows journals to be matched based on bibliographic coupling. For this, we extract the DOIs from the input list using regular expressions and then rely on the OpenCitations COCI index to find existing articles citing the same sources. The current solution is a prototype: the number of matching citations of each article is divided by the highest number of matching citations among the compared articles; if this normalized value exceeds a threshold (currently defined manually), the article is considered similar. We are currently working on integrating more sophisticated normalization methods and exploring options for defining the threshold dynamically.
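The two steps described above, DOI extraction and max-normalized coupling with a fixed threshold, can be sketched in Python as follows. The regular expression is the pattern Crossref has published for matching modern DOIs, used here as an assumption about what the extraction looks like; the threshold of 0.5, the function names and the sample data are hypothetical, and the candidate reference sets stand in for data that would come from COCI.

```python
import re

# Crossref's commonly cited pattern for modern DOIs, matched case-insensitively.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_dois(reference_list):
    """Return the unique, lower-cased DOIs of a free-text reference list, in order."""
    # Trailing punctuation from the citation style is stripped heuristically.
    dois = (m.group(0).rstrip(".,;").lower() for m in DOI_PATTERN.finditer(reference_list))
    return list(dict.fromkeys(dois))


def coupling_scores(user_dois, candidate_references, threshold=0.5):
    """Max-normalized bibliographic coupling with a fixed (hypothetical) threshold.

    candidate_references maps an article DOI to the set of DOIs it cites.
    Each article's number of shared references is divided by the highest
    count among all candidates; only values above the threshold are kept.
    """
    user = set(user_dois)
    matches = {doi: len(user & refs) for doi, refs in candidate_references.items()}
    highest = max(matches.values(), default=0)
    if highest == 0:
        return {}
    normalized = {doi: count / highest for doi, count in matches.items()}
    return {doi: value for doi, value in normalized.items() if value > threshold}


references = """Smith (2019). A title. https://doi.org/10.1000/ref1.
Doe (2020). Another title. doi:10.1000/REF2
Roe (2021). A reference without DOI."""
user_dois = extract_dois(references)  # ['10.1000/ref1', '10.1000/ref2']
candidates = {
    "10.1000/a": {"10.1000/ref1", "10.1000/ref2", "10.1000/x"},
    "10.1000/b": {"10.1000/ref1"},
    "10.1000/c": {"10.1000/y"},
}
print(coupling_scores(user_dois, candidates))  # {'10.1000/a': 1.0}
```

Dividing by the maximum makes the best-matching article always score 1.0, which is one reason why a fixed threshold is hard to choose: the same normalized value can correspond to very different absolute overlaps.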
Combination of text-based and bibliographic similarity: Similar articles are mapped to the journals that published them, and each journal's total score is calculated as a sum of the individual scores. More refined aggregation methods are currently being explored and will be available soon.
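A minimal Python sketch of such sum-based aggregation, assuming article-level text and coupling scores keyed by article identifier and a lookup from article to journal (all names and values are hypothetical):

```python
from collections import defaultdict


def journal_scores(text_scores, coupling_scores, article_journal):
    """Sum article-level scores per journal and rank journals by their total."""
    totals = defaultdict(float)
    for scores in (text_scores, coupling_scores):
        for article, score in scores.items():
            journal = article_journal.get(article)
            if journal is not None:
                totals[journal] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


text = {"a": 3.0, "b": 1.5}
coupling = {"a": 1.0, "c": 0.5}
journal_of = {"a": "J1", "b": "J2", "c": "J2"}
print(journal_scores(text, coupling, journal_of))  # [('J1', 4.0), ('J2', 2.0)]
```

Note that a plain sum mixes unbounded BM25 scores with coupling values in [0, 1], so the text component tends to dominate; this is one motivation for exploring refined aggregation methods.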
Recommendation Evaluation: The algorithm is evaluated on a separate test set of 10,000 DOAJ articles. To ensure realistic input data, all articles in the test set have an abstract of at least 100 characters and a title of at least 30 characters. As references are not part of the DOAJ data, the COCI index was used to fetch them via the article DOI; only articles with at least one reference were included. We assume that each article was published in a suitable journal to begin with, and count a result as positive if the originating journal appears among the top-N recommendations. While this may not hold for every individual article, we rely on the assumption that a journal's overall scope is defined by the articles it publishes. Table 1 shows the top-N accuracy that the current recommendation algorithm reaches on this test set.
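The top-N accuracy used here can be computed as the share of test articles whose originating journal appears among the first N recommended journals. The following Python sketch uses hypothetical rankings for four test articles, not B!SON results:

```python
def top_n_accuracy(rankings, true_journals, n):
    """Fraction of test articles whose original journal is among the top n results."""
    hits = sum(1 for ranking, truth in zip(rankings, true_journals) if truth in ranking[:n])
    return hits / len(true_journals)


# Hypothetical recommendation lists for four test articles, all originally in J1.
rankings = [["J1", "J2", "J3"], ["J2", "J1", "J3"], ["J3", "J2", "J1"], ["J2", "J3", "J4"]]
truth = ["J1", "J1", "J1", "J1"]
for n in (1, 2, 3):
    print(n, top_n_accuracy(rankings, truth, n))  # 0.25, 0.5, 0.75
```

By construction, top-N accuracy can only stay equal or increase as N grows.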
3.3 User Interface and Functionality
The current state of the B!SON prototype is available for testing. The user interface is deliberately simple; a screenshot is shown in Fig. 1.
Data entry: On the start page, the user can directly enter the title, abstract and references. Alternatively, these fields can be filled in automatically by entering a DOI, in which case the information is fetched from Crossref; this way, open access publication venues can be found based on previously published research.
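As an illustration of the DOI-based prefill, the following Python sketch queries the public Crossref REST API endpoint `https://api.crossref.org/works/{doi}` and maps the returned `message` object to the three input fields. The helper names are hypothetical; Crossref returns `title` as a list, often supplies the abstract as JATS XML (whose tags are stripped here with a simple regular expression), and includes a `DOI` key only for references it could match.

```python
import json
import re
import urllib.parse
import urllib.request


def fetch_crossref_work(doi):
    """Fetch the metadata record for a DOI from the Crossref REST API."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)["message"]


def form_fields(message):
    """Map a Crossref work record to title, abstract and reference DOIs."""
    title = " ".join(message.get("title", []))
    # Strip JATS markup such as <jats:p> and collapse whitespace.
    abstract = re.sub(r"<[^>]+>", " ", message.get("abstract", ""))
    abstract = " ".join(abstract.split())
    references = [ref["DOI"] for ref in message.get("reference", []) if "DOI" in ref]
    return {"title": title, "abstract": abstract, "references": references}


# Offline example with a hypothetical, abridged Crossref record.
sample = {
    "title": ["An example article"],
    "abstract": "<jats:p>Some text.</jats:p>",
    "reference": [{"key": "r1", "DOI": "10.1000/ref1"}, {"key": "r2", "unstructured": "Doe 2020"}],
}
print(form_fields(sample))
```

Calling `form_fields(fetch_crossref_work(doi))` would perform the same mapping on live data; references without a DOI cannot be used for bibliographic coupling and are therefore skipped.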
Result page: The search results can be displayed either as a simple list or as a table that gives a structured overview of additional details, enabling easy comparison of the journals. Article processing charges (APCs) are displayed based on the information available in the DOAJ and automatically converted to euros if necessary.
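The currency conversion can be sketched in Python with `decimal` arithmetic to avoid binary floating-point rounding artifacts. The exchange rates below are hypothetical placeholders (units of foreign currency per euro); a real deployment would need a regularly updated rate source, which is not specified here.

```python
from decimal import ROUND_HALF_UP, Decimal

# Hypothetical example rates: units of the given currency per euro.
EXAMPLE_RATES_PER_EUR = {"EUR": Decimal("1"), "USD": Decimal("1.10"), "GBP": Decimal("0.85")}


def apc_in_euros(amount, currency, rates=EXAMPLE_RATES_PER_EUR):
    """Convert an APC amount to whole euros, rounding half up."""
    euros = Decimal(str(amount)) / rates[currency]
    return euros.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


print(apc_in_euros(1100, "USD"))  # 1000
```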
Currently, the displayed similarity score is the simple sum of the Elasticsearch similarity score and the bibliometric similarity score. Clicking on the score field opens a pop-over with explanatory information: it lists the articles previously published in that journal which the recommendation engine determined to be similar. Clicking on a journal name leads to a separate detail page with further information, including keywords, publishing charges, license, Plan-S compliance, and more.
Data export and transparency: Search results can be exported as CSV for further use and analysis, and a public API is available for programmatic access. We also plan to provide the recommendation functionality in a form that can easily be integrated into and adapted to local library systems. For transparency about the data sources, the date of the last data update is shown on the “About Page”.
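A CSV export of this kind needs little more than the standard library. The following Python sketch serializes a list of result dictionaries; the column names are hypothetical and chosen only to mirror the information shown on the result page.

```python
import csv
import io


def results_to_csv(results, fields=("journal", "issn", "score", "apc_eur", "plan_s_compliant")):
    """Serialize result dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    # Keys not listed in fields are ignored; missing keys become empty cells.
    writer = csv.DictWriter(buffer, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()


rows = [{"journal": "Journal A", "issn": "1234-5678", "score": 4.2, "apc_eur": 1000,
         "plan_s_compliant": True, "internal_id": 7}]
print(results_to_csv(rows))
```

In a Django view, the returned string could be wrapped in an `HttpResponse` with `content_type="text/csv"`; values containing commas are quoted automatically by the `csv` module.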