1 Introduction

The open access landscape keeps getting more diverse and complex. The growing number of open access journals makes it harder to choose a suitable journal for a particular research output: the Directory of Open Access Journals (DOAJ), a curated online directory of peer-reviewed open access journals, added over 5,000 journals in the last three years. These journals offer a variety of conditions, including different publication costs and waivers, peer-review models, and copyright and rights retention clauses. At the same time, a growing number of funding agencies require researchers to publish open access in compliance with specific rules, and academic libraries increasingly offer financial support, introducing criteria of their own. All of these developments add to the workload of researchers, who now have to choose proper publication venues and assess them in terms of compliance, publication costs, quality, and reputation.

In this paper, we present a detailed look into B!SON, a web-based recommendation system that aims to alleviate these issues. The system uses basic information from the manuscript to be published to recommend suitable open access journals, based on content similarity (using title and abstract) and the cited references. The initial design of the B!SON service is based on the findings of a survey [1] conducted at the start of the project. Therein, we systematically collected user requirements which were directly integrated into the system specification.

This article is an extended version of a short paper [2] presented at TPDL 2022. Apart from more details on the user survey (Sect. 3.1), information on additional B!SON services (Sect. 3.5) and a discussion (Sect. 5), it includes new results from the ongoing development process: Sect. 3.3.2 describes embedding-based approaches we explored to improve the semantic component of the current recommendation system. They have been implemented and subjected to a comparative evaluation of their recommendation performance (Sect. 4). An already implemented and integrated extension to the graphical user interface, filter suggestions, is explained in Sect. 3.4.2.

The paper is structured as follows: We first provide a review of existing work on scientific recommendation (Sect. 2). Subsequently, we present the B!SON prototype and its development: Sect. 3.1 discusses the initial assessment of user requirements; Sect. 3.2 presents the integrated data sources; Sect. 3.3 explains the current recommendation algorithm as well as currently tested, more advanced versions. In Sects. 3.4 and 3.5, we provide details on the system’s user interface and the planned TYPO3 extension for local instances of the service, respectively. The results of the experimental evaluation of different recommendation methods are presented in Sect. 4. A discussion (Sect. 5) and conclusion (Sect. 6) follow.

2 Related work

Scientific recommendation tasks span the search for potential collaborators [3] and reviewers [4], as well as for papers to read [5] and to cite [6]. With ever more ways of publishing academic articles, the recommendation of publication outlets (journals/conferences) is a task on the rise (e.g., [7, 8]).

Prototypical approaches explore diverse data sources to provide recommendations. A major source of information is the article to be published: The manuscript’s title, abstract or keywords are used to compare against papers that previously appeared in an outlet [8, 9]. Other systems exploit the literature cited by the article, and try to determine the best publication venue using bibliometric methods [10, 11]. An alternative stream of research focuses more on the article authors, exploring their publication history [12] and co-publication networks [13, 14].

Regarding journal recommendation based on semantic similarity, TF-IDF is a popular building block [15, 16, 17], especially in combination with chi-square statistics to determine the dependence of terms on journals [18, 19, 20]. Other systems use word embeddings such as word2vec or fastText in combination with a convolutional neural network [21, 22, 23]. Algorithms of popular search engines, such as Okapi BM25 or MoreLikeThis, are used in [24, 25]. Others use document-level embeddings [26, 27], approaches based on n-grams [28] or manually defined ratios [29]. Systems whose algorithms do not directly return journal recommendations but a set of similar articles rely on aggregation methods, e.g., the k-nearest-neighbors algorithm in combination with summation or averaging to calculate a journal score [24, 25].

Semantic recommendation is usually based on title, abstract and keywords [15, 16, 20, 22, 23]. The Aims & Scope section of the journal can be considered as well [17, 30]. The references section might be used [26, 29] as well as the full text including images [31].

While there are a number of active journal recommender sites, they all come with limitations. Several publishers offer services limited to their own journals, such as Elsevier’s Journalfinder or Springer’s Journal suggester. Others, like Journal Guide, are closed-source and do not provide transparent information on their recommendation approach. Several services collect user and usage data, e.g., Web of Science’s manuscript matcher. These proprietary services work with semantic methods, recommending journals based on title, abstract and keywords. Notably, some open recommenders exist, e.g., Open Journal Matcher, Pubmender, Jane and Jot. Since the publication of the first version of this article, the author of the Open Journal Matcher has announced that the service will be discontinued [32], and the Pubmender back end no longer seems to work. The recommendations of Jane and Jot are limited to medical journals. These open services integrate little information about the recommended journals and do not offer advanced filter options.

3 B!SON—the open-access journal recommender

B!SON is the abbreviation for Bibliometric and Semantic Open Access Recommender Network. It combines several available data sources to provide authors and publication support services at libraries with recommendations of suitable open access journals, based on the title, abstract and reference list of the paper to be published. The system will be maintained for at least five years after which its usage will be evaluated. Further extensions of the core functionality are planned.

3.1 Survey

To assess the needs of future B!SON users, we conducted an online survey as a requirements analysis [33]. The question items were based on features of existing journal recommender tools; the survey was aimed at scientists from all research disciplines.

After discarding entries with less than 90% of the questions answered, a total of 884 questionnaires remained for analysis. The participating researchers had published a median of seven papers.

The survey targeted two main categories of information: (a) finding out which filter functionalities are most important for efficiently selecting a journal; (b) assessing the key characteristics of journals to display in a journal profile. The results differ only slightly across research disciplines. Overall, the most important filter criteria were citation metrics, publication costs, language of the publication, appearance in scholarly databases and whether the author retains the copyright. Regarding the journal characteristics to display, the most important information is whether the article receives a DOI, whether the journal is listed in common journal lists that protect against predatory publishers, whether the publication costs are covered, the journal’s scope and the general publication costs.

Fig. 1: Flow diagram showing how articles similar to the user input are found and then matched to their journal. The score for each journal is calculated as a final step

We considered these results in the design of B!SON where possible, while keeping the focus on a small set of trusted data sources. Citation metrics were deliberately not included in the design due to their controversial influence [34].

3.2 Data sources & integration

The B!SON service is built on top of several open data sources with strong reputation in the open access community:

DOAJ: The Directory of Open Access Journals (DOAJ) indexes information on open access journals which fulfil a set of quality criteria (full text available, dedicated article URLs, at least one ISSN, etc.) and follow publishing ethics guidelines. The dataset includes basic information on the journal itself, but also metadata of the published articles (title, abstract, year, DOI, ISSN of the journal, etc.). The DOAJ currently contains 18,461 journals and 8,154,699 articles. The data are available for download in JSON format, under CC0 for article data and CC BY-SA for journal data [35].

OpenCitations: The OpenCitations initiative provides (among other datasets) the CC0-licensed COCI dataset of citation data. It is based on Crossref data and contains 76,072,926 publications and 1,392,036,835 citations [36]. The information is available in the form of DOI-to-DOI relations, covering 44% of the citations in Scopus and 51% of the citations in Dimensions [37]. COCI thus lacks citations in comparison with commercial products, but it can be used to check which articles published in DOAJ journals cite the references given by the user (details in Sect. 3.3.1). The coverage of open access publications, especially DOAJ journals, in COCI is better than that of closed access publications, so we can assume that it is sufficient for our needs [38].

Journal Checker Tool: The cOAlition S initiative (a group of funding agencies that agreed on a set of principles, called Plan S, for the transition to open access) provides the Journal Checker Tool. A user can enter a journal ISSN, funder and institution to check whether (a) the journal is fully open access according to Plan S requirements, (b) it is a transformative journal, (c) it has a transformative agreement with the user’s institution, or (d) it offers a self-archiving option [39]. An API allows fetching this information automatically. Since B!SON does not retrieve data on the funder or institution, and the DOAJ dataset only contains open access journals (and no transformative journals), we use the funder information of the European Commission as a placeholder to check whether a journal is Plan S-compliant.

Additional data: There are other data sources which might be used in future B!SON versions to extend the current setup. Crossref metadata would allow us to extend the article data of the DOAJ, which are occasionally incomplete. OpenAlex could add, e.g., author information.

Data integration: Data from the DOAJ and OpenCitations’ COCI index are bulk downloaded and inserted into two databases: PostgreSQL and Elasticsearch. The information on Plan S compliance stems from the Journal Checker Tool and is fetched from its API using the “European Commission Horizon Europe Framework Programme” as a placeholder for the funder. The DOAJ articles are matched to their journal via ISSN, and matching to the citations happens via DOI. Data on Plan S compliance are connected via ISSN as well.
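
A minimal sketch of the ISSN-based matching in PostgreSQL via Python follows; the table and column names are invented for illustration and the actual B!SON schema differs.

```python
import psycopg2

conn = psycopg2.connect("dbname=bison")  # assumed local database
with conn.cursor() as cur:
    # Match an article to its journal via print or electronic ISSN.
    cur.execute(
        """
        SELECT a.doi, j.title
        FROM doaj_article AS a
        JOIN doaj_journal AS j
          ON a.issn IN (j.print_issn, j.electronic_issn)
        WHERE a.doi = %s
        """,
        ("10.1234/example-doi",),  # hypothetical DOI
    )
    print(cur.fetchone())
```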

All software is developed and published as open source under the AGPL licence on GitLab. The data sources are automatically updated at regular intervals to keep the service up to date without human intervention. For transparency, the time of the last update is shown on B!SON’s “About” page.

3.3 Recommendation system

B!SON consists of a Django back end and a Vue.js front end. The original publication presented an implementation using a basic recommendation algorithm (described here in Sect. 3.3.1). Since then, we have experimented with a number of possible enhancements using embedding-based approaches, which are presented in Sect. 3.3.2. The comparative evaluation follows in Sect. 4.

3.3.1 Baseline system

The current recommendation system is based on combined similarity measures with regard to the entered text data (title and abstract) and reference list. Figure 1 shows an overview of the recommendation process; the individual steps are described in the following passages.

Text similarity: Elasticsearch has built-in functionality for text similarity search based on the Okapi BM25 algorithm [40]. This functionality is used to determine those articles already indexed in the DOAJ which are similar to the entered information. Stop word removal is performed as a pre-processing step; since the DOAJ contains articles in several languages, we combine the available Apache Lucene stop word lists for this purpose. The similarity search happens separately for title and abstract, and only the top 100 hits are considered.
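
For illustration, such a BM25 similarity query could look as follows with the official Elasticsearch Python client; the index name ("doaj-articles") and field names are our placeholders, not the actual B!SON schema.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local instance

def similar_articles(abstract: str, size: int = 100):
    """Return (journal ISSN, BM25 score) pairs for the most similar abstracts.

    BM25 is Elasticsearch's default relevance function, so a plain match
    query suffices; index and field names are illustrative only.
    """
    response = es.search(
        index="doaj-articles",
        query={"match": {"abstract": abstract}},
        size=size,
    )
    return [(hit["_source"]["issn"], hit["_score"]) for hit in response["hits"]["hits"]]
```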

Bibliographic coupling: According to Kessler [41], two articles are bibliographically coupled if at least one reference is cited in both articles. Such a shared reference can be interpreted as an overlap in content or method; the more references two articles cite in common, the closer they are [42].

Fig. 2: Visualization of the relation between bibliographic coupling, co-citation, and direct citations (adapted from [43])

A co-citation exists if two articles are cited together in a third article. Co-citation, too, can be assumed to indicate article similarity. Figure 2 shows the temporal dependence of both methods: while co-citation calculates the similarity of already cited and, thus, older papers, bibliographic coupling can be used to calculate the proximity of recent articles.

This approach of a similarity calculation of recent articles is inspired by the process of the publication support services of one of the participating libraries. It is currently used in B!SON in the following way:

The user enters an unstructured list of references which are cited in the article to be matched to a journal. From this list, B!SON extracts the DOIs using regular expressions. It then relies on OpenCitations’ COCI index to find existing articles citing the same sources. The current normalization of the degree of bibliographic coupling is in a prototypical state: the number of matching citations is divided by the highest number of matching citations among the compared articles. If this normalized value is higher than a threshold (currently defined manually), the article is considered similar. The system then determines the journals in which the similar articles have been published, taking only those into account which are indexed in the DOAJ. The more articles of a journal are considered relevant based on bibliographic coupling, the higher the respective journal ranks in the generated result list.
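
A minimal sketch of this step follows; the DOI regular expression and the threshold value are illustrative stand-ins for the manually tuned production settings.

```python
import re

DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')

def extract_dois(reference_block: str) -> set[str]:
    """Pull DOIs out of an unstructured reference list."""
    return {doi.lower().rstrip(".,;") for doi in DOI_PATTERN.findall(reference_block)}

def coupling_scores(query_refs: set[str], candidates: dict[str, set[str]],
                    threshold: float = 0.5) -> dict[str, float]:
    """Normalize shared-reference counts by the highest overlap observed
    and keep only candidates above the (here: illustrative) threshold."""
    overlaps = {doi: len(query_refs & refs) for doi, refs in candidates.items()}
    max_overlap = max(overlaps.values(), default=0)
    if max_overlap == 0:
        return {}
    return {doi: n / max_overlap for doi, n in overlaps.items()
            if n / max_overlap >= threshold}
```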

Combination of text-based and bibliographic similarity: Similar articles are matched with their journal. Up to three different scores per journal are combined using a neural network which was trained to classify a journal as correct or incorrect based on these scores, thereby weighing them by their meaningfulness. The resulting probability is the output score for each journal, which is then displayed as part of the result page. To increase the transparency of the scoring, we are investigating alternatives to this combination process.
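
As an illustration of this combination step, a minimal score combiner in PyTorch could look as follows; the hidden-layer size is an assumption and training is omitted, so the production network may differ.

```python
import torch
import torch.nn as nn

# Three per-journal input scores: title similarity, abstract similarity
# and bibliographic coupling; output is a "suitable journal" probability.
combiner = nn.Sequential(
    nn.Linear(3, 8),  # illustrative hidden size
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid(),
)

scores = torch.tensor([[0.8, 0.6, 0.3]])  # example feature vector for one journal
probability = combiner(scores).item()
```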

3.3.2 Embedding-based approaches

To further improve the recommendations, we tested several other approaches. Existing recommendation systems often use language models to build text representations [21]. The journals referenced in the DOAJ include articles in multiple languages, which requires a multilingual language model. This rules out options such as the commonly used transformer-based models for scientific language (e.g., SciBERT [44]). We tested three different multilingual language models (see Sect. 4.2.2).

One approach is to find similar articles via vector embeddings instead of word-level comparisons (as Elasticsearch does). Each article’s title and abstract are combined and fed into a pre-trained, transformer-based language model. The resulting embeddings can then be compared against the embedding of the user input. Previous work also suggested a more fine-grained approach that weights articles lower in the journal embedding the further in the past they appeared [45]. The distribution of our dataset, which features many journals with few articles (see Sect. 5), does not allow for this kind of granularity.

The embeddings can be compared on the article level and on the journal level. We experiment with the following configurations.

Journal embeddings: For the journal level, the article embeddings of each journal are combined by calculating their average. The journals whose embeddings are closest to the embedding of the input text are the best matches. The Open Journal Matcher uses a similar approach (using spaCy as a word-level embedding model instead of a document embedding).

Article embeddings (individual): For the article level, the embedding of the user input is directly compared to all article embeddings. The rank of a journal is derived from its article with the closest embedding.

Article embeddings (combined): Similarly, the embedding of the user input is directly compared to all article embeddings. Only the articles within the top-n hits are used for the computation of the combined journal score, which is calculated by summing up the scores.

Article embedding & classifier: Building upon existing pre-trained, transformer-based language models, a classifier can be trained to predict a journal. The weights of the pre-trained model are frozen and a dense layer is added to predict the one-hot encoded journals based on the classification token of the pre-trained model.

Article embedding, BiLSTM & classifier: The previous approach can be further refined by adding a BiLSTM layer in-between that receives the token embeddings of the pre-trained language model as an input.
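
The following sketch illustrates this frozen-encoder setup in PyTorch. Apart from the 256 BiLSTM hidden features and the token length of 256 (both mentioned in the next paragraph), everything — the model name, layer wiring and classification head — is an assumption for illustration, not the exact production architecture.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class JournalClassifier(nn.Module):
    """Frozen multilingual encoder, BiLSTM over token embeddings, dense head."""

    def __init__(self, n_journals: int, model_name: str = "xlm-roberta-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        for p in self.encoder.parameters():
            p.requires_grad = False  # the language model weights stay frozen
        self.bilstm = nn.LSTM(self.encoder.config.hidden_size, 256,
                              batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 256, n_journals)  # one logit per journal

    def forward(self, input_ids, attention_mask):
        tokens = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        _, (h, _) = self.bilstm(tokens)
        # Concatenate the final hidden states of both directions.
        return self.head(torch.cat([h[-2], h[-1]], dim=-1))

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
batch = tokenizer(["An example title. An example abstract."],
                  truncation=True, max_length=256, return_tensors="pt")
logits = JournalClassifier(n_journals=100)(batch["input_ids"], batch["attention_mask"])
```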

We created the embeddings using the Sentence-Transformers library, which offers the functionality to include and test different language models. The gensim library provides the functionality to determine the closest neighbors using the dot product on normalized vectors. We used the Huggingface library in combination with PyTorch to try different pre-trained, transformer-based, multilingual models. The weights of the language models were not fine-tuned further. The Huggingface tokenizer of the corresponding model was used with a token length of 256. For the BiLSTM layer, 256 hidden features were used. The Universal-Sentence-Encoder model was obtained from TensorFlow Hub and the classification layer was trained with TensorFlow. The results are shown in Table 3.
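
To make the journal-embedding configuration concrete, here is a minimal sketch using Sentence-Transformers and plain NumPy (in place of gensim) for the normalized dot-product comparison; the model name is one plausible multilingual choice, not necessarily the one evaluated in Sect. 4.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("distiluse-base-multilingual-cased-v2")  # assumed model

def journal_embeddings(articles_by_journal: dict[str, list[str]]) -> dict[str, np.ndarray]:
    """Average the article embeddings of each journal and re-normalize."""
    out = {}
    for issn, texts in articles_by_journal.items():
        mean = model.encode(texts, normalize_embeddings=True).mean(axis=0)
        out[issn] = mean / np.linalg.norm(mean)
    return out

def rank_journals(query: str, journals: dict[str, np.ndarray], top_n: int = 10):
    """Dot product on normalized vectors, mirroring the gensim-based comparison."""
    q = model.encode(query, normalize_embeddings=True)
    ranked = sorted(journals.items(), key=lambda kv: float(np.dot(q, kv[1])),
                    reverse=True)
    return ranked[:top_n]
```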

3.4 Interfaces & functionality

The current state of the B!SON system is available online. The main entry point for an end user is the graphical user interface provided on our website; it is described in Sect. 3.4.1. It was recently extended by support functionalities for data entry (Sect. 3.4.2). Beyond that, we provide additional access points for programmatic access and integration into third-party services (Sect. 3.5).

3.4.1 Graphical user interface

The user interface has been kept deliberately simple; screenshots are shown in Figs. 3 and 4.

Fig. 3: Screenshot of the B!SON prototype with an example query

Data entry: The start page allows the user either to enter title, abstract and references directly or to let them be filled out automatically by fetching the information from Crossref, DataCite or arXiv with a DOI or arXiv ID. This allows open access publication venues to be found based on previously published research.

Fig. 4: Screenshot of the B!SON prototype showing the table view with results

Results page: To inspect the search results, the user can choose between a simple list and a table which offers a structured account of additional details, enabling easy comparison of the journals. Article processing charges (APCs) are displayed based on the information available in the DOAJ and automatically converted to Euro if necessary.

Clicking on the score field opens a pop-over with explanatory information: a list of articles previously published in that journal which the recommendation engine determined to be similar. Clicking on a journal title leads the user to a separate detail page offering further information, including keywords, APCs, licence, Plan S compliance, and more.

3.4.2 Data entry support

The B!SON website offers the possibility to refine the results with a number of filter options. While several of them are pure user preferences (such as the average publication time), the language and subject can be deduced from the user input and presented as suggestions.

Language: We use the lingua-py library to determine the most likely language of the input text.
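
A minimal usage sketch, assuming the current lingua-py API (restricting the candidate languages, as done here, is optional):

```python
from lingua import Language, LanguageDetectorBuilder

# Build a detector for a subset of languages; from_all_languages() also exists.
detector = LanguageDetectorBuilder.from_languages(
    Language.ENGLISH, Language.GERMAN, Language.FRENCH, Language.SPANISH,
).build()

text = "Les revues en libre accès se multiplient rapidement."
language = detector.detect_language_of(text)  # e.g. Language.FRENCH, or None
```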

Subject: A neural network is used to identify the subject. We adopted the Library of Congress Classification (LCC), which is used by the DOAJ, and constructed a training set of 10,000 DOAJ articles for each top-level subject (based on the first letter of the subject code). The pre-trained multilingual language model XLM-RoBERTa-large [46], with a BiLSTM and classification layer as a head, was trained to an accuracy of 73.56% on a test set of 1,000 articles per subject.

XLM-RoBERTa [46] is a transformer-based multilingual language model trained with the masked-language-modeling objective on a subset of Common Crawl covering over 100 languages.

Semantically close categories within the LCC, such as “World History and History of Europe, Asia, Africa, Australia, New Zealand, etc.” and “Auxiliary Sciences of History”, can pose a challenge to our trained model. The subject suggestion is only presented to the user if the prediction reaches a probability greater than 50%.

3.5 Further interfaces

To enhance integration and computational interoperability, other access options apart from the standard user interface are provided.

Data export: Search results can be exported as CSV for further sharing and analysis.

API: A public API is available for programmatic access. As it also powers the front end, all information shown on the website can be accessed via the API.

Local instances: We plan to provide the recommendation functionality in a form that can be easily integrated into and adapted for third-party websites, e.g., by libraries. We are currently starting the development of an extension for the TYPO3 Content Management System (CMS), which is widely used in the German library landscape. Both the TIB and the SLUB library use TYPO3 and will act as early adopters of the prototype. The extension will allow libraries to further filter the results or include additional information such as waiver agreements with publishers. The code of the extension will be provided as open source for re-use, and support for other content management systems is planned.

4 Experimental evaluation

The recommendation algorithms were tested in two different experimental setups (Sect. 4.1); Sect. 4.2 discusses the achieved results.

4.1 Experimental setups

The algorithms were tested with two different setups. The first one uses a randomly sampled dataset from the DOAJ data and consequently reflects the article distribution of the full dataset. This entails a higher share of data coming from certain academic domains and/or journals; this setting thus represents the realistic environment in which our prototype needs to perform.

For the second setup, we sampled a fixed number of articles from all eligible journals in the DOAJ. As it is not skewed to certain domains and journals, it allows for a fairer comparison of the enhanced methods on journal and article level.

4.1.1 Random article sampling

The algorithm is evaluated on a separate test dataset of 10,000 random DOAJ articles. To ensure realistic input data, all articles in the test set have a minimal abstract length of 150 characters and a minimal title length of 30 characters. As the references are not part of DOAJ’s article metadata, the COCI index was used to complete the references via the article DOI. Only articles with at least five references were included. We assume that the articles were published in a suitable journal to begin with, counting a positive result if the originating journal appears in the top-n results of the recommendation. While this may not be correct for each individual article, we rely on the assumption that the overall journal scope is defined by the articles published in that journal.
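
The metric itself is straightforward; a sketch of the top-n accuracy computation under this assumption (function and variable names are ours):

```python
def top_n_accuracy(ranked_journals: list[list[str]],
                   true_journals: list[str], n: int = 10) -> float:
    """Share of test articles whose originating journal (assumed to be a
    suitable venue) appears among the top-n recommended journals."""
    hits = sum(truth in ranked[:n]
               for ranked, truth in zip(ranked_journals, true_journals))
    return hits / len(true_journals)

# Example: the originating journal is ranked third, so it counts for top@10.
print(top_n_accuracy([["j-a", "j-b", "j-c"]], ["j-c"], n=10))  # 1.0
```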

Random sampling introduces a bias: subjects such as medicine have a higher share than others. This limitation of the dataset is further discussed in Sect. 5.

4.1.2 Equal article distribution

The distribution of articles in the DOAJ is skewed toward certain domains and journals. To provide a fairer test case, we sample a subset of DOAJ articles: for each journal, 20 random articles are included in the training set and 10 articles go to the test set. We then analyze in detail how the minimal text length of the input data and the number of references found in COCI influence the overall recommendation accuracy. We define three requirement levels for these parameters, as shown in Table 1. The “high” requirement values correspond to the median in the dataset.
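
A sketch of this per-journal sampling (the function and its defaults merely restate the numbers above):

```python
import random

def split_per_journal(articles_by_journal: dict[str, list[str]],
                      n_train: int = 20, n_test: int = 10, seed: int = 42):
    """Sample a fixed number of training and test articles per journal."""
    rng = random.Random(seed)
    train, test = [], []
    for issn, articles in articles_by_journal.items():
        if len(articles) < n_train + n_test:
            continue  # journal lacks enough complying articles (cf. Table 1)
        sample = rng.sample(articles, n_train + n_test)
        train += [(issn, a) for a in sample[:n_train]]
        test += [(issn, a) for a in sample[n_train:]]
    return train, test
```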

The articles’ titles and abstracts were pre-processed to remove HTML entities, URLs and non-UTF-8 characters. As the bibliometric recommendation works differently from the semantic approaches, the data model has to be changed slightly to allow a fair comparison: after finding citations via COCI, only those publications whose DOI is within the training data are matched to their DOAJ journal. This simulates the same setting of 20 articles per journal that the semantic comparison assumes.

Table 1: Requirement levels for title length, abstract length and number of references. The last column lists the number of journals which have the required number of complying articles
Table 2: Top@N accuracy for the current recommendation tested on 10,000 random articles

4.2 Experimental results

This section discusses the results of the experimental evaluation in the two settings described in Sect. 4.1.

4.2.1 Random article sampling

Here, we discuss the results of the evaluation setting described in Sect. 4.1.1. Table 2 shows the top@N accuracy for (a) the bibliometric approach alone (“Bibliometric approach”); (b) only the Elasticsearch similarity score for the title (“Elasticsearch on title”); (c) only the Elasticsearch similarity score for the abstract (“Elasticsearch on abstract”); and (d) the recommendation combining all of the above (“combined approach”), which is the solution used in the current version of the B!SON system. Unsurprisingly, the abstract-based recommendation delivers better results than the title-based approach, as the former provides more information. The combination of the recommendation methods shows the expected improvement over the individual methods.

4.2.2 Equal article distribution

Here, we discuss the results related to the experimental setup discussed in Sect. 4.1.2. Again included are the algorithms constituting the current solution, named as in Sect. 4.2.1 and Table 2. Additionally, we report the results for the embedding-based approaches discussed in Sect. 3.3.2.

Impact of training data: The effect of the minimal training requirements is shown in Fig. 5. As expected, the accuracy during testing improves if longer texts are used for training. This was the case for all methods except XLM-RoBERTa with a classifier or with a BiLSTM and classifier layer (the latter case is shown in the same plot). The reason is unclear and requires further investigation; one possible explanation is that the model learns undesired features, such as generally assigning short texts to a subset of journals.

Fig. 5: Top@10 accuracy for the “USE article embedding with classifier” and “XLM-RoBERTa + BiLSTM + classifier” methods with respect to different requirements on the training set

Impact of test data: We evaluated the models’ top@1, top@5, top@10 and top@15 accuracy, varying the length of the input data used for testing. (The models were trained with input data satisfying the “middle” requirements criterion described above.) The models exhibit similar behavior; we therefore show only one example of the resulting graph in Fig. 6, using the USE-based system configuration. In general, there is a large leap from top@1 to top@5 accuracy, while the gaps between top@5 and top@10 and between top@10 and top@15 are successively smaller.

Fig. 6: The different accuracies for the “USE article embedding with classifier” method under the “middle” requirements for the training set

The accuracy of the different methods is presented in Fig. 7. To reduce visual complexity, only the top@10 accuracy is shown. Both the Universal-Sentence-Encoder journal vectors and the trained classifier perform well. In contrast to XLM-RoBERTa, the USE model supports only 16 languages and does not cover all languages of the DOAJ corpus. The accuracy of the Elasticsearch recommendation on the abstract increases almost linearly with longer input length. For some methods (e.g., journal embedding with the Universal-Sentence-Encoder), the line levels out at some point; this is caused by the models’ maximum input length.

Fig. 7: Top@10 accuracy for the different methods under the “middle” requirements for the training set

Table 3 shows the results in detail for the “middle” requirements level applied to both training and test set. As highlighted in the table, the best performing algorithms for this configuration are the bibliometric search and the Universal-Sentence-Encoder journal embeddings. The variant with XLM-RoBERTa in combination with a BiLSTM ranks fourth regarding top@10 and top@15 accuracy. It is the method that we identified as well performing in previous tests and that is used for subject prediction (described in Sect. 3.4.2).

We further report training and testing times in Table 4. For the Elasticsearch approach, the indexing time (for both title and abstract at once) was measured as the training time. The experiments were conducted on a server with an AMD EPYC 7542 processor, NVMe-connected drives and NVIDIA A40 graphics cards. Elasticsearch and the bibliometric search take the longest, as they require many database look-ups.

Table 3: Top@N accuracy for different methods with the “middle” requirements according to Table 1 applied to both training and test set
Table 4: Training and testing time (in seconds) for the different methods for the experiment runs described in Table 3. No training time is given for the bibliometric search, as it cannot be meaningfully defined

5 Discussion

While the current B!SON implementation achieves decent accuracy and beta users reported predominantly positive feedback, the approach comes with a set of limitations. Using data from the DOAJ has the advantage of reliable data access, basic quality control of the metadata, and a minimal standard for the journals’ publishing ethics policies. However, the information may be incomplete or outdated. For instance, Article Processing Charges (APCs) are provided by publishers once and might not be adjusted over time. Furthermore, the calculation of the exact APCs is sometimes rather complex, with charges influenced by the number of pages or figures in a manuscript, a fact which cannot be considered in the B!SON interface. The displayed APC information is thus only indicative.

Moreover, the distribution of articles is highly skewed. A few journals, sometimes referred to as mega journals, have an immense number of articles (four have more than 50,000), while half of the journals with at least one article have fewer than 192 articles in total. Semantic similarity metrics would thus favour these mega journals, as they are more likely to contain similar articles due to their sheer size. The current algorithm accounts for this by limiting the number of top matches for the semantically close articles, which prevents the accumulation of a high number of articles with very low scores. Apart from the number of articles per journal, the numbers of articles per subject and per language are also unevenly distributed: English dominates with 77% of the articles, and medicine is the most common subject with 29.63%, in contrast to military science with only 0.10%. Further research is needed to determine the exact impact of these imbalances on recommendation performance.

The general approach of suggesting journals based on semantic or bibliometric similarity does not cover all journal scopes. A journal focusing on a specific methodology but with a broader scope of topics will rank lower. Similarly, special issues of a journal can shift the topic for the recommendation algorithm as they contain a high concentration of articles on a certain (possibly niche) topic.

6 Conclusion

In this paper, we have presented a comprehensive experimental comparison of various retrieval methods for our open access journal recommendation system B!SON. The system combines semantic and bibliometric information to calculate a similarity score to the journals’ existing contents, and provides the user with a ranked list of candidate venues. The B!SON service is available online for use and testing.

The version presented here is more advanced than the prototype described in the original demo paper. Based on feedback from our user community, we improved the user interface and scoring functions. Other requests concern further improvements of the user interface and usability. Features such as filter suggestions and the automatic fetching of paper information from the pre-print server arXiv have been implemented and ease the tool’s usage. The community’s wishlist further includes more sophisticated methods for exploring the results, e.g., graph-based visualizations, an extension of the filtering options and an improved representation of the similarity score. These action points have not been tackled yet, but we will explore them in future work.

This paper reports on our experiments with embedding-based algorithms to improve the journal recommender. The structured comparative evaluation of the current recommendation methods shows promising results for the classifier built upon the pre-trained Universal-Sentence-Encoder model. We are currently working on integrating this technical solution into the productive recommendation system.

After our exploration of improvements to the semantic components of the recommendation, we now look into enhancing the bibliometric recommendation. Similar to the semantic methods, the bibliometric similarity computation tends to favour larger journals: the more articles with references a journal contains, the higher the chance of finding citations matching the query’s reference set. The current implementation relies on a manually defined relevancy threshold which, in the future, should be computed automatically. Furthermore, we are experimenting with additional normalization methods, starting from established ones such as the Jaccard index [47], to balance between journals with high and low publication output. The bibliometric component is computationally expensive; adopting enhanced normalization in the productive system thus comes with challenges regarding efficient implementation and computational resources.
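
For illustration, a Jaccard-based normalization over reference sets could look like this (a sketch, not the production code):

```python
def jaccard(refs_a: set[str], refs_b: set[str]) -> float:
    """Jaccard index of two reference sets: |A ∩ B| / |A ∪ B|.

    Unlike the max-based normalization of Sect. 3.3.1, this penalizes
    candidates with very long reference lists, balancing journal sizes.
    """
    union = refs_a | refs_b
    return len(refs_a & refs_b) / len(union) if union else 0.0
```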

Citation graph embeddings are another avenue we are currently evaluating, as they are commonly used for citation recommendation [48, 49]. However, the size of the COCI dataset makes this computationally expensive.

Beyond the scope of the B!SON project, it could be interesting to extend recommendations to other venues with open access options (e.g., conferences). Moreover, the integration of person-centred information, such as prior publication history and frequent co-authors, seems promising.