Introduction

The Pandora’s Box opened by big data and artificial intelligence (AI) has released incredible powers to reshape the thinking patterns and behaviors of the modern world. Embracing AI’s extraordinary capabilities in analyzing unstructured and scalable data streams, understanding the uncertainty of scientific semantics, and learning robust models with enhanced reliability and adaptability, the bibliometrics community has already shown great enthusiasm for introducing this powerful tool to broad information studies and has achieved enormous success in turning big data into big value and impact. Recent influential studies on the topic include bibliometrics-enhanced information retrieval (Mayr et al., 2014) and the extraction and evaluation of knowledge entities from scientific documents (Zhang et al., 2020a). Notably, intelligent bibliometrics (Zhang et al., 2020b), which emphasizes the development and application of computational models that combine AI and data science techniques with bibliometric indicators, has raised particular interest in promoting a research agenda on “AI + Informetrics” (AII). These endeavors, with perspectives broadened by machine intelligence, portend far-reaching implications for science (Fortunato et al., 2018). However, how to effectively combine the power of AI and informetrics to create cross-disciplinary solutions in line with this big data boom remains elusive from both theoretical and practical perspectives.

The 1st Workshop on AI + Informetrics was held online, co-located with iConference 2021, on March 17, 2021. The goal of this workshop series is to create an interactive cohort for global researchers to exchange ideas, share pilot studies, and scope future directions in this cutting-edge area. In this workshop, we highlighted efforts on constructing fundamental theories and concepts, developing novel methodologies, bridging conceptual knowledge with practical uses, and creating solutions for real-world needs.

This special issue collects articles presented at the 1st Workshop on AI + Informetrics and relevant external submissions. The collection contains 11 articles, contributed by 43 authors from 8 countries (China, Greece, Pakistan, South Korea, Australia, the US, Germany, and Finland). We summarize the 11 articles and introduce this special issue through the following four categories: prediction and scholarly recommendation, citation evaluation, tools for science, technology, and innovation (ST&I) studies, and interdisciplinary measures.

AI + informetrics for prediction and scholarly recommendation

The bibliometrics community has long been interested in broad prediction tasks in information studies, such as predicting research outcomes and impact, and scholarly recommendations for publication venues, peer reviewers, and collaborators.

Taking up gender bias, an interesting angle in the science of science, Kuppler (2022) analyzed the bibliographical data of 111,156 computer science researchers and employed random forests and gradient boosting machines to predict their h-index. The results suggested the under-representation of female researchers in the computer science discipline.
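For readers less familiar with the prediction target, the h-index can be computed as in the minimal sketch below. It follows the standard definition (the largest h such that h papers have at least h citations each) and is not taken from Kuppler's pipeline; the function name and the citation counts are hypothetical.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # counts only decrease from here, so h cannot grow
    return h


# Hypothetical citation counts for one researcher's papers.
example_counts = [10, 8, 5, 4, 3]
print(h_index(example_counts))
```

In a prediction setting such as the one described, a value like this would serve as the regression target, with features derived from the bibliographical record.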

Aiming to develop a recommender system for patent applicants and examiners to efficiently locate and cite prior patents, Choi et al. (2022) enlarged the use of patent features, including textual information, metadata, and patent examiner citation information, and developed a deep learning-based system for embedding, searching, and ranking.

Ali et al. (2022) drew attention to the important connections between authors and their research interests. Document embedding techniques and a memory network were integrated to represent paper contents, capture their relationships with author preferences, and make personalized recommendations.

Concentrating on collaborator recommendations, Xi et al. (2022) highlighted the similarity between the research interests of scholars and the topology of their collaboration networks, and developed a recommender system using word embedding and node embedding techniques for knowledge representation.

AI + informetrics for citation evaluation

Understanding the content of references and evaluating their value can be regarded as a fundamental task of broad citation analyses. Identifying core features from a reference’s limited information and constructing evaluation models to measure references are among the top challenges.

Zhang et al. (2022) emphasized the value of a reference’s native information (e.g., citation context and section name), and integrated these features into different neural text representation models. On a dataset with labeled citations, the improved classification performance supported the effectiveness of their feature selection and representation.

Driven by the task of identifying important citations, An et al. (2022) utilized a semi-supervised self-training technique, including a module of full text-based feature engineering with six key features (e.g., authors, semantics, and citations) and a self-training strategy using support vector machine and random forest techniques.
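The general idea of self-training can be illustrated with a minimal sketch. It makes several loud assumptions: a nearest-centroid classifier stands in for the support vector machine and random forest used by An et al. (2022), the two-dimensional feature vectors and the labels "important" and "incidental" are hypothetical, and the confidence rule (a distance margin) is chosen only for illustration.

```python
import math


def centroids(points, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_margin(x, cents):
    """Return (label, margin): margin is the gap between the two nearest centroids."""
    dists = sorted((math.dist(x, c), y) for y, c in cents.items())
    label = dists[0][1]
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return label, margin


def self_train(labeled, labels, unlabeled, min_margin=1.0, max_rounds=10):
    """Repeatedly add confidently pseudo-labeled points to the training set."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled, labels)
        confident, rest = [], []
        for x in pool:
            y, margin = predict_with_margin(x, cents)
            (confident if margin >= min_margin else rest).append((x, y))
        if not confident:
            break  # nothing left that the model is sure about
        for x, y in confident:
            labeled.append(x)
            labels.append(y)
        pool = [x for x, _ in rest]
    return centroids(labeled, labels), pool


# Hypothetical toy data: two labeled citations and three unlabeled ones.
seed_points = [(0.0, 0.0), (10.0, 10.0)]
seed_labels = ["incidental", "important"]
unlabeled_points = [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)]
final_centroids, still_unlabeled = self_train(seed_points, seed_labels, unlabeled_points)
print(final_centroids, still_unlabeled)
```

In this toy run the two points near a seed are absorbed into the training set, while the point equidistant from both classes stays unlabeled because the model is never confident about it.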

AI + informetrics for ST&I studies

Bibliometric data have been recognized as a core ST&I data source for profiling R&D, analyzing emerging technologies, and understanding scientific activities and behaviors. The involvement of AI and data science techniques further strengthens such analytical capabilities.

Focusing on collaborations between universities and enterprises, Chen et al. (2022) constructed a dual-layer network with the two parties and their research topics, depicted network dynamics through network topology, individual characteristics, and knowledge proximity, and developed a stochastic actor-oriented model to understand the evolution of such collaborations.

Taking the distinctive angle of mapping the sustainable development goals (SDGs) in ST&I studies, Hajikhani and Suominen (2022) built a classification model using the SDG classification of scientific articles. They applied it to classify the SDG relevancy of patent families registered at the European Patent Office.

Motivated by developing a one-stop analytical tool for discovering and visualizing research fronts in scientific articles, Wang et al. (2021) integrated a set of models within a systematic toolkit called ITGInsight, covering functionalities of extracting domain terms, clustering topics, and visualizing topic evolution.

AI + informetrics for interdisciplinary measures

The bibliometrics community has been dedicated to measuring interdisciplinary interactions through diverse approaches (e.g., science maps) for decades. Still, new angles and solutions have been added with cutting-edge AI techniques.

To measure the strength of the interdisciplinary interactions between two disciplines, Huang et al. (2022) developed an analytical framework incorporating citation analysis and semantic analysis. Specifically, citation-based relationships were calculated through direct citations and bibliographic coupling, while topic models and word embedding techniques were applied to calculate semantic similarities.
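The three kinds of signal named here (direct citations, bibliographic coupling, and semantic similarity) can be made concrete with a small sketch. It illustrates the general measures only and is not the framework of Huang et al. (2022): the paper identifiers, reference lists, and topic vectors are hypothetical, and cosine similarity is assumed as the semantic measure.

```python
import math
from itertools import product

# Hypothetical toy data: paper id -> set of cited ids.
references = {
    "a1": {"b1", "x", "y"},
    "a2": {"x", "z"},
    "b1": {"x", "y", "z"},
    "b2": {"a2", "w"},
}
discipline_a = ["a1", "a2"]
discipline_b = ["b1", "b2"]


def direct_citation_count(src, dst, references):
    """Count citations from papers in `src` to papers in `dst`."""
    dst_set = set(dst)
    return sum(len(references.get(p, set()) & dst_set) for p in src)


def coupling_strength(group_a, group_b, references):
    """Sum of shared references over all cross-discipline paper pairs."""
    return sum(
        len(references.get(p, set()) & references.get(q, set()))
        for p, q in product(group_a, group_b)
    )


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


print(direct_citation_count(discipline_a, discipline_b, references))  # A cites B
print(direct_citation_count(discipline_b, discipline_a, references))  # B cites A
print(coupling_strength(discipline_a, discipline_b, references))
# Hypothetical topic-distribution vectors for the two disciplines.
print(cosine([0.6, 0.3, 0.1], [0.5, 0.4, 0.1]))
```

How such counts and similarities are normalized and combined into a single interaction strength is a design choice left open in this sketch.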

To demystify the characteristics of knowledge integration in interdisciplinary research from the perspective of knowledge content, Wang et al. (2022) examined the performance of the BERT model on an eHealth dataset in representing integrated knowledge phrases shared between citation sentences and cited references and in classifying them into given knowledge categories.

Conclusions

In the AI + Informetrics workshop and this special issue, we noted the enthusiasm of the global community, which demonstrated a strong willingness and interest in practicing this challenging but creative combination and recombination: developing and applying AI-empowered computational models for information studies. In this special issue, we are glad to showcase some successful pilot studies, e.g., various embedding techniques for knowledge representation, heterogeneous network analytics for discovering in-depth social connections among scientific researchers, and task-driven classifier selection and comparisons. We anticipate that AI + Informetrics will be further strengthened by disruptive methodological innovation through AI’s vertical involvement, and will create impactful and insightful empirical evidence for broad decision-making scenarios.