Do academic inventors have diverse interests?

Academic inventors bridge science and technology, and have attracted increasing attention. However, little is known about whether they have more diverse research interests than researchers with a single role, and whether their important position for science–technology interactions correlates with their diverse interests. To address these questions, we describe a rule-based approach for matching and identifying academic inventors, and use an author interest discovery model with credit allocation schemes to measure the diversity of each researcher's interests. Extensive empirical results on the DrugBank dataset provide several valuable insights. Contrary to our intuitive expectation, the research interests of academic inventors are the least diverse, while those of authors are the most diverse. In addition, the network position of a researcher is related to the diversity of their research interests. More specifically, degree centrality has a significant positive correlation with the diversity of interests, and constraint has a significant negative correlation. A significant but weaker negative correlation can also be observed between the diversity of research interests of academic inventors and their closeness centrality. The normalized betweenness centrality seems to be independent of interest diversity. These conclusions help explain, from the perspective of research interests, the mechanisms behind the important position of academic inventors in science–technology interactions.

Historically, Narin and Noma (1985) pioneered the study of linkages between scientific publications and patents by analyzing nonpatent references (NPRs) on the front pages of patent documents. Meyer (2000) also worked extensively with NPRs, mostly in the field of nanotechnology. Apart from citations from patents to scholarly articles, Glänzel and Meyer (2003) explored the citations of patents in scientific publications, and Huang et al. (2015) exploited two-way citations between papers and patents. However, only about 30-40% of patent documents contain NPRs, and chemistry-related research dominates the citations from academic articles to patents (Glänzel & Meyer, 2003).
As for lexical- and topic-based linkages, a popular pipeline research framework (Ba & Liang, 2021; Shibata et al., 2010; Xu et al., 2012, 2019, 2020) is to extract the respective thematic structures from scholarly articles and patents, to calculate the similarities between them, and then to construct topic linkages. However, the performance of such a framework is inadequate (Shibata et al., 2010; Xu et al., 2012), since noncomparable themes with different distributions are generated from scientific publications and patents (Xu et al., 2019). This makes it difficult to link the uncovered themes according to calculated similarities alone. Although a joint research framework has been developed by Xu et al. (2021c) on the basis of topic models for multiple collections of documents, lexical- and topic-based linkages often require advanced text mining and machine learning techniques.
Academic inventors are known to author scientific publications and patent inventions simultaneously. In other words, these researchers have two roles: authorship and inventorship. The relationship between their publishing and patenting activities has been investigated in the literature, and the two activities are found to be complementary rather than substitutive (Azoulay et al., 2009; Stephan et al., 2007; Thursby et al., 2007). Recent studies have even observed an inverted-U-shaped pattern (Crespi et al., 2011; Kang et al., 2020), in which, beyond a certain level of commercial engagement, patenting starts to substitute for publishing. Compared with their noninventing/nonpublishing peers, academic inventors tend to perform better in terms of publication/patent counts, citation frequency, and h-index (Guan & Wang, 2010; Meyer, 2006; Van Looy et al., 2006).
With the development of social network theories and methods, several studies have mapped researchers to interconnected nodes in a network on the basis of their coauthoring and coinventing behaviors. The node position importance (Balconi et al., 2004; Zamzami et al., 2015; Zhang et al., 2019) and key role as gatekeepers (Breschi & Catalini, 2010; Li et al., 2020; Lissoni, 2010) of academic inventors in scientific and technological (S&T) networks have also been investigated. However, little is still known about the characteristics of academic inventors, especially their research interests, and about the relation between their interests and their position in S&T networks. For this purpose, we have identified the following open questions:
• Do academic inventors have more diverse research interests than those with a single role?
• Does the position of academic inventors in S&T networks correlate with their diverse research interests?
This article is arranged as follows: after the literature review is briefly introduced, our research framework and methodology are put forward. Then, several core modules, such as the identification of academic inventors, interest discovery models, and diversity indicators, are described in more detail. Finally, extensive experiments are conducted on the DrugBank dataset to obtain several valuable insights about diversity of research interests and the relation between interest diversity and position characteristics.

Related work
Before delving into more specifics, the literature pertinent to academic inventors, interest discovery models, and concepts and measurements of diversity is discussed.

Academic inventors
The linkages between science and technology are attracting increasing attention. To exploit these interactions, one research stream on science-technology linkages mainly focuses on researchers active in both academia and industry, and a number of different terms have been used, such as "inventor-author" (Boyack & Klavans, 2008; Noyons et al., 1994; Zhang et al., 2019), "author-inventor" (Wang & Guan, 2011), "patenting-publishing scientist" (Breschi & Catalini, 2010), and "academic inventor" (Balconi et al., 2004; Forti et al., 2013; Lissoni, 2010). Here, academic inventor is used to collectively refer to this type of researcher.
To identify this kind of researcher, previous studies have used the following strategies: (1) Czarnitzki et al. (2016) observed that the title "Prof. Dr." was usually taken as a name affix in German, so they searched for this title in the inventor field; (2) when a list of staff in universities and research institutes is available, each individual in this list can be linked with the inventors in patent documents (Azoulay et al., 2009; Carayol & Carpentier, 2021; Ejermo & Toivanen, 2018; Hvide & Jones, 2018); (3) the authors of scientific publications can be directly linked with the inventors in patent documents (Boyack & Klavans, 2008; Breschi & Catalini, 2010; Forti et al., 2013; Lissoni, 2010; Maraut & Martínez, 2014; Noyons et al., 1994; Wang & Guan, 2011; Zhang et al., 2019). The first two strategies mainly focus on employees of universities and institutes with patenting activity. The third strategy relaxes this limitation, which enables it to encompass professors, researchers, and engineers with both publishing and patenting activities. Strictly speaking, this study follows the third strategy, but our academic inventors (see Dataset) all happen to be affiliated with at least one academic institution.

Whichever strategy is adopted for academic inventors, the names of authors and inventors first need to be disambiguated. Many different solutions have been put forth in the literature for author name ambiguity (Caron & van Eck, 2014;Han et al., 2017;Kim, 2018;Torvik & Smalheiser, 2009;Xu et al., 2021b). Most of them follow a two-step process: feature extraction and clustering/classifying. Similarly, inventor name disambiguation is an important issue for patent data. Correspondingly, many approaches have been proposed to disambiguate inventor names (Li et al., 2014;Pezzoni et al., 2014;Raffo & Lhuillery, 2009;Yang et al., 2017). Different from disambiguating author names, most of these are three-step processes: parsing, matching, and filtering, such as the Massacrator© algorithm (Lissoni et al., 2006;Pezzoni et al., 2014).
After disambiguating the authors and inventors, an automatic method can be used to match and identify the academic inventors by text content similarity (Cassiman et al., 2007) or string matching (Boyack & Klavans, 2008). Another common way is to compare the list of disambiguated authors and inventors in a semi-automatic way (Breschi & Catalini, 2010;Li et al., 2020;Lissoni, 2010;Wang & Guan, 2011). More specifically, after given and family names of each author and inventor are normalized, desktop research is conducted, including manually checking and leveraging extra information from other databases, the internet, or questionnaires. Although this kind of conservative approach is time consuming, the reliability of linkage results is convincing.
Once the academic inventors have been identified, many studies have explored whether there is a balance between patenting and publishing activities. Several empirical investigations found that increased patenting activities may undermine the performance of basic research (Agrawal & Henderson, 2002; Blumenthal et al., 1997; Fabrizio & Di Minin, 2008). Another stream of empirical investigations showed that patenting activities were positively related to the number and quality of publications (Azoulay et al., 2007; Grimm & Jaenicke, 2015; Van Looy et al., 2006). Recent studies suggested that there was a curvilinear (inverted-U) correlation between patenting and publishing activities (Crespi et al., 2011; Lee, 2019; Kang et al., 2020). In more detail, an increase in patenting activity initially raises the number and quality of publications up to a peak, and beyond this peak, it lowers them.
Some studies have also focused on the special role of academic inventors, who bridge science and technology, and the network structure and position characteristics have been widely measured. In the coauthorship and coinventorship network, academic inventors have more central and better connected positions (Balconi et al., 2004;Forti et al., 2013;Zamzami & Schiffauerova, 2015). It is highly likely that these significant network characteristics should be attributed to their role as the gatekeepers between science and technology (Breschi & Catalini, 2010;Lissoni, 2010). Zhang et al. (2019) and Li et al. (2020) found that academic inventors promote the knowledge transfer between science and technology. In addition, academic inventors also play an important role in entrepreneurial firm development (Murray, 2004), breakthrough scientific research (Winnink & Tijssen, 2014), and technological innovation processes (Quatraro & Scandura, 2019).

Interest discovery models
Every researcher has their own research interests, which can be readily obtained from the curriculum vitae (CV) of the focal researcher. However, since CVs may not be regularly updated and many are not available on the internet, several data-driven topic models for discovering interests from research outputs have been proposed in the literature.
One popular model is the Author-Topic (AT) model (Rosen-Zvi et al., 2010), which integrates author information into the standard Latent Dirichlet Allocation (LDA) model (Blei et al., 2003). Several variants have since been proposed, such as the Author-Persona-Topic (APT) model (Mimno & McCallum, 2007), the Author-Interest-Topic (AIT) model (Kawamae, 2010), and the Author-Topic over Time (AToT) model (Shi et al., 2013; Xu et al., 2014a, b). In these models, each research output is modeled as if it were generated by a two-stage stochastic process. A researcher's interests are represented by a multinomial distribution over topics, and each topic is represented as a multinomial distribution over words. The probability distribution over topics in a multi-author paper or multi-inventor patent is a mixture of the distributions associated with its authors or inventors.
All these models are actually members of generative probabilistic topic models for uncovering main themes from a collection of documents (Blei, 2012). Hence, each model can be viewed as a generative process. For example, in the AT model (Rosen-Zvi et al., 2010), to generate each word in a document, a researcher index is uniformly drawn from its author/inventor list. Then, a topic index is drawn from their multinomial distribution over topics (viz. research interests). Finally, a word token is drawn from the multinomial distribution of that topic.
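The generative story of the AT model can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of Rosen-Zvi et al. (2010): the function name `generate_document`, the toy distributions `theta` and `phi`, and the researcher names are all hypothetical.

```python
import random


def generate_document(authors, theta, phi, n_words, rng):
    """Generate one document following the AT model's generative story.

    authors: researcher ids on the byline (hypothetical ids)
    theta:   dict researcher id -> list of topic probabilities (interests)
    phi:     list with one dict per topic, mapping word -> probability
    """
    tokens = []
    for _ in range(n_words):
        # Step 1: draw a researcher uniformly from the author/inventor list.
        x = rng.choice(authors)
        # Step 2: draw a topic from that researcher's distribution over topics.
        z = rng.choices(range(len(theta[x])), weights=theta[x])[0]
        # Step 3: draw a word token from the distribution of that topic.
        words = list(phi[z])
        w = rng.choices(words, weights=[phi[z][v] for v in words])[0]
        tokens.append((x, z, w))
    return tokens


# Toy parameters, invented purely for illustration.
theta = {"alice": [0.9, 0.1], "bob": [0.2, 0.8]}
phi = [{"kinase": 0.7, "inhibitor": 0.3}, {"tablet": 0.6, "dosage": 0.4}]
doc = generate_document(["alice", "bob"], theta, phi, n_words=8,
                        rng=random.Random(0))
```

Step 1 is where the uniform-byline assumption enters: every researcher on the byline is equally likely to be made responsible for a word.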
From the generative process above, it is not difficult to see that these models share the same assumption: the author/inventor list of a document is uniformly distributed. Nowadays, the knowledge needed to address research problems is increasingly diverse and specialized (Leahey, 2016), and increasing cooperation in science and technology is a general trend (Adams et al., 2005; Wuchty et al., 2007). It is therefore inappropriate to implicitly assume that each coauthor/coinventor contributes equally to a target document. For this reason, the AT credit model (Xu et al., 2021a, 2022), which extends the AT model with a credit allocation schema, is adopted in this work.

Diversity: concept and measurement
In real-world scenarios, many instances of diversity can be observed, such as diverse ecological species, diverse crystal structures, and diverse disciplines in the science and technology field. Stirling (2007) argued that diversity is a characteristic of any system whose elements can be apportioned into categories. Further, three basic properties, "variety," "balance," and "disparity," were proposed (Stirling, 2007), each of which is a necessary but insufficient property for diversity.
• Variety is the number of categories to which the elements in a focal system are assigned. It can be quantified as an integer (enumerating categories). When all else is equal, the greater the variety, the greater the diversity.
• Balance is a function of the pattern of assignment of elements across categories. It can be quantified as a vector of fractions summing to unity (apportioning elements). When all else is equal, the more even the balance, the greater the diversity.
• Disparity is the degree to which categories in a focal system are different from each other. It can be quantified as a matrix of distances (differentiating elements). When all else is equal, the greater the disparity, the greater the diversity.
To measure the diversity of a system of interest, Rao (1982) and Stirling (2007) presented a general quantitative nonparametric heuristic indicator, Rao-Stirling. To the best of our knowledge, this was the first systematic and transparent approach in the treatment of scientific and technological diversity in a broad range of fields. Since then, many alternatives have been proposed in the literature. Generally speaking, these indicators can be divided into three groups according to the basic properties above: (a) measures sensitive to balance, (b) measures sensitive to balance and disparity, and (c) measures sensitive to variety, balance, and disparity.
The measures sensitive to balance mainly focus on the distribution of different categories of elements in the system, such as Shannon entropy (Shannon, 1950) and Simpson diversity (Simpson, 1949). This type of indicator makes the following implicit assumption: the categories of system elements are completely different from each other. This is not in line with many real-world scenarios. The measures sensitive to balance and disparity consider the balance and disparity of system elements at the same time. Two closely related instances of this type of measure are the Rao-Stirling (Rao, 1982; Stirling, 2007) and $^{2}D^{S}$ (Zhang et al., 2016) indicators. The measures sensitive to variety, balance, and disparity, as their name implies, simultaneously operationalize all three basic properties. The DIV (Leydesdorff et al., 2019) is one such indicator, and its superiority has been validated by Bu et al. (2020). Hence, the Rao-Stirling and DIV are both utilized here to calculate the diversity of interests of academic inventors and their peers.
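As a small illustration of the balance-sensitive measures, the sketch below computes Shannon entropy and a Simpson-type diversity for two toy interest distributions. It assumes the natural logarithm for the entropy and the Gini-Simpson form (one minus the sum of squared shares) for Simpson diversity; the function names and numbers are illustrative only.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def simpson_diversity(p):
    """Gini-Simpson form: one minus the sum of squared shares."""
    return 1.0 - sum(x * x for x in p)


even = [0.25, 0.25, 0.25, 0.25]      # perfectly balanced interests
skewed = [0.85, 0.05, 0.05, 0.05]    # one dominant interest

print(shannon_entropy(even), shannon_entropy(skewed))
print(simpson_diversity(even), simpson_diversity(skewed))
```

Both functions see only the shares, so relabeling or reordering the topics leaves the values unchanged. No distance between categories enters the calculation, which is the implicit assumption noted above.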

Research framework and methodology
To answer the research questions in the Introduction, our research framework consists of three phases, as shown in Fig. 1. After disambiguating the names of authors and inventors, and linking and identifying the academic inventors in the first phase, the second phase measures the node characteristics of authors, inventors, and academic inventors with the help of social network analysis. In this phase, the research interests of each researcher are also discovered by the AT credit model (Xu et al., 2021a, 2022), and then the diversity of each researcher's interests is measured by the Rao-Stirling and DIV indicators. Finally, we analyze the correlation between the node characteristics and the interest diversity in the last phase. In the following subsections, several core modules will be described in more detail.

Identifying academic inventors
To identify academic inventors, the names of authors and inventors must first be disambiguated. To the best of our knowledge, neither the authors in several bibliographic databases (such as Web of Science and Scopus) nor the inventors in the intellectual property databases [such as United States Patent and Trademark Office (USPTO) and European Patent Office (EPO)] are fully unambiguously identified. Hence, a revised rule-based scoring and clustering method (Xu et al., 2021b) is utilized here for disambiguating the authors. As for the inventors, we adopted a semi-automatic method. More specifically, after the first and last names of each inventor were split and checked, the inventors were disambiguated by several manually curated rules on the basis of the applicants, co-inventors, address, theme of the resulting patent, and so on.
To identify academic inventors, we first excluded the nonindividual entities in the inventor field, such as research team (laboratory/group/international organizations/institute), company (Co./Corp./LLC/PLC/AG/GmbH), university (Univ.), hospital, etc. Then, we matched the last name and initials of each pair of author and inventor. This step groups the paired researchers as follows: (a) inconsistent pairs are filtered out, such as pairs 1 and 2 in Table 1; (b) consistent pairs are linked directly to an academic inventor, such as pair 3 in Table 1; (c) ambiguous pairs, such as pairs 4, 5, and 6 in Table 1, are manually checked for overlap in factors including the author's affiliation and the assignee, coauthors and coinventors, themes of the resulting publication and patent, and so on. For example, pairs 4 and 5 in Table 1 share the same research institutions and coauthor information, so we identify them as academic inventors. No evidence can be found that the two names in pair 6 in Table 1 refer to the same individual, so they cannot be linked.
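The grouping of author-inventor pairs can be sketched as follows. The exact rules behind Table 1 are not spelled out here, so the sketch rests on stated assumptions: names are written as "Last, Given", a pair is inconsistent when last names, initials, or spelled-out given names disagree, it is consistent only when the given names are identical and the first one is spelled out, and everything else is left for manual checking. The function names and example names are hypothetical.

```python
def parse_name(full_name):
    """Split 'Last, Given M.' into (last name, list of given-name tokens)."""
    last, _, given = full_name.partition(",")
    tokens = [t.lower() for t in given.replace(".", " ").split()]
    return last.strip().lower(), tokens


def classify_pair(author, inventor):
    """Return 'inconsistent', 'consistent', or 'ambiguous' (assumed rules)."""
    a_last, a_given = parse_name(author)
    i_last, i_given = parse_name(inventor)
    if a_last != i_last or not a_given or not i_given:
        return "inconsistent"
    for a_tok, i_tok in zip(a_given, i_given):
        if a_tok[0] != i_tok[0]:
            return "inconsistent"      # initials disagree
        if len(a_tok) > 1 and len(i_tok) > 1 and a_tok != i_tok:
            return "inconsistent"      # spelled-out names disagree
    if a_given == i_given and len(a_given[0]) > 1:
        return "consistent"            # identical, first name spelled out
    return "ambiguous"                 # needs manual checking


print(classify_pair("Doe, Jane M.", "Doe, Jane M."))  # consistent
print(classify_pair("Doe, Jane M.", "Doe, J."))       # ambiguous
print(classify_pair("Doe, Jane", "Doe, John"))        # inconsistent
```

Only the ambiguous group then requires the manual comparison of affiliations, collaborators, and themes described above.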

Interest discovery model
To discover the research interests of researchers objectively and accurately, this work adopts the AT credit model (Xu et al., 2021a, 2022), which makes use of authorship credit allocation. This model generalizes the AT model by introducing a set of hidden random variables for authorship credit. When the indiscriminate counting scheme is adopted, it degenerates into the AT model (Rosen-Zvi et al., 2010). Though many credit allocation schemes have been put forward in the literature, this study prefers the sequence-determines-credit (SDC) schema (Tscharntke et al., 2007) because (a) the SDC schema takes "hyper-authorship" (more than ten coauthors or coinventors) into consideration and (b) this schema effectively combines the advantages of the harmonic counting scheme (Hagen, 2013) and the indiscriminate counting scheme. The graphical model representation of the AT credit model is illustrated in Fig. 2. Here, $K$, $M$, and $A$ represent the number of topics, documents, and unique authors/inventors, respectively. $\vec{\varphi}_k$ and $\vec{\vartheta}_a$ denote the multinomial distribution of words specific to the topic $k$ and of topics specific to the author/inventor $a$, respectively. $\vec{\alpha}$ and $\vec{\beta}$ are the Dirichlet hyperparameters. The byline information of the document $m$ is encoded in the variable $\vec{a}_m$, and $\vec{c}_m$ assigns the authorship credit to each coauthor/coinventor in the document $m$ according to a specified schema and its parameter. In addition, $z_{m,n}$ and $x_{m,n}$ are the topic and author/inventor assignments associated with the $n$-th word token $w_{m,n}$ in the document $m$.
The model can also be described from the viewpoint of generative process as follows.
First, the multinomial distributions $\vec{\varphi}_k$ ($k \in [1, K]$) and $\vec{\vartheta}_a$ ($a \in [1, A]$) are drawn respectively from Dirichlet($\vec{\beta}$) and Dirichlet($\vec{\alpha}$). Then, the authorship credits $\vec{c}_m$ are calculated for each document $m \in [1, M]$ by following a designated authorship credit allocation schema and its parameter. Finally, for each document $m \in [1, M]$ and each word token $n \in [1, N_m]$ in the document $m$, $x_{m,n}$ is drawn from $\vec{c}_m$, $z_{m,n}$ from $\vec{\vartheta}_{x_{m,n}}$, and then $w_{m,n}$ from $\vec{\varphi}_{z_{m,n}}$. As with many Bayesian models, posterior inference cannot be done exactly in this model. The collapsed Gibbs sampling algorithm was originally utilized in Xu et al. (2021a) and Xu et al. (2022) to approximate the posterior of the AT credit model. Please refer to Xu et al. (2021a) and Xu et al. (2022) for more detail. In this work, the symmetric Dirichlet priors $\alpha$ and $\beta$ are set at 0.5 and 0.01, respectively. The collapsed Gibbs sampling is run for 2000 iterations, including 500 for the burn-in period.
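To make the role of the credit vector concrete, the sketch below builds credit vectors under two of the schemes named above, indiscriminate counting and harmonic counting, and draws author/inventor assignments from them. It is a minimal illustration under stated assumptions: the SDC schema itself is not reproduced, harmonic credit is taken to be proportional to 1/rank in the byline, and all function names are hypothetical.

```python
import random


def indiscriminate_credit(n_authors):
    """Every coauthor/coinventor receives the same share."""
    return [1.0 / n_authors] * n_authors


def harmonic_credit(n_authors):
    """Share proportional to 1/rank in the byline, normalized to sum to 1."""
    raw = [1.0 / rank for rank in range(1, n_authors + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def sample_assignments(byline, credit, n_words, rng):
    """Draw one author/inventor assignment per word token from the credits."""
    return rng.choices(byline, weights=credit, k=n_words)


byline = ["first", "second", "third"]
rng = random.Random(0)
uniform_x = sample_assignments(byline, indiscriminate_credit(3), 1000, rng)
harmonic_x = sample_assignments(byline, harmonic_credit(3), 1000, rng)
print(harmonic_credit(3))          # shares decrease with byline position
print(harmonic_x.count("first"), uniform_x.count("first"))
```

With indiscriminate credits, the draw of the assignment is uniform over the byline, which is exactly the AT model's first step. Any other scheme shifts word tokens, and hence inferred interests, toward the researchers who receive more credit.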

Diversity indicators
Since the Rao-Stirling (Rao, 1982; Stirling, 2007) and DIV (Leydesdorff et al., 2019) indicators each consider at least two basic properties of a system simultaneously, they were adopted to measure the diversity of interests of each researcher in this work. A larger value of either measure indicates more diverse interests. They can be defined formally as follows:
$$\text{Rao-Stirling}_a = \sum_{i \neq j} \vartheta_{a,i}\, \vartheta_{a,j}\, d_{ij}$$
$$\text{DIV}_a = \frac{n_{a,k}}{K} \times (1 - \text{Gini}_a) \times \sum_{i \neq j} \frac{d_{ij}}{n_{a,k}\,(n_{a,k} - 1)}$$
Here, $i \in [1, K]$ and $j \in [1, K]$ represent any two different topics. $\vartheta_{a,i}$ and $\vartheta_{a,j}$ are the probabilities of the topics $i$ and $j$ specific to the author/inventor $a$, respectively. $n_{a,k}$ denotes the number of themes that the author/inventor $a$ prefers, and the sum in DIV runs over these themes. $(1 - \text{Gini})$ indicates the balance of the diversity, and $d_{ij}$ is the degree of difference (i.e., disparity of the diversity) between the themes $i$ and $j$. In our work, the disparity $d_{ij}$ between the themes $i$ and $j$ is operationalized with symmetrized Kullback-Leibler (KL) divergence, Jensen-Shannon (JS) divergence, and cosine distance. See Table 2 for more detail on how to operationalize the disparity.
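A minimal sketch of how the two indicators might be computed for one researcher is given below. Several choices are assumptions rather than details taken from this study: Rao-Stirling is computed as the sum over ordered pairs of different topics of the product of the two topic probabilities and their disparity; DIV follows the variety × balance × average disparity form of Leydesdorff et al. (2019); a theme counts as "preferred" when its probability exceeds a hypothetical `threshold`; the Gini coefficient is taken over all K topic probabilities; and cosine distance between topic-word vectors stands in for the disparity.

```python
import math


def cosine_distance(u, v):
    """One minus the cosine similarity of two topic-word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def rao_stirling(theta, d):
    """Sum over ordered pairs i != j of theta_i * theta_j * d_ij."""
    k = len(theta)
    return sum(theta[i] * theta[j] * d[i][j]
               for i in range(k) for j in range(k) if i != j)


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    return sum((2 * (i + 1) - n - 1) * x for i, x in enumerate(xs)) / (n * sum(xs))


def div(theta, d, threshold=0.05):
    """DIV = variety * balance * average disparity (assumed operationalization)."""
    k = len(theta)
    preferred = [i for i in range(k) if theta[i] > threshold]
    n = len(preferred)
    if n < 2:
        return 0.0                       # no pair of preferred themes
    variety = n / k
    balance = 1.0 - gini(theta)
    disparity = sum(d[i][j] for i in preferred for j in preferred
                    if i != j) / (n * (n - 1))
    return variety * balance * disparity


# Toy topic-word vectors and interest distribution (illustrative values).
phi = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
d = [[cosine_distance(u, v) for v in phi] for u in phi]
theta_even = [1 / 3, 1 / 3, 1 / 3]
print(rao_stirling(theta_even, d), div(theta_even, d))
```

In this toy case the three topics are mutually orthogonal, so every disparity is 1 and the evenly spread researcher reaches the maximum DIV of 1, while a researcher concentrated on a single topic would score 0 on both indicators.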

Dataset
It is well known that the research and development (R&D) of a novel drug often involves rich scientific knowledge, intellectual property protection, and reliable clinical trials. Thus, science-technology interactions in the pharmaceutical field are prominent (Glänzel & Meyer, 2003). Hence, this work used the DrugBank¹ database (version: 1 November 2019), the world's largest online database of drug and drug-target information, as our dataset. The DrugBank database is a free-to-access resource for academic users. Each drug in this database may be covered by multiple patents and has an attached list of scholarly articles, which provides an opportunity for further science-technology linkage research. The DrugBank dataset was downloaded on 1 November 2019 in XML and parsed into a MySQL database. There are 13,339 unique drugs, 10,355 unique scientific publications, and 5,932 granted patents in this dataset. Although the drugs can act as a bridge to link scientific publications and patents, not all drugs have explicitly attached scientific or technological knowledge. In this dataset, 2768 drugs are linked with 10,836 scientific publications, 1026 drugs with 7880 granted patents, and 804 drugs with both 3713 scholarly articles and 6535 granted patents. Note that this study does not merge closely related patents that derive from the same core technology but were issued by different authorities into a patent family. Figure 3 illustrates the linkage relations between drugs and scientific publications (a), and between drugs and patents (b). More specifically, the number of unique articles and patents (y axis) that are linked to k drugs is plotted as a function of k (x axis). A power-law-like distribution of the number of scholarly articles and patents can be noted from Fig. 3. That is to say, the vast majority of drugs are linked to few articles or patents, but several drugs are associated with a large number of articles or patents.
For example, Imidacloprid, a neonicotinoid insecticide, is linked to as many as 71 scientific publications, and Metformin, an oral blood glucose-lowering drug first approved in Canada in 1972 and in the USA in 1995, is associated with 83 patents. The numbers of unique academic articles and patent documents attached to drugs are 10,257 and 5930, respectively.
Fig. 3 The number of scientific publications (a) and patents (b) (y axis) linked to drugs (x axis). Both axes are shown on a log scale. The power-law-like distribution is evident from the near-linear pattern (in log space)
Figure 4 illustrates the procedure for determining the academic inventors in this study. This procedure considers only the scientific publications and patents directly attached to each drug, and the academic inventors are limited to the intersection between the authors of the scientific publications and the inventors of the patents. It is worth mentioning that the authors of scientific publications could be patenting beyond the DrugBank dataset, and the inventors of patents could be publishing beyond this dataset. This study does not take these situations into consideration.² Hence, the number of academic inventors in this study may be underestimated, and the results from this study should be interpreted with some caution.
Given that the DrugBank dataset only records the patent numbers and PubMed Unique Identifiers (PMIDs), we further collected the title, abstract, inventor and author list, and other related information for each granted patent and scientific publication from the EPO database with the OPS API³ and the PubMed database with the E-Fetch API.⁴ To identify academic inventors, the name of each author and inventor was disambiguated with a revised rule-based scoring and clustering method (Xu et al., 2021b) and checked manually. After this operation, we obtained 43,087 unique authors and 8738 unique inventors. Then, by following the procedure in Identifying academic inventors, we finally obtained 805 unique academic inventors. That is, academic inventors account for 1.9% and 9.2% of authors and inventors, respectively.
These figures are lower than those observed in previous studies, such as Breschi and Catalini (2010) and Carayol and Carpentier (2021). Since this study does not consider the scholarly articles and patents beyond the DrugBank dataset, it may result in an underestimation of academic inventors. In addition, the number of authors on a scientific publication is frequently higher than that of inventors listed in a patent (Breschi & Catalini, 2010;Lissoni & Montobbio, 2008), so the proportion of academic inventors among authors is lower than among inventors. This point can be validated from the distribution of the number of documents with the number of authors and inventors in Fig. 5. The average number of coauthors per article (4.93) is larger than that of coinventors per patent (3.87).

Fig. 4 Procedure for determining the academic inventors
² These situations were not considered in this study because (1) as mentioned in Identifying academic inventors, the authors in several bibliographic databases and the inventors in the intellectual property databases are not disambiguated at all, which makes these situations difficult to handle when only using the search interface provided by these databases, and (2) to estimate the number of academic inventors beyond the DrugBank dataset, this study randomly drew 1500 researchers who are solely authors and 1000 who are solely inventors, and then manually checked the patents and publications retrieved for them from the EPO and PubMed databases, respectively. Only one researcher (Anhalt, Grant J.) was identified as an academic inventor. That is to say, the rate of academic inventors beyond the DrugBank dataset is about 0.04%. Therefore, we argue that the classification of individuals into three groups (academic inventors, solely authors, solely inventors) should not affect the main conclusions regarding the topic interests of each group in this study.
³ http://ops.epo.org/
⁴ https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch
Intuitively, the incentive and funding systems in different countries may affect the propensity of each individual to either publish or patent, or both. Hence, this study further collected the country information of each author and inventor. As for the authors affiliated with multiple countries, we only kept the country of the first affiliation of each author in the byline information of the resulting scientific publications. In the end, the authors in our DrugBank dataset come from 132 countries, and the inventors from 35 countries. The authors and inventors in the USA (63.31% versus 55.30%) dominate, followed by the UK (6.64% versus 4.99%), Japan (5.55% versus 13.03%), and Germany (4.54% versus 8.15%).
The pre-processing steps in this study are very similar to those in Xu et al. (2021b, 2021c). The sentences in the titles and abstracts were detected with geniass (Saetre et al., 2007), and then the split sentences were tokenized and lemmatized with geniatagger (Tsuruoka et al., 2005). Stopwords were filtered with the English stopword list from the Natural Language Toolkit (NLTK), and all numbers were replaced with a special word NUMBER. To reduce the interference of unrelated information, copyright information was removed with human-curated rules based on regular expressions.
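The filtering steps can be sketched as follows. This is a simplified stand-in, not the pipeline used in this study: sentence splitting and lemmatization with geniass and geniatagger are omitted, a short hand-written stopword set replaces the NLTK list, and the copyright pattern is a hypothetical example of a regular-expression rule.

```python
import re

# Stand-in stopword set; the study uses NLTK's English stopword list.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "with", "was"}

# Hypothetical rule: drop everything from a copyright marker to the end.
COPYRIGHT_RE = re.compile(r"(?:©|\(c\)|copyright)\s.*$", re.IGNORECASE)
TOKEN_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*|\d+(?:[.,]\d+)*")
NUMBER_RE = re.compile(r"^\d+(?:[.,]\d+)*$")


def preprocess(text):
    """Remove copyright text, drop stopwords, and map numbers to NUMBER."""
    text = COPYRIGHT_RE.sub("", text)
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if NUMBER_RE.match(tok):
            tokens.append("NUMBER")
        elif tok.lower() not in STOPWORDS:
            tokens.append(tok)
    return tokens


sample = ("Metformin was approved in 1972 for the treatment of diabetes. "
          "Copyright 2019 Elsevier.")
print(preprocess(sample))
```

Replacing every number with one placeholder keeps the vocabulary small, so that dosages, years, and counts do not each become a separate word type in the topic model.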

Descriptive statistics
To make the comparison of the three types of researchers fair, we kept scientific publications and patents with at least one academic inventor for further analysis. That is to say, only academic inventors and their coauthors and coinventors were included in our final dataset. In the end, this dataset included 603 scientific publications and 1285 patent documents, involving 805 academic inventors, 4088 solely publishing peers, and 1803 solely patenting peers. Hereinafter, for convenience, the solely publishing peers and solely patenting ones are referred to as researchers with a single role.
Among the authors, 16.45% of them also applied for patents, and among the inventors, 30.87% of them also published academic articles. This is similar to observations in the fields of nanoscience and fuel cells (Guan & Wang, 2010;Klitkou et al., 2007;Meyer, 2006). Then, we compared the publishing performance of academic inventors with solely publishing peers in terms of the number of articles per author, and their patenting performance with solely patenting peers in terms of the number of patents per inventor. As a whole, most academic inventors are highly productive researchers. In more detail, the academic inventors are superior to their solely publishing peers (1.42 > 1.10) and solely patenting peers (3.11 > 2.44). This is consistent with the findings of Guan and Wang (2010), but is different from those of Meyer (2006).
In our dataset, 521 (64.72%) academic inventors participated in the basic and applied research of drugs at the same time. Surprisingly, 24 academic inventors contributed to the successful delivery of Ombitasvir, an antiviral medication used as part of combination therapy to treat chronic hepatitis C. Furthermore, these academic inventors are more inclined to apply for patents than to publish academic papers: the number of articles per author and that of patents per inventor are 1.43 and 3.47, respectively. For example, Soni, Paresh patented 23 inventions on methods of treating and/or preventing cardiovascular-related disease, and published three articles on the pharmacokinetics and/or clinical application of icosapent ethyl for the treatment of hypertriglyceridemia, which is an important risk factor for cardiovascular-related diseases.
Figure 6 illustrates the country distribution of academic inventors, solely publishing peers, and solely patenting peers. Researchers from the USA dominate, followed by Japan, the UK, and Germany. On closer examination, three interesting phenomena can be observed: (1) researchers from Japan and Germany tend to patent rather than publish; (2) researchers from the USA and the UK are more inclined to publish articles; and (3) in Italy and Sweden, no significant differences between publishing and patenting were observed. In our opinion, these phenomena may be related to the incentive and funding systems of each country.

Network size and node characteristics
For an overall intuitive understanding of the three types of researchers, we constructed coauthorship, coinventorship, and hybrid networks. Several statistics are reported in Table 3. The numbers of components and isolated nodes in the hybrid network are smaller than the sums of those in the other two networks. The giant component of the hybrid network contains many more nodes than the giant component of either the coauthorship or the coinventorship network. This shows that academic inventors can effectively bridge science and technology and connect more authors and inventors with each other, as shown in Fig. 7.
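The comparison above can be illustrated with a minimal sketch, assuming that the hybrid network is simply the union of the coauthorship and coinventorship edges and that academic inventors carry the same identifier in both networks. The toy names and the helper functions `build_graph`, `connected_components`, and `summarize` are hypothetical and are not taken from the original study.

```python
def build_graph(nodes, edges):
    """Undirected graph as adjacency sets; `nodes` keeps isolated researchers."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def connected_components(adj):
    """Return the connected components as a list of node sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], {start}
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        components.append(comp)
    return components


def summarize(adj):
    """Counts of the kind compared in the text (toy values, not Table 3)."""
    comps = connected_components(adj)
    return {
        "nodes": len(adj),
        "components": len(comps),
        "isolated": sum(1 for c in comps if len(c) == 1),
        "giant": max(len(c) for c in comps),
    }


# Toy data: "C" and "E" act as academic inventors (present in both networks).
coauthor_nodes = ["A", "B", "C", "D", "E"]
coauthor_edges = [("A", "B"), ("B", "C")]
coinventor_nodes = ["C", "E", "X", "Y", "Z"]
coinventor_edges = [("C", "X"), ("E", "Y")]

coauthor = build_graph(coauthor_nodes, coauthor_edges)
coinventor = build_graph(coinventor_nodes, coinventor_edges)
hybrid = build_graph(coauthor_nodes + coinventor_nodes,
                     coauthor_edges + coinventor_edges)

for name, graph in [("coauthorship", coauthor),
                    ("coinventorship", coinventor),
                    ("hybrid", hybrid)]:
    print(name, summarize(graph))
```

In this toy case the hybrid network has fewer components and isolated nodes than the two separate networks combined, and a larger giant component, because the shared nodes merge components that were previously disconnected.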
Four indicators in Table 4 ("degree centrality," "normalized betweenness centrality," "closeness centrality," and "constraint") are adopted to measure the importance and influence of researchers in the hybrid network, and the results are presented in Table 5. A higher degree centrality means that the researcher cooperated with more researchers. If a researcher lies on the shortest paths between more pairs of researchers who are not directly connected, their normalized betweenness centrality takes a higher value. Similarly, a researcher who occupies a more central position, with a shorter average distance to the other researchers, has a higher closeness centrality. A lower constraint implies that the researcher occupies a less constrained position and therefore brokers more extensively in the network. Table 5 shows that academic inventors are more sociable, more "in between," more centrally positioned, and more likely to span structural holes than their peers, which is in line with the observation of Breschi and Catalini (2010).
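As an illustration of how these four indicators behave, the following sketch computes them on a toy star-shaped network. It assumes the NetworkX library; the source does not state which software was used, and the graph is hypothetical rather than drawn from the DrugBank data.

```python
import networkx as nx

# Toy hybrid network (hypothetical): node 0 is a broker linked to four
# researchers (nodes 1-4) who are not connected to each other.
G = nx.star_graph(4)

degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G, normalized=True)
closeness = nx.closeness_centrality(G)
constraint = nx.constraint(G)  # Burt's constraint; lower means more brokerage

for node in G:
    print(node,
          round(degree[node], 3),
          round(betweenness[node], 3),
          round(closeness[node], 3),
          round(constraint[node], 3))
```

The broker has the highest degree, betweenness, and closeness centrality and the lowest constraint, which is the pattern the text attributes to academic inventors in Table 5.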

Research interest discovery
In this subsection, we first identify the number of interest topics, and then answer the question: do academic inventors have more diverse research interests than those with a single role?

Table 4 Node characteristic indicators
Degree centrality: Degree centrality for a node v is the fraction of nodes it is connected to.
Normalized betweenness centrality: Betweenness centrality of a node v is the sum of the fraction of all-pairs shortest paths that pass through v.
Closeness centrality: Closeness centrality of a node v is the reciprocal of the average shortest path distance to v over all other reachable nodes.
Constraint: Constraint is a measure of the extent to which a node v is invested in those nodes that are themselves invested in the neighbors of v.

Interest diversity
Two diversity indicators, Rao-Stirling and DIV, are utilized here to calculate the diversity of research interests of solely publishing authors, solely patenting inventors, and academic inventors. To determine the number of preferred research interest topics of an author/inventor ($n_{a,k}$) in the DIV indicator [Equation (2)], we set three cumulative probability thresholds of interest topics (0.80, 0.85, 0.90). Whichever threshold is used, the same conclusion can be drawn: solely publishing authors have the most diverse research interests, followed by solely patenting inventors, and the research interests of academic inventors are the least diverse. See Table 6 for more detail.
This observation does not seem to be in line with our intuitive understanding of academic inventors. In fact, we reconducted all experiments on the whole DrugBank dataset, and a similar conclusion was drawn (Table 11 in Appendix). In our opinion, the main reasons are twofold: (1) most patents on drugs come from pharmaceutical companies with a clear R&D goal, whereas authors can usually carry out free exploratory research and even constantly adjust their research directions according to hot themes; and (2) as science-technology gatekeepers, academic inventors span academia and industry. Correspondingly, their interests lie mainly at the interface between science and technology. In this way, they can seek a trade-off between research significance and the risk of patent uncertainty in a market economy. Hence, the scope of their interests may not be as diverse as that of researchers with a single role.
Fig. 9 The distribution of interest topics of solely publishing authors, solely patenting inventors, and academic inventors
To check whether the interest diversity varies by country, the researchers from the following five countries were further analyzed: the USA, Japan, the UK, Germany, and Italy (Table 7). Since the diversity of research interests is not sensitive to the threshold (Table 6), a threshold of 0.85 is utilized here. From Table 7, it is evident that the research interests of academic inventors are less diverse than those of researchers with a single role across countries. This indicates that the conclusion seems to be independent of national incentive and funding systems.
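Two steps described in this subsection can be sketched as follows: counting the preferred interest topics of a researcher under a cumulative probability threshold, and computing the Rao-Stirling diversity of a topic distribution. This is a minimal sketch under stated assumptions: the function names, the toy distributions, and the uniform disparity matrix are hypothetical, the Rao-Stirling form used is the common sum over ordered pairs of $p_i p_j d_{ij}$, and the DIV indicator of Equation (2) is not reproduced here.

```python
def preferred_topic_count(topic_probs, threshold=0.85):
    """Smallest number of top-ranked topics whose cumulative probability
    reaches `threshold` (a small tolerance absorbs rounding error)."""
    total = 0.0
    for count, p in enumerate(sorted(topic_probs, reverse=True), start=1):
        total += p
        if total >= threshold - 1e-12:
            return count
    return len(topic_probs)


def rao_stirling(topic_probs, disparity):
    """Rao-Stirling diversity: sum over ordered pairs i != j of p_i * p_j * d_ij."""
    n = len(topic_probs)
    return sum(topic_probs[i] * topic_probs[j] * disparity[i][j]
               for i in range(n) for j in range(n) if i != j)


# Toy interest distributions over four topics (illustrative, not from Table 6).
focused = [0.7, 0.2, 0.1, 0.0]
broad = [0.25, 0.25, 0.25, 0.25]

# Assumed disparity: every pair of distinct topics is equally dissimilar.
uniform_disparity = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]

for name, probs in [("focused", focused), ("broad", broad)]:
    print(name,
          preferred_topic_count(probs, threshold=0.85),
          round(rao_stirling(probs, uniform_disparity), 3))
```

With a uniform disparity matrix the Rao-Stirling value reduces to one minus the sum of squared probabilities, so the evenly spread distribution scores higher than the concentrated one. A real analysis would replace the uniform matrix with a disparity measure between topics.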

Distribution and preferred content of interest topics
To determine which interest topics the solely publishing authors, the solely patenting inventors, and the academic inventors prefer, Fig. 9 illustrates the distribution of interest topics. In Fig. 9, the horizontal axis denotes the topic identifier, and the vertical axis represents the average probability of a focal topic. Among the three types of researchers, the solely publishing authors have the most even probability distribution. This is consistent with their having the most diverse research interests (Table 6). Table 8 illustrates examples of 12 themes. As for academic inventors, Fig. 10 shows the top five interest topics in terms of average probability, with each topic accompanied by one researcher, one scientific publication, and one patent document. Theme 42 concerns drugs with a blocking function, which fall into two major classes: inhibitors and antagonists. Drugs of this kind reduce activity and alleviate symptoms. For example, R406 is an orally available spleen tyrosine kinase (Syk) inhibitor, which blocks Fc receptor signaling and reduces immune complex-mediated inflammation. It potentially modulates Syk activity in human disease (Braselmann et al., 2006). Theme 34 discusses antitumor drugs, which inhibit tumor cell growth and induce apoptosis, providing promising therapeutic options for patients. Theme 56 discusses antidiabetic drugs and focuses on insulin, which has always been the primary pharmacological agent for treating diabetes and preventing its complications (Falcetta et al., 2022). In addition, academic inventors are interested in antiviral drugs (Theme 23): two RNA viruses, human immunodeficiency virus (HIV) and hepatitis C virus (HCV), received the most attention. Both have similar blood and mother-to-child transmission routes, but act on different types of cells. The former mainly infects human immune cells, and the latter infects liver cells.
Theme 78 is clinical application and preparation of compound drugs. Once the clinical drug discovery is demonstrated, the researchers under this topic are more invested in the production of related drug reagents and products.
To verify whether the discovered research interests make sense, two representatives of each type of researcher were taken as examples, as reported in Table 9. Shen, Jianwei published two articles in our dataset on the metabolism and disposition of an inhibitor of hepatitis C. Bando, Takuji patented six inventions on the preparation of a low hygroscopic aripiprazole drug. As an academic inventor, Wang, Bing worked at BioMarin Pharmaceutical Inc. His research interests include clinical application research and the preparation of anticancer agents, involving one scientific publication and five patents in our dataset. Table 12 in the Appendix lists the titles of the scientific publications and patents of these six researchers.

Interest diversity with position characteristics
In this subsection, we answer the other question: does the position of the researcher correlate with their diverse research interests? From Table 6, one can see that the diversity of research interests is not sensitive to the threshold. Therefore, a threshold of 0.85 was fixed for the following analysis. Fig. 11 illustrates the distribution of the diversity of research interests of each type of researcher with the percentile of node characteristic indicators. The patterns are not consistent across the three types of researchers. The diversity of research interests of the researchers with a single role mainly concentrates on the top percentile of the degree centrality indicator and the bottom percentile of the constraint indicator. Inventors who are closer to the geometric center of the network and have a medium extent of control over nonadjacent nodes have more diverse research interests.
To find out whether the interest diversity correlates with the position characteristics, we applied the Spearman rank correlation test, as presented in Table 10. "Degree centrality" and "constraint" present significant positive and negative correlations with the diversity of researchers' interests, respectively. That is, whichever role a researcher has, the more widely connected they are and the less constrained their position in the network is, the more diverse their research interests tend to be.
In terms of "closeness centrality," the results are mixed. More specifically, only academic inventors show a significant but weak negative correlation. For the researchers with a single role, we cannot conclude that important positions at the center of a network correlate with the diversity of interests. As for "normalized betweenness centrality," nearly half of the cells in Table 10 are nonsignificant, and the other half have low values. We argue that this indicator does not correlate with the interest diversity.
In summary, the position of the researchers in a cooperative network does have a certain relation with the diversity of their research interests. For all researcher types, those who are more sociable and occupy less constrained positions in the network have more diverse research interests.
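The rank correlation underlying Table 10 can be sketched as follows. This is a minimal illustration using only the Python standard library: the helper names and the five toy researchers are hypothetical, the values are not taken from Table 10, and no significance test (p-value) is computed, which a statistics package would normally supply.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Toy values for five researchers (illustrative only).
degree_centrality = [0.10, 0.25, 0.40, 0.55, 0.80]
constraint = [0.90, 0.70, 0.50, 0.35, 0.20]
interest_diversity = [0.20, 0.35, 0.30, 0.50, 0.60]

rho_degree = spearman_rho(degree_centrality, interest_diversity)
rho_constraint = spearman_rho(constraint, interest_diversity)
print(round(rho_degree, 3), round(rho_constraint, 3))
```

In this toy example diversity rises with degree centrality and falls with constraint, mirroring the signs of the correlations reported in the text.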

Conclusions
Academic inventors play an important role in the knowledge diffusion between science and technology. Considerable efforts have been spent analyzing academic inventors in the literature. However, it is still unknown whether they have more diverse interests than researchers with a single role, and whether their position in science-technology interactions correlates with their interest diversity.
To answer these two questions, this study puts forward a rule-based approach for identifying academic inventors. After the research interests of each researcher were identified by an interest discovery AT credit model, two diversity indicators with three disparity measurements were calculated for each type of researcher. Extensive empirical results on the DrugBank dataset indicate that academic inventors have the least diverse research interests, followed by solely patenting inventors, while solely publishing authors have the most diverse.
The position of the researchers has a certain relation with the diversity of their research interests. The "degree centrality" has a significant positive correlation with the diversity of research interests, and the "constraint" presents a significant negative correlation. Among the three types of researchers, only the interest diversity of academic inventors shows a weak negative correlation with the "closeness centrality." Interest diversity does not correlate with the "normalized betweenness centrality."
There are several limitations of this study. As mentioned in the Dataset subsection, this study only considers scholarly articles and patents attached to each drug in the DrugBank database, which may result in a lower proportion of academic inventors. In the near future, we will retrieve scholarly articles and patents authored or patented by each researcher in the DrugBank database from comprehensive bibliographic databases. Further, the identification of academic inventors can benefit from a good name disambiguation method, and the rules for identifying academic inventors will be further enriched in our next study. In addition, we will try to identify the factors contributing to the position and interest diversity of academic inventors for science-technology interactions.