Anomalous Citations Detection in Academic Networks

Citation network analysis attracts increasing attention from the disciplines of complex network analysis and science of science. One major challenge is that citation networks contain unreasonable citations, i.e., citations in which the cited paper is not relevant to the citing paper. Existing research on citation analysis has primarily concentrated on contents and ignored the complex relations between academic entities. In this paper, we propose a novel research topic: how to detect anomalous citations. Specifically, we first define anomalous citations and then propose a unified framework, named ACTION, to detect them in a heterogeneous academic network. ACTION is built on non-negative matrix factorization and network representation learning, and considers not only the relevance of citation contents but also the relationships among academic entities including journals, papers, and authors. To evaluate the performance of ACTION, we construct two anomalous citation datasets. Experimental results demonstrate the effectiveness of the proposed method. Detecting anomalous citations carries profound significance for academic fairness.


Introduction
Citations could be regarded as currencies for assessing the scholarly impact of both scholars and institutions in academia. At the same time, citation analysis can help scholars understand the development trend of specific disciplines and the frontier of science (Liu et al, 2021a; Fortunato et al, 2018). The development of information technology and the expansion of large databases make multi-angle and systematic analysis of academic networks possible (Xia et al, 2017). For the analysis of citation bibliographic information, researchers have proposed a large number of citation-based evaluation indicators such as the h-index and g-index. In these indicators, all citations are considered equal and valid. However, different types of citations should be treated differently (Zhu et al, 2015).
Although scholars have distinguished different types of citations in citation network analysis, most research has focused on bibliographic information (Liu and Fang, 2020; Cai et al, 2018; Siudem et al, 2020), while few studies concentrate on citation contents. At the same time, there is a growing number of suspicious citations that are made intentionally to enhance the impact of publications or authors rather than to acknowledge the prior scientific advances that contributed to the publication. In this paper, we regard such citations as anomalous citations. Anomalous citations are a form of academic misconduct that is currently of great concern to the academic community (Franck, 1999; Bai et al, 2016; Chorus and Waltman, 2016; Mimouni et al, 2016; Bai et al, 2017, 2020; Liu et al, 2022). It is essential to detect anomalous citations because they have many negative effects. For example, they distort researchers' judgment of the real quality and value of papers. Moreover, they could result in false academic prosperity and even undermine academic fairness. Anomalous citations also affect citation-based measurement indices, which are associated with scholars' promotion and awards.
In this work, we aim to give an effective solution to the problem of identifying anomalous citations in the heterogeneous academic network. To this end, we propose ACTION (Anomalous CItations detecTION) to simultaneously exploit complex relations among multiple academic entities. Fig. 1 illustrates a heterogeneous academic network and the relations among academic entities including journals, authors, and publications. In Fig. 1, papers $p_1$ and $p_2$ are both published in journal $j_1$, while $p_3$ and $p_4$ are published in $j_2$ and $j_3$, respectively. $a_1, a_2, \cdots, a_5$ are authors who cite these papers. In addition, there may be collaboration relationships between these authors. For example, $a_1$ and $a_2$ have published $p_5$ together, and $a_2$ and $a_4$ have published $p_6$ together. We use different colours and symbols in Fig. 1 to distinguish whether a paper or author exhibits anomalous citation behaviour. For example, papers $p_1$, $p_3$, $p_5$, and $p_6$ have anomalous references. Authors $a_1$ and $a_2$ have published $p_5$, so they exhibit anomalous citation behaviour.
More specifically, the first step of the proposed framework is to represent the relationships in the heterogeneous academic network by constructing relational matrices. ACTION then models paper contents, the journal-paper relationship, and the author-paper relationship based on the constructed matrices. Finally, the framework integrates these three parts and combines them with a semi-supervised method to identify anomalous citations. Furthermore, we also identify the papers with anomalous citations and observe the effectiveness of ACTION when the anomalous boundary changes. The key contributions of our work include:
• We formally define a novel topic, i.e., anomalous citations detection, which widely covers a series of real-world issues.
The rest of the paper is organized as follows. Section 2 gives a brief introduction to the related work. In Section 3, we explain the definition of the problem and the mathematical preliminaries. In Section 4, we describe each part of the framework in detail. Section 5 describes the experimental process and presents the results. Finally, we conclude our work and give future directions in Section 6.

Related Work
Previous work on anomalous citation detection is limited. The most closely related topics focus mainly on citation network analysis. The proposed framework takes advantage of non-negative matrix factorization. We therefore briefly review the related work on citation network analysis and non-negative matrix factorization as follows.

Citation Network Analysis
Citation network analysis is a well-established research topic that draws on multiple kinds of techniques: bibliometrics, machine learning algorithms, and complex network analysis (Liu et al, 2021b; Xia et al, 2020; Liu et al, 2020). By analyzing the citation relationships between academic entities including journals, papers, and authors, researchers can reveal the quantitative characteristics and inherent laws within science of science (Tóth et al, 2020; Fortunato et al, 2018).
From the perspective of contents, citation network analysis covers three aspects: the number of citations, the structure of the citation network, and the relevance of topics reflected in citation relationships. Liu et al (2018) trace the scientific publications that scientists produced and quantitatively describe the hot streak phenomenon in their careers. Siudem et al (2020) propose a model to recreate citation records from three perspectives, i.e., the number of publications, the number of citations, and the degree of randomness in the citation patterns. In the field of medicine, Liao et al (2018) explore the current status of the field by visualizing and analyzing the citation network constructed from publications related to medical big data.
Recent studies provide new insight into co-citation network analysis. Co-citation networks are constructed from publication co-citation relationships, in which two publications are cited in the same article. Co-citation analysis has been shown to be a useful way to help researchers identify key literature for cross-disciplinary ideas (Trujillo and Long, 2018). Kim et al (2016) consider both citation contents and proximity to represent the authors' subject relatedness. Similarly, taking advantage of author co-citation analysis, Bu et al (2018) propose a new approach that incorporates additional information, i.e., the number of times a reference is mentioned and the number of context words, to depict scientific intellectual structures. By using the data from CiteSpace, Fang et al (2018) identify the intellectual landscape of climate change and tourism by analyzing and visualizing the collaboration network and the co-citation network of the related areas.
Research on anomalous citation is limited. Anomalous citations can be regarded as a kind of improper behavior (Yu et al, 2020), including miscitations, missing citations, mandatory citations, and inappropriate self-citations. Citation behavior is arbitrary because authors have the right to decide how to cite references. One study pointed out that less than 1/3 of the references in each article must be cited (Prabha, 1983), which also confirms the randomness of citing behavior in the process of writing a paper. This arbitrariness also creates opportunities for academic misconduct. Moustafa (2016) explains different kinds of citation aberration, including self-citations (to inflate the authors' h-index), discriminatory citations (citations only from specific journals, resulting in a substantial increase in their impact factors), reciprocal citations (citations from people who cite their own work), etc. Elsevier has investigated more than 55,000 paper reviewers in more than 1,000 journals, and the evidence showed that 433 of them might have been involved in compulsory citations (Chawla, 2019). Journal citation stacking is regarded as a behavior pattern of anomalous citations. Some studies focus on analyzing journal self-citations and assume that journal self-citation malpractices are aimed at inflating a journal's impact factor (Chorus and Waltman, 2016; Humphrey et al, 2019; Gazni and Didegah, 2021; Kojaku et al, 2021; Krell, 2014). Besides, inappropriate co-author and collaborative self-citations may be misleading and may distort the scientific literature (Ioannidis, 2015).
Although scholars have put forward the concept and tried to categorize anomalous citations, there is still room for further deepening and expansion. Because annotated data for anomalous citations are difficult to obtain, existing research is mainly based on small amounts of data combined with specific cases to classify different anomalous citation behaviours. There is little research on methods for identifying anomalous citations in academic networks. Citation behaviour is complex and diverse in reality, and the identification of anomalous citations should consider not only the content similarity between the citing paper and the cited paper, but also the implicit conflict-of-interest relationships between the papers and the authors. Scientometric methods based on statistical analysis are not capable of representing and processing complex networks and text contents, so it is difficult for them to fully explore the patterns behind the data.

Non-negative Matrix Factorization
As one of the popular machine learning techniques (Xia et al, 2021a; Sun et al, 2020), Non-negative Matrix Factorization (NMF) was proposed by Lee and Seung (1999). NMF constrains all components of the decomposition to be non-negative while achieving dimension reduction. Mathematically, for a non-negative matrix $V$, it seeks a non-negative matrix $W$ and a non-negative matrix $H$ that satisfy $V \approx WH$. It thus decomposes a non-negative matrix into the product of two non-negative matrices, which realizes dimension reduction and feature extraction for the original matrix. Because it offers a convenient and reasonable explanation of the data, NMF has gradually become one of the most popular multidimensional data processing tools in many fields (Lin and Boutros, 2020), such as signal processing (Puigt et al, 2021), biological processes (Wang et al, 2021), and computer vision (Dai et al, 2020).
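The factorization described above can be sketched in a few lines. The following is a minimal illustration that assumes the widely used multiplicative update rules for the Frobenius-norm objective; the text does not say which solver is intended, and the function name `nmf`, the rank, and the toy matrix are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate non-negative V (n x t) by W (n x rank) @ H (rank x t)."""
    rng = np.random.default_rng(seed)
    n, t = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, t)) + eps
    for _ in range(n_iter):
        # Multiplicative updates: ratios of non-negative terms keep
        # every entry of W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy non-negative matrix of rank 3 (hypothetical data).
rng = np.random.default_rng(1)
V = rng.random((6, 3)) @ rng.random((3, 5))
W, H = nmf(V, rank=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Both factors stay non-negative by construction, which is the property that makes the learned components easy to interpret.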
One of the most successful applications of NMF is in the field of computer vision. Computers generally store image information in the form of matrices, so researchers can conduct recognition, analysis, and processing directly on these matrices, which makes NMF well suited to computer vision. Buciu and Pitas (2004) carry out a study on facial expression recognition based on NMF and local non-negative matrix factorization (LNMF). Li et al (2016) propose the NLMF method for obtaining an effective low-rank data representation and apply it to image clustering.
In recent years, there has been increasing interest in social network mining and analysis with the application of NMF (Ren et al, 2021; Xia et al, 2021b). Shu et al (2019) utilize relationships among publishers, news content, and users who spread the news on social networks to identify fake news. Li et al (2020) propose a multi-view clustering method based on deep graph regularized non-negative matrix factorization. Luo et al (2020) propose a new framework named PGS for community extraction, taking advantage of four different non-negative matrix factorization models. Inspired by these studies, we aim to detect anomalous citations based on NMF.

Preliminaries and Definitions
We focus on mining anomalous citations hidden in the academic network, which consists of complex relations among journals, authors, and publications. In this section, we first define anomalous citations in our work. Then we describe the mathematical preliminaries used throughout the paper.

Definition of Anomalous Citation
Although studies have demonstrated the existence of anomalous citations by analyzing citation networks and through statements on publication ethics, there is no clear definition of anomalous citations. In this paper, in order to study the anomalous citation identification problem, we assume that anomalous citations have the following characteristics:
• The content of the citing paper is irrelevant to its references.
• The citations between the cited papers and citing papers are relational citations.
In this paper, we use the abstract of each paper to measure the similarity between the cited paper and the citing paper. Besides, if the citing paper and the cited paper are related (i.e., through self-citations, discriminatory citations, reciprocal citations, or co-author citations), the citation is a relational citation. We model the relationships between authors, papers, and journals to help determine whether citations are relational citations.

Mathematical Preliminaries
Next, we explain the meaning of the matrices and notations that appear in this paper. In the heterogeneous academic network $G$, $P = \{p_1, p_2, \ldots, p_n\}$ is the set of papers, $A = \{a_1, a_2, \ldots, a_m\}$ is the set of authors, and $J = \{j_1, j_2, \ldots, j_l\}$ is the set of $l$ journals. We define $X \in \mathbb{R}^{n \times t}$ as the paper feature matrix. For the collaboration relationships between authors, we use $A \in \mathbb{R}^{m \times m}$ to denote the adjacency matrix of author collaboration times. The author-paper citing matrix in the citation network is denoted as $C \in \{0, 1\}^{m \times n}$, where $C_{i,j} = 1$ indicates that author $a_i$ has cited paper $p_j$; otherwise $C_{i,j} = 0$. We define $Cr \in \mathbb{R}^{m \times 1}$ as the author credibility. $B \in \mathbb{R}^{l \times n}$ is the journal-paper relation matrix, where $B_{kj} = 1$ means that paper $p_j$ is published in journal $j_k$; otherwise $B_{kj} = 0$. $J \in \mathbb{R}^{l \times 1}$ represents the journal grade. We will introduce how to obtain the author credibility and the journal grade in the next section.
We treat anomalous citation detection as a binary classification problem, so each citation is either a real citation (non-anomalous citation) or a false one (anomalous citation). We use $Y \in \mathbb{R}^{n \times n}$ to represent the labels of the citations: $y_{ij} = 1$ indicates that the citation from the target paper $p_i$ to paper $p_j$ is an anomalous citation, and $y_{ij} = -1$ indicates that the citation is a normal citation. The notations involved in this section are described in Table 1.
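To make the notation concrete, the sketch below builds $B$, $A$, and $C$ for the toy network described for Fig. 1. The journal assignments and co-authorships are those stated in the introduction; the citing pairs in `cites` are hypothetical, since the text does not list which author cites which paper.

```python
import numpy as np

papers = ["p1", "p2", "p3", "p4", "p5", "p6"]
authors = ["a1", "a2", "a3", "a4", "a5"]
journals = ["j1", "j2", "j3"]
p_idx = {p: i for i, p in enumerate(papers)}
a_idx = {a: i for i, a in enumerate(authors)}
j_idx = {j: i for i, j in enumerate(journals)}

# Stated in the text: p1, p2 -> j1; p3 -> j2; p4 -> j3.
published_in = {"p1": "j1", "p2": "j1", "p3": "j2", "p4": "j3"}
# Stated in the text: a1 and a2 wrote p5; a2 and a4 wrote p6.
coauthored = {"p5": ["a1", "a2"], "p6": ["a2", "a4"]}
# Hypothetical citing behaviour (not given in the text).
cites = [("a1", "p1"), ("a2", "p1"), ("a3", "p2"), ("a4", "p3"), ("a5", "p4")]

# Journal-paper relation matrix B (l x n).
B = np.zeros((len(journals), len(papers)), dtype=int)
for p, j in published_in.items():
    B[j_idx[j], p_idx[p]] = 1

# Author collaboration times matrix A (m x m), symmetric.
A = np.zeros((len(authors), len(authors)), dtype=int)
for team in coauthored.values():
    for u in team:
        for v in team:
            if u != v:
                A[a_idx[u], a_idx[v]] += 1

# Author-paper citing matrix C (m x n).
C = np.zeros((len(authors), len(papers)), dtype=int)
for a, p in cites:
    C[a_idx[a], p_idx[p]] = 1
```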

Problem Definition
Based on the notations explained above, the input of the task consists of the paper feature matrix $X$, the author collaboration times matrix $A$, the author-paper citing matrix $C$, the author credibility vector $Cr$, the journal-paper relation matrix $B$, the journal grade vector $J$, and the partially labeled citations $y_L$. Note that, in this paper, labeled citations are equivalent to labeled cited papers. The main goal of the task is to predict the labels $y_U$ of the remaining unlabeled citations (cited papers).
Based on the predicted citation labels, we can determine whether a paper contains anomalous citations (such papers are defined as anomalous papers).

The ACTION Framework
In this section, we give the details of the proposed framework. Specifically, we use a semi-supervised framework to explore the relationships among journals, papers, and authors. The overall architecture of the framework is shown in Fig. 2. ACTION contains three critical parts: paper content embedding, author-paper relationship modeling, and journal-paper relationship modeling. First, we introduce the latent feature embedding of paper contents. Then, we illustrate how to model the author-paper relationship and the journal-paper relationship, respectively. Last, we explain how to integrate these three parts.

Paper Content Embedding
It is important to extract feature representations of paper contents. We use the abstract of each paper to represent its content for two reasons:
(1) The abstract is the most refined part of a paper and effectively expresses its central theme.
(2) The abstract is brief and easy to obtain.
Inspired by previous work, we use Doc2Vec (Le and Mikolov, 2014) to map the abstract of each paper to a vector. Doc2Vec, an extension of Word2Vec (Mikolov et al, 2013), is an unsupervised algorithm that can obtain vectors for sentences, paragraphs, and documents. It overcomes the shortcomings of traditional bag-of-words models. There are two main steps to obtain the representation:
1. Training stage. Word vectors, softmax parameters, and paragraph (or sentence) vectors are learned from the training data.
2. Inference stage. This step obtains the vector representation of new paragraphs. Specifically, the paragraph vector is initialized randomly, and the model then learns iteratively by stochastic gradient descent until it reaches a stable final vector.
Thus we can obtain the paper content representation matrix $X$ by transforming paper abstracts into vector representations with Doc2Vec. It is worth mentioning that the resulting vectors may contain negative values. To solve this problem, we use a linear transformation to make the values non-negative. After the transformation, the new matrix preserves the content features and does not affect the following steps. We therefore obtain a non-negative matrix that represents paper contents for the entire network. Then we use NMF to get a low-dimensional matrix. As mentioned before, NMF is a matrix decomposition method that makes all decomposed components non-negative. Based on the paper content matrix $X \in \mathbb{R}^{n \times t}$, we try to find two non-negative matrices $N \in \mathbb{R}^{n \times d}$ and $K \in \mathbb{R}^{t \times d}$ by solving the following optimization problem:
$$\min_{N, K \ge 0} \; \|X - N K^{T}\|_F^2 \tag{1}$$
where $d$ is the dimension of the latent topic space. $N$ and $K$ are non-negative matrices giving low-dimensional representations of papers and words, respectively. Besides, $N = [N_L; N_U]$, where $N_L \in \mathbb{R}^{r \times d}$ is the latent feature matrix of the labeled cited papers and $N_U \in \mathbb{R}^{(n-r) \times d}$ is the latent feature matrix of the unlabeled cited papers.
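The text does not specify which transformation is used to remove negative values. One simple choice, shown here purely as an assumption, is to shift every entry by the same constant; this keeps the Euclidean distances between paper vectors unchanged (cosine similarities would change). The random matrix `X_raw` stands in for Doc2Vec output.

```python
import numpy as np

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(5, 4))   # stand-in for Doc2Vec vectors (n=5, t=4)

# Shift all entries by the same constant so the smallest value becomes 0.
X = X_raw - X_raw.min()

def pairwise_dist(M):
    """Euclidean distance between every pair of rows of M."""
    diff = M[:, None, :] - M[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# The constant shift cancels in row differences, so distances are preserved.
print(np.allclose(pairwise_dist(X), pairwise_dist(X_raw)))
```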

Author-Paper Relationship Modeling
With representations of the paper contents in hand, the next step is to model the relationship between papers and the authors' citing behavior. This is based on the assumption that the relationships between authors can inform the learning of the citations' latent features. Corresponding to the yellow part of Fig. 2, we explore the author-paper relationship from the following two aspects.
• We use author collaboration times to learn the authors' basic latent characteristics, because collaboration can lead to conflicts of interest between authors.
• Based on the citation labels and the authors' citing behavior, we encode the relationship between author credibility and citation features.

Author Feature Representation
In academic networks, multiple relationships exist among authors, such as collaboration relationships. Scholars tend to collaborate with authors who have similar research interests. We again use NMF to learn the latent representation of authors. Given the author collaboration times matrix $A \in \mathbb{R}^{m \times m}$, we obtain the non-negative matrix $D \in \mathbb{R}_{+}^{m \times d}$ by solving the following optimization problem:
$$\min_{D, T \ge 0} \; \|Y \odot (A - D T D^{T})\|_F^2 \tag{2}$$
where $D$ is the authors' latent matrix, $T \in \mathbb{R}_{+}^{d \times d}$ is the author association matrix, and $Y \in \mathbb{R}^{m \times m}$ controls the contribution of matrix $A$. Since there are only positive samples in $A$, we first set $Y = \mathrm{sign}(A)$ and then perform negative sampling. If $A_{i,j} = 0$, then $Y_{i,j} = \mathrm{sign}(A_{i,j}) = 0$.
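As an illustration of how the weight matrix $Y$ acts, the sketch below evaluates a masked reconstruction loss of the form $\|Y \odot (A - D T D^{T})\|_F^2$, which matches the stated shapes of $D$, $T$, and $Y$. The negative-sampling step is one common reading and is an assumption: a few off-diagonal zero entries of $A$ are switched on in the mask so that they count as observed non-collaborations. The helper names and the toy matrix are hypothetical.

```python
import numpy as np

def build_mask(A, n_neg, seed=0):
    """Y = sign(A), plus n_neg sampled off-diagonal zero entries set to 1."""
    rng = np.random.default_rng(seed)
    Y = np.sign(A).astype(float)
    off_diag_zero = (A == 0) & ~np.eye(A.shape[0], dtype=bool)
    rows, cols = np.where(off_diag_zero)
    k = min(n_neg, len(rows))
    pick = rng.choice(len(rows), size=k, replace=False)
    Y[rows[pick], cols[pick]] = 1.0
    return Y

def masked_loss(A, D, T, Y):
    """Squared Frobenius norm of the element-wise masked residual."""
    R = Y * (A - D @ T @ D.T)
    return float(np.sum(R ** 2))

# Toy collaboration counts for four authors (hypothetical).
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 2],
              [0, 0, 0, 0],
              [0, 2, 0, 0]], dtype=float)
rng = np.random.default_rng(1)
D = rng.random((4, 2))   # author latent features (m x d)
T = rng.random((2, 2))   # author association matrix (d x d)
Y = build_mask(A, n_neg=3)
print(masked_loss(A, D, T, Y))
```

Entries where the mask is zero contribute nothing to the loss, so unobserved author pairs do not pull the factorization toward zero collaboration.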

Capturing Relations of Author Collaboration
Author collaboration information could provide additional information for detecting anomalous citations. A previous study has shown that once a collaboration has been established between scholars, an author tends to cite papers published by his/her co-authors in subsequent papers (Zhou et al, 2018).
To model the authors' citing behavior, we consider the inherent association between authors' credibility and the papers they publish. Intuitively, we assume that authors with low reputations are more likely to make anomalous citations, and vice versa. For example, if $a_1$, $a_2$, and $a_4$ have low credibility, they are more likely to make anomalous citations than authors with high credibility.
We assume that each author has a virtual credibility score. We use $Cr = \{c_1, c_2, \ldots, c_m\}$ to represent the credibility of each author. The value of $Cr_i$ is between 0 and 1, and is obtained by calculating the proportion of normal references of author $a_i$:
$$Cr_i = \frac{R_{normal}}{R} \tag{3}$$
where $R_{normal}$ and $R$ represent the number of normal references and the total number of references of author $a_i$, respectively. We believe that the latent features of low-credibility authors are closer to the latent features of anomalous citations, while the latent features of high-credibility authors are closer to the latent features of real citations. We formulate this as the optimization problem in Eq. (4). In Eq. (4), $y_L \in \mathbb{R}^{r \times 1}$ is the label vector of the partially labeled citations. We consider the following two situations:
• For real (normal) citations, we set $y_{Lj} = -1$ and ensure that the latent features of high-reputation authors are close to the latent features of real citations.
• For anomalous citations, we set $y_{Lj} = 1$ and ensure that the latent features of low-reputation authors are close to the latent features of anomalous citations.
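The credibility score of Eq. (3) is a simple ratio. A minimal sketch, assuming each author's labeled references are available as a list of labels ($-1$ for normal, $1$ for anomalous); the function name and the toy data are hypothetical.

```python
def author_credibility(reference_labels):
    """Map each author to the share of normal (-1) labels among their references."""
    cred = {}
    for author, labels in reference_labels.items():
        if not labels:
            continue  # the ratio is undefined for an author with no references
        normal = sum(1 for y in labels if y == -1)
        cred[author] = normal / len(labels)
    return cred

# Hypothetical labeled references per author.
refs = {"a1": [-1, 1, 1, 1], "a3": [-1, -1, -1, -1], "a5": [-1, 1]}
cr = author_credibility(refs)
print(cr)
```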
In fact, Eq. (4) can be converted to:
$$\min \; \operatorname{tr}(H^{T} L H). \tag{5}$$
The detailed derivation process is as follows. Eq. (4) can be simplified to Eq. (6), where the weight $B_{ij}$ is computed as in Eq. (7). We use $L = S - B$ to represent the Laplacian matrix, where $S_{ii} = \sum_{j=1}^{m+r} B_{ij}$; Eq. (6) can then be finally rewritten as Eq. (8).
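The step from a weighted sum of pairwise distances to a trace form rests on a standard graph-Laplacian identity: for symmetric weights, $\tfrac{1}{2}\sum_{i,j} w_{ij}\|H_i - H_j\|^2 = \operatorname{tr}(H^{T} L H)$ with $L = S - W$. The numerical check below uses random placeholder weights named `W` (to avoid confusion with the journal-paper matrix $B$); the actual weights of Eq. (7) are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d = 5, 3
H = rng.random((n_nodes, d))          # one latent feature row per node

W = rng.random((n_nodes, n_nodes))
W = (W + W.T) / 2                     # symmetric placeholder weights
np.fill_diagonal(W, 0.0)

S = np.diag(W.sum(axis=1))            # diagonal matrix, S_ii = sum_j W_ij
L = S - W                             # graph Laplacian

# Left-hand side: half the weighted sum of squared pairwise distances.
pairwise = 0.5 * sum(
    W[i, j] * np.sum((H[i] - H[j]) ** 2)
    for i in range(n_nodes)
    for j in range(n_nodes)
)
# Right-hand side: the trace form.
trace_form = np.trace(H.T @ L @ H)
print(np.isclose(pairwise, trace_form))
```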

Journal-Paper Relationship Modeling
We assume that papers published in lower-quality journals have a higher probability of containing anomalous citations than papers published in higher-quality journals. Therefore, exploring the quality of the journals can help us detect anomalous citations correctly. The journal impact factor (JIF) (Garfield, 1972) is an internationally accepted journal evaluation index. We construct the journal grade vector $J$ from the journals' impact factors. Because impact factors differ considerably between journals, we normalize them to $[0, 1]$. As shown in the green part of Fig. 2, the basic idea is to use the journal grade vector $J \in \mathbb{R}^{l \times 1}$ and the journal-paper relation matrix $B \in \mathbb{R}^{l \times n}$ to optimize the feature representation learning of the citations:
$$\min \; \|B N Q - J\|_2^2$$
We suppose that the latent features of a journal can be represented by the papers it publishes, i.e., by $BN$, where $B$ is the normalized journal-paper relation matrix. $Q \in \mathbb{R}^{d \times 1}$ is a weight matrix that maps the latent features of the journals to the corresponding journal grade vector $J$.
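A small sketch of the journal-paper term follows. Several details are assumptions rather than the authors' stated choices: min-max scaling is used to bring impact factors into $[0, 1]$, $B$ is row-normalized so that each journal's feature is the mean of its papers' latent features, and $Q$ is fitted by ordinary least squares for a fixed $N$. The impact factor values are hypothetical.

```python
import numpy as np

# Hypothetical impact factors for three journals, min-max scaled to [0, 1].
impact_factor = np.array([9.2, 3.1, 0.8])
J = (impact_factor - impact_factor.min()) / (impact_factor.max() - impact_factor.min())

# Journal-paper relation matrix (l=3 journals, n=4 papers).
B = np.array([[1, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
B_norm = B / B.sum(axis=1, keepdims=True)   # each row sums to 1

rng = np.random.default_rng(0)
N = rng.random((4, 2))                      # paper latent features (n x d)
journal_feat = B_norm @ N                   # mean latent feature per journal

# Least-squares fit of Q for fixed N, then the squared-error loss.
Q, *_ = np.linalg.lstsq(journal_feat, J, rcond=None)
loss = float(np.sum((journal_feat @ Q - J) ** 2))
print(loss)
```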

Classification Model
We have introduced how to capture the latent feature representations of the cited papers by modeling the relationships among journals, papers, and authors. We further use a semi-supervised linear classification model on top of the latent features. We minimize the following objective:
$$\min \; \|N_L P - y_L\|_2^2$$
where $P \in \mathbb{R}^{d \times 1}$ is a weighting matrix that maps the latent features of the papers to the anomalous citation labels. Thus we can identify anomalous citations with the classification model.
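The linear classifier can be illustrated as follows, assuming a squared-error fit of $N_L P$ to $y_L$ with an added L2 penalty, which gives the familiar ridge closed form. Predicting by the sign of the score is also an assumption; the latent features below are hypothetical.

```python
import numpy as np

def fit_linear_classifier(N_L, y_L, lam=1e-2):
    """Ridge solution P = (N_L^T N_L + lam * I)^(-1) N_L^T y_L."""
    d = N_L.shape[1]
    return np.linalg.solve(N_L.T @ N_L + lam * np.eye(d), N_L.T @ y_L)

def predict(N_U, P):
    """Return 1 (anomalous) when the score is positive, otherwise -1 (normal)."""
    return np.where(N_U @ P > 0, 1, -1)

# Hypothetical latent features of labeled and unlabeled cited papers.
N_L = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
y_L = np.array([1, 1, -1, -1])
N_U = np.array([[0.85, 0.15], [0.15, 0.85]])

P = fit_linear_classifier(N_L, y_L)
y_U = predict(N_U, P)
print(y_U)
```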

Optimization
Merging all the parts mentioned above gives the overall objective function, in which $\beta\,\operatorname{tr}(H^{T} L H)$ is the simplified form of Eq. (4) and the last term is introduced to avoid over-fitting. $N = [N_L; N_U]$ consists of labeled and unlabeled parts, so when computing the derivatives with respect to $N$ we first compute the derivatives of these two parts separately. Similarly, $H$ and $X$ each contain a labeled part and an unlabeled part.
$L$ is the Laplacian matrix, and we rewrite it in block form as $L = [L_{11}, L_{12}; L_{21}, L_{22}]$ to facilitate the separate derivation of the labeled and unlabeled parts.
Our goal is to minimize the loss function, for which we adopt gradient descent. The calculation process is shown in Fig. 3.
Next, we show the process of deriving partial derivatives for each matrix variable.
1. Calculate the partial derivative of ι on N.
2. Calculate the partial derivative of ι on D.
3. Calculate the partial derivative of ι on K.
4. Calculate the partial derivative of ι on T.
The update rules of N can be obtained by gradient descent, and similarly for K and T. Finally, we can update P and Q, where I denotes the identity matrix.

Model Complexity
In each iteration, the computational complexity for computing

Experiments
In this section, we verify the effectiveness of ACTION. In particular, we discuss the following questions:
• Can we improve the performance of identifying anomalous citations by modeling the journal and author simultaneously?
• Do the learning of journal grades and author credibility play an important role in identifying anomalous citations?
We first describe the datasets used in this paper. Then we analyze the performance of ACTION from the perspectives of anomalous boundary and parameter sensitivity. The whole experimental procedure is shown in Fig. 4.

Datasets
We can obtain citation relationships from different academic datasets, such as MAG and DBLP. However, there are no recognized datasets for anomalous citations. In particular, there is no authoritative expert annotation indicating which papers contain anomalous citations. We therefore manually add references that are absent from the original papers as anomalous citations and take the original references in the papers as real citations. We construct our datasets from MAG and DBLP because both contain the abundant information that we need, e.g., each paper's abstract, authors, journal, publication year, and references. We randomly extract papers with complete information in the field of Computer Science from MAG and DBLP to establish two citation networks. Based on these two datasets, we construct the anomalous citation datasets. We randomly select half of the papers and add anomalous references to them; the citation from the original paper to each added paper is then regarded as an anomalous citation. For each selected paper, we add the same number of anomalous references as it has original references. We add three types of references: (1) citing collaborators' publications; (2) citing the same journal's publications; (3) citing interdisciplinary publications with irrelevant contents. Table 2 lists the basic information of the constructed datasets. We first choose 204 papers and 240 papers from the two datasets, respectively, and then select half of the papers to add anomalous references. True papers are the papers without anomalous citations, and false papers are the papers with anomalous citations. The number of citing links is obtained by summing all elements of the author-paper citing matrix. Similarly, the number of collaboration links is obtained by summing all elements of the author collaboration times matrix. From the table, we can see that the two datasets differ in the number of authors and journals. Therefore, they differ in collaboration density and citing density.
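The construction procedure can be sketched as follows. This is an illustration under assumptions, not the authors' script: the candidate pool for each paper is passed in as a plain list (in the described setup it would be drawn from collaborators' papers, same-journal papers, and topically irrelevant papers), and the function name and toy data are hypothetical.

```python
import random

def inject_anomalous(references, candidates, fraction=0.5, seed=0):
    """Label original references -1 and add equally many anomalous ones (label 1)
    to a randomly chosen fraction of the papers."""
    rng = random.Random(seed)
    papers = sorted(references)
    false_papers = set(rng.sample(papers, int(len(papers) * fraction)))
    labeled = {}
    for p in papers:
        labeled[p] = [(q, -1) for q in references[p]]      # original = normal
        if p in false_papers:
            # Only papers not already cited (and not the paper itself) qualify.
            pool = [q for q in candidates[p] if q != p and q not in references[p]]
            k = min(len(references[p]), len(pool))          # same count as originals
            labeled[p] += [(q, 1) for q in rng.sample(pool, k)]
    return labeled, false_papers

# Hypothetical toy citation data.
references = {"p1": ["p3", "p4"], "p2": ["p3"], "p5": ["p1"], "p6": ["p2", "p4"]}
all_papers = ["p1", "p2", "p3", "p4", "p5", "p6"]
candidates = {p: all_papers for p in references}
labeled, false_papers = inject_anomalous(references, candidates)
```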

Evaluation Metrics
We use Accuracy, Precision, Recall, and F1 to evaluate the performance of ACTION, which are common metrics in classification tasks. The metrics are calculated as:
• $Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$
• $Precision = \frac{TP}{TP + FP}$
• $Recall = \frac{TP}{TP + FN}$
• $F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$
where $TP$, $TN$, $FP$, and $FN$ represent the number of anomalous citations correctly classified (true positive), the number of normal citations correctly classified (true negative), the number of normal citations misclassified (false positive), and the number of anomalous citations misclassified (false negative), respectively.
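These metrics can be computed directly from the confusion counts. A minimal sketch with anomalous citations ($1$) as the positive class; the function name and the example labels are hypothetical, and returning 0.0 for an empty denominator is a convention chosen here.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary labeling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels: 1 = anomalous citation, -1 = normal citation.
y_true = [1, 1, 1, -1, -1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1, 1, -1, 1]
m = classification_metrics(y_true, y_pred)
print(m)
```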

Baselines and Variants
Because no baseline exists for the given task, and because our framework uses NMF to obtain a low-dimensional matrix, we use existing dimension reduction methods as baselines. There are many ways to reduce the dimension of a matrix in machine learning. We choose several commonly used dimension reduction methods as baselines: Robust Principal Component Analysis (RPCA) (Lee and Choe, 2018), Singular Value Decomposition (SVD) (Wang and Zhu, 2017), Multidimensional Scaling (MDS) (Borg and Groenen, 2010), and SSD-Isomap (Rui et al, 2019).
• RPCA: Principal Component Analysis (PCA) is one of the most widely used data dimension reduction algorithms. By calculating the covariance matrix of the data, the eigenvectors of the covariance matrix can be obtained. The eigenvectors corresponding to the k largest eigenvalues are selected to form a projection matrix. In this way, the data matrix can be transformed into a new space that reduces the dimension of the data features. However, PCA is not robust, and RPCA addresses this weakness.
• SVD: SVD has a strong ability to extract information. It can simplify the data, remove noise, and improve performance. However, the process of data conversion may be difficult to interpret.
• MDS: MDS is a classical method for manifold learning. It maps the research objects (samples or variables) of a multidimensional space to a low-dimensional space for positioning, analysis, and classification while retaining the original relationships between objects.
• SSD-Isomap: SSD-Isomap is a semi-supervised version of Isometric Feature Mapping (ISOMAP). ISOMAP's core algorithm is consistent with MDS. The difference lies in the calculation of the distance matrix in the original space.
We use SVM as the classifier for all baselines. Furthermore, to analyze the importance of each module, we compare ACTION with different variants of it.
• ACTION-JP: We eliminate the effect of the journal-paper relation module and only use the paper content module and the author-paper relation module to identify anomalous citations.
• ACTION-AP: We ignore the effect of the author-paper relation module and only use the paper content module and the journal-paper relation module to identify anomalous citations.
• ACTION-JA: We only use the paper content module to identify anomalous citations.
We also analyze the role of author credibility and journal grade.
• ACTION-Cr: We remove the effect of the author credibility in author-paper relation module.
• ACTION-J: We eliminate the effect of the journal grade in journal-paper relation module.
Table 3 presents the summary of the variants.

Results Comparison and Analysis
We present our results together with some case studies from two perspectives, (1) comparison results with dimension reduction methods, (2) comparison results with ACTION variants.

Dimension Reduction Methods Comparison
We first compare ACTION with the baselines mentioned in Section 5.3. We run the models on the MAG dataset and the DBLP dataset, respectively. The experimental results are presented in Fig. 5. From the figure, we can see that ACTION outperforms all baselines, especially on the MAG dataset. On the DBLP dataset, RPCA, SVD, MDS, and SSD-Isomap achieve similar accuracy. The F1 of ACTION is 79% and 71% on the MAG and DBLP datasets, respectively, which is higher than that of the baselines. Specifically, on the DBLP dataset, ACTION is 55%, 56%, 26%, and 23% higher than RPCA, SVD, MDS, and SSD-Isomap, respectively. On the MAG dataset, ACTION is 38%, 37%, 15%, and 23% higher than RPCA, SVD, MDS, and SSD-Isomap, respectively. The superiority of ACTION over the baselines further demonstrates the effectiveness of the proposed framework.

ACTION Variants Comparison
Next, we evaluate the effectiveness of each module in ACTION on the task of anomalous citations detection. We evaluate and compare six variants of ACTION mentioned above on two different datasets. The results are shown in Table 4. Based on the results in Table 4, our findings are as follows.
(1) In terms of F1, ACTION is 8.48% and 22.64% higher than ACTION-JP on the MAG and DBLP datasets, respectively. This shows that the journal-paper relation module is very important. In terms of F1, ACTION is 14.55% and 18.18% higher than ACTION-AP on the MAG and DBLP datasets, respectively. The results suggest that the author-paper relation is indispensable.
(2) ACTION performs better than ACTION-JA, which is based only on paper content. In terms of F1, ACTION is 16.57% and 36.64% higher than ACTION-JA on the MAG and DBLP datasets, respectively. Moreover, ACTION-JA performs worse than ACTION-JP and ACTION-AP. Compared with ACTION-J, its accuracy is lower by 4.76% and 4.16% on MAG and DBLP, respectively. Compared with ACTION-AP, its accuracy is lower by 2.38% and 6.25% on MAG and DBLP, respectively, and its F1 is lower by 2.02% and 18.46% on MAG and DBLP, respectively. The above analysis shows that the joint modeling of journals and authors plays a very important role in identifying anomalous citations.
(3) ACTION performs better than ACTION-Cr, which removes the effect of the author credibility. In terms of F1, ACTION is 10.65% and 22.64% higher than ACTION-Cr on the MAG and DBLP datasets, respectively. This indicates that the author credibility provides additional information for identifying anomalous citations.
(4) ACTION performs better than ACTION-J, which removes the journal grade vector. In terms of F1, ACTION is 18.46% and 14.71% higher than ACTION-J on the MAG and DBLP datasets, respectively. We can conclude that the journal grade provides supplementary information for identifying anomalous citations.
From the analysis of the experimental results, we conclude that: (1) joint modeling of the relationships between journals and authors contributes to the performance of ACTION; (2) author credibility and journal grade are necessary for identifying anomalous citations.

Anomalous Paper Identification Results
In order to identify anomalous papers, we introduce the concept of anomalous rate. Anomalous rate refers to the ratio of a paper's anomalous citations to its total citations. It is calculated as:

$$\text{Anomalous rate} = \frac{\text{Anomalous citations}}{\text{Total citations}} \tag{24}$$

The anomalous boundary helps us to judge whether a paper is an anomalous paper. When a paper's anomalous rate is greater than the anomalous boundary, we regard it as an anomalous paper. Fig. 6 shows the anomalous paper classification results for ACTION under different anomalous boundaries. The blue line represents the highest accuracy, while the orange line represents the average accuracy. The results show a consistent downward trend on both the MAG dataset and the DBLP dataset. By observing and comparing the average accuracy on the two datasets, we find that the highest point is at the initial point. As the anomalous boundary increases, the accuracy approaches 50%. The reason is that the number of papers with anomalous citations is the same as the number of papers without anomalous citations, so in the worst case, where all the papers are identified as papers without anomalous citations, half of the predictions are still correct.
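The anomalous rate of Eq. (24) and the boundary test can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and returning a rate of 0 for a paper with no citations is our assumption, since that case is not defined above.

```python
def anomalous_rate(anomalous_citations: int, total_citations: int) -> float:
    """Eq. (24): share of a paper's citations that are anomalous."""
    if total_citations < 0 or not 0 <= anomalous_citations <= total_citations:
        raise ValueError("expected 0 <= anomalous_citations <= total_citations")
    if total_citations == 0:
        # Assumption: a paper without citations is given a rate of 0.
        return 0.0
    return anomalous_citations / total_citations


def is_anomalous_paper(anomalous_citations: int,
                       total_citations: int,
                       boundary: float) -> bool:
    """A paper is anomalous when its rate is strictly greater than the boundary."""
    return anomalous_rate(anomalous_citations, total_citations) > boundary
```

For example, a paper with 3 anomalous citations out of 12 has an anomalous rate of 0.25. It is flagged as anomalous under a boundary of 0.2, but not under a boundary of 0.25, because the comparison is strict.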
We also evaluate the performance of ACTION variants on the task of anomalous paper identification. The results are summarized in Fig. 7. As expected, the performance of all variants decreases as the anomalous boundary increases. ACTION consistently achieves the best performance on the MAG dataset in comparison with its variants. In particular, when the anomalous boundary is set higher than 20%, ACTION clearly outperforms its variants.
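The decline of accuracy towards 50% on a balanced dataset can be reproduced with a small boundary sweep. The sketch below is illustrative only: the function names are hypothetical, and the rates and labels are toy values chosen by us, not results from the MAG or DBLP experiments.

```python
def accuracy_at_boundary(rates, labels, boundary):
    """Accuracy of the rule 'rate > boundary means anomalous paper'."""
    correct = 0
    for rate, label in zip(rates, labels):
        predicted_anomalous = rate > boundary
        if predicted_anomalous == bool(label):
            correct += 1
    return correct / len(labels)


def sweep(rates, labels, boundaries):
    """Evaluate the same detected rates under several anomalous boundaries."""
    return {b: accuracy_at_boundary(rates, labels, b) for b in boundaries}


# Toy balanced dataset: four anomalous papers (label 1), four normal (label 0).
toy_rates = [0.10, 0.30, 0.05, 0.50, 0.0, 0.0, 0.0, 0.0]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_result = sweep(toy_rates, toy_labels, [0.0, 0.2, 0.6])
```

On this toy data the accuracy is 1.0 at a boundary of 0, 0.75 at 0.2, and 0.5 at 0.6. At the largest boundary every paper is classified as normal, so only the four normal papers are predicted correctly.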

Parameter Sensitivity
We also explore how hyperparameters affect the performance of ACTION, including η, γ, α, β, and the embedding dimension. We can see that when α and β vary from 0 to $10^{-2}$, the performance in terms of accuracy tends to decrease first and then increase. ACTION achieves its best performance when $\alpha = 10^{-6}$ and $\beta = 10^{-6}$.

Embedding Dimensions. In the process of paper content embedding, we use Doc2Vec to transform paper abstracts into low-dimensional representations. Generally, the dimension of the generated vectors is the same as that of the hidden layer in the neural network. Empirical studies have shown that learning word embeddings can also suffer from under-fitting and over-fitting. Thus, it is important to select an appropriate embedding dimension. Fig. 9 shows the sensitivity of the performance to the embedding dimension. From the results, we find that the performance keeps increasing with the embedding dimension within a certain range on both the DBLP and MAG datasets. Furthermore, it is relatively stable when the embedding dimension d > 200.
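A sensitivity study of this kind can be organized as a simple grid search over α and β. The sketch below is a hedged illustration: only the range from 0 to $10^{-2}$ comes from the text, while the intermediate grid points, the function names, and `toy_accuracy` are our assumptions. In particular, `toy_accuracy` is a stand-in for training ACTION and measuring its accuracy, and it does not reproduce any reported result.

```python
import math
from itertools import product

# Assumed grid: only the end points 0 and 1e-2 are stated in the text.
GRID = [0.0, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2]


def grid_search(evaluate, alphas, betas):
    """Return (best_score, alpha, beta) over all combinations of the grid."""
    best = None
    for alpha, beta in product(alphas, betas):
        score = evaluate(alpha, beta)
        if best is None or score > best[0]:
            best = (score, alpha, beta)
    return best


def toy_accuracy(alpha, beta):
    """Placeholder score that peaks at alpha = beta = 1e-6 (NOT real results)."""
    def penalty(x):
        # Distance from 1e-6 on a log scale; zero gets a fixed large penalty.
        return 10.0 if x == 0.0 else abs(math.log10(x) + 6.0)
    return 1.0 - 0.01 * (penalty(alpha) + penalty(beta))


best_score, best_alpha, best_beta = grid_search(toy_accuracy, GRID, GRID)
```

In practice, `evaluate` would train the model with the given α and β and return the accuracy on held-out data. The same loop can be reused for the embedding dimension by replacing the grid.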

Conclusion
In this work, we propose a novel framework, namely ACTION, to detect anomalous citations across heterogeneous academic networks. In the academic network, there are multiple types of relationships among different academic entities. ACTION makes full use of the information provided by journals, authors, and the contents of papers to identify anomalous citations. The experiments are carried out on two artificially generated datasets from two aspects: comparing ACTION with dimension reduction methods and comparing ACTION with its variants. From the results, we observe that the performance of ACTION is at least 15% better than the dimension reduction baselines on both the DBLP and MAG datasets. Furthermore, by comparing ACTION with its variants, we find that joint modeling of the relationships between journals and authors contributes to the performance of ACTION. At the same time, author credibility and journal grade are necessary for identifying anomalous citations.
There are some limitations to our work that must be borne in mind. For example, our research focuses on the computer science area. It can be extended to other fields in future work. Furthermore, the construction of anomalous datasets and the labeling of anomalous citations are time-consuming. We can consider how to improve the efficiency of the algorithm so that it can be applied to large-scale datasets. Textual information in publications, such as citation context, can also be used to detect anomalous citations.

Fig. 1 An example of a heterogeneous academic network that contains three types of academic entities, i.e., publications, journals, and authors.

Fig. 2 The overall model framework of ACTION, which contains three parts: paper content embedding, author-paper relationship modeling, and journal-paper relationship modeling.
where $n$, $l$, and $m$ represent the number of papers, authors, and journals, respectively, and $d$ is the dimension of the latent space. Similarly, the cost for computing $K$ is $O(tnd)$, where $t$ is the dimension of the paper content matrix. The computation cost for $D$ is $O(m^4 d^3 + md)$, and for $T$ it is $O(m^4 d^3 + m^2 d^2)$. The time complexities for updating $P$ and $Q$ are $O(d^3 + d^2 + dr)$ and $O(d^2 ln + d^3 + dl)$, respectively, where $r$ represents the number of labeled papers.

Fig. 4 The experimental process of identifying anomalous citations in the academic network.

Fig. 6 Performance of anomalous paper identification under different anomalous boundaries.

Fig. 7 Performance of anomalous paper identification for variants of ACTION under different anomalous boundaries.

Table 1
The description of notations

Table 2
The statistics of datasets

Table 3
Summary of the detection methods for comparison

Table 4
Performance comparison for anomalous citations detection