Categorization and correlational analysis of quality factors influencing citation

Khatoon, Asma; Daud, Ali; Amjad, Tehmina

doi:10.1007/s10462-023-10657-3

Open access
Published: 22 February 2024

Volume 57, article number 70, (2024)

Artificial Intelligence Review

Asma Khatoon¹,
Ali Daud³ &
Tehmina Amjad¹,²

Abstract

The quality of a scientific publication plays an important role in generating citations and raising the work's visibility. According to several studies, the number of citations has been actively used to measure the quality of publications. Existing studies have identified document-related factors, author-related factors, journal-related factors, and altmetrics as the factors that influence the citations of an article. However, most of the indicators proposed for determining the quality of a publication relate to the author or venue of an article rather than to its content. Factors related to the quality of the publication itself are largely ignored in the existing literature. The purpose of this research is to identify, categorize, and correlate the quality factors that influence citations. To this end, a systematic literature review (SLR) is undertaken for factor categorization, and Pearson’s correlation coefficient (PCC) is calculated to quantify the impact of factors on citations. The SLR collects relevant articles published from 2013 to 2022 from several data sources and categorizes the factors impacting citations. A subset of factors is identified from the DBLPV13 dataset, and the correlation of these factors with citations is studied. The factors include Readability, Recency, Open Access, Hot Topics, Abstract Length, Paper Title Length, and Page Count. The correlational analysis shows that Recency, Open Access, Hot Topics, Abstract Length, and Page Count have a favorable impact on citations, whereas Readability and Paper Title Length have a negative relationship with citations. Because the relationship among the factors is nonlinear, Spearman’s correlation is also computed, and a comparison with existing studies is undertaken to validate the empirical and correlational results. The study contributes by identifying, categorizing, and correlating the quality factors that need to be prioritized. Apart from the broad and more obvious features, it is determined that there is a need to investigate quality-related factors that concern the content of the article.

1 Introduction

Knowledge dissemination is one of the most promising activities today (Amjad and Ali 2018), and research publications are its main vehicle. Beyond diffusing knowledge, research publications inform hiring, firing, tenure, funding, and promotion decisions (Chang et al. 2013). In the existing literature, the term citation refers to text that researchers use to criticize or acknowledge previous research work. Citations provide valuable information that serves as the basis for measures such as the H-index and the impact of a paper. The citation count is the standard measure of citation. It describes the influence of a publication and provides information on various research units, such as highly ranked authors, venues, research groups, and research institutions. Predicting high-impact research articles and evaluating the quality of research publications are challenging tasks. Most researchers conclude that the citation count can serve as a proxy for the quality of research publications. Citations accumulate for various reasons, as many publications are cited simply to refer to existing work (Aksnes et al. 2019; Harwood 2008). Furthermore, knowledge is inherited through citations in several ways: some works are cited to support a technique and its findings (Amjad et al. 2020), while others are cited to criticize a research design (Song et al. 2022).

Recent work has used feature-based, deep learning, and time series (Zhao and Feng 2022) approaches to predict the citations of research articles. Feature-based approaches employ document-related factors, author-related factors, journal-related factors, and altmetrics, either independently or in combination; however, the criteria for selecting features vary across studies. Feature-based citation prediction has been framed as either a regression or a classification problem. When it is framed as a regression problem, models such as multiple sequential regression analysis (Lee and Brusilovsky 2019), Spearman correlation and logistic regression (Wang et al. 2019a, b), the Paper Potential Index (PPI) model, the Multi-Feature model (Bai et al. 2019), a regression equation (Ayaz et al. 2018), three-step regression (Meyer et al. 2018), a multiple linear regression model (Susarla et al. 2018), semi-continuous regression (Sohrabi and Iraj 2017), quantile regression (Stegehuis et al. 2015), and stepwise multiple regression analysis (Yu et al. 2014a) are used. When it is framed as a classification problem, SVM, KNN, and Bagging (Wang et al. 2020) are used. Deep learning techniques, on the other hand, predict citations from smaller or larger datasets by leveraging temporal information, author or document meta-information, or information from the title, abstract, or full text. Approaches suggested for citation prediction include DeepCCP (Ma et al. 2021), multilayer BP (Ruan et al. 2020), a BiLSTM with attention model (Shen et al. 2019), (HAN) ST (Maillette de Buy Wenniger et al. 2020), and the GRU-CPM model (Wen et al. 2020). Previous research typically used the AMiner, Semantic Scholar, Scopus, Web of Science, and Google Scholar databases for citation prediction. These databases contain information about papers, authors, collaboration groups, and publications.

Most importantly, the proposed classification and regression techniques exploit either extrinsic factors of the research article or meta-information about the author, the venue, or their combination to predict the scientific impact of the research article. Few studies have taken intrinsic factors into consideration when assessing the quality of a research article.

This study performs an extensive SLR to identify, categorize, and correlate the factors that affect the citations received by an article. We collected data from multiple sources and identified the correlation among multiple factors. We identified the major objectives achieved by the existing literature and performed an analytical review of the factors that affect citations. This study also presents a novel categorization of the existing literature based on the studied factors, including document-based factors, author-based factors, journal-based factors, and altmetrics. The categorized factors were correlated with citations using PCC on the DBLP dataset to determine their impact on citations. In addition, to validate the empirical and correlational findings, a comparison with earlier studies was carried out. The correlation analysis shows that the main quality parameters have a favorable impact on citations. Furthermore, this study proposes recommendations for improving quality factors to make research more valuable.
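To make the two correlation measures named in this study concrete, the following is a minimal sketch of Pearson's correlation coefficient and Spearman's rank correlation in plain Python. It is not the authors' analysis code; the function names and the toy `page_count` and `citations` values are assumptions chosen only for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient applied to ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy data (not from the DBLP dataset): five papers.
page_count = [4, 6, 8, 10, 12]
citations = [1, 2, 4, 8, 16]

r = pearson(page_count, citations)
rho = spearman(page_count, citations)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

In this toy example citations grow monotonically but not linearly with page count, so the rank-based coefficient is higher than the linear one, which is the usual reason for reporting Spearman's correlation alongside Pearson's when relationships are nonlinear.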

The paper is organized as follows. Section 2 presents the detailed SLR and a critical analysis of the selected papers. Section 3 describes the research methodology, and Sect. 4 presents the results with their detailed analysis. Section 5 concludes the paper.

2 Literature review

Conducting a literature review is crucial for researchers to make informed decisions and provide empirical evidence. This study conducts a systematic literature review (SLR) following the Kitchenham guidelines to search, select, extract, and consolidate information from published evidence (Mariano et al. 2017; Snell et al. 2017). The following research question was developed.

RQ: What are the reported factors that can affect the early citation of a research article in existing literature?

The search string was developed from the main research question, i.e., ((“Factors” OR “Characteristics” OR “Feature” OR “Elements” OR “Attribute” OR “Properties”) AND (“Influence” OR “Determine” OR “Effect” OR “Predict” OR “Impact” OR “Significant” OR “Considerable”) AND (“Citation Rate” OR “Citation Frequency” OR “Number of Citation” OR “Citation Count” OR “High Citation” OR “Early Citation” OR “Citation Prediction”)).
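The structure of this search string (synonyms joined by OR within a group, groups joined by AND) can be sketched as a small helper. The function name `build_search_string` and the list names are hypothetical. Straight quotes are used here in place of the typographic quotes above, and the exact syntax accepted by any particular database is not specified in the source.

```python
def build_search_string(groups):
    """Join synonyms with OR inside each group, and the groups with AND."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return "(" + " AND ".join(clauses) + ")"

# The three term groups are taken from the search string in the text.
factor_terms = ["Factors", "Characteristics", "Feature", "Elements",
                "Attribute", "Properties"]
effect_terms = ["Influence", "Determine", "Effect", "Predict", "Impact",
                "Significant", "Considerable"]
citation_terms = ["Citation Rate", "Citation Frequency", "Number of Citation",
                  "Citation Count", "High Citation", "Early Citation",
                  "Citation Prediction"]

query = build_search_string([factor_terms, effect_terms, citation_terms])
print(query)
```

Keeping the term groups as separate lists makes it easy to add or drop a synonym and regenerate the string consistently for each data source.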

2.1 Evidence of factors affecting citation and their classification

Several factors drive citation growth, and studies rely on single as well as multiple factors for better prediction results. Author-based factors, journal-based factors, document-based factors, institutional factors, and altmetrics are employed to predict publication citations (Bai et al. 2019). Different parameters affect the quality of a scholarly publication, and among them citation acts as a strong metric for assessing quality. Therefore, several studies have correlated the citation rate with other factors, i.e., author-related, journal-related, and document-related factors and altmetrics. It can be observed that both internal quality factors and external factors contribute greatly to a paper’s citation impact. The citation rate can be considered a strong indicator of the quality of a paper (Yu et al. 2014a). Several findings have been presented to clarify the quality of scholarly articles.

2.2 Document factors

As mentioned earlier, existing studies have used a single parameter as well as combinations of parameters to predict scientific impact. Several studies used machine learning techniques for prediction (Abuhay et al. 2018a, 2018b; Amplayo et al. 2018; Ayaz et al. 2018; El Mohadab et al. 2018; Fu and Aliferis 2010; Hwang et al. 2019; Lee 2019a; Lee and Brusilovsky 2019; Lisha et al. 2020; Luo et al. 2018; Wang et al. 2020; Pobiedina and Ichise 2016; Robson and Mousquès, 2016; Sohrabi and Iraj 2017; Wang and Zhang 2020; Yu and Yu 2014; Yuan et al. 2018). Other studies performed data identification and analysis (Fu and Ho 2016; Hu et al. 2018; Kosteas 2018; Prathap et al. 2016; Susarla et al. 2018), conducted a bibliometric investigation, or proposed a novel technique (Abrishami and Aliakbary 2019; Bai et al. 2019; Yuan et al. 2018). Tables 1 and 2 show that many influential document-related factors for assessing citation impact have been identified and correlated with other similar factors. Document-related factors are divided into two major categories: document general factors and document quality-related factors. Document general factors are the general extrinsic factors related to the article, whereas quality-related factors are those that illustrate the quality of the article.

Table 1 Document general or extrinsic factors

Table 2 Document quality related or intrinsic factors

2.2.1 Document general factors

2.2.1.1 Number of early citation/citation window

Factors such as the number of early citations after one year, the length of the citation window, the short-term citation count, early attention, the number of citations received in the year after publication, early citations, the citation index (citations per year), and the total citation count in the first three years illustrate the average citation time after publication and the early dissemination of knowledge in the scientific community (Abramo et al. 2019; Chakraborty et al. 2014b; Garner et al. 2014; Guerrero-Bote and Moya-Anegón, 2014). According to research, the more citations a work receives, the higher its quality (Ramezani-Pakpour-Langeroudi et al. 2018).

Several researchers treat a restricted citation window as a significant factor, using different terminologies for it. The citation windows considered range from one to five years (Abrishami and Aliakbary 2019; Bai et al. 2019; Kosteas 2018; Lee and Brusilovsky 2019; Wang et al. 2020; Stegehuis et al. 2015; Susarla et al. 2018; Wang, Zhihui 2020). Some studies argue that a citation window of one year is too short for any publication, yet receiving citations in the first year expresses the worth of a publication, since some publications do not receive a decent number of citations in their earliest year (Stegehuis et al. 2015). A novel technique was proposed to predict the future influence of a publication based on the citation window (Abrishami and Aliakbary 2019); according to that study, the higher the citation count within a short citation window, the greater the publication's influence. According to Wang and Zhang (2020), however, a short citation window cannot be considered a significant indicator of citation impact: with a short window (e.g., 2 years), results are unreliable and ultimately biased. Hence, reliability is associated with the length of the citation window. According to different studies, mere changes in the citation window also affect the prediction of a publication's impact, so citation impact prediction becomes more challenging when citation patterns evolve. At the same time, a citation window of five years is long for measuring the impact of a publication. As a result, a citation window of three years can be considered appropriate (Aksnes et al. 2019; Wang et al. 2020).
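The effect of the window length can be illustrated with a minimal sketch that counts the citations a paper receives within a given number of years after publication. The function name, the inclusive window convention (publication year through publication year plus window), and the toy citing years are assumptions; the studies cited above may define the window differently.

```python
def citations_in_window(publication_year, citing_years, window):
    """Count citations received within `window` years after publication.

    Assumed convention: a citation counts if
    publication_year <= citing year <= publication_year + window.
    """
    last_year = publication_year + window
    return sum(1 for year in citing_years
               if publication_year <= year <= last_year)

# Hypothetical paper published in 2015, with the year of each citing paper.
citing_years = [2015, 2016, 2016, 2017, 2019, 2021]

one_year = citations_in_window(2015, citing_years, window=1)
three_year = citations_in_window(2015, citing_years, window=3)
five_year = citations_in_window(2015, citing_years, window=5)
print(one_year, three_year, five_year)
```

For this toy paper the one-, three-, and five-year windows capture 3, 4, and 5 of the 6 citations respectively, which shows how the choice of window changes the count that a prediction model is trained on.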

2.2.1.2 Publication type/study design

Prior literature uses several terms in the same context, such as type of the article, type of paper, study design, publication type, and type of study (Hu et al. 2018; Lee and Brusilovsky 2019; Wang et al. 2020; Susarla et al. 2018). Publication type is among the most important features affecting citation frequency, although some papers receive many citations and others are rarely cited (Aksnes et al. 2019). A publication may be a review, a meta-analysis, or a paper presenting a novel methodology, and each of these types can be positively correlated with citation frequency (Annalingam et al. 2014; Antoniou et al. 2015; Falagas et al. 2013; Farshad et al. 2013). A well-written, good-quality review has a greater chance of enhancing citation impact (Biscaro and Giupponi 2014; Fu and Aliferis 2010; Gargouri et al. 2010; Ruano-Ravina and Alvarez-Dardet 2012; Sin 2011; Vanclay 2013). Articles presenting a novel technique also correlate strongly with citation impact, as they introduce a new technique for the dissemination of knowledge.

2.2.1.3 Publication year/time period of publication

The time of publication matters: whether a paper is published at the start or the end of the year appears to bias the citations it receives. A paper published early in the year has more time to gain citations than a paper published at the end of the year. Another term related to publication time is the academic age of the article (Araújo et al. 2012; Bornmann and Williams, 2013a; Lachance et al. 2014). Academic age reflects the total citations accumulated up to a given date. Usually, older papers receive more citations than newer ones (Bornmann and Williams, 2013a; Ruano-Ravina and Alvarez-Dardet 2012; Tahamtan et al. 2016). According to some studies, however, recent articles receive more citations than older ones. The age of the article also depends upon the papers published. Some research (Dey et al. 2017; Ke et al. 2015) has focused on the fact that some publications receive most of their citations in their initial years, while others receive none at first and then suddenly attract a large number. Therefore, it can be concluded that the publication year or time of publication does not affect the prediction of citation frequency. Several researchers have drawn on recent work to support their claims (Bai et al. 2017; Bornmann et al. 2014; Donner 2018; Wang et al. 2020; Susarla et al. 2018).

2.2.1.4 Number of references/reference density

References in a document play a vital role: they substantiate the claims made in the document, and the number of references used depends on the type of study. Review articles usually receive more citations than articles presenting a methodology (Biscaro and Giupponi 2014; Fu and Aliferis 2010; Gargouri et al. 2010; Ruano-Ravina and Alvarez-Dardet 2012; Sin 2011; Vanclay 2013). Articles with more references receive more citations than those with fewer (Antoniou et al. 2015; Biscaro and Giupponi 2014; Bornmann et al. 2014; Didegah and Thelwall 2013; Farshad et al. 2013; Onodera and Yoshikane 2015; Robson and Mousques 2014; So et al. 2014; Van Wesel et al. 2013; Yu and Yu 2014).

Yuan et al. (2018) found that research novelty fades over time and that old concepts are cited less frequently as time passes. On the other hand, the Matthew effect suggests that the rich get richer: well-known documents and reputable authors receive more citations.

2.2.1.5 Article length/paper length

Article length, or paper length, is reported to correlate positively with citation frequency, because a longer paper indicates the addition of valuable knowledge to the research, whereas a shorter paper restricts the information the author can include. However, some studies find that the length of the article does not affect citation frequency.

2.2.1.6 Page count/number of pages

A page limit restricts the information that can be presented. Such limits are usually imposed in conference proceedings, where the scope of the research is limited and the author is presenting a research idea to the research community. In journals, by contrast, the page count is usually not limited, so authors can present their ideas in detail.

2.2.1.7 Article language

The language of the article, sometimes referred to as the journal's language, is the language in which the article is written. Studies report a positive relationship between the language of a paper and its citation frequency. Some indicate that English, as an international language used and understood by researchers worldwide, attracts more citations than other languages. Hence, papers published in English-language journals have a greater chance of being cited frequently than papers in journals in other languages.

2.2.1.8 Paper field (number of research fields of the paper/Interest of subject/research focuses Categories)

Several studies indicate that the number of citations varies across disciplines and research fields, and that citation behavior varies across major fields and subfields (Amjad and Munir 2021). The citations of a paper depend on the size of its field, with subfields achieving fewer citations than major fields. For example, within chemistry, a paper in organic chemistry achieves more citations than one in biochemistry. In another study, the major field of biology achieves more citations than its subfields. Besides, different disciplines have different citation behavior: the citation rate is higher in the social sciences than in the natural sciences. Therefore, it can be concluded that articles published in major fields and in the social sciences gain more citations than articles in subfields and in the natural sciences.

2.2.2 Document quality-related factors

This section describes the factors that several researchers have used to explain increases in citation frequency. Moreover, researchers have proposed several techniques using the document quality-related factors listed in Table 2.

2.2.2.1 Quality of paper

One of the most important factors is the quality of the publication, which affects the impact of the research (Jabbour et al. 2013). Various studies have focused on publication quality, but some treat extrinsic, general document factors as the quality measure. Publication quality is better captured by intrinsic factors (Bai et al. 2019; Robson and Mousquès, 2016; Yuan et al. 2018). Table 2 groups the related factors. Recency (Pobiedina and Ichise 2016), Clarity, Timeliness (Robson and Mousquès, 2016), Abstract ratio, weight ratio (Sohrabi and Iraj 2017), Novelty (Amplayo et al. 2018; Castillo-Vergara et al. 2018), and creativity (Tahamtan and Bornmann 2018) can be considered under the umbrella of document quality-related factors.

2.2.2.2 Abstract/ keyword characteristics

Abstract and keyword factors are extracted from the text of the article (Uddin and Khan 2016). Previous studies explain abstract, keyword, and title features comprehensively, and our analysis indicates that abstract and keyword features reflect the contents of the article. Keyword diversity and the use of several keywords in an article help to increase citations (Chakraborty et al. 2014a; Rostami et al. 2014a; So et al. 2014). Keywords direct researchers to relevant studies, so an article with poorly chosen keywords might not be found. Hence, the number of keywords, the percentage of keywords, and keyword diversity have a significant relation with the citation count (Uddin and Khan 2016). The abstract conveys the crux of the whole concept presented in the article. Researchers usually screen articles by their title and abstract, so the abstract is as important as the title. At the same time, the title and the keywords used in the article must be relevant to one another and diverse. Diversity and the number of keywords used in the title or abstract might increase citations (Chakraborty et al. 2014a; Rostami et al. 2014a; So et al. 2014). Another finding is that the more closely the keywords in the title match the keywords in the abstract, the greater the citations (Annalingam et al. 2014; Falagas et al. 2013; Rostami et al. 2014b). It has also been found that the presence of an abstract and a longer abstract help to achieve more citations, while a structured abstract has little effect on citation frequency (Tahamtan et al. 2016; Van Wesel et al. 2013).
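As a rough illustration of how such title, abstract, and keyword features might be extracted from an article record, consider the sketch below. The feature names, the simple lowercase word tokenizer, and the sample record are all assumptions made for this example; the cited studies define and compute these features in their own ways.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_features(title, abstract, keywords):
    """Compute simple length and keyword-overlap features for one article."""
    title_tokens = tokenize(title)
    abstract_tokens = tokenize(abstract)
    keyword_tokens = {token for kw in keywords for token in tokenize(kw)}
    # Keyword words that appear in both the title and the abstract.
    shared = keyword_tokens & set(title_tokens) & set(abstract_tokens)
    return {
        "title_length": len(title_tokens),
        "abstract_length": len(abstract_tokens),
        "keyword_count": len(keywords),
        "keyword_words_in_title_and_abstract": len(shared),
    }

# Hypothetical article record, invented for illustration only.
features = keyword_features(
    title="Citation prediction with deep learning",
    abstract=("We study citation prediction using a deep learning model "
              "on bibliometric data."),
    keywords=["citation prediction", "deep learning", "altmetrics"],
)
print(features)
```

Features of this kind can then be correlated with citation counts, although a real pipeline would likely need stop-word removal and stemming before comparing keywords with title and abstract text.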

2.2.2.3 Presentation/Clarity

Presentation can be expressed through three major concepts: a well-structured article, the use of figures and tables, and the readability of the article (Uddin and Khan 2016). First, an article is well presented if it is well structured, clearly setting out the literature, problem, contribution, methodology, and results. Second, adding tables and figures enhances clarity and ultimately attracts more citations. Third, we expect that if the language of an article is more readable and helps readers grasp its concepts, the article will be cited more frequently (Uddin and Khan 2016). According to Meyer et al. (2018), an article is cited because of the presentation and clarity of its contents. Therefore, the primary drivers of an article's citations are its contents, its quality, and the way the contents are presented to give the reader deep knowledge of the concepts expressed. A good presentation thus helps readers grasp the idea and reduces the effort of understanding.

2.2.2.4 Novelty/Creativity

Veugelers and Wang (2019) mentioned that novel publications might achieve more citations in their respective fields. However, novelty or creativity carries some risk: it might add knowledge to the research, but with a high level of uncertainty, because it might fail. Novelty can enable breakthroughs (Chai and Menon 2019). Publication novelty, sometimes expressed as creativity, might have a strong impact on the number of citations (Amplayo et al. 2018; Chai and Menon 2019; Veugelers and Wang 2019).

2.2.2.5 Open access/Visibility/Timeliness

Open access and visibility are related to the number of citations according to several studies. Previous studies assumed that open access articles have high visibility and are therefore cited more, and reported a positive correlation between open access or online availability and the number of citations (Koler-Povh et al. 2014; Uddin and Khan 2016). Various studies found that open access on its own has little or no effect on citations, although it can help increase the number of citations when the article is published in a high-impact journal (Craig et al. 2007; Koler-Povh et al. 2014).

2.2.2.6 Relevance

Scientific relevance, which comes under the umbrella of quality-based factors, is related to citation impact and citation counts. Relevance concerns the contents of the article. Scientific relevance and the number of citations are correlated (Leydesdorff et al. 2016), but the number of citations does not help measure scientific relevance. According to our analysis, relevance can be used along with other factors: for instance, it can be the relevance of the topic to the article's author, or the relevance of the article to its peer reviewers, that is, whether the article belongs to the reviewer's field and whether the reviewer has a conflict of interest (Geng et al. 2019; Ishag et al. 2019; Li et al. 2017).

2.2.2.7 Peer-reviewed papers

Peer review is crucial for high-quality academic publications, assigning experts to relevant publications. Fairness and relevance are essential, with minimal conflict of interest. However, author-reviewer relationships should be examined to avoid biased reviews, which can lead to high citations (Ishag et al. 2019; Li et al. 2017). A negative correlation between peer review and citations can be observed (Geng et al. 2019; Ishag et al. 2019; Li et al. 2017).

2.2.2.8 Topic feature/Hot topics/Diversity of study topic/Versatile

Topic features relate closely to the contents of the article, reflecting the main semantics of the contents, which guide readers in referencing the literature. In this research, we combine the features that different studies use to describe the topic. Topics that attract attention because of their popularity are referred to as hot topics. From the scholars' point of view, hot topics influence citations more strongly than other factors; they gain more attention and ultimately receive more citations (Fu and Aliferis 2010; Gallivan 2012). The number of citations also varies with the diversity and versatility of study topics, with social sciences receiving more citations than natural sciences. Hot topics receive more citations than outdated ones, and study design and article topic are the most predictive factors.

2.2.2.9 Title of the article/characteristics of articles

Title length is an important aspect of research articles, as it influences citation impact and attracts researchers and practitioners. Studies show that the effect of title length varies across disciplines: in medical fields, longer titles indicate significance and greater citation impact, whereas in psychology and management title length is not considered an influential factor and correlates negatively with citations. Hence, a shorter title would be more concise and more appealing to the audience (Stremersch et al. 2015; Rostami et al. 2014b; Tahamtan et al. 2016).

Title characteristics, including punctuation marks, can affect citation impact. Titles with colons, hyphens, and brackets receive more citations, while titles consisting only of simple alphabetic characters and question-oriented titles are better predictors of citation. Giuffrida et al. (2019) identified a versatile title as a factor that increases the citation rate of a research paper; therefore, this research correlates the paper title with the number of citations. Another title-related feature is mentioning the study design in the title (Subotic and Mukherjee 2014). For instance, mentioning a systematic literature review, the type of methodology, or the data analysis and investigation correlates strongly with citation impact (Antoniou et al. 2015).

Summarizing these findings, researchers mostly use extrinsic factors to measure publication quality, but intrinsic factors are more crucial and must be associated with publication quality.

2.3 Author related aspects

2.3.1 Author-name/author keywords

The author's name and keywords are listed as author-related factors in Table 3 and may play a role in achieving high citation counts (Bai et al. 2017; Fu and Ho 2016). According to Fong and Wilhite (2017), working harder is not the only way to increase an academic’s publication count. Honorary authorship refers to the inclusion of researchers as authors on publications or grant submissions even though they did not contribute to the research effort (Fong and Wilhite 2017; Singhal and Kalra 2021). There can be many reasons to add author names to research articles, which eventually leads to a gain in the number of citations (Amjad et al. 2015). Positive reviews can be attributed to an author's reputation, but resolving conflicts over authorship order is crucial. First- and last-author publications are equally important for recruiting, promotion, and tenure in scientific domains (West et al. 2013). Bu et al. (2020) analyzed the importance of authors by allocating weights to multiple authors so that each author's contribution is taken into account. Moreover, authorship sometimes plays a part in funding and other formalities as well (Singhal and Kalra 2021; West et al. 2013). When searching for articles by a prestigious author, readers also find the targeted article by the author's name. However, multiple authors with similar names lead to author disambiguation issues (Shoaib et al. 2020). As a result, a solution to address this issue must be proposed.

2.3.2 Author count/number of authors/co-authors and national/international collaboration

Collaboration between authors and co-authors significantly impacts research, as increased authors contribute valuable knowledge and increase the likelihood of being cited by co-authors (Amjad et al. 2017, 2016). National and international collaborations have a significant relationship with the number of citations, as creative minds from multiple areas enhance the knowledge (Liao et al. 2018). Collaboration among multiple authors residing internationally has a greater impact on citation frequency, and collaboration among authors from western countries increases the number of authors and co-authors in an article (Han et al. 2014).

2.3.3 Centrality

In any research environment, collaboration is very important (Han et al. 2014). Centrality is a crucial factor in collaboration networks, where authors' positions play a significant role in accessing resources and affecting publication citations. Degree centrality indicates the importance of an author through the number of direct connections, while betweenness centrality determines the importance of a node in information flow. A node with higher betweenness centrality lies on more of the paths among other nodes, which highlights the importance of centrality in a collaboration network (Guan et al. 2017). According to Biscaro and Giupponi (2014), in some fields, for instance climate change, degree and betweenness centrality have a significant influence on the future citations of authors. Moreover, in fields such as steel structures and information science, both measures have a positive association with the citation count of the authors (Lee 2019b; Li et al. 2013; Uddin et al. 2013).
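To make the two centrality measures concrete, the following is a minimal sketch in plain Python, not the procedure used in any of the cited studies. The toy co-authorship graph `coauthors` and the function names are hypothetical; betweenness is computed with Brandes' algorithm for an undirected, unweighted graph and is left unnormalised.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def betweenness_centrality(graph):
    """Unnormalised betweenness for an undirected, unweighted graph (Brandes)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search that also counts shortest paths from `source`.
        order = []
        predecessors = {node: [] for node in graph}
        paths = {node: 0 for node in graph}
        distance = {node: -1 for node in graph}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Every undirected pair is visited from both endpoints, so halve the totals.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical co-authorship network: an edge means two authors share a paper.
coauthors = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

degree = degree_centrality(coauthors)
betweenness = betweenness_centrality(coauthors)
```

In this toy network author C has the most direct collaborators, and both C and D sit on the shortest paths that connect E to the rest of the group, which is the bridging role that betweenness is meant to capture.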

2.3.4 Authors rank/ author score/scientific impact of author (first/ last)/H-index, author reputation

Author H-index, author reputation, and scientific impact measure productivity, with well-reputed authors having greater impact and prestige. These terms reflect the author's contribution and influence citations.

2.3.5 Institution related factors promotion/records of best paper awards

Researchers in previous studies have considered several factors, such as the size of the department, the productivity of the department, the ratio of department professors, research projects, and funded projects (Tahamtan et al. 2016). However, the main focus of the research is on institutional factors such as institutional prestige and university rank. Authors affiliated with prestigious, highly ranked universities receive more citations, and some international institutes achieve more citations because of their reputation (Amara et al. 2015; Amjad et al. 2018, 2016; Collet et al. 2014). Authors who publish their research in high impact journals receive more citations, which eventually leads to promotion and rewards from the institute.

Faculty members publishing multiple publications from single research have a greater impact on the research community, gaining more achievements. Funding and grants from the research community or industry contribute to industry-based research projects, which often acquire more citations. The best paper award is determined by paper characteristics, with longer papers having a higher chance of being award-winning and highly cited (Coupé, 2013).

2.3.6 Average articles published

This category relates to the research ability of the scientist, and in particular to whether working with productive and reputable researchers early in a researcher's career affects the researcher's future advancement (Amjad and Munir 2021; Lee 2019b). According to Han et al. (2014), in the field of computer science, citations for single-authored articles have decreased in the twenty-first century; consequently, the influence of working with credible collaborators must be considered for future research performance. Earlier publications, irrespective of their type (journal or conference), have a significantly positive impact on the future success of the article as well as the researcher (Acuna et al. 2012; Penner et al. 2013). Detailed analysis reveals that publications in conferences and journals have different impacts on early researchers, with conference publications serving as a benchmark for future research influence (Amjad et al. 2022; Lee 2019b).

2.4 Journal related aspects

2.4.1 Venue

According to our analysis, the mentioned terms can be used interchangeably and have been used by Ayaz et al. (2018), Robson and Mousquès (2016), Susarla et al. (2018), and Fu and Ho (2016). The frequency of citations a paper receives is influenced by how it is presented. A paper can be presented in one of two forms, journal or conference (Ibáñez et al. 2013). Papers published in journals gain more citations than conference publications. However, in a review, Tahamtan et al. found no relationship between citations and conference proceedings. Conference proceedings are more valuable for presenting a topic and getting an initial idea. In comparison to conference papers, journal papers receive more citations per document and per year. Journals can be international or national, general or specific. International journals receive more citations than national journals (Annalingam et al. 2014; Tahamtan et al. 2016). Papers published in general journals with a high impact factor ultimately receive more citations than papers published in specific journals (Tahamtan et al. 2016; Vanclay 2013). Venues are therefore also associated with topic specificity (Amjad 2021). According to Daud et al. (2019), venues with high citations have low entropy and venues with low citations have high entropy, where entropy indicates closeness to topic specificity. Hence it can be concluded that publication in a journal could be a significant predictor of citations. Moreover, journals in combination with high impact factors could be more influential in gaining citations (Table 3).

Table 3 Author related factors

2.4.2 Journal impact factor

According to our analysis, the terms listed in Table 4 can be used to express journal impact factor, i.e., journal impact factor, venue prestige, impact of venue, venue rank, average citations received by the papers published in the venue, and journal ranking attribute. Journals are usually evaluated on the basis of the journal impact factor, which in turn can be used to evaluate research and researchers (Brito and Rodríguez-Navarro 2019). To gain citations, researchers try to publish their work in prestigious, high impact journals, as these are more appealing to research scholars (Uddin and Khan 2016). Several studies report a positive correlation between high impact journals and the number of citations (Bornmann et al. 2014; Bornmann and Williams 2013b; Daud et al. 2019; Didegah and Thelwall 2013; Falagas et al. 2013; Fu and Aliferis 2010; Garner et al. 2014; Jiang et al. 2013; Van Der Pol et al. 2015; Vanclay 2013). Waltman and Traag (2017) argued that the scientific merit of an article is well measured using the journal impact factor. Moreover, the quality of a paper is associated with the journal impact factor, as high-quality papers will eventually be published by high impact factor journals. More importantly, the journal impact factor combined with other independent factors could help to achieve citations, for instance open access or visibility with journal impact factor, or early citation count with journal impact factor (Hammarfelt and Rushforth 2017). However, according to Prathap et al. (2016), the journal impact factor alone is not a significant predictor of citation rate.

Table 4 Journal related factors

2.5 Altmetrics

After considering bibliometric indices, several researchers have introduced alternative indices, known as altmetrics, that emerged with social networks, as shown in Table 5. In their view, altmetrics such as article views, article saves, article rating (Wang et al. 2019a, b), and download counts (El Mohadab et al. 2018) should be considered when measuring scholarly impact. In today's era of social network connectivity, altmetrics could be a strong indicator for enhancing citation rate. Nuzzolese et al. (2019) investigated the relevance and effectiveness of altmetrics in determining the future success of an article. Several studies have investigated the association between traditional bibliometric factors and altmetrics (Nuzzolese et al. 2019; Thelwall et al. 2013; Xia et al. 2016). Altmetrics are now considered a significant factor because an author may otherwise be unaware that an article that could be cited exists. Articles that are disseminated via social communities, self-archived, or open access have greater chances of citation because the chances of knowledge dissemination increase. Although altmetrics are not a replacement for traditional measures, their major property is the speed with which knowledge is disseminated to the scientific community. Because an article that is discussed and mentioned by the scientific community can be seen, its worth can be analyzed properly (Table 5).

Table 5 Altmetrics

2.6 Fact findings

The following section presents a detailed critical analysis of the papers considered in the literature.

Table 6 presents the critical analysis of selected studies from the literature, from 2012 to 2022, in which factors that affect citations are highlighted; it illustrates the type of paper, the problem domain, the methodology, and the results along with limitations. The factors considered in the literature relate to the separate categories of paper-related factors, journal-related factors, author-related factors, and altmetrics. It is observed that articles in the category of paper-related factors (El Mohadab et al. 2018; Lee 2019a; Lisha et al. 2020; Wang et al. 2020; Meyer et al. 2018; Yu et al. 2014b) have not focused on intrinsic factors. Moreover, it has also been found that the combination of the mentioned factors plays a significant role in enhancing citations. Different machine learning approaches (El Mohadab et al. 2018; Fu and Aliferis 2010; Wang et al. 2020) have been proposed to verify this claim. The evaluation of their results indicates that most researchers have mostly used author-related factors, journal-related factors, and extrinsic factors of an article to predict or determine higher citations, although the intrinsic factors deal with the contents of the publication, which is the major vehicle for knowledge dissemination. In some studies, some intrinsic factors have been considered along with extrinsic ones, but the quality-related factors of a publication are ignored.

Table 6 Critical analysis

The objective of this research is to find the quality factors used in the existing literature to evaluate citations. Therefore, an SLR has been conducted for an in-depth study of each existing work and an analysis of factors based on the classified groups. Initially, a meta-synthesis is performed that strengthens the analysis and exploration of existing qualitative studies. As a result, Table 1 depicts the factors overlooked in various studies, which leads to the development of another group based on quality-related factors that are largely ignored in previous research.

From Table 7 it can be seen that impact assessment of the article, the author, the venue, and a new metric, altmetrics, can be considered an important activity, though very few surveys exist that match the objective of our research. Fronzetti Colladon et al. (2020) conducted an empirical analysis to predict long-term citation count by considering bibliometric as well as social features, but the authors restricted the bibliometric features to those related to the abstract only. The objective of Wang et al. (2019a, b) was to determine the importance of bibliometrics and altmetrics; however, most intrinsic factors are missed in that survey because only extrinsic factors are considered. Cai et al. (2019) surveyed scholarly impact assessment and also discussed various solutions based on assigning weights. Their main focus was to provide weighting solutions for the assessment of the author, journal, and article, but the research did not consider the indicators for article assessment, so various factors have not been considered for an appropriate assessment. Tahamtan et al. (2016) comprehensively presented the factors affecting citations and categorized them according to paper, author, and journal. The study of quality-related factors needs more emphasis from researchers. According to Waltman (2016), citation impact indicators significantly affect research productivity. That study used bibliographic databases for the calculation of citation impact, with more focus on journals and authors and less focus on articles. An inclusive survey conducted by Onodera and Yoshikane (2015) helped researchers by collecting several factors in a single document. The work was separated into several factors, but those factors are not categorized and, most importantly, several intrinsic factors that ultimately influence an article's citations are overlooked.

Table 7 Existing surveys for citation impact prediction

3 Research methodology

In this section, the dataset used and the methodology are described. The correlation between citations and quality factors is determined, as shown in Table 9.

3.1 Dataset

In this study, comprehensive experiments are carried out using the citation network dataset (DBLPV13) available at https://www.aminer.cn/citation. The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other data sources. The dataset is designed for research purposes only, and multiple updated versions are available at the mentioned link. We have used DBLPV13, which contains 5,354,309 papers and 48,227,950 citation relationships from 1990 to 2019. The dataset therefore has about 5.3 million instances, with features such as Paper Id, Paper Title, Author Names, Affiliations, Publication Year, Field of Study, Keywords, Abstract, and References. Some further features are derived from the existing data: Paper Title Length, Number of Authors, Article Length, Open Access, Presentation, Hot Topics (on the basis of the keywords feature), Hot Topics Count, and Abstract Length. The dataset contains many ambiguities that require extensive data cleaning. We have considered a subset of 150,000 instances for the purpose of experimentation, so the dimension of the selected dataset is 150,000 × 30. The subset is further divided into multiple chunks for better experimentation. The feature set is built around four major categories that influence article citations: document general features, document quality factors, author-related factors, and venue-related factors.

3.2 Methodology of citation impact computation

To find the impact of various factors on citations, Pearson's correlation coefficient is used.

3.2.1 Pearson’s correlation coefficient

Pearson's correlation coefficient measures the strength and direction of the linear relationship between two variables, i.e. whether the relationship is strong or weak. One variable is treated as the dependent variable and the other as the independent variable. Citations received by the articles are correlated with multiple features to depict the association among the features. The dataset is divided into multiple chunks. The correlation is calculated for each individual chunk, and the mean across chunks is calculated for each factor. A subset of these features, collected for experimentation to test the theoretical concepts gathered from the SLR findings, is shown in Table 8.
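The chunk-wise procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' notebook code: the function names, the chunk size, and the toy values for `page_count` and `citations` are all hypothetical, and a chunk in which either variable is constant would need separate handling because its coefficient is undefined.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_chunk_correlation(factor, citations, chunk_size):
    """Compute r separately on consecutive chunks, then average the coefficients."""
    coefficients = []
    for start in range(0, len(factor), chunk_size):
        xs = factor[start:start + chunk_size]
        ys = citations[start:start + chunk_size]
        coefficients.append(pearson(xs, ys))
    return sum(coefficients) / len(coefficients), coefficients


# Hypothetical toy data: two chunks of three papers each.
page_count = [4, 8, 12, 6, 10, 14]
citations = [1, 3, 5, 9, 5, 1]

mean_r, per_chunk = mean_chunk_correlation(page_count, citations, chunk_size=3)
```

In this toy case the first chunk is perfectly positively correlated and the second perfectly negatively correlated, so the mean of the chunk coefficients is zero. The mean of chunk-level coefficients is generally not the same quantity as the coefficient computed once on the pooled data, which is worth keeping in mind when reading the per-factor averages.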

Table 8 Feature distribution

3.2.2 Data analysis

The quality of the article content relies on multiple factors, including the presentation and readability of the article, its openness or availability, its topic characteristics, abstract, keywords, and relevance. This study evaluated a few factors and correlated them with the article's citation rate. Because of the dataset's limitations, only a few factors are chosen. The correlations for these subsets are visualized as scatter plots. The y-axis represents the dependent variable, citations, and the x-axis represents the factor whose impact on citations is to be determined. The visualization is performed on subsets of the dataset using Python in a Jupyter notebook.

3.2.3 Correlational analysis among page count (PC) and citations (C)

The first dimension is the article length, or page count, of the research article. The SLR analysis concluded that extending the article length favors a high citation rate; nevertheless, the experimentation demonstrated that increasing the article length has little or no effect on the citation rate of the article. Page count is derived as the difference between the start and end page numbers and added to the dataset, and Pearson's correlation is computed between citations and page count. The correlation between page count and citations is 0.035972. We also checked the rest of the data, where the correlation coefficient fluctuates from 0.01 to 0.03 in different portions of the dataset. Hence it can be concluded that the average value of the correlation coefficient is 0.03, which depicts a weak positive correlation. Both values increase or decrease together up to a certain point, as shown in Fig. 1. Furthermore, analyzing conference and journal articles separately might change the observed correlation, because conference articles are limited to fewer pages than journal articles; conference articles therefore become a bridge for idea initiation and journal articles a vehicle for knowledge dissemination.
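A minimal sketch of how the page count feature might be derived from start and end page fields is given here. The function name and the handling of missing or inconsistent values are assumptions made for illustration; the text only states that the feature is the difference between the two page numbers.

```python
def page_count(page_start, page_end):
    """Return end minus start, or None when either field is missing or unusable."""
    try:
        start, end = int(page_start), int(page_end)
    except (TypeError, ValueError):
        # Empty strings, None, or non-numeric page labels cannot be used.
        return None
    if end < start:
        # An end page before the start page signals a data-quality problem.
        return None
    return end - start


# Hypothetical records: a clean one, one with a missing start, one inconsistent.
lengths = [page_count("101", "112"), page_count("", "12"), page_count("20", "10")]
```

Records that return `None` would have to be dropped or imputed before the correlation is computed. Note that the plain difference follows the description in the text; an inclusive page count would add one.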

3.2.4 Correlational analysis among author count (AC) and citations (C)

Author count is the number of authors who contributed to the research article. According to researchers, author count is among the factors that affect citations. An increase in author count might increase indirect self-citations, which negates the quality factor in the research. This study correlated citations with author count, and the correlation coefficient is 0.0161, which shows a low positive correlation. The data point distribution in the graph indicates how “Author Count” is related to "Citations" received by each article, as shown in Fig. 2. Each dot on the graph represents a publication, and its position reflects its values for both variables. This illustrates that an increase in author count affects citations positively up to a certain range of author count. Hence this study concluded that a low author count has a high impact on citations.

3.2.5 Correlational analysis among paper title length (TL) and citations (C)

Title characteristics have an impact on citations, although they differ among disciplines. The length of the title is another characteristic that is reported in research as a significant factor. The SLR indicates that paper title length has an impact on citations. The experimentation on the dataset indicates that the shorter the title, the higher the citation rate. The correlation between citations and paper title length is calculated, and the values range from −0.08568 to −0.00609. Their average is −0.03417, so there is a weak negative correlation, which indicates that an increase in title length is associated with a decrease in citations up to a certain point. The graph's data point distribution shows how "Title length" relates to “Citations” obtained by each article. Each dot on the graph represents a publication, and its position reflects its values for both variables. Hence a shorter title length is associated with more citations, as shown in Fig. 3. The versatile nature of the title and its relevance to the study design also correlate strongly with the number of citations.

3.2.6 Correlational analysis among abstract length (AL) and citations (C)

The abstract of an article serves as a summary of the actual work, and articles are typically retrieved using keywords. As a result, a well-written abstract and the right usage of keywords (several, diverse) in the abstract are positively associated with citation impact. The decision to include a specific study in a piece of research is usually made on the basis of the abstract, as it conveys the relevance of the study; hence, the abstract must be readable so that the contents of the study are delivered. The length of the abstract is usually restricted by journals and conferences, so the researcher has to communicate the insight of the work within that limit. The experimentation indicates a correlation of 0.056817, which depicts a positive correlation up to a certain range. The graph's data point distribution shows how "count of words in Abstract" relates to "Citations" obtained by each article. Each dot on the graph represents a publication, and its position reflects its values for both variables, as shown in Fig. 4. Moreover, the abstract length required by journals must be less than 300 words. Abstract length, together with the readability and relevance of the article, is likely to lead to more citations.

3.2.7 Correlational analysis among readability (RD) and citations (C)

Readability is an important aspect of the quality of an article. Readability is improved by well-presented content, and the addition of tables and figures to an article increases clarity. In this study the readability score is calculated using the Flesch readability score (Eleyan et al. 2020). Flesch readability scores usually range from 0 to 100; however, extremely difficult text can result in a negative score. The score is computed on the abstract of an article, as the abstract summarizes the research. The detailed review revealed that the more readable an article is, the more citations it gains. The correlation coefficient values fluctuate between −0.03 and 0.0003, and the average correlation is −0.014, which indicates a weak negative correlation, i.e. as one value increases the other tends to decrease slightly. The data points in Fig. 5 indicate that articles with scores in the range from 10 to 80 receive more citations, and that negative scores correspond to extremely difficult text. Hence the experimentation showed that the understandability of an article improves when the article is easier to read. Consequently, the number of citations has a significant correlation with readability and presentation. Another aspect is that readability affects citations in terms of relevance; it can therefore be concluded that if a researcher finds an article readable but not relevant, this affects citations negatively.
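For reference, the standard Flesch Reading Ease formula is 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). The sketch below applies it to a short text. It is only an approximation for illustration: the regular-expression tokenisation and the vowel-group syllable counter are simplifying assumptions, the function names are hypothetical, and the scores in the study may have been produced with a different tool.

```python
import re


def count_syllables(word):
    """Rough heuristic: one syllable per run of vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical examples: short plain sentences versus one dense sentence.
easy = flesch_reading_ease("The cat sat. The dog ran.")
hard = flesch_reading_ease(
    "Multidimensional heterogeneous bibliometric characterization necessitates "
    "sophisticated interdisciplinary methodological considerations."
)
```

Long sentences and polysyllabic words both pull the score down, which is why dense technical abstracts can fall well below the usual 0 to 100 range.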

3.2.8 Correlational analysis among hot topics (HT) and citations (C)

Another important factor is the hot topic, or trending topic, which is calculated using the keywords of the article. The frequency of terms from the keywords is maintained and compared with the terms in the title to identify trending topics. The methodology obtains the most common terms across all titles and keywords: a collection of frequent words is organized, and if those words appear in the title or keywords of an article, the article is marked from 1 onwards according to the frequency of the words. The empirical study suggests that hot topics gain more citations. The experimentation indicates that the correlation coefficient for the hot topics count is 0.0172, although the values vary from −0.0006 to 0.0391. Since our methodology is based on the title and keywords, it may produce different results when the abstract or field of study is taken into account. This study concludes that hot topics are an influencing characteristic for gaining citations, as shown in Fig. 6, when used along with other factors, for instance the publication date.
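One possible reading of the hot topics count is sketched here. The description in the text leaves several details open, so everything in this block is an assumption made for illustration: the record layout with `title` and `keywords` fields, the `top_k` cut-off for what counts as a frequent term, and the simple substring match against the title.

```python
from collections import Counter


def hot_terms(papers, top_k):
    """Return the top_k most frequent keywords across the whole collection."""
    counts = Counter(kw.lower() for paper in papers for kw in paper["keywords"])
    return {term for term, _ in counts.most_common(top_k)}


def hot_topic_count(paper, hot):
    """Count how many hot terms occur in the paper's keywords or title."""
    title = paper["title"].lower()
    keywords = {kw.lower() for kw in paper["keywords"]}
    # `term in title` is a plain substring test, a deliberate simplification.
    return sum(1 for term in hot if term in keywords or term in title)


# Hypothetical records, not taken from the dataset.
papers = [
    {"title": "Deep learning for citation prediction",
     "keywords": ["deep learning", "citation analysis"]},
    {"title": "A survey of deep learning",
     "keywords": ["deep learning", "survey"]},
    {"title": "Graph mining of co-authorship networks",
     "keywords": ["graph mining", "citation analysis"]},
    {"title": "Citation analysis with deep learning",
     "keywords": ["Deep Learning", "Citation Analysis"]},
]

hot = hot_terms(papers, top_k=2)
counts = [hot_topic_count(paper, hot) for paper in papers]
```

The resulting per-paper count is the kind of numeric feature that can then be correlated with citations; a different cut-off or matching rule would change the counts.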

3.2.9 Correlational analysis among open access (OA) and citations (C)

Another significant factor is the openness or availability of the article: the more visible the article is, the more it will be read and used in research. The feature is extracted on the basis of the URL/PDF and DOI (Digital Object Identifier) of the article. The data is scraped using DOIs, and binary values (0, 1) are encoded for non-open access and open access. The value 0 is assigned when the article is unavailable, whereas the value 1 is assigned when the article is open access. The correlation between open access and citations varies from 0.05 to 0.10, as shown in Table 9. The average correlation coefficient is 0.0824, which depicts a positive correlation, i.e. with open access the chances of the article being cited increase. The graph's data point distribution shows how “Open Access” is related to “Citations” received by each publication. Each dot on the graph represents a publication, and its position indicates its values for both variables. Hence it can be seen in Fig. 7 that the visibility of the article has a significant impact on citations, and visibility has a greater impact on citations if the article is published in a high impact journal.

Table 9 Correlation among Factors and Citations

3.2.10 Correlational analysis among recency (R) and citations (C)

One of the factors is the publication year of the article, which has a strong effect on citations. The SLR indicates that an article starts gaining citations 3 to 5 years after publication. The experimentation is carried out on the mentioned dataset and the correlation is calculated. The correlation coefficient ranges from 0.004 to 0.009, as shown in Table 9. The mean value of the correlation is 0.009025. The visualization of the data also indicates that recently published articles gain more citations than older ones, as shown in Fig. 8. The graph's data point distribution shows how “Years_Since Publication” is related to “Citations” received by each publication. Each dot on the graph represents a publication, and its position indicates its values for both variables. The lower the value of recency, the higher the value of citations. Hence it can be concluded that more recent articles are cited more, as they are used as the baseline for other work. This also covers the recency effect mentioned by several researchers. The more recent a work is, the more citations it will gain.

3.2.11 Results

This section presents a comparative analysis of the factors in existing studies and the current study. The characteristics identified in this study are the result of the SLR; an extensive SLR is performed to find the parameters influencing citations. The comparison of factors with the results of the SLR has already been discussed in detail in the previous section. The effect of some of the factors on citations is investigated and compared to earlier research. The following research considers the affecting elements listed in the table… It shows the existing studies and the methodology for determining the correlation. It can be seen from Table 10 that the majority of the analysis is performed on extrinsic factors. Existing studies have used PCC, MR and SC to obtain the relationship among individual and multiple factors. Existing research reports that article length has a positive but weak relationship with citations, and this research found the same positive but weak relation (0.0395). Existing research indicates that author count is a major predictor of citations, while our analysis determined only a weak positive association. Length of the title is considered a significant predictor in previous research, while this research did not find the factor to be a valuable predictor of citations (−0.03417). The value shows that the factor is essentially uncorrelated with citations, i.e. an increase or decrease in title length does not affect the citations of the article. The length of the abstract is limited in journals and conferences. This research found it to be a significant measure (0.05681); the correlation value indicates that abstract length is positively correlated with citations, although the value could be more influential if readability and relevance were combined with abstract length. The detailed analysis has already been explained in the previous section. Recency, or paper age, is truly determined only after a citation window of 3 to 5 years. A limitation of the study is that the citation window is not defined when considering paper age. Paper age acts as a weak predictor (r = 0.0092) in the data analysis, and the results might vary when a specific citation window is considered. From existing studies, it can be seen that the effect varies among disciplines. Whether scientific articles that exploit recent references gain more citations can be a significant factor and needs to be investigated. Open access determines the visibility or availability of the article and can therefore be categorized both as a document quality factor and under altmetrics. Several social platforms support research work and help in its dissemination. From the existing and the current study, open access can be considered a positive predictor (0.0824) of citations.

Table 10 Comparative analysis factor with existing studies

The metric used to assess readability differs, so to ensure symmetry with previous studies the comparison is shown separately in Table 11. The comparison reveals that readability has a negative impact on citations. The minimum FRE score is −305.36 (very difficult), whereas the maximum score is 84.13 (very easy), which shows that the readability of the abstracts varies. The mean is 9.15, which indicates that the abstracts are difficult to read. The results are compared with information science and linguistics, which have mean values of 18.87 and 28.45 respectively. Hence it can be concluded that the abstracts of information science and computer science articles are more difficult to read. This study uses the abstract of a paper to calculate the readability score; the picture may differ if the full text or another content attribute is selected.

Table 11 Comparative analysis of readability with existing studies

Hence it can be concluded that Recency, Open Access, Hot topics, Abstract Length, and Page Count have a positive impact on citations, whereas Readability and Paper Title Length have a negative effect on citations. The former factors are significantly correlated with the citation count, while the latter have no significant impact on citations. Considering other factors in combination with these might affect citations in a positive way. It is observed that the relationship measured with Pearson's correlation varies and is not strongly linear; therefore, Spearman's correlation is also added to the experimentation to obtain a more accurate measure of association. Spearman's correlation gives more accurate results. The Spearman results indicate that there is a need to explore the quality factors, as they significantly affect citations. Moreover, combining these with other factors might give better results.
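To illustrate why Spearman's correlation can be more suitable when a relationship is monotonic but not linear, the following sketch computes it as Pearson's correlation applied to ranks, with tied values sharing their average rank. The function names and the toy data are hypothetical and are not taken from the experiments.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson's r for two equal-length sequences with non-zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks of the data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical monotonic but non-linear relationship.
factor = [1, 2, 3, 4, 5]
citations = [1, 4, 9, 16, 25]

rho = spearman(factor, citations)
r = pearson(factor, citations)
```

Here the ranks agree perfectly, so Spearman's coefficient reaches 1 while Pearson's coefficient stays below 1 because the points do not lie on a straight line.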

4 Recommendations

The quality of the article acts as a major influencing factor, yet it has been ignored when extracting the factors that affect citations. Evaluating the quality of a paper is a challenging task (Jabbour et al. 2013; Maillette de Buy Wenniger et al. 2020; Onodera and Yoshikane 2015). A few factors determine the quality of the article, and they can be categorized as scientific or non-scientific. According to several studies, quality can be determined using journal-related, author-related, and document-related factors. In the non-scientific category, the prestige of the journal (impact factor) among journal-related factors can be used to measure the quality of the article (Kosteas 2018). Likewise, author reputation, h-index, and collaboration among authors increase the number of citations, and the number of citations, which usually acts as an indicator of article quality, also falls under the non-scientific category. Among paper-related factors, extrinsic factors that are not directly linked to the contents of the paper are non-scientific. Intrinsic factors that are related to the contents of the paper, for instance novelty, creativity, and innovation, are categorized as scientific. The major problem is that it is difficult to quantify the scientific category. Therefore, no consensus has been established on how to measure the quality of an article.

Every scientific manuscript is evaluated, and its quality is usually assessed by experts in a process referred to as peer review (Li et al. 2019; Prathap et al. 2016). Collaboration among multiple authors is increasingly prevalent in several fields, for instance medicine, economics, and finance. This can be due to multiple reasons, for instance knowledge diffusion, involving experts in the research, and avoiding the risk of manuscript rejection and delayed reviews. A journal reviewer may be related to the manuscript's author as a co-author on other manuscripts, which may lead to the acceptance of a low-quality paper and may compromise fairness in determining the quality of the article. It is therefore important to understand the relationship between the author and the reviewers of the journal to which the article has been submitted, since even an indirect relationship can introduce bias into the evaluation of the article's quality.

Most studies have worked on extrinsic factors of the article (El Mohadab et al. 2018; Wang et al. 2020; Wang, Zhang 2020), while the quality of the article is determined through its intrinsic factors (Bai et al. 2019; Jabbour et al. 2013). As already discussed, the quality of an article cannot be assessed solely on the basis of the manuscript's external factors. Existing research focuses mostly on extrinsic factors and hence implicitly assumes that quality depends only on them, whereas our analysis suggests that the quality of an article is strongly associated with its intrinsic factors. A major reason for considering extrinsic factors is that most of them have a quantitative basis and are therefore measurable. For instance, a limited number of pages restricts how far information can be elaborated, whereas a larger number of pages adds more knowledge; moreover, the number of pages is measurable and can be correlated with citations. Intrinsic factors, by contrast, mostly have a qualitative basis and are hence hard to measure. Although relevance and clarity are very important quality factors, they are not directly measurable. According to our analysis, some of the document's extrinsic factors are important, and their importance is amplified when they are combined with the document's intrinsic factors.

In the existing literature (Abramo et al. n.d.; Brito and Rodríguez-Navarro 2019; Jian et al. 2019), journals and venues are considered the main source of high citation counts for articles. Journal impact factor and venue prestige have a positive correlation with the number of citations. Some publishers are also more promising than others, with renowned publishers achieving more citations. For instance, papers published by Springer are found to receive more citations than those issued by Taylor and Francis. However, journal-related factors cannot solely evaluate the quality of articles. The researcher and the peers may have a direct or indirect collaborative relationship, in which case the acceptance of articles in a reputed journal becomes biased. Publishing an article in a well-known and well-recognized journal increases visibility and hence helps to achieve more citations. After analyzing the existing literature, it can be inferred that journal-related factors cannot be the sole criterion for achieving high citations. Rather, journal-related factors can lead to high citations when combined with other factors, particularly intrinsic factors.

In some studies, factors related to the author are considered more important than others for gaining high citations. Author-related factors such as the number of authors, h-index, and reputation of the author are those on which many researchers have relied to gain citations. Authors who collaborate with other authors tend to generate self-citations. Likewise, an article is also cited because of its author's reputation. Although factors associated with the article's authors are significant, the quality of the article cannot be determined using author-related information. Some author-related factors have a quantitative basis and are therefore measurable. For example, the number of authors and the h-index of the author can be used to assess the status of an article’s citations indirectly.

Hot topics are those that are likely to attract attention due to their popularity (Daud et al. 2021a). According to our analysis, hot topics receive more citations than outdated topics. Reputable venues are willing to consider a hot topic, focusing more on the article title than on the contents covered. Whenever a new topic trends, the topic takes precedence over the quality of the content, and the contents discussed in the article are to some extent ignored. Articles on hot topics are assessed based on the article title, regardless of the fact that an article must be evaluated on the basis of its contents. Consequently, emphasizing the article’s title over its contents ultimately undermines the role of quality-based factors in achieving high citations. The hot topic factor should not be the single measure for achieving high citations; in addition to being trending, the contents covered must be of high quality.

Altmetrics were introduced with the rise of social media networks. In today's era of social network connectivity, altmetrics could be strong indicators for enhancing the citation rate. As the probability of knowledge diffusion improves, articles shared via social communities, self-archiving, and open access have a higher possibility of being cited. According to our analysis, multiple document quality-related factors, when combined with altmetrics, can be used to gain high citations.

In a nutshell, the quality of research is better represented by document quality-related parameters. Relevance of the materials employed, novelty/creativity, recency of the article, topic characteristics, visibility, readability, and other factors should be investigated to improve the development of research content. Furthermore, the other factors are important, and their importance is amplified when they are combined with the document's intrinsic factors.

5 Conclusion

Although it is well acknowledged that research papers are the primary means of disseminating knowledge, the quality of the publication is even more critical for obtaining adequate citations. Several studies have concluded that the number of citations is the main criterion for measuring the quality of a publication, although citations are influenced by several other factors.

Multiple studies have analysed the correlation between different factors and the citation rates of articles. The current study also explores the factors that influence the number of citations a paper receives. The study includes articles from 2013 through 2022, and the findings are divided into five categories: Document General Factors, Document Quality Related Factors, Author Related Aspects, Journal Related Factors, and Altmetrics. This research also correlates some of the extracted factors with citations using Pearson’s correlation coefficient to determine the impact of these factors on citations. The main contributions of the study are (i) the segregation of document-related factors into document general factors and document quality related factors, and (ii) the determination of the impact of some of the categorized factors on citations using Pearson’s correlation coefficient. The study also compares the correlation results with existing research. Hence, it can be concluded from the empirical and correlational analysis that Recency, Open Access, Hot topics, Abstract Length, and Page Count have a positive correlation with citations, while Readability and Paper Title Length have a negative impact on citations. Due to non-linearity, a more accurate and reliable relationship is determined using Spearman’s correlation. In conclusion, the quality of research is better depicted using document quality related factors. Relevance of the contents used, novelty/creativity, recency of the article, topic characteristics, visibility, readability, and other factors need to be explored for better content creation in research. Furthermore, the other aforementioned factors are significant, and their significance intensifies when integrated with the intrinsic features of the document.

References

Abramo G, D’Angelo C, Di Costa F (2010) Citations versus journal impact factor as proxy of quality: could the latter ever be preferable? Scientometrics 84(3):821–833
Abramo G, D’Angelo CA, Felici G (2019) Predicting publication long-term impact through a combination of early citations and journal impact factor. J Informet 13:32–49
Abrishami A, Aliakbary S (2019) Predicting citation counts based on deep neural network learning techniques. J Informet 13:485–499. https://doi.org/10.1016/j.joi.2019.02.011
Abuhay TM, Kovalchuk SV, Bochenina K, Mbogo G-K, Visheratin AA, Kampis G, Krzhizhanovskaya VV, Lees MH (2018a) Analysis of publication activity of computational science society in 2001–2017 using topic modelling and graph theory. J Comput Sci 26:193–204. https://doi.org/10.1016/j.jocs.2018.04.004
Abuhay TM, Nigatie YG, Kovalchuk SV (2018b) Towards Predicting Trend of Scientific Research Topics using Topic Modeling. Procedia Computer Science, In: 7th International Young Scientists Conference on Computational Science, YSC2018, 02–06 July 2018, Heraklion, Greece 136:304–310. https://doi.org/10.1016/j.procs.2018.08.284
Acuna DE, Allesina S, Kording KP (2012) Predicting scientific success. Nature 489:201–202. https://doi.org/10.1038/489201a
Aksnes DW, Langfeldt L, Wouters P (2019) Citations, citation indicators, and research quality: an overview of basic concepts and theories. SAGE Open 9:2158244019829575
Amara N, Landry R, Halilem N (2015) What can University administrators do to increase the publication and citation scores of their faculty members? Scientometrics 103:489–530. https://doi.org/10.1007/s11192-015-1537-2
Amjad T, Munir J (2021) Investigating the impact of collaboration with authority authors: a case study of bibliographic data in field of philosophy. Scientometrics 126:4333–4353
Amjad T, Daud A, Che D, Akram A (2015) MuICE: mutual influence and citation exclusivity author rank. Inf Process Manage 52:374–386
Amjad T, Daud A, Che D, Akram A (2016) MuICE: mutual influence and citation exclusivity author rank. Inf Process Manage 52:374–386. https://doi.org/10.1016/j.ipm.2015.12.001
Amjad T, Ding Y, Xu J, Zhang C, Daud A, Tang J, Song M (2017) Standing on the shoulders of giants. J Informet 11:307–323
Amjad T, Daud A, Aljohani NR (2018) Ranking authors in academic social networks: a survey. Library Hi Tech 36:97–128
Amjad T, Rehmat Y, Daud A, Abbasi RA (2020) Scientific impact of an author and role of self-citations. Scientometrics 122:915–932. https://doi.org/10.1007/s11192-019-03334-2
Amjad T, Shahid N, Daud A, Khatoon A (2022) Citation burst prediction in a bibliometric network. Scientometrics 127:2773–2790
Amjad T (2021) Domain-Specific Scientific Impact and its Prediction. In: 2021 International Conference on Artificial Intelligence (ICAI). IEEE, pp. 16–21.
Amplayo RK, Hong S, Song M (2018) Network-based approach to detect novelty of scholarly literature. Inf Sci 422:542–557. https://doi.org/10.1016/j.ins.2017.09.037
Annalingam A, Damayanthi H, Jayawardena R, Ranasinghe P (2014) Determinants of the citation rate of medical research publications from a developing country. Springerplus 3:140. https://doi.org/10.1186/2193-1801-3-140
Antoniou G, Antoniou S, Georgakarakos E, Sfyroeras G, Georgiadis G (2015) Bibliometric analysis of factors predicting increased citations in the vascular and endovascular literature. Ann Vasc Surg 29(2):286–292
Ayaz S, Masood N, Islam MA (2018) Predicting scientific impact based on h-index. Scientometrics 114:993–1010. https://doi.org/10.1007/s11192-017-2618-1
Bai X, Liu H, Zhang F, Ning Z, Kong X, Lee I, Xia F (2017) An overview on evaluating and predicting scholarly article impact. Information 8:73. https://doi.org/10.3390/info8030073
Bai X, Zhang F, Lee I (2019) Predicting the citations of scholarly paper. J Informet 13:407–418. https://doi.org/10.1016/j.joi.2019.01.010
Biscaro C, Giupponi C (2014) Co-authorship and bibliographic coupling network effects on citations. PLoS ONE 9:1–12. https://doi.org/10.1371/journal.pone.0099502
Bornmann L, Williams R (2013) How to calculate the practical significance of citation impact differences? An empirical example from evaluative institutional bibliometrics using adjusted predictions and marginal effects. J Informet 7:562–574. https://doi.org/10.1016/j.joi.2013.02.005
Bornmann L, Leydesdorff L, Wang J (2014) How to improve the prediction based on citation impact percentiles for years shortly after the publication date? J Informet 8:175–180. https://doi.org/10.1016/j.joi.2013.11.005
Bramoullé Y, Ductor L (2018) Title length. J Econ Behav Organ 150:311–324. https://doi.org/10.1016/j.jebo.2018.01.014
Brito R, Rodríguez-Navarro A (2019) Evaluating research and researchers by the journal impact factor: is it better than coin flipping? J Informet 13:314–324. https://doi.org/10.1016/j.joi.2019.01.009
Bu Y, Wang B, Chinchilla-Rodríguez Z, Sugimoto CR, Huang Y, Huang W (2020) Considering author sequence in all-author co-citation analysis. Inf Process Manage 57:102300. https://doi.org/10.1016/j.ipm.2020.102300
Cai L, Tian J, Liu J, Bai X, Lee I, Kong X, Xia F (2019) Scholarly impact assessment: a survey of citation weighting solutions. Scientometrics 118:453–478
Castillo-Vergara M, Alvarez-Marin A, Placencio-Hidalgo D (2018) A bibliometric analysis of creativity in the field of business economics. J Bus Res 85:1–9. https://doi.org/10.1016/j.jbusres.2017.12.011
Chai S, Menon A (2019) Breakthrough recognition: Bias against novelty and competition for attention. Res Policy 48:733–747. https://doi.org/10.1016/j.respol.2018.11.006
Chakraborty T, Sikdar S, Ganguly N, Mukherjee A (2014b) Citation interactions among computer science fields: a quantitative route to the rise and fall of scientific research. Soc Netw Anal Min 4:187. https://doi.org/10.1007/s13278-014-0187-3
Chakraborty T, Kumar S, Goyal P, Ganguly N, Mukherjee A (2014a) Towards a stratified learning approach to predict future citation counts, In: IEEE/ACM Joint Conference on Digital Libraries. pp. 351–360. https://doi.org/10.1109/JCDL.2014.6970190
Chang C-L, McAleer M, Oxley L (2013) Coercive journal self citations, impact factor, Journal Influence and Article Influence. Mathematics and Computers in Simulation, Selected Papers of the MSSANZ 19th Biennial Conference on Modelling and Simulation, Perth, Australia, 93: 190–197. https://doi.org/10.1016/j.matcom.2013.04.006
Collet F, Robertson DA, Lup D (2014) When does brokerage matter? Citation impact of research teams in an emerging academic field. Strateg Organ 12:157–179. https://doi.org/10.1177/1476127014530124
Costello MJ, Beard KH, Primack RB, Devictor V, Bates AE (2019) Are killer bees good for coffee? The contribution of a paper’s title and other factors to its future citations. Biol Cons 229:A1–A5. https://doi.org/10.1016/j.biocon.2018.07.010
Coupé T (2013) Peer review versus citations – An analysis of best paper prizes. Res Policy 42:295–301. https://doi.org/10.1016/j.respol.2012.05.004
Craig ID, Plume AM, McVeigh ME, Pringle J, Amin M (2007) Do open access articles have greater citation impact?: a critical review of the literature. J Informet 1:239–248
Daud A, Amjad T, Siddiqui MA, Aljohani NR, Abbasi RA, Aslam MA (2019) Correlational analysis of topic specificity and citations count of publication venues. Library Hi Tech 37:8–18
Daud A, Abbas F, Amjad T, Alshdadi AA, Alowibdi JS (2021) Finding rising stars through hot topics detection. Futur Gener Comput Syst 115:798–813
Maillette de Buy Wenniger G, van Dongen T, Aedmaa E, Kruitbosch HT, Valentijn EA, Schomaker L (2020) Structure-tags improve text classification for scholarly document quality prediction. Proceed First Workshop Sch Doc Process. https://doi.org/10.18653/v1/2020.sdp-1.18
Dey R, Roy A, Chakraborty T, Ghosh S (2017) Sleeping beauties in Computer Science: characterization and early identification. Scientometrics 113:1645–1663. https://doi.org/10.1007/s11192-017-2543-3
Didegah F, Thelwall M (2013) Determinants of research citation impact in nanoscience and nanotechnology. J Am Soc Inform Sci Technol 64:1055–1064. https://doi.org/10.1002/asi.22806
Donner P (2018) Effect of publication month on citation impact. J Informet 12:330–343. https://doi.org/10.1016/j.joi.2018.01.012
El Mohadab M, Bouikhalene B, Safi S (2018) Predicting rank for scientific research papers using supervised learning. Appl Comput Inform. https://doi.org/10.1016/j.aci.2018.02.002
Eleyan D, Othman A, Eleyan A (2020) Enhancing software comments readability using flesch reading ease score. Information. https://doi.org/10.3390/info11090430
Fahimifar S, Mousavi K, Mozaffari F, Ausloos M (2023) Identification of the most important external features of highly cited scholarly papers through 3 (i.e., Ridge, Lasso, and Boruta) feature selection data mining methods. Qual Quant 57:3685–3712. https://doi.org/10.1007/s11135-022-01480-z
Falagas ME, Zarkali A, Karageorgopoulos DE, Bardakas V, Mavros MN (2013) The impact of article length on the number of future citations: a bibliometric analysis of general medicine journals. PLoS ONE 8:1–8. https://doi.org/10.1371/journal.pone.0049476
Farshad M, Sidler C, Gerber C (2013) Association of scientific and nonscientific factors to citation rates of articles of renowned orthopedic journals. Eur Orthop Traumatol 4:125–130. https://doi.org/10.1007/s12570-013-0174-6
Fong EA, Wilhite AW (2017) Authorship and citation manipulation in academic research. PLoS ONE 12:1–34. https://doi.org/10.1371/journal.pone.0187394
Fronzetti Colladon A, D’Angelo CA, Gloor PA (2020) Predicting the future success of scientific publications through social network and semantic analysis. Scientometrics 124:357–377. https://doi.org/10.1007/s11192-020-03479-5
Fu LD, Aliferis CF (2010) Using content-based and bibliometric features for machine learning models to predict citation counts in the biomedical literature. Scientometrics 85:257–270. https://doi.org/10.1007/s11192-010-0160-5
Fu H-Z, Ho Y-S (2016) Highly cited antarctic articles using science citation index expanded: a bibliometric analysis. Scientometrics 109:337–357. https://doi.org/10.1007/s11192-016-1992-4
Gallivan MJ (2012) Analyzing Citation Impact of IS Research by Women and Men: Do Women Have Higher Levels of Research Impact? In: Proceedings of the 50th Annual Conference on Computers and People Research, SIGMIS-CPR ’12. Association for Computing Machinery, New York, pp. 175–184. https://doi.org/10.1145/2214091.2214137
Gargouri Y, Hajjem C, Larivière V, Gingras Y, Carr L, Brody T, Harnad S (2010) Self-selected or mandated, open access increases citation impact for higher quality research. PLoS ONE 5:1–12. https://doi.org/10.1371/journal.pone.0013636
Garner J, Porter AL, Newman NC (2014) Distance and velocity measures: using citations to determine breadth and speed of research impact. Scientometrics 100:687–703. https://doi.org/10.1007/s11192-014-1316-5
Geng Q, Jin J, Yan S (2019) Utilizing Academic-Network-Based Conflict of Interests: for Paper Reviewer Assignment.
Giuffrida C, Abramo G, D’Angelo CA (2019) Are all citations worth the same? Valuing citations by the value of the citing items. J Informet 13:500–514. https://doi.org/10.1016/j.joi.2019.02.008
Guan J, Yan Y, Zhang JJ (2017) The impact of collaboration and knowledge networks on citations. J Informet 11:407–422. https://doi.org/10.1016/j.joi.2017.02.007
Guerrero-Bote VP, Moya-Anegón F (2014) Relationship between downloads and citations at journal and paper levels, and the influence of language. Scientometrics 101:1043–1065. https://doi.org/10.1007/s11192-014-1243-5
Hammarfelt B, Rushforth AD (2017) Indicators as judgment devices: an empirical study of citizen bibliometrics in research evaluation. Res Eval 26:169–180
Han P, Shi J, Li X, Wang D, Shen S, Su X (2014) International collaboration in LIS: global trends and networks at the country and institution level. Scientometrics 98:53–72. https://doi.org/10.1007/s11192-013-1146-x
Harwood N (2008) Publication outlets and their effect on academic writers’ citations. Scientometrics 77:253–265. https://doi.org/10.1007/s11192-007-1955-x
Holm AN, Plank B, Wright D, Augenstein I (2021) Longitudinal Citation Prediction using Temporal Graph Neural Networks.
Hu Z, Tian W, Xu S, Zhang C, Wang X (2018) Four pitfalls in normalizing citation indicators: an investigation of ESI’s selection of highly cited papers. J Informet 12:1133–1145. https://doi.org/10.1016/j.joi.2018.09.006
Hwang A, Arbaugh JB, Bento RF, Asarta CJ, Fornaciari CJ (2019) What causes a business and management education article to be cited: article, author, or journal? Int J Manag Edu 17:139–150. https://doi.org/10.1016/j.ijme.2019.01.005
Ibáñez A, Bielza C, Larrañaga P (2013) Relationship among research collaboration, number of documents and number of citations: a case study in Spanish computer science production in 2000–2009. Scientometrics 95:689–716. https://doi.org/10.1007/s11192-012-0883-6
Ishag MIM, Park KH, Lee JY, Ryu KH (2019) A pattern-based academic reviewer recommendation combining author-paper and diversity metrics. IEEE Access 7:16460–16475. https://doi.org/10.1109/ACCESS.2019.2894680
Jabbour CJC, Jabbour ABLS, de Oliveira JHC (2013) The perception of Brazilian researchers concerning the factors that influence the citation of their articles: a study in the field of sustainability. Ser Rev 39:93–96
Jian Z, Ning C, Zong-Yuan T, Junaid KM (2019) Analysis of Effects to Journal Impact Factors via Citation Networks Generated by Distributed Parallel Model. IEEE Access.
Jiang J, He D, Ni C (2013) The correlations between article citation and references’ impact measures: what can we learn? Proceed Am Soc Inform Sci Technol 50:1–4. https://doi.org/10.1002/meet.14505001162
Ke Q, Ferrara E, Radicchi F, Flammini A (2015) Defining and identifying sleeping beauties in science. Proc Natl Acad Sci 112:7426–7431. https://doi.org/10.1073/pnas.1424329112
Koler-Povh T, Južnič P, Turk G (2014) Impact of open access on citation of scholarly publications in the field of civil engineering. Scientometrics 98:1033–1045. https://doi.org/10.1007/s11192-013-1101-x
Kosteas VD (2018) Predicting long-run citation counts for articles in top economics journals. Scientometrics 115:1395–1412. https://doi.org/10.1007/s11192-018-2703-0
Lachance C, Poirier S, Larivière V (2014) The kiss of death? The effect of being cited in a review on subsequent citations. J Am Soc Inf Sci 65:1501–1505. https://doi.org/10.1002/asi.23166
Lee DH (2019a) Predictive power of conference-related factors on citation rates of conference papers. Scientometrics 118:281–304. https://doi.org/10.1007/s11192-018-2943-z
Lee DH (2019b) Predicting the research performance of early career scientists. Scientometrics 121:1481–1504. https://doi.org/10.1007/s11192-019-03232-7
Lee DH, Brusilovsky P (2019) The first impression of conference papers: does it matter in predicting future citations? J Am Soc Inf Sci 70:83–95
Lei L, Yan S (2016) Readability and citations in information science: evidence from abstracts and articles of four journals (2003–2012). Scientometrics 108:1155–1169. https://doi.org/10.1007/s11192-016-2036-9
Li EY, Liao CH, Yen HR (2013) Co-authorship networks and research impact: a social capital perspective. Res Policy 42:1515–1530. https://doi.org/10.1016/j.respol.2013.06.012
Li K, Cao Z, Qu D (2017) Fair Reviewer Assignment Considering Academic Social Network. In: Chen L, Jensen CS, Shahabi C, Yang X, Lian X (eds) Web and Big Data. Springer International Publishing, Cham, pp 362–376
Li S, Zhao WX, Yin EJ, Wen J-R (2019) A Neural Citation Count Prediction Model based on Peer Review Text, in: EMNLP.
Liao H, Tang M, Li Z, Lev B (2018) Bibliometric analysis for highly cited papers in operations research and management science from 2008 to 2017 based on Essential Science Indicators. Omega. https://doi.org/10.1016/j.omega.2018.11.005
Lisha L, Dongjin YU, Dongjing W, Fumiyo F (2020) Citation count prediction based on neural Hawkes model. IEICE Trans Inform Syst E103:2379–2388
Luo F, Sun A, Raamkumar AS, Erdt M, Theng Y (2018) Will your paper get promoted by a citation? A case study of citation promoter in computer science discipline. IEEE Trans Emerg Topics Comput. https://doi.org/10.1109/TETC.2018.2861321
Lyu P, Wolfram D (2018) Do longer articles gather more citations? Article length and scholarly impact among top biomedical journals. Proceed Assoc Inform Sci Technol 55:319–326. https://doi.org/10.1002/pra2.2018.14505501035
Ma A, Liu Y, Xu X, Dong T (2021) A deep-learning based citation count prediction model with paper metadata semantic features. Scientometrics 126:6803–6823
Mariano D, Leite C, Santos L, Rocha R, Melo-Minardi R (2017) A guide to performing systematic literature reviews in bioinformatics.
Martín-Martín A, Costas R, van Leeuwen T, Delgado López-Cózar E (2018) Evidence of open access of scientific publications in google scholar: a large-scale analysis. J Informet 12:819–841. https://doi.org/10.1016/j.joi.2018.06.012
Meyer M, Waldkirch RW, Duscher I, Just A (2018) Drivers of citations: an analysis of publications in “top” accounting journals. Crit Perspect Account, Res Divers Hierarchies Account J 51:24–46. https://doi.org/10.1016/j.cpa.2017.07.001
Nair LB, Gibbert M (2016) What makes a ‘good’ title and (how) does it matter for citations? A review and general model of article title attributes in management science. Scientometrics 107:1331–1359. https://doi.org/10.1007/s11192-016-1937-y
Niyazov Y, Vogel C, Price R, Lund B, Judd D, Akil A, Mortonson M, Schwartzman J, Shron M (2016) Open access meets discoverability: citations to articles posted to Academia.edu. PLoS ONE 11:e0148257. https://doi.org/10.1371/journal.pone.0148257
Nuzzolese AG, Ciancarini P, Gangemi A, Peroni S, Poggi F, Presutti V (2019) Do altmetrics work for assessing research quality? Scientometrics 118:2–539
Onodera N, Yoshikane F (2015) Factors affecting citation rates of research articles. J Am Soc Inf Sci 66:739–764. https://doi.org/10.1002/asi.23209
Patil AH, Mahalle P (2020) Trends and challenges in measuring performance of reviewer paper assignment. Procedia Comput Sci 171:709–718
Penner O, Pan RK, Petersen AM, Kaski K, Fortunato S (2013) On the predictability of future impact in science. Sci Rep 3:3052. https://doi.org/10.1038/srep03052
Pobiedina N, Ichise R (2016) Citation count prediction as a link prediction problem. Appl Intell 44:252–268. https://doi.org/10.1007/s10489-015-0657-y
Prathap G, Mini S, Nishy P (2016) Does high impact factor successfully predict future citations? An analysis using Peirce’s measure. Scientometrics 108:1043–1047. https://doi.org/10.1007/s11192-016-2034-y
Ramezani-Pakpour-Langeroudi F, Okhovati M, Talebian A (2018) Do highly cited clinicians get more citations when being present at social networking sites? J Educ Health Promot 7:18–18. https://doi.org/10.4103/jehp.jehp_69_17
Robson B, Mousques A (2014) Predicting citation counts of environmental modelling papers.
Robson BJ, Mousquès A (2016) Can we predict citation counts of environmental modelling papers? Fourteen bibliographic and categorical variables predict less than 30% of the variability in citation counts. Environ Model Softw 75:94–104. https://doi.org/10.1016/j.envsoft.2015.10.007
Rostami F, Mohammadpoorasl A, Hajizadeh M (2014) The effect of characteristics of title on citation rates of articles. Scientometrics 98:2007–2010. https://doi.org/10.1007/s11192-013-1118-1
Ruan X, Zhu Y, Li J, Cheng Y (2020) Predicting the citation counts of individual papers via a BP neural network. J Informet 14:101039. https://doi.org/10.1016/j.joi.2020.101039
Ruano-Ravina A, Alvarez-Dardet C (2012) Evidence-based editing: factors influencing the number of citations in a national journal. Ann Epidemiol 22:649–653. https://doi.org/10.1016/j.annepidem.2012.06.104
Shen A, Salehi B, Baldwin T, Qi J (2019) A Joint Model for Multimodal Document Quality Assessment, In: Proceedings of the 18th Joint Conference on Digital Libraries, JCDL ’19. IEEE Press, pp. 107–110. https://doi.org/10.1109/JCDL.2019.00024
Shoaib M, Daud A, Amjad T (2020) Author Name Disambiguation in Bibliographic Databases: a Survey. arXiv preprint arXiv:2004.06391.
Sin S-CJ (2011) International coauthorship and citation impact: a bibliometric study of six LIS journals, 1980–2008. J Am Soc Inform Sci Technol 62:1770–1783. https://doi.org/10.1002/asi.21572
Singhal S, Kalra BS (2021) Publication ethics: role and responsibility of authors. Indian J Gastroenterol 40:65–71. https://doi.org/10.1007/s12664-020-01129-5
Snell KI, Ensor J, Hooft L, Reitsma JB, Riley RD, Moons KG (2017) A guide to systematic review and meta-analysis of prediction model performance. BMJ 356:i6460
So M, Choi S, Kim J, Park H (2014) Factors affecting citation networks in science and technology: focused on non-quality factors. Qual Quant. https://doi.org/10.1007/s11135-014-0110-z
Soares CG, de Araújo B, Ramalho R, de Oliveira L, de Oliveira V, Brito TT, da Matta B, Viana F, Souza CP, Guerreiro RC, Slama FA, da Matta E, Portugal M (2012) Two-year citations of JAPPL original articles: evidence of a relative age effect. J Appl Physiol 112(9):1434–1436
Article Google Scholar
Sohrabi B, Iraj H (2017) The effect of keyword repetition in abstract and keyword frequency per journal in predicting citation counts. Scientometrics 110:243–251. https://doi.org/10.1007/s11192-016-2161-5
Article Google Scholar
Song D, Wang W, Fan Y, Xing Y, Zeng A (2022) Quantifying the structural and temporal characteristics of negative links in signed citation networks. Inf Process Manage 59:102996. https://doi.org/10.1016/j.ipm.2022.102996
Article Google Scholar
Stegehuis C, Litvak N, Waltman L (2015) Predicting the long-term citation impact of recent publications. J Informet 9:642–657. https://doi.org/10.1016/j.joi.2015.06.005
Article Google Scholar
Stremersch S, Camacho N, Vanneste S, Verniers I (2015) Unraveling scientific impact: citation types in marketing journals. Int J Res Mark 32:64–77. https://doi.org/10.1016/j.ijresmar.2014.09.004
Article Google Scholar
Susarla SM, Tveit M, Dodson TB, Kaban LB, Hopper RA, Egbert MA (2018) What are the defining characteristics of the most cited publications in orthognathic surgery? Int J Oral Maxillofac Surg 47:1411–1419. https://doi.org/10.1016/j.ijom.2018.04.016
Article CAS PubMed Google Scholar
Tahamtan I, Bornmann L (2018) Creativity in science and the link to cited references: is the creative potential of papers reflected in their cited references? J Informet 12:906–930. https://doi.org/10.1016/j.joi.2018.07.005
Article Google Scholar
Tahamtan I, Safipour Afshar A, Ahamdzadeh K (2016) Factors affecting number of citations: a comprehensive review of the literature. Scientometrics 107:1195–1225. https://doi.org/10.1007/s11192-016-1889-2
Article Google Scholar
Talaat FM, Gamel SA (2023) Predicting the impact of no. of authors on no. of citations of research publications based on neural networks. J Ambient Intell Humaniz Comput 14:8499–8508. https://doi.org/10.1007/s12652-022-03882-1
Article Google Scholar
Thelwall M, Haustein S, Larivière V, Sugimoto CR (2013) Do altmetrics work? Twitter and ten other social web services. PLoS ONE 8:e64841
Article ADS CAS PubMed PubMed Central Google Scholar
Tsai C-F (2014) Citation impact analysis of top ranked computer science journals and their rankings. J Informet 8:318–328. https://doi.org/10.1016/j.joi.2014.01.002
Article Google Scholar
Uddin S, Khan A (2016) The impact of author-selected keywords on citation counts. J Informet 10:1166–1177. https://doi.org/10.1016/j.joi.2016.10.004
Article Google Scholar
Uddin S, Hossain L, Rasmussen K (2013) Network effects on scientific collaborations. PLoS ONE 8:1–12. https://doi.org/10.1371/journal.pone.0057546
Article CAS Google Scholar
van Dongen T, Maillette de Buy Wenniger G, Schomaker L (2020) SChuBERT: Scholarly Document Chunks with BERT-encoding boost Citation Count Prediction. In: Proceedings of the First Workshop on Scholarly Document Processing. https://doi.org/10.18653/v1/2020.sdp-1.17
Van Der Pol CB, McInnes MD, Petrcich W, Tunis AS, Hanna R (2015) Is quality and completeness of reporting of systematic reviews and meta-analyses published in high impact radiology journals associated with citation rates? PLoS ONE 10:e0119892
Article PubMed PubMed Central Google Scholar
Van Wesel M, Wyatt S, Haaf J (2013) What a difference a colon makes: How superficial factors influence subsequent citation. Scientometrics. https://doi.org/10.1007/s11192-013-1154-x
Article Google Scholar
Vanclay JK (2013) Factors affecting citation rates in environmental science. J Informet 7:265–271. https://doi.org/10.1016/j.joi.2012.11.009
Article Google Scholar
Veugelers R, Wang J (2019) Scientific novelty and technological impact. Res Policy. https://doi.org/10.1016/j.respol.2019.01.019
Article Google Scholar
Waltman L (2016) A review of the literature on citation impact indicators. J Informet 10:365–391
Article Google Scholar
Waltman L, Traag VA (2017) Use of the journal impact factor for assessing individual articles need not be statistically wrong. arXiv e-prints arXiv-1703.
Wang X, Zhang Z (2020) Improving the reliability of short-term citation impact indicators by taking into account the correlation between short- and long-term citation impact. J Informet 14:101019
Article Google Scholar
Wang F, Fan Y, Zeng A, Di Z (2019a) Can we predict ESI highly cited publications? Scientometrics 118(1):109–125
Article Google Scholar
Wang M, Wang Z, Chen G (2019b) Which can better predict the future success of articles? Bibliometric indices or alternative metrics. Scientometrics. https://doi.org/10.1007/s11192-019-03052-9
Article Google Scholar
Wang M, Jiao S, Zhang J, Zhang X, Zhu N (2020) Identification high influential articles by considering the topic characteristics of article. IEEE Access. https://doi.org/10.1109/ACCESS.2020.3001190
Article PubMed PubMed Central Google Scholar
Wang S, Liu X, Zhou J (2022) Readability is decreasing in language and linguistics. Scientometrics 127:4697–4729. https://doi.org/10.1007/s11192-022-04427-1
Article Google Scholar
Wang S, Xie S, Zhang X, Li Z, Yu PS, Shu X (2014) Future Influence Ranking of Scientific Literature. In: Proceedings of the 2014 SIAM International Conference on Data Mining 749–757. https://doi.org/10.1137/1.9781611973440.86
Weihs L, Etzioni O (2017) Learning to Predict Citation-Based Impact Measures, In: 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL). pp. 1–10. https://doi.org/10.1109/JCDL.2017.7991559
Wen J, Wu L, Chai J (2020) Paper Citation Count Prediction Based on Recurrent Neural Network with Gated Recurrent Unit, In: 2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC). pp. 303–306. https://doi.org/10.1109/ICEIEC49280.2020.9152330
West JD, Jacquet J, King MM, Correll SJ, Bergstrom CT (2013) The role of gender in scholarly authorship. PLoS ONE 8:1–6. https://doi.org/10.1371/journal.pone.0066212
Article CAS Google Scholar
Xia F, Su X, Wang W, Zhang C, Ning Z, Lee I (2016) Bibliographic analysis of Nature based on Twitter and Facebook altmetrics data. PLoS ONE 11:e0165997
Article PubMed PubMed Central Google Scholar
Xie J, Gong K, Cheng Y, Ke Q (2019) The correlation between paper length and citations: a meta-analysis. Scientometrics 118(3):763–786
Article Google Scholar
Yu T, Yu G (2014) Features of scientific papers and the relationships with their citation impact. Malays J Libr Inf Sci 19:37–50
Google Scholar
Yu T, Yu G, Li P-Y, Wang L (2014a) Citation impact prediction for scientific papers using stepwise regression analysis. Scientometrics 101:1233–1252. https://doi.org/10.1007/s11192-014-1279-6
Article Google Scholar
Yu T, Yu G, Wang M-Y (2014b) Classification method for detecting coercive self-citation in journals. J Informet 8:123–135. https://doi.org/10.1016/j.joi.2013.11.001
Article Google Scholar
Yuan S, Tang J, Zhang Y, Wang Y, Xiao T (2018) Modeling and Predicting Citation Count via Recurrent Neural Network with Long Short-Term Memory. arXiv:1811.02129 [physics].
Zhang J, Guan J (2017) Scientific relatedness and intellectual base: a citation analysis of un-cited and highly-cited papers in the solar energy field. Scientometrics 110:141–162. https://doi.org/10.1007/s11192-016-2155-3
Article Google Scholar
Zhang Y, Yu Q (2020) What is the best article publishing strategy for early career scientists? Scientometrics 122(1):397–408 https://doi.org/10.1007/s11192-019-03297-4
Zhao Q, Feng X (2022) Utilizing citation network structure to predict paper citation counts: a deep learning approach. J Inform 16(1):101235
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, International Islamic University, Islamabad, Pakistan
Asma Khatoon & Tehmina Amjad
Khoury College of Computer Science, Northeastern University, Silicon Valley Campus, San Jose, CA, USA
Tehmina Amjad
Faculty of Resilience, Rabdan Academy, Abu Dhabi, United Arab Emirates
Ali Daud

Authors

Asma Khatoon
Ali Daud
Tehmina Amjad

Contributions

Author A conducted the SLR as part of a doctoral thesis and wrote the manuscript under the supervision of author C, who further improved the manuscript. Authors B and C reviewed the work and suggested changes for improvement.

Corresponding author

Correspondence to Tehmina Amjad.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Khatoon, A., Daud, A. & Amjad, T. Categorization and correlational analysis of quality factors influencing citation. Artif Intell Rev 57, 70 (2024). https://doi.org/10.1007/s10462-023-10657-3

Accepted: 19 December 2023
Published: 22 February 2024
DOI: https://doi.org/10.1007/s10462-023-10657-3
