1 Introduction

Knowledge dissemination is one of the most promising activities nowadays (Amjad and Ali 2018), and research publications are its main source. Research publications also inform hiring, firing, tenure, funding, and promotion decisions (Chang et al. 2013). In the existing literature, a citation is defined as text used by researchers to criticize or acknowledge previous research work. Citations provide valuable information that serves as the basis for measures such as the H-index and the impact of a paper. Citation count is the basic measure of citation. It describes the influence of a publication and provides information on various research units, such as highly ranked authors, venues, research groups, and research institutions. Predicting high-impact research articles and evaluating the quality of research publications is a challenging task. Most researchers use the citation count as a proxy for the quality of research publications. The number of citations increases for various reasons, as many research publications are cited simply to refer to existing work (Aksnes et al. 2019; Harwood 2008). Furthermore, knowledge is inherited through citation in several ways: some works are cited to support a technique and its findings (Amjad et al. 2020), while others are cited to criticize research design (Song et al. 2022).

Recent work has used feature-based, deep learning, and time series (Zhao and Feng 2022) approaches to predict citations of research articles. Feature-based approaches employ document-related factors, author-related factors, journal-related factors, and altmetrics, either independently or in combination; however, the criteria for selecting features vary across studies. Feature-based citation prediction has been framed as either a regression or a classification problem. When it is treated as a regression problem, models such as multiple sequential regression analysis (Lee and Brusilovsky 2019), Spearman correlation and logistic regression (Wang et al. 2019a, b), the Paper Potential Index (PPI) model, the Multi-Feature model (Bai et al. 2019), a regression equation (Ayaz et al. 2018), three-step regression (Meyer et al. 2018), a multiple linear regression model (Susarla et al. 2018), semi-continuous regression (Sohrabi and Iraj 2017), quantile regression (Stegehuis et al. 2015), and stepwise multiple regression analysis (Yu et al. 2014a) are used. When it is treated as a classification problem, SVM, KNN, and Bagging are used (Wang et al. 2020). Deep learning techniques, on the other hand, predict citations on smaller or larger datasets by leveraging temporal information, author or document metadata, or information from the title, abstract, or full text. Approaches suggested for citation prediction include DeepCCP (Ma et al. 2021), multilayer BP (Ruan et al. 2020), BiLSTM with an attention model (Shen et al. 2019), (HAN) ST (Maillette de Buy Wenniger et al. 2020), and the GRU-CPM model (Wen et al. 2020). Previous research typically used the AMiner, Semantic Scholar, Scopus, Web of Science, and Google Scholar databases for citation prediction. These databases contain information about papers, authors, collaboration groups, and publications.

Notably, most of the proposed classification and regression techniques exploit either extrinsic factors of the research article or metadata about the author, the venue, or both to predict the scientific impact of the research article. Only a limited number of studies have considered intrinsic factors when assessing the quality of a research article.

This study performs an extensive systematic literature review (SLR) to identify, categorize, and correlate the factors that affect the citations received by an article. We collected data from multiple sources and identified the correlation among multiple factors. We identified the major objectives achieved by the existing literature and performed an analytical review of the factors that affect citations. This study also presents a novel categorization of the existing literature based on the studied factors, including document-based factors, author-based factors, journal-based factors, and altmetrics. The categorized factors were correlated with citations using the Pearson correlation coefficient (PCC) on the DBLP dataset to determine the impact of each factor on citations. In addition, to validate the empirical and correlational findings, a comparison with earlier studies was carried out. The correlation analysis indicates that the main quality parameters have a favorable impact on citations. Furthermore, this study proposes recommendations for improving quality factors to make research more valuable.
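The correlation step described above can be illustrated with a minimal sketch of the Pearson correlation coefficient. The function name `pearson_r` and the toy data are hypothetical and are not taken from the DBLP dataset; the sketch only shows how a single factor (here, number of references) could be correlated with citation counts.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data (not from DBLP): references and citations of five papers.
num_references = [10, 20, 30, 40, 50]
citations = [2, 4, 6, 8, 10]
r = pearson_r(num_references, citations)
print(round(r, 3))
```

A value of r close to +1 indicates a strong positive linear association, a value close to -1 a strong negative one, and a value near 0 little linear association.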

The paper is organized as follows. Section 2 presents the detailed SLR and a critical analysis of the selected papers. Section 3 presents the research methodology, and Sect. 4 presents the results with their detailed analysis. Section 5 concludes the paper.

2 Literature review

Conducting a literature review is crucial for researchers to make informed decisions and provide empirical evidence. This study conducts a systematic literature review (SLR) following the Kitchenham guidelines to search, select, extract, and consolidate information from published evidence (Mariano et al. 2017; Snell et al. 2017). The following research question was developed.

RQ: What are the reported factors that can affect the early citation of a research article in existing literature?

The search string was developed from the main research question, i.e., ((“Factors” OR “Characteristics” OR “Feature” OR “Elements” OR “Attribute” OR “Properties”) AND (“Influence” OR “Determine” OR “Effect” OR “Predict” OR “Impact” OR “Significant” OR “Considerable”) AND (“Citation Rate” OR “Citation Frequency” OR “Number of Citation” OR “Citation Count” OR “High Citation” OR “Early Citation” OR “Citation Prediction”)).
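The structure of this search string (three groups of synonyms, OR within a group, AND across groups) can be sketched as follows. The constant and function names are hypothetical, the case-insensitive substring matching in `matches` is an assumption made for illustration, and real bibliographic databases apply their own query syntax and matching rules.

```python
# The three synonym groups taken from the search string above.
FACTOR_TERMS = ["Factors", "Characteristics", "Feature", "Elements",
                "Attribute", "Properties"]
EFFECT_TERMS = ["Influence", "Determine", "Effect", "Predict", "Impact",
                "Significant", "Considerable"]
CITATION_TERMS = ["Citation Rate", "Citation Frequency", "Number of Citation",
                  "Citation Count", "High Citation", "Early Citation",
                  "Citation Prediction"]
GROUPS = (FACTOR_TERMS, EFFECT_TERMS, CITATION_TERMS)

def build_query(*groups):
    """Join terms with OR inside each group and join the groups with AND."""
    clauses = ["(" + " OR ".join(f'"{term}"' for term in group) + ")"
               for group in groups]
    return "(" + " AND ".join(clauses) + ")"

def matches(text, *groups):
    """Assumed screening rule: every group needs at least one term in the text."""
    lowered = text.lower()
    return all(any(term.lower() in lowered for term in group)
               for group in groups)

query = build_query(*GROUPS)
hit = matches("Factors that influence the citation count of papers", *GROUPS)
miss = matches("Deep learning for image segmentation", *GROUPS)
print(query)
print(hit, miss)
```

Such a helper could be used to pre-screen titles or abstracts locally before manual selection; it does not replace the inclusion and exclusion steps of the SLR.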

2.1 Evidence of factors affecting citation and their classification

Several factors drive citation growth, and existing studies rely on a single factor or on multiple factors to obtain better prediction results. Author-based factors, journal-based factors, document-based factors, institutional factors, and altmetrics are employed to predict publication citations (Bai et al. 2019). Different parameters affect the quality of a scholarly publication, and among them citation acts as a strong metric for assessing quality. Therefore, several studies have correlated citation rate with other factors, i.e., author-related, journal-related, and document-related factors and altmetrics. It can be observed that both internal quality factors and external factors contribute substantially to a paper’s citation impact. The citation rate can be considered a strong indicator of the quality of a paper (Yu et al. 2014a). Several findings on the quality of scholarly articles are presented below.

2.2 Document factors

As mentioned earlier, existing studies have used a single parameter or a combination of parameters to predict scientific impact. Several studies used machine learning techniques for prediction (Abuhay et al. 2018a, 2018b; Amplayo et al. 2018; Ayaz et al. 2018; El Mohadab et al. 2018; Fu and Aliferis 2010; Hwang et al. 2019; Lee 2019a; Lee and Brusilovsky 2019; Lisha et al. 2020; Luo et al. 2018; Wang et al. 2020; Pobiedina and Ichise 2016; Robson and Mousquès 2016; Sohrabi and Iraj 2017; Wang, Zhang 2020; Yu and Yu 2014; Yuan et al. 2018). Other studies performed data identification and analysis (Fu and Ho 2016; Hu et al. 2018; Kosteas 2018; Prathap et al. 2016; Susarla et al. 2018) by conducting a bibliometric investigation, or proposed a novel technique (Abrishami and Aliakbary 2019; Bai et al. 2019; Yuan et al. 2018). Tables 1 and 2 show that many influential document-related factors for assessing citation impact have been identified and correlated with other similar factors. Document-related factors are divided into two major categories: document general factors and document quality-related factors. Document general factors are the extrinsic factors related to the article, whereas quality-related factors are those that reflect the quality of the article.

Table 1 Document general or extrinsic factors
Table 2 Document quality related or intrinsic factors

2.2.1 Document general factors

2.2.1.1 Number of early citation/citation window

Factors such as the number of early citations after one year, the length of the citation window, the short-term citation count, early attention, the number of citations received in the first year after publication, early citations, the citation index (citations per year), and the total citation count in the first three years illustrate the average citation time after publication and the early dissemination of knowledge in the scientific community (Abramo et al. 2019; Chakraborty et al. 2014b; Garner et al. 2014; Guerrero-Bote and Moya-Anegón 2014). According to research, the more citations a work receives, the higher its quality (Ramezani-Pakpour-Langeroudi et al. 2018).

Several researchers have treated a restricted citation window as a significant factor, using different terminologies to indicate the context of the citation window. The citation windows considered range from one to five years (Abrishami and Aliakbary 2019; Bai et al. 2019; Kosteas 2018; Lee and Brusilovsky 2019; Wang et al. 2020; Stegehuis et al. 2015; Susarla et al. 2018; Wang, Zhihui 2020). Some studies argue that a citation window of one year is too short for any publication. On the other hand, receiving citations in the first year after publication signals the worth of a publication, since some publications do not receive a decent number of citations in their earliest year (Stegehuis et al. 2015). Abrishami and Aliakbary (2019) proposed a novel technique that predicts the future influence of a publication from the citation window. According to that study, the higher the citation count within a small citation window, the greater the influence of the publication. According to Wang and Zhang (2020), a short citation window cannot be considered a significant citation impact indicator: with a short citation window (e.g., 2 years), results are not reliable and ultimately lead to bias. Hence, reliability can be associated with the length of the citation window. According to different studies, changes in the citation window alone also affect the prediction of scholarly publication impact, so citation impact prediction becomes more challenging when citation patterns evolve. At the same time, a citation window of 5 years is long for measuring the impact of a publication. As a result, a citation window of three years can be considered significant (Aksnes et al. 2019; Wang et al. 2020).
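To make the notion of a citation window concrete, the following minimal sketch counts the citations a paper receives within a window of a given length. The function name, the toy data, and the inclusive convention (a citation in year y counts if it falls between the publication year and the publication year plus the window length) are assumptions made for illustration; individual studies may define the window differently.

```python
def citations_in_window(pub_year, citing_years, window):
    """Count citations received within `window` years after publication.

    Assumed convention: a citation made in year y is counted when
    pub_year <= y <= pub_year + window.
    """
    if window < 0:
        raise ValueError("window must be non-negative")
    return sum(1 for year in citing_years
               if pub_year <= year <= pub_year + window)

# Hypothetical paper published in 2015; one list entry per citing paper.
citing_years = [2015, 2016, 2016, 2017, 2018, 2019, 2021]
counts = {w: citations_in_window(2015, citing_years, w) for w in (1, 3, 5)}
print(counts)
```

Comparing the counts for one-, three-, and five-year windows shows how strongly the chosen window changes the measured early impact of the same paper.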

2.2.1.2 Publication type/study design

Several terms from the prior literature are used in the same context, such as type of the article, type of paper, study design, publication type, and type of study (Hu et al. 2018; Lee and Brusilovsky 2019; Wang et al. 2020; Susarla et al. 2018). Publication type is among the most important features affecting citation frequency, although some papers receive many citations and others are rarely cited (Aksnes et al. 2019). The publication type or type of study, such as a review, a meta-analysis, or a publication presenting a novel methodology, can be positively correlated with citation frequency (Annalingam et al. 2014; Antoniou et al. 2015; Falagas et al. 2013; Farshad et al. 2013). A well-written, good-quality review has a greater chance of enhancing citation impact (Biscaro and Giupponi 2014; Fu and Aliferis 2010; Gargouri et al. 2010; Ruano-Ravina and Alvarez-Dardet 2012; Sin 2011; Vanclay 2013). Articles presenting a novel technique also correlate strongly with citation impact, as they introduce a new technique for the dissemination of knowledge.

2.2.1.3 Publication year/time period of publication

The time of publication matters: whether a paper is published at the start or at the end of the year introduces a bias in the reception of citations. A paper published earlier in the year has more time to gain citations than a paper published at the end of the year. Another term correlated with publication time is the academic age of the article (Araújo et al. 2012; Bornmann and Williams 2013a; Lachance et al. 2014). Academic age reflects the total citations accumulated up to a given date. Usually, older papers receive more citations than newer ones (Bornmann and Williams 2013a; Ruano-Ravina and Alvarez-Dardet 2012; Tahamtan et al. 2016). According to some studies, however, recent articles receive more citations than older ones. The effect of article age also depends on the papers considered. Some studies (Dey et al. 2017; Ke et al. 2015) focused on the fact that some publications receive most of their citations in the initial years, while others receive none at first and then suddenly attract a large number. Given these conflicting findings, it can be concluded that the publication year or time of publication does not consistently affect the prediction of citation frequency. Different researchers have used recent work to support their claims (Bai et al. 2017; Bornmann et al. 2014; Donner 2018; Wang et al. 2020; Susarla et al. 2018).

2.2.1.4 Number of references/reference density

References play a vital role in a document: they support the claims it makes, and the references used depend on the type of study. Review articles usually receive more citations than articles presenting a methodology (Biscaro and Giupponi 2014; Fu and Aliferis 2010; Gargouri et al. 2010; Ruano-Ravina and Alvarez-Dardet 2012; Sin 2011; Vanclay 2013). Articles with more references receive more citations than articles with fewer (Antoniou et al. 2015; Biscaro and Giupponi 2014; Bornmann et al. 2014; Didegah and Thelwall 2013; Farshad et al. 2013; Onodera and Yoshikane 2015; Robson and Mousques 2014; So et al. 2014; Van Wesel et al. 2013; Yu and Yu 2014).

Yuan et al. (2018) found that research novelty fades over time and that old concepts are cited less frequently as time passes. On the other hand, the Matthew effect suggests that the rich get richer, meaning that well-known documents and reputable authors receive more citations.

2.2.1.5 Article length/paper length

Article length, or the length of the paper, is reported to correlate positively with citation frequency, because a longer paper can add more valuable knowledge to the research, whereas a shorter paper restricts the information the author can include. However, some studies contradict this and report that the length of the article does not affect citation frequency.

2.2.1.6 Page count/number of pages

A page limit restricts the information that can be presented. Such limits are usually imposed in conference proceedings, where the scope of the research is limited and the author is only presenting the research idea to the research community. In contrast, page count is usually not limited in journals, so authors can present their ideas in detail.

2.2.1.7 Article language

The language of the article, sometimes referred to as the journal's language, is the language in which the article is written. Studies report a relationship between the language of the paper and citation frequency. Some studies indicate that, because English is an international language used and understood by researchers worldwide, articles in English achieve more citations than articles in other languages. Hence, papers published in English-language journals have a greater chance of being cited frequently than papers in journals in other languages.

2.2.1.8 Paper field (number of research fields of the paper/Interest of subject/research focuses Categories)

Several studies indicate that the number of citations varies across disciplines and research fields. Citation behavior also varies across major fields and subfields (Amjad and Munir 2021). The citations of a paper depend on the size of the field, and subfields achieve fewer citations than major fields. For example, within chemistry, a paper in organic chemistry achieves more citations than a paper in biochemistry. In another study, the major field of biology achieved more citations than its subfields. Besides, different disciplines have different citation behavior: the citation rate is higher in the social sciences than in the natural sciences. Therefore, it can be concluded that articles published in major fields and belonging to the social sciences gain more citations than articles in subfields and in the natural sciences.

2.2.2 Document quality-related factors

This section describes the factors that researchers have used to increase citation frequency. Several techniques have also been proposed using the document quality-related factors listed in Table 2.

2.2.2.1 Quality of paper

One of the most important factors is the quality of the publication, which determines its impact on research (Jabbour et al. 2013). Various studies have focused on publication quality, but some of them treat extrinsic, document general factors as the quality measure. Publication quality is better captured by intrinsic factors (Bai et al. 2019; Robson and Mousquès 2016; Yuan et al. 2018). Table 2 groups the related factors. Recency (Pobiedina and Ichise 2016), clarity and timeliness (Robson and Mousquès 2016), abstract ratio and weight ratio (Sohrabi and Iraj 2017), novelty (Amplayo et al. 2018; Castillo-Vergara et al. 2018), and creativity (Tahamtan and Bornmann 2018) can be considered under the umbrella of document quality-related factors.

2.2.2.2 Abstract/ keyword characteristics

Abstract and keyword factors are extracted from the text of the article (Uddin and Khan 2016). Previous studies explain abstract, keyword, and title features comprehensively, and our analysis shows that abstract and keyword features reflect the contents of the article. Diversity and the use of several keywords in an article help to increase citations (Chakraborty et al. 2014a; Rostami et al. 2014a; So et al. 2014). Keywords direct researchers to relevant studies, so an article with poorly chosen keywords might not be found. Hence, the number of keywords, the percentage of keywords, and keyword diversity have a significant relation with the citation count (Uddin and Khan 2016). The abstract of an article conveys the crux of the whole concept presented in it. Researchers usually screen articles by their title and abstract, so the abstract is as important as the title. At the same time, the title and the keywords used in the article must be relevant to one another and diverse. Diversity and the number of keywords used in the title or abstract might increase citations (Chakraborty et al. 2014a; Rostami et al. 2014a; So et al. 2014). Another finding is that the more closely the keywords in the title match the keywords in the abstract, the greater the number of citations (Annalingam et al. 2014; Falagas et al. 2013; Rostami et al. 2014b). It has also been found that a longer abstract and the presence of an abstract help to achieve more citations, while a structured abstract has little effect on citation frequency (Tahamtan et al. 2016; Van Wesel et al. 2013).

2.2.2.3 Presentation/Clarity

Presentation can be expressed through three major concepts: a well-structured article, the use of figures and tables, and the readability of the article (Uddin and Khan 2016). First, an article is well presented if it is well structured, that is, if the literature, problem, contribution, methodology, and results are clearly laid out. Second, the addition of tables and figures enhances clarity and ultimately attracts more citations. Third, regarding readability, we expect that an article whose language is more readable and makes the concepts easier to grasp will be cited more frequently (Uddin and Khan 2016). According to Meyer et al. (2018), an article is cited because of the presentation and clarity of its contents. Therefore, the primary drivers of an article's citations are its contents, its quality, and the way the contents are presented to give the reader deep knowledge of the concept expressed in the article. A good presentation helps the reader grasp the idea and thus reduces the effort required for understanding.

2.2.2.4 Novelty/Creativity

Veugelers and Wang (2019) mentioned that novel publications in their respective fields might achieve more citations. However, novelty or creativity carries some risk: a novel contribution might add knowledge to the research, but it comes with a high level of uncertainty because it might fail. Novelty can enable breakthroughs (Chai and Menon 2019). Publication novelty, sometimes expressed as creativity, might have a strong impact on the number of citations (Amplayo et al. 2018; Chai and Menon 2019; Veugelers and Wang 2019).

2.2.2.5 Open access/Visibility/Timeliness

Open access and visibility are related to the number of citations according to several studies. Previous studies assumed that open-access articles have high visibility and are therefore cited more; Koler-Povh et al. (2014) and Uddin and Khan (2016) worked from the premise that open access or online availability correlates positively with the number of citations. Other studies found that open access on its own has little or no effect on citations, but that it can help increase the number of citations when the article is published in a high-impact journal (Craig et al. 2007; Koler-Povh et al. 2014).

2.2.2.6 Relevance

Scientific relevance, which falls under the umbrella of quality-based factors, is related to citation impact and citation counts. Relevance concerns the contents of the article. Scientific relevance and the number of citations are correlated (Leydesdorff et al. 2016), but the number of citations does not help measure scientific relevance. According to our analysis, relevance is a factor that can be used along with other factors: for instance, it could mean the relevance of the topic to the article's author, or the relevance of the article to the peer reviewers, that is, whether the article belongs to the reviewer's specific field and whether the reviewer is free of conflict of interest (Geng et al. 2019; Ishag et al. 2019; Li et al. 2017).

2.2.2.7 Peer-reviewed papers

Peer review is crucial for high-quality academic publications, assigning experts to relevant publications. Fairness and relevance are essential, with minimal conflict of interest. However, author-reviewer relationships should be examined to avoid biased reviews, which can lead to high citations (Ishag et al. 2019; Li et al. 2017). A negative correlation between peer review and citations can be observed (Geng et al. 2019; Ishag et al. 2019; Li et al. 2017).

2.2.2.8 Topic feature/Hot topics/Diversity of study topic/Versatile

Topic features relate to the contents of the article and reflect its main semantics, which act as guidance for referencing the literature. In this research, we have combined the topic-related features that different studies use under different names. Topics that attract attention because of their popularity are referred to as hot topics. From the scholars' point of view, hot topics are a stronger citation-influencing factor than others, so they gain more attention and ultimately receive more citations (Fu and Aliferis 2010; Gallivan 2012). The number of citations varies with the diversity and versatility of study topics, with the social sciences receiving more citations than the natural sciences. Hot topics receive more citations than outdated ones, and study design and article topic are the most predictive factors.

2.2.2.9 Title of the article/characteristics of articles

Title length is a crucial aspect of research articles, as it influences citation impact and attracts researchers and practitioners. Studies show that the effect of title length varies across disciplines: in medical fields, longer titles indicate significance and greater citation impact, whereas in psychology and management title length is not considered an influential factor and correlates negatively with citations. In those fields, a shorter title would be more concise and more appealing to the audience (Stremersch et al. 2015; Rostami et al. 2014b; Tahamtan et al. 2016).

Title characteristics, including punctuation marks, can affect citation impact. Titles with colons, hyphens, and brackets receive more citations, while titles using only plain alphabetic characters and question-oriented titles are better predictors of citation. Giuffrida et al. (2019) identified the versatility of a paper's title as a factor that increases the citation rate, thereby correlating the paper title with the number of citations. Another title-related feature is mentioning the study design in the title (Subotic and Mukherjee 2014). For instance, mentioning a systematic literature review, the type of methodology, or the data analysis and investigation shows a strong correlation with citation impact (Antoniou et al. 2015).

Summarizing these findings, it can be observed that researchers use extrinsic factors to measure publication quality, but intrinsic factors are more crucial and must be associated with publication quality.

2.3 Author related aspects

2.3.1 Author-name/author keywords

The author's name and keywords are listed as author-related factors in Table 3 and may play a role in achieving high citation counts (Bai et al. 2017; Fu and Ho 2016). According to Fong and Wilhite (2017), working harder is not the only way to increase an academic's publication count. Including researchers as authors on publications or grant submissions even though they did not contribute to the research effort is referred to as honorary authorship (Fong and Wilhite 2017; Singhal and Kalra 2021). There can be many reasons for adding author names to research articles that eventually lead to a gain in the number of citations (Amjad et al. 2015). Positive reviews can be attributed to an author's reputation, but resolving conflicts over authorship order is crucial. First and last author publications are equally important for recruiting, promotion, and tenure in scientific domains (West et al. 2013). Bu et al. (2020) analyzed the importance of authors by allocating weights to multiple authors so that their contributions could be taken into account. Moreover, authorship sometimes plays a part in funding and other formalities (Singhal and Kalra 2021; West et al. 2013). When searching for articles by a prestigious author, the target article is also searched for using the author's name. However, similar names shared by multiple authors lead to author disambiguation issues (Shoaib et al. 2020). As a result, a solution to this issue must be proposed.

2.3.2 Author count/number of authors/co-authors and national/international collaboration

Collaboration between authors and co-authors significantly affects research, as a larger number of authors contributes valuable knowledge and increases the likelihood of being cited by co-authors (Amjad et al. 2017, 2016). National and international collaborations have a significant relationship with the number of citations, as creative minds from multiple areas enhance the knowledge (Liao et al. 2018). Collaboration among authors residing in different countries has a greater impact on citation frequency, and collaboration among authors from western countries increases the number of authors and co-authors in an article (Han et al. 2014).

2.3.3 Centrality

In any research environment, collaboration is very important (Han et al. 2014). Centrality is a crucial factor in collaboration networks, where authors' positions play a significant role in accessing resources and influencing publication citations. Degree centrality indicates the importance of an author, while betweenness centrality measures the importance of a node for information flow. An author with higher betweenness centrality lies on more of the shortest paths between other authors, which highlights the importance of centrality in a collaboration network (Guan et al. 2017). In some fields, for instance climate change, degree and betweenness centrality have a significant influence on the future citations of authors (Biscaro and Giupponi 2014). Moreover, in fields such as steel structure and information science, both measures have a positive association with the citation count of authors (Lee 2019b; Li et al. 2013; Uddin et al. 2013).
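The two centrality measures can be sketched on a small co-authorship network as follows. This is an illustrative implementation under stated assumptions: the graph is undirected and unweighted, degree centrality is normalized by n - 1, betweenness is left unnormalized and computed with Brandes' algorithm, and the four-author network is hypothetical. The cited studies may use different normalizations or weighted networks.

```python
from collections import deque

def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {v: len(neighbours) / (n - 1) for v, neighbours in graph.items()}

def betweenness_centrality(graph):
    """Unnormalized betweenness (Brandes' algorithm, undirected, unweighted)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        order = []                          # nodes in order of BFS discovery
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        sigma = {v: 0 for v in graph}       # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:             # w seen for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}     # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair of endpoints was counted twice.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical co-authorship chain: A - B - C - D.
coauthors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
deg = degree_centrality(coauthors)
btw = betweenness_centrality(coauthors)
print(deg)
print(btw)
```

In this toy network, authors B and C connect the two ends of the chain, so they obtain the highest degree and betweenness values, while A and D lie on no shortest path between other authors.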

2.3.4 Authors rank/ author score/scientific impact of author (first/ last)/H-index, author reputation

Author H-index, author reputation, and scientific impact measure productivity, with well-reputed authors having greater impact and prestige. These terms reflect the author's contribution and influence citations.
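As an illustration of the H-index mentioned above, the following minimal sketch computes it from a list of per-paper citation counts, using the usual definition: the largest h such that the author has h papers with at least h citations each. The function name and the toy data are hypothetical.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical author with five papers.
author_citations = [10, 8, 5, 4, 3]
h = h_index(author_citations)
print(h)
```

Because the index depends only on the ranked citation counts, a single very highly cited paper raises it by at most one, which is why it is often read as a combined indicator of productivity and impact.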

2.3.5 Institution related factors promotion/records of best paper awards

Several factors have been considered by researchers in previous studies, such as the size of the department, the productivity of the department, the ratio of department professors, research projects, and funded projects (Tahamtan et al. 2016). However, the main focus of the research is on institutional factors such as institutional prestige and university rank. Authors affiliated with prestigious, highly ranked universities receive more citations, and some international institutes achieve more citations owing to the reputation of the institution (Amara et al. 2015; Amjad et al. 2018, 2016; Collet et al. 2014). Authors who publish their research in high-impact journals receive more citations, which eventually leads to promotion and rewards from the institute.

Faculty members publishing multiple publications from single research have a greater impact on the research community, gaining more achievements. Funding and grants from the research community or industry contribute to industry-based research projects, which often acquire more citations. The best paper award is determined by paper characteristics, with longer papers having a higher chance of being award-winning and highly cited (Coupé, 2013).

2.3.6 Average articles published

This category relates to the research ability of the scientist, in particular whether working with productive and reputable researchers early in a researcher's career affects the researcher's future advancement (Amjad and Munir 2021; Lee 2019b). According to Han et al. (2014), in the field of computer science, citations for single-authored articles have decreased in the twenty-first century; consequently, the influence of working with credible collaborators must be considered for future research performance. Earlier publications, irrespective of their type (journal or conference), have a significantly positive impact on the future success of both the article and the researcher (Acuna et al. 2012; Penner et al. 2013). Detailed analysis reveals that publications in conferences and journals have different impacts on early-career researchers, with conference publications serving as a benchmark for future research influence (Amjad et al. 2022; Lee 2019b).

2.4 Journal related aspects

2.4.1 Venue

According to our analysis, the mentioned terms can be used interchangeably and have been used by (Ayaz et al. 2018; Robson and Mousquès, 2016; Susarla et al. 2018) and (Fu and Ho 2016). The frequency of citations a paper receives is influenced by how it is presented. The mode of presentation can take two forms, either journal or conference (Ibáñez et al. 2013). Papers published in journals gain more citations than conference publications. However, Tahamtan et al. found no relationship between citations and conference proceedings in a review. Conference proceedings are more valuable for presenting the topic and getting an initial idea. In comparison to conference papers, journal papers receive more citations per document and per year. Journals can be international or national, general or specific. International journals receive more citations than national journals (Annalingam et al. 2014; Tahamtan et al. 2016). Papers published in general journals with a high impact factor ultimately receive more citations than papers published in specific journals (Tahamtan et al. 2016; Vanclay 2013). Therefore, venues are also associated with topic specificity (Amjad 2021). According to Daud et al., venues with high citations have low entropy and venues with low citations have high entropy, where the term entropy is used to indicate the closeness to topic specificity (Daud et al. 2019). Hence it can be concluded that publication in a journal could be a significant predictor. Moreover, journals in combination with high impact factors could be more influential in gaining citations (Table 3).

Table 3 Author related factors

2.4.2 Journal impact factor

According to our analysis, the terms listed in Table 4 can be used to express the journal impact factor, i.e., journal impact factor, venue prestige, impact of venue, venue rank, average citations received by the papers published in the venue, and journal ranking attribute. Usually, journals are evaluated on the basis of the journal impact factor, which in turn can be used to evaluate research and researchers (Brito and Rodríguez-Navarro 2019). To gain citations, researchers try to publish their work in prestigious, high impact journals, as high impact journals are more appealing to research scholars (Uddin and Khan 2016). Several studies report a positive correlation between high impact journals and the number of citations (Bornmann et al. 2014; Bornmann and Williams 2013b; Daud et al. 2019; Didegah and Thelwall 2013; Falagas et al. 2013; Fu and Aliferis 2010; Garner et al. 2014; Jiang et al. 2013; Van Der Pol et al. 2015; Vanclay 2013). According to (Waltman and Traag 2017), the scientific merit of an article is well measured using the journal impact factor. Moreover, the quality of the paper is associated with the journal impact factor, as high quality papers will eventually be published by high impact factor journals. More importantly, the journal impact factor together with other independent factors could help to achieve citations, for instance, open access or visibility with journal impact factor, or early citation count and journal impact factor (Hammarfelt and Rushforth 2017). According to (Prathap et al. 2016), however, the journal impact factor alone is not a significant predictor of citation rate.

Table 4 Journal related factors

2.5 Altmetrics

After considering bibliometric indices, and with the emergence of social networks, several researchers have introduced alternative indices, also known as altmetrics, as shown in Table 5. In their view, altmetrics or alternative indices such as article views, article saves, article rating (Wang et al. 2019a, b), and download counts (El Mohadab et al. 2018) should be considered when measuring scholarly impact. In today’s era of social network connectivity, altmetrics could be a strong indicator for enhancing the citation rate. (Nuzzolese et al. 2019) investigated the relevance and effectiveness of altmetrics in determining the future success of an article. Several studies investigated the association between traditional bibliometric factors and altmetrics (Nuzzolese et al. 2019; Thelwall et al. 2013; Xia et al. 2016). Altmetrics are now considered a significant factor, because an author may otherwise be unaware of the existence of an article that could be cited. Articles that are disseminated via social communities, self-archived, or open access have greater chances of citation, as the chances of knowledge dissemination increase. Although altmetrics are not a replacement for traditional measures, their major property is the speed with which knowledge is disseminated to the scientific community. Because an article that is discussed and mentioned by the scientific community can be seen, its worth can be analyzed properly (Table 5).

Table 5 Altmetrics

2.6 Fact findings

The following section presents a detailed critical analysis of the papers considered in the literature.

Table 6 presents the critical analysis of selected studies from the literature; it illustrates the type of paper, the problem domain, the methodology, and the results along with limitations for the years 2012 to 2022, highlighting the factors that affect citations. The factors considered in the literature relate to the separate groups of factors, e.g., paper related factors, journal related factors, author related factors, and altmetrics. It is observed that articles in the category of paper related factors (El Mohadab et al. 2018; Lee 2019a; Lisha et al. 2020; Wang, et al. 2020; Meyer et al. 2018; Yu et al. 2014b) have not focused on intrinsic factors. Moreover, it has also been found that the combination of the mentioned factors plays a significant role in enhancing citations. Different machine learning approaches, such as (El Mohadab et al. 2018; Fu and Aliferis 2010; Wang, et al. 2020), have been proposed to verify this claim. The evaluation of results indicates that most researchers have used mainly the author related factors, journal related factors, and extrinsic factors of an article to predict or determine higher citation, although the intrinsic factors, which deal with the contents of the publication, concern the major vehicle for knowledge dissemination. In some studies, some intrinsic factors have been considered along with extrinsic ones, but the quality-related factors of a publication are ignored.

Table 6 Critical analysis

The objective of this research is to find the quality factors used in the existing literature to evaluate citations. Therefore, an SLR has been conducted for an in-depth study of each existing work and an analysis of factors based on the classified groups. Initially, a meta-synthesis is performed, which strengthens the analysis and exploration of existing qualitative studies. As a result, Table 1 depicts the factors overlooked in various studies, which leads to the development of another group based on quality-related factors that are largely ignored in previous research.

From Table 7 it can be seen that impact assessment of the article, the author, the venue, and the newer altmetrics can be considered an important activity, though very few surveys exist that match the objective of our research. (Fronzetti Colladon et al. 2020) conducted an empirical analysis to predict the long-term citation count by considering bibliometric as well as social features, and the authors restricted the bibliometric features to those related to the abstract only. The objective of (Wang et al. 2019a, b) was to determine the importance of bibliometrics and altmetrics, but most of the intrinsic factors are missed in that survey because only extrinsic factors are considered. (Cai et al. 2019) surveyed scholarly impact assessment and also discussed various solutions based on assigning weights. Their main focus was to provide weighting solutions for the assessment of the author, journal, and article, but the research did not consider the indicators for article assessment, and various factors have not been considered for an appropriate assessment. (Tahamtan et al. 2016) comprehensively presented the factors affecting citations and categorized them according to paper, author, and journal. The study of quality-related factors needs more emphasis from researchers. According to (Waltman 2016), citation impact indicators significantly affect research productivity. They utilized bibliographic databases for the calculation of citation impact, with more focus on the journals and the authors and less focus on the articles. An inclusive survey conducted by (Onodera and Yoshikane 2015) helped researchers by collecting several factors in a single document. The entire work was separated into several factors, but those factors are not categorized, and most importantly, several intrinsic factors that ultimately influence an article's citations are overlooked.

Table 7 Existing surveys for citation impact prediction

3 Research methodology

In this section the dataset used and the methodology are elaborated. The correlation between citations and quality factors is determined as shown in Table 9.

3.1 Dataset

In this study, comprehensive experiments are illustrated using the citation network dataset (DBLPV13) from the link https://www.aminer.cn/citation. The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other data sources. The dataset is designed for research purposes only, and multiple updated versions are available at the mentioned link. We have used DBLPV13, which contains 5,354,309 papers and 48,227,950 citation relationships from the years 1990–2019. The dataset thus has about 5.3 million instances, with features such as Paper Id, Paper Title, Author Names, Affiliations, Publication Year, Field of Study, Keywords, Abstract, and References. Some further features are derived from the existing data; these include Paper Title Length, Number of Authors, Article Length, Open Access, Presentation, Hot Topics (on the basis of the keywords feature), Hot Topics Count, and Abstract Length. The dataset contains many ambiguities that require extensive data cleaning. We have considered a subset of 150,000 instances for the purpose of experimentation. The dimension of the selected dataset is (150,000 × 30). The considered subset is further divided into multiple chunks for better experimentation. The feature set is built around four major categories that influence article citations: document general features, document quality factors, author related factors, and venue related factors.
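The derived length and count features mentioned above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the record keys (`title`, `authors`, `abstract`) are assumed names, and measuring title length in words is an assumption, since the text does not state the unit. Abstract length is counted in words, in line with the later description of the abstract feature.

```python
def derive_features(paper):
    """Derive simple length/count features from one paper record.

    `paper` is a dict; the keys used here are hypothetical names for
    the title, author list, and abstract fields of a record.
    """
    title = paper.get("title") or ""
    authors = paper.get("authors") or []
    abstract = paper.get("abstract") or ""
    return {
        # Assumption: title length is measured in whitespace-separated words.
        "title_length": len(title.split()),
        "author_count": len(authors),
        # Abstract length as a word count.
        "abstract_length": len(abstract.split()),
    }
```

Missing fields are treated as empty, so an incomplete record yields zeros rather than raising an error; whether such records should instead be dropped during cleaning is a separate decision.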

3.2 Methodology of citation impact computation

To find the impact of various factors on citations Pearson’s correlation coefficient is used.

3.2.1 Pearson’s correlation coefficient

Pearson’s correlation coefficient measures the strength and direction of the linear relationship between two variables, i.e. whether the relationship is strong or weak. One variable is treated as the dependent variable and the other as the independent variable. Citations received by the articles are correlated with multiple features to depict the association among the features. The dataset is divided into multiple chunks. The correlation is calculated for each chunk, and the mean across chunks is then calculated for each factor. Some subsets of these features, collected for experimentation purposes to test the theoretical concepts gathered from the SLR findings, are shown in Table 8.
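The chunk-wise procedure described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the function names and the `chunk_size` parameter are hypothetical, chunks are taken as consecutive slices, and chunks in which either variable is constant (where the coefficient is undefined) are skipped.

```python
import math


def pearson(xs, ys):
    """Pearson's r between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / math.sqrt(sxx * syy)


def mean_chunk_correlation(feature, citations, chunk_size):
    """Compute r per consecutive chunk and return (mean r, list of r)."""
    rs = []
    for start in range(0, len(feature), chunk_size):
        fx = feature[start:start + chunk_size]
        cy = citations[start:start + chunk_size]
        if len(fx) < 2:
            continue  # a single leftover row cannot be correlated
        r = pearson(fx, cy)
        if not math.isnan(r):
            rs.append(r)
    if not rs:
        raise ValueError("no chunk produced a defined correlation")
    return sum(rs) / len(rs), rs
```

Note that the mean of per-chunk coefficients is generally not equal to the coefficient computed once on the pooled data, so the two should not be used interchangeably when reporting results.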

Table 8 Feature distribution

3.2.2 Data analysis

The quality of the article content relies upon multiple factors, including the presentation and readability of the article, its openness or availability, its topic characteristics, abstract, keywords, and relevance. This study evaluated a few of these factors and correlated them with the article's citation rate. Because of the dataset's limitations, only a few factors are chosen. The correlations for these subsets are visualized in the form of scatter plots. The y-axis represents the dependent variable, citations, and the x-axis represents the factor whose impact on citations is to be determined. The visualization is performed on subsets of the dataset using Python in a Jupyter notebook.

3.2.3 Correlational analysis among page count (PC) and citations (C)

The first dimension is the article length or page count of the research article. During the SLR analysis, it was concluded that extending the article length provides favorable conditions in terms of a high citation rate; nevertheless, the experimentation demonstrated that increasing the article length has little or no effect on the citation rate of the article. Pearson's correlation is computed between citations and page count. Page count is derived and added to the dataset as the difference between the start page and the end page. The association between page count and citations is 0.035972. We also checked the rest of the data, where the correlation coefficient fluctuates from 0.01 to 0.03 in different portions of the dataset. Hence it can be concluded that the average value of the correlation coefficient is 0.03, which depicts a weak positive correlation. Both values increase or decrease together up to a certain point, as shown in Fig. 1. Furthermore, analyzing conference and journal articles separately might affect the results, as conference articles are limited to fewer pages than journal articles; conference articles therefore become a bridge for idea initiation and journal articles for knowledge dissemination.
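Deriving page count from the start and end page can be sketched as below. The argument names `page_start` and `page_end` are assumptions about how the page fields are stored, and returning `None` for unusable values is one possible cleaning choice, not necessarily the one used in the study.

```python
def page_count(page_start, page_end):
    """Return end page minus start page, or None if it cannot be computed.

    Page fields are often stored as strings and may be empty or
    non-numeric (e.g. article numbers), so conversion is guarded.
    """
    try:
        start, end = int(page_start), int(page_end)
    except (TypeError, ValueError):
        return None
    if end < start:
        return None  # inconsistent record
    # Plain difference, as described in the text. An inclusive count
    # would add 1; that constant shift does not change Pearson's r.
    return end - start
```

Because Pearson's coefficient is invariant to adding a constant to one variable, the choice between the plain difference and the inclusive count does not affect the reported correlation.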

Fig. 1 Correlational analysis among page count and citations

3.2.4 Correlational analysis among author count (AC) and citations (C)

Author count is the number of authors who contributed to the research article. According to researchers, the number of authors, or author count, is among the factors that affect citations. An increase in author count might increase indirect self-citations, which negates the quality factor in the research. This study correlated citations with author count, and the correlation coefficient is 0.0161, which shows a low positive correlation. The data point distribution in the graph indicates how “Author Count” is related to the "Citations" received by each article, as shown in Fig. 2. Each dot on the graph represents a publication, and its position on the graph reflects its values for both variables. This illustrates that an increase in author count affects citations positively up to a certain range of author count. Hence this study concluded that a low author count has a high impact on citations.

Fig. 2 Correlational analysis among author count and citations

3.2.5 Correlational analysis among paper title length (TL) and citations (C)

Title characteristics have an impact on citations, although they differ among disciplines. The length of the title is another characteristic that is reported in research as a significant factor. The SLR indicates that paper title length has an impact on citations. The experimentation on the dataset indicates that the shorter the title, the higher the citation rate. The correlation between citations and paper title length is calculated, and the values range from − 0.08568 to − 0.00609. The average of these is − 0.03417; hence there is a weak negative correlation, which indicates that an increase in title length is associated with a decrease in citations up to a certain point. The graph's data point distribution shows how "Title length" relates to the “Citations” obtained by each article. Each dot on the graph represents a publication, and its position on the graph reflects its values for both variables. Hence shorter titles are associated with more citations, as shown in Fig. 3. The versatile nature of the title and the relevance of the title to the study design strongly correlate with the number of citations.

Fig. 3 Correlational analysis among paper title length and citations

3.2.6 Correlational analysis among abstract length (AL) and citations (C)

The abstract of an article serves as a summary of the actual work, and articles are typically retrieved using keywords. As a result, a well-written abstract and the right usage of keywords (several, diverse) in the abstract are positively associated with citation impact. The decision to include a specific study in a piece of research is usually made on the basis of the abstract, as it depicts the relevance of the study; hence, the abstract must be readable so that the contents of the study are conveyed. The length of the abstract is usually restricted by journals and conferences, so the researcher has to communicate the insight of the work within the abstract. The experimentation yields a correlation of 0.056817, which depicts a positive correlation up to a certain range. The graph's data point distribution shows how "count of words in Abstract" relates to the "Citations" obtained by each article. Each dot on the graph represents a publication, and its position on the graph reflects its values for both variables, as shown in Fig. 4. Moreover, the abstract length for journals must be less than 300 words. The length, along with the readability and relevance of the article, will lead to more citations.

Fig. 4 Correlational analysis among abstract length and citations

3.2.7 Correlational analysis among readability (RD) and citations (C)

Readability of an article is an important aspect of its quality. Readability is improved by well-presented content, and the addition of tables and figures to an article increases clarity. In this study the readability score is calculated using the Flesch readability score (Eleyan et al. 2020). The Flesch readability score usually ranges from 0 to 100; however, extremely difficult text can result in a negative score. The score is computed on the abstract of an article, as the abstract summarizes the research. The detailed review revealed that the more readable an article is, the more citations it will gain. The correlation coefficient fluctuates between − 0.03 and 0.0003, and the average value of the correlation is − 0.014, which indicates a very weak negative correlation, i.e. the two values tend to move slightly in opposite directions. The data points shown in Fig. 5 indicate that articles with scores in the range from 10 to 80 receive more citations, and a negative score indicates extremely difficult text. Hence the experimentation showed that the article is more understandable when it is easier to read. Consequently, the number of citations is associated with readability and presentation. Another aspect is that readability affects citations in terms of relevance; therefore, it can be concluded that if a researcher finds an article readable but not relevant, this affects citations negatively.
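As a sketch of how a Flesch score can be obtained from an abstract, the standard Flesch Reading Ease formula is 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). The code below applies it with a deliberately simple tokenizer and a vowel-group syllable heuristic; both are assumptions made for illustration, and dedicated readability tools use more careful syllable counting, so scores may differ from those reported in the study.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per vowel group."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease of `text`; higher scores mean easier text."""
    # Sentences are split on runs of ., ! or ? (a simplification).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

The formula is unbounded in both directions, which is why long sentences made of long words can push the score below zero and very short, simple sentences can push it above 100.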

Fig. 5 Correlational analysis among readability and citations

3.2.8 Correlational analysis among hot topics (HT) and citations (C)

Another important factor is the hot topic, or trending topic, which is calculated using the keywords of the article. The frequency of terms from the keywords is maintained, and the terms are compared with the title to identify trending topics. The methodology obtains the most common terms across all titles and keywords: a collection of frequent words is organized, and if these words appear in the title or keywords, the count is marked from 1 onwards according to the frequency of the words. It can be concluded from the empirical study that hot topics gain more citations. The experimentation indicates that the correlation coefficient for the hot topics count is 0.0172; however, the values vary from -0.0006 to 0.0391. Since our methodology is based on the title and keywords, it may produce different results when the abstract or field of study is taken into account. This study concludes that hot topics are an influencing characteristic for gaining citations, as shown in Fig. 6, when used along with other factors, for instance the publication date.
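One possible reading of this procedure is sketched below. It is an interpretation, not the study's exact method: the record keys (`title`, `keywords`), the `top_n` cut-off for what counts as a frequent term, and the use of simple substring matching in the title are all assumptions introduced for illustration.

```python
from collections import Counter


def build_hot_terms(papers, top_n=50):
    """Collect the `top_n` most frequent keyword terms over all papers."""
    counts = Counter()
    for paper in papers:
        for kw in paper.get("keywords") or []:
            term = kw.strip().lower()
            if term:  # ignore empty keyword strings
                counts[term] += 1
    return {term for term, _ in counts.most_common(top_n)}


def hot_topic_count(paper, hot_terms):
    """Count hot terms that occur in the paper's keywords or title."""
    title = (paper.get("title") or "").lower()
    keywords = {kw.strip().lower() for kw in paper.get("keywords") or []}
    # Substring match in the title is a simplification; short terms
    # may match inside longer, unrelated words.
    return sum(1 for term in hot_terms if term in keywords or term in title)
```

A typical use would be to call `build_hot_terms` once on the whole subset and then apply `hot_topic_count` to each record to obtain the hot topics count feature that is correlated with citations.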

Fig. 6 Correlational analysis among hot topics and citations

3.2.9 Correlational analysis among open access (OA) and citations (C)

Another significant factor is the openness or availability of the article: the more visible the article is, the more it will be read and used in research. The feature is extracted on the basis of the URL/PDF and DOI (Digital Object Identifier) of the article. The data is scraped using DOIs, and binary values (0, 1) are encoded for non-open access and open access. The value 0 is assigned when the article is unavailable, whereas the value 1 is assigned when the article is open access. The correlation between open access and citations varies from 0.05 to 0.10, as shown in Table 9. The average correlation coefficient is 0.0824, which depicts a positive correlation, i.e. with open access the chances of an article being cited increase. The graph's data point distribution shows how “Open Access” is related to the “Citations” received by each publication. Each dot on the graph represents a publication, and its position on the graph indicates its values for both variables. Hence it can be seen in Fig. 7 that the visibility of the article has an impact on citations, and visibility has a greater impact on citations if the article is published in a high impact journal.

Table 9 Correlation among Factors and Citations

Fig. 7 Correlational analysis among open access and citations

3.2.10 Correlational analysis among recency (R) and citations (C)

Another factor is the publication year of the article, which has an effect on citations. The SLR indicates that an article starts gaining citations 3 to 5 years after publication. The experimentation is carried out on the mentioned dataset and the correlation is calculated. The correlation coefficient ranges from 0.004 to 0.009, as shown in Table 9. The mean value of the correlation is 0.009025. The visualization of the data also indicates that recently published articles gain more citations than older ones, as shown in Fig. 8. The graph's data point distribution shows how “Years_Since Publication” is related to the “Citations” received by each publication. Each dot on the graph represents a publication, and its position on the graph indicates its values for both variables. The lower the value of recency, the higher the value of citations. Hence it can be concluded that more recent articles are cited more, as they are used as the baseline for other work. This also covers the recency effect mentioned by several researchers: the more recent a work is, the more citations it will gain.

Fig. 8 Correlational analysis among recency and citations

3.2.11 Results

This section presents a comparative analysis of the factors in existing studies and the current study. The characteristics identified in this study are the result of the SLR; an extensive SLR is performed to find the parameters influencing citations. The comparison of factors with the results of the SLR has already been discussed in detail in the previous section. The effect of some of the factors on citations is investigated and compared to earlier research. The following research considers the affecting elements listed in the table… It shows the existing studies and the methodology for determining the correlation; it can be seen from Table 10 that the majority of the analysis is performed on extrinsic factors. These studies have used PCC, MR, and SC to obtain the relationship among individual and multiple factors. Existing research reports that article length has a positive but weak relationship with citations, and this research found the same positive (0.0395) and hence weak relation. Existing research indicates that author count is a major predictor of citations, while our analysis found only a weak positive association. Title length is considered a significant predictor in previous research, while this research did not find it to be a valuable predictor of citations (− 0.03417). The value shows that the factor is essentially uncorrelated with citations, i.e. an increase or decrease in title length does not impact the citations of the article. The length of the abstract is limited in journals and conferences. This research found it to be a significant measure (0.05681); the correlation value indicates that abstract length is positively correlated with citations, although the value could be more influential if readability and relevance were combined with abstract length. The detailed analysis has already been explained in the previous section. Recency, or paper age, can only be properly determined after a citation window of 3 to 5 years. A limitation of the study is that the citation window is not defined when considering paper age. In the data analysis it acts as a weak predictor (r = 0.0092); however, the results might vary when a specific citation window is considered. From existing studies, it can be seen that it varies among disciplines. The use of recent references in scientific articles can be a significant factor for citation and needs to be investigated. Open access determines the visibility or availability of the article, and therefore it can be categorized both as a document quality factor and under altmetrics. Several social platforms support research work and help in its dissemination. From the existing and the current study, it can be considered a positive predictor (0.0824) of citations.

Table 10 Comparative analysis factor with existing studies

The metric used to assess readability is different to ensure symmetry with previous studies, as shown in Table 11. The comparison reveals that readability has a negative impact on citations. The minimum score for FRE is − 305.36 (very difficult), whereas the maximum score is 84.13 (very easy), which shows that the readability of the abstracts varies. The mean is 9.15, which indicates that the abstracts are difficult to read. The results are compared to information sciences and linguistics, with mean values of 18.87 and 28.45 respectively. Hence it can be concluded that information science and computer science abstracts are more difficult to read. This study uses the abstract of a paper to calculate the readability score; the picture may differ if the full text or another content attribute is selected.

Table 11 Comparative analysis of readability with existing studies

Hence it can be concluded that Recency, Open Access, Hot topics, Abstract Length, and page count have a positive impact on citations, whereas Readability and Paper title length have a negative effect on citations. The former factors are significantly correlated with the citation count, while the latter have no significant impact on citations. Considering other factors in combination with these might affect citations in a positive way. It is observed that the relationship varies under Pearson’s correlation and is less linear; therefore, Spearman’s correlation is also added to the experimentation to obtain a more accurate measure of association. Spearman’s correlation gives more accurate results. The Spearman’s correlation results indicate that there is a need to explore the quality factors, as they significantly affect citations. Moreover, combining these with other factors might give better results.
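To illustrate why Spearman's correlation can capture a relationship that is monotonic but not linear, the sketch below computes it as Pearson's coefficient applied to ranks, with tied values receiving their average rank. The function names are hypothetical and the code is a plain-Python illustration rather than the implementation used in the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson's r; returns NaN if either sequence is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

For a strictly increasing but curved relationship, such as a feature paired with its square, `spearman` returns 1 while `pearson` returns a value below 1, which is the kind of difference the comparison above relies on. Citation counts are typically heavily tied at low values, so the average-rank treatment of ties matters in practice.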

4 Recommendations

The quality of the article acts as a major influencing factor, yet it has been ignored when extracting the factors that affect citations. Evaluating the quality of a paper is a challenging task (Jabbour et al. 2013; Maillette de Buy Wenniger et al. 2020; Onodera and Yoshikane 2015). A few factors determine the quality of the article, and these can be categorized as scientific or non-scientific. According to several studies, quality can be determined using journal-related, author-related, and document-related factors. In the non-scientific category, the prestige of the journal (impact factor) among the journal-related factors can be used to measure the quality of the article (Kosteas 2018). Likewise, the author's reputation, h-index, and collaboration among authors increase the number of citations, and the number of citations, which usually acts as an indicator of the quality of the article, also falls under the non-scientific category. In the case of paper-related factors, extrinsic factors that are not directly linked to the contents of the paper come under the non-scientific category, whereas intrinsic factors that are related to the contents of the paper, for instance novelty, creativity, and innovation, are categorized as scientific. The major problem is that it is difficult to quantify the scientific category. Therefore, no consensus has been established on how to measure the quality of an article.

Every scientific manuscript is evaluated, with its quality usually assessed by experts in a process referred to as peer review (Li et al. 2019; Prathap et al. 2016). Collaboration among multiple authors is increasingly prevalent in several fields, for instance medicine, economics, and finance, which can be due to multiple reasons, such as knowledge diffusion, involving experts in the research, and avoiding the risk of manuscript rejection and delayed reviews. The journal reviewer may be related to the manuscript's author as a co-author on other manuscripts, which may lead to the acceptance of a low-quality paper. That might restrict fairness in determining the quality of the article. It is important to understand the position of the author and the reviewer of the particular journal to which the article has been submitted. The author and the reviewer might have some indirect relationship that introduces bias into the evaluation of the article's quality.

Most of the studies worked on extrinsic factors of the article (El Mohadab et al. 2018; Wang, et al. 2020; Wang, Zhang 2020), whereas the quality of the article is determined through its intrinsic factors (Bai et al. 2019; Jabbour et al. 2013). As already discussed, the quality of an article cannot be assessed solely on the basis of the manuscript's external factors. Existing research focuses mostly on extrinsic factors and hence assumes that the quality factor depends only on the extrinsic factors. However, it can be argued that the quality of the article has a strong association with its intrinsic factors. A major reason for considering extrinsic factors is that most of them have a quantitative basis and are therefore measurable. For instance, a limited number of pages restricts how far information can be elaborated, while an increase in the number of pages adds more knowledge; moreover, the number of pages is measurable and can be correlated with citations. At the same time, intrinsic factors mostly have a qualitative basis and are hence difficult to measure. Although relevance and clarity are very important quality factors, they are not directly measurable. According to our analysis, some of the document's extrinsic factors are important, and their importance is amplified when they are combined with the document's intrinsic factors.

In the existing literature (Abramo et al. n.d.; Brito and Rodríguez-Navarro 2019; Jian et al. 2019), journals/venues are considered the main source for achieving high citation of articles. Journal impact factor and venue prestige have a positive correlation with the number of citations. Also, some publishers are more promising than others, and renowned publishers achieve more citations than others. For example, it is reported that papers published by Springer receive more citations than papers issued by Taylor and Francis. Therefore, journal related factors alone cannot evaluate the quality of articles. There is also the possibility that the researcher and the peers have a direct or indirect collaborative relationship, so that the acceptance of articles in a reputed journal becomes biased. Presenting an article in a well-known and well-recognized journal increases visibility and hence helps to achieve more citations. After analyzing the existing literature, it can be concluded that journal related factors cannot be the sole criteria for achieving high citation. Moreover, journal related factors, when combined with other factors, particularly intrinsic factors, can be the reason for obtaining high citation.

In some studies, factors related to the author are considered more important than others for gaining high citation counts. Author factors such as the number of authors, the h-index, or the reputation of the author are among those that many researchers have associated with gaining citations. Authors who collaborate with other authors also generate self-citations. Likewise, an article may be cited because of its author's reputation. Although factors associated with an article's authors are significant, the quality of the article cannot be determined from author-related information. Some author-related factors have a quantitative basis and are therefore measurable. For example, the number of authors and the h-index of the author are quantitative and can be used to assess the status of an article’s citations indirectly.

Hot topics are those that are likely to attract attention due to their popularity (Daud et al. 2021a). According to our analysis, hot topics receive more citations than outdated topics. Reputable venues are willing to consider a hot topic, focusing more on the article title than on the contents covered. Whenever a new topic trends, the topic takes precedence over the quality of the content, and the contents discussed in the article are to some extent ignored. Articles on hot topics are thus assessed based on their titles, even though an article should be evaluated on the basis of its contents. Consequently, emphasizing the article’s title over its contents ultimately weakens the role of quality-based factors in achieving high citation counts. The hot topic factor should not be the single measure for achieving high citation counts; in addition to addressing trending and hot topics, the contents covered must be of high quality.

Altmetrics were introduced with the rise of social media networks. In today's era of social network connectivity, altmetrics could be strong indicators for enhancing the citation rate. As the chances of knowledge diffusion improve, articles shared via social communities, self-archiving, and open access have a higher probability of being cited. According to our analysis, multiple document quality-related factors, when combined with altmetrics, can be used to gain high citation counts.

In a nutshell, the quality of research is better represented by document quality-related parameters. Relevance of the materials employed, novelty/creativity, recency of the article, topic characteristics, visibility, readability, and other factors should be investigated to improve content development for research. Furthermore, the other factors are also important, and their importance is amplified when they are combined with the document's intrinsic factors.

5 Conclusion

Although it is well acknowledged that research papers are the primary means of disseminating knowledge, the quality of a publication is even more critical for obtaining adequate citations. Several studies have concluded that the number of citations is the main criterion for measuring the quality of a publication; however, citations are influenced by several other factors.

Multiple studies have analysed the correlation between different factors and the citation rates of articles. The current study also explores the factors that influence the number of citations a paper receives. The study includes articles from 2013 through 2022, and the findings are divided into five categories: Document General Factors, Document Quality Related Factors, Author Related Aspects, Journal Related Factors, and Altmetrics. This research also correlates some of the extracted factors with citations using Pearson’s correlation coefficient to determine the impact of the factors on citations. The main contributions of the study are (i) segregating document-related factors into document general factors and document quality-related factors, and (ii) determining the impact of some of the categorized factors on citations using Pearson’s correlation coefficient. The study also compares the correlation results with existing research. From the empirical and correlational analysis, it can be concluded that Recency, Open Access, Hot Topics, Abstract Length, and Page Count correlate positively with citations, while Readability and Paper Title Length have a negative impact on citations. Because the relationships are non-linear, a more accurate and reliable relationship is determined using Spearman’s correlation. In conclusion, the quality of research is better depicted by document quality-related factors. Relevance of the contents used, novelty/creativity, recency of the article, topic characteristics, visibility, readability, and other factors need to be explored for better content creation in research. Furthermore, the other aforementioned factors are significant, and their significance intensifies when they are integrated with the intrinsic features of the document.
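As an illustration of why a rank-based coefficient can be more reliable than Pearson's coefficient under non-linearity, the following minimal sketch computes both for a single factor. The page counts and citation counts are invented toy values, not data from this study, and the function names are hypothetical; the sketch only shows the general technique, not the exact procedure used here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy, invented data: citations grow monotonically but non-linearly with pages.
page_count = [4, 6, 8, 10, 12, 14]
citations = [1, 2, 4, 9, 30, 120]

print("Pearson :", round(pearson(page_count, citations), 3))
print("Spearman:", round(spearman(page_count, citations), 3))
```

In this toy case the relationship is perfectly monotonic, so the rank-based coefficient captures it fully, whereas Pearson's coefficient is pulled down because the increase is not linear.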