Using altmetrics for detecting impactful research in quasi-zero-day time-windows: the case of COVID-19

On December 31st 2019, the World Health Organization China Country Office was informed of cases of pneumonia of unknown etiology detected in Wuhan City. The cause of the syndrome was a new type of coronavirus isolated on January 7th 2020 and named Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2). SARS-CoV-2 is the cause of the coronavirus disease 2019 (COVID-19). Since January 2020 an ever-increasing number of scientific works related to the new pathogen have appeared in the literature. Identifying relevant research outcomes at very early stages is challenging. In this work we use COVID-19 as a use-case for investigating: (1) which tools and frameworks are most used for early scholarly communication; (2) to what extent altmetrics can be used to identify potentially impactful research in tight (i.e. quasi-zero-day) time-windows. A literature review with rigorous eligibility criteria is performed for gathering a sample of scientific papers about SARS-CoV-2/COVID-19 that appeared in the literature in the tight time-window ranging from January 15th 2020 to February 24th 2020. This sample is used for building a knowledge graph that formally represents the knowledge about papers and indicators. This knowledge graph feeds a data analysis process which is applied for experimenting with altmetrics as impact indicators. We find moderate correlation among traditional citation count, citations on social media, and mentions on news and blogs. Additionally, correlation coefficients are not inflated by indicators associated with zero values, which are quite common at very early stages after an article has been published. This suggests there is a common intended meaning of the citational acts associated with the aforementioned indicators. Then, we define a method, i.e. the Comprehensive Impact Score (CIS), that harmonises different indicators for providing a multi-dimensional impact indicator.
CIS shows promising results as a tool for selecting relevant papers even in a tight time-window. Our results foster the development of automated frameworks aimed at helping the scientific community in identifying relevant work even in cases of limited literature and observation time.


Introduction
A zero-day attack is a cyber attack exploiting a vulnerability (i.e. a zero-day vulnerability) of a piece of computer software that is either unknown or has not been disclosed publicly Bilge and Dumitras (2012). There is almost no defense against a zero-day attack. In fact, according to Bilge and Dumitras (2012), while the vulnerability remains unknown, the affected software cannot be patched and anti-virus products cannot detect the attack through signature-based scanning.
On December 31st 2019, the World Health Organization (WHO) China Country Office was informed of cases of pneumonia of unknown etiology detected in Wuhan City (Hubei Province, China), possibly associated with exposures in a seafood wholesale market in the same city. The cause of the syndrome was a new type of coronavirus isolated on January 7th 2020 and named Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2). Formerly known as the 2019 novel coronavirus (2019-nCoV), SARS-CoV-2 is a positive-sense single-stranded RNA virus that is contagious among humans and is the cause of the coronavirus disease 2019, hereinafter referred to as COVID-19 Gorbalenya (2020). Borrowing cyber security terminology, COVID-19 is a zero-day attack where the target system is the human immune system and the attacker is SARS-CoV-2. The human immune system has no specific defense against SARS-CoV-2. Since SARS-CoV-2 is a new type of virus, there is no natural or artificial immunity (i.e. antibodies or vaccines) humans can rely on. In the last three months, since the virus was first identified as a novel coronavirus in January 2020, an ever-increasing number of scientific works have appeared in the literature. Identifying relevant research outcomes at very early stages is of utmost importance for guiding the scientific community and governments in more effective research and decisions, respectively. However, traditional methods for measuring the relevance and impact of research outcomes (e.g. citation count, impact factor, etc.) might be ineffective due to the extremely narrow observation window currently available. Notoriously, indicators like citation count or impact factor require broader observation windows (i.e. a few years) to be reliable Lehmann et al. (2008). Altmetrics might be valid tools for measuring impact in quasi-zero-day time-windows. Altmetrics have been introduced by Priem et al. (2012) as the study and use of scholarly impact measures based on activity in online tools and environments. The term has also been used to describe the metrics themselves. The COVID-19 pandemic offers an extraordinary playground for understanding the inherent correlation between impact and altmetrics. In fact, for the first time in human history, we are facing a pandemic which is described, debated, and investigated in real time by the scientific community via conventional research venues (i.e. journal papers), jointly with social and on-line media.
In this work we investigate the following research questions:
• RQ1: What are the platforms, systems, and tools most used for early scholarly communication?
• RQ2: How is it possible to use altmetrics for automatically identifying candidate impactful research works in quasi-zero-day time-windows?
To answer the aforementioned research questions we carry out an experiment by using a sample of 212 papers on COVID-19. This sample has been collected by means of a rigorous literature review.
The rest of the paper is organised in the following way: Section 2 presents related work; Section 3 describes the material and method used for the experiments; Section 4 presents the data analysis we perform and the results we record; Section 5 discusses the results; finally, Section 6 presents our conclusions and future work.

Related work
An ever-increasing amount of research work has investigated the role of altmetrics in measuring impact since their introduction by Priem et al. (2012).

Correlation among indicators.
Much research focuses on finding a correlation between altmetrics and traditional indicators. The rationale behind these works is based on the assumption that traditional indicators have been extensively used for scoring research works and measuring their impact; hence, their reliability is accepted by practice. Works such as Li and Thelwall (2012); Li et al. (2012); Bar-Ilan (2012); Thelwall et al. (2013); Sud and Thelwall (2014); Nuzzolese et al. (2019) follow this research line. These studies record moderate agreement (i.e. ∼0.6 with the Spearman correlation coefficient) with specific sources of altmetrics, i.e. Mendeley and Twitter. According to Thelwall (2018) and Nuzzolese et al. (2019), Mendeley is the on-line platform that provides an indicator (i.e. the number of Mendeley readers) that correlates well with citation counts after a time period of a few years. The meta-analysis conducted by Bornmann (2015) confirms this result: the correlation with traditional citations is negligible for micro-blogging, small for blog counts, and medium to large for bookmark counts from online reference managers. Nevertheless, none of those studies take into account the key property of altmetrics, i.e. that they emerge quickly Peters et al. (2014). Hence, altmetrics should be used for measuring impact at very early stages, as soon as a topic emerges or a set of research works appears in the literature. As a consequence, we use a tight time scale (i.e. a quasi-zero-day time-window) for carrying out our analysis.
Altmetrics and research impact. Bornmann and Haunschild (2018) investigate the relationship between indicator dimensions and the quality of papers by means of regression analysis on the post-publication peer-review assessments of F1000Prime. Such a regression analysis shows that only Mendeley readers and citation counts are significantly related to quality. Finally, Nuzzolese et al. (2019) use data from the Italian National Scientific Qualification (NSQ). The results show good correlation between Mendeley readers and citation count, and moderate accuracy for the automatic prediction of the candidates' qualification at the NSQ by using independent settings of indicators as features for training a Naïve Bayes algorithm.
Some of the aforementioned works focus on providing a comprehensive analysis investigating not only the correlation between traditional indicators and altmetrics, but also the correlation among the altmetrics themselves. However, all of them overlook the time constraint (i.e. a tight observation window), which is of utmost importance in our scenario.

Material and method
In this section we present the input data and the method used for processing such data. More in detail, we explain: (i) the approach adopted for carrying out the literature review focused on gathering relevant literature associated with the COVID-19 pandemic; (ii) the sources and the solution used for enriching the resulting articles with citation count as well as altmetrics; and (iii), finally, the method followed for processing the collected data.

Literature review
The initial search was implemented on February 17th, 2020 in MEDLINE/PubMed. The search query consists of the following search terms selected by the authors to describe the new pandemic: [coronavirus* OR Pneumonia of Unknown Etiology OR COVID-19 OR nCoV]. Although the name has been updated to SARS-CoV-2 by the International Committee on Taxonomy of Viruses on February 11th 2020, the search is performed by using the term "nCoV" because we presume that no one, between February 11th and 13th, would have used the term "SARS-CoV-2". Furthermore, the search is limited to the following time-span: from January 15th, 2020 to February 24th, 2020. Due to the extraordinary rapidity with which scientific papers have been electronically published online (i.e. ePub), it may happen that some of them indicate a date later than February 13th 2020 as publication date.
We rely on a two-stage screening process to assess the relevance of the studies identified in the search. For the first level of screening, only the title and abstract are reviewed, to preclude waste of resources in procuring articles that do not meet the minimum inclusion criteria. Titles and abstracts of the studies initially identified are then checked by two independent investigators, and disagreement among reviewers is resolved through a mediator. Disagreement is resolved primarily through discussion and consensus between the researchers. If consensus is not reached, another blind reviewer acts as a third arbiter.
Then, we retrieve the full text of those articles deemed relevant after title and abstract screening. A form developed by the authors is used to record meta-data such as publication date, objective of the study, publication type, study sector, subject matter, and data sources. Results, reported challenges, limitations, conclusions, and other information are ignored as they are out of the scope of this study.
Eligibility criteria. Studies are eligible for inclusion if they broadly include data and information related to COVID-19 and/or SARS-CoV-2. Because of limited resources for translation, articles published in languages other than English are excluded. Papers that describe coronaviruses other than SARS-CoV-2 are excluded. There is no restriction regarding publication status. In summary, the inclusion criteria adopted are: (i) English language; (ii) SARS-CoV-2; (iii) COVID-19; (iv) pneumonia of unknown etiology that occurred in China between December 2019 and January 2020. Instead, the exclusion criteria are: (i) irrelevant titles not indicating the research topic; (ii) coronaviruses other than SARS-CoV-2; (iii) SARS, MERS, and coronavirus-related diseases other than COVID-19; (iv) non-human diseases.
Data summary and synthesis. The data are compiled in a single spreadsheet and imported into Microsoft Excel 2010 for validation. Descriptive statistics are calculated to summarise the data. Frequencies and percentages are used to describe nominal data. In the next section (cf. Section 3.2) we report statistics about the collected papers.

Data processing workflow
Selected papers resulting from the literature review are used as input to the data processing workflow. The latter allows us to automatically gather quantitative bibliometric indicators and altmetrics about the selected papers and to organise them in a structured format consisting of a knowledge graph. Figure 1 shows the number of papers in the sample grouped by publication date. The first activity is the identification of the DOIs associated with the selected papers. This is performed by processing the spreadsheet resulting from the literature review (cf. Section 3.1). Such a spreadsheet contains one article per row. In turn, for each row, we take into account the following columns: (i) the internal identifier used for uniquely identifying the article within the CSV, (ii) the authors, (iii) the paper title, and (iv) the DOI, whenever available.
We rely on the Metadata API provided by Crossref for checking available DOIs and retrieving missing ones. This API is queried by using the first author and the title associated with each article as input parameters. Crossref returns the DOI that matches the query parameters as output. If a DOI is already available, we still retrieve the DOI from Crossref and then check that the two DOIs (i.e. the one already available and the one gathered from Crossref) are equal.
In case the two DOIs are not equal we keep the DOI gathered from Crossref as valid. This criterion is followed in order to fix possible manual errors (e.g. typos) that would prevent the correct execution of subsequent actions of the workflow.
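As an illustration, this DOI check can be sketched as follows. This is a minimal sketch, not the paper's implementation: the endpoint and the query.bibliographic/query.author parameters belong to the public Crossref REST API, while the function names are ours.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_API = "https://api.crossref.org/works"

def lookup_doi(title, first_author):
    """Ask Crossref for the best-matching DOI given title and first author."""
    query = urllib.parse.urlencode({
        "query.bibliographic": title,
        "query.author": first_author,
        "rows": 1,
    })
    with urllib.request.urlopen(f"{CROSSREF_API}?{query}", timeout=30) as resp:
        items = json.load(resp)["message"]["items"]
    return items[0]["DOI"] if items else None

def reconcile_doi(spreadsheet_doi, crossref_doi):
    """Prefer the Crossref DOI when the two disagree (fixes manual typos)."""
    return crossref_doi if crossref_doi else spreadsheet_doi
```

The reconciliation step is deliberately biased towards Crossref, mirroring the criterion described above.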
The output of the DOI identification activity is a list of DOIs, which is passed as input to the second activity, named "Processing of DOIs". The latter iterates over the list of DOIs and selects them one by one. This operation allows other activities to gather information about citation count and altmetrics by using the DOI as the key for querying dedicated web services. The processing of DOIs proceeds until there is no remaining unprocessed DOI in the list (cf. the decision point labelled "Is there any unprocessed DOI?" in Figure 2).
The activities "Citation count gathering" and "Altmetrics gathering" are carried out in parallel. Both accept a single DOI as input parameter and return the citation count and the altmetrics associated with such a DOI, respectively. The citation count gathering relies on the API provided by Scopus. We use Scopus as it is used by many organisations as the reference service for assessing the impact of research from a quantitative perspective (e.g. citation count, h-index, and impact factor). For example, the Italian National Scientific Habilitation (ASN) uses Scopus for defining threshold values for the number of citations and h-index scores that candidates for permanent positions of Full and Associate Professor in Italian universities should exceed. The altmetrics gathering activity is based on Plum Analytics (PlumX), which is accessed through its integration in the Scopus API. We use PlumX among the variety of altmetrics providers (e.g. Altmetric.com or ImpactStory) as, according to Peters et al. (2014), it is the service that registers the most metrics for the most platforms. Additionally, in our previous study Nuzzolese et al. (2019), we found that PlumX is currently the service that covers the highest number of research works (∼52.6M) if compared to Altmetric.com (∼5M) and ImpactStory (∼1M). PlumX provides three different levels of analytics consisting of: (i) the category, which provides a global view across different indicators that are similar in semantics (e.g. the number of alternative citations of a research work on social media); (ii) the metric, which identifies the indicator (e.g. the number of tweets about a research work); and (iii) the source, which allows one to track the provenance of an indicator (e.g. the number of tweets on Twitter about a research work). Hereinafter we refer to these levels as the category-metric-source hierarchy. Table 1 summarises the categories provided by PlumX along with an explanation for each of them. A more detailed explanation of the categories, metrics, and sources as provided by PlumX is available on-line.
Table 1: The categories provided by PlumX along with an explanation of their semantics.

Category        Explanation
Usage           A signal that anyone is reading an article or otherwise using a research work.
Captures        An indication that someone wants to come back to the work.
Mentions        The number of mentions retrieved in news articles or blog posts about a research work.
Social Media    The number of mentions in tweets, Facebook likes, etc. that reference a research work.
Once the information about the citation count and altmetrics for an article is available, it is used for populating a knowledge graph in the activity labelled "Knowledge graph population". The knowledge graph is represented in RDF and modelled by using the Indicators Ontology (I-Ont) Nuzzolese et al. (2018). I-Ont is an ontology for representing scholarly artefacts (e.g. journal papers) and their associated indicators, e.g. citation count or altmetrics such as the number of readers on Mendeley. I-Ont is designed as an OWL ontology and was originally meant for representing indicators associated with the papers available on ScholarlyData. ScholarlyData Nuzzolese et al. (2016) is the reference linked open dataset of the Semantic Web community about papers, people, organisations, and events related to its academic conferences. The resulting knowledge graph, hereinafter referred to as COVID-19-KG, is available for download on Zenodo. Table 2 reports the statistics recorded for the metric categories stored in the knowledge graph. We do not report statistics on minimum values as they are meaningless, being 0 for all categories.
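A minimal sketch of the knowledge-graph population step is shown below. Note that the property IRIs under `ont` are placeholders of our own: the actual graph is modelled with I-Ont, whose vocabulary we do not reproduce here.

```python
XSD_INT = "<http://www.w3.org/2001/XMLSchema#integer>"

def paper_triples(doi, indicators, ont="http://example.org/i-ont/"):
    """Serialise one paper and its indicator values as N-Triples.

    The property IRIs under `ont` are placeholders: the real graph is
    modelled with the Indicators Ontology (I-Ont), whose IRIs differ."""
    subject = f"<https://doi.org/{doi}>"
    lines = []
    for name, value in indicators.items():
        prop = f"<{ont}{name.replace(' ', '_')}>"
        lines.append(f'{subject} {prop} "{value}"^^{XSD_INT} .')
    return "\n".join(lines)
```

In practice an RDF library (e.g. rdflib) would be used instead of string templating; the sketch only shows the shape of the data.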
Figure 3 shows the distribution of the indicators for each available metric; for example, Figure 3a shows the distribution of the citation count over articles, and the other panels of Figure 3 cover the remaining metrics. Finally, Figure 4 shows, for each category, the number of papers that record at least one indicator in that category. We provide for each category its underlying metrics and sources. The number of articles recorded at category level is not the sum of those reported at metric level: in fact, this number is the cardinality (i.e. size) of the intersection among the sets of articles counted for the underlying metrics. The workflow is implemented as a Python project and its source code is publicly available on GitHub.
Figure 4: The number of articles that record at least one indicator value for each available category.

Data analysis
We design our experiment in order to address RQ1 and RQ2 by using COVID-19-KG. Hence, we first analyse the different indicators from a behavioural perspective, i.e. we want to investigate which indicators (social media, captures, etc.) and which of their underlying sources (e.g. Twitter, Mendeley, etc.) perform better for scholarly communication in a narrow (i.e. quasi-zero-day) time-window. Then, we analyse possible methods for identifying candidate impactful research works by relying on the available indicators.

Behavioural perspective
In order to investigate the behaviour of collected indicators we set up an experiment composed of two conditions: (i) we compute the density estimation for each indicator in the category-metric-source hierarchy first on absolute values, then on standardised values; and (ii) we analyse the correlation among indicators.
Density estimation. The density provides a value at any point (i.e. the value associated with an indicator for a given paper) in the sample space (i.e. the whole collection of papers with indicator values in COVID-19-KG). This condition is useful to understand what the possible dense areas for each indicator are. The density is computed with Kernel Density Estimation (KDE) Scott (2015) by using Gaussian kernels. We remark that KDE is a non-parametric way to estimate the probability density function of a random variable. We use the method introduced by Silverman (1986) to compute the estimator bandwidth, as it is one of the most used methods at the state of the art for automatic bandwidth estimation. The KDE is performed first by using absolute values (i.e. the values we record by gathering citation count and altmetrics) as the sample set; then, it is performed by using standardised values as the sample set. The former is meant to get the probability distribution for each indicator separately. However, each indicator provides values recorded on very different ranges (cf. Table 2); hence, the KDEs resulting from different indicators are not directly comparable. Accordingly, we standardise indicator values and then perform KDE over them. Again, KDE is performed for each indicator and for each level of the category-metric-source hierarchy. Standardised values are obtained by computing z-scores as the difference between an indicator value and the sample mean, divided by the standard deviation. Equation 1 formalises the formula we use for computing z-scores:

z_i(p) = (p_i − µ_i) / σ_i    (1)
In Equation 1: (i) p_i is the value of the indicator i recorded for the paper p; (ii) µ_i represents the arithmetic mean computed over the set of all values available for the indicator i for all papers; and (iii) σ_i represents the standard deviation computed over the set of all values available for the indicator i for all papers.
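Equation 1 can be implemented in a few lines; a minimal sketch (ours, not the paper's implementation, using the sample standard deviation):

```python
import statistics

def z_scores(values):
    """Equation 1 applied to all values of one indicator: subtract the
    sample mean and divide by the sample standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [(v - mu) / sigma for v in values]
```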
Figure 5 shows the diagrams of the KDEs we record for each category. For citation counts (cf. Figure 5a) the most dense area has d ranging from ∼0.13 to ∼0.001 and comprises articles that have from 0 to ∼16 traditional citations. For social media (cf. Figure 5b) the most dense area has d ranging from ∼0.00023 to ∼0.00001 and comprises articles that have from 0 to ∼6,000 alternative citations on social media. For mentions (cf. Figure 5c) the most dense area has d ranging from ∼0.029 to ∼0.0008 and comprises articles that have from 0 to ∼80 mentions. For captures (cf. Figure 5d) we record as the most dense area the one having density d ranging from ∼0.08 to ∼0.001 and comprising articles that count from 0 to ∼20 captures. We do not compute the KDE for the usage category as there is only one article in COVID-19-KG with a value for such an indicator (cf. Figure 4). Instead, Figure 6 shows the KDE diagrams obtained with the standardised values. More in detail, Figure 6a and Figure 6b compare the density estimation curves resulting from the different categories and sources, respectively. We do not report the KDE curves recorded for metrics as they are identical to those recorded for sources. This is due to the fact that there is a one-to-one correspondence between metrics and sources in COVID-19-KG, e.g. the Tweets metric has Twitter only among its sources. All most dense areas are those under the curve determined by d between ∼1 and ∼0.02, with values ranging from 0 to ∼1 for the selected indicators. This is recorded regardless of the specific level of the category-metric-source hierarchy.
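A Gaussian KDE with Silverman's rule-of-thumb bandwidth, as described above, can be sketched in pure Python. A scientific library (e.g. SciPy's gaussian_kde) would normally be used; this standalone version is only illustrative.

```python
import math
import statistics

def silverman_bandwidth(xs):
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR/1.34) * n^(-1/5)."""
    n = len(xs)
    sd = statistics.stdev(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4)
    spread = min(sd, (q3 - q1) / 1.34) or sd  # fall back to sd if IQR == 0
    return 0.9 * spread * n ** (-1 / 5)

def kde(xs, x, h=None):
    """Gaussian kernel density estimate of the sample xs at point x."""
    h = h if h is not None else silverman_bandwidth(xs)
    norm = len(xs) * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs) / norm
```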
Correlation analysis. The correlation analysis aims at identifying similarities among different indicators, both in their semantics and in their intended use on web platforms or social media (e.g. Twitter, Mendeley, etc.). This analysis repeats the experiment we carried out in Nuzzolese et al. (2019). We remind the reader that in Nuzzolese et al. (2019) we used as dataset the papers extracted from the curricula of the candidates to the scientific habilitation process held in Italy, for all possible disciplines. In the context of this work we narrow the experiment to a dataset with very peculiar boundaries in terms of (i) the topic (i.e. COVID-19) and (ii) the observation time-window (i.e. ranging from January 15th 2020 to February 24th 2020). As in Nuzzolese et al. (2019), we use the sample Pearson correlation coefficient (i.e. r) as the measure to assess the linear correlation between pairs of sets of indicators. The Pearson correlation coefficient is widely used in the literature. It records correlation in terms of a value ranging from +1 to −1, where +1 indicates total positive linear correlation, 0 indicates no linear correlation, and −1 indicates total negative linear correlation.
For computing r, we construct a vector for each paper. The elements of a vector are the indicator values associated with its corresponding paper. We fill elements with 0 if an indicator is not available for a certain paper. The latter condition is mandatory in order to have vectors of equal size. In fact, r is computed by means of pairwise comparisons among vectors. The sample Pearson correlation coefficient is first computed among categories and then on sources, following the category-metric-source hierarchy as provided by PlumX. Again, we do not take into account the level of metrics as it is mirrored by the level of sources with a one-to-one correspondence. Additionally, r is investigated further only for those sources belonging to a category for which we record moderate correlation, i.e. r > 0.6. That is, we do not further investigate r if there is limited or no correlation at category level.
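The vector construction and the computation of r can be sketched as follows (a minimal sketch; the dictionary-based paper representation and the field names are hypothetical):

```python
import math

def indicator_vector(papers, indicator):
    """One element per paper; 0 when the indicator is missing, so that
    all vectors have equal size (as required for pairwise comparison)."""
    return [p.get(indicator, 0) for p in papers]

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two indicator vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```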
Figure 7 shows the confusion matrices resulting from the pairwise comparisons of the correlation coefficients. For categories (cf. Figure 7a) the highest correlation coefficients are recorded between: (i) mentions and citations, with r = 0.63, statistical significance p < 0.01 (p-values are computed by using the Student's t-distribution), and standard error SE_r = 0.04; (ii) social media and citations, with r = 0.69, p < 0.01, and SE_r = 0.04; and (iii) social media and mentions, with r = 0.81, p < 0.01, and SE_r = 0.03. Figure 7b shows the confusion matrix for the sources associated with the social media and citations categories, i.e. Twitter and Facebook for social media and Scopus for citations. If we focus on cross-category sources only (i.e. we do not take into account moderate correlation coefficients recorded between sources associated with the same category) we record moderate correlation between Facebook and Scopus, with r = 0.69, p < 0.01, and SE_r = 0.04. Figure 7c shows the confusion matrix for the sources associated with the mentions and citations categories, i.e. News, Stack Exchange, and Wikipedia for mentions and Scopus for citations.
The only cross-category sources associated with moderate correlation are News for mentions and Scopus for citations, with r = 0.63, p < 0.01, and SE_r = 0.04. Finally, Figure 7d shows the confusion matrix for the sources associated with the mentions and social media categories. In the latter we record r > 0.6 for the following cross-category sources: (i) Facebook and News, with r = 0.69, p < 0.01, and SE_r = 0.04; (ii) Facebook and Blog, with r = 0.62, p < 0.01, and SE_r = 0.04; (iii) Twitter and News, with r = 0.83, p < 0.01, and SE_r = 0.03; and (iv) Twitter and Blog, with r = 0.84, p < 0.01, and SE_r = 0.03.

Selecting candidate impactful papers
We then investigate how indicators can be used for selecting candidate impactful papers among those available in COVID-19-KG.
Geometric selection. First we rely on the result of the correlation analysis for selecting pairs of indicators that behave similarly. Hence, we use each pair for positioning papers on a Cartesian plane. Then we use such a positioning for defining a selection criterion. The axes of the Cartesian plane are the two indicators of a pair. The axis values are the z-scores computed for each indicator (cf. Equation 1). We perform this analysis for the pairs (citations, social media), (citations, mentions), and (social media, mentions). We select these pairs only as they correlate better than others according to the correlation analysis (cf. Section 4.1). Furthermore, in COVID-19-KG citations, social media, and mentions are available for most papers (cf. Figure 4). Figure 8 shows the results of this analysis. In order to draw a boundary around candidate impactful papers, we identify a threshold t for each category of indicators. We use the lower bound of the 95% quantile, i.e. Q95, as t. The quantiles are obtained by dividing the indicator values available for a given category (e.g. social media) in COVID-19-KG into subsets of equal sizes. The lower bounds of the 95% quantiles recorded are 0.27, 1.11, and 1.75 for citations, social media, and mentions, respectively. For example, the Q95 for the citations category contains all those papers that count more than 0.27 citations each. We opt for 95% quantiles as they are selective: in fact, they allow us to gather the 5% of papers in COVID-19-KG that record the highest values with respect to the selected indicator categories. When we use the citations and social media categories (we refer to this combination as G_c,s) as the axes of the Cartesian plane we record 6 papers whose indicator values are in Q95 of both categories (cf. Figure 8a). Instead, when we use the citations and mentions categories (i.e. G_c,m) as axes we record 5 papers whose indicator values are in Q95 of both categories (cf. Figure 8b). Finally, when we use the social media and mentions categories (i.e. G_s,m) as axes we record 9 papers whose indicator values are in Q95 of both categories (cf. Figure 8c).
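The Q95-based selection can be sketched as follows. This is a minimal sketch under our own assumptions: `statistics.quantiles` with n=20 yields the 95th percentile as the last cut point, and the category keys on the paper dictionaries are hypothetical.

```python
import statistics

def q95_threshold(values):
    """Lower bound of the 95% quantile: the last of the 19 cut points
    produced by statistics.quantiles with n=20."""
    return statistics.quantiles(values, n=20)[-1]

def select_candidates(papers, cat_a, cat_b):
    """Papers whose indicator values exceed the Q95 threshold on BOTH axes."""
    ta = q95_threshold([p[cat_a] for p in papers])
    tb = q95_threshold([p[cat_b] for p in papers])
    return [p for p in papers if p[cat_a] > ta and p[cat_b] > tb]
```

Requiring both axes to exceed their thresholds is what draws the rectangular boundary around candidate papers in the Cartesian plane.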
Comprehensive Impact Score. The geometric selection is limited to two indicators at a time. In order to harmonise all available indicators into a single score, we define the Comprehensive Impact Score (CIS) of a paper as the average of its standardised indicator values, where z-scores (cf. Equation 1) are used for obtaining standard values and the arithmetic mean is used for the average. Equation 2 formalises CIS:

CIS(p) = (1 / |I|) · Σ_{i∈I} z_i(p)    (2)

In Equation 2: (i) p is a paper that belongs to the set of available papers in COVID-19-KG; (ii) i is an indicator that belongs to I, which, in turn, is the set of available indicators (e.g. citations, social media, etc.); and (iii) z is the function for computing z-scores as defined in Equation 1. Finally, we compute the 95% quantile on the resulting CIS values. Again, the lower bound of the 95% quantile is used as the threshold value (i.e. t) for identifying candidate impactful papers. We perform the selection of papers first by using the whole set of available indicators I, and then by using a subset I′ of it. The lower bound of the 95% quantile for CIS values computed over I is 1.21; for I′ it is 2. Figure 9 shows the computed CIS values and the selected papers. More specifically, Figure 9a shows CIS values computed on the whole set of indicators I, while Figure 9b shows CIS values computed on I′. Both figures present papers distributed according to their publication date; in both, the threshold t is represented by a horizontal blue line.

The statistics about usage (cf. Table 2), along with the results recorded for the KDE and the correlation analysis, entitle us to claim that tweets on Twitter, shares and likes on Facebook, mentions on news and blogs, and citations in academic literature tracked by Scopus are the channels that have been most used for early scholarly communication about COVID-19, i.e. this addresses RQ1.
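Since CIS is the arithmetic mean of a paper's z-scores over the available indicators (Equations 1 and 2), it can be sketched in a few lines; this is our own minimal sketch, with hypothetical indicator names and a dictionary-based representation of the sample statistics.

```python
def z_score(value, mean, sd):
    """Equation 1: standardised value of one indicator for one paper."""
    return (value - mean) / sd

def cis(paper, sample_stats):
    """Equation 2: CIS(p) = (1/|I|) * sum of z-scores over indicators I.

    `paper` maps indicator name -> raw value; `sample_stats` maps
    indicator name -> (mean, standard deviation) over all papers."""
    zs = [z_score(paper[i], m, s) for i, (m, s) in sample_stats.items()]
    return sum(zs) / len(zs)
```

Restricting `sample_stats` to a subset of indicators yields the corresponding restricted CIS variant.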
The selection method based on the geometric space shows that mentions and citations are more selective than the other pairs when they are used together as the axes of the Cartesian plane meant for positioning papers geometrically (cf. Table 3). In fact, by using them, we record a set of 5 candidate papers.
The intersection among the three sets of candidate papers gathered with G_c,s, G_c,m, and G_s,m is equivalent to the set of 5 candidate papers obtained when using citations and mentions as axes (i.e. G_c,m). This suggests that the selection of those 5 papers is reliable, even if we have no evidence of its exhaustiveness. If we assess the selection based on the impact of the journals the papers have been published in, then we record good evidence about quality. In fact, both the New England Journal of Medicine and The Lancet are in the top-5 journal ranking on medicine according to SCImago, with an SJR of 19.524 and 15.871, respectively. With regard to exhaustiveness, the selection of candidate impactful papers is, in our opinion, an exploratory search task. According to White et al. (2005), exploratory search tasks are typically associated with undefined and uncertain goals. This means that identifying all possible impactful papers is nearly impossible. Hence, dealing with sub-optimal exhaustiveness is the practice in scenarios like these, due to the inherent nature of the search problem itself.
The selection based on the Comprehensive Impact Score (CIS) overcomes the limitation of the two-dimensional space introduced when defining a selection method based on a Cartesian plane. Indeed, CIS is a multi-dimensional selection tool which is customisable in terms of the indicators used for performing the analysis. It is fairly evident (cf. Table 3) that both CIS computed over I and CIS computed over I′ share most of the papers identified by applying the two-dimensional geometric space selection with citations and mentions used as axes (referred to as G c,m in Table 3). CIS over I and over I′ extends the set of selected papers returned by G c,m with 4 additional papers (cf. Table 3) published in The Lancet, the International Journal of Infectious Diseases (SJR = 1.456), JAMA (SJR = 7.477), and the Journal of Medical Virology (SJR = 0.966). Among those additional papers, only the one that appeared in the Journal of Medical Virology is debatable, both for the journal impact (i.e. SJR = 0.966) and for its scientific relevance. In fact, in this paper the authors claim that two types of snakes, which are common in Southeastern China, are the intermediate hosts responsible for the "cross-species" transmission of SARS-CoV-2 to humans. Subsequent genomic studies (Andersen et al. 2020) confirm cross-species transmission, but they refute the theory of snakes being the intermediate hosts. However, the paper reporting the theory of snakes being the intermediate hosts has been largely (i) retweeted, shared, and liked on different social networks, and (ii) discussed and reported by many international newspapers worldwide. Thus it contributed to the massive "infodemic" about COVID-19, i.e.
an over-abundance of information, either accurate or not, which makes it hard for people to find trustworthy sources and reliable guidance when they need it (Zarocostas 2020). This infodemic is captured by altmetrics, which by definition are fed by on-line tools and platforms. As a matter of fact, the paper about the theory of snakes being the intermediate hosts is selected among the candidates only by G s,m, where both axes are altmetrics, and not by G c,s and G c,m, where traditional citations are taken into account. This suggests a twofold speculation: (i) the scientometric community should handle altmetrics very carefully, as they may lead to unreliable and debatable results; (ii) altmetrics are promising tools not only for measuring impact, but also for making unwanted scenarios (e.g. infodemics) emerge from the knowledge soup of scientific literature. We opt for the second. Nevertheless, further and more focused research is needed.

Conclusions and future work
In this work we investigate altmetrics and citation count as tools for detecting impactful research works in quasi-zero-day time-windows, as is the case for COVID-19. COVID-19 offers an extraordinary real-world case-study for understanding the inherent correlation among impact and altmetrics. As mentioned in Section 1, for the first time in history, humankind is facing a pandemic which is described, debated, and investigated in real time by the scientific community via conventional research venues (i.e. journal papers) as well as social and on-line media. The latter are the natural playground of altmetrics. Our case-study relies on a sample of 212 scientific papers on COVID-19 collected by means of a literature review. Such a literature review is based on a two-stage screening process used to assess the relevance of studies on COVID-19 that appeared in the literature from January 15th 2020 to February 24th 2020. This sample is used for constructing a knowledge graph, i.e. COVID-19-KG, modelled in compliance with the Indicators Ontology (i.e. I-Ont). COVID-19-KG is the input of our analysis aimed at investigating (i) behavioural characteristics of altmetrics and citation count and (ii) possible approaches for using altmetrics along with citation count for automatically identifying candidate impactful research works in COVID-19-KG. We find moderate correlation among traditional citation count, citations on social media, and mentions on news and blogs. This suggests there is a common intended meaning of the citational acts associated with these indicators. Additionally, we define the Comprehensive Impact Score (CIS), which harmonises different indicators for providing a multi-dimensional impact indicator. CIS shows to be a promising tool for selecting relevant papers even in a tight observation window. Possible future work includes the use of CIS as a feature for predicting the results of evaluation procedures of academics, as presented in works like Poggi et al. (2019). Similarly, further investigation is needed to mine the rhetorical nature of citational acts associated with altmetrics. The latter is a mandatory step for building tools such as those of Ciancarini et al. (2014) and Peroni et al. (2020). More ambitiously, future research focused on altmetrics and citations should go in the direction envisioned by Gil et al. (2014) and Kitano (2016), thus contributing to a new family of artificial intelligence aimed at achieving autonomous discovery science.

Figure 1: Number of papers per publication date in the time-window ranging from January 15th, 2020 to February 24th, 2020.

Figure 2: The UML activity diagram that graphically represents the workflow re-used and extended from (Nuzzolese et al. 2019) for the data processing activities.
Figure 3 shows the distribution of the indicators for each available metric. Namely, Figure 3a shows the distribution of the citation count over articles; Figure 3b shows the distribution of Shares, Likes & Comments, and Tweets (Social Media category); Figure 3c shows the distribution of Blog Mentions, News Mentions, Q&A Site Mentions, and References (Mentions category); and Figure 3d shows the distribution of Readers (Captures category) over articles. For each graphic line plotted in Figures 3a-d we report the DOIs associated with the papers that record the highest indicator value for the specific category. That is: (i) 10.1016/S0140-6736(20)30183-5 (appeared in The Lancet) for citation count, mentions, and social media; (ii) 10.1016/j.ijid.2020.01.009 (appeared in the International Journal of Infectious Diseases) for captures; and (iii) 10.1056/NEJMoa2001191 (appeared in the New England Journal of Medicine) for usage.

Figure 3: Distributions of the number of indicators per paper.

Figure 5: Diagrams of the kernel density estimations obtained for each category.

Figure 6: Diagrams of the KDEs computed on z-scores for categories (a) and sources (b).
(a) Correlation among categories of indicators. (b) Correlation among Twitter, Facebook, and Scopus citation count. (c) Correlation among News, Stack Exchange, Wikipedia, and Scopus citation count. (d) Correlation among News, Stack Exchange, Wikipedia, Twitter, and Facebook.
(a) Citations and social media as axes. (b) Citations and mentions as axes.
(a) CIS computed on the whole set of available indicators I. (b) CIS computed on the set of indicators I′ limited to citations, social media, and mentions.

Figure 9: Selection of papers based on the Comprehensive Impact Score (CIS).
An analysis of altmetrics with respect to research evaluation frameworks has been carried out by Wouters et al. (2015); Ravenscroft et al. (2017); Bornmann and Haunschild (2018); Nuzzolese et al. (2019). More in detail, Wouters et al. (2015) use the Research Excellence Framework (REF) 2014, i.e. the reference system for assessing the quality of research in UK higher education institutions, for mining possible correlations among different metrics. The analysis is based on different metrics (either traditional or alternative) and research areas, and its outcomes converge towards limited or no correlation. Ravenscroft et al. (2017) find very low or negative correlation coefficients between altmetrics provided by Altmetric.com and REF scores concerning societal impact published by British universities in case studies. The aim of the analysis carried out by Bornmann and Haunschild (2018) is twofold.

Table 2: The statistics recorded for metric categories, including the citation count.
On top of the different indicators we compute, for each paper, a Comprehensive Impact Score (CIS). CIS aims at providing a multi-dimensional and homogeneous view over indicators which are different in quantities and semantics, i.e. CIS represents a unifying score over heterogeneous bibliometric indicators. A paper's CIS is computed by first standardising the values associated with each indicator category (e.g. number of social media mentions, number of traditional citations, etc.) and then averaging the resulting values. We use z-scores (cf. Equation 1).
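The two steps just described (per-category z-standardisation, then averaging per paper) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the function names, the toy indicator matrix, and the choice of three categories are hypothetical, not the paper's code.

```python
import numpy as np

def z_scores(values):
    """Standardise a vector of indicator values: z = (x - mean) / std."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:  # all papers tie on this indicator; z-scores are all zero
        return np.zeros_like(values)
    return (values - values.mean()) / std

def comprehensive_impact_score(indicators):
    """CIS per paper: z-standardise each indicator category (column),
    then average the standardised values across categories (per row)."""
    standardised = np.column_stack([z_scores(col) for col in indicators.T])
    return standardised.mean(axis=1)

# Toy matrix: 4 papers x 3 categories (citations, social media, mentions).
papers = np.array([
    [10, 200, 5],
    [0, 10, 0],
    [3, 50, 1],
    [25, 900, 12],
])
cis = comprehensive_impact_score(papers)
```

Standardising before averaging is what makes the score homogeneous: a category counted in hundreds (e.g. tweets) cannot swamp one counted in units (e.g. citations), since each contributes on the same z-score scale and the resulting CIS values are centred on zero.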