Table 1 Variables used to define the metrics of the ERC fact-checking benchmark

From: Flagging incorrect nucleotide sequence reagents in biomedical papers: To what extent does the leading publication format impede automatic error detection?

Symbol Definition
s # of nucleotide sequences present in the corpus
c # of nucleotide sequences correctly extracted from the corpus
f # of extracted nucleotide sequences that are alien to the corpus. For example, a text-mining tool might incorrectly extract a reagent that spans two columns as two different reagents
a # of correctly extracted nucleotide sequences whose status was correctly assigned by the fact-checking tool
n # of nucleotide sequences for which the system failed to assign any status
w # of nucleotide sequences for which the system assigned the wrong status
\(a'\) # of correctly extracted nucleotide sequences whose target was correctly assigned
\(n'\) # of nucleotide sequences for which the system failed to assign any target
\(w'\) # of nucleotide sequences for which the system assigned the wrong target
o # of nucleotide sequences for which the system output is correct: either fact-checked as \(\mathrm {Class}_0\) or error-flagged as \(\mathrm {Class}_{\{6, 7, 8\}}\)
p # of nucleotide sequences for which the system failed to take a decision
q # of nucleotide sequences for which the system assigned a wrong decision (i.e., incorrect class assigned)
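
The counts defined above could be tallied along the following lines. This is a minimal sketch, not the benchmark's own code: the function names `tally_extraction` and `tally_assignment`, the representation of sequences as plain strings, and the labels `"ok"` and `"error"` are all hypothetical. It also assumes that each sequence occurs at most once and that the assignment counts are taken over the correctly extracted sequences only. The table states that restriction for a, \(a'\) and (implicitly) o, but not for the "failed" and "wrong" counts.

```python
from typing import Optional


def tally_extraction(gold: set[str], extracted: set[str]) -> dict[str, int]:
    """Tally s, c and f from two sets of sequence strings (assumed representation)."""
    return {
        "s": len(gold),              # sequences present in the corpus
        "c": len(gold & extracted),  # sequences correctly extracted
        "f": len(extracted - gold),  # extracted sequences alien to the corpus
    }


def tally_assignment(
    truth: dict[str, str], predicted: dict[str, Optional[str]]
) -> tuple[int, int, int]:
    """Return (correct, missing, wrong) counts over the sequences in `truth`.

    The same routine can serve for status (a, n, w), target (a', n', w')
    and final decision (o, p, q), depending on which labels are passed in.
    A sequence with no prediction, or a prediction of None, counts as missing.
    """
    correct = missing = wrong = 0
    for seq, expected in truth.items():
        guess = predicted.get(seq)
        if guess is None:
            missing += 1
        elif guess == expected:
            correct += 1
        else:
            wrong += 1
    return correct, missing, wrong


# Toy data: "ACG" stands in for a fragment produced by a bad column split.
gold = {"ACGT", "TTGA", "GGCC"}
extracted = {"ACGT", "TTGA", "ACG"}
extraction_counts = tally_extraction(gold, extracted)  # s=3, c=2, f=1

# Hypothetical status labels for the two correctly extracted sequences.
status_truth = {"ACGT": "ok", "TTGA": "error"}
status_predicted = {"ACGT": "ok", "TTGA": None}
a, n, w = tally_assignment(status_truth, status_predicted)  # a=1, n=1, w=0
```

Using one routine for all three correct/missing/wrong triples keeps the status, target and decision counts consistent with one another. If the same sequence can occur several times in a paper, sets would undercount and a multiset such as `collections.Counter` would be needed instead.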