Correction to: Personal and Ubiquitous Computing (2023) 27:45–57

https://doi.org/10.1007/s00779-021-01605-5

In “Offensive keyword extraction based on the attention mechanism of BERT and the eigenvector centrality using a graph representation,” published in Personal and Ubiquitous Computing 27(1), 45–57, the description of the proposed method was not sufficiently clear and comprehensible. This erratum rectifies and clarifies the methodological description to enhance readers’ understanding of the research.

1 Description of error

Upon review of the original publication, it has been identified that the description of the proposed method lacked clarity in certain aspects, potentially hindering readers’ comprehension of the research methodology. The terminology and procedural steps were not sufficiently elucidated, which may have led to confusion regarding the implementation and interpretation of the method.

2 Changes

  1. Section 4.1: Attention from BERT

Before:

Thus, a vector is obtained for each token where the components determine how much focus to put on the other parts of the input at this position. Next, this vector is multiplied by the υ vector to keep the values of the original token t. Equation 2 recaps this process for the matrix calculation for all words at once [31]. Where \({d}_{k}\) is the dimension of q, k, υ, and Q, K and V are the matrix representations respectively for the text.

$$Attention\left(Q,K,V\right)=softmax\left(\frac{Q{K}^{T}}{\sqrt{{d}_{k}}}\right)V$$
(2)

After:

As a result, a vector (\({Attention}_{t}\)) is obtained for each token t where each component i determines how much focus to put on the position i of the input. Next, this vector is multiplied by the υ vector to keep the values of the original token t. Equation 2 recaps this process for the matrix calculation for all words at once [31]. Where \({d}_{k}\) is the dimension of q, k, υ, and Q, K and V are the matrix representations respectively for the text.

$$\begin{array}{l}Attention\left(Q,K\right)=softmax\left(\frac{Q{K}^{T}}{\sqrt{{d}_{k}}}\right)\\ Head\left(Q,K,V\right)=Attention\left(Q,K\right)V\end{array}$$
(2)
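To make the corrected Eq. 2 easier to follow, the sketch below separates the attention weights Attention(Q, K) from the head output Head(Q, K, V) = Attention(Q, K)V. It is a minimal NumPy illustration under assumed toy shapes and names, not the implementation used in the original paper.

```python
# Minimal sketch of the corrected Eq. 2: the attention weights are computed
# first, and only afterwards multiplied by V to form the head output.
# Shapes, names, and the toy data are illustrative assumptions.
import numpy as np

def attention_weights(Q: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Attention(Q, K) = softmax(Q K^T / sqrt(d_k)); one weight row per token."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

def head(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Head(Q, K, V) = Attention(Q, K) V."""
    return attention_weights(Q, K) @ V

# Toy example: 4 tokens with d_k = 8 (hypothetical sizes).
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
weights = attention_weights(Q, K)   # row t is the attention vector of token t
output = head(Q, K, V)              # the usual transformer head output
```

Splitting the two steps mirrors the corrected notation: the softmax weights alone carry the token-to-token focus, independently of the value matrix V.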
  2. Section 4.1: Attention from BERT

Before:

Once the parameters of BERT are learned in the fine-tuning step, the texts feed the model again to obtain the attention values for each pair of words in the dataset. These attention values are obtained by the condensation of the pattern of each head in the last layer of BERT as Eq. 3, where h is the number of heads in the layers. That is, first, all attention heads are concatenated and then, it is projected into a new space by multiplying for a matrix W, which is also fitted in the training step.

$$\begin{array}{c}Att_i=Attention\left(Q_i,K_i,V_i\right),i=\overline{1,h}\\\mathrm{MHA}=Concat(Att_1,...,Att_h)W\end{array}$$
(3)

As result, we obtain a matrix \(A\in {\mathcal{M}}_{\left|\mathcal{T}\right|}(\mathbb{R})\) with the relativity of words. Where \(\mathcal{T}\) is the set which contains the words, \(\left|\cdot\right|\) denotes the size of a set and \({\mathcal{M}}_{n}(\mathbb{R})\) represents the set of square matrices of size n with inputs in the field \(\mathbb{R}\).

After:

Once the BERT parameters are learned in the fine-tuning step, the texts feed the model again to obtain the new vectors for each token. These vectors are obtained by the condensation of the pattern of each head in each BERT layer as Eq. 3, where h is the number of heads in the layers, i.e., the output of all heads is first concatenated and then projected to a new space by multiplying by a matrix W, which is also fitted in the training step.

$$\begin{array}{cc}\mathrm{MHA}=Concat(Head_1,...,Head_h)W,&i=\overline{1,h}\end{array}$$
(3)

For our method, we use only the attention values \({Attention}^{i}\) of each head i in a layer. Specifically, we use the sum of the values of all the heads as Eq. 4.

$$A=\sum_{i=1}^{h}{Attention}^{i}$$
(4)

As a result, we obtain a matrix \(A\in {\mathcal{M}}_{\left|\mathcal{T}\right|}(\mathbb{R})\) with the relationship of relevance (attention) between pairs of words, where \(\mathcal{T}\) is the set of the words, \(\left|\cdot\right|\) denotes the size of a set and \({\mathcal{M}}_{n}(\mathbb{R})\) represents the set of square matrices of size n with entries in the field \(\mathbb{R}\).
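To complement Eqs. 3 and 4, the sketch below keeps only the per-head attention matrices of a layer and sums them into the word-relevance matrix A; the projection by W in Eq. 3 belongs to BERT’s own multi-head output and is not needed for A. Again, this is a minimal NumPy illustration under assumed toy shapes, not the authors’ code.

```python
# Minimal sketch of Eq. 4: sum the per-head attention matrices of one layer
# into a single |T| x |T| relevance matrix A. Shapes and the toy data are
# illustrative assumptions.
import numpy as np

def attention_weights(Q: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Attention(Q, K) = softmax(Q K^T / sqrt(d_k))."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

def summed_attention(Qs, Ks):
    """A = sum over heads i of Attention^i (Eq. 4)."""
    return sum(attention_weights(Q, K) for Q, K in zip(Qs, Ks))

# Toy example: h = 2 heads over |T| = 4 tokens with d_k = 8 (hypothetical sizes).
rng = np.random.default_rng(1)
Qs = [rng.normal(size=(4, 8)) for _ in range(2)]
Ks = [rng.normal(size=(4, 8)) for _ in range(2)]
A = summed_attention(Qs, Ks)        # square matrix of size |T| = 4
assert A.shape == (4, 4)
# Every row of A sums to h, because each head's rows are softmax-normalized.
```

In practice the per-head weights would be read out of the fine-tuned BERT model when the texts feed it again, rather than computed from random toy matrices.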

  3. Section 4.1: Attention from BERT

Before:

Moreover, let \({A}_{t}\) be the attention matrix obtained given t, that is \({A}_{t}\) = MHA for t.

After:

Moreover, let \({A}_{t}\) be the attention matrix obtained given t.

  4. Section 5.1: Results

Before:

In Table 1, the names of the columns mistakenly reference bibliographies that are not related.

After:

The column names in Table 1 no longer reference bibliographies.

3 Impact of correction

These clarifications aim to improve the accessibility and comprehensibility of the proposed method, thereby enhancing the utility of the research findings for the scientific community. By addressing the identified shortcomings in the methodological description, we strive to ensure the accuracy and clarity of the published work.

4 Conclusion

This erratum article serves to rectify the lack of clarity in the description of the proposed method in the original paper. We apologize for any confusion caused by the oversight and appreciate the opportunity to clarify and enhance the understanding of our research methodology.

The original article has been corrected.