A sequence labelling approach for automatic analysis of ello: tagging pronouns, antecedents, and connective phrases


Encapsulators are linguistic units that establish coherent referential connections to the preceding discourse in a text. In this paper, we address the challenge of automatically analysing the pronominal encapsulator ello in Spanish text. For each occurrence of the pronoun, our method identifies three things. First, it finds the antecedent, including its grammatical type. Second, it finds the connective phrase that combines with the pronoun to express a discourse relation linking the antecedent text segment to the following text segment. Third, it identifies the type of semantic relation expressed by the complex discourse marker that the connective phrase and the pronoun form together. We describe our annotation of a corpus, which we used to inform the development of our method and to fine-tune an automatic analyser based on Bidirectional Encoder Representations from Transformers (BERT). In testing, our method performs with greater accuracy than three baselines (0.76 for the resolution task). It sets a promising benchmark for the automatic annotation of occurrences of the pronoun ello, their antecedents, and the semantic relations between the two text segments that the connective, in combination with the pronoun, links.
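
The task is framed as sequence labelling: pronouns, antecedents and connective phrases are marked as token spans. One common way to turn span annotation into per-token tags is the BIO scheme (Begin, Inside, Outside). The sketch below is a minimal illustration, not the authors' encoding. It converts spans to BIO tags and back. The labels PRON, ANT and CONN, the span format and the example sentence are all hypothetical. Finer distinctions, such as the antecedent's grammatical type, could be folded into the label names, but that is also an assumption.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end_exclusive, label), in token offsets.
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping labelled spans as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(t != "O" for t in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into (start, end_exclusive, label) spans."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final open span
        if tag.startswith("I-") and label == tag[2:]:
            continue  # the current span continues
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith(("B-", "I-")):  # a stray I- tag also opens a span
            start, label = i, tag[2:]
    return spans


if __name__ == "__main__":
    # Hypothetical example: "It rained a lot, and because of that the match was called off."
    tokens = ["Llovió", "mucho", "y", "por", "ello", "se", "suspendió", "el", "partido", "."]
    gold = [(0, 2, "ANT"), (3, 4, "CONN"), (4, 5, "PRON")]
    tags = spans_to_bio(len(tokens), gold)
    print(list(zip(tokens, tags)))
    print(bio_to_spans(tags) == gold)  # True
```

Decoding treats a stray I- tag as the start of a new span. This is a lenient choice; stricter decoders discard such tags instead.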

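
The analyser is a fine-tuned BERT model. Footnote 8 mentions an additional layer placed "on top of" BERT. In standard token classification, that layer is a linear projection from each token's final hidden state to one score per tag, followed by a softmax. The NumPy sketch below assumes exactly that setup. The tag inventory and the class name are hypothetical, random vectors stand in for real encoder outputs, and the paper's actual layer may differ.

```python
import numpy as np

# Hypothetical BIO tag inventory over pronouns, antecedents and connectives.
TAGS = ["O", "B-PRON", "I-PRON", "B-ANT", "I-ANT", "B-CONN", "I-CONN"]


class TokenClassificationHead:
    """Linear layer + softmax applied to each token's hidden vector independently."""

    def __init__(self, hidden_size: int, tags: list, seed: int = 0):
        self.tags = list(tags)
        rng = np.random.default_rng(seed)
        # Small random initialisation; these weights would be learned during fine-tuning.
        self.W = rng.normal(0.0, 0.02, size=(hidden_size, len(self.tags)))
        self.b = np.zeros(len(self.tags))

    def logits(self, hidden: np.ndarray) -> np.ndarray:
        # (seq_len, hidden_size) @ (hidden_size, n_tags) -> (seq_len, n_tags)
        return hidden @ self.W + self.b

    def probabilities(self, hidden: np.ndarray) -> np.ndarray:
        z = self.logits(hidden)
        z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for a stable softmax
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def predict(self, hidden: np.ndarray) -> list:
        # Choose the highest-probability tag for each token.
        return [self.tags[i] for i in self.probabilities(hidden).argmax(axis=-1)]


if __name__ == "__main__":
    # Random vectors stand in for the encoder's final hidden states;
    # 768 is the hidden size of BERT-base models.
    hidden_states = np.random.default_rng(1).normal(size=(12, 768))
    head = TokenClassificationHead(hidden_size=768, tags=TAGS)
    print(head.predict(hidden_states))  # untrained, so these tags are arbitrary
```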
Fig. 1


  1. The complex discourse marker is not explicitly annotated. Only the component pronouns and connective phrases are annotated.

  2. Last accessed 4th July 2019.

  3. In this paper, we consider gerund phrases to be noun phrases due to their distributional similarity to the latter.

  4. Available at Last accessed 3rd July 2019.

  5. Available at Last accessed 26th May 2021. Further details on the derivation of BERT’s multilingual models are presented at Last accessed 26th May 2021.

  6. Associating each occurrence of ello with a context of 512 neighbouring tokens.

  7. Tagging each sequence of 512 tokens independently of other sequences in the text.

  8. In the literature, this additional layer is usually described as being situated “on top of” the BERT layer.

  9. System to Automatically Classify and Resolve Ello.

  10. We used the implementation in the scikit-learn machine learning library for Python to compute \(\kappa\) scores.

  11. According to the scale proposed by Viera and Garrett (2005).

  12. By contrast, token T2 in column Pred. class label (Method 3) is not of this type because it is two tokens away from the true start of the antecedent.

  13. Available from Last accessed 22nd August 2019. These word embeddings were derived from the Spanish Billion Word corpus, available from Last accessed 22nd August 2019.

  14. Adjusted from \(\alpha = 0.05\) for comparisons between two systems.
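
Footnotes 6 and 7 contrast two ways of fitting text into BERT's 512-token input limit: one context window per occurrence of ello, or consecutive 512-token sequences tagged independently. The sketch below illustrates both under stated assumptions. It assumes token counts are already known. It ignores special tokens such as [CLS] and [SEP], which in practice occupy some of the 512 positions. It also assumes the per-occurrence window is centred on ello, which the footnote does not specify. The function names are hypothetical.

```python
from typing import List, Tuple


def centred_window(n_tokens: int, index: int, size: int = 512) -> Tuple[int, int]:
    """One window per occurrence of ello (centring is an assumption).

    Returns (start, end_exclusive) of a window of at most `size` tokens that
    contains `index`, centred on it unless a text boundary forces a shift.
    """
    if not 0 <= index < n_tokens:
        raise IndexError(f"token index {index} outside text of {n_tokens} tokens")
    if n_tokens <= size:
        return 0, n_tokens  # the whole text fits in one window
    start = index - size // 2
    start = max(0, min(start, n_tokens - size))  # keep the window inside the text
    return start, start + size


def independent_chunks(n_tokens: int, size: int = 512) -> List[Tuple[int, int]]:
    """Consecutive, non-overlapping chunks, each tagged separately."""
    return [(s, min(s + size, n_tokens)) for s in range(0, n_tokens, size)]


if __name__ == "__main__":
    print(centred_window(2000, 1000))  # (744, 1256)
    print(centred_window(2000, 10))    # (0, 512)
    print(independent_chunks(1100))    # [(0, 512), (512, 1024), (1024, 1100)]
```

With independent chunks, an ello near the start of a chunk may have its antecedent in the previous chunk, which the tagger cannot see. The centred window gives at least 256 tokens of preceding context whenever that many exist. The cost is that overlapping text is encoded once per occurrence.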

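
Footnotes 10 and 11 refer to \(\kappa\) scores computed with scikit-learn and interpreted on the scale of Viera and Garrett (2005). scikit-learn's \(\kappa\) function is `sklearn.metrics.cohen_kappa_score`, so we assume Cohen's kappa between two annotators. As a dependency-free sketch, unweighted Cohen's kappa is \(\kappa = (p_o - p_e)/(1 - p_e)\). Here \(p_o\) is the observed agreement, and \(p_e\) is the agreement expected by chance from each annotator's label distribution. For unweighted two-annotator data, the result should match `cohen_kappa_score(y1, y2)`. The annotation labels and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)  # chance agreement
    if p_e == 1.0:
        return float("nan")  # undefined: both annotators used one identical label
    return (p_o - p_e) / (1.0 - p_e)


def viera_garrett_label(kappa: float) -> str:
    """Map kappa to the agreement bands of Viera and Garrett (2005)."""
    if math.isnan(kappa):
        raise ValueError("kappa is undefined")
    if kappa < 0:
        return "less than chance agreement"
    # Bands are treated as half-open intervals (lower, upper]. This fills the gaps
    # between the published ranges (e.g. 0.205 -> fair, 0.0 -> slight), an assumption.
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return f"{label} agreement"
    return "almost perfect agreement"


if __name__ == "__main__":
    # Hypothetical antecedent-type labels assigned by two annotators to six items.
    ann1 = ["clause", "clause", "NP", "NP", "clause", "NP"]
    ann2 = ["clause", "NP", "NP", "NP", "clause", "clause"]
    k = cohen_kappa(ann1, ann2)
    print(round(k, 3), viera_garrett_label(k))  # 0.333 fair agreement
```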

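Footnote 14 does not name the adjustment applied to \(\alpha = 0.05\). Suppose, as an assumption, that it is a Bonferroni correction over \(m\) pairwise comparisons between systems. The per-comparison significance threshold is then

\[ \alpha_{\text{adj}} = \frac{\alpha}{m} = \frac{0.05}{m}. \]

For a hypothetical set of \(m = 3\) comparisons, such as the system against each of the three baselines, a difference would need \(p < 0.05/3 \approx 0.0167\) to count as significant.
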
Parodi, G., Evans, R., Ha, L.A. et al. A sequence labelling approach for automatic analysis of ello: tagging pronouns, antecedents, and connective phrases. Lang Resources & Evaluation (2021).

  • Anaphora resolution
  • Encapsulation
  • Ello
  • Referential coherence
  • Relational coherence