Comparing Sentence-Level Features for Authorship Analysis in Portuguese

  • Rui Sousa-Silva
  • Luís Sarmento
  • Tim Grant
  • Eugénio Oliveira
  • Belinda Maia
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6001)

Abstract

In this paper we compare the robustness of several types of stylistic markers to help discriminate authorship at the sentence level. We train an SVM-based classifier using each set of features separately and perform sentence-level authorship analysis over a corpus of editorials published in a Portuguese quality newspaper. Results show that features based on POS information, punctuation and word/sentence length contribute to a more robust sentence-level authorship analysis.
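
As a rough illustration of what punctuation and word/sentence length features might look like for a single sentence, the sketch below computes a few such values using only the Python standard library. The function name `sentence_features` and the specific features are assumptions made for illustration; the abstract does not list the paper's actual feature sets, and POS-based features are left out here because they would require a part-of-speech tagger.

```python
import string

# Punctuation marks counted as stylistic signals; the extra quotation
# marks are an assumption about what newspaper text might contain.
PUNCT_CHARS = set(string.punctuation) | set("«»“”")


def sentence_features(sentence: str) -> dict:
    """Return hypothetical stylistic features for one sentence.

    Illustrative only: this is not the feature set used in the paper.
    """
    # Split on whitespace, then strip punctuation from both ends of each
    # token so that word length reflects letters only.
    strip_chars = "".join(PUNCT_CHARS)
    words = [w.strip(strip_chars) for w in sentence.split()]
    words = [w for w in words if w]

    n_words = len(words)
    total_len = sum(len(w) for w in words)

    return {
        "n_words": n_words,                      # sentence length in words
        "n_chars": len(sentence),                # sentence length in characters
        "mean_word_len": total_len / n_words if n_words else 0.0,
        "n_punct": sum(1 for ch in sentence if ch in PUNCT_CHARS),
        "n_commas": sentence.count(","),
    }


if __name__ == "__main__":
    print(sentence_features("O governo, porém, não respondeu."))
```

One dictionary per sentence could then be vectorized and passed to an SVM implementation of the reader's choice, with one classifier trained per feature set as the abstract describes.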

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Rui Sousa-Silva (1, 3)
  • Luís Sarmento (2)
  • Tim Grant (1)
  • Eugénio Oliveira (2)
  • Belinda Maia (3)

  1. Centre for Forensic Linguistics at Aston University
  2. Faculdade de Engenharia da Universidade do Porto - DEI - LIACC
  3. CLUP - Centro de Linguística da Universidade do Porto
