Abstract
In this paper we compare the robustness of several types of stylistic markers for discriminating authorship at the sentence level. We train an SVM-based classifier on each feature set separately and perform sentence-level authorship analysis on a corpus of editorials published in a Portuguese quality newspaper. Results show that features based on POS information, punctuation, and word and sentence length contribute to more robust sentence-level authorship analysis.
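The abstract does not define the individual features or the SVM configuration, so the following is only a minimal sketch, under stated assumptions, of the comparison it describes: each feature set is extracted per sentence and fed separately to a linear SVM, and the accuracies are then compared. The extractors `punctuation_features` and `length_features`, the Pegasos training routine (`train_linear_svm`), and all parameter values are hypothetical illustrations, not the authors' implementation. POS-based features are omitted because they require a tagger, and only two authors (labels -1/+1) are handled.

```python
import random
import re

# Hypothetical feature extractors: the abstract names feature families
# (punctuation, word/sentence length), not their exact definitions.
PUNCTUATION = ",.;:!?\"'()-"


def punctuation_features(sentence):
    """Frequency of each punctuation mark, normalised by sentence length in characters."""
    n = max(len(sentence), 1)
    return [sentence.count(p) / n for p in PUNCTUATION]


def length_features(sentence):
    """Sentence length in words and mean word length in characters.
    Python 3's `re` word class is Unicode-aware, so accented Portuguese words count."""
    words = re.findall(r"\w+", sentence)
    if not words:
        return [0.0, 0.0]
    return [float(len(words)), sum(len(w) for w in words) / len(words)]


def fit_scaler(X):
    """Return a function mapping a vector to z-scores, using training statistics only."""
    dims = range(len(X[0]))
    means = [sum(x[j] for x in X) / len(X) for j in dims]
    # A constant feature has std 0; fall back to 1.0 to avoid division by zero.
    stds = [(sum((x[j] - means[j]) ** 2 for x in X) / len(X)) ** 0.5 or 1.0 for j in dims]
    return lambda x: [(x[j] - means[j]) / stds[j] for j in dims]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained with Pegasos (stochastic subgradient descent on the
    regularised hinge loss). Labels must be -1/+1. A constant 1.0 feature acts
    as the bias, which is therefore regularised too (a common simplification)."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the L2 term
            if margin < 1.0:  # hinge loss is active: step towards y * x
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def accuracy_for_feature_set(extract, train, test, **svm_args):
    """Train one SVM on a single feature set and return its accuracy on `test`.
    `train` and `test` are lists of (sentence, label) pairs, label in {-1, +1}."""
    X_train = [extract(s) for s, _ in train]
    scale = fit_scaler(X_train)
    w = train_linear_svm([scale(x) for x in X_train], [lbl for _, lbl in train], **svm_args)
    hits = sum(predict(w, scale(extract(s))) == lbl for s, lbl in test)
    return hits / len(test)


if __name__ == "__main__":
    # Invented toy sentences for two hypothetical authors (+1 and -1). Scoring on
    # the training data only demonstrates the API; a real study needs held-out text.
    data = [
        ("O governo, como se sabe, hesitou; a oposição, por sua vez, calou-se.", 1),
        ("A reforma, ainda que tardia, merece, apesar de tudo, algum crédito.", 1),
        ("Basta!", -1),
        ("Quem paga a conta?", -1),
    ]
    feature_sets = {"punctuation": punctuation_features, "length": length_features}
    for name, extract in feature_sets.items():
        print(name, accuracy_for_feature_set(extract, data, data))
```

Keeping each extractor as a separate function is what allows one classifier per feature set, mirroring the per-feature-set comparison in the abstract. With more than two authors, a one-vs-rest scheme built on `train_linear_svm` would be one possible extension.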
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sousa-Silva, R., Sarmento, L., Grant, T., Oliveira, E., Maia, B. (2010). Comparing Sentence-Level Features for Authorship Analysis in Portuguese. In: Pardo, T.A.S., Branco, A., Klautau, A., Vieira, R., de Lima, V.L.S. (eds) Computational Processing of the Portuguese Language. PROPOR 2010. Lecture Notes in Computer Science, vol 6001. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12320-7_7
DOI: https://doi.org/10.1007/978-3-642-12320-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12319-1
Online ISBN: 978-3-642-12320-7
eBook Packages: Computer Science, Computer Science (R0)