Abstract
Authorship attribution is the task of identifying the author of a given text. The central challenge in this task is to define a document representation that captures each author's writing style. This paper proposes a new authorship attribution method based on the idea that a proper identification of authors must consider both the stylistic and the topical features of texts. The method characterizes each document by a set of word sequences that combine function words and content words. Experimental results on poem classification show that the method outperforms most current state-of-the-art approaches and that it is well suited to attributing short documents.
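To make the idea of "word sequences that combine function and content words" concrete, here is a minimal sketch, not the authors' method. It makes several assumptions that are not in the abstract. It uses contiguous word n-grams as a stand-in for the paper's word sequences. It uses a tiny, hypothetical function-word list. It uses a simple nearest-centroid classifier with cosine similarity, because the abstract does not name a classifier. All function and variable names (`FUNCTION_WORDS`, `mixed_sequences`, `train_profiles`, `attribute`) are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (assumption for
# illustration); a real system would use a full function-word lexicon.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "in", "on", "and", "or", "to", "my",
    "is", "was", "with", "for", "her", "his", "i", "you", "that",
}

def tokenize(text):
    # Lowercase and keep runs of letters only (simplistic tokenizer).
    return re.findall(r"[^\W\d_]+", text.lower())

def mixed_sequences(tokens, n_min=2, n_max=3):
    """Count contiguous n-grams containing at least one function word AND
    at least one content word (a stand-in for the paper's word sequences)."""
    feats = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            has_func = any(w in FUNCTION_WORDS for w in gram)
            has_content = any(w not in FUNCTION_WORDS for w in gram)
            if has_func and has_content:
                feats[gram] += 1
    return feats

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_profiles(labeled_docs):
    """Sum each author's sequence counts into one profile (nearest centroid)."""
    profiles = {}
    for author, text in labeled_docs:
        profiles.setdefault(author, Counter()).update(mixed_sequences(tokenize(text)))
    return profiles

def attribute(text, profiles):
    # Assign the text to the author whose profile is most similar.
    feats = mixed_sequences(tokenize(text))
    return max(profiles, key=lambda author: cosine(feats, profiles[author]))

if __name__ == "__main__":
    # Toy, made-up training data (not from the paper).
    training = [
        ("poet_a", "the light of the moon falls on the silent sea"),
        ("poet_a", "the moon and the sea sing of the night"),
        ("poet_b", "my heart is heavy with sorrow for you"),
        ("poet_b", "my soul is lost with longing for you"),
    ]
    profiles = train_profiles(training)
    print(attribute("the sea of the moon", profiles))  # prints "poet_a"
```

Filtering out sequences made only of function words or only of content words is one simple way to keep features that mix style markers ("the", "of") with topic words ("moon", "sea"). Very short texts, such as poems, still yield several such sequences.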
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Coyotl-Morales, R.M., Villaseñor-Pineda, L., Montes-y-Gómez, M., Rosso, P. (2006). Authorship Attribution Using Word Sequences. In: Martínez-Trinidad, J.F., Carrasco Ochoa, J.A., Kittler, J. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2006. Lecture Notes in Computer Science, vol 4225. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11892755_87
DOI: https://doi.org/10.1007/11892755_87
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46556-0
Online ISBN: 978-3-540-46557-7
eBook Packages: Computer Science, Computer Science (R0)