Authorship Attribution Using Word Sequences

  • Rosa María Coyotl-Morales
  • Luis Villaseñor-Pineda
  • Manuel Montes-y-Gómez
  • Paolo Rosso
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4225)


Authorship attribution is the task of identifying the author of a given text. The main concern of this task is to define an appropriate characterization of documents that captures the writing style of authors. This paper proposes a new method for authorship attribution based on the idea that a proper identification of authors must consider both the stylistic and the topic features of texts. The method characterizes documents by a set of word sequences that combine function words and content words. Experimental results on poem classification show that this method outperforms most current state-of-the-art approaches and that it is well suited to the attribution of short documents.
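To make the idea of word sequences that mix both word types concrete, the following is a minimal sketch and not the authors' implementation. It assumes plain word n-grams as the sequences, a tiny hand-picked function-word list, and hypothetical helper names (`tokenize`, `mixed_word_sequences`); the abstract does not specify how the sequences are actually extracted.

```python
from collections import Counter

# Tiny illustrative function-word list (an assumption; not taken from the paper).
FUNCTION_WORDS = {"the", "of", "and", "in", "a", "to", "that", "is", "on", "with"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def mixed_word_sequences(text, n=2):
    """Count word n-grams containing at least one function word and one content word."""
    tokens = tokenize(text)
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        has_function = any(w in FUNCTION_WORDS for w in gram)
        has_content = any(w not in FUNCTION_WORDS for w in gram)
        if has_function and has_content:
            counts[gram] += 1
    return counts


# Hypothetical usage on a single made-up line of verse.
features = mixed_word_sequences("The shadow of the moon falls on the silent sea.")
```

Sequence counts of this kind could then serve as document features for any standard text classifier. Sequences made only of function words (stylistic) or only of content words (topical) are dropped here purely to illustrate the combination described in the abstract.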



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Rosa María Coyotl-Morales (1)
  • Luis Villaseñor-Pineda (1)
  • Manuel Montes-y-Gómez (1)
  • Paolo Rosso (2)
  1. Laboratorio de Tecnologías del Lenguaje, Instituto Nacional de Astrofísica, Óptica y Electrónica, México
  2. Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, España
