Authorship Attribution for Short Texts with Author-Document Topic Model

  • Haowen Zhang
  • Peng Nie
  • Yanlong Wen
  • Xiaojie Yuan (corresponding author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11061)


The goal of authorship attribution is to correctly assign texts of disputed authorship to known authors. With the growth of social media services, authorship attribution for short texts has become increasingly necessary. In earlier work, topic models such as Latent Dirichlet Allocation (LDA) have been used to find latent semantic features of authors and have achieved better performance on authorship attribution. However, most of these approaches focus on long texts. In this paper, we propose a novel model named the Author-Document Topic Model (ADT), which models the corpus at both the author level and the document level to address the problem of authorship attribution for short texts. We also propose a new classification algorithm that calculates the similarity between texts in order to find the authors of anonymous texts. Experimental results on two public datasets validate the effectiveness of our proposed method.
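The abstract does not say how the similarity between texts is computed, so the following is only a minimal sketch of the general idea of attributing an anonymous text by comparing topic distributions. It assumes that each candidate author and the anonymous text have already been reduced to a probability distribution over topics, and it uses the Hellinger distance as the comparison measure. The function names, the choice of distance, and the toy numbers are all illustrative assumptions, not the ADT model or the classification algorithm proposed in the paper.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same number of topics")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def attribute(doc_topics, author_topics):
    """Return the candidate author whose topic distribution is closest to doc_topics.

    author_topics maps an author name to a topic distribution (hypothetical input;
    in practice it would come from a trained topic model).
    """
    return min(author_topics, key=lambda name: hellinger(doc_topics, author_topics[name]))


# Toy, made-up distributions over three topics (not results from the paper).
author_topics = {
    "author_a": [0.7, 0.2, 0.1],
    "author_b": [0.1, 0.3, 0.6],
}
anonymous_doc = [0.6, 0.3, 0.1]

predicted = attribute(anonymous_doc, author_topics)
print(predicted)
```

A nearest-distribution rule like this is only one possible design; any measure defined on probability distributions could be substituted for `hellinger` without changing `attribute`.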


Authorship attribution · Topic model · Short text



This work is supported by the National Natural Science Foundation of China [grant number 61772289] and the Fundamental Research Funds for the Central Universities.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Haowen Zhang (1)
  • Peng Nie (1)
  • Yanlong Wen (1)
  • Xiaojie Yuan (1, corresponding author)
  1. College of Computer and Control Engineering, Nankai University, Tianjin, China
