Probabilistic Anomaly Detection Method for Authorship Verification

  • Conference paper
Statistical Language and Speech Processing (SLSP 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8791)

Abstract

Authorship verification is the task of determining whether a given text was written by a candidate author. In this paper, we present a first study of an anomaly detection method applied to this task. We consider a weakly supervised probabilistic model based on a multivariate Gaussian distribution. To evaluate the proposed method, we conducted experiments on a classic French corpus. Our preliminary results show that the probabilistic method can reach an F1 score of 85%, which suggests that it can be valuable for authorship verification.
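
As a rough illustration of the idea in the abstract, the following sketch fits a multivariate Gaussian to feature vectors from a candidate author's known texts and flags a new text as anomalous when its log-density falls below a threshold. It is not the paper's implementation: the function names, the ridge term, the stylometric feature vectors, and the rule of picking the threshold that maximizes F1 on a small labeled validation set are all assumptions made for this example.

```python
import numpy as np


def fit_gaussian(X, ridge=1e-6):
    """Estimate mean and covariance from the candidate author's feature vectors.

    X has shape (n_texts, n_features). The ridge term is an assumption of this
    sketch; it keeps the covariance invertible when texts are few.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    mu = X.mean(axis=0)
    sigma = np.atleast_2d(np.cov(X, rowvar=False)) + ridge * np.eye(X.shape[1])
    return mu, sigma


def log_density(x, mu, sigma):
    """Log of the multivariate Gaussian density at a single feature vector x."""
    diff = np.asarray(x, dtype=float) - mu
    d = mu.shape[0]
    _, logdet = np.linalg.slogdet(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def f1_score(y_true, y_pred):
    """F1 for the positive class (text written by the candidate author)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return 0.0 if tp == 0 else float(2 * tp / (2 * tp + fp + fn))


def select_threshold(scores, y_true):
    """Pick the log-density threshold that maximizes F1 on labeled validation data.

    This is the hypothetical 'weak supervision' step of the sketch: a few
    labeled texts are used only to tune the threshold, not to fit the model.
    """
    scores = np.asarray(scores, dtype=float)
    best_eps, best_f1 = None, -1.0
    for eps in np.sort(scores):
        f1 = f1_score(y_true, scores >= eps)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1


def verify(x, mu, sigma, log_eps):
    """Accept the text as the candidate author's if it is not an anomaly."""
    return bool(log_density(x, mu, sigma) >= log_eps)
```

In use, `fit_gaussian` would be called on the known texts of the candidate author, `select_threshold` on the log-densities of a few labeled validation texts, and `verify` on each disputed text. Working in log-densities rather than raw densities avoids numerical underflow when the feature space is large.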

Notes

  1. http://www.gutenberg.org/

  2. http://gallica.bnf.fr/

  3. http://ero.corneille-moliere.com/?p=page52&m=ero&l=fra

Author information

Corresponding author

Correspondence to Mohamed Amine Boukhaled.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Boukhaled, M.A., Ganascia, J.-G. (2014). Probabilistic Anomaly Detection Method for Authorship Verification. In: Besacier, L., Dediu, A.-H., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2014. Lecture Notes in Computer Science (LNAI), vol 8791. Springer, Cham. https://doi.org/10.1007/978-3-319-11397-5_16

  • DOI: https://doi.org/10.1007/978-3-319-11397-5_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11396-8

  • Online ISBN: 978-3-319-11397-5

  • eBook Packages: Computer Science, Computer Science (R0)
