Abstract
Authorship verification is the task of determining whether a given text was written by a candidate author. In this paper, we present a first study on using an anomaly detection method for the authorship verification task. We consider a weakly supervised probabilistic model based on a multivariate Gaussian distribution. To evaluate the effectiveness of the proposed method, we conducted experiments on a classic French corpus. Our preliminary results show that the probabilistic method can achieve high verification performance, reaching an F1 score of up to 85%. These results suggest that the method can be valuable for authorship verification.
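As a rough illustration of the general idea of multivariate Gaussian anomaly detection, the following Python sketch fits a Gaussian to feature vectors from the candidate author's known texts and accepts a new text when its log-density exceeds a threshold. The abstract does not specify the stylistic features, the covariance estimator, or how the threshold is chosen. The ridge term, the F1-based threshold search on a small labelled validation set, and all function names here are therefore assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def fit_gaussian(X, ridge=1e-6):
    """Estimate mean and covariance from the candidate author's feature vectors.

    X has one row per text and one column per feature. The small ridge term
    (an assumption of this sketch) keeps the covariance invertible.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])
    return mu, sigma

def log_density(X, mu, sigma):
    """Log of the multivariate Gaussian density for each row of X."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    d = mu.shape[0]
    diff = X - mu
    _, logdet = np.linalg.slogdet(sigma)
    # Squared Mahalanobis distance, computed without an explicit inverse.
    solved = np.linalg.solve(sigma, diff.T).T
    maha = np.einsum("ij,ij->i", diff, solved)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = written by the candidate author)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_threshold(scores, y_true):
    """Pick the log-density threshold that maximises F1 on labelled validation texts."""
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true)
    best_eps, best_f1 = None, -1.0
    for eps in np.unique(scores):
        pred = (scores >= eps).astype(int)
        f1 = f1_score(y_true, pred)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

def verify(x, mu, sigma, eps):
    """Accept the text as the candidate's if its log-density reaches the threshold."""
    return bool(log_density(x, mu, sigma)[0] >= eps)
```

In this sketch the only labelled data are the few validation texts used to set the threshold; the Gaussian itself is fitted on the candidate author's texts alone, which is one plausible reading of "weakly supervised" in this setting.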
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Boukhaled, M.A., Ganascia, JG. (2014). Probabilistic Anomaly Detection Method for Authorship Verification. In: Besacier, L., Dediu, AH., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2014. Lecture Notes in Computer Science, vol 8791. Springer, Cham. https://doi.org/10.1007/978-3-319-11397-5_16
DOI: https://doi.org/10.1007/978-3-319-11397-5_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11396-8
Online ISBN: 978-3-319-11397-5
eBook Packages: Computer Science, Computer Science (R0)