Temporal Context for Authorship Attribution

A Study of Danish Secondary Schools

  • Conference paper
Multidisciplinary Information Retrieval (IRFC 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8849)

Abstract

We study temporal aspects of authorship attribution, a task that aims to distinguish automatically between texts written by different authors by measuring textual features. This task is important in a number of areas, including plagiarism detection in secondary education, which we study in this work. As the academic abilities of students evolve during their studies, so does their writing style. These changes in writing style form a type of temporal context, which we exploit in the authorship attribution process by focussing on the students' more recent writing samples. Experiments with real-world data from Danish secondary school students show 84% prediction accuracy when using all available material and 71.9% prediction accuracy when using only the five most recent writing samples from each student.

This type of authorship attribution, which uses only a few recent writing samples, is significantly faster than conventional approaches that use the complete writings of all authors. As such, it can be integrated into working interactive plagiarism detection systems for secondary education. Such systems assist teachers by automatically flagging incoming student work that deviates significantly from the student's previous work, even in scenarios requiring fast response and heavy data processing, such as the period of national examinations.
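
The idea of restricting attribution to each student's most recent writing can be sketched in Python as follows. This is a minimal illustration and not the method evaluated in the paper: the character trigram features, the cosine similarity, and the names `char_ngrams`, `recent_profile`, and `attribute` are all assumptions introduced here. Only the choice of the five most recent samples per student comes from the abstract.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lower-cased text (assumed feature set)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recent_profile(samples, k=5):
    """Build one profile from the k most recent (iso_date, text) samples."""
    latest = sorted(samples, key=lambda sample: sample[0])[-k:]
    profile = Counter()
    for _, text in latest:
        profile.update(char_ngrams(text))
    return profile


def attribute(new_text, corpus, k=5):
    """Return the author whose recent profile is most similar to new_text.

    corpus maps an author name to a list of (iso_date, text) pairs.
    """
    query = char_ngrams(new_text)
    return max(
        corpus,
        key=lambda author: cosine(query, recent_profile(corpus[author], k)),
    )
```

Calling `attribute(text, corpus, k=5)` compares an incoming text only against each author's five latest samples, so the per-author work stays bounded as a student's archive grows. A deployed system would presumably also need a similarity threshold below which a submission is flagged for the teacher; that threshold is not specified in the abstract and is left out here.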


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Hansen, N.D., Lioma, C., Larsen, B., Alstrup, S. (2014). Temporal Context for Authorship Attribution. In: Lamas, D., Buitelaar, P. (eds) Multidisciplinary Information Retrieval. IRFC 2014. Lecture Notes in Computer Science, vol 8849. Springer, Cham. https://doi.org/10.1007/978-3-319-12979-2_3

  • DOI: https://doi.org/10.1007/978-3-319-12979-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12978-5

  • Online ISBN: 978-3-319-12979-2

  • eBook Packages: Computer Science, Computer Science (R0)
