Lexical Stress-Based Authorship Attribution with Accurate Pronunciation Patterns Selection

  • Lubomir Ivanov
  • Amanda Aebig
  • Stephen Meerman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107)

Abstract

This paper presents a feature selection methodology for authorship attribution based on the lexical stress patterns of words in a text. The methodology uses part-of-speech information to select the correct lexical stress pattern when a word has more than one possible pronunciation. The selected lexical stress patterns are then used to train machine learning classifiers to perform authorship attribution. Applied to a corpus of 18th century political texts, the methodology achieves a significant improvement in performance compared to previous work.
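The abstract does not describe the implementation, so the following is only a minimal sketch of the two ideas it names: choosing a word's stress pattern by part of speech when the word has several pronunciations, and using the resulting patterns as features for attribution. Everything here is an assumption for illustration: the tiny pronunciation table is hypothetical (written in CMU Pronouncing Dictionary style ARPAbet, where each vowel carries a stress digit), the input is assumed to be already tagged with Penn Treebank tags, the verb/non-verb split is a deliberately crude POS rule, and a nearest-centroid classifier stands in for the machine learning classifiers used in the paper. None of the names (`PRONUNCIATIONS`, `stress_pattern`, `train`, `attribute`) come from the paper.

```python
from collections import Counter
import math

# HYPOTHETICAL toy pronunciation table in CMU-dictionary-style ARPAbet:
# each vowel phoneme ends in a stress digit (1 = primary, 2 = secondary, 0 = none).
# Heteronyms map a coarse POS class to the matching pronunciation.
PRONUNCIATIONS = {
    "the": {"ANY": "DH AH0"},
    "we": {"ANY": "W IY1"},
    "will": {"ANY": "W IH1 L"},
    "congress": {"ANY": "K AA1 NG G R AH0 S"},
    "record": {"OTHER": "R EH1 K ER0 D", "VERB": "R IH0 K AO1 R D"},
    "present": {"OTHER": "P R EH1 Z AH0 N T", "VERB": "P R IY0 Z EH1 N T"},
}


def coarse_pos(tag):
    """Assumed rule: Penn Treebank verb tags (VB, VBD, VBZ, ...) vs everything else."""
    return "VERB" if tag.startswith("VB") else "OTHER"


def stress_pattern(word, tag):
    """Return the word's stress digits (e.g. '10'), using the POS tag to pick
    among multiple pronunciations; None if the word is not in the table."""
    entries = PRONUNCIATIONS.get(word.lower())
    if entries is None:
        return None
    pron = entries.get(coarse_pos(tag)) or entries.get("ANY") or next(iter(entries.values()))
    # Keep only the stress digit at the end of each vowel phoneme.
    return "".join(ph[-1] for ph in pron.split() if ph[-1].isdigit())


def stress_features(tagged_tokens):
    """Relative frequency of each stress pattern in a list of (word, tag) pairs."""
    counts = Counter(p for p in (stress_pattern(w, t) for w, t in tagged_tokens) if p)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}


def distance(a, b):
    """Euclidean distance between two sparse feature dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def train(labelled_texts):
    """Nearest-centroid 'model': the mean feature vector of each author's texts."""
    by_author = {}
    for author, tagged in labelled_texts:
        by_author.setdefault(author, []).append(stress_features(tagged))
    model = {}
    for author, vectors in by_author.items():
        keys = set().union(*vectors)
        model[author] = {k: sum(v.get(k, 0.0) for v in vectors) / len(vectors) for k in keys}
    return model


def attribute(model, tagged_tokens):
    """Attribute a text to the author whose centroid is closest."""
    feats = stress_features(tagged_tokens)
    return min(model, key=lambda author: distance(model[author], feats))


# Made-up, already POS-tagged training texts for two hypothetical authors.
EXAMPLE_TRAINING = [
    ("A", [("we", "PRP"), ("record", "VB"), ("present", "VB")]),
    ("B", [("the", "DT"), ("record", "NN"), ("congress", "NNP")]),
]

if __name__ == "__main__":
    model = train(EXAMPLE_TRAINING)
    print(attribute(model, [("we", "PRP"), ("will", "MD"), ("present", "VB")]))  # prints A
```

Without the POS tag, "record" and "present" would each receive a single fixed pattern, so author A's verb uses (stress on the second syllable, `01`) would be indistinguishable from author B's noun uses (stress on the first syllable, `10`). A realistic version would rely on a full pronunciation dictionary, a trained tagger, and stronger classifiers.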

Keywords

Authorship attribution, Lexical stress, Prosody, Part-of-speech tagging, Machine learning

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Computer Science Department, Iona College, New Rochelle, USA
