
Lexical Stress-Based Authorship Attribution with Accurate Pronunciation Patterns Selection

  • Conference paper
Text, Speech, and Dialogue (TSD 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11107)


Abstract

This paper presents a feature selection methodology for authorship attribution based on the lexical stress patterns of words in a text. When a word has multiple possible pronunciations, the methodology uses part-of-speech information to select the correct lexical stress pattern. The selected lexical stress patterns are then used to train machine learning classifiers to perform authorship attribution. Applied to a corpus of 18th-century political texts, the methodology achieves a significant improvement in performance over previous work.
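The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a small hypothetical pronunciation table (`PRONUNCIATIONS`) whose ARPAbet variants are annotated with the Penn Treebank POS prefixes they apply to, tokens already tagged by some POS tagger, and a fallback to the first listed variant when no tag matches. All names are illustrative. Stress digits follow the CMU Pronouncing Dictionary convention (0 unstressed, 1 primary, 2 secondary stress), and the resulting relative frequencies of stress patterns are the kind of per-text feature vector a classifier could be trained on.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical table for illustration only: word -> list of
# (ARPAbet pronunciation, POS-tag prefixes the variant applies to).
# The POS annotations are an assumption; CMUdict itself does not provide them.
PRONUNCIATIONS: Dict[str, List[Tuple[str, Tuple[str, ...]]]] = {
    "record": [
        ("R EH1 K ER0 D", ("NN",)),    # noun: REcord
        ("R IH0 K AO1 R D", ("VB",)),  # verb: reCORD
    ],
    "the": [("DH AH0", ())],
    "they": [("DH EY1", ())],
    "will": [("W IH1 L", ())],
}


def stress_pattern(arpabet: str) -> str:
    """Concatenate the stress digits carried by the vowel phones."""
    return "".join(ph[-1] for ph in arpabet.split() if ph[-1].isdigit())


def select_pattern(word: str, pos: str) -> Optional[str]:
    """Return the stress pattern of the variant whose POS prefix matches `pos`.

    Falls back to the first listed variant (an assumed policy) and returns
    None for words missing from the table.
    """
    candidates = PRONUNCIATIONS.get(word.lower())
    if not candidates:
        return None
    for arpabet, tags in candidates:
        if any(pos.startswith(tag) for tag in tags):
            return stress_pattern(arpabet)
    return stress_pattern(candidates[0][0])


def stress_features(tagged_tokens: List[Tuple[str, str]]) -> Dict[str, float]:
    """Relative frequency of each selected stress pattern in a POS-tagged text."""
    patterns = []
    for word, pos in tagged_tokens:
        pattern = select_pattern(word, pos)
        if pattern is not None:
            patterns.append(pattern)
    counts = Counter(patterns)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}


if __name__ == "__main__":
    tagged = [("They", "PRP"), ("will", "MD"), ("record", "VB"),
              ("the", "DT"), ("record", "NN")]
    print(stress_features(tagged))
```

With the sample sentence, the verb "record" contributes the pattern `01` and the noun contributes `10`, so the two occurrences are counted as distinct features instead of being collapsed into whichever pronunciation a dictionary lists first.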

Author information

Correspondence to Lubomir Ivanov.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Ivanov, L., Aebig, A., Meerman, S. (2018). Lexical Stress-Based Authorship Attribution with Accurate Pronunciation Patterns Selection. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2018. Lecture Notes in Computer Science, vol 11107. Springer, Cham. https://doi.org/10.1007/978-3-030-00794-2_7


  • DOI: https://doi.org/10.1007/978-3-030-00794-2_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00793-5

  • Online ISBN: 978-3-030-00794-2

  • eBook Packages: Computer Science, Computer Science (R0)
