Abstract
This paper presents a feature selection methodology for authorship attribution based on the lexical stress patterns of the words in a text. When a word has more than one possible pronunciation, the methodology uses part-of-speech information to select the appropriate lexical stress pattern. The selected patterns are used to train machine learning classifiers that perform authorship attribution. Applied to a corpus of 18th-century political texts, the methodology achieves a significant improvement in performance over previous work.
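The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a tiny hand-made lexicon (`TOY_LEXICON`, hypothetical) whose entries are ARPAbet-style phoneme strings with stress digits, the notation used by the CMU Pronouncing Dictionary, and it assumes that each pronunciation variant has already been labelled with a coarse part-of-speech tag. The tag names and the functions `stress_pattern`, `select_pattern`, and `stress_features` are likewise illustrative.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> {coarse POS tag: ARPAbet phonemes}.
# Vowels end in a stress digit (0 = unstressed, 1 = primary, 2 = secondary).
# The POS labels on each variant are an assumption made for this sketch.
TOY_LEXICON = {
    "record": {"NOUN": "R EH1 K ER0 D", "VERB": "R IH0 K AO1 R D"},
    "present": {"NOUN": "P R EH1 Z AH0 N T", "VERB": "P R IY0 Z EH1 N T"},
    "they": {"ANY": "DH EY1"},
    "the": {"ANY": "DH AH0"},
}


def stress_pattern(phonemes):
    """Reduce a phoneme string to its stress digits, e.g. 'R EH1 K ER0 D' -> '10'."""
    return "".join(p[-1] for p in phonemes.split() if p[-1].isdigit())


def select_pattern(word, pos, lexicon=TOY_LEXICON):
    """Pick the stress pattern of the variant matching the POS tag.

    Falls back to an 'ANY' variant, then to the first listed variant.
    Returns None for words missing from the lexicon.
    """
    variants = lexicon.get(word.lower())
    if variants is None:
        return None
    phonemes = variants.get(pos) or variants.get("ANY") or next(iter(variants.values()))
    return stress_pattern(phonemes)


def stress_features(tagged_tokens, lexicon=TOY_LEXICON):
    """Relative frequency of each stress pattern over the covered tokens."""
    counts = Counter()
    for word, pos in tagged_tokens:
        pattern = select_pattern(word, pos, lexicon)
        if pattern is not None:
            counts[pattern] += 1
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()} if total else {}


# "They record the record": the verb and the noun receive different patterns.
tagged = [("They", "PRON"), ("record", "VERB"), ("the", "DET"), ("record", "NOUN")]
features = stress_features(tagged)
```

In this sketch the verb "record" contributes the pattern `01` and the noun contributes `10`, so the two readings are no longer conflated. A frequency vector of this kind, computed per document, is the sort of input a classifier could then be trained on.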
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Ivanov, L., Aebig, A., Meerman, S. (2018). Lexical Stress-Based Authorship Attribution with Accurate Pronunciation Patterns Selection. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2018. Lecture Notes in Computer Science, vol 11107. Springer, Cham. https://doi.org/10.1007/978-3-030-00794-2_7
DOI: https://doi.org/10.1007/978-3-030-00794-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00793-5
Online ISBN: 978-3-030-00794-2
eBook Packages: Computer Science, Computer Science (R0)