Abstract
Understanding what makes a written text recognizably the work of its author has remained an unsolved problem for hundreds of years. Attributes of authorship are often lumped together in attempts to identify an unknown author, while little attention has been paid to investigating a single attribute in isolation by eliminating the effects of all the others. One debated attribute is the size of the text segments that authors use to group words together. Texts consist of these segments, sentences, whose lengths vary and are distributed in ways that are assumed to be characteristic of the author. By comparing the statistics of paired text samples, we show that differences in these statistics do indicate differences in the authorship of the texts. However, certain choices of metrics and units can easily lead to random and meaningless results.
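The comparison of paired samples described above can be illustrated with a minimal sketch. It is not the paper's method, and several choices are assumptions: sentences are split naively on terminal punctuation, sentence length is counted in words, and the two length distributions are compared with the two-sample Kolmogorov-Smirnov statistic D, the largest gap between the two empirical cumulative distribution functions. The function names `sentence_lengths` and `ks_statistic` are hypothetical.

```python
import re
from bisect import bisect_right


def sentence_lengths(text: str) -> list[int]:
    """Return the length, in words, of each sentence in `text`.

    Assumption: sentences end at '.', '!' or '?'. This naive splitter
    treats abbreviations such as "Dr." as sentence boundaries.
    """
    sentences = re.split(r"[.!?]+", text)
    return [len(s.split()) for s in sentences if s.split()]


def ks_statistic(a: list[int], b: list[int]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    if not a or not b:
        raise ValueError("both samples must contain at least one sentence")
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The difference of the two step-function ECDFs can only change at
    # observed values, so checking every observed value finds the maximum.
    for x in set(a_sorted) | set(b_sorted):
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


if __name__ == "__main__":
    sample_a = "I came. I saw. I conquered."
    sample_b = "It was the best of times, it was the worst of times. Call me Ishmael."
    lengths_a = sentence_lengths(sample_a)  # [2, 2, 2]
    lengths_b = sentence_lengths(sample_b)  # [12, 3]
    print(ks_statistic(lengths_a, lengths_b))  # 1.0
```

A large D suggests that the two samples' sentence-length distributions differ, but samples this short say little about authorship. Counting characters instead of words, or using a different distance between distributions, may change the outcome, which is the sensitivity to metrics and units that the abstract warns about.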
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lehtonen, M. (2015). On sentence length distribution as an authorship attribute. In: Kim, K. (eds) Information Science and Applications. Lecture Notes in Electrical Engineering, vol 339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46578-3_96
DOI: https://doi.org/10.1007/978-3-662-46578-3_96
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46577-6
Online ISBN: 978-3-662-46578-3
eBook Packages: Engineering, Engineering (R0)