Problems of Information Transmission, Volume 37, Issue 2, pp 172–184

Using Literal and Grammatical Statistics for Authorship Attribution

  • O. V. Kukushkina
  • A. A. Polikarpov
  • D. V. Khmelev

Abstract

Markov chains are used as a formal mathematical model for sequences of elements of a text, and this model is applied to authorship attribution. As elements of a text, we consider sequences of letters or sequences of grammatical classes of words. It turns out that the frequencies of letter pairs and of pairs of grammatical classes in a Russian text are rather stable characteristics of an author and, apparently, could be used to attribute texts of disputed authorship. Results are compared for several modifications of the method, using both letters and grammatical classes. The experiments involve 385 texts by 82 writers. The Appendix describes research by D.V. Khmelev in which data compression algorithms are applied to authorship attribution.
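
The letter-pair idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a first-order Markov chain over letters and spaces, estimates transition probabilities from each candidate author's known texts, and attributes a disputed text to the author whose chain gives it the lowest average negative log-probability. The function names (`normalize`, `pair_counts`, `train`, `cross_entropy`, `attribute`), the additive smoothing parameter `alpha`, and the toy corpus used to exercise it are all assumptions introduced for illustration; the paper's exact estimator and text preprocessing may differ.

```python
import math
from collections import Counter


def normalize(text):
    """Lowercase the text; keep letters, collapse everything else to single spaces."""
    kept = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    return " ".join(kept.split())


def pair_counts(text):
    """Return (letter-pair counts, counts of each symbol as the first of a pair)."""
    symbols = normalize(text)
    pairs = Counter(zip(symbols, symbols[1:]))
    firsts = Counter(symbols[:-1])
    return pairs, firsts


def train(texts):
    """Accumulate pair statistics over all known texts of one author."""
    pairs, firsts = Counter(), Counter()
    for text in texts:
        p, f = pair_counts(text)
        pairs.update(p)
        firsts.update(f)
    return pairs, firsts


def cross_entropy(test_text, model, alphabet_size, alpha=1.0):
    """Average negative log-probability of the test text's transitions
    under an author's chain. Additive smoothing (alpha) is an assumption
    of this sketch so that unseen pairs get nonzero probability."""
    pairs, firsts = model
    test_pairs, _ = pair_counts(test_text)
    total = sum(test_pairs.values())
    if total == 0:
        raise ValueError("test text has no letter pairs")
    score = 0.0
    for (a, b), n in test_pairs.items():
        p = (pairs[(a, b)] + alpha) / (firsts[a] + alpha * alphabet_size)
        score -= n * math.log(p)
    return score / total


def attribute(test_text, corpus, alpha=1.0):
    """Return the author in corpus (author -> list of texts) whose chain
    assigns the lowest cross-entropy to test_text."""
    alphabet = set(normalize(test_text))
    for texts in corpus.values():
        for text in texts:
            alphabet.update(normalize(text))
    models = {author: train(texts) for author, texts in corpus.items()}
    return min(
        models,
        key=lambda author: cross_entropy(test_text, models[author], len(alphabet), alpha),
    )
```

The same scoring would apply to sequences of grammatical classes by replacing the letter stream with a stream of class labels, which would require a tagger or dictionary that this sketch does not provide.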


Copyright information

© MAIK “Nauka/Interperiodica” 2001
