A Methodology of Using a Concordancer and Table Processor for Authorship Attribution

  • TEXT PROCESSING AUTOMATION
Automatic Documentation and Mathematical Linguistics

Abstract

The paper proposes an original methodology for authorship attribution based on deviations from the Zipf distribution, using statistical data obtained with a concordance program and computations performed in a table processor. The methodology involves finding distances between input texts and a reference text based on deviations in stop-word frequencies. The results obtained show that the proposed methodology supports efficient authorship attribution and that it can be used in the educational process to develop students' skills and competencies in natural language processing.
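
The abstract does not give the formulas behind the distance computation, so the following Python sketch is only one possible reading of the idea, not the paper's actual procedure. It assumes an idealized Zipf curve f(r) = f(1)/r over the stop words, ranked by observed frequency, and a sum of absolute differences between per-word deviations as the distance. The stop list, the tokenizer, and all function names are hypothetical and chosen purely for illustration.

```python
import re
from collections import Counter

# Illustrative stop list (an assumption; the paper's list is not given in the abstract).
STOP_WORDS = ("the", "of", "and", "to", "a", "in")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(text, stop_words=STOP_WORDS):
    """Relative frequency of each stop word among all tokens of the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (counts[w] / total if total else 0.0) for w in stop_words}


def zipf_deviations(freqs):
    """Observed frequency minus the idealized Zipf value f(1)/r for each word.

    Words are ranked by observed frequency; ties keep the stop-list order.
    """
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[0][1] if ranked else 0.0
    return {w: f - top / rank for rank, (w, f) in enumerate(ranked, start=1)}


def distance(input_text, reference_text, stop_words=STOP_WORDS):
    """Sum of absolute differences between the two texts' Zipf deviations."""
    dev_in = zipf_deviations(relative_frequencies(input_text, stop_words))
    dev_ref = zipf_deviations(relative_frequencies(reference_text, stop_words))
    return sum(abs(dev_in[w] - dev_ref[w]) for w in stop_words)


if __name__ == "__main__":
    reference = "the cat and the dog sat in the shade of a tree"
    candidate = "to be or not to be"
    print(distance(candidate, reference))
    print(distance(reference, reference))
```

Under this reading, an input text whose stop-word profile deviates from the Zipf curve in the same way as the reference text yields a small distance, and a text compared with itself yields zero. In the workflow described by the abstract, the frequency counts would come from a concordancer and the arithmetic would be done in a spreadsheet rather than in code.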

Notes

  1. On approval of the federal state educational standard of higher education for the field of study 45.03.02 Linguistics (bachelor’s level): Order of the Ministry of Education and Science of Russia no. 940 of August 7, 2014. http://fgosvo.ru/uploadfiles/fgosvob/450302_Lingvistika.pdf. Accessed June 25, 2020.

Funding

This research was carried out with the support of the Russian Foundation for Basic Research (project no. 20-07-00124).

Author information

Corresponding author

Correspondence to V. A. Yatsko.

Ethics declarations

The authors declare that they have no conflicts of interest.

About this article

Cite this article

Yatsko, V.A. A Methodology of Using a Concordancer and Table Processor for Authorship Attribution. Autom. Doc. Math. Linguist. 54, 269–274 (2020). https://doi.org/10.3103/S0005105520050088
