Classifying Written Texts Through Rhythmic Features

  • Mihaela Balint
  • Mihai Dascalu
  • Stefan Trausan-Matu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9883)

Abstract

Rhythm analysis of written texts has focused on literary analysis and has mainly considered poetry. In this paper we investigate the relevance of rhythmic features for categorizing prose texts belonging to different genres. Our contribution is threefold. First, we define a set of rhythmic features for written texts. Second, we extract these features from three corpora: speeches, essays, and newspaper articles. Third, we perform feature selection by means of statistical analyses and determine a subset of features that efficiently discriminates between the three genres. We find that, using as few as eight rhythmic features, documents can be assigned to a given genre with an accuracy of around 80%, significantly higher than the 33% baseline that results from random assignment.
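The abstract does not say which statistical analyses drive the feature selection, nor does it list the features. As a rough illustration only, the sketch below assumes a one-way ANOVA F statistic computed per feature across the three genres, and keeps the highest-ranked features. The feature names and values are hypothetical toy data, not taken from the paper.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for one feature measured in several groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_features(data, top_k):
    """Rank features by F statistic (highest first) and keep the top_k names.

    `data` maps a feature name to a list of per-genre value lists.
    """
    ranked = sorted(data, key=lambda name: anova_f(data[name]), reverse=True)
    return ranked[:top_k]


# Hypothetical toy data: one value list per genre (speeches, essays, articles).
toy = {
    "mean_sentence_length": [[10, 12, 11], [20, 22, 21], [30, 29, 31]],
    "comma_rate": [[0.5, 0.6, 0.4], [0.5, 0.4, 0.6], [0.6, 0.5, 0.4]],
}

best = select_features(toy, top_k=1)
```

In this toy data the first feature differs strongly between genres and the second does not, so only the first survives selection. With three equally likely genres, random assignment is correct one time in three, which is the 33% baseline quoted above.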

Keywords

Rhythm · Text classification · Natural language processing · Discourse analysis

Notes

Acknowledgements

The work presented in this paper was partially funded by the EC H2020 project RAGE (Realising an Applied Gaming Eco-System), http://www.rageproject.eu/, Grant agreement No 644187.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Mihaela Balint (1)
  • Mihai Dascalu (1)
  • Stefan Trausan-Matu (1, 2)

  1. Computer Science Department, University Politehnica of Bucharest, Bucharest, Romania
  2. Research Institute for Artificial Intelligence of the Romanian Academy, Bucharest, Romania
