Computational Study of Stylistics: Visualizing the Writing Style with Self-Organizing Maps

  • Antonio Neme
  • Sergio Hernández
  • Teresa Dey
  • Abril Muñoz
  • J. R. G. Pulido
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 198)

Abstract

The style authors use to express their ideas has long been a subject of debate, and it has been analyzed from several perspectives. In this contribution we present a computational methodology to study writing style in a collection of hundreds of texts. For each text, several attributes, including different time series, are extracted, and a battery of tools from the signal processing and machine learning communities is applied to identify a set of features that may define a candidate style space. We apply self-organizing maps to visualize how several authors are distributed in the high-dimensional space associated with style, and to visually explore the similarities between the styles of different authors.
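
The abstract does not state which self-organizing map (SOM) implementation or which feature vectors were used. As a minimal sketch, assuming each text has already been reduced to a numeric feature vector, the following NumPy code trains a small rectangular SOM and maps each text to its best-matching unit. The function names, grid size, learning schedule, and the synthetic two-author data are illustrative assumptions, not the authors' setup.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) index of the unit whose weight vector is closest to x."""
    distances = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(distances), distances.shape)


def train_som(data, rows=6, cols=6, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a rectangular SOM; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    n_samples, dim = data.shape
    weights = rng.random((rows, cols, dim))
    # Grid coordinates of every unit, used by the Gaussian neighbourhood.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    total_steps = epochs * n_samples
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(n_samples):
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)              # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 0.5  # neighbourhood shrinks, floor of 0.5
            bmu = np.array(best_matching_unit(weights, x))
            dist2 = np.sum((grid - bmu) ** 2, axis=-1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            # Pull the winning unit and its grid neighbours towards x.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


# Synthetic stand-in for style features: two hypothetical "authors",
# 30 texts each, 5 features per text (assumed values, not real data).
rng = np.random.default_rng(42)
author_a = rng.normal(0.2, 0.05, size=(30, 5))
author_b = rng.normal(0.8, 0.05, size=(30, 5))
data = np.vstack([author_a, author_b])
labels = ["A"] * 30 + ["B"] * 30

som = train_som(data)
positions = [best_matching_unit(som, x) for x in data]

# List the map cells occupied by each author's texts.
for label in ("A", "B"):
    cells = sorted(
        {(int(r), int(c)) for (r, c), lab in zip(positions, labels) if lab == label}
    )
    print(label, cells)
```

On such a map, texts with similar feature vectors tend to land on the same or neighbouring cells, which is the property that makes a SOM useful for comparing authors visually.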

Keywords

computational stylistics; authorship attribution; visualization; self-organizing maps; mutual information

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Antonio Neme (1, 2)
  • Sergio Hernández (3)
  • Teresa Dey (4)
  • Abril Muñoz (5)
  • J. R. G. Pulido (6)

  1. Complex Systems Group, Universidad Autónoma de la Ciudad de México, México, D.F., México
  2. Institute for Molecular Medicine, Helsinki, Finland
  3. Postgraduation Program in Complex Systems, Universidad Autónoma de la Ciudad de México, México, México
  4. Faculty of Literary Creation, Universidad Autónoma de la Ciudad de México, México, México
  5. CINVESTAV IDS, Mexico, México D.F.
  6. Facultad de Telemática, Universidad de Colima, México, México
