
StyleExplorer: A Toolkit for Textual Writing Style Visualization

  • Michael Tschuggnall
  • Thibault Gerrier
  • Günther Specht
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11438)

Abstract

The analysis of textual writing styles is a well-studied problem with ongoing, active research in fields such as authorship attribution, author profiling, text segmentation and plagiarism detection. While many features have been proposed and shown to be effective in characterizing authors or document types as high-dimensional feature vectors, an intuitive, human-friendly view of the computed data is often lacking. For example, machine learning algorithms can use those features to attribute previously unseen documents to a set of known authors, but they usually do not provide a visualization of the most discriminating features. To this end, we present StyleExplorer, a freely available web tool that extracts textual features from documents and visualizes them in multiple variants. Besides analyzing single documents intrinsically, it can also visually compare multiple documents in a single view with respect to selected metrics. This makes it a valuable analysis tool for various tasks in natural language processing, as well as for areas of the humanities that work with and analyze textual data.
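The abstract does not specify StyleExplorer's feature set or implementation. As a rough illustration of the workflow it outlines (extract style features per document, then compare several documents on one selected metric), here is a minimal Python sketch. The three features, the regex-based tokenization, the text bar chart, and the names `style_features`, `compare` and `ascii_bars` are all hypothetical choices made for illustration. They are not StyleExplorer's API.

```python
import re


def style_features(text: str) -> dict[str, float]:
    """Compute a few simple stylometric features (hypothetical selection)."""
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenization: runs of letters/apostrophes, lowercased.
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        "avg_sentence_len": len(words) / len(sentences),  # words per sentence
        "avg_word_len": sum(len(w) for w in words) / len(words),  # characters per word
        "type_token_ratio": len(set(words)) / len(words),  # vocabulary richness
    }


def compare(docs: dict[str, str], metric: str) -> list[tuple[str, float]]:
    """Rank documents by one selected metric, highest value first."""
    scores = {name: style_features(text)[metric] for name, text in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def ascii_bars(ranking: list[tuple[str, float]], width: int = 40) -> None:
    """Print a crude text bar chart as a stand-in for a graphical comparison view."""
    top = max((v for _, v in ranking), default=0.0) or 1.0  # avoid division by zero
    for name, value in ranking:
        print(f"{name:>10} | {'#' * round(width * value / top)} {value:.2f}")


if __name__ == "__main__":
    docs = {
        "doc_a": "The cat sat. The cat ran away quickly!",
        "doc_b": "Short. Very short. Tiny.",
    }
    ascii_bars(compare(docs, "avg_sentence_len"))
```

A real tool would typically use a proper tokenizer and a much richer feature set (for example, function-word frequencies or character n-grams); the point here is only the extract-then-compare structure.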

Keywords

Text mining · Visualization · Natural language processing

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Michael Tschuggnall (1)
  • Thibault Gerrier (1)
  • Günther Specht (1)

  1. Department of Computer Science, Universität Innsbruck, Innsbruck, Austria
