Unsupervised Author Identification and Characterization

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 612))

Abstract

Author identification is a hot topic, especially in the Internet age. In previous work we proposed a novel approach to this problem, based on relational representations that take into account the structure of sentences. Here we present a tool that computes and visualizes a numerical and graphical characterization of authors and texts based on several linguistic features. This tool, which extends a previous language analysis tool, is the ideal complement to the author identification technique, which is based on a clustering procedure whose outcomes (i.e., the authors’ models) are not human-readable. Both approaches are unsupervised, which allows them to tackle problems to which other state-of-the-art systems are not applicable.

Acknowledgments

The authors would like to thank Fabio Leuzzi, Fulvio Rotella and Domenico Grieco for their work in setting up the system and running the experiments. This work was partially funded by the Italian PON 2007–2013 project PON02_00563_3489339 “Puglia@Service”.

Author information

Correspondence to Stefano Ferilli.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ferilli, S., Redavid, D., Esposito, F. (2016). Unsupervised Author Identification and Characterization. In: Calvanese, D., De Nart, D., Tasso, C. (eds) Digital Libraries on the Move. IRCDL 2015. Communications in Computer and Information Science, vol 612. Springer, Cham. https://doi.org/10.1007/978-3-319-41938-1_14

  • DOI: https://doi.org/10.1007/978-3-319-41938-1_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41937-4

  • Online ISBN: 978-3-319-41938-1

  • eBook Packages: Computer Science, Computer Science (R0)
