Visualizing Authorship for Identification

  • Ahmed Abbasi
  • Hsinchun Chen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3975)


In response to the growing misuse of online anonymity, researchers have begun to create visualization tools that promote greater user accountability in online communities. In this study we created an authorship visualization called Writeprints that can help identify individuals based on their writing style. The visualization creates unique writing-style patterns that can be automatically identified in a manner similar to fingerprint biometric systems. Writeprints is a technique based on principal component analysis (PCA) that uses a dynamic feature-based sliding window algorithm, making it well suited to visualizing authorship across larger groups of messages. We evaluated the effectiveness of the visualization across messages from three English and Arabic forums in comparison with Support Vector Machines (SVM) and found that Writeprints provided excellent classification performance, significantly outperforming SVM in many instances. Based on our results, we believe the visualization can assist law enforcement in identifying cyber criminals and can also help users authenticate fellow online members in order to deter cyber deception.
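The general idea of combining a sliding window with principal component analysis can be illustrated with a minimal sketch. This is not the authors' implementation: the three style features, the character-based window, and the function names `extract_features`, `sliding_windows`, and `writeprint_points` are all assumptions made for illustration, and the abstract does not specify the actual feature set or window parameters.

```python
import numpy as np

def extract_features(text):
    """Toy style features (hypothetical, not the paper's feature set):
    average word length, punctuation share, upper-case share."""
    words = text.split()
    n_chars = max(len(text), 1)
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    punct = sum(ch in ".,;:!?" for ch in text) / n_chars
    upper = sum(ch.isupper() for ch in text) / n_chars
    return np.array([avg_word_len, punct, upper])

def sliding_windows(text, size=40, step=10):
    """Yield overlapping character windows over the text (assumed window scheme)."""
    for start in range(0, max(len(text) - size, 0) + 1, step):
        yield text[start:start + size]

def writeprint_points(text, size=40, step=10, n_components=2):
    """Project per-window feature vectors onto their top principal components."""
    X = np.array([extract_features(w) for w in sliding_windows(text, size, step)])
    Xc = X - X.mean(axis=0)  # center each feature column
    # PCA via SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```

Each row of the returned array is one window position in a two-dimensional space, so plotting the rows in order would trace a pattern for a given body of text. Whether such a pattern discriminates between authors depends entirely on the features chosen, which this sketch does not attempt to reproduce.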


Keywords (added by machine, not by the authors): Support Vector Machine; Online Community; Software Piracy; Writing Style; Online Anonymity





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ahmed Abbasi, Department of Management Information Systems, The University of Arizona, Tucson, USA
  • Hsinchun Chen, Department of Management Information Systems, The University of Arizona, Tucson, USA
