Segmentation and Automated Social Hierarchy Detection through Email Network Analysis

  • Germán Creamer
  • Ryan Rowe
  • Shlomo Hershkop
  • Salvatore J. Stolfo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5439)


We present our work on automatically extracting social hierarchies from electronic communication data. Data mining based on user behavior can be leveraged to analyze and catalog patterns of communication between entities and to rank relationships. The advantage is that the analysis can be done automatically and can adapt to organizational changes over time.

We illustrate the algorithms on real-world data using the Enron corporation’s email archive. The results show great promise when compared with the corporation’s work chart and with judicial proceedings analyzing the major players.
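As a rough illustration of ranking entities from communication patterns, the sketch below scores each address by its number of distinct correspondents and then by its total message volume. This is not the scoring used in the paper: the function name `rank_by_communication`, the scoring rule, and the sample messages are all assumptions made for the example.

```python
from collections import defaultdict


def rank_by_communication(messages):
    """Rank addresses by a toy score (hypothetical, not the paper's measure).

    `messages` is an iterable of (sender, recipient) pairs. Addresses are
    ordered by number of distinct correspondents, then by total messages
    sent or received, then alphabetically to break ties.
    """
    contacts = defaultdict(set)   # address -> set of distinct correspondents
    volume = defaultdict(int)     # address -> messages sent plus received
    for sender, recipient in messages:
        if sender == recipient:
            continue  # ignore self-addressed mail
        contacts[sender].add(recipient)
        contacts[recipient].add(sender)
        volume[sender] += 1
        volume[recipient] += 1
    return sorted(contacts, key=lambda a: (-len(contacts[a]), -volume[a], a))


# Made-up sample data, for illustration only.
messages = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("bob", "alice"), ("carol", "alice"), ("bob", "carol"),
]
ranking = rank_by_communication(messages)
print(ranking)
```

A realistic analysis would likely combine several behavioral and graph-based signals; the single score here only shows the general shape of turning message logs into an ordering.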

General Terms

Social Network, Enron, Behavior Profile, Link Mining, Data Mining, Corporate Householding





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Germán Creamer (1, 3)
  • Ryan Rowe (2)
  • Shlomo Hershkop (3)
  • Salvatore J. Stolfo (3)
  1. Center for Computational Learning Systems, Columbia University, New York, USA
  2. Department of Applied Mathematics, Columbia University, New York, USA
  3. Department of Computer Science, Columbia University, New York, USA
