Segmentation and Automated Social Hierarchy Detection through Email Network Analysis

  • Conference paper
Advances in Web Mining and Web Usage Analysis (SNAKDD 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5439)

Abstract

We present our work on automatically extracting social hierarchies from electronic communication data. Data mining based on user behavior can be leveraged to analyze and catalog patterns of communication between entities and to rank relationships. The advantage is that the analysis can be done automatically and can adapt itself to organizational changes over time.

We illustrate the algorithms on real-world data using the Enron corporation’s email archive. The results show great promise when compared to the corporation’s organizational chart and to the judicial proceedings analyzing the major players.

This work is based on an earlier work: Automated Social Hierarchy Detection through Email Network Analysis, in Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, © ACM, 2007. http://doi.acm.org/10.1145/1348549.1348562

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Creamer, G., Rowe, R., Hershkop, S., Stolfo, S.J. (2009). Segmentation and Automated Social Hierarchy Detection through Email Network Analysis. In: Zhang, H., et al. Advances in Web Mining and Web Usage Analysis. SNAKDD 2007. Lecture Notes in Computer Science, vol 5439. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00528-2_3

  • DOI: https://doi.org/10.1007/978-3-642-00528-2_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00527-5

  • Online ISBN: 978-3-642-00528-2

  • eBook Packages: Computer Science, Computer Science (R0)
