Abstract
We present our work on automatically extracting social hierarchies from electronic communication data. Data mining based on user behavior can be leveraged to analyze and catalog patterns of communication between entities in order to rank relationships. The advantage is that the analysis can be done automatically and can adapt itself to organizational changes over time.
We illustrate the algorithms on real-world data using the Enron corporation’s email archive. The results show great promise when compared to the corporation’s work chart and to the judicial proceedings analyzing the major players.
This work is based on an earlier work: “Automated Social Hierarchy Detection through Email Network Analysis”, in Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis, © ACM, 2007. http://doi.acm.org/10.1145/1348549.1348562
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Creamer, G., Rowe, R., Hershkop, S., Stolfo, S.J. (2009). Segmentation and Automated Social Hierarchy Detection through Email Network Analysis. In: Zhang, H., et al. Advances in Web Mining and Web Usage Analysis. SNAKDD 2007. Lecture Notes in Computer Science, vol 5439. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00528-2_3
DOI: https://doi.org/10.1007/978-3-642-00528-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00527-5
Online ISBN: 978-3-642-00528-2
eBook Packages: Computer Science, Computer Science (R0)