Structure in the Enron Email Dataset

Keila, P. S.; Skillicorn, D. B.

doi:10.1007/s10588-005-5379-y

Abstract

We investigate the structures present in the Enron email dataset using singular value decomposition and semidiscrete decomposition. Using word frequency profiles, we show that messages fall into two distinct groups, whose extremes are characterized by short messages and rare words on the one hand and long messages and common words on the other. It is surprising that message length and word-use pattern should be related in this way. We also investigate relationships among individuals based on their patterns of word use in email. We show that word use is correlated with function within the organization, as expected. Lastly, we show that relative changes in individuals' word usage over time can be used to identify key players in major company events.
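The abstract names the decomposition but not the preprocessing. As a minimal sketch, assuming a plain message-by-word matrix of relative word frequencies with column centring (the authors' actual word list, normalization, and semidiscrete decomposition step are not reproduced here), an SVD can place each message in a low-dimensional space where a grouping like the one described above could be inspected. The helper names `word_frequency_matrix` and `svd_coordinates` are hypothetical, and the example messages are invented, not drawn from the Enron corpus.

```python
import re
from collections import Counter

import numpy as np


def word_frequency_matrix(messages):
    # Hypothetical helper: one row per message, one column per word;
    # entries are relative frequencies (word count / message word total).
    # Assumption: lowercase alphabetic tokens only; the paper's tokenization,
    # stop-word handling and vocabulary are not reproduced here.
    docs = [Counter(re.findall(r"[a-z]+", m.lower())) for m in messages]
    vocab = sorted(set().union(*docs))
    index = {w: j for j, w in enumerate(vocab)}
    A = np.zeros((len(docs), len(vocab)))
    for i, counts in enumerate(docs):
        total = sum(counts.values())
        for w, c in counts.items():
            A[i, index[w]] = c / total  # only runs when the message has tokens
    return A, vocab


def svd_coordinates(A, k=2):
    # Hypothetical helper: column-centre A (an assumed preprocessing step),
    # then return each message's coordinates on the top-k singular directions,
    # i.e. the first k columns of U scaled by the matching singular values.
    Ac = A - A.mean(axis=0)
    U, s, _ = np.linalg.svd(Ac, full_matrices=False)
    return U[:, :k] * s[:k], s


# Invented example messages, not taken from the Enron corpus.
messages = [
    "Meeting moved to 3pm.",
    "Please review the attached quarterly trading report and send comments "
    "on the forward curve assumptions before Friday.",
    "Thanks!",
    "Following up on the pipeline capacity discussion, the regulatory filing "
    "needs revisions to the tariff section and the schedule.",
]

A, vocab = word_frequency_matrix(messages)
coords, s = svd_coordinates(A, k=2)
print(coords.shape)  # (4, 2)
```

Plotting the two coordinate columns against each other is one way to look for the kind of two-group structure the abstract reports. The semidiscrete decomposition, which restricts the factor entries to {-1, 0, +1}, is a different technique and is not implemented in this sketch.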

Similar content being viewed by others

Click to subscribe: interest group emails as a source of data (Article, 21 July 2020)

Hierarchical and Matrix Structures in a Large Organizational Email Network: Visualization and Modeling Approaches

Assortative Mixture of English Parts of Speech

Author information

Authors and Affiliations

School of Computing, Queen's University, Kingston, Canada, K7L 3N6
P. S. Keila & D. B. Skillicorn

Corresponding author

Correspondence to P. S. Keila.

Additional information

P.S. Keila is a graduate student in the School of Computing at Queen's University. His research area is data mining in text.

D.B. Skillicorn is a professor in the School of Computing at Queen's University, where he heads the Smart Information Management Laboratory. His research area is data mining using matrix decompositions, particularly applied to complex datasets in areas such as biomedicine, geochemistry, counterterrorism and fraud.

About this article

Cite this article

Keila, P.S., Skillicorn, D.B. Structure in the Enron Email Dataset. Comput Math Organiz Theor 11, 183–199 (2005). https://doi.org/10.1007/s10588-005-5379-y

Published: 14 January 2006
Issue Date: October 2005
DOI: https://doi.org/10.1007/s10588-005-5379-y
