Computational & Mathematical Organization Theory, Volume 11, Issue 3, pp 249–264

Email Surveillance Using Non-negative Matrix Factorization

DOI: 10.1007/s10588-005-5380-5

Cite this article as:
Berry, M.W. & Browne, M. Comput Math Organiz Theor (2005) 11: 249. doi:10.1007/s10588-005-5380-5

Abstract

In this study, we apply a non-negative matrix factorization approach for the extraction and detection of concepts or topics from electronic mail messages. For the publicly released Enron electronic mail collection, we encode sparse term-by-message matrices and use a low rank non-negative matrix factorization algorithm to preserve natural data non-negativity and avoid subtractive basis vector and encoding interactions present in techniques such as principal component analysis. Results in topic detection and message clustering are discussed in the context of published Enron business practices and activities, and benchmarks addressing the computational complexity of our approach are provided. The resulting basis vectors and matrix projections of this approach can be used to identify and monitor underlying semantic features (topics) and message clusters in a general or high-level way without the need to read individual electronic mail messages.
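As a rough illustration of the idea summarized in the abstract, the sketch below factors a tiny, made-up term-by-message matrix X into non-negative factors W (term-by-topic basis vectors) and H (topic-by-message encodings), then assigns each message to its dominant topic. It uses the standard multiplicative update rules for the Frobenius-norm objective as a stand-in; the abstract does not specify the article's algorithm, and the keyword "constrained least squares" suggests it may differ. The toy vocabulary, the counts, the function name `nmf_multiplicative`, and all parameter values are assumptions made for this example, not data or code from the article.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (terms x messages) as W @ H.

    Illustrative only: plain multiplicative updates for the Frobenius-norm
    objective, not necessarily the algorithm used in the article.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random start so the updates never divide by zero.
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # stay non-negative throughout.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Normalize basis vectors to unit length and rescale H to compensate,
    # so that topic weights are comparable across topics.
    scale = np.linalg.norm(W, axis=0) + eps
    return W / scale, H * scale[:, None]

# Hypothetical toy data: rows are terms, columns are messages.
terms = ["gas", "pipeline", "trading", "football", "game"]
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

W, H = nmf_multiplicative(X, k=2)

# Dominant topic per message: the row of H with the largest weight.
clusters = H.argmax(axis=0)

# Highest-weighted terms in each basis vector give a readable topic label.
for topic in range(W.shape[1]):
    top = np.argsort(W[:, topic])[::-1][:3]
    print(f"topic {topic}:", [terms[i] for i in top])
print("message clusters:", clusters.tolist())
```

Because every entry of W and H is non-negative, each message is described as an additive mix of topics, which is what makes the basis vectors readable as term lists without opening individual messages.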

Keywords

electronic mail, Enron collection, non-negative matrix factorization, surveillance, topic detection, constrained least squares

Copyright information

© Springer Science + Business Media, Inc. 2006

Authors and Affiliations

  1. Department of Computer Science, University of Tennessee, Knoxville