Discussion Tracking in Enron Email Using PARAFAC
In this chapter, we apply a nonnegative tensor factorization algorithm to extract and detect meaningful discussions from electronic mail messages spanning a period of one year. For the publicly released Enron electronic mail collection, we encode a sparse term-author-month array and factor it using the PARAllel FACtors (PARAFAC) three-way decomposition first proposed by Harshman. By constraining the factors to be nonnegative, we preserve the natural nonnegativity of the data and avoid the subtractive basis vector and encoding interactions present in techniques such as principal component analysis. Results in thread detection and interpretation are discussed in the context of published Enron business practices and activities, and benchmarks addressing the computational complexity of our approach are provided. The resulting tensor factorizations can be used to produce Gantt-like charts for assessing the duration, order, and dependencies of focused discussions over time.
Keywords: Nonnegative Matrix Factorization, Alternating Least Squares, Tensor Decomposition, Federal Energy Regulatory Commission, Anchor Text
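The abstract does not state which update rule is used to compute the nonnegative factors, so the following is only a minimal sketch of one common choice: a nonnegative PARAFAC fit by multiplicative updates. It assumes a small dense synthetic term-author-month array rather than the sparse Enron data, and all names (`nn_parafac`, `khatri_rao`, `unfold`, the array sizes, the rank) are hypothetical and chosen for illustration.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; row (i, j) is stored at i * V.shape[0] + j."""
    rank = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, rank)

def unfold(X, mode):
    """Mode-n unfolding; the remaining modes keep their original order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def reconstruct(factors):
    """Sum of rank-one terms a_r (outer) b_r (outer) c_r."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def nn_parafac(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Nonnegative three-way PARAFAC via multiplicative updates (assumed rule)."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            Z = khatri_rao(others[0], others[1])
            numer = unfold(X, mode) @ Z
            denom = factors[mode] @ (Z.T @ Z) + eps
            # Elementwise ratio of nonnegative terms keeps the factor nonnegative.
            factors[mode] *= numer / denom
    return factors

def rel_error(X, factors):
    return np.linalg.norm(X - reconstruct(factors)) / np.linalg.norm(X)

# Synthetic stand-in for a term-author-month array: 6 terms, 4 authors, 5 months.
rng = np.random.default_rng(42)
true_factors = [rng.random((n, 2)) for n in (6, 4, 5)]
X = reconstruct(true_factors)

err_before = rel_error(X, nn_parafac(X, rank=2, n_iter=0))  # random start only
factors = nn_parafac(X, rank=2, n_iter=200)
err_after = rel_error(X, factors)
print("relative error before:", err_before)
print("relative error after: ", err_after)
```

In this sketch each rank-one component pairs a term profile, an author profile, and a month profile; plotting the month profile of each component against time is one way to obtain the kind of Gantt-like view of discussions that the abstract describes.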
- E. Acar, S.A. Çamtepe, M.S. Krishnamoorthy, and B. Yener. Modeling and multiway analysis of chatroom tensors. In ISI 2005: IEEE International Conference on Intelligence and Security Informatics, volume 3495 of Lecture Notes in Computer Science, pages 256-268. Springer, New York, 2005.
- M.W. Berry and M. Browne. Email surveillance using non-negative matrix factorization. In Workshop on Link Analysis, Counterterrorism and Security, SIAM Conf. on Data Mining, Newport Beach, CA, 2005.
- B.W. Bader and T.G. Kolda. Efficient MATLAB computations with sparse and factored tensors. Technical Report SAND2006-7592, Sandia National Laboratories, Albuquerque, New Mexico and Livermore, California, December 2006. Available from World Wide Web: http://csmr.ca.sandia.gov/~tgkolda/pubs.html#SAND2006-7592.
- B.W. Bader and T.G. Kolda. MATLAB Tensor Toolbox, version 2.1. http://csmr.ca.sandia.gov/~tgkolda/TensorToolbox/, December 2006.
- W.W. Cohen. Enron email dataset. Web page. http://www.cs.cmu.edu/~enron/.
- Federal Energy Regulatory Commission. FERC: Information released in Enron investigation. http://www.ferc.gov/industries/electric/indus-act/wec/enron/info-release.asp.
- T. Grieve. The decline and fall of the Enron empire. Slate, October 14, 2003. Available from World Wide Web: http://www.salon.com/news/feature/2003/10/14/enron/index_np.html.
- J.T. Giles, L. Wo, and M.W. Berry. GTP (General Text Parser) Software for Text Mining. In H. Bozdogan, editor, Software for Text Mining, in Statistical Data Mining and Knowledge Discovery, pages 455-471. CRC Press, Boca Raton, FL, 2003.
- R.A. Harshman. Foundations of the PARAFAC procedure: models and conditions for an “explanatory” multi-modal factor analysis. UCLA Working Papers in Phonetics, 16:1-84, 1970. Available at http://publish.uwo.ca/~harshman/wpppfac0.pdf.
- T.G. Kolda and B.W. Bader. The TOPHITS model for higher-order web link analysis. In Workshop on Link Analysis, Counterterrorism and Security, 2006. Available from World Wide Web: http://www.cs.rit.edu/~amt/linkanalysis06/accepted/21.pdf.
- T.G. Kolda, B.W. Bader, and J.P. Kenny. Higher-order web link analysis using multilinear algebra. In ICDM 2005: Proceedings of the 5th IEEE International Conference on Data Mining, pages 242-249. IEEE Computer Society, Los Alamitos, CA, 2005.
- B. McLean and P. Elkind. The Smartest Guys in the Room: The Amazing Rise and Scandalous Fall of Enron. Portfolio, New York, 2003.
- M. Mørup, L. K. Hansen, and S. M. Arnfred. Sparse higher order non-negative matrix factorization. Neural Computation, 2006. Submitted.
- M. Mørup. Decomposing event related EEG using parallel factor (PARAFAC). Presentation, August 29, 2005. Workshop on Tensor Decompositions and Applications, CIRM, Luminy, Marseille, France.
- C.E. Priebe, J.M. Conroy, D.J. Marchette, and Y. Park. Enron dataset. Web page, February 2006. http://cis.jhu.edu/~parky/Enron/enron.html.
- J. Shetty and J. Adibi. Ex employee status report. Online, 2005. http://www.isi.edu/~adibi/Enron/Enron Employee Status.xls.
- A. Smilde, R. Bro, and P. Geladi. Multi-Way Analysis: Applications in the Chemical Sciences. Wiley, West Sussex, England, 2004. Available from World Wide Web: http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471986917.html.