Despite a technology bias that focuses on external electronic threats, insiders pose the greatest threat to commercial and government organizations. Once information on a specific topic has gone missing, quickly determining who has shown an interest in that topic allows investigators to focus their attention. Even more promising is identifying individuals who have an interest in the topic but have never communicated that interest within the organization. An employee’s interests can be discerned by data mining corporate email correspondence. These interests can then be used to construct social networks that graphically expose investigative leads. This paper describes the use of Probabilistic Latent Semantic Indexing (PLSI), extended to include users (PLSI-U), to determine the topics that interest employees based on their email activity. It then applies PLSI-U to the Enron email corpus and finds a small number of employees (0.02%) who appear to have had clandestine interests.
- Probabilistic Latent Semantic Indexing (PLSI)
- insider threat
- data mining
- social networks
The views expressed in this article are those of the authors and do not reflect the official policy or position of the U.S. Air Force, U.S. Department of Defense or the U.S. Government.
© 2006 IFIP International Federation for Information Processing
Okolica, J., Peterson, G., Mills, R. (2006). Using PLSI-U To Detect Insider Threats from Email Traffic. In: Olivier, M.S., Shenoi, S. (eds) Advances in Digital Forensics II. DigitalForensics 2006. IFIP Advances in Information and Communication, vol 222. Springer, Boston, MA. https://doi.org/10.1007/0-387-36891-4_8
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-36890-0
Online ISBN: 978-0-387-36891-7