Using Data Mining Methods to Predict Personally Identifiable Information in Emails

  • Conference paper
Advanced Data Mining and Applications (ADMA 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5139)

Abstract

Private information management and compliance are important issues for most organizations. As a major communication tool for organizations, email is one of many potential sources of privacy leaks. Information extraction methods have been applied to detect private information in text files. However, because email messages usually consist of low-quality text, information extraction methods may not perform well at detecting private information in them. In this paper, we address the problem of predicting the presence of private information in email using data mining and text mining methods. Two prediction models are proposed. The first model is based on association rules that predict one type of private information from other types of private information identified in emails. The second model is based on classification models that predict private information from the content of the emails. Experiments on the Enron email dataset show promising results.
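To make the first model concrete, the following is a minimal sketch of association rule mining over the types of private information found in each email. It is an illustration only, not the implementation used in the paper: the toy `emails` data, the type labels, the thresholds, and the function names `support` and `mine_rules` are all assumptions, and the search is a brute-force enumeration rather than an optimized algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical toy data: each set lists the private-information types
# detected in one email. The labels are illustrative assumptions.
emails = [
    {"name", "phone", "address"},
    {"name", "phone"},
    {"name", "credit_card", "address"},
    {"phone"},
    {"name", "phone", "address"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Enumerate rules 'antecedent -> consequent' with one consequent item.

    Returns a list of (antecedent, consequent, support, confidence) tuples.
    Brute force over all itemsets, so only suitable for a few item types.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            s = support(itemset, transactions)
            if s == 0 or s < min_support:
                continue  # infrequent itemset, no rule is generated from it
            for consequent in itemset:
                antecedent = frozenset(itemset) - {consequent}
                # confidence = support(A and B) / support(A)
                confidence = s / support(antecedent, transactions)
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, s, confidence))
    return rules


rules = mine_rules(emails)
for antecedent, consequent, s, confidence in rules:
    print(f"{sorted(antecedent)} -> {consequent} "
          f"(support={s:.2f}, confidence={confidence:.2f})")
```

On this toy data, a rule such as "address implies name" means that an email in which an address was identified is likely to also contain a name, so a rule of this kind could flag a type of private information that an extractor missed in low-quality text.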

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Geng, L., Korba, L., Wang, X., Wang, Y., Liu, H., You, Y. (2008). Using Data Mining Methods to Predict Personally Identifiable Information in Emails. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_26

  • DOI: https://doi.org/10.1007/978-3-540-88192-6_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88191-9

  • Online ISBN: 978-3-540-88192-6

  • eBook Packages: Computer Science, Computer Science (R0)
