A System for Email Recipient Prediction

Part of the Lecture Notes in Social Networks book series (LNSN)


The ability to accurately predict the recipients of an email while it is being composed is of great practical importance for two reasons. First, recipient prediction allows for effective “auto-complete” of the recipient field, thereby improving the user experience and reducing the overhead of typing recipients manually. Second, this capability allows the system to alert the user when she has typed unlikely recipients. Such alerts can help avoid human errors that might result in forgetting relevant recipients or, even worse, in disclosing personal or classified information. In this article, we present a system that effectively predicts email recipients, given an email history. The system takes a variety of email-related features into consideration to achieve high accuracy. Extensive experimentation on diverse email corpora has shown that our system adapts well to a variety of domains (such as business, personal and political email).


  • Recipient prediction problem
  • Greeting feature
  • Enron dataset
  • Cross-user approach
  • Feature selection problem

  • DOI: 10.1007/978-3-319-51367-6_2
  • Chapter length: 29 pages


  2. Some emails have several recipients and more than one name in the greeting. In such cases, we cannot tell which recipient each name refers to, and therefore \(\mathcal{G}_{m}\), for an email sent to \(c\), may actually contain a name that does not refer to \(c\).

  3. In one case, “more recent incoming percentage” outperformed the personalized function. This seems to be because the Gmail datasets consisted mostly of small accounts, except for two very large accounts whose users (including one of the authors of this article) have a compulsive habit of immediately answering every email they receive. For larger and more diverse datasets, we expect the personalized function to perform best.


Zvi Sofershtein and Sara Cohen were partially supported by the Israel Science Foundation (Grant 1467/13) and the Ministry of Science and Technology (Grant 3-9617).

Corresponding author

Correspondence to Sara Cohen.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Sofershtein, Z., Cohen, S. (2017). A System for Email Recipient Prediction. In: Kaya, M., Erdoǧan, Ö., Rokne, J. (eds) From Social Data Mining and Analysis to Prediction and Community Detection. Lecture Notes in Social Networks. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-51366-9

  • Online ISBN: 978-3-319-51367-6

  • eBook Packages: Computer Science, Computer Science (R0)