
Finding email correspondents in online social networks


Abstract

Email correspondents play an important role in many people’s social networks. Accurately finding email correspondents in social networks, though it may seem straightforward at first glance, is challenging. Most existing online social networking sites recommend possible matches by comparing the information in email accounts and social network profiles, such as display names and email addresses. However, as we show empirically in this paper, such methods may not be effective in practice. To the best of our knowledge, this problem has not been carefully and thoroughly addressed in research. In this paper, we systematically investigate the problem and develop a practical data mining approach. We find that using only the profiles or only the graph structures is far from effective. Our method exploits the similarity between email accounts and social network user profiles and, at the same time, the similarity between the email communication network and the social network under investigation. We demonstrate the effectiveness of our method using two real data sets from email and Facebook.
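To make the two signals in the abstract concrete, here is a minimal sketch of how a profile-based score and a graph-based score might be combined. It is not the authors' algorithm. Every name in it (`profile_similarity`, `combined_similarity`, `greedy_match`), the token-Jaccard profile score, the neighbour-averaging propagation rule, the weight `alpha=0.5`, the 5 iterations, the 0.2 acceptance threshold and the toy data are all assumptions made for illustration.

```python
import re

def tokens(text):
    """Lowercase alphabetic tokens; digits and punctuation act as separators."""
    return set(re.findall(r"[a-z]+", text.lower()))

def profile_similarity(display_name, address, profile_name):
    """Hypothetical profile signal: Jaccard overlap between the tokens of the
    email display name plus address local part and the profile name."""
    email_tokens = tokens(display_name) | tokens(address.split("@")[0])
    profile_tokens = tokens(profile_name)
    union = email_tokens | profile_tokens
    return len(email_tokens & profile_tokens) / len(union) if union else 0.0

def combined_similarity(accounts, profiles, email_graph, social_graph,
                        alpha=0.5, iterations=5):
    """Blend profile similarity with a structural signal, iteratively.

    accounts: {account_id: (display_name, address)}
    profiles: {profile_id: profile_name}
    email_graph, social_graph: {node: set_of_neighbours}; every neighbour
    must itself be a key of accounts / profiles respectively.
    """
    base = {(a, u): profile_similarity(*accounts[a], profiles[u])
            for a in accounts for u in profiles}
    score = dict(base)
    for _ in range(iterations):
        new_score = {}
        for (a, u), p in base.items():
            a_nbrs = email_graph.get(a, set())
            u_nbrs = social_graph.get(u, set())
            if a_nbrs and u_nbrs:
                # Each correspondent of a contributes its best current match
                # among the friends of u; contributions are averaged.
                structural = sum(max(score[(x, y)] for y in u_nbrs)
                                 for x in a_nbrs) / len(a_nbrs)
            else:
                structural = 0.0
            new_score[(a, u)] = (1 - alpha) * p + alpha * structural
        score = new_score
    return score

def greedy_match(score, threshold=0.2):
    """One-to-one assignment: accept pairs in descending score order."""
    matched_accounts, matched_profiles, result = set(), set(), {}
    for (a, u), s in sorted(score.items(), key=lambda kv: kv[1], reverse=True):
        if s < threshold:
            break
        if a not in matched_accounts and u not in matched_profiles:
            result[a] = u
            matched_accounts.add(a)
            matched_profiles.add(u)
    return result

# Toy data (invented for illustration): an email graph and a social graph.
ACCOUNTS = {"e1": ("Alice Smith", "asmith@example.com"),
            "e2": ("", "bob.lee@example.com"),
            "e3": ("", "cw1980@example.com")}
PROFILES = {"p1": "Alice Smith", "p2": "Robert Lee", "p3": "Carol White"}
EMAIL_GRAPH = {"e1": {"e2", "e3"}, "e2": {"e1"}, "e3": {"e1"}}
SOCIAL_GRAPH = {"p1": {"p2", "p3"}, "p2": {"p1"}, "p3": {"p1"}}

if __name__ == "__main__":
    # iterations=0 leaves only the profile signal.
    profile_only = combined_similarity(ACCOUNTS, PROFILES, EMAIL_GRAPH,
                                       SOCIAL_GRAPH, iterations=0)
    print(greedy_match(profile_only))  # {'e1': 'p1', 'e2': 'p2'}
    combined = combined_similarity(ACCOUNTS, PROFILES, EMAIL_GRAPH, SOCIAL_GRAPH)
    print(greedy_match(combined))      # {'e1': 'p1', 'e2': 'p2', 'e3': 'p3'}
```

In this toy example the address `cw1980@example.com` shares no tokens with "Carol White", so the profile signal alone leaves `e3` unmatched. With propagation, `e3`'s only correspondent `e1` strongly matches `p1`, which is a friend of both `p2` and `p3`; `e3` therefore scores equally against `p2` and `p3`, and the greedy step assigns it to `p3` because `e2` has already claimed `p2` by name. A real system would likely need a better string measure and a principled assignment step (for example, an optimal bipartite matching) in place of these simplifications.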



Author information

Correspondence to Jian Pei.

Additional information

We are grateful to the anonymous reviewers for their constructive suggestions, which helped to improve the quality of this paper. This research is supported in part by an NSERC Discovery Grant, a BCFRST NRAS Endowment Research Team Program project, two SAP Business Objects ARC Fellowships, two NSERC CRD Research Grants, and a GRAND NCE project. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.


About this article

Cite this article

Cui, Y., Pei, J., Tang, G. et al. Finding email correspondents in online social networks. World Wide Web 16, 195–218 (2013). https://doi.org/10.1007/s11280-012-0168-2


Keywords

  • email mining
  • social network mining
  • recommendation systems
  • string matching
  • graph matching