An Efficient and Spam-Robust Proximity Measure Between Communication Entities

  • Short Paper
Journal of Computer Science and Technology

Abstract

Electronic communication service providers are required by local law to retain communication data for a certain period. The retained data, also called communication logs, are used in applications such as crime detection, viral marketing, and analytical studies, many of which rely on effective techniques for analyzing the logs. In this paper, we focus on measuring the proximity between two communication entities, a fundamental step toward further analysis of communication logs, and propose a new proximity measure called ESP (Efficient and Spam-Robust Proximity measure). ESP considers only the (graph-theoretically) shortest paths between two entities and assigns small values to paths between spam-like entities and other entities. It is therefore both computationally efficient and spam-robust. Experiments on real and synthetic datasets show that ESP is, in most cases, more accurate, more computationally efficient, and more spam-robust than existing measures.
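The abstract does not give the ESP formula, so the following is only a toy sketch of the general idea it describes: restrict attention to shortest paths, and give low scores to paths that touch entities with very many contacts. The degree-based penalty, the function names (`shortest_paths`, `path_score`, `toy_proximity`), and the example graph are all assumptions made for illustration. They are not the paper's definition.

```python
from collections import deque


def shortest_paths(graph, source, target):
    """Return all shortest paths between two nodes of an unweighted graph (BFS)."""
    if source == target:
        return [[source]]
    dist = {source: 0}
    parents = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break  # every predecessor of target has already been expanded
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                parents[nxt].append(node)  # another equally short route
    if target not in dist:
        return []

    paths = []

    def backtrack(node, suffix):
        if node == source:
            paths.append([source] + suffix)
            return
        for parent in parents[node]:
            backtrack(parent, [node] + suffix)

    backtrack(target, [])
    return paths


def path_score(graph, path):
    """Hypothetical penalty: multiply 1/degree over every node on the path.

    High-degree (spam-like) nodes shrink the score. This is an assumed
    stand-in, not the weighting used by ESP.
    """
    score = 1.0
    for node in path:
        score /= len(graph[node])
    return score


def toy_proximity(graph, a, b):
    """Sum the assumed path scores over shortest paths only."""
    return sum(path_score(graph, p) for p in shortest_paths(graph, a, b))


# Hypothetical undirected communication graph; "spam" contacts everyone.
example_graph = {
    "alice": ["bob", "spam"],
    "bob": ["alice", "carol", "spam"],
    "carol": ["bob", "spam"],
    "spam": ["alice", "bob", "carol", "dave", "erin", "frank"],
    "dave": ["spam"],
    "erin": ["spam"],
    "frank": ["spam"],
}
```

In this example, `alice` and `carol` are joined by two shortest paths, one through `bob` and one through `spam`. Under the assumed penalty, the path through the high-degree `spam` node contributes less to `toy_proximity(example_graph, "alice", "carol")` than the path through `bob`. Looking only at shortest paths also keeps the computation to a single breadth-first search per pair.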

Author information

Corresponding author

Correspondence to Joo Hyuk Jeon.

About this article

Cite this article

Jeon, J.H., Song, J., Kwon, J.E. et al. An Efficient and Spam-Robust Proximity Measure Between Communication Entities. J. Comput. Sci. Technol. 28, 394–400 (2013). https://doi.org/10.1007/s11390-013-1339-z

  • DOI: https://doi.org/10.1007/s11390-013-1339-z
