Abstract
Electronic communication service providers are required by local law to retain communication data for a certain period. The retained data, or communication logs, are used in applications such as crime detection, viral marketing, and analytical studies. Many of these applications rely on effective techniques for analyzing communication logs. In this paper, we focus on measuring the proximity between two communication entities, a fundamental step toward further analysis of communication logs, and propose a new proximity measure called ESP (Efficient and Spam-Robust Proximity measure). The proposed measure considers only the (graph-theoretically) shortest paths between two entities and assigns small values to paths between spam-like entities and others. It is therefore both computationally efficient and spam-robust. Experiments on real and synthetic datasets show that, in most cases, the proposed measure is more accurate, more computationally efficient, and more spam-robust than existing measures.
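The abstract does not give the ESP formula, so the following is only a minimal sketch of the general idea it describes: restrict attention to shortest paths and down-weight paths that pass through entities that contact many others. The function name `shortest_path_proximity`, the adjacency-dictionary input, and the 1/out-degree weighting are all assumptions made for illustration and are not taken from the paper.

```python
from collections import deque


def shortest_path_proximity(graph, src, dst):
    """Toy proximity score (NOT the ESP definition from the paper).

    Sums, over all shortest src->dst paths, the product of 1/out_degree
    of every node on the path except dst. Nodes that contact many others
    (spam-like behaviour, by assumption) therefore contribute little.
    `graph` maps each entity to a collection of distinct recipients.
    """
    if src == dst:
        return 1.0
    dist = {src: 0}      # hop distance from src
    score = {src: 1.0}   # accumulated weight over shortest paths
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break  # all shortest-path predecessors of dst are already processed
        neighbours = graph.get(node, ())
        if not neighbours:
            continue
        share = score[node] / len(neighbours)
        for nxt in neighbours:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                score[nxt] = 0.0
                queue.append(nxt)
            # Only edges that lie on a shortest path pass weight forward.
            if dist[nxt] == dist[node] + 1:
                score[nxt] += share
    return score.get(dst, 0.0)


# Hypothetical communication log: who sent messages to whom.
example_log = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "spammer": ["alice", "bob", "carol", "dave",
                "erin", "frank", "grace", "heidi"],
    "dave": [],
}

print(shortest_path_proximity(example_log, "alice", "dave"))
print(shortest_path_proximity(example_log, "spammer", "dave"))
```

In this toy log, the sender that contacts eight recipients receives a lower score toward "dave" than "alice" does, even though its path is shorter. A single breadth-first search suffices because only shortest paths are considered, which is the source of the efficiency claimed for this family of measures.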
Cite this article
Jeon, J.H., Song, J., Kwon, J.E. et al. An Efficient and Spam-Robust Proximity Measure Between Communication Entities. J. Comput. Sci. Technol. 28, 394–400 (2013). https://doi.org/10.1007/s11390-013-1339-z
DOI: https://doi.org/10.1007/s11390-013-1339-z