Identity matching using personal and social identity features

Information Systems Frontiers


Identity verification is essential in our mission to identify potential terrorists and criminals. It is not a trivial task because terrorists reportedly assume multiple identities using either fraudulent or legitimate means. A national identification card and biometrics technologies have been proposed as solutions to the identity problem. However, several studies show their inability to tackle the complex problem. We aim to develop data mining alternatives that can match identities referring to the same individual. Existing identity matching techniques based on data mining primarily rely on personal identity features. In this research, we propose a new identity matching technique that considers both personal identity features and social identity features. We define two groups of social identity features including social activities and social relations. The proposed technique is built upon a probabilistic relational model that utilizes a relational database structure to extract social identity features. Experiments show that the social activity features significantly improve the matching performance while the social relation features effectively reduce false positive and false negative decisions.
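To make the idea of combining the two feature groups concrete, the following is a minimal sketch of pairwise identity matching. It is not the probabilistic relational model described in the abstract; it swaps in a simple weighted similarity score for illustration. The record fields (`name`, `dob`, `activities`, `relations`), the weights, and all function names are hypothetical assumptions and do not come from the article.

```python
from difflib import SequenceMatcher

# Hypothetical weights: two personal features and two social feature groups.
WEIGHTS = {"name": 0.3, "dob": 0.2, "activities": 0.25, "relations": 0.25}


def string_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] for two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_score(r1, r2, weights=WEIGHTS):
    """Weighted average of personal and social feature similarities."""
    sims = {
        # Personal identity features (assumed fields).
        "name": string_similarity(r1["name"], r2["name"]),
        "dob": 1.0 if r1["dob"] == r2["dob"] else 0.0,
        # Social identity features (assumed fields): shared activities
        # and shared related identities.
        "activities": jaccard(r1["activities"], r2["activities"]),
        "relations": jaccard(r1["relations"], r2["relations"]),
    }
    total = sum(weights.values())
    return sum(weights[k] * sims[k] for k in weights) / total


# Illustrative records only; none of this data comes from the article.
a = {"name": "John Smith", "dob": "1980-01-02",
     "activities": {"burglary", "fraud"}, "relations": {"id7", "id9"}}
b = {"name": "Jon Smith", "dob": "1980-01-02",
     "activities": {"fraud"}, "relations": {"id7", "id9"}}
c = {"name": "Maria Lopez", "dob": "1975-06-30",
     "activities": {"dui"}, "relations": {"id3"}}

score_ab = match_score(a, b)
score_ac = match_score(a, c)
```

In this sketch a pair would be declared a match when its score exceeds a chosen threshold. The social features let two records with slightly different names still score highly when they share activities and associates, which is the intuition the abstract describes.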

Author information

Corresponding author

Correspondence to Jiexun Li.

About this article

Cite this article

Li, J., Wang, G.A. & Chen, H. Identity matching using personal and social identity features. Inf Syst Front 13, 101–113 (2011).
