Artificial Intelligence and Law, Volume 18, Issue 1, pp 77–102

Disclosing false identity through hybrid link analysis



Combating the identity problem is crucial and urgent, as false identity has become a common denominator of many serious crimes, including mafia trafficking and terrorism. Without correct identification, it is very difficult for law enforcement authorities to intervene, or even to trace terrorists’ activities. Among the several identity attributes, personal names are commonly, and effortlessly, falsified or aliased by most criminals. Typical approaches to detecting the use of false identity rely on similarity measures over textual and other content-based characteristics, which are usually not applicable in the case of highly deceptive, erroneous or unknown descriptions. This barrier can be overcome through analysis of the link information that an individual displays in communication behaviours, financial interactions and social networks. In particular, this paper presents a novel link-based approach that improves on existing techniques by integrating multiple link properties into the process of similarity evaluation. The approach is used in a hybrid model that combines text-based and link-based measures of the examined names to refine the assessment of their similarity. It is experimentally evaluated against other link-based and text-based techniques on a terrorist-related dataset, with further generalization to a similar problem occurring in publication databases. The empirical study demonstrates the great potential of this work towards developing an effective identity verification system.
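The general idea of a hybrid measure can be illustrated with a minimal sketch. It is not the method of the paper: the text-based part is swapped for Python's `difflib` string ratio, the link-based part for a plain Jaccard overlap of shared link targets, and the two are blended with a hypothetical weight `alpha`. All names, link targets and function names below are invented for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical link data: each name maps to the entities it is linked to
# (accounts, phone numbers, ...). Invented purely for illustration.
LINKS = {
    "john smith": {"acct-1", "phone-7"},
    "j. smyth": {"acct-1", "phone-7", "phone-9"},
    "mary jones": {"acct-4"},
}


def text_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in text measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_similarity(a: str, b: str, links: dict) -> float:
    """Jaccard overlap of the link targets of two names (stand-in link measure)."""
    targets_a = links.get(a, set()) - {b}
    targets_b = links.get(b, set()) - {a}
    union = targets_a | targets_b
    return len(targets_a & targets_b) / len(union) if union else 0.0


def hybrid_similarity(a: str, b: str, links: dict, alpha: float = 0.5) -> float:
    """Weighted blend of both measures; alpha is an assumed tuning weight."""
    text_part = alpha * text_similarity(a, b)
    link_part = (1.0 - alpha) * link_similarity(a, b, links)
    return text_part + link_part


if __name__ == "__main__":
    for other in ("j. smyth", "mary jones"):
        score = hybrid_similarity("john smith", other, LINKS)
        print(f"john smith vs {other}: {score:.3f}")
```

In this toy setting, two names that share most of their link targets score higher than two names with no shared targets, even when their spellings differ. Setting `alpha` to 1 or 0 reduces the blend to the purely text-based or purely link-based stand-in, respectively.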


Keywords: False identity detection; Hybrid algorithm; Link analysis; Terrorist data



This work is sponsored by UK EPSRC grant EP/D057086. The authors are grateful to the members of the project team for their contribution, whilst taking full responsibility for the views expressed in this paper. The authors would also like to thank the anonymous referees for their constructive comments which have helped considerably in revising this work.



Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  1. Department of Computer Science, Aberystwyth University, Aberystwyth, UK
