Abstract
Combating the identity problem is crucial and urgent, as false identity has become a common denominator of many serious crimes, including mafia trafficking and terrorism. Without correct identification, it is very difficult for law enforcement authorities to intervene, or even to trace terrorists’ activities. Among the various identity attributes, personal names are commonly, and effortlessly, falsified or aliased by most criminals. Typical approaches to detecting the use of false identity rely on similarity measures over textual and other content-based characteristics, which are usually not applicable to highly deceptive, erroneous or unknown descriptions. This barrier can be overcome by analysing the link information that individuals exhibit in communication behaviours, financial interactions and social networks. In particular, this paper presents a novel link-based approach that improves on existing techniques by integrating multiple link properties into the process of similarity evaluation. It is used in a hybrid model that combines text-based and link-based measures of the examined names to refine the assessment of their similarity. The approach is experimentally evaluated against other link-based and text-based techniques on a terrorist-related dataset, and is further generalised to a similar problem occurring in publication databases. The empirical study demonstrates the potential of this work towards developing an effective identity verification system.
Acknowledgments
This work is sponsored by UK EPSRC grant EP/D057086. The authors are grateful to the members of the project team for their contribution, whilst taking full responsibility for the views expressed in this paper. The authors would also like to thank the anonymous referees for their constructive comments which have helped considerably in revising this work.
Cite this article
Boongoen, T., Shen, Q. & Price, C. Disclosing false identity through hybrid link analysis. Artif Intell Law 18, 77–102 (2010). https://doi.org/10.1007/s10506-010-9085-9
DOI: https://doi.org/10.1007/s10506-010-9085-9