Abstract
Identity resolution is crucial for law enforcement agencies worldwide, yet matching records to a real-world identity in big data is difficult because of data inconsistencies such as typographical errors, name variations, and abbreviations. This paper introduces a fuzzy approach to identity resolution that applies the Soundex and Jaro-Winkler distance algorithms in a cascaded manner to calculate an aggregate score for the full name, while the edit-distance algorithm is used to score the address and ethnicity description attributes. For this approach, the Soundex code was modified to consist of numbers only, with the code length increased to six digits, which allowed the matching algorithm to overcome some of the limitations of Soundex in name matching. The approach accommodates three different variations of a name in an iterative search process that retrieves matching records based on the inputs. In an experiment searching for suspects in two different cases, the initial search retrieved 173 and 52 records, respectively, for each target suspect. These records were grouped with the Mean-Shift clustering technique based on the similarity scores of three attributes. In further analysis, the record segmentation process matched 16 and 22 records for the two cases, respectively, and graph analysis singled out the target suspect's identity from the other matched identities through their links to different addresses. The overall matching performance of this fuzzy approach is encouraging: it can help law enforcement agencies speed up investigations and, most importantly, identify a suspect even when only minimal information is available.
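The abstract does not specify how the modified Soundex code or the cascade is implemented, so the following is only a minimal sketch of one plausible reading, under stated assumptions. It assumes that the numbers-only code is produced by digit-coding the first letter with the standard Soundex consonant groups, rather than keeping it as a letter, and then zero-padding or truncating the result to six digits. It assumes that the cascade feeds these numeric codes into Jaro-Winkler similarity (with the conventional prefix scale 0.1 and a prefix of up to four characters) and averages the scores over name parts paired by position. It also assumes that address-like attributes are compared with a length-normalised Levenshtein (edit-distance) similarity. The function names `numeric_soundex`, `name_score` and `edit_similarity` are hypothetical and do not come from the paper.

```python
"""Hypothetical sketch: cascaded fuzzy name scoring plus edit-distance scoring.

Assumed (not from the paper): the numbers-only 6-digit Soundex variant,
positional pairing of name parts, and simple averaging of part scores.
"""

# Standard Soundex consonant groups; vowels, h, w and y receive no digit.
SOUNDEX_DIGITS = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def numeric_soundex(name: str, length: int = 6) -> str:
    """Assumed numbers-only Soundex: the first letter is digit-coded like the
    rest, then the code is zero-padded or truncated to `length` digits."""
    code = []
    prev = None
    for ch in name.lower():
        if not ch.isalpha():
            continue
        digit = SOUNDEX_DIGITS.get(ch)
        if digit is None:
            if ch not in "hw":
                prev = None  # vowels (and y) separate repeated codes; h and w do not
            continue
        if digit != prev:  # collapse adjacent letters that share a code
            code.append(digit)
        prev = digit
    return ("".join(code) + "0" * length)[:length]


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    s1_matched = [c for c, u in zip(s1, used1) if u]
    s2_matched = [c for c, u in zip(s2, used2) if u]
    transpositions = sum(a != b for a, b in zip(s1_matched, s2_matched)) / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler: boost the Jaro score for a shared prefix of up to 4 chars."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def name_score(query: str, candidate: str) -> float:
    """Assumed cascade: numeric Soundex codes -> Jaro-Winkler, averaged over
    name parts paired by position (first with first, last with last, ...)."""
    pairs = list(zip(query.split(), candidate.split()))
    if not pairs:
        return 0.0
    total = sum(jaro_winkler(numeric_soundex(a), numeric_soundex(b)) for a, b in pairs)
    return total / len(pairs)


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalised to a [0, 1] similarity, e.g. for addresses."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    print(numeric_soundex("Catherine"), numeric_soundex("Katherine"))
    print(name_score("Robert Smith", "Rupert Smyth"))
    print(round(edit_similarity("12 High Street", "12 High St"), 3))
```

Under these assumptions, `name_score("Robert Smith", "Rupert Smyth")` returns 1.0 because both name parts collapse to the same six-digit codes (`616300` and `253000`). Because the initial letter is also digit-coded, "Catherine" and "Katherine" both map to `236500`, whereas standard letter-prefixed Soundex keeps them apart (C365 vs K365). Whether this matches the authors' actual modification cannot be confirmed from the abstract.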
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Nawaz, A., Kazemian, H. (2021). A Fuzzy Approach to Identity Resolution. In: Iliadis, L., Macintyre, J., Jayne, C., Pimenidis, E. (eds) Proceedings of the 22nd Engineering Applications of Neural Networks Conference. EANN 2021. Proceedings of the International Neural Networks Society, vol 3. Springer, Cham. https://doi.org/10.1007/978-3-030-80568-5_26
DOI: https://doi.org/10.1007/978-3-030-80568-5_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80567-8
Online ISBN: 978-3-030-80568-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)