A Fuzzy Approach to Identity Resolution

  • Conference paper
Proceedings of the 22nd Engineering Applications of Neural Networks Conference (EANN 2021)

Part of the book series: Proceedings of the International Neural Networks Society (INNS, volume 3)

Abstract

Identity resolution is crucial for law enforcement agencies worldwide, but matching a real-world identity in big data is difficult because of data inconsistencies such as typographical errors, naming variations, and abbreviations. This paper introduces a fuzzy approach to identity resolution that applies the Soundex and Jaro-Winkler distance algorithms in a cascaded manner to calculate an aggregate score for the full name, while the edit-distance algorithm scores the address and ethnicity description attributes. For this approach, the Soundex code was modified to use digits only and its length was increased to six digits, which allows the matching algorithm to overcome some of the limitations of Soundex in name matching. The approach accommodates three different variations of a name in an iterative search process that retrieves matched records based on the inputs. In an experiment that searched for a suspect in each of two different cases, the initial search retrieved 173 and 52 records, respectively. These records were grouped using the Mean-Shift clustering technique based on the similarity scores of the three attributes. For further analysis, the record segmentation process matched 16 and 22 records for the two cases, respectively, and graph analysis distinguished the target suspect's identity from the other matched identities through link associations to different addresses. The overall matching performance of this fuzzy approach is encouraging: it can help law enforcement agencies speed up investigations and, most importantly, identify a suspect even when minimal information is available.
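
The scoring steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact encoding or aggregation rules, so everything specific here is an assumption rather than the authors' method: the first letter is encoded as a digit like every other letter, the Soundex code acts as a gate before Jaro-Winkler is applied, the full-name score is the mean over positionally paired name parts, and the edit distance is rescaled to a similarity in [0, 1]. The function names (`numeric_soundex`, `name_score`, `edit_similarity`) are hypothetical.

```python
# Standard Soundex digit groups; vowels and h, w, y carry no digit.
_GROUPS = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
_DIGIT = {ch: d for letters, d in _GROUPS.items() for ch in letters}


def numeric_soundex(name: str, length: int = 6) -> str:
    """Digits-only Soundex variant, zero-padded to `length` (assumed scheme)."""
    digits, prev = [], ""
    for ch in name.lower():
        if not ch.isalpha():
            continue
        d = _DIGIT.get(ch, "")
        if d and d != prev:
            digits.append(d)
        if ch not in "hw":  # h and w do not break a run of equal codes
            prev = d
    return "".join(digits)[:length].ljust(length, "0")


def jaro(s: str, t: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_m = [c for c, hit in zip(s, s_hit) if hit]
    t_m = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_m, t_m)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s: str, t: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by a shared prefix of up to four characters."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def name_score(query: str, candidate: str) -> float:
    """Assumed cascade: Soundex gate first, then Jaro-Winkler, averaged over parts."""
    q, c = query.lower().split(), candidate.lower().split()
    if not q or len(q) != len(c):
        return 0.0
    parts = [jaro_winkler(a, b) if numeric_soundex(a) == numeric_soundex(b) else 0.0
             for a, b in zip(q, c)]
    return sum(parts) / len(parts)


def edit_similarity(s: str, t: str) -> float:
    """Levenshtein distance rescaled to [0, 1]; the rescaling is an assumption."""
    if not s and not t:
        return 1.0
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return 1 - prev[-1] / max(len(s), len(t))
```

Under these assumptions, `name_score("Jon Smith", "John Smyth")` passes the Soundex gate for both name parts and yields a high Jaro-Winkler average, while `name_score("Jon Smith", "Ann Brown")` is rejected at the gate and scores 0.0. Address and ethnicity description strings would be compared with `edit_similarity`.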

Author information

Corresponding author

Correspondence to Asif Nawaz.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Nawaz, A., Kazemian, H. (2021). A Fuzzy Approach to Identity Resolution. In: Iliadis, L., Macintyre, J., Jayne, C., Pimenidis, E. (eds) Proceedings of the 22nd Engineering Applications of Neural Networks Conference. EANN 2021. Proceedings of the International Neural Networks Society, vol 3. Springer, Cham. https://doi.org/10.1007/978-3-030-80568-5_26
