A Multi-layer Naïve Bayes Model for Approximate Identity Matching
Identity management is critical to many governmental practices, ranging from providing services to citizens to enforcing homeland security. Searching for a specific identity is difficult because multiple representations of the same identity may exist, owing to unintentional errors and intentional deception. We propose a Naïve Bayes identity matching model that improves on the effectiveness of existing techniques. Experiments show that our proposed model performs significantly better than the exact-match based technique and achieves higher precision than the record comparison technique. In addition, our model greatly reduces the effort of manually labeling training instances by employing a semi-supervised learning approach. This training method outperforms both fully supervised and unsupervised learning. With a training dataset in which only 30% of the instances are labeled, our model achieves performance comparable to that of fully supervised learning.
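The abstract does not describe the model's layers or its features, so the following is only a minimal sketch under stated assumptions. Each pair of identity records is reduced to hypothetical binary agreement features (for example, whether first name, last name, date of birth, and address agree). A single-layer Bernoulli Naïve Bayes classifier scores a pair as match or non-match, and the semi-supervised step is approximated with expectation-maximization (EM) over a mix of labeled and unlabeled pairs. The class name `SemiSupervisedNB`, the feature choice, and the EM procedure are assumptions for illustration, not the authors' multi-layer model.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: a record pair becomes a tuple of 0/1 agreement flags
# (e.g. first name, last name, date of birth, address agree or not).
Pair = Tuple[int, ...]


class SemiSupervisedNB:
    """Bernoulli Naive Bayes over agreement flags, trained with EM on labeled and
    unlabeled pairs. Illustrative sketch only, not the paper's multi-layer model."""

    def __init__(self, n_features: int, alpha: float = 1.0):
        self.n = n_features
        self.alpha = alpha  # Laplace smoothing keeps every probability strictly in (0, 1)
        self.prior = [0.5, 0.5]  # [P(non-match), P(match)]
        self.theta = [[0.5] * n_features for _ in range(2)]  # P(flag = 1 | class)

    def _m_step(self, data: Sequence[Pair], weights: Sequence[Sequence[float]]) -> None:
        # weights[i][c] is the (hard or soft) responsibility of class c for pair i
        totals = [sum(w[c] for w in weights) for c in range(2)]
        total = totals[0] + totals[1]
        self.prior = [(totals[c] + self.alpha) / (total + 2 * self.alpha) for c in range(2)]
        for c in range(2):
            for j in range(self.n):
                agree = sum(w[c] * x[j] for x, w in zip(data, weights))
                self.theta[c][j] = (agree + self.alpha) / (totals[c] + 2 * self.alpha)

    def posterior(self, x: Pair) -> float:
        """Return P(match | x), computed in log space for numerical stability."""
        logp = []
        for c in range(2):
            s = math.log(self.prior[c])
            for j, v in enumerate(x):
                p = self.theta[c][j]
                s += math.log(p if v else 1.0 - p)
            logp.append(s)
        m = max(logp)
        e = [math.exp(lp - m) for lp in logp]
        return e[1] / (e[0] + e[1])

    def fit(self, labeled: Sequence[Tuple[Pair, int]], unlabeled: Sequence[Pair],
            iters: int = 20) -> "SemiSupervisedNB":
        xs_l: List[Pair] = [x for x, _ in labeled]
        w_l = [[1.0 - y, float(y)] for _, y in labeled]  # labeled pairs: hard weights
        self._m_step(xs_l, w_l)  # initialize parameters from labeled pairs only
        for _ in range(iters):
            # E-step: soft match/non-match labels for the unlabeled pairs
            w_u = [[1.0 - p, p] for p in (self.posterior(x) for x in unlabeled)]
            # M-step: re-estimate from labeled (hard) plus unlabeled (soft) pairs
            self._m_step(xs_l + list(unlabeled), w_l + w_u)
        return self


# Usage with made-up agreement vectors (1 = match, 0 = non-match)
labeled = [((1, 1, 1, 1), 1), ((1, 1, 0, 1), 1), ((0, 0, 0, 1), 0), ((0, 1, 0, 0), 0)]
unlabeled = [(1, 1, 1, 0), (0, 0, 0, 0), (1, 0, 1, 1), (0, 0, 1, 0)]
model = SemiSupervisedNB(n_features=4).fit(labeled, unlabeled)
print(round(model.posterior((1, 1, 0, 1)), 3))
```

EM lets the unlabeled pairs shape the class-conditional agreement probabilities, which is one common way to reduce labeling effort with a Naïve Bayes classifier; the abstract does not state whether the authors' semi-supervised procedure works this way.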
Keywords: Bayesian Network; Training Dataset; Unsupervised Learning; Identity Information; Approximate Identity