A Dictionary-Based Approach to Fast and Accurate Name Matching in Large Law Enforcement Databases
In the presence of dirty data, a standard query (e.g., a search for a name that is misspelled or mistyped) does not return all the needed information. This is an issue of grave importance in homeland security, criminology, medical applications, GIS (geographic information systems), and so on. Different techniques, such as Soundex, Phonix, n-grams, and edit distance, have been used to improve the matching rate in these name-matching applications. There is a pressing need for name-matching approaches that provide high levels of accuracy while keeping the computational complexity reasonably low. In this paper, we present ANSWER, a name-matching approach that utilizes a prefix tree of the names available in the database. Creating and searching the name dictionary tree is fast and accurate and, thus, ANSWER is superior to other techniques of retrieving fuzzy name matches in large databases.
Keywords: Edit Distance, Recall Rate, Homeland Security, Geographic Information System, Match Approach
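The abstract describes matching names against a prefix tree built from the names in the database. The sketch below illustrates that general idea only; it is not the ANSWER algorithm itself, whose details are not given here. It assumes a plain character trie and a Levenshtein edit-distance bound, and the names `TrieNode`, `build_trie`, `fuzzy_search`, and `max_dist` are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of the name dictionary tree (hypothetical structure)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.name: Optional[str] = None  # set when a stored name ends here


def build_trie(names: List[str]) -> TrieNode:
    """Insert every name, upper-cased, so shared prefixes share nodes."""
    root = TrieNode()
    for name in names:
        node = root
        for ch in name.upper():
            node = node.children.setdefault(ch, TrieNode())
        node.name = name
    return root


def fuzzy_search(root: TrieNode, query: str, max_dist: int) -> List[Tuple[str, int]]:
    """Return (name, distance) pairs within max_dist Levenshtein edits of query."""
    query = query.upper()
    first_row = list(range(len(query) + 1))  # distance from the empty prefix
    matches: List[Tuple[str, int]] = []

    def walk(node: TrieNode, ch: str, prev_row: List[int]) -> None:
        # Extend the edit-distance table by one row for this trie character.
        row = [prev_row[0] + 1]
        for i in range(1, len(query) + 1):
            cost = 0 if query[i - 1] == ch else 1
            row.append(min(row[i - 1] + 1,          # skip a query character
                           prev_row[i] + 1,         # skip a trie character
                           prev_row[i - 1] + cost)) # match or substitute
        if node.name is not None and row[-1] <= max_dist:
            matches.append((node.name, row[-1]))
        # Prune: no name below this node can come back within the bound.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    for ch, child in root.children.items():
        walk(child, ch, first_row)
    return sorted(matches, key=lambda m: (m[1], m[0]))
```

Because names with a common prefix share one path in the tree, the edit-distance rows for that prefix are computed once and reused, and whole subtrees are skipped as soon as the bound cannot be met. This is one plausible reason a dictionary tree can be faster than comparing the query against every name separately.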