Selecting potentially relevant records using re-identification methods
This work proposes the use of re-identification algorithms to select records that are interesting because they provide new information. Instead of focusing on re-identified records, we focus on the records that are not re-identified (non-linked records), as these are the ones that potentially supply new and relevant information. Moreover, these relevant characteristics can correspond to chances for improving the knowledge of a system.
To evaluate our approach, we have applied it to an example using publicly available data from the UCI repository. We have used the ionosphere database to build a re-identification problem for 35 non-common variables.
We show that the use of a simple heuristic rule base can effectively select potentially interesting records.
Keywords: Chance Discovery, Knowledge Discovery in Databases, Data Mining, Multi-database Mining, Re-identification Algorithms, Record Selection, Record Linkage
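The selection idea in the abstract (keep the records that record linkage fails to link) can be illustrated with a minimal sketch. This is not the method of the paper: the paper addresses files with non-common variables and selects records with a heuristic rule base, whereas the sketch below assumes both files share the same numeric variables and uses a plain nearest-neighbour distance with a cut-off. The function names `nearest` and `select_non_linked`, the `threshold` parameter, and the toy data are all hypothetical.

```python
import math


def nearest(record, candidates):
    """Return (index, distance) of the candidate closest to record."""
    distances = [math.dist(record, c) for c in candidates]
    best = min(range(len(candidates)), key=distances.__getitem__)
    return best, distances[best]


def select_non_linked(file_a, file_b, threshold):
    """Indices of records in file_b with no record of file_a within threshold.

    Assumption: a record counts as "linked" when its nearest neighbour in
    file_a lies within the (hypothetical) distance threshold.
    """
    selected = []
    for j, record in enumerate(file_b):
        _, distance = nearest(record, file_a)
        if distance > threshold:
            selected.append(j)
    return selected


# Toy data (invented for illustration): two files over the same two variables.
file_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
file_b = [(0.1, 0.0), (1.0, 0.9), (5.0, 5.0)]

non_linked = select_non_linked(file_a, file_b, threshold=0.5)
print(non_linked)  # -> [2]
```

In this toy example only the third record of `file_b` has no close counterpart in `file_a`, so it is the one flagged as potentially carrying new information. How the cut-off is chosen is left open here; it stands in for the rule-based selection mentioned in the abstract.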