Selecting potentially relevant records using re-identification methods

Domingo-Ferrer, Josep; Torra, Vicenç

doi:10.1007/BF03040962

Regular Papers
Published: September 2004

Volume 22, pages 239–252 (2004)

New Generation Computing

Abstract

This work proposes using re-identification algorithms to select records that are interesting because they provide new information. Instead of focusing on re-identified elements, we focus on records that are not re-identified (non-linked records), as these are the ones that potentially supply new and relevant information. Moreover, these relevant characteristics can correspond to chances for improving the knowledge of a system.

To evaluate our approach, we have applied it to an example using publicly available data from the UCI repository. We have used the data of the ionosphere database to build a re-identification problem for 35 non-common variables.

We show that the use of a simple heuristic rule base can effectively select potentially interesting records.
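
The idea of keeping the non-linked records can be illustrated with a minimal sketch. It is not the method of the paper: it assumes both files share the same numeric variables (the paper addresses non-common variables), it uses nearest-neighbour Euclidean distance as a stand-in linkage step, and the `threshold` rule replaces the heuristic rule base mentioned above. The function name `select_unlinked` and the toy data are hypothetical.

```python
import math


def select_unlinked(file_a, file_b, threshold):
    """Return indices of records in file_b that link to no record in file_a.

    Assumption: a record is "linked" when its nearest record in file_a
    lies within `threshold` (Euclidean distance on shared variables).
    """
    unlinked = []
    for j, record_b in enumerate(file_b):
        # Distance from record_b to its closest candidate in file_a.
        nearest = min(math.dist(record_a, record_b) for record_a in file_a)
        if nearest > threshold:
            # No plausible match: the record is potentially new information.
            unlinked.append(j)
    return unlinked


# Toy data (hypothetical): the third record of file_b has no close match.
file_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
file_b = [(0.1, 0.0), (1.0, 0.9), (5.0, 5.0)]
print(select_unlinked(file_a, file_b, threshold=0.5))  # prints [2]
```

In this toy setting the first two records of `file_b` are matched and discarded, while the third is returned as a candidate for closer inspection.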

Author information

Authors and Affiliations

Dept. Comput. Eng. and Maths-ETSE, Universitat Rovira i Virgili, Av Països Catalans 26, 43007, Tarragona, Catalonia, Spain
Josep Domingo-Ferrer
Institut d’Investigació en Intel·ligència Artificial-CSIC, Campus UAB s/n, 08193, Bellaterra, Catalonia, Spain
Vicenç Torra

Additional information

Josep Domingo-Ferrer: He is a Professor of Computer Science at University Rovira i Virgili, Tarragona, Catalonia, Spain. He received an M.Sc. in Computer Science from the Autonomous University of Barcelona in 1988 (Outstanding Graduation Award) and a Ph.D. in Computer Science in 1991. He also holds an M.Sc. in Mathematics. His research has been devoted to data security in a broad sense, including cryptography and inference control in statistical databases. He has authored over 100 scientific publications and holds two patents. He has led several competitively funded research projects, sponsored by the European Commission, the Spanish government and the U.S. government. He is a Senior Member of IEEE.

Vicenç Torra: He is an Associate Research Professor at the Artificial Intelligence Research Institute (CSIC), Bellaterra, Catalonia, Spain. He received a BSc in 1991, an MSc in 1992 and a PhD in 1991 (in Computer Science). His research is on Information Fusion and Approximate Reasoning (Fuzzy Systems) and applications to Statistical Disclosure Control and Information Retrieval. He has authored more than 40 papers in SCI journals and 60 in other conferences and books. He has edited a book on Information Fusion in Data Mining and written a book on Artificial Intelligence. He has led several (national and international) research projects. He is a Senior Member of IEEE and member of the board (2001-) of the European Society for Fuzzy Logic and Technology.

About this article

Cite this article

Domingo-Ferrer, J., Torra, V. Selecting potentially relevant records using re-identification methods. New Gener Comput 22, 239–252 (2004). https://doi.org/10.1007/BF03040962

Received: 27 August 2002
Revised: 02 May 2003
Issue Date: September 2004
DOI: https://doi.org/10.1007/BF03040962
