Selecting potentially relevant records using re-identification methods

  • Regular Papers
New Generation Computing

Abstract

This work proposes re-identification algorithms to select records that are interesting because they may supply new information. Instead of focusing on re-identified records, we focus on non-re-identified (non-linked) records, as they are the ones that potentially supply new and relevant information. Moreover, these relevant characteristics can correspond to chances for improving the knowledge of a system.

To evaluate our approach, we applied it to an example using publicly available data from the UCI repository. We used the ionosphere database to build a re-identification problem for 35 non-common variables.

We show that the use of a simple heuristic rule base can effectively select potentially interesting records.
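The idea of keeping the records that fail to link can be illustrated with a minimal sketch. It is not the authors' method: the paper works with non-common variables and a heuristic rule base, neither of which is specified in this abstract. The sketch assumes instead that both files share the same numeric variables and that a record counts as linked when its nearest neighbour in the other file lies within a fixed Euclidean distance. The names `nearest`, `select_non_linked`, `file_a`, `file_b` and the threshold value are all hypothetical.

```python
import math


def nearest(record, candidates):
    """Return (index, distance) of the candidate closest to record.

    Assumption: records are equal-length tuples of numbers and
    Euclidean distance is an acceptable similarity measure.
    """
    best_i, best_d = -1, math.inf
    for i, cand in enumerate(candidates):
        d = math.dist(record, cand)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def select_non_linked(file_a, file_b, threshold):
    """Return indices of records in file_b that could not be linked.

    A record is treated as non-linked (hypothetical rule) when its
    nearest record in file_a is farther away than threshold.
    """
    selected = []
    for j, rec in enumerate(file_b):
        _, d = nearest(rec, file_a)
        if d > threshold:
            selected.append(j)
    return selected


# Toy data, invented for illustration only.
file_a = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
file_b = [(0.1, -0.1), (5.2, 4.9), (9.0, 0.0)]

non_linked = select_non_linked(file_a, file_b, threshold=0.5)
print(non_linked)  # [2]
```

In this toy example the first two records of `file_b` each have a close counterpart in `file_a`, so only the third is selected as potentially interesting. A real setting would need standardized variables and a threshold chosen from the data.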

Author information

Josep Domingo-Ferrer: He is a Professor of Computer Science at University Rovira i Virgili, Tarragona, Catalonia, Spain. He received an M.Sc. in Computer Science from the Autonomous University of Barcelona in 1988 (Outstanding Graduation Award) and a Ph.D. in Computer Science in 1991. He also holds an M.Sc. in Mathematics. His research has been devoted to data security in a broad sense, including cryptography and inference control in statistical databases. He has authored over 100 scientific publications and holds two patents. He has led several competitively funded research projects, sponsored by the European Commission, the Spanish government and the U.S. government. He is a Senior Member of IEEE.

Vicenç Torra: He is an Associate Research Professor at the Artificial Intelligence Research Institute (CSIC), Bellaterra, Catalonia, Spain. He received a BSc in 1991, an MSc in 1992 and a PhD in 1991 (in Computer Science).

His research is on Information Fusion and Approximate Reasoning (Fuzzy Systems) and their applications to Statistical Disclosure Control and Information Retrieval. He has authored more than 40 papers in SCI journals and 60 in other conferences and books. He has edited a book on Information Fusion in Data Mining and written a book on Artificial Intelligence. He has led several national and international research projects. He is a Senior Member of IEEE and a member of the board (2001-) of the European Society for Fuzzy Logic and Technology.

About this article

Cite this article

Domingo-Ferrer, J., Torra, V. Selecting potentially relevant records using re-identification methods. New Gener Comput 22, 239–252 (2004). https://doi.org/10.1007/BF03040962

  • DOI: https://doi.org/10.1007/BF03040962
