New Approach to the Re-identification Problem Using Neural Networks

  • Jordi Nin
  • Vicenç Torra
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3885)

Abstract

Schema and record matching are tools for integrating files or databases. Record linkage is one such tool: it links records that belong to different files but correspond to the same individual.

Standard record linkage methods apply when the records of both files are described using the same variables. Non-standard methods cover other settings, one of which is the case where the files are not described using the same variables.

In this paper we study record linkage for non-common variables. In particular, we use a supervised approach based on neural networks: a neural network learns the relationships between the variables of the two files, and these relationships are then used to translate the information in the domain of one file into the domain of the other file.
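
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the network size, the training procedure, the synthetic data, and the nearest-neighbour linkage step are all assumptions made for illustration, and the helper names (train_mlp, translate, link, run_demo) are hypothetical. A small network is trained on record pairs whose correspondence is known, maps the variables of file A into the domain of file B, and each translated record is then linked to its closest record in B.

```python
import numpy as np


def train_mlp(X, T, hidden=8, lr=0.05, epochs=3000, seed=0):
    """Fit a one-hidden-layer tanh network mapping X to T (batch gradient descent)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, T.shape[1]))
    b2 = np.zeros(T.shape[1])
    n = X.shape[0]
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # hidden activations
        err = H @ W2 + b2 - T               # prediction error
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the squared error, up to a constant factor.
        dH = (err @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return (W1, b1, W2, b2), losses


def translate(params, X):
    """Map records from the domain of file A into the domain of file B."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2


def link(translated, B):
    """For each translated record, return the index of the nearest record in B."""
    dist = np.linalg.norm(translated[:, None, :] - B[None, :, :], axis=2)
    return dist.argmin(axis=1)


def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    # Synthetic data (assumption): file B's variables depend on file A's.
    A = rng.normal(size=(200, 3))
    B = np.column_stack([A[:, 0] + A[:, 1],
                         0.5 * A[:, 1] - A[:, 2],
                         np.tanh(A[:, 0])])
    B += 0.05 * rng.normal(size=B.shape)

    # Supervised step: the first 150 pairs have known correspondence.
    params, losses = train_mlp(A[:150], B[:150], seed=seed)

    # Shuffle the held-out B records so row order carries no information.
    perm = rng.permutation(50)
    B_test = B[150:][perm]
    links = link(translate(params, A[150:]), B_test)

    true_pos = np.argsort(perm)             # inverse permutation = correct links
    accuracy = float(np.mean(links == true_pos))
    return accuracy, losses, links


accuracy, losses, links = run_demo()
print(f"fraction of held-out records correctly linked: {accuracy:.2f}")
```

In this sketch the supervised part is confined to train_mlp; the linkage itself is a plain distance-based match in the domain of file B, so any standard record linkage method could replace link once the records have been translated.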

Keywords

Database integration, record linkage, re-identification algorithms, neural networks, data mining, information fusion, information privacy

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jordi Nin (1)
  • Vicenç Torra (1)
  1. IIIA-CSIC, Bellaterra (Catalonia), Spain
