Distance Based Re-identification for Time Series, Analysis of Distances

  • Jordi Nin
  • Vicenç Torra
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4302)

Abstract

Record linkage is a technique for linking records from different files or databases that correspond to the same entity. Standard record linkage methods need the files to have some variables in common. Typically, variables are either numerical or categorical. These variables are the basis for permitting such linkage.

In this paper we study the case in which the files to be linked consist of numerical time series rather than numerical variables. We study some extensions of distance-based record linkage that take advantage of this kind of data.
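As a rough illustration of the idea, distance-based record linkage can be sketched as a nearest-neighbour search: each series in one file is linked to the closest series in the other file. The sketch below is not the authors' method. It assumes equal-length series and uses the plain Euclidean distance as a stand-in for whichever time series distance is chosen. The names `euclidean` and `link_records` and the toy data are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length numerical time series."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def link_records(file_a, file_b, distance=euclidean):
    """For each series in file_b, return the index of its nearest series in file_a.

    `distance` is any function taking two series and returning a number,
    so another time series distance can be swapped in.
    """
    links = []
    for b in file_b:
        nearest = min(range(len(file_a)), key=lambda i: distance(file_a[i], b))
        links.append(nearest)
    return links


# Hypothetical toy data: three original series and a perturbed copy of each,
# stored in the same order so that position j in `masked` matches position j
# in `original`.
original = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 9.0, 8.0, 7.0],
    [5.0, 5.0, 5.0, 5.0],
]
masked = [
    [1.1, 2.2, 2.9, 4.1],
    [9.5, 9.2, 8.1, 6.8],
    [5.2, 4.9, 5.1, 5.0],
]

links = link_records(original, masked)
# A link is correct when the masked series is matched to its own original.
correct = sum(1 for j, i in enumerate(links) if i == j)
print(links, correct)
```

Passing a different function as `distance` changes the linkage criterion without touching the linking loop, which is one simple way to compare distances on the same pair of files.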

Keywords

Re-identification algorithms, time series, privacy, statistical databases, time series distances, record linkage

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jordi Nin (1)
  • Vicenç Torra (2)
  1. DAMA-UPC, Computer Architecture Dept., Universitat Politècnica de Catalunya, Barcelona, Catalonia, Spain
  2. IIIA-CSIC, Bellaterra, Catalonia, Spain
