Artificial Intelligence and Law, Volume 22, Issue 2, pp 141–173

Anonymity preserving sequential pattern mining

  • Anna Monreale
  • Dino Pedreschi
  • Ruggero G. Pensa
  • Fabio Pinelli


The increasing availability of personal data of a sequential nature, such as time-stamped transaction or location data, enables increasingly sophisticated sequential pattern mining techniques. However, privacy is at risk if the identity of individuals can be reconstructed from sequential data. It is therefore important to develop privacy-preserving techniques that support the publication of truly anonymous data without significantly altering the analysis results. In this paper, we apply the privacy-by-design paradigm to design a technological framework that counters the undesirable and unlawful effects of privacy violations on sequence data, without obstructing the knowledge discovery opportunities of data mining technologies. First, we introduce a k-anonymity framework for sequence data by defining the sequence linking attack model and its associated countermeasure, a k-anonymity notion for sequence datasets, which provides formal protection against the attack. Second, we instantiate this framework and provide a specific method for constructing the k-anonymous version of a sequence dataset. This method preserves the results of sequential pattern mining, together with several basic statistics and other analytical properties of the original data, including the clustering structure. A comprehensive experimental study on realistic datasets of process logs, web logs and GPS tracks empirically shows how our proposed method combines privacy protection with analytical utility.
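To make the idea of a sequence linking attack more concrete, the following is a minimal sketch in Python. It is not the method proposed in the paper. It assumes, for illustration only, that an attacker knows an ordered fragment of a victim's sequence and counts how many sequences in the published dataset contain that fragment, and that a fragment is risky when it matches at least one but fewer than k sequences. The function names (`is_subsequence`, `support`, `is_harmful`) and the toy dataset are hypothetical.

```python
from typing import Hashable, Iterable, Sequence


def is_subsequence(pattern: Sequence[Hashable], sequence: Iterable[Hashable]) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator until it finds `item`,
    # so later items can only match at later positions.
    return all(item in it for item in pattern)


def support(pattern: Sequence[Hashable], dataset: Iterable[Sequence[Hashable]]) -> int:
    """Count the sequences in `dataset` that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, seq) for seq in dataset)


def is_harmful(pattern: Sequence[Hashable], dataset: Sequence[Sequence[Hashable]], k: int) -> bool:
    """Assumed risk criterion: the pattern matches some, but fewer than k, sequences."""
    return 0 < support(pattern, dataset) < k


# Toy dataset (hypothetical): each tuple is one individual's sequence of events.
dataset = [
    ("A", "B", "C"),
    ("A", "B", "D"),
    ("A", "C", "D"),
    ("B", "C", "D"),
]

for pattern in [("A", "B"), ("A", "B", "C"), ("D", "A")]:
    print(pattern, support(pattern, dataset), is_harmful(pattern, dataset, k=2))
```

Under this assumed criterion, a published dataset would offer protection against the attack for a given k only if no fragment an attacker could plausibly know is flagged as harmful. Enumerating all candidate fragments is expensive in general, so this sketch only checks fragments supplied by the caller.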


Privacy-by-design, Sequence data, k-anonymity



Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  • Anna Monreale (1, 2), corresponding author
  • Dino Pedreschi (1, 2)
  • Ruggero G. Pensa (3)
  • Fabio Pinelli (4)

  1. University of Pisa, Pisa, Italy
  2. KDD Lab, ISTI-CNR, Pisa, Italy
  3. Department of Computer Science, University of Torino, Turin, Italy
  4. IBM Research, Dublin, Ireland
