Approaches of Anonymisation of an SMS Corpus

  • Namrata Patel
  • Pierre Accorsi
  • Diana Inkpen
  • Cédric Lopez
  • Mathieu Roche
Conference paper

DOI: 10.1007/978-3-642-37247-6_7

Part of the Lecture Notes in Computer Science book series (LNCS, volume 7816)
Cite this paper as:
Patel N., Accorsi P., Inkpen D., Lopez C., Roche M. (2013) Approaches of Anonymisation of an SMS Corpus. In: Gelbukh A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg

Abstract

This paper presents two anonymisation methods for processing an SMS corpus. The first is an unsupervised approach called Seek&Hide; the implemented system uses several dictionaries and rules to predict whether an SMS needs to be anonymised. The second is a supervised approach based on machine learning techniques. We evaluate the two approaches and propose a way to use them together: an SMS is checked by a human expert only when the two methods disagree on their prediction. This greatly reduces the cost of anonymising the corpus.
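
The combination rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `triage` and the two toy predictors are hypothetical stand-ins for Seek&Hide and the supervised classifier, whose internals are not given here. The sketch only shows the routing logic, in which a message goes to a human expert only when the two predictions differ.

```python
from typing import Callable, Iterable, List, Tuple

# A predictor maps an SMS text to True ("needs anonymisation") or False.
Predictor = Callable[[str], bool]


def triage(
    messages: Iterable[str],
    rule_based: Predictor,
    learned: Predictor,
) -> Tuple[List[Tuple[str, bool]], List[str]]:
    """Split messages into automatically labelled ones and ones for human review."""
    auto_labelled: List[Tuple[str, bool]] = []
    needs_review: List[str] = []
    for sms in messages:
        a = rule_based(sms)
        b = learned(sms)
        if a == b:
            # Both methods agree: accept the shared prediction.
            auto_labelled.append((sms, a))
        else:
            # The methods disagree: defer to a human expert.
            needs_review.append(sms)
    return auto_labelled, needs_review


# Hypothetical toy predictors, for illustration only.
def toy_dictionary_predictor(sms: str) -> bool:
    # Assumption: a tiny first-name dictionary stands in for the real resources.
    first_names = {"marie", "pierre"}
    return any(tok.strip(".,!?").lower() in first_names for tok in sms.split())


def toy_learned_predictor(sms: str) -> bool:
    # Assumption: "capitalised word after the first token" stands in for a trained model.
    return any(tok[:1].isupper() for tok in sms.split()[1:])
```

Under these assumptions, a message on which both toy predictors agree is labelled automatically, and only the disagreements reach the review queue, which is where the claimed cost reduction comes from.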

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Namrata Patel (1)
  • Pierre Accorsi (1)
  • Diana Inkpen (2)
  • Cédric Lopez (3)
  • Mathieu Roche (1)

  1. LIRMM – CNRS, Univ. Montpellier 2, France
  2. Univ. of Ottawa, Canada
  3. Objet Direct – VISEO, France
