On the Declassification of Confidential Documents

  • Daniel Abril
  • Guillermo Navarro-Arribas
  • Vicenç Torra
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6820)


We introduce the anonymization of unstructured documents as a foundation for the automatic declassification of confidential documents. Building on established ideas and methods from data privacy, we present the main issues in anonymizing unstructured documents and propose using named entity recognition techniques, drawn from natural language processing and information extraction, to identify the entities in a document that need to be protected.
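As a rough illustration of the idea of locating entities and then protecting them, the following minimal sketch detects entities with a small hand-made gazetteer and a date pattern, then replaces each one with a placeholder naming its type. It is not the authors' method: the names `GAZETTEER`, `DATE_PATTERN`, `find_entities`, and `anonymize`, the entity labels, and the sample sentence are all assumptions made for this example, and a realistic system would rely on a trained named entity recognizer rather than a fixed word list.

```python
import re

# Hypothetical gazetteer (an assumption for this sketch); a real system
# would use a trained named entity recognition model instead.
GAZETTEER = {
    "PERSON": ["Alice Smith", "Bob Jones"],
    "LOCATION": ["Barcelona", "Lyon"],
    "ORGANIZATION": ["Acme Corp"],
}

# Illustrative pattern for dates written like 12/03/2010.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def find_entities(text):
    """Return a sorted list of (start, end, label) spans found in text."""
    spans = []
    for label, names in GAZETTEER.items():
        for name in names:
            for match in re.finditer(re.escape(name), text):
                spans.append((match.start(), match.end(), label))
    for match in DATE_PATTERN.finditer(text):
        spans.append((match.start(), match.end(), "DATE"))
    return sorted(spans)


def anonymize(text):
    """Replace each detected entity with a placeholder naming its type."""
    pieces = []
    cursor = 0
    for start, end, label in find_entities(text):
        if start < cursor:
            continue  # skip a span that overlaps one already replaced
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    sample = "Alice Smith met Bob Jones in Barcelona on 12/03/2010."
    print(anonymize(sample))
```

Replacing an entity with its type label is the crudest form of protection, since it removes the value entirely. The same detected spans could instead feed a masking method that generalizes or perturbs the values, which is the kind of trade-off between disclosure risk and information loss studied in data privacy.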


Keywords: Privacy, Declassification, Anonymization, Named Entity Recognition





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Daniel Abril (1)
  • Guillermo Navarro-Arribas (2)
  • Vicenç Torra (1)

  1. Institut d’Investigació en Intel·ligència Artificial (IIIA), Consejo Superior de Investigaciones Científicas (CSIC), Spain
  2. Dep. Enginyeria de la Informació i de les Comunicacions (DEIC), Universitat Autònoma de Barcelona (UAB), Spain
