Knowledge and Information Systems, Volume 45, Issue 3, pp 705–730

The Mask of ZoRRo: preventing information leakage from documents

  • Prasad M. Deshpande
  • Salil Joshi
  • Prateek Dewan
  • Karin Murthy
  • Mukesh Mohania
  • Sheshnarayan Agrawal
Regular Paper

DOI: 10.1007/s10115-014-0811-6

Cite this article as:
Deshpande, P.M., Joshi, S., Dewan, P. et al. Knowl Inf Syst (2015) 45: 705. doi:10.1007/s10115-014-0811-6

Abstract

In today’s enterprise world, information about business entities such as a customer’s or patient’s name, address, and social security number is often present in both relational databases and content repositories. In databases, information about such business entities is generally well protected by well-defined, fine-grained access control. However, current document retrieval systems do not provide user-specific, fine-grained redaction of documents to prevent leakage of information about business entities from documents. This leaves companies with only two choices: either provide complete access to a document, risking potential information leakage, or prohibit access to the document altogether, accepting a potentially negative impact on business processes. In this paper, we present ZoRRo, an add-on for document retrieval systems that dynamically redacts sensitive information about business entities referenced in a document, based on the access control defined for those entities. ZoRRo exploits database systems’ fine-grained, label-based access-control mechanism to identify and redact sensitive information from unstructured text, based on the access privileges of the user viewing it. To make on-the-fly redaction feasible, ZoRRo exploits the concept of \(k\)-safety in combination with Lucene-based indexing and scoring. We demonstrate the efficiency and effectiveness of ZoRRo through a detailed experimental study.

Keywords

Sanitization, Redaction, Security and protection

Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  • Prasad M. Deshpande (1)
  • Salil Joshi (1)
  • Prateek Dewan (3)
  • Karin Murthy (1)
  • Mukesh Mohania (2)
  • Sheshnarayan Agrawal (4)

  1. IBM Research, Bangalore, India
  2. IBM Research, Delhi, India
  3. Indraprastha Institute of Information Technology, Delhi, India
  4. IBM, Bangalore, India