
N-Gram Based Secure Similar Document Detection

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 6818)


Abstract

Secure similar document detection (SSDD) plays an important role in many applications, such as justifying the need-to-know basis and facilitating communication between government agencies. The SSDD problem considers situations where Alice, who holds a query document, wants to find similar information in Bob’s document collection. During this process, the content of the query document is not disclosed to Bob, and Bob’s document collection is not disclosed to Alice. Existing SSDD protocols are developed under the vector space model, which has the advantage of identifying globally similar information. To effectively and securely detect similar documents that share overlapping text fragments, this paper proposes a novel n-gram based SSDD protocol.
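The contrast between the vector space model and n-gram matching can be illustrated in plaintext. The sketch below is not the paper's secure protocol: it performs no encryption and reveals both texts to whoever runs it. The function names, the use of word-level trigrams, and the containment measure are illustrative assumptions. It only shows the kind of similarity an n-gram based SSDD protocol would need to compute without disclosing either party's documents.

```python
import math
from collections import Counter


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams (overlapping word windows) in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(query: set, doc: set) -> float:
    """Fraction of the query's n-grams that also occur in the document.

    Assumed measure for fragment detection; returns 0.0 for an empty query.
    """
    return len(query & doc) / len(query) if query else 0.0


def cosine_tf(text_a: str, text_b: str) -> float:
    """Cosine similarity of raw term-frequency vectors (a vector space model)."""
    ta = Counter(text_a.lower().split())
    tb = Counter(text_b.lower().split())
    dot = sum(count * tb[word] for word, count in ta.items())
    norm_a = math.sqrt(sum(v * v for v in ta.values()))
    norm_b = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Hypothetical query fragment embedded in a longer document.
    query = "alice wants to find similar text"
    doc = "bob keeps many files and alice wants to find similar text in them today"
    print(containment(word_ngrams(query), word_ngrams(doc)))  # 1.0
    print(round(cosine_tf(query, doc), 3))                    # 0.655

    # Same words in reverse order: identical term vector, no shared trigrams.
    shuffled = "text similar find to wants alice"
    print(containment(word_ngrams(query), word_ngrams(shuffled)))  # 0.0
```

In this toy example, the copied fragment is fully contained in the longer document even though the global term-vector similarity is only moderate, while a word-reordered text scores a perfect cosine similarity yet shares no trigrams with the query.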


Keywords

  • privacy
  • security
  • n-gram




Copyright information

© 2011 IFIP International Federation for Information Processing

About this paper

Cite this paper

Jiang, W., Samanthula, B.K. (2011). N-Gram Based Secure Similar Document Detection. In: Li, Y. (ed.) Data and Applications Security and Privacy XXV. DBSec 2011. Lecture Notes in Computer Science, vol 6818. Springer, Berlin, Heidelberg.

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22347-1

  • Online ISBN: 978-3-642-22348-8

  • eBook Packages: Computer Science, Computer Science (R0)