Abstract
Secure similar document detection (SSDD) plays an important role in many applications, such as justifying the need-to-know basis and facilitating communication between government agencies. The SSDD problem considers situations where Alice with a query document wants to find similar information from Bob’s document collection. During this process, the content of the query document is not disclosed to Bob, and Bob’s document collection is not disclosed to Alice. Existing SSDD protocols are developed under the vector space model, which has the advantage of identifying global similar information. To effectively and securely detect similar documents with overlapping text fragments, this paper proposes a novel n-gram based SSDD protocol.
Chapter PDF
References
Atallah, M., Bykova, M., Li, J., Frikken, K., Topkara, M.: Private collaborative forecasting and benchmarking. In: Proceedings of the 2004 ACM Workshop on Privacy in the Electronic Society, WPES 2004, pp. 103–114 (October 2004)
Goethals, B., Laur, S., Lipmaa, H., Mielikainen, T.: On secure scalar product computation for privacy-preserving data mining. In: Park, C.-s., Chee, S. (eds.) ICISC 2004. LNCS, vol. 3506, pp. 104–120. Springer, Heidelberg (2005)
Goldreich, O.: General Cryptographic Protocols. In: The Foundations of Cryptography, vol. 2. Cambridge University Press, Cambridge (2004)
Goldreich, O.: Encryption Schemes. In: The Foundations of Cryptography, vol. 2. Cambridge University Press, Cambridge (2004)
Goldwasser, S., Micali, S., Rackoff, C.: The knowledge complexity of interactive proof systems. In: Proceedings of the 17th Annual ACM Symposium on Theory of Computing, Providence, Rhode Island, U.S.A., May 6-8, pp. 291–304 (1985)
Jiang, W., Murugesan, M., Clifton, C., Si, L.: Similar document detection with limited information disclosure. In: Proceedings of the 24th International Conference on Data Engineering (ICDE 2008), Cancun, Mexico, April 7-12 (2008)
Manber, U.: Finding similar files in a large file system. Technical Report TR 93-33, Department of Computer Science, The University of Arizona, Tucson, Arizona (October 1993)
Murugesan, M., Jiang, W., Clifton, C., Si, L., Vaidya, J.: Efficient privacy-preserving similar document detection. The VLDB Journal, January 16 (2010)
Paillier, P.: Public key cryptosystems based on composite degree residuosity classes. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 223–238. Springer, Heidelberg (1999)
Schleimer, S., Wilkerson, D.S., Aiken, A.: Winnowing: Local algorithms for document fingerprinting. In: Proceedings of the ACM SIGMOD Conference on Management of Data, San Diego, California, United States, June 9-12, pp. 76–85. ACM, New York (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 IFIP International Federation for Information Processing
About this paper
Cite this paper
Jiang, W., Samanthula, B.K. (2011). N-Gram Based Secure Similar Document Detection. In: Li, Y. (eds) Data and Applications Security and Privacy XXV. DBSec 2011. Lecture Notes in Computer Science, vol 6818. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22348-8_19
Download citation
DOI: https://doi.org/10.1007/978-3-642-22348-8_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22347-1
Online ISBN: 978-3-642-22348-8
eBook Packages: Computer ScienceComputer Science (R0)