
Robust Hash Algorithms for Text

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNCS, volume 8099)


We discuss and compare robust hash functions for natural-language text, measuring their performance under text modification and natural language watermark embedding. Our goal is to find algorithms that can efficiently recognize watermarked copies of eBooks before watermark detection is run.
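To make the notion of a robust text hash concrete, the following is a minimal sketch of a SimHash-style fingerprint over word shingles. It is an illustration only, not the paper's implementation: the function names, the 64-bit length, the three-word shingle size, and the use of SHA-256 as the per-shingle hash are all assumptions chosen for the example.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64, shingle: int = 3) -> int:
    """SimHash-style fingerprint of a text (illustrative; parameters are assumptions)."""
    # Normalize: lowercase and keep word tokens only, so case and punctuation do not matter.
    words = re.findall(r"\w+", text.lower())
    # Overlapping word n-grams; fall back to the whole token string for very short texts.
    grams = [" ".join(words[i:i + shingle])
             for i in range(len(words) - shingle + 1)] or [" ".join(words)]
    counts = [0] * bits
    for gram in grams:
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        value = int.from_bytes(digest[:bits // 8], "big")
        # Each shingle votes +1 or -1 on every bit position.
        for b in range(bits):
            counts[b] += 1 if (value >> b) & 1 else -1
    # A fingerprint bit is set where the positive votes win.
    return sum(1 << b for b in range(bits) if counts[b] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog near the quiet river bank"
    edited = "the quick brown fox leaps over the lazy dog near the quiet river bank"
    unrelated = "robust hashing supports the identification of watermarked ebook copies"
    print("edited:   ", hamming(simhash(original), simhash(edited)))
    print("unrelated:", hamming(simhash(original), simhash(unrelated)))
```

With a scheme of this kind, a lightly edited copy is expected to stay within a small Hamming distance of the original, while unrelated text tends to differ in roughly half of the bits. A distance threshold for deciding that two texts match would have to be chosen empirically.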


  • Robust Hashing
  • Text Watermarking
  • Evaluation




Copyright information

© 2013 IFIP International Federation for Information Processing

About this paper

Cite this paper

Steinebach, M., Klöckner, P., Reimers, N., Wienand, D., Wolf, P. (2013). Robust Hash Algorithms for Text. In: De Decker, B., Dittmann, J., Kraetzer, C., Vielhauer, C. (eds) Communications and Multimedia Security. CMS 2013. Lecture Notes in Computer Science, vol 8099. Springer, Berlin, Heidelberg.



  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40778-9

  • Online ISBN: 978-3-642-40779-6

  • eBook Packages: Computer Science, Computer Science (R0)