Batch Text Similarity Search with MapReduce
Cite this paper as: Li R., Ju L., Peng Z., Yu Z., Wang C. (2011) Batch Text Similarity Search with MapReduce. In: Du X., Fan W., Wang J., Peng Z., Sharaf M.A. (eds) Web Technologies and Applications. APWeb 2011. Lecture Notes in Computer Science, vol 6612. Springer, Berlin, Heidelberg
Batch text similarity search finds the texts that are similar to each query in a user-submitted batch of text queries. It is widely used in practice, for example in plagiarism detection, and is attracting growing attention as the volume of text on the web increases. Existing approaches, such as FuzzyJoin, support neither varying similarity thresholds nor online batch text similarity search. This paper proposes a two-stage algorithm, based on inverted index structures, that effectively solves the batch text similarity search problem. Experimental results on real datasets demonstrate the efficiency and scalability of the method.
Keywords: MapReduce, Batch Text Similarity Search
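The abstract does not describe the two stages in detail, so the following is only a generic, single-machine illustration of how an inverted index can serve a batch of similarity queries with a threshold chosen at query time. It is not the paper's algorithm and does not use MapReduce. The function names (`tokenize`, `build_index`, `batch_search`), whitespace tokenization, and Jaccard similarity are all assumptions made for this sketch.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokens, treated as a set.
    return set(text.lower().split())


def build_index(docs):
    """Offline step: map each token to the ids of documents containing it.

    docs is a dict of doc_id -> text. Also records each document's
    token-set size, which is needed later to compute Jaccard similarity.
    """
    index = defaultdict(set)
    sizes = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        sizes[doc_id] = len(tokens)
        for tok in tokens:
            index[tok].add(doc_id)
    return index, sizes


def batch_search(queries, index, sizes, threshold):
    """Online step: answer a batch of queries against a prebuilt index.

    The threshold is a query-time parameter, so it can change between
    batches without rebuilding the index.
    """
    results = {}
    for query_id, text in queries.items():
        q_tokens = tokenize(text)
        # Count shared tokens per candidate document via the index.
        overlap = defaultdict(int)
        for tok in q_tokens:
            for doc_id in index.get(tok, ()):
                overlap[doc_id] += 1
        hits = []
        for doc_id, inter in overlap.items():
            union = len(q_tokens) + sizes[doc_id] - inter
            sim = inter / union  # Jaccard similarity
            if sim >= threshold:
                hits.append((doc_id, sim))
        # Most similar first; ties broken by document id.
        results[query_id] = sorted(hits, key=lambda h: (-h[1], h[0]))
    return results
```

Because only documents that share at least one token with a query are ever scored, the index avoids comparing every query against every document. In a MapReduce setting, the index build and the per-query candidate counting would each be distributed across workers, but how the paper does this is not stated in the abstract.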