Cross-Reading by Leveraging a Hybrid Index of Heterogeneous Information
In this paper, we present a novel application named Cross-reading, which is derived from the user's reading process. Cross-reading is essentially a search-by-document task over a large-scale text corpus. State-of-the-art approaches use similarity hashing to address this task, modeling it as a similarity search problem over high-dimensional data. However, most approaches consider only a document's lexical information and ignore its semantic information and metadata. Moreover, quickly finding similar hash codes among a massive number of hash codes remains a major bottleneck. To address these problems, we propose a Fast Searching By Document approach, which treats Cross-reading from the perspective of both semantic similarity and time efficiency.
Keywords: Topic-sensitive similarity hash, Hybrid index, HashCode extension, ReRank
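To make the similarity-hashing baseline mentioned in the abstract concrete, the following is a minimal sketch of a purely lexical similarity hash in the style of Charikar's rounding technique, followed by a brute-force Hamming-distance lookup. It is not the topic-sensitive hash or the hybrid index proposed in the paper; the function names (`simhash`, `hamming`, `search`), the 64-bit code length, the use of MD5 as the token hash, and the default radius are all assumptions made for illustration.

```python
import hashlib

BITS = 64  # assumed code length; the abstract does not specify one


def simhash(tokens):
    """Lexical similarity hash: each token votes +1/-1 on every bit position."""
    votes = [0] * BITS
    for tok in tokens:
        # Derive a 64-bit token hash from the first 8 bytes of an MD5 digest.
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    code = 0
    for i, v in enumerate(votes):
        if v > 0:  # keep only the sign of each accumulated vote
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def search(query_code, index, radius=3):
    """Linear scan over {doc_id: code}; returns sorted (distance, doc_id) pairs."""
    hits = [(hamming(query_code, code), doc_id) for doc_id, code in index.items()]
    return sorted(hit for hit in hits if hit[0] <= radius)


# Usage: index two toy documents, then query with a copy of the first one.
docs = {
    "d1": "fast search in hamming space".split(),
    "d2": "latent topic models for text".split(),
}
index = {doc_id: simhash(tokens) for doc_id, tokens in docs.items()}
results = search(simhash(docs["d1"]), index, radius=0)
```

The linear scan in `search` touches every stored code, which is the kind of cost the abstract identifies as a bottleneck at scale. The sketch also uses only token identity, so it illustrates the lexical-only limitation the abstract describes.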