Cross-Reading by Leveraging a Hybrid Index of Heterogeneous Information

  • Shansong Yang
  • Weiming Lu
  • Baogang Wei
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 285)


In this paper, we present a novel application named Cross-reading, which is derived from the user’s reading process. Cross-reading is essentially a search-by-document task over a large-scale text corpus. State-of-the-art approaches use similarity hashing to address this task by modeling it as a high-dimensional similarity search problem. However, most approaches consider only a document’s lexical information and ignore its semantic information and metadata. Moreover, quickly finding similar hash codes among a massive number of hash codes remains a major bottleneck. To address these problems, we propose a Fast Searching By Document approach, which treats Cross-reading from the perspective of both semantic similarity and time efficiency.
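To make the similarity-hashing baseline mentioned above concrete, the following is a minimal sketch of a simhash-style fingerprint with a brute-force Hamming-distance search. It is not the Fast Searching By Document approach proposed in the paper: it uses lexical tokens only, and the function names (`simhash`, `hamming`, `search`), the whitespace tokenizer, and the use of MD5 as the per-token hash are all assumptions made for illustration.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def simhash(text: str, bits: int = 64) -> int:
    """Fingerprint a document so that similar token multisets give nearby codes.

    Assumes `bits` is a multiple of 8 and at most 128 (the MD5 digest size).
    """
    weights = Counter(text.lower().split())  # token -> term frequency
    totals = [0] * bits
    for token, weight in weights.items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each token votes +weight or -weight on every bit position.
            totals[i] += weight if (h >> i) & 1 else -weight
    # A bit is set in the fingerprint when its votes sum to a positive value.
    return sum(1 << i for i in range(bits) if totals[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def search(query: str, corpus: Dict[str, str], k: int = 3) -> List[Tuple[str, int]]:
    """Return the k documents whose fingerprints are closest to the query's."""
    q = simhash(query)
    scored = [(doc_id, hamming(q, simhash(body))) for doc_id, body in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]
```

The `search` function here scans every fingerprint linearly, which is exactly the cost that becomes a bottleneck on a massive collection of hash codes; an index over the codes would replace that scan.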


Topic-sensitive similarity hash, Hybrid index, HashCode extension, ReRank





Copyright information

© Springer Science+Business Media Singapore 2014

Authors and Affiliations

  1. Zhejiang University, Hangzhou, People’s Republic of China
