Keyword Search over Web Documents Based on Earth Mover’s Distance

  • Conference paper
Web Information Systems Engineering – WISE 2014 (WISE 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8786)

Abstract

Keyword search is widely used in many practical applications. Unfortunately, most keyword-based search engines compute the similarity distance between two Web documents by matching only the keywords at the same positions in the query and document vectors, without considering the impact of keywords at neighbouring positions. Such an approach usually yields incomplete search results. In this paper, we exploit the Earth Mover’s Distance (EMD) as a distance function, which is more flexible than other distance functions such as Euclidean distance. To overcome the high computational complexity of EMD, we use filtering techniques to minimize the number of actual EMD computations. We further develop a novel lower bound as a new EMD filter for a partial matching technique that is suitable for searching Web documents. The experimental results demonstrate the efficiency of EMD-based search with filtering techniques.
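
The filter-and-refine idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or its novel lower bound: it assumes one-dimensional keyword histograms of equal length, a ground distance of |i - j| between positions, and the well-known centroid lower bound as the filter. All function names (`normalize`, `emd_1d`, `centroid_lower_bound`, `range_search`) and the toy data are hypothetical.

```python
from itertools import accumulate


def normalize(hist):
    """Scale a histogram so that its total mass is 1."""
    total = sum(hist)
    return [h / total for h in hist]


def emd_1d(p, q):
    """Exact EMD for equal-mass 1-D histograms with ground distance |i - j|.

    In this special case the EMD equals the sum of absolute differences
    between the two cumulative distributions.
    """
    cumulative_diff = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(d) for d in cumulative_diff)


def centroid_lower_bound(p, q):
    """Cheap filter: the distance between centroids never exceeds the EMD."""
    centroid_p = sum(i * a for i, a in enumerate(p))
    centroid_q = sum(i * b for i, b in enumerate(q))
    return abs(centroid_p - centroid_q)


def range_search(query, docs, eps):
    """Return documents within EMD eps of the query, plus the EMD call count."""
    q = normalize(query)
    hits, emd_calls = [], 0
    for doc_id, hist in docs.items():
        d = normalize(hist)
        # Filter step: if the lower bound already exceeds eps, so does the EMD.
        if centroid_lower_bound(q, d) > eps:
            continue
        # Refine step: compute the actual EMD only for surviving candidates.
        emd_calls += 1
        dist = emd_1d(q, d)
        if dist <= eps:
            hits.append((doc_id, dist))
    return sorted(hits, key=lambda hit: hit[1]), emd_calls


# Toy data (assumed): mass at position 0 in the query, shifted in the documents.
query = [1, 0, 0, 0]
docs = {"a": [0, 1, 0, 0], "b": [0, 0, 0, 1], "c": [1, 0, 0, 0]}
hits, emd_calls = range_search(query, docs, eps=1.5)
print(hits, emd_calls)  # [('c', 0.0), ('a', 1.0)] 2
```

In this toy example, documents "a" and "b" are equally far from the query under Euclidean distance, whereas EMD ranks "a" closer because its mass sits at a neighbouring position. Document "b" is discarded by the lower bound alone, so only two of the three EMD computations are performed.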

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ma, J., Sheng, Q.Z., Yao, L., Xu, Y., Shemshadi, A. (2014). Keyword Search over Web Documents Based on Earth Mover’s Distance. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8786. Springer, Cham. https://doi.org/10.1007/978-3-319-11749-2_20

  • DOI: https://doi.org/10.1007/978-3-319-11749-2_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11748-5

  • Online ISBN: 978-3-319-11749-2

  • eBook Packages: Computer Science, Computer Science (R0)
