Abstract
A lexical signature of a web page consists of several keywords carefully chosen from the page and is used to generate a robust hyperlink that can find the page when its URL fails. In this paper, we propose a novel method based on WordRank to compute lexical signatures. The method takes into account the semantic relatedness between words and chooses the most representative and salient words as the lexical signature. Experiments show that DF-based lexical signatures are best at uniquely identifying web pages, that hybrid lexical signatures are good candidates for retrieving the desired web pages, and that WordRank-based lexical signatures are best for retrieving highly relevant web pages when the desired web page cannot be extracted.
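To make the idea of a rank-based lexical signature concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give WordRank's details, so the sketch assumes a PageRank-style iteration over a word graph and substitutes simple co-occurrence counts for the semantic relatedness measure. The names `STOPWORDS`, `rank_words`, and `lexical_signature`, and the parameters `window`, `damping`, and `iterations`, are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop list (an assumption for illustration).
STOPWORDS = {"the", "and", "for", "its", "that", "this", "when", "with"}


def rank_words(text, window=3, damping=0.85, iterations=50):
    """Score words with a PageRank-style iteration over a co-occurrence graph.

    Assumption: edge weights are co-occurrence counts inside a sliding
    window, standing in for a semantic relatedness measure.
    """
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if len(w) > 2 and w not in STOPWORDS]

    # Build an undirected weighted graph: weights[a][b] == weights[b][a].
    weights = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for v in words[i + 1:i + window]:
            if v != w:
                weights[w][v] += 1.0
                weights[v][w] += 1.0

    vocab = sorted(weights)
    total = {w: sum(weights[w].values()) for w in vocab}
    scores = {w: 1.0 for w in vocab}

    for _ in range(iterations):
        updated = {}
        for w in vocab:
            # Each neighbour passes on its score in proportion to edge weight.
            incoming = sum(weights[v][w] / total[v] * scores[v]
                           for v in weights[w])
            updated[w] = (1.0 - damping) + damping * incoming
        scores = updated
    return scores


def lexical_signature(text, k=5):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = rank_words(text)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]
```

Calling `lexical_signature(page_text, k=5)` yields up to five candidate query terms that could be appended to a URL or submitted to a search engine. A DF-based or hybrid signature would instead need document frequencies from a corpus or search engine, which this sketch does not model.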
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wan, X., Yang, J. (2006). WordRank-Based Lexical Signatures for Finding Lost or Related Web Pages. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_83
DOI: https://doi.org/10.1007/11610113_83
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31142-3
Online ISBN: 978-3-540-32437-9
eBook Packages: Computer Science (R0)