Abstract
Blog users enjoy tagging as a way to organize documents better, but ambiguous tags reduce the accuracy of tag-based applications such as retrieval, visualization, and trend discovery. Because tag meanings change over time, current word sense disambiguation (WSD) methods are not applicable. In this paper, we propose an unsupervised method for disambiguating tags in blogs. We first cluster the tags by their context words using spectral clustering. We then compare a tag with these clusters to find its most suitable meaning. We measure word similarity with the Normalized Google Distance, which can be computed by querying search engines and thus reflects the up-to-date meanings of words. Our method requires no human labeling effort and no dictionary. An evaluation on crawled blog data showed a promising micro-average precision of 0.842.
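The similarity measure and the cluster-comparison step can be illustrated with a minimal sketch. It uses the standard Normalized Google Distance formula of Cilibrasi and Vitányi, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f gives search-engine hit counts and N is the number of indexed pages. Everything else is an assumption made for illustration and is not taken from the paper: the function names, the averaging rule used to compare a context with a cluster, and all hit counts, which are invented toy numbers rather than real search results.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: hit counts for each word alone; fxy: hits for both together;
    n: total number of indexed pages.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return float("inf")  # the words never co-occur (or are unseen)
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

def closest_cluster(context, clusters, counts, pair_counts, n):
    """Return the label of the cluster with the lowest mean NGD to the context.

    Assumption: a tag occurrence is represented by its context words, and a
    cluster is scored by the average NGD over all (context word, cluster word)
    pairs. The paper's actual comparison rule may differ.
    """
    def mean_distance(words):
        pairs = [(c, w) for c in context for w in words]
        total = sum(
            ngd(counts[c], counts[w], pair_counts.get(frozenset((c, w)), 0), n)
            for c, w in pairs
        )
        return total / len(pairs)

    return min(clusters, key=lambda label: mean_distance(clusters[label]))

# Toy data (invented): two sense clusters for the ambiguous tag "apple".
N = 1_000_000
counts = {"laptop": 6000, "iphone": 8000, "mac": 8000, "juice": 5000, "pie": 5000}
pair_counts = {
    frozenset(("laptop", "iphone")): 2000,
    frozenset(("laptop", "mac")): 3000,
    frozenset(("laptop", "juice")): 10,
    frozenset(("laptop", "pie")): 20,
}
clusters = {"company": ["iphone", "mac"], "fruit": ["juice", "pie"]}

# A blog post tagged "apple" whose context contains the word "laptop".
sense = closest_cluster(["laptop"], clusters, counts, pair_counts, N)
```

In practice the hit counts would come from search-engine queries, which is what keeps the distances in step with current usage. The sense clusters would come from the spectral clustering step rather than being written by hand as they are here.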
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Si, X., Sun, M. (2009). Disambiguating Tags in Blogs. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2009. Lecture Notes in Computer Science, vol 5729. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04208-9_22
DOI: https://doi.org/10.1007/978-3-642-04208-9_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04207-2
Online ISBN: 978-3-642-04208-9
eBook Packages: Computer Science, Computer Science (R0)