Disambiguating Tags in Blogs

  • Xiance Si
  • Maosong Sun
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5729)


Blog users enjoy tagging as a way to organize their documents, but ambiguous tags lead to inaccuracy in tag-based applications such as retrieval, visualization, and trend discovery. Because tag meanings change dynamically, current word sense disambiguation (WSD) methods are not applicable. In this paper, we propose an unsupervised method for disambiguating tags in blogs. We first cluster the tags by their context words using Spectral Clustering. We then compare a tag with these clusters to find its most suitable meaning. We use the Normalized Google Distance to measure word similarity; it can be computed by querying search engines and thus reflects the up-to-date meanings of words. Our method requires no human labeling effort and no dictionary. An evaluation on crawled blog data showed a promising micro-averaged precision of 0.842.
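
The Normalized Google Distance, as defined by Cilibrasi and Vitányi, turns page hit counts into a distance between two words: NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}), where f(x) is the number of pages containing x, f(x, y) the number containing both words, and N the total number of indexed pages. Below is a minimal sketch of that formula, assuming the hit counts have already been obtained; the search-engine querying step is not shown, and the function name and example counts are hypothetical.

```python
import math


def normalized_google_distance(f_x: float, f_y: float, f_xy: float, n: float) -> float:
    """Normalized Google Distance from page hit counts.

    NGD(x, y) = (max(log f_x, log f_y) - log f_xy) / (log n - min(log f_x, log f_y))

    f_x, f_y -- number of pages containing term x / term y
    f_xy     -- number of pages containing both terms
    n        -- total number of indexed pages (must exceed f_x and f_y)
    """
    if f_x <= 0 or f_y <= 0:
        raise ValueError("each term must occur on at least one page")
    if n <= max(f_x, f_y):
        raise ValueError("n must exceed both individual hit counts")
    if f_xy <= 0:
        return float("inf")  # the terms never co-occur: distance is unbounded
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    numerator = max(log_x, log_y) - log_xy
    denominator = math.log(n) - min(log_x, log_y)  # positive, given the check above
    return numerator / denominator
```

The distance is 0 when the two words always appear together and grows as they co-occur less. For instance, with hypothetical counts f(x) = 1000, f(y) = 100, f(x, y) = 10 and N = 10^6, the function returns 0.5 (up to floating-point rounding), because (log 1000 - log 10) / (log 10^6 - log 100) = 2/4 in any logarithm base.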

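The clustering step can be illustrated with a minimal sketch of normalized spectral clustering, restricted to two clusters for brevity. It follows the Ng-Jordan-Weiss construction of the normalized affinity N = D^-1/2 A D^-1/2, but where their k-way algorithm takes the top k eigenvectors, row-normalizes them, and runs k-means, this simplified variant only computes the second eigenvector (by power iteration) and splits the items by its sign. The abstract does not say how many clusters are used or how the affinities are built, so `affinity` is a hypothetical, already-computed symmetric matrix of non-negative similarities (for example between the context words of one tag), and `spectral_bipartition` is an illustrative name.

```python
import math
import random
from typing import List, Sequence


def spectral_bipartition(affinity: Sequence[Sequence[float]],
                         iterations: int = 500, seed: int = 0) -> List[int]:
    """Split n items into two groups (labels 0/1) from a symmetric affinity matrix.

    Simplified two-way spectral clustering: approximate the second eigenvector of
    N = D^-1/2 A D^-1/2 by power iteration and split the items by its sign.
    """
    n = len(affinity)
    degree = [sum(row) for row in affinity]
    if any(d <= 0 for d in degree):
        raise ValueError("every item needs a positive total affinity")
    inv_sqrt = [1.0 / math.sqrt(d) for d in degree]
    # Normalized affinity N[i][j] = A[i][j] / sqrt(d_i * d_j).
    norm = [[affinity[i][j] * inv_sqrt[i] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]

    # The top eigenvector of N (eigenvalue 1) is proportional to sqrt(d_i).
    top = [math.sqrt(d) for d in degree]
    top_len = math.sqrt(sum(t * t for t in top))
    top = [t / top_len for t in top]

    rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        # Deflation: remove the component along the known top eigenvector.
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        # Apply (N + I) / 2; its eigenvalues lie in [0, 1], so the iteration
        # converges toward the second-largest eigenvector of N.
        v = [(sum(norm[i][j] * v[j] for j in range(n)) + v[i]) / 2.0
             for i in range(n)]
        length = math.sqrt(sum(a * a for a in v))
        if length == 0.0:
            break
        v = [a / length for a in v]
    # Rescaling by D^-1/2 would not change any sign, so split on the sign of v.
    return [0 if x >= 0 else 1 for x in v]
```

Given six items where items 0-2 are strongly similar to each other, items 3-5 likewise, and the two groups only weakly similar, the function puts 0-2 in one group and 3-5 in the other; which group receives label 0 is arbitrary. Power iteration keeps the sketch free of dependencies; with a linear-algebra library at hand, a symmetric eigensolver would be the more usual choice.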

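The second step, comparing one occurrence of a tag with the clusters, and the micro-averaged precision used to report results can be sketched as follows. The abstract does not give the comparison rule, so scoring a cluster by the mean pairwise similarity between the occurrence's context words and the cluster's words is an assumption, as are the function names, the toy clusters, and the counts. In the paper the similarity is based on NGD, which is a distance, so it would first have to be mapped to a higher-is-better score; a toy exact-match similarity stands in for it here. Micro-averaging is assumed to mean the usual pooled definition.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple


def pick_sense(context: Sequence[str],
               clusters: Mapping[str, Sequence[str]],
               similarity: Callable[[str, str], float]) -> str:
    """Return the id of the cluster whose words best match the context words.

    Scoring rule (an assumption): mean pairwise similarity between every
    context word and every word in the cluster; higher is better.
    """
    best_id: Optional[str] = None
    best_score = float("-inf")
    for cluster_id, words in clusters.items():
        scores = [similarity(c, w) for c in context for w in words]
        if not scores:
            continue  # empty context or empty cluster: nothing to compare
        score = sum(scores) / len(scores)
        if score > best_score:
            best_id, best_score = cluster_id, score
    if best_id is None:
        raise ValueError("no non-empty cluster to compare against")
    return best_id


def micro_average_precision(results: Mapping[str, Tuple[int, int]]) -> float:
    """Pool (correct, judged) counts over all tags, then divide once."""
    correct = sum(c for c, _ in results.values())
    judged = sum(j for _, j in results.values())
    return correct / judged
```

With hypothetical clusters {"fruit": ["juice", "pie", "orchard"], "company": ["iphone", "mac", "ipod"]} and an exact-match similarity, the context ["mac", "store", "iphone"] is assigned to "company" (score 2/9 versus 0). With hypothetical counts {"apple": (9, 10), "java": (1, 2)}, the micro average is 10/12, about 0.833, whereas averaging the per-tag precisions (a macro average) would give (0.9 + 0.5) / 2 = 0.7; the micro average weights each tag by how many of its occurrences were judged.
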
Keywords (added by machine, not by the authors; experimental and subject to change): Spectral Cluster, Word Sense Disambiguation, Unsupervised Method, Context Word, Parallel Corpus


Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Xiance Si, Tsinghua University, Beijing, China
  • Maosong Sun, Tsinghua University, Beijing, China