The VLDB Journal, Volume 23, Issue 2, pp 201–226

An expressive framework and efficient algorithms for the analysis of collaborative tagging

  • Mahashweta Das
  • Saravanan Thirumuruganathan
  • Sihem Amer-Yahia
  • Gautam Das
  • Cong Yu
Special Issue Paper


The rise of Web 2.0 is signaled by sites such as Flickr and YouTube, and social tagging is essential to their success. A typical tagging action involves three components: user, item (e.g., photos in Flickr), and tags (i.e., words or phrases). Analyzing how tags are assigned by certain users to certain items has important implications in helping users search for desired information. In this paper, we develop a dual mining framework to explore tagging behavior. This framework is centered around two opposing measures, similarity and diversity, applied to one or more tagging components, and therefore enables a wide range of analysis scenarios, such as characterizing similar users tagging diverse items with similar tags or diverse users tagging similar items with diverse tags. By adopting different concrete measures for similarity and diversity in the framework, we show that a wide range of concrete analysis problems can be defined and that they are NP-Complete in general. We design four sets of efficient algorithms for solving many of those problems and demonstrate, through comprehensive experiments over real data, that our algorithms significantly outperform the exact brute-force approach without compromising analysis result quality.
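As a rough illustration of the two opposing measures, the sketch below models a tagging action as a (user, item, tags) triple and scores a set of actions by the mean pairwise similarity and diversity of their tag sets. The abstract does not name the concrete measures used in the paper, so Jaccard similarity is an assumption made here for illustration only, and the names `TaggingAction`, `jaccard`, `mean_pairwise_similarity`, and `mean_pairwise_diversity` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, NamedTuple


class TaggingAction(NamedTuple):
    """One tagging action: a user assigns a set of tags to an item."""
    user: str
    item: str
    tags: FrozenSet[str]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two tag sets (assumed measure, not from the paper)."""
    union = a | b
    # Two empty sets are treated as identical.
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_similarity(tag_sets: List[FrozenSet[str]]) -> float:
    """Average Jaccard similarity over all unordered pairs of tag sets."""
    pairs = list(combinations(tag_sets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def mean_pairwise_diversity(tag_sets: List[FrozenSet[str]]) -> float:
    """Diversity taken as the complement of similarity (an assumption)."""
    return 1.0 - mean_pairwise_similarity(tag_sets)


# Toy data, invented for this example.
actions = [
    TaggingAction("alice", "photo1", frozenset({"sunset", "beach"})),
    TaggingAction("bob", "photo2", frozenset({"sunset", "beach", "sea"})),
    TaggingAction("carol", "photo3", frozenset({"city", "night"})),
]

tag_sets = [action.tags for action in actions]
print(mean_pairwise_similarity(tag_sets))
print(mean_pairwise_diversity(tag_sets))
```

The same pairwise scoring could be applied to the user or item component instead of the tags, given a feature set per user or item. Scoring one fixed set of actions this way is cheap; the hardness the abstract refers to comes from searching over many candidate groupings.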


Collaborative tagging, Dual mining framework, Optimization, Algorithm



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Mahashweta Das (1), corresponding author
  • Saravanan Thirumuruganathan (1)
  • Sihem Amer-Yahia (3)
  • Gautam Das (1, 2)
  • Cong Yu (4)

  1. University of Texas at Arlington, Arlington, USA
  2. QCRI, Doha, Qatar
  3. CNRS, LIG, Grenoble, France
  4. Google Research, New York, USA
