An expressive framework and efficient algorithms for the analysis of collaborative tagging

  • Special Issue Paper
The VLDB Journal

Abstract

The rise of Web 2.0 is signaled by sites such as Flickr, del.icio.us, and YouTube, and social tagging is essential to their success. A typical tagging action involves three components: a user, an item (e.g., a photo in Flickr), and tags (i.e., words or phrases). Analyzing how certain users assign tags to certain items has important implications for helping users search for desired information. In this paper, we develop a dual mining framework to explore tagging behavior. The framework is centered on two opposing measures, similarity and diversity, applied to one or more tagging components. It therefore enables a wide range of analysis scenarios, such as characterizing similar users tagging diverse items with similar tags, or diverse users tagging similar items with diverse tags. By adopting different concrete measures for similarity and diversity in the framework, we show that a wide range of concrete analysis problems can be defined and that they are NP-Complete in general. We design four sets of efficient algorithms for solving many of those problems and demonstrate, through comprehensive experiments over real data, that our algorithms significantly outperform the exact brute-force approach without compromising the quality of the analysis results.
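To make the two opposing measures concrete, the following minimal Python sketch scores a small group of tagging actions along the tag component. Everything in it is an illustrative assumption rather than the paper's method: the sample actions are made up, Jaccard overlap stands in for whichever concrete similarity measure is chosen, and diversity is simply taken as one minus similarity.

```python
from itertools import combinations

# Hypothetical tagging actions as (user, item, tags) triples.
# These are invented for illustration, not data from the paper.
actions = [
    ("alice", "photo1", {"sunset", "beach"}),
    ("bob", "photo2", {"sunset", "sea"}),
    ("carol", "photo3", {"city", "night"}),
]


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_similarity(tag_sets):
    """Mean pairwise Jaccard similarity over a group of tag sets.

    Assumption: averaging pairwise scores is one possible way to
    aggregate; other concrete measures could be plugged in instead.
    """
    pairs = list(combinations(tag_sets, 2))
    if not pairs:
        return 1.0  # a group of zero or one element is trivially similar
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def group_diversity(tag_sets):
    """Diversity taken as the complement of similarity (an assumption)."""
    return 1.0 - group_similarity(tag_sets)


tag_sets = [tags for _user, _item, tags in actions]
similarity = group_similarity(tag_sets)
diversity = group_diversity(tag_sets)
```

The same two functions could be applied to the user or item component by replacing the tag sets with sets of user or item attributes, which is how a scenario such as "similar users, diverse items, similar tags" could be expressed as a combination of per-component scores.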

Notes

  1. Since the user and item dimensions share the same characteristics in the dual mining framework, we present here only the user dimension for simplicity.

  2. https://www.opencalais.com/

  3. Two tagging action groups are neighbors if they are directly connected in the lattice.

  4. http://www.grouplens.org/node/73

  5. http://www.imdb.com/interfaces

  6. http://zip4.usps.com

  7. https://www.mturk.com

Author information

Corresponding author

Correspondence to Mahashweta Das.

About this article

Cite this article

Das, M., Thirumuruganathan, S., Amer-Yahia, S. et al. An expressive framework and efficient algorithms for the analysis of collaborative tagging. The VLDB Journal 23, 201–226 (2014). https://doi.org/10.1007/s00778-013-0341-y

  • DOI: https://doi.org/10.1007/s00778-013-0341-y
