Scaling Pair-Wise Similarity-Based Algorithms in Tagging Spaces

  • Damir Vandic
  • Flavius Frasincar
  • Frederik Hogenboom
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7387)


Users of Web tag spaces, e.g., Flickr, find it difficult to get adequate search results due to syntactic and semantic tag variations. In most approaches that address this problem, the cosine similarity between tags plays a major role. However, the use of this similarity introduces a scalability problem as the number of similarities that need to be computed grows quadratically with the number of tags. In this paper, we propose a novel algorithm that filters insignificant cosine similarities in linear time complexity with respect to the number of tags. Our approach shows a significant reduction in the number of calculations, which makes it possible to process larger tag data sets than ever before. To evaluate our approach, we used a data set containing 51 million pictures and 112 million tag annotations from Flickr.
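
To make the scalability problem concrete, the following minimal sketch computes the cosine similarity for every pair of tags in a toy data set. It is not the filtering algorithm proposed in the paper, only the naive baseline the abstract describes. The vector representation (each tag mapped to the pictures it annotates), the sample data, and the helper names `cosine` and `all_pairs` are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data: tag -> {picture id: annotation count}.
tag_vectors = {
    "cat": {1: 1, 2: 1},
    "kitten": {1: 1, 2: 1, 3: 1},
    "car": {4: 1},
}


def all_pairs(vectors):
    """Naive baseline: one cosine computation per unordered tag pair."""
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


similarities = all_pairs(tag_vectors)
```

For n tags this baseline performs n(n-1)/2 cosine computations, so three tags need 3 computations while one million tags would need roughly 5 × 10^11. Many of those pairs, such as "car" and "cat" above, share no pictures and have a similarity of zero, which is the kind of insignificant result a filtering step aims to avoid computing.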


Keywords: Input Vector, Parameter Combination, Cosine Similarity, Scalability Issue, Inverted Index



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Damir Vandic (1)
  • Flavius Frasincar (1)
  • Frederik Hogenboom (1)
  1. Erasmus University Rotterdam, Rotterdam, The Netherlands
