Abstract
Users of Web tag spaces, e.g., Flickr, find it difficult to get adequate search results due to syntactic and semantic tag variations. In most approaches that address this problem, the cosine similarity between tags plays a major role. However, the use of this similarity introduces a scalability problem as the number of similarities that need to be computed grows quadratically with the number of tags. In this paper, we propose a novel algorithm that filters insignificant cosine similarities in linear time complexity with respect to the number of tags. Our approach shows a significant reduction in the number of calculations, which makes it possible to process larger tag data sets than ever before. To evaluate our approach, we used a data set containing 51 million pictures and 112 million tag annotations from Flickr.
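To make the scalability problem concrete, the following is a minimal sketch of pairwise cosine similarity between tags. It is not the filtering algorithm proposed in the paper. It assumes each tag is a binary vector over photos, so the cosine of two tags is the number of photos they share divided by the square root of the product of their photo counts. The function name `cosine_similarities`, the `threshold` parameter, and the sample data are hypothetical. The sketch enumerates only tag pairs that co-occur on at least one photo, because all other pairs have a similarity of zero.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def cosine_similarities(photo_tags, threshold=0.5):
    """Return {(tag_a, tag_b): cosine} for co-occurring tag pairs.

    photo_tags maps a photo id to an iterable of tags (hypothetical layout).
    Tags are treated as binary vectors over photos (an assumption).
    """
    tag_count = defaultdict(int)    # number of photos carrying each tag
    pair_count = defaultdict(int)   # number of photos carrying both tags

    for tags in photo_tags.values():
        unique = sorted(set(tags))  # sort so each pair has one canonical order
        for tag in unique:
            tag_count[tag] += 1
        for a, b in combinations(unique, 2):
            pair_count[(a, b)] += 1

    result = {}
    for (a, b), shared in pair_count.items():
        sim = shared / sqrt(tag_count[a] * tag_count[b])
        if sim >= threshold:        # drop similarities below the cutoff
            result[(a, b)] = sim
    return result


# Made-up sample data for illustration only.
photos = {
    1: ["cat", "kitten"],
    2: ["cat", "kitten", "pet"],
    3: ["cat", "pet"],
    4: ["beach"],
}
sims = cosine_similarities(photos, threshold=0.6)
for pair, sim in sorted(sims.items()):
    print(pair, round(sim, 3))
```

With n distinct tags there are n(n-1)/2 candidate pairs, which is the quadratic growth the abstract refers to. Skipping pairs that never co-occur reduces the work on sparse data, but the worst case remains quadratic, which is why a dedicated filtering step is of interest.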
Keywords
- Input Vector
- Parameter Combination
- Cosine Similarity
- Scalability Issue
- Inverted Index
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vandic, D., Frasincar, F., Hogenboom, F. (2012). Scaling Pair-Wise Similarity-Based Algorithms in Tagging Spaces. In: Brambilla, M., Tokuda, T., Tolksdorf, R. (eds) Web Engineering. ICWE 2012. Lecture Notes in Computer Science, vol 7387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31753-8_4
DOI: https://doi.org/10.1007/978-3-642-31753-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31752-1
Online ISBN: 978-3-642-31753-8
eBook Packages: Computer Science, Computer Science (R0)
