Incremental Cosine Computations for Search and Exploration of Tag Spaces
Tags are often used to describe user-generated content on the Web. However, the available Web applications do not deal incrementally with new tag information, which negatively affects their scalability. Since the cosine similarity between tags represented as co-occurrence vectors is an important aspect of these frameworks, we propose two approaches for the incremental computation of cosine similarities. The first approach recalculates the cosine similarity for new tag pairs and for existing tag pairs whose co-occurrences have changed. The second approach computes the cosine similarity between two tags by reusing, if available, the previous cosine similarity between these tags. Both approaches compute the same cosine values that would have been obtained if a complete recalculation of the cosine similarities had been performed. Our experiments show that the proposed approaches are between 1.2 and 23 times faster than a complete recalculation, depending on the number of co-occurrence changes and new tags.
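The abstract does not spell out the algorithms, so the following Python is only a minimal sketch of the two ideas under stated assumptions. Tags are stored as sparse co-occurrence vectors (dictionaries), a pair counts as "changed" when either tag's vector changed, and the second idea is read as recovering the old dot product from the previous cosine and the old norms. All function names (`incremental_recalculation`, `reuse_previous`, and the helpers) are hypothetical and not taken from the paper.

```python
import math
from itertools import combinations


def norm(vec):
    """Euclidean norm of a sparse vector stored as {tag: count}."""
    return math.sqrt(sum(x * x for x in vec.values()))


def dot(u, v):
    """Dot product of two sparse vectors, iterating over the shorter one."""
    if len(u) > len(v):
        u, v = v, u
    return sum(x * v.get(k, 0) for k, x in u.items())


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)


def full_recalculation(cooc):
    """Baseline: recompute the cosine for every tag pair."""
    return {frozenset(p): cosine(cooc[p[0]], cooc[p[1]])
            for p in combinations(sorted(cooc), 2)}


def apply_changes(cooc, changes):
    """Apply symmetric (tag_a, tag_b, delta) updates in place.

    Returns the set of tags whose vectors changed (new tags included).
    """
    touched = set()
    for a, b, delta in changes:
        for x, y in ((a, b), (b, a)):
            vec = cooc.setdefault(x, {})
            vec[y] = vec.get(y, 0) + delta
            touched.add(x)
    return touched


def incremental_recalculation(cooc, cosines, changes):
    """First idea (assumed reading): recompute only the affected pairs.

    A pair is treated as affected if at least one of its tags is new or
    had its co-occurrence vector changed; all other cosines are kept.
    """
    touched = apply_changes(cooc, changes)
    affected = {frozenset((t, other))
                for t in touched for other in cooc if other != t}
    for pair in affected:
        a, b = tuple(pair)
        cosines[pair] = cosine(cooc[a], cooc[b])
    return cosines


def reuse_previous(cos_old, norm_a_old, norm_b_old,
                   dot_delta, norm_a_new, norm_b_new):
    """Second idea (assumed reading): reuse the previous cosine.

    The old dot product is recovered as cos_old * |a_old| * |b_old|,
    adjusted by the change in the dot product, and renormalised.
    Assumes both new norms are non-zero.
    """
    old_dot = cos_old * norm_a_old * norm_b_old
    return (old_dot + dot_delta) / (norm_a_new * norm_b_new)
```

Under these assumptions, calling `incremental_recalculation` after each batch of changes should leave the stored cosines equal to what `full_recalculation` returns on the updated vectors, while skipping every pair whose two tags were untouched.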
Keywords: Execution Time, Euclidean Norm, Cosine Similarity, Locality Sensitive Hash, Cosine Similarity Measure