Incremental Cosine Computations for Search and Exploration of Tag Spaces

  • Raymond Vermaas
  • Damir Vandic
  • Flavius Frasincar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7447)


Tags are often used to describe user-generated content on the Web. However, the available Web applications do not handle new tag information incrementally, which negatively affects their scalability. Since the cosine similarity between tags represented as co-occurrence vectors is an important aspect of such applications, we propose two approaches for an incremental computation of cosine similarities. The first approach recalculates the cosine similarity for new tag pairs and for existing tag pairs whose co-occurrences have changed. The second approach computes the cosine similarity between two tags by reusing, if available, the previous cosine similarity between these tags. Both approaches yield the same cosine values that a complete recalculation of the cosine similarities would produce. Our experiments show that the proposed approaches are between 1.2 and 23 times faster than a complete recalculation, depending on the number of co-occurrence changes and new tags.
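The idea behind the first approach can be illustrated with a minimal Python sketch. It is not the authors' implementation, which the abstract does not specify. The class `IncrementalCosine` and its methods `add_cooccurrence`, `refresh`, and `full_recompute` are hypothetical names chosen for this example, and the sketch assumes symmetric co-occurrence counts between two distinct tags. It relies on one observation: the cosine of a tag pair depends only on the two tags' co-occurrence vectors, so only pairs that involve a new or changed tag need to be recomputed.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(va, vb):
    """Cosine similarity of two sparse vectors stored as {key: weight} dicts."""
    dot = sum(w * vb[k] for k, w in va.items() if k in vb)
    na = math.sqrt(sum(w * w for w in va.values()))
    nb = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


class IncrementalCosine:
    """Hypothetical helper: caches pairwise cosines between tags and
    recomputes only the pairs that involve a new or changed tag."""

    def __init__(self):
        self.cooc = defaultdict(dict)  # tag -> {other tag: co-occurrence count}
        self.cache = {}                # frozenset({a, b}) -> cosine similarity
        self.dirty = set()             # tags whose vectors changed since refresh()

    def add_cooccurrence(self, a, b, count=1):
        # Assumes a != b. Co-occurrence is symmetric, so both vectors change.
        self.cooc[a][b] = self.cooc[a].get(b, 0) + count
        self.cooc[b][a] = self.cooc[b].get(a, 0) + count
        self.dirty.update((a, b))

    def refresh(self):
        """Recompute cached cosines for pairs touching a dirty tag.
        Returns the number of pairs that were recomputed."""
        pairs = {frozenset((d, t))
                 for d in self.dirty for t in self.cooc if t != d}
        for pair in pairs:
            a, b = pair
            self.cache[pair] = cosine(self.cooc[a], self.cooc[b])
        self.dirty.clear()
        return len(pairs)

    def full_recompute(self):
        """Baseline: recompute the cosine of every tag pair from scratch."""
        return {frozenset((a, b)): cosine(self.cooc[a], self.cooc[b])
                for a, b in combinations(self.cooc, 2)}


# Build an initial tag space, then apply a single co-occurrence change.
index = IncrementalCosine()
for a, b in [("cat", "pet"), ("dog", "pet"), ("cat", "animal"),
             ("dog", "animal"), ("car", "road"), ("truck", "road")]:
    index.add_cooccurrence(a, b)
index.refresh()

index.add_cooccurrence("cat", "dog")
touched = index.refresh()        # only pairs involving "cat" or "dog"
baseline = index.full_recompute()
```

After the single change, `refresh` revisits only the pairs that contain `cat` or `dog`, while `index.cache` still matches `baseline`. The saving grows with the share of tags that an update leaves untouched, which is consistent with the abstract's remark that the speed-up depends on the number of co-occurrence changes and new tags.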


Keywords: Execution Time; Euclidean Norm; Cosine Similarity; Locality Sensitive Hash; Cosine Similarity Measure
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Raymond Vermaas (1)
  • Damir Vandic (1)
  • Flavius Frasincar (1)
  1. Erasmus University Rotterdam, Rotterdam, The Netherlands
