The Triangle Inequality versus Projection onto a Dimension in Determining Cosine Similarity Neighborhoods of Non-negative Vectors

  • Marzena Kryszkiewicz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7413)

Abstract

In many applications, objects are represented by non-negative vectors, and cosine similarity is used to measure how alike they are. It was shown recently that determining the cosine similarity of two vectors can be reduced to determining the Euclidean distance between the normalized forms of these vectors. This equivalence allows the triangle inequality to be applied to determine cosine similarity neighborhoods efficiently. Alternatively, projection onto a dimension may be used for the same purpose. In this paper, we prove that, for non-negative vectors, the triangle inequality is guaranteed to be a pruning tool at least as efficient as the projection in determining neighborhoods.
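The reduction mentioned in the abstract can be illustrated with a minimal sketch. It relies on the standard identity that, for unit-length vectors, the squared Euclidean distance equals 2 − 2·cos, so a cosine threshold eps corresponds to a distance radius of sqrt(2 − 2·eps). The function names, the choice of reference vector, and the sample data are assumptions made for illustration and are not taken from the paper; a practical implementation would precompute and sort the distances to the reference vector rather than recompute them per query.

```python
import math

def normalize(v):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_radius(eps):
    """cos(u, v) >= eps holds exactly when the normalized forms
    are within Euclidean distance sqrt(2 - 2 * eps)."""
    return math.sqrt(2.0 - 2.0 * eps)

def cosine_neighborhood(query, data, eps, ref):
    """Indices of vectors in data with cosine similarity >= eps to query.

    ref is a hypothetical reference vector. By the triangle inequality,
    |d(q, r) - d(v, r)| <= d(q, v), so a candidate whose distance to ref
    differs from the query's by more than the radius cannot be a neighbor.
    Returns the neighbor indices and the number of pruned candidates.
    """
    radius = distance_radius(eps)
    q = normalize(query)
    r = normalize(ref)
    d_query_ref = euclidean(q, r)
    neighbors, pruned = [], 0
    for i, v in enumerate(data):
        nv = normalize(v)
        if abs(d_query_ref - euclidean(nv, r)) > radius:
            pruned += 1  # skipped without computing d(q, v)
            continue
        if euclidean(q, nv) <= radius:
            neighbors.append(i)
    return neighbors, pruned

# Illustrative data only.
data = [[1, 0], [1, 1], [0, 1], [3, 1], [1, 4]]
neighbors, pruned = cosine_neighborhood([2, 1], data, eps=0.9, ref=[1, 0])
```

The pruning test never discards a true neighbor, so the result matches a brute-force scan over cosine similarities; only the number of full distance computations changes.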

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Marzena Kryszkiewicz (1)
  1. Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland
