Abstract
In many applications, objects are represented by non-negative vectors and cosine similarity is used to measure their similarity. It was shown recently that the determination of the cosine similarity of two vectors can be transformed to the problem of determining the Euclidean distance of normalized forms of these vectors. This equivalence allows applying the triangle inequality to determine cosine similarity neighborhoods efficiently. Alternatively, one may apply the projection onto a dimension to this end. In this paper, we prove that the triangle inequality is guaranteed to be a pruning tool, which is not less efficient than the projection in determining neighborhoods of non-negative vectors.
This work was supported by the National Centre for Research and Development (NCBiR) under Grant No. SP/I/1/77065/10 devoted to the Strategic scientific research and experimental development program: ‘Interdisciplinary System for Interactive Scientific and Scientific-Technical Information’.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Elkan, C.: Using the Triangle Inequality to Accelerate k-Means. In: Proc. of ICML 2003, Washington, pp. 147–153 (2003)
Kryszkiewicz, M.: Efficient Determination of Neighborhoods Defined in Terms of Cosine Similarity Measure. ICS Research Report 4, Institute of Computer Science. Warsaw University of Technology, Warsaw (2011)
Kryszkiewicz, M., Lasek, P.: TI-DBSCAN: Clustering with DBSCAN by Means of the Triangle Inequality. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds.) RSCTC 2010. LNCS, vol. 6086, pp. 60–69. Springer, Heidelberg (2010)
Kryszkiewicz, M., Lasek, P.: A Neighborhood-Based Clustering by Means of the Triangle Inequality. In: Fyfe, C., Tino, P., Charles, D., Garcia-Osorio, C., Yin, H. (eds.) IDEAL 2010. LNCS, vol. 6283, pp. 284–291. Springer, Heidelberg (2010)
Moore, A.W.: The Anchors Hierarchy: Using the Triangle Inequality to Survive High Dimensional Data. In: Proc. of UAI, Stanford, pp. 397–405 (2000)
Patra, B.K., Hubballi, N., Biswas, S., Nandi, S.: Distance Based Fast Hierarchical Clustering Method for Large Datasets. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds.) RSCTC 2010. LNCS, vol. 6086, pp. 50–59. Springer, Heidelberg (2010)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kryszkiewicz, M. (2012). The Triangle Inequality versus Projection onto a Dimension in Determining Cosine Similarity Neighborhoods of Non-negative Vectors. In: Yao, J., et al. Rough Sets and Current Trends in Computing. RSCTC 2012. Lecture Notes in Computer Science(), vol 7413. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32115-3_27
Download citation
DOI: https://doi.org/10.1007/978-3-642-32115-3_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32114-6
Online ISBN: 978-3-642-32115-3
eBook Packages: Computer ScienceComputer Science (R0)