Efficient Determination of Binary Non-negative Vector Neighbors with Regard to Cosine Similarity

Kryszkiewicz, Marzena

doi:10.1007/978-3-642-31087-4_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7345)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

The cosine and Tanimoto similarity measures are widely and successfully applied in classification, clustering and ranking in chemistry, biology, information retrieval, and text mining. A basic operation in such tasks is the identification of neighbors. This operation becomes critical for large, high-dimensional data. Using the triangle inequality property was recently proposed to alleviate this problem when a distance metric is applied. The triangle inequality holds for the Tanimoto dissimilarity, which functionally determines the Tanimoto similarity, provided the underlying data take the form of vectors with binary non-negative attribute values. Unfortunately, the triangle inequality holds neither for the cosine similarity measure nor for its corresponding dissimilarity measure. However, in this paper, we show how to use the triangle inequality property and/or bounds on lengths of neighbor vectors to efficiently determine non-negative binary vectors that are similar with regard to the cosine similarity measure.
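
To make the length-bound idea concrete, here is a minimal sketch. It is not the chapter's algorithm, whose details are not available in this preview. It relies only on a standard fact: for binary vectors u and v, cos(u, v) = |u ∩ v| / sqrt(|u| · |v|), where |u| is the number of attributes set to 1. Because the overlap cannot exceed min(|u|, |v|), the similarity is at most sqrt(min(|u|, |v|) / max(|u|, |v|)). Hence any v with cos(u, v) ≥ ε must satisfy ε²·|u| ≤ |v| ≤ |u| / ε², and vectors outside that range can be skipped without computing the similarity. The set-based representation and the names `cosine` and `cosine_neighbors` are assumptions made for illustration.

```python
from math import sqrt


def cosine(u: frozenset, v: frozenset) -> float:
    """Cosine similarity of two binary vectors, each given as the set of
    indices of its attributes equal to 1."""
    if not u or not v:
        return 0.0
    return len(u & v) / sqrt(len(u) * len(v))


def cosine_neighbors(query: frozenset, data: list, eps: float) -> list:
    """Return indices of vectors in `data` whose cosine similarity to
    `query` is at least `eps`, skipping vectors whose length alone
    rules them out (hypothetical helper, not from the chapter)."""
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must lie in (0, 1]")
    n = len(query)
    # Necessary condition for cos >= eps: eps^2 * n <= |v| <= n / eps^2.
    lo, hi = eps * eps * n, n / (eps * eps)
    result = []
    for i, v in enumerate(data):
        if not (lo <= len(v) <= hi):
            continue  # pruned by the length bound; similarity not computed
        if cosine(query, v) >= eps:
            result.append(i)
    return result
```

If the data are sorted by vector length beforehand, the candidates form a contiguous range and the scan can stop as soon as a length exceeds the upper bound. Note that thresholds falling exactly on a bound are subject to floating-point rounding in this sketch.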

Author information

Authors and Affiliations

Institute of Computer Science, Warsaw University of Technology, Nowowiejska 15/19, 00-665, Warsaw, Poland
Marzena Kryszkiewicz

Editor information

Editors and Affiliations

School of Software, Dalian University of Technology, Dalian, China
He Jiang
Department of Computer Science, University of Massachusetts Boston, 100 Morrissey Boulevard, 02125-3393, Boston, MA, USA
Wei Ding
Department of Computer Science, Texas State University San Marcos, 601 University Drive, 78666-4616, San Marcos, TX, USA
Moonis Ali
Department of Computer Science, University of Vermont, Burlington, VT, USA
Xindong Wu

About this paper

Cite this paper

Kryszkiewicz, M. (2012). Efficient Determination of Binary Non-negative Vector Neighbors with Regard to Cosine Similarity. In: Jiang, H., Ding, W., Ali, M., Wu, X. (eds) Advanced Research in Applied Artificial Intelligence. IEA/AIE 2012. Lecture Notes in Computer Science, vol 7345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31087-4_6

DOI: https://doi.org/10.1007/978-3-642-31087-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31086-7
Online ISBN: 978-3-642-31087-4
eBook Packages: Computer Science, Computer Science (R0)