Abstract
This paper presents a distance function between sets that is based on an average of the distances between their elements. The function is a metric when the sets are non-empty finite subsets of a metric space. It includes the Jaccard distance as a special case and, when generalized by means of the power mean, also includes the Hausdorff metric on finite sets. It can be extended to non-null measurable sets and applied to measuring distances between fuzzy sets and between probability distributions. These distance functions are useful for measuring similarity between data in computer science and information science. In instructional systems design and information retrieval, for example, they are likely to be useful for analyzing and processing text documents modeled as hierarchical collections of sets of terms. A distance measure of learners’ knowledge is also discussed in connection with quantities of information.
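To make the Jaccard special case concrete, the following is a minimal Python sketch. The abstract does not state the paper's formula, so the averaged distance below is an assumption: it is one averaged construction chosen because it reduces to the Jaccard distance when the element distance is the discrete metric. The function names (average_set_distance, jaccard_distance, hausdorff_distance, discrete_metric) are hypothetical and used only for illustration. The Jaccard and Hausdorff functions follow their standard textbook definitions.

```python
def discrete_metric(x, y):
    """Discrete metric: 0 for equal elements, 1 otherwise."""
    return 0.0 if x == y else 1.0


def jaccard_distance(a, b):
    """Standard Jaccard distance |A symmetric-difference B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        raise ValueError("at least one set must be non-empty")
    return len(a ^ b) / len(union)


def hausdorff_distance(a, b, d):
    """Standard Hausdorff distance between non-empty finite sets."""
    a, b = set(a), set(b)
    if not a or not b:
        raise ValueError("both sets must be non-empty")
    forward = max(min(d(x, y) for y in b) for x in a)
    backward = max(min(d(x, y) for x in a) for y in b)
    return max(forward, backward)


def average_set_distance(a, b, d):
    """Averaged element distance between non-empty finite sets.

    ASSUMPTION: this formula is an illustrative choice, not quoted from
    the paper. It averages d over pairs that involve an element outside
    the intersection and normalizes by the size of the union.
    """
    a, b = set(a), set(b)
    if not a or not b:
        raise ValueError("both sets must be non-empty")
    union_size = len(a | b)
    # Pairs (x in A, y in B but not in A), averaged over |A|.
    term_a = sum(d(x, y) for x in a for y in b - a) / (union_size * len(a))
    # Pairs (x in A but not in B, y in B), averaged over |B|.
    term_b = sum(d(x, y) for x in a - b for y in b) / (union_size * len(b))
    return term_a + term_b


if __name__ == "__main__":
    A, B = {1, 2, 3}, {3, 4}
    # With the discrete metric the averaged distance equals the Jaccard distance.
    print(average_set_distance(A, B, discrete_metric))  # 0.75
    print(jaccard_distance(A, B))                       # 0.75
    # With |x - y| on the real line the two notions differ in general.
    print(average_set_distance(A, B, lambda x, y: abs(x - y)))
    print(hausdorff_distance(A, B, lambda x, y: abs(x - y)))
```

Under these assumptions, identical sets are at distance zero because both sums run over empty index sets, and the function is symmetric whenever the element distance d is symmetric.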
Open Access
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
Cite this article
Fujita, O. Metrics based on average distance between sets. Japan J. Indust. Appl. Math. 30, 1–19 (2013). https://doi.org/10.1007/s13160-012-0089-6
DOI: https://doi.org/10.1007/s13160-012-0089-6