Mutational and Nonmutational Similarity Measures: A Preliminary Examination

Dale, M.

doi:10.1007/978-94-011-3418-7_12

Part of the book series: Handbook of vegetation science ((HAVS,volume 11))

Abstract

An examination of many of the indices proposed as numerical measures of pairwise similarity shows that they are closely related to the string-to-string measures variously known as ‘Levenshtein distance’, ‘longest common subsequence’ or ‘minimal mutation distance’. The variations among coefficients arise in several ways, including changing the set of operations, using a richer structural pattern, modifying weights, limiting the extent of operations and varying the basis for normalisation. Together these measures provide a very flexible means of assessing similarity, and they can be extended to similarities based on collections of strings. While not denying the interest to the user of other properties, such as metricity or embedding in a Euclidean space, examining the coefficients as variations on the Levenshtein theme gives a common basis for their comparison and gives the user a rational means of choosing between them. However interesting this array of coefficients might be, it remains true that only some features of similarity will be captured in a minimal mutational measure. These features may be more or fewer than the user actually requires. In this paper I have made a preliminary examination of various measures, some of which are related to the Levenshtein metric and some of which appear to capture other aspects of similarity (i.e. topological, functional, analogic or conceptual). The latter are all measures which I have been unable to relate to the Levenshtein distance, although I have not yet pursued this very far. All measures were applied to vegetation data, classifying both plots and attributes into a two-way table. The SAHN algorithm was used for most of the clusterings, so that differences between measures of similarity are the primary cause of differences in results. In a few cases other clustering algorithms were used, and the data were converted to presence/absence when the particular coefficient required it.
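To make the relationship between these string measures concrete, the following is a minimal Python sketch, not taken from the chapter. It assumes unit costs by default and uses hypothetical function names (`edit_distance`, `lcs_length`, `normalised_distance`) chosen only for illustration. It shows how changing the weights alters the effective set of operations, and how one possible basis for normalisation (division by the longer length) can be applied.

```python
def edit_distance(a, b, w_ins=1.0, w_del=1.0, w_sub=1.0):
    """Weighted Levenshtein distance between two sequences.

    Uses the standard dynamic programme, keeping only two rows.
    The weights are illustrative parameters, not values from the chapter.
    """
    # prev[j] = cost of turning the empty prefix of a into b[:j]
    prev = [j * w_ins for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * w_del]  # cost of deleting the first i symbols of a
        for j, y in enumerate(b, start=1):
            sub_cost = 0.0 if x == y else w_sub
            curr.append(min(prev[j] + w_del,          # delete x
                            curr[j - 1] + w_ins,      # insert y
                            prev[j - 1] + sub_cost))  # match or substitute
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalised_distance(a, b):
    """Unit-cost edit distance divided by the longer length.

    This is only one possible basis for normalisation (an assumption here).
    """
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0
```

With unit insertion and deletion costs and a substitution weight of 2, a substitution is never cheaper than a deletion followed by an insertion, so the distance reduces to `len(a) + len(b) - 2 * lcs_length(a, b)`. This is one way in which the longest common subsequence appears as a variation on the Levenshtein theme.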

Author information

Authors and Affiliations

CSIRO Division of Tropical Crops and Pastures, Carmody Rd., St. Lucia, 4067, Australia
M. Dale

Editor information

E. Feoli, L. Orlóci

About this chapter

Cite this chapter

Dale, M. (1991). Mutational and Nonmutational Similarity Measures: A Preliminary Examination. In: Feoli, E., Orlóci, L. (eds) Computer assisted vegetation analysis. Handbook of vegetation science, vol 11. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-3418-7_12

DOI: https://doi.org/10.1007/978-94-011-3418-7_12
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-5512-3
Online ISBN: 978-94-011-3418-7
eBook Packages: Springer Book Archive
