Mutational and Nonmutational Similarity Measures: A Preliminary Examination

Computer assisted vegetation analysis

Part of the book series: Handbook of vegetation science ((HAVS,volume 11))

Abstract

An examination of many of the indices proposed as numerical measures of pairwise similarity shows that they are closely related to string-to-string measures variously known as ‘Levenshtein distance’, ‘longest common subsequence’ or ‘minimal mutation distance’. The coefficients vary in several ways: in the set of operations allowed, in the richness of the structural pattern used, in the weights, in limits on the extent of operations, and in the basis for normalisation. Together these measures provide a very flexible means of assessing similarity, and they can be extended to similarities based on collections of strings. While not denying the interest to the user of other properties, such as metricity or embedding in a Euclidean space, examining the coefficients as variations on the Levenshtein theme provides a common basis for comparing them and gives the user a rational means of choosing between them. However interesting this array of coefficients might be, a minimal mutational measure captures only some features of similarity, and these may be more or fewer than the user actually requires. In this paper I make a preliminary examination of various measures, some related to the Levenshtein metric and some that appear to capture other aspects of similarity (i.e. topological, functional, analogic and/or conceptual). I have been unable to relate these latter measures to the Levenshtein distance, although I have not yet pursued the matter very far. All measures were applied to vegetation data, classifying both plots and attributes into a two-way table. The SAHN algorithm was used for most of the clusterings, so that differences between the similarity measures are the primary cause of differences in results. In a few cases other clustering algorithms were used, and the data were converted to presence/absence when the particular coefficient required it.
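
As an illustration of the "variations on the Levenshtein theme" described above, the following minimal sketch shows how changing the weights and the normalisation of one dynamic-programming routine yields different measures. It is not taken from the chapter: the function names (`edit_distance`, `lcs_length`, `normalised_distance`), the default costs and the choice of normalising by the longer sequence are all assumptions made for this example.

```python
def edit_distance(a, b, sub_cost=1, indel_cost=1):
    """Weighted edit distance between two sequences (strings or lists).

    With sub_cost=1 and indel_cost=1 this is the usual Levenshtein distance.
    The cost parameters are illustrative assumptions, not the chapter's values.
    """
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,                       # delete x
                curr[j - 1] + indel_cost,                   # insert y
                prev[j - 1] + (0 if x == y else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence, derived from edit distance.

    When a substitution costs as much as a deletion plus an insertion,
    the distance equals len(a) + len(b) - 2 * LCS(a, b).
    """
    d = edit_distance(a, b, sub_cost=2, indel_cost=1)
    return (len(a) + len(b) - d) // 2


def normalised_distance(a, b):
    """Levenshtein distance scaled to [0, 1] by the longer sequence length.

    This is one possible basis for normalisation among several.
    """
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))        # unit-cost Levenshtein
    print(lcs_length("kitten", "sitting"))           # insert/delete only
    print(normalised_distance("kitten", "sitting"))  # scaled by longer length
```

Because the routine only compares items for equality, the same functions accept lists of species codes as well as character strings, which is one way such a measure might be applied to plot data.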



Editor information

E. Feoli L. Orlóci

Copyright information

© 1991 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Dale, M. (1991). Mutational and Nonmutational Similarity Measures: A Preliminary Examination. In: Feoli, E., Orlóci, L. (eds) Computer assisted vegetation analysis. Handbook of vegetation science, vol 11. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-3418-7_12

  • DOI: https://doi.org/10.1007/978-94-011-3418-7_12

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-5512-3

  • Online ISBN: 978-94-011-3418-7

  • eBook Packages: Springer Book Archive
