Computers and the Humanities, Volume 37, Issue 3, pp 273–291

Phonetic Alignment and Similarity

  • Grzegorz Kondrak


The computation of the optimal phonetic alignment and the phonetic similarity between words is an important step in many applications in computational phonology, including dialectometry. After discussing several related algorithms, I present a novel approach to the problem that employs a scoring scheme for computing phonetic similarity between phonetic segments on the basis of multivalued articulatory phonetic features. The scheme incorporates the key concept of feature salience, which is necessary to properly balance the importance of various features. The new algorithm combines several techniques developed for sequence comparison: an extended set of edit operations, local and semiglobal modes of alignment, and the capability of retrieving a set of near-optimal alignments. On a set of 82 cognate pairs, it performs better than comparable algorithms reported in the literature.
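To make the combination of feature-based scoring and alignment concrete, here is a minimal Python sketch under explicit assumptions. A segment-pair score is taken to be a constant minus a salience-weighted distance over multivalued features, and a Smith-Waterman-style local alignment is extended with one-to-two expansion and two-to-one compression. The segment inventory, feature values, salience weights, the constants `C_SUB`, `C_EXP`, `C_SKIP`, `C_VWL`, and all function names are hypothetical placeholders chosen for illustration, not the parameters or code of the algorithm described in this article. Semiglobal mode and retrieval of near-optimal alignments are omitted.

```python
# Hypothetical multivalued articulatory features, each value in [0, 1].
# These numbers are illustrative assumptions, not published parameters.
FEATURES = {
    "p": {"place": 1.0, "manner": 1.0, "voice": 0.0, "nasal": 0.0, "syllabic": 0.0},
    "b": {"place": 1.0, "manner": 1.0, "voice": 1.0, "nasal": 0.0, "syllabic": 0.0},
    "m": {"place": 1.0, "manner": 1.0, "voice": 1.0, "nasal": 1.0, "syllabic": 0.0},
    "t": {"place": 0.85, "manner": 1.0, "voice": 0.0, "nasal": 0.0, "syllabic": 0.0},
    "d": {"place": 0.85, "manner": 1.0, "voice": 1.0, "nasal": 0.0, "syllabic": 0.0},
    "n": {"place": 0.85, "manner": 1.0, "voice": 1.0, "nasal": 1.0, "syllabic": 0.0},
    "s": {"place": 0.85, "manner": 0.8, "voice": 0.0, "nasal": 0.0, "syllabic": 0.0},
    "k": {"place": 0.6, "manner": 1.0, "voice": 0.0, "nasal": 0.0, "syllabic": 0.0},
    "i": {"place": 0.7, "manner": 0.4, "voice": 1.0, "nasal": 0.0, "syllabic": 1.0},
    "e": {"place": 0.7, "manner": 0.2, "voice": 1.0, "nasal": 0.0, "syllabic": 1.0},
    "a": {"place": 0.6, "manner": 0.0, "voice": 1.0, "nasal": 0.0, "syllabic": 1.0},
    "o": {"place": 0.6, "manner": 0.2, "voice": 1.0, "nasal": 0.0, "syllabic": 1.0},
}

# Feature salience: how much a full mismatch on each feature costs (assumed).
SALIENCE = {"place": 40, "manner": 50, "voice": 10, "nasal": 10, "syllabic": 5}

# Hypothetical constants: substitution, expansion, skip (indel), vowel penalty.
C_SUB, C_EXP, C_SKIP, C_VWL = 35, 45, -10, 10


def delta(p, q):
    """Salience-weighted feature distance between two segments."""
    return sum(SALIENCE[f] * abs(FEATURES[p][f] - FEATURES[q][f]) for f in SALIENCE)


def vowel_penalty(p):
    """Assumed penalty so that vowel matches count less than consonant matches."""
    return C_VWL if FEATURES[p]["syllabic"] == 1.0 else 0


def sub_score(p, q):
    """Similarity of aligning segment p with segment q."""
    return C_SUB - delta(p, q) - vowel_penalty(p) - vowel_penalty(q)


def exp_score(p, q1, q2):
    """Similarity of aligning one segment p with the two-segment run q1 q2."""
    return (C_EXP - delta(p, q1) - delta(p, q2)
            - vowel_penalty(p) - max(vowel_penalty(q1), vowel_penalty(q2)))


def local_align_score(a, b):
    """Best local alignment score of segment sequences a and b.

    Cells are floored at 0 (local mode); besides substitution and skips,
    one segment may align with two (expansion) or two with one (compression).
    """
    n, m = len(a), len(b)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                0.0,                                         # start a fresh local alignment
                S[i - 1][j] + C_SKIP,                        # a[i-1] aligned to a gap
                S[i][j - 1] + C_SKIP,                        # b[j-1] aligned to a gap
                S[i - 1][j - 1] + sub_score(a[i - 1], b[j - 1]),
            ]
            if j >= 2:  # expansion: a[i-1] against b[j-2] b[j-1]
                candidates.append(S[i - 1][j - 2] + exp_score(a[i - 1], b[j - 2], b[j - 1]))
            if i >= 2:  # compression: a[i-2] a[i-1] against b[j-1]
                candidates.append(S[i - 2][j - 1] + exp_score(b[j - 1], a[i - 2], a[i - 1]))
            S[i][j] = max(candidates)
            best = max(best, S[i][j])
    return best
```

Because every segment in this toy inventory is a single character, plain strings work as input: under these assumed parameters, `local_align_score("ta", "da")` returns 40.0 (25 for the voicing mismatch t/d plus 15 for the vowel match a/a). Flooring each cell at zero is what makes the mode local: a poorly matching prefix or suffix is simply left out of the alignment instead of dragging the score down.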

Keywords: cognates, dialects, features, phonetic alignment, phonetic similarity





Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Grzegorz Kondrak, Department of Computing Science, University of Alberta, Edmonton, Canada
