
Phonetic Alignment and Similarity

Computers and the Humanities

Abstract

The computation of the optimal phonetic alignment and the phonetic similarity between words is an important step in many applications in computational phonology, including dialectometry. After discussing several related algorithms, I present a novel approach to the problem that employs a scoring scheme for computing phonetic similarity between phonetic segments on the basis of multivalued articulatory phonetic features. The scheme incorporates the key concept of feature salience, which is necessary to properly balance the importance of various features. The new algorithm combines several techniques developed for sequence comparison: an extended set of edit operations, local and semiglobal modes of alignment, and the capability of retrieving a set of near-optimal alignments. On a set of 82 cognate pairs, it performs better than comparable algorithms reported in the literature.
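To make the two central ideas of the abstract concrete, the following is a minimal sketch of a salience-weighted feature score combined with local alignment. It is not the article's implementation: the feature table, the salience weights, the constants `MAX_SCORE` and `SKIP`, and the function names are all hypothetical placeholders chosen for illustration. The sketch uses a standard dynamic-programming local alignment and leaves out the extended edit operations, the semiglobal mode, and the retrieval of near-optimal alignments.

```python
# Illustrative sketch only. All numeric values and names below are
# assumptions made for this example, not values taken from the article.

# Multivalued features per segment: (place, manner, voice), each on a 0..10 scale.
FEATURES = {
    "p": (10, 10, 0),
    "b": (10, 10, 10),
    "t": (8, 10, 0),
    "d": (8, 10, 10),
    "k": (6, 10, 0),
    "g": (6, 10, 10),
    "s": (8, 8, 0),
    "n": (8, 6, 10),
    "a": (5, 0, 10),
}

# Salience: how much a difference in each feature matters (hypothetical weights).
SALIENCE = (4, 5, 1)

MAX_SCORE = 35  # score for matching two identical segments (assumed)
SKIP = -10      # penalty for leaving a segment unaligned (assumed)


def substitution_score(p, q):
    """Similarity of two segments: MAX_SCORE minus the salience-weighted feature distance."""
    distance = sum(
        weight * abs(fp - fq)
        for weight, fp, fq in zip(SALIENCE, FEATURES[p], FEATURES[q])
    )
    return MAX_SCORE - distance


def local_align(a, b):
    """Return (best score, aligned pairs) for the best-scoring local alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = max(
                0,  # local mode: an alignment may start anywhere
                table[i - 1][j - 1] + substitution_score(a[i - 1], b[j - 1]),
                table[i - 1][j] + SKIP,
                table[i][j - 1] + SKIP,
            )
            if table[i][j] > best:
                best, best_pos = table[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and table[i][j] > 0:
        if table[i][j] == table[i - 1][j - 1] + substitution_score(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif table[i][j] == table[i - 1][j] + SKIP:
            pairs.append((a[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", b[j - 1]))
            j -= 1
    pairs.reverse()
    return best, pairs


score, pairs = local_align("pat", "bad")
print(score, pairs)
```

With these placeholder weights, a voicing difference (p/b) costs less than a manner difference (stop/vowel), which is the kind of balance that feature salience is meant to control.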




Cite this article

Kondrak, G. Phonetic Alignment and Similarity. Computers and the Humanities 37, 273–291 (2003). https://doi.org/10.1023/A:1025071200644


  • DOI: https://doi.org/10.1023/A:1025071200644
