Abstract
The computation of the optimal phonetic alignment and the phonetic similarity between words is an important step in many applications in computational phonology, including dialectometry. After discussing several related algorithms, I present a novel approach to the problem that employs a scoring scheme for computing phonetic similarity between phonetic segments on the basis of multivalued articulatory phonetic features. The scheme incorporates the key concept of feature salience, which is necessary to properly balance the importance of various features. The new algorithm combines several techniques developed for sequence comparison: an extended set of edit operations, local and semiglobal modes of alignment, and the capability of retrieving a set of near-optimal alignments. On a set of 82 cognate pairs, it performs better than comparable algorithms reported in the literature.
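The following is a minimal sketch of the general idea described in the abstract, not the article's actual algorithm or parameter tables. It assumes a hypothetical three-feature inventory (`place`, `manner`, `voice`), made-up feature values and salience weights, and a made-up maximum score and skip penalty. Segment similarity is a maximum score minus the salience-weighted feature differences, and words are compared with a standard Smith-Waterman style local alignment. The extended edit operations, the semiglobal mode, and the retrieval of near-optimal alignments are left out.

```python
# Illustrative assumptions only: the feature inventory, the numeric feature
# values, the salience weights, MAX_SCORE and SKIP are hypothetical and are
# not taken from the article.
FEATURES = {
    "p": {"place": 1.0, "manner": 1.0, "voice": 0.0},
    "b": {"place": 1.0, "manner": 1.0, "voice": 1.0},
    "t": {"place": 0.85, "manner": 1.0, "voice": 0.0},
    "d": {"place": 0.85, "manner": 1.0, "voice": 1.0},
    "a": {"place": 0.0, "manner": 0.0, "voice": 1.0},
}
SALIENCE = {"place": 40, "manner": 50, "voice": 10}  # weight of each feature
MAX_SCORE = 35   # similarity of a segment with itself
SKIP = -10       # penalty for leaving a segment unaligned


def segment_similarity(a, b):
    """Maximum score minus the salience-weighted sum of feature differences."""
    diff = sum(
        SALIENCE[f] * abs(FEATURES[a][f] - FEATURES[b][f]) for f in SALIENCE
    )
    return MAX_SCORE - diff


def local_alignment_score(x, y):
    """Best local alignment score of two words (Smith-Waterman recurrence).

    Each character of x and y is treated as one phonetic segment. Only
    substitution and skipping a segment are modelled here.
    """
    rows, cols = len(x) + 1, len(y) + 1
    score = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0.0,  # local mode: an alignment may start anywhere
                score[i - 1][j - 1] + segment_similarity(x[i - 1], y[j - 1]),
                score[i - 1][j] + SKIP,  # skip a segment of x
                score[i][j - 1] + SKIP,  # skip a segment of y
            )
            best = max(best, score[i][j])
    return best


if __name__ == "__main__":
    print(local_alignment_score("pat", "bad"))  # prints 85.0
```

With these assumed weights, a voicing difference costs far less than a difference in manner or place, which is the balancing role the abstract assigns to feature salience.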
Cite this article
Kondrak, G. Phonetic Alignment and Similarity. Computers and the Humanities 37, 273–291 (2003). https://doi.org/10.1023/A:1025071200644
DOI: https://doi.org/10.1023/A:1025071200644