K-SPAN: A lexical database of Korean surface phonetic forms and phonological neighborhood density statistics
This article presents K-SPAN (Korean Surface Phonetics and Neighborhoods), a database of surface phonetic forms and several measures of phonological neighborhood density for 63,836 Korean words. Publicly available Korean corpora currently provide only orthographic representations in Hangeul, which is problematic because phonetic forms in Korean cannot be reliably predicted from orthographic forms. We describe the method used to derive the surface phonetic forms from a publicly available orthographic corpus of Korean, and report several statistics calculated from this database: segment unigram frequencies, which are compared to previously reported results, and segment-based and syllable-based neighborhood density statistics for three types of representation. These are an “orthographic” form, which is a quasi-phonological representation; a “conservative” form, which maintains all known contrasts; and a “modern” form, which represents the pronunciation of contemporary Seoul Korean. The representations are rendered in an ASCII-encoded scheme, which allows users to query the corpus without having to read Korean orthography and permits the calculation of a wide range of phonological measures.
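To make the two kinds of statistic concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the common definition of a phonological neighbor (two forms that differ by exactly one segment substitution, deletion, or addition), which the abstract does not spell out. The toy lexicon, its ASCII transcriptions, and the function names `is_neighbor`, `neighborhood_density`, and `unigram_frequencies` are all hypothetical and do not reflect K-SPAN's actual encoding scheme or entries.

```python
from collections import Counter
from itertools import combinations


def is_neighbor(a, b):
    """Return True if sequences a and b differ by exactly one
    substitution, deletion, or addition of a single unit."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # ensure a is the shorter (or equal-length) form
    # Skip the shared prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # Substitution: everything after the first mismatch must agree.
        return a[i + 1:] == b[i + 1:]
    # Addition/deletion: dropping b[i] must make the remainders agree.
    return a[i:] == b[i + 1:]


def neighborhood_density(lexicon):
    """Count neighbors for each word by comparing all pairs (quadratic)."""
    density = {word: 0 for word in lexicon}
    for w1, w2 in combinations(lexicon, 2):
        if is_neighbor(lexicon[w1], lexicon[w2]):
            density[w1] += 1
            density[w2] += 1
    return density


def unigram_frequencies(lexicon):
    """Relative frequency of each unit across all forms (type-based)."""
    counts = Counter(unit for form in lexicon.values() for unit in form)
    total = sum(counts.values())
    return {unit: n / total for unit, n in counts.items()}


# Hypothetical toy lexicon: each form is a tuple of segments.
toy_lexicon = {
    "mal": ("m", "a", "l"),
    "mul": ("m", "u", "l"),
    "pal": ("p", "a", "l"),
    "al": ("a", "l"),
    "mali": ("m", "a", "l", "i"),
    "son": ("s", "o", "n"),
}

density = neighborhood_density(toy_lexicon)
freqs = unigram_frequencies(toy_lexicon)
print(density)
print(round(freqs["l"], 3))
```

Because the functions operate on tuples of arbitrary units, passing tuples of syllables instead of segments would give a syllable-based count under the same assumed definition. The all-pairs comparison is adequate for a toy example, but a lexicon of 63,836 words would call for an indexed approach.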
Keywords: Korean; Phonological neighborhood density; Lexicon; Lexical database
Funding: Hankuk University of Foreign Studies