Romanization of Thai Proper Names Based on Popularity of Usages

  • Akegapon Tangverapong
  • Atiwong Suchato
  • Proadpran Punyabukkana
Conference paper

DOI: 10.1007/978-3-642-01307-2_56

Volume 5476 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Tangverapong A., Suchato A., Punyabukkana P. (2009) Romanization of Thai Proper Names Based on Popularity of Usages. In: Theeramunkong T., Kijsirikul B., Cercone N., Ho TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg

Abstract

The lack of standards for Romanization of Thai proper names makes searching a challenging task. This is particularly important when searching for documents about people based on the orthographic representation of their names, written using either Thai or English alphabets alone. Romanization based directly on the names' pronunciations often fails to produce the exact English spellings, because Thai-to-English spelling is not a one-to-one mapping and because individuals have personal spelling preferences. This paper proposes a Romanization approach that takes the popularity of usages into consideration. Thai names are parsed into sequences of grams: syllable-sized or larger units governed by pronunciation and spelling constraints in both the Thai and English writing systems. A Gram lexicon is constructed from a corpus of more than 130,000 names, and statistical models are trained on this lexicon. The proposed method significantly outperformed the current Romanization approach. Approximately 46% to 75% of the correct English spellings are covered as the number of proposed hypotheses increases from 1 to 15.
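The abstract does not spell out the statistical models, so the following is only a minimal sketch of the general pipeline it describes, under stated assumptions: a Thai name is segmented into grams found in a lexicon, each gram's English spellings are weighted by usage counts (popularity), and the N most probable full spellings are returned as hypotheses. The lexicon `GRAM_LEXICON`, its counts, and the helpers `spelling_probs`, `segmentations`, `romanize`, and `top_k_coverage` are all hypothetical; the scoring is a simple unigram independence model chosen for illustration, not the authors' method, and it ignores the Thai/English spelling constraints the paper uses to define grams. `top_k_coverage` mirrors the kind of figure reported above: the fraction of names whose correct English spelling appears among the top-N hypotheses.

```python
from collections import defaultdict
from itertools import product
import heapq

# Hypothetical toy Gram lexicon: Thai gram -> {English spelling: usage count}.
# The counts are invented for illustration only; they are not from the paper.
GRAM_LEXICON = {
    "สม": {"som": 90, "sorm": 10},
    "ชาย": {"chai": 70, "chay": 20, "chaai": 10},
    "ศักดิ์": {"sak": 60, "sakdi": 40},
}


def spelling_probs(lexicon):
    """Convert usage counts into P(spelling | gram) by relative frequency."""
    probs = {}
    for gram, counts in lexicon.items():
        total = sum(counts.values())
        probs[gram] = {eng: c / total for eng, c in counts.items()}
    return probs


def segmentations(name, grams):
    """Yield every way to split `name` into grams that exist in `grams`."""
    if not name:
        yield []
        return
    for end in range(1, len(name) + 1):
        head = name[:end]
        if head in grams:
            for rest in segmentations(name[end:], grams):
                yield [head] + rest


def romanize(name, probs, k=5):
    """Return the k most probable (spelling, score) pairs for a Thai name.

    Assumption: a spelling's score is the product of per-gram popularity
    probabilities, summed over all segmentations that produce it."""
    scores = defaultdict(float)
    for seg in segmentations(name, probs):
        options = [probs[g].items() for g in seg]
        for combo in product(*options):
            spelling = "".join(eng for eng, _ in combo)
            p = 1.0
            for _, pr in combo:
                p *= pr
            scores[spelling] += p
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])


def top_k_coverage(test_pairs, probs, k):
    """Fraction of (Thai, English) pairs whose English spelling is in the top k."""
    hits = 0
    for thai, english in test_pairs:
        hypotheses = [s for s, _ in romanize(thai, probs, k)]
        hits += english in hypotheses
    return hits / len(test_pairs)


if __name__ == "__main__":
    probs = spelling_probs(GRAM_LEXICON)
    print([s for s, _ in romanize("สมชาย", probs, k=3)])
    # → ['somchai', 'somchay', 'somchaai']
```

Raising `k` can only add hypotheses, so top-k coverage never decreases as k grows, which is consistent with the reported rise from 1 to 15 hypotheses; a real system would replace the toy lexicon with counts harvested from a large name corpus.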

Keywords

Thai Romanization, Statistical Language Processing, Machine Translation


Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Akegapon Tangverapong (1)
  • Atiwong Suchato (1)
  • Proadpran Punyabukkana (1)
  1. Spoken Language Systems Research Group, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand