Language Resources and Evaluation, Volume 42, Issue 2, pp 137–149

Automatic building of an ontology on the basis of text corpora in Thai

  • Aurawan Imsombut
  • Asanee Kawtrakul


This paper presents a methodology for automatically learning ontologies from Thai text corpora by extracting terms and relations. A shallow parser chunks the texts, and taxonomic relations are then identified in the chunks with the help of two kinds of cues: lexico-syntactic patterns and item lists. The main advantage of the approach is that it simplifies the task of concept and relation labeling, since the cues help identify the ontological concepts and hint at the relations between them. However, these techniques pose certain problems, namely cue word ambiguity, item list identification, and numerous candidate terms. We also propose a method for solving these problems that uses lexicon and co-occurrence features weighted by information gain. The precision, recall and F-measure of the system are 0.74, 0.78 and 0.76, respectively.
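As a quick consistency check on the reported scores, the sketch below computes the F-measure from precision and recall. It assumes the reported F-measure is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` and the `beta` parameter are illustrative choices, not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F_beta).

    beta = 1.0 gives the balanced F1 score; this default is an assumption
    about how the reported F-measure was computed.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract.
score = f_measure(0.74, 0.78)
print(round(score, 2))  # prints 0.76
```

Under the balanced-F1 assumption, the value obtained from the reported precision and recall rounds to the reported 0.76.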


Thai ontology learning; Lexico-syntactic patterns; Taxonomic list



The authors would like to express their deep thanks to Michael Zock and Mathieu Lafourcade for their patience in reviewing this work. The work described in this paper was supported by NECTEC grant No. NT-B-22-14-12-46-06. It was also funded in part by KURDI, the Kasetsart University Research and Development Institute.



Copyright information

© Springer Science+Business Media B.V. 2007

Authors and Affiliations

  1. NAiST Laboratory, Kasetsart University, Bangkok, Thailand
