Sentence Boundary Detection in Turkish

  • B. Taner Dinçer
  • Bahar Karaoğlan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3261)

Abstract

In this paper, we describe a method for sentence boundary detection in Turkish. The method uses simple heuristic knowledge of Turkish syllabication and phonetic rules to disambiguate dots. The algorithm achieves a measured test accuracy of 96.02%. The main contribution of this study is a new lexicon-free method for distinguishing end-of-sentence (EOS) dots from dots used for other purposes.
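
The abstract does not give the rules themselves, so the following Python sketch is only an illustration of what a lexicon-free, syllable-based check on the token before a dot might look like. It is not the algorithm from the paper. The function names (`looks_syllabifiable`, `is_eos_dot`) and the two rules used (every Turkish syllable contains exactly one vowel; native Turkish words do not begin with a consonant cluster) are assumptions chosen for the example.

```python
# Hypothetical illustration only; not the algorithm described in the paper.
# Both cases are listed explicitly to avoid Turkish dotted/dotless I casing issues.
VOWELS = set("aeıioöuüAEIİOÖUÜ")


def is_vowel(ch):
    return ch in VOWELS


def looks_syllabifiable(token):
    """Rough test that a token could be split into Turkish syllables."""
    if not token or not token.isalpha():
        return False
    # Every Turkish syllable contains exactly one vowel,
    # so a token without any vowel cannot be syllabified.
    if not any(is_vowel(ch) for ch in token):
        return False
    # Native Turkish words do not begin with a consonant cluster.
    if len(token) >= 2 and not is_vowel(token[0]) and not is_vowel(token[1]):
        return False
    return True


def is_eos_dot(prev_token, next_token):
    """Guess whether the dot between prev_token and next_token ends a sentence."""
    if not looks_syllabifiable(prev_token):
        # e.g. "Dr", "vb", "3": more likely an abbreviation or a number
        return False
    # Accept end of text, or a following token that starts with an uppercase letter.
    return next_token == "" or next_token[0].isupper()
```

For example, `is_eos_dot("geldi", "Sonra")` treats the dot as a sentence end, while `is_eos_dot("Dr", "Ahmet")` and `is_eos_dot("Prof", "Ali")` do not. A check this coarse would misjudge loanwords that begin with a consonant cluster, such as "tren", and vowel-bearing abbreviations, so it should be read as a starting point rather than a faithful reproduction of the reported 96.02% method.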

Keywords

False Alarm, Letter Sequence, Full Stop, Test Corpus, Sentence Boundary
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • B. Taner Dinçer (1)
  • Bahar Karaoğlan (1)
  1. Uluslararası Bilgisayar Enstitüsü, Ege Üniversitesi, Bornova, İzmir, Türkiye
