Multimedia Tools and Applications

Volume 74, Issue 11, pp 3933–3946

Hybrid method for modeless Japanese input using N-gram based binary classification and dictionary

Article

DOI: 10.1007/s11042-013-1805-1

Cite this article as:
Ikegami, Y. & Tsuruta, S. Multimed Tools Appl (2015) 74: 3933. doi:10.1007/s11042-013-1805-1

Abstract

The rapid growth of globalization requires handling a large number of multilingual documents, in which Japanese text co-exists with English and other languages that use the Roman alphabet. Conventional Japanese input methods require users to switch the input mode between Japanese and the Latin alphabet. As a current solution, modeless Japanese input methods switch the input mode automatically. However, they need training on a large amount of text data to improve their performance. This paper proposes a hybrid modeless Japanese input method that uses a non-Japanese word dictionary and n-gram character sequence features to decide whether or not to convert and switch to Kana input. The non-Japanese word dictionary is intended to reduce false positives on non-Japanese words; it is built from text data available on the Web. The n-gram based discriminative model is learned by a Support Vector Machine from a balanced corpus that contains texts from various domains. The evaluation shows that the F-measure of our method for predicting non-Kana characters improves by 7.7 % compared with an n-gram-only method. In addition, a real user test shows that the average response regarding input time fell on the "agree" side for our method, against the "disagree" side for the conventional Japanese input method that requires switching the input mode.
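The hybrid decision described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the token-level dictionary lookup, the boundary markers, the n-gram range, the toy training words, and all function names (`char_ngrams`, `train_linear_svm`, `should_convert`) are assumptions made for the example. A small linear SVM trained by subgradient descent on the hinge loss stands in for whatever SVM configuration the paper actually used.

```python
from collections import Counter

def char_ngrams(text, n_max=3):
    """Count character n-grams (n = 1..n_max); '^' and '$' mark word boundaries."""
    padded = "^" + text.lower() + "$"
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            feats[padded[i:i + n]] += 1
    return feats

def score(feats, weights, bias):
    """Linear decision value w.x + b over sparse n-gram counts."""
    return bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())

def train_linear_svm(samples, epochs=50, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularised hinge loss.

    samples: list of (text, label); +1 = convert to Kana, -1 = keep as typed.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in samples:
            feats = char_ngrams(text)
            margin = label * score(feats, weights, bias)
            for f in weights:                 # regularisation shrink step
                weights[f] *= 1.0 - lr * reg
            if margin < 1:                    # inside the margin: hinge update
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + lr * label * v
                bias += lr * label
    return weights, bias

def should_convert(token, non_japanese_words, weights, bias):
    """Hybrid rule: the dictionary vetoes conversion, otherwise the SVM decides."""
    if token.lower() in non_japanese_words:
        return False                          # known non-Japanese word: keep as typed
    return score(char_ngrams(token), weights, bias) > 0

# Toy data (hypothetical): romanised Japanese vs. English words.
TRAIN = [(w, +1) for w in ["konnichiwa", "arigatou", "nihongo", "tabemasu",
                           "gakkou", "sakura", "watashi", "desu"]]
TRAIN += [(w, -1) for w in ["the", "input", "method", "switch",
                            "language", "document", "through", "system"]]
NON_JAPANESE_WORDS = {"keyboard", "internet", "email"}

weights, bias = train_linear_svm(TRAIN)
for token in ["keyboard", "sakura", "method"]:
    print(token, should_convert(token, NON_JAPANESE_WORDS, weights, bias))
```

Checking the dictionary first means a word such as "keyboard" is never converted, regardless of how Japanese-like its character n-grams look to the classifier. This is one plausible way to get the reduction in false positives that the abstract attributes to the dictionary.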

Keywords

Multilingual documents; Modeless Japanese input

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Tokyo Denki University, Chiba, Japan
