Abstract
This paper presents a novel word-segmentation algorithm for delimiting words in Chinese natural language queries in NChiql, a Chinese natural language query interface to databases. Although there is a sizable literature on Chinese word segmentation, existing methods cannot satisfy the particular requirements of this system. The algorithm is based on database semantics, namely the Semantic Conceptual Model (SCM) of domain-specific knowledge. Using the SCM, the segmenter labels words with their database semantics directly, which eases disambiguation and translation (from natural language to database query) in NChiql.
Additional information
This work is supported by the National Natural Science Foundation of China under grant No.69633020.
MENG Xiaofeng is an associate professor at the School of Information, Renmin University of China. He obtained the M.S. degree from Renmin University of China in 1993 and the Ph.D. degree from the Institute of Computing Technology, the Chinese Academy of Sciences, in 1999. His research interests include database systems, natural language interfaces, mobile and embedded software, and Web applications.
LIU Shuang is a Ph.D. candidate at the Institute of Computing Technology, the Chinese Academy of Sciences. She obtained the M.S. degree from Renmin University of China in 1999. Her research interests include database systems.
WANG Shan is a professor and dean of the School of Information, Renmin University of China. She obtained the M.S. degree from Renmin University of China in 1982. Her research interests include database systems, data warehousing and data mining, and information systems.
Cite this article
Meng, X., Liu, S. & Wang, S. Word segmentation based on database semantics in NChiql. J. Comput. Sci. & Technol. 15, 346–354 (2000). https://doi.org/10.1007/BF02948870