Shedding Light on a Troublesome Issue in NLIDBs

Word Economy in Query Formulation
  • Rodolfo Pazos
  • René Santaolalaya S.
  • Juan C. Rojas P.
  • Joaquín Pérez O.
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5246)


A natural language interface to databases (NLIDB) that lacks help mechanisms for clarifying queries is prone to incorrect query translation. In this paper we draw attention to a problem in NLIDBs that has been overlooked and has not been dealt with systematically: word economy, i.e., the omission of words when expressing a query in natural language (NL). To gauge the magnitude of this problem, we conducted experiments with EnglishQuery on a corpus of economized-wording queries. The results show that only 18% of the queries were answered correctly, which is substantially lower than the percentages obtained with corpora of regular queries (53%–83%). We describe a typification of the problems found in economized-wording queries, which has been used to implement domain-independent dialog processes for an NLIDB in Spanish. Incorporating dialog processes into an NLIDB lets users clarify queries in NL, thus improving the percentage of correctly answered queries. We present tests of a dialog manager that deals with four types of query problems and raises the percentage of correctly answered queries from 60% to 91%. Given the generality of our approach, we claim that it can be applied to other domain-dependent or domain-independent NLIDBs, as well as to other NLs such as English, French, and Italian.


Natural language (NL); Natural language interface to databases (NLIDB); Dialog manager





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Rodolfo Pazos (1, 2)
  • René Santaolalaya S. (1)
  • Juan C. Rojas P. (1)
  • Joaquín Pérez O. (1)
  1. Departamento de Ciencias Computacionales, Centro Nacional de Investigación y Desarrollo Tecnológico (CENIDET), Cuernavaca 62490, México
  2. Instituto Tecnológico de Ciudad Madero, Cd. Madero, México
