NLP for Shallow Question Answering of Legal Documents Using Graphs

  • Alfredo Monroy
  • Hiram Calvo
  • Alexander Gelbukh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5449)

Abstract

Previous work has shown that modeling the relationships between the articles of a regulation as a graph performs twice as well as traditional information retrieval systems at returning the articles relevant to a question. In this work we experiment with natural language processing techniques, namely lemmatization and manual and automatic thesauri, to improve question-based document retrieval. To construct the graph, we represent the set of all articles as a graph; the question is split into two parts, and each part is added to the graph. Several paths are then constructed from part A of the question to part B, so that the shortest path contains the articles relevant to the question. We evaluate our method by comparing three sets of answers: those given by a traditional information retrieval system (a vector space model adjusted for article retrieval instead of document retrieval); the answers to 21 questions given manually by the general counsel of the National Polytechnic Institute, based on 25 different regulations (academy regulation, scholarships regulation, postgraduate studies regulation, etc.); and the answers of our system, based on the same set of regulations. We found that lemmatizing increases performance by around 10%, while the use of thesauri has a low impact.
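
The shortest-path idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the edge cost (1 minus cosine similarity over whitespace tokens), the node names `Q_A` and `Q_B`, the helper names, and the three toy articles are all assumptions made for illustration. The abstract does not say how edges are weighted or how the question is split.

```python
import heapq
import math
from collections import Counter


def bag_of_words(text):
    """Term frequencies over whitespace tokens (a lemmatizer would plug in here)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_graph(nodes):
    """Link nodes that share terms; edge cost = 1 - cosine similarity (assumed weighting)."""
    vectors = {name: bag_of_words(text) for name, text in nodes.items()}
    graph = {name: {} for name in nodes}
    names = list(nodes)
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            sim = cosine(vectors[u], vectors[v])
            if sim > 0:
                graph[u][v] = graph[v][u] = 1.0 - sim
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list from source to target, or []."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def relevant_articles(articles, part_a, part_b):
    """Add both question parts as nodes and return the articles on the shortest A-to-B path."""
    nodes = dict(articles)
    nodes["Q_A"], nodes["Q_B"] = part_a, part_b
    graph = build_graph(nodes)
    # Drop any direct link between the two question parts so the path must cross articles.
    graph["Q_A"].pop("Q_B", None)
    graph["Q_B"].pop("Q_A", None)
    return shortest_path(graph, "Q_A", "Q_B")[1:-1]


# Toy articles (hypothetical text, not taken from any real regulation).
articles = {
    "art1": "scholarship students must keep minimum grade average",
    "art2": "grade average is computed each semester",
    "art3": "library opening hours holidays",
}

print(relevant_articles(articles, "scholarship requirements", "semester"))
# prints ['art1', 'art2']
```

In this toy case neither article alone links both halves of the question; the path through `art1` and `art2` connects them, while the unrelated `art3` is never reached.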

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Alfredo Monroy (1)
  • Hiram Calvo (1, 2)
  • Alexander Gelbukh (1)

  1. Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico
  2. Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan