Heuristic Algorithm for Extraction of Facts Using Relational Model and Syntactic Data

  • Grigori Sidorov
  • Juve Andrea Herrera-de-la-Cruz
  • Sofía N. Galicia-Haro
  • Juan Pablo Posadas-Durán
  • Liliana Chanona-Hernandez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7094)


From a semantic point of view, information is usually contained in small units called facts, which are typically smaller than sentences. Identifying these facts in a text is not a trivial task. We present a heuristic algorithm for extracting facts from sentences using a simple representation based on a relational data model. We focus our study on texts that by their nature contain many facts: structured textbooks. The algorithm is based on data obtained from a syntactic analyzer. The extracted facts can be useful for information retrieval, automatic summarization, and similar tasks. Our experiments were conducted for the Spanish language. We obtained better results than similar methods.
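To make the idea of storing facts in a relational representation concrete, here is a minimal sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes a hand-written dependency parse in place of a real syntactic analyzer, a hypothetical relation-label set (`subj`, `obj`, `root`, and so on), a single illustrative heuristic (one subject-verb-object row for the root verb), and a hypothetical table named `fact`.

```python
import sqlite3

# Hypothetical, hand-written dependency parse of the Spanish sentence
# "Newton formuló la ley de la gravitación", as (index, word, head, relation).
# A real system would obtain this from a syntactic analyzer; the relation
# labels used here are assumptions made for illustration only.
PARSE = [
    (1, "Newton", 2, "subj"),
    (2, "formuló", 0, "root"),
    (3, "la", 4, "det"),
    (4, "ley", 2, "obj"),
    (5, "de", 4, "prep"),
    (6, "la", 7, "det"),
    (7, "gravitación", 5, "pobj"),
]


def subtree_text(parse, head_idx):
    """Return the words of the subtree rooted at head_idx, in sentence order."""
    keep = {head_idx}
    changed = True
    while changed:  # repeat until no more dependents are found
        changed = False
        for idx, _, head, _ in parse:
            if head in keep and idx not in keep:
                keep.add(idx)
                changed = True
    return " ".join(word for idx, word, _, _ in parse if idx in keep)


def extract_facts(parse):
    """Illustrative heuristic: emit (subject, verb, object) for the root verb."""
    facts = []
    for idx, word, _, rel in parse:
        if rel != "root":
            continue
        subjects = [i for i, _, h, r in parse if h == idx and r == "subj"]
        objects = [i for i, _, h, r in parse if h == idx and r == "obj"]
        for s in subjects:
            for o in objects:
                facts.append(
                    (subtree_text(parse, s), word, subtree_text(parse, o))
                )
    return facts


def store_facts(facts):
    """Store the extracted rows in an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact (subject TEXT, relation TEXT, object TEXT)")
    conn.executemany("INSERT INTO fact VALUES (?, ?, ?)", facts)
    conn.commit()
    return conn


facts = extract_facts(PARSE)
conn = store_facts(facts)
rows = conn.execute("SELECT subject, relation, object FROM fact").fetchall()
print(rows)
```

Keeping each fact as one row makes it straightforward to query facts by subject or relation, which is the kind of use in retrieval or summarization that the abstract mentions.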


fact extraction; learning by reading; syntactic analysis; relational data model





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Grigori Sidorov (1)
  • Juve Andrea Herrera-de-la-Cruz (1)
  • Sofía N. Galicia-Haro (2)
  • Juan Pablo Posadas-Durán (1)
  • Liliana Chanona-Hernandez (3)

  1. Natural Language and Text Processing Laboratory, Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico
  2. Faculty of Sciences, Autonomous National University of Mexico (UNAM), Mexico City, Mexico
  3. Engineering Faculty (ESIME), National Polytechnic Institute (IPN), Mexico City, Mexico
