Summaries on the Fly: Query-Based Extraction of Structured Knowledge from Web Documents

Fetahu, Besnik; Pereira Nunes, Bernardo; Dietze, Stefan

doi:10.1007/978-3-642-39200-9_22

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7977)

Included in the following conference series:

International Conference on Web Engineering

Abstract

A large part of Web resources consists of unstructured textual content. Processing and retrieving the content relevant to a particular information need is challenging for both machines and humans. Information retrieval techniques detect suitable resources for a particular query, information extraction techniques extract structured data, and text summarization detects important sentences. However, these techniques usually do not consider particular user interests and information needs. In this paper, we present a novel method that automatically generates structured summaries from user queries, using POS patterns to identify relevant statements and entities in a certain context. We evaluate our work on the publicly available New York Times corpus; the results show the applicability of our method and its advantages over previous work.
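To make the idea of POS patterns more concrete, the following is a minimal sketch and not the authors' implementation. It assumes tokens that are already tagged with Penn Treebank POS tags, a single hypothetical pattern (noun phrase, verb group, noun phrase), and a simple substring test for query relevance. The names `extract_statements`, `NP`, `VP`, and `EXAMPLE` are illustrative only; the patterns actually used in the paper are not given in the abstract.

```python
import re
from typing import List, Tuple

Tagged = Tuple[str, str]  # (token, Penn Treebank POS tag)

# Hypothetical pattern over the tag sequence (an assumption, not from the paper):
# optional determiner, any adjectives, one or more nouns.
NP = r"(?:DT )?(?:JJ )*(?:NNP?S? )+"
# One or more verb tags (VB, VBD, VBG, VBN, VBP, VBZ).
VP = r"(?:VB[DGNPZ]? )+"
# \b keeps a match from starting in the middle of a tag such as PDT.
STATEMENT = re.compile(rf"\b({NP})({VP})({NP})")


def extract_statements(tagged: List[Tagged], query: str) -> List[Tuple[str, str, str]]:
    """Return (subject, predicate, object) triples that mention the query."""
    # Encode the tag sequence as one string, each tag followed by a space.
    tags = "".join(f"{tag} " for _, tag in tagged)

    def text(start: int, end: int) -> str:
        # Number of spaces before a character offset equals the token index.
        first = tags[:start].count(" ")
        last = tags[:end].count(" ")
        return " ".join(token for token, _ in tagged[first:last])

    needle = query.lower()
    triples = []
    for match in STATEMENT.finditer(tags):
        subj, pred, obj = (text(*match.span(group)) for group in (1, 2, 3))
        # Keep only statements whose subject or object mentions the query.
        if needle in subj.lower() or needle in obj.lower():
            triples.append((subj, pred, obj))
    return triples


# Hand-tagged example sentence (illustrative data, not from the corpus).
EXAMPLE: List[Tagged] = [
    ("The", "DT"), ("New", "NNP"), ("York", "NNP"), ("Times", "NNP"),
    ("published", "VBD"), ("a", "DT"), ("large", "JJ"), ("corpus", "NN"),
    (".", "."),
]

if __name__ == "__main__":
    print(extract_statements(EXAMPLE, "corpus"))
```

In this sketch the query only filters the extracted triples. A real system would obtain the tags from a POS tagger and would likely use a richer pattern set and a better relevance test than substring matching.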

Author information

Authors and Affiliations

L3S Research Center, Leibniz University Hannover, Germany
Besnik Fetahu, Bernardo Pereira Nunes & Stefan Dietze
Department of Informatics, PUC-Rio, Rio de Janeiro, RJ, Brazil
Bernardo Pereira Nunes

Editor information

Editors and Affiliations

University of Trento, Via Sommarive 5, 38123, Povo, TN, Italy
Florian Daniel
Department of Computer Science, Aalborg University, Selma Lagerloefs Vej 300, 9220, Aalborg, Denmark
Peter Dolog
Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave., Kowloon, Hong Kong, China
Qing Li

About this paper

Cite this paper

Fetahu, B., Pereira Nunes, B., Dietze, S. (2013). Summaries on the Fly: Query-Based Extraction of Structured Knowledge from Web Documents. In: Daniel, F., Dolog, P., Li, Q. (eds) Web Engineering. ICWE 2013. Lecture Notes in Computer Science, vol 7977. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39200-9_22

DOI: https://doi.org/10.1007/978-3-642-39200-9_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39199-6
Online ISBN: 978-3-642-39200-9
eBook Packages: Computer Science, Computer Science (R0)
