Abstract
We introduce Sapiens, a platform for extracting quotations from news wires and associating each one with its author and context. Sapiens is original in that it relies on a deep linguistic processing chain, which allows it to extract quotations with wide coverage and under an extended definition, one that includes quotations only partly made up of verbatim text delimited by quotation marks. We describe the architecture of Sapiens and how it was applied to a corpus of French news wires from the AFP news agency.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
de La Clergerie, É., Sagot, B., Stern, R., Denis, P., Recourcé, G., Mignot, V. (2011). Extracting and Visualizing Quotations from News Wires. In: Vetulani, Z. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2009. Lecture Notes in Computer Science, vol 6562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20095-3_48
DOI: https://doi.org/10.1007/978-3-642-20095-3_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20094-6
Online ISBN: 978-3-642-20095-3
eBook Packages: Computer Science (R0)